On 20/03/2012 5:10 AM, John Ladasky wrote:
I am trying to import PDB file snapshots from a GROMACS 4.5.4-generated trajectory into other software tools -- specifically, Biopython. I generate the snapshots using trjconv in GROMACS.

I am interested in the water molecules from my solvent box, so I do not discard them. When trjconv prompts me to "Select group for output", I select "Group 0 (System)". However, in downstream applications, I do want to differentiate the solvent atoms from my protein polymer, and ensure that each group of atoms (protein atoms, solvent atoms) is placed in a distinct category.

Biopython's PDB file parser is not cooperating with me. It is attempting to append the water molecules as additional RESIDUES of my polymer. Obviously, this is incorrect. So, where's the problem, Biopython or GROMACS? Looking through the PDB file specification, version 3.2, I found the following passage:

"The ATOM records present the atomic coordinates for standard amino acids and nucleotides. They also present the occupancy and temperature factor for each atom. Non-polymer chemical coordinates use the HETATM record type."

If I am reading this correctly, my solvent atoms should be tagged as "HETATM" rather than as "ATOM". But the files that trjconv produces label every atom as "ATOM", whether it's an atom from the protein or an atom from a water molecule.

Is there any way to make trjconv use "HETATM" for solvent atoms? I do not see anything in the trjconv documentation. I also do not understand why trjconv might produce PDB files which do not adhere to the standard. There may be a good reason, I don't know.

Strict adherence to the PDB format by software is the exception rather than the rule. Often you will see TER records and/or chain IDs used to distinguish the parts of a single system. For that reason, most software that claims to read PDB should offer some way of making subset selections that does not depend on the contents of the PDB file. You should consult the Biopython documentation to see how it interprets things, and how you can customize that.
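As a rough illustration of a selection that ignores the ATOM/HETATM tag, the sketch below groups coordinate lines by residue name instead, using the fixed columns of the PDB format (residue name in columns 18-20). It assumes the waters are named SOL, as GROMACS water topologies normally name them; the function name `split_by_resname` and the set `SOLVENT_RESNAMES` are made up for this example, and it does not use Biopython at all.

```python
# Assumption: waters are named SOL (the usual GROMACS name); HOH and WAT are
# included only as other common water names. Adjust to match your topology.
SOLVENT_RESNAMES = {"SOL", "HOH", "WAT"}


def split_by_resname(pdb_lines, solvent_resnames=SOLVENT_RESNAMES):
    """Sort ATOM/HETATM lines into 'solvent' and 'other' by residue name.

    The record type (ATOM vs HETATM) is deliberately ignored; only the
    residue name in columns 18-20 decides the group.
    """
    groups = {"solvent": [], "other": []}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip TER, CRYST1, MODEL, etc.
        resname = line[17:20].strip()
        key = "solvent" if resname in solvent_resnames else "other"
        groups[key].append(line)
    return groups
```

Note that "other" here means everything that is not in the solvent set, so ions would land there alongside the protein unless their residue names are handled separately.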

trjconv cannot attempt to guess how every possible piece of software might like to interpret its results, so it produces something generic and plausible. Depending on how flexible Biopython is, you may need a shell script to post-process the trjconv output, either along the lines Tsjerk suggested, or by inserting TER records or changing chain IDs. Do read up on how Biopython works first.
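For what it is worth, this kind of post-processing can also be done in a few lines of Python rather than a shell script. The following is only a sketch under stated assumptions: that the solvent residues are named SOL, that relabelling their records as HETATM and putting a TER after the last polymer atom is what the downstream tool wants, and that the file follows the standard fixed-column layout. The function name `tag_solvent_as_hetatm` is hypothetical, not part of GROMACS or Biopython.

```python
def tag_solvent_as_hetatm(pdb_lines, solvent_resnames=frozenset({"SOL"})):
    """Return PDB lines with solvent ATOM records relabelled as HETATM.

    A TER line is inserted where a non-solvent ATOM record is directly
    followed by a solvent one, unless a TER is already there.
    Assumption: solvent is identified purely by residue name (cols 18-20).
    """
    out = []
    for raw in pdb_lines:
        line = raw.rstrip("\r\n")
        is_solvent = (
            line.startswith("ATOM  ")
            and line[17:20].strip() in solvent_resnames
        )
        if is_solvent:
            # Previous output line still tagged ATOM means we just left the
            # polymer (solvent lines have already been rewritten to HETATM).
            if out and out[-1].startswith("ATOM  "):
                out.append("TER")
            # "HETATM" fills the same six columns as "ATOM  ", so the rest
            # of the record keeps its column alignment.
            line = "HETATM" + line[6:]
        out.append(line)
    return out
```

To use it on a snapshot, read the file with `open(path).readlines()`, pass the list in, and write the result back out joined with newlines. Ions keep their ATOM tag unless their residue names are added to `solvent_resnames`.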

Mark
--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org.
Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
