On 20/03/2012 5:10 AM, John Ladasky wrote:
> I am trying to import PDB file snapshots from a GROMACS
> 4.5.4-generated trajectory into other software tools -- specifically,
> Biopython. I generate the snapshots using trjconv in GROMACS.
>
> I am interested in the water molecules from my solvent box, so I do
> not discard them. When trjconv prompts me to "Select group for
> output", I select "Group 0 (System)". However, in downstream
> applications, I do want to differentiate the solvent atoms from my
> protein polymer, and ensure that each group of atoms (protein atoms,
> solvent atoms) is placed in a distinct category.
>
> Biopython's PDB file parser is not cooperating with me. It is
> attempting to append the water molecules as additional RESIDUES of my
> polymer. Obviously, this is incorrect. So, where's the problem,
> Biopython or GROMACS? Looking through the PDB file specification,
> version 3.2, I found the following passage:
>
> "The ATOM records present the atomic coordinates for standard amino
> acids and nucleotides. They also present the occupancy and temperature
> factor for each atom. Non-polymer chemical coordinates use the HETATM
> record type."
>
> If I am reading this correctly, my solvent atoms should be tagged as
> "HETATM" rather than as "ATOM". But the files that trjconv produces
> label every atom as "ATOM", whether it's an atom from the protein or
> an atom from a water molecule.
>
> Is there any way to make trjconv use "HETATM" for solvent atoms? I do
> not see anything in the trjconv documentation. I also do not
> understand why trjconv might produce PDB files which do not adhere to
> the standard. There may be a good reason, I don't know.

Strict adherence by software to the PDB format is something of an
exception rather than the rule. Often you will see TER records and/or
chain IDs used to differentiate different parts of the same system. For
this kind of reason, most software that claims to read PDB should have
some way of making subset selections that are not dependent on the
contents of the PDB file. You should consult the Biopython documentation
to see how it likes to interpret things, and how you can customize that.

trjconv cannot attempt to guess how all possible pieces of software
might like to interpret its results, and so it produces something
generic and plausible. Depending on how flexible Biopython is, you may
need to use a shell script to post-process the trjconv output to do
something like Tsjerk suggested, or insert TER records, or change chain
IDs. Do read how Biopython works first.
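
As a rough illustration of that kind of post-processing, here is a minimal
sketch in Python rather than shell. It rewrites the record name from ATOM
to HETATM for water residues and leaves every other line untouched. The
function names and the set of water residue names are assumptions of mine,
not anything trjconv or Biopython defines; the sketch also assumes the
residue name sits in columns 18-20, as in the fixed-column PDB layout, so
check a few lines of your own file first.

```python
# Assumed water residue names: SOL is the usual GROMACS name,
# HOH and WAT are common elsewhere. Adjust to match your topology.
WATER_RESNAMES = frozenset({"SOL", "HOH", "WAT"})


def relabel_solvent(lines, water_resnames=WATER_RESNAMES):
    """Yield PDB lines, changing ATOM to HETATM for water residues.

    Assumes fixed-column PDB records: record name in columns 1-6,
    residue name in columns 18-20 (string indices 17:20).
    """
    for line in lines:
        if line.startswith("ATOM  ") and line[17:20].strip() in water_resnames:
            # "HETATM" is exactly six characters, so the columns that
            # follow keep their positions.
            line = "HETATM" + line[6:]
        yield line


def relabel_file(src_path, dst_path):
    """Copy src_path to dst_path with solvent atoms relabelled."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(relabel_solvent(src))
```

Calling relabel_file("frame.pdb", "frame_hetatm.pdb") (hypothetical file
names) writes a copy in which the water atoms are HETATM records. The same
loop is a natural place to insert a TER record or set a chain ID if the
downstream tool prefers that instead.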
Mark
--
gmx-users mailing list gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users