***  For details on how to be removed from this list visit the  ***
***          CCP4 home page http://www.ccp4.ac.uk         ***


On Wednesday 10 January 2007 23:52, Eric Bennett wrote:
> Ethan Merritt wrote:
> 
> >Because it is not necessary to do so.  Storing every H coordinate generated
> >by the riding model adds no information to the PDB file that is not already
> >present in the parsimonious description provided by the header record:
> >REMARK   3  HYDROGENS HAVE BEEN ADDED IN THE RIDING POSITIONS
> 
> Hopefully that is tongue in cheek.  If this is your reasoning, why 
> build a PDB file at all?  It contains no information that isn't 
> already present in your original diffraction images plus a basic book 
> on chemical bonding.

Huh?  No, I am absolutely serious.
The PDB file is (or should be, if done right) a description of your
crystallographic model.  That is, it contains enough information to
re-construct the model you refined.  In principle it contains a value for
every parameter of your model, although as we have been discussing some of
these are present only by indirect reference to a well-known set of values.

Some of the parameters in your model are the individual X,Y,Z,B values.
Some of the parameters are global, and apply to all atoms in the structure.
These are by convention listed in the header records, rather than being
distributed over each individual atom.  You could do it either way, but putting
one entry in the header is more parsimonious than adding the same information
in every single ATOM record.  For example, many [most?] refinements include
an overall anisotropic B correction.  By convention this is listed separately
in the header, although it applies to every atom in the structure.  Instead of
doing it this way, we could replace every individual ATOM record with a pair
of ATOM/ANISOU records that turn the individual B into B+Uij(overall).
But this would double the size of the file without actually adding information.
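
To make that concrete, here is a minimal Python sketch of the ATOM to
ATOM/ANISOU expansion.  It assumes the overall tensor is given in B units
(A^2) in the order B11 B22 B33 B12 B13 B23, that it is simply added to the
isotropic B, and that ANISOU stores U = B/(8 pi^2) scaled by 10^4 as integers.
The function name is made up for illustration, and whether the per-atom B in a
given file already includes the overall term depends on the program that
wrote it.

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2  # conversion factor: B = 8 * pi^2 * U


def anisou_from_b(b_iso, b_overall):
    """Fold an overall anisotropic B tensor into one atom's isotropic B.

    b_iso     : individual isotropic B factor (A^2)
    b_overall : (B11, B22, B33, B12, B13, B23) in A^2  (assumed ordering)
    Returns six integers U11 U22 U33 U12 U13 U23, each U * 10^4,
    i.e. the values an ANISOU record would carry.
    """
    # The isotropic B contributes only to the diagonal terms.
    diagonal = (1.0, 1.0, 1.0, 0.0, 0.0, 0.0)
    return tuple(
        round((b_iso * d + b) / EIGHT_PI_SQ * 1e4)
        for d, b in zip(diagonal, b_overall)
    )
```

Every atom gets the same six overall terms added, which is why the expanded
file is twice as long and carries nothing that the header did not already say.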

The description of the riding hydrogen model is exactly analogous.
We could have refmac dump every single H atom to the output file
(it's a toggled option in ccp4i).  But this approximately doubles the size of
the output file without actually adding information. So instead we put a single
header record that provides the rule for re-generating these positions.
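
As a rough illustration of what "a rule for re-generating these positions"
means, here is a Python sketch for the simplest case: one H on a planar (sp2)
atom with two heavy-atom neighbours, placed in the plane along the direction
pointing away from both.  This is not refmac's code.  The function name and
the default bond length are placeholders, and a real program takes its target
geometry from a restraint dictionary.

```python
import math


def ride_h_sp2(center, nbr1, nbr2, bond_length=1.0):
    """Place one riding H on a planar atom with two heavy-atom neighbours.

    center, nbr1, nbr2 : (x, y, z) tuples in Angstrom
    bond_length        : X-H distance in Angstrom (placeholder default)
    The H lies along the sum of the unit vectors pointing from each
    neighbour to the central atom.  Collinear neighbours are not handled.
    """
    def unit(v):
        norm = math.sqrt(sum(c * c for c in v))
        return tuple(c / norm for c in v)

    away1 = unit(tuple(c - n for c, n in zip(center, nbr1)))
    away2 = unit(tuple(c - n for c, n in zip(center, nbr2)))
    direction = unit(tuple(a + b for a, b in zip(away1, away2)))
    return tuple(c + bond_length * d for c, d in zip(center, direction))
```

The H position is a pure function of the heavy-atom coordinates already in
the file, so it can be recomputed whenever it is needed.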


> Most non-crystallographer scientists will know
> what it means if they see hydrogens connected to carbons when they 
> pull up a structure.  If they see:
> REMARK   3  HYDROGENS HAVE BEEN ADDED IN THE RIDING POSITIONS
> 
> they are not going to know what it means, even assuming they know to 
> look for the REMARK 3 line in the PDB file in the first place.

Too bad.
Programs that care, such as docking programs, know to regenerate the
hydrogens before proceeding.

> Data has to be provided in a representation suitable for the target 
> audience,

That is a tangential issue.  When we as crystallographers deposit a
refined model, either as a PDB file or an mmCIF file, we are archiving
the work we have actually done.  It would be improper to omit crucial
information from the file.   Individual users who retrieve this archival
model for their own purposes may or may not need every bit of information
it contains, and may or may not understand the bits they *do* need,
but that does not absolve us from archiving the complete model.

> not left in an obscure format just because converting it to  
> a more easily digested form "adds no information".  The target 
> audience for protein structures should be larger than other protein 
> crystallographers.

That may be so, but it is an argument for having a different format
for different audiences, not an argument for 'dumbing down' the format
used to deposit the model in the first place.
