Re: [ccp4bb] PDB format survey?

Warren DeLano Fri, 10 Aug 2007 08:12:26 -0700

Correction:  Scratch what I wrote -- the PDB format does now support a
formal charge field in columns 79-80 (1+,2+,1- etc.).  Hooray!
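
As a rough illustration of reading that field, here is a minimal Python
sketch. It assumes the charge sits in columns 79-80 of an ATOM/HETATM
record as a digit followed by a sign ("1+", "2-"), and it treats a blank
or missing field as neutral. The name parse_formal_charge is made up for
this example and is not taken from any existing tool.

```python
def parse_formal_charge(line):
    """Return the formal charge from columns 79-80 of an ATOM/HETATM line.

    Assumption: the field is 'digit then sign' (1+, 2+, 1-); blank means 0.
    """
    field = line[78:80].strip()  # columns 79-80, zero-based slice
    if not field:
        return 0
    if len(field) != 2 or not field[0].isdigit() or field[1] not in "+-":
        raise ValueError(f"unrecognized charge field: {field!r}")
    magnitude = int(field[0])
    return magnitude if field[1] == "+" else -magnitude
```

Lines shorter than 80 columns simply come back as charge 0, so older
files without the field pass through unchanged.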


Thus, adoption of the CONECT valency convention is all it would take for
us to be able to convey chemically-defined structures using the PDB
format.

I'll happily add two-letter chain IDs and hybrid36 to PyMOL but would
really, really like to see valences included as well -- widespread
adoption of that simple convention would represent a major practical
advance for interoperability in structure-based drug discovery.

Cheers,
Warren


> -----Original Message-----
> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On 
> Behalf Of Warren DeLano
> Sent: Thursday, August 09, 2007 5:53 PM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] PDB format survey?
> 
> Joe,
> 
> I feel that atom serial numbers are particularly important,
> since they, combined with CONECT records, provide the only
> semi-standard convention I know of for reliably encoding bond
> valence information in a PDB file.
> 
> single bond = bond listed once
> double bond = bond listed twice
> triple bond = bond listed thrice
> aromatic bond = bond listed four times.
> 
> This is a convention long supported by tools like MacroModel 
> and PyMOL.
> For example, here is formamide, where the bond between
> atoms 1 and 3 is listed twice:
> 
> HETATM    1  C01 C=O     1       0.000  -0.020   0.000  0.00  0.00           C
> HETATM    2  N01 C=O     1       1.268  -0.765   0.000  0.00  0.00           N
> HETATM    3  O02 C=O     1       0.000   1.188   0.000  0.00  0.00           O
> HETATM    4  H01 C=O     1       1.260  -1.775   0.000  0.00  0.00           H
> HETATM    5  H02 C=O     1       2.146  -0.266   0.000  0.00  0.00           H
> HETATM    6  H03 C=O     1      -0.946  -0.562   0.000  0.00  0.00           H
> CONECT    1    2
> CONECT    1    3
> CONECT    1    3
> CONECT    1    6
> CONECT    2    1    4    5
> CONECT    3    1
> CONECT    3    1
> CONECT    4    2
> CONECT    5    2
> CONECT    6    1
> 
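The repeat-count convention quoted above can be read back with a few
lines of Python. This is only a sketch under stated assumptions: serial
numbers are plain integers in 5-column fields starting at column 7, each
bond may be listed from both of its atoms, and the function name
bond_orders is invented for this example.

```python
from collections import Counter


def bond_orders(pdb_lines):
    """Map each bonded pair (i, j) with i < j to its listing count.

    Under the convention: 1 = single, 2 = double, 3 = triple, 4 = aromatic.
    """
    directed = Counter()
    for line in pdb_lines:
        line = line.rstrip()
        if not line.startswith("CONECT"):
            continue
        # Slice fixed 5-column fields; adjacent serials may touch.
        fields = [line[i:i + 5].strip() for i in range(6, len(line), 5)]
        serials = [int(f) for f in fields if f]
        if not serials:
            continue
        atom, partners = serials[0], serials[1:]
        for partner in partners:
            directed[(atom, partner)] += 1
    orders = {}
    for (a, b), count in directed.items():
        pair = (min(a, b), max(a, b))
        # A bond is usually listed from both ends; take the larger count.
        orders[pair] = max(orders.get(pair, 0), count)
    return orders
```

For the CONECT block above (with the "> " quoting stripped), the pair
(1, 3) comes back with order 2 and every other bond with order 1.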
> I second the proposal of treating the atom serial number field
> as a unique string rather than a numeric quantity.
> 
> Two letter chain IDs would be fine with me, but I do think we 
> could also make better use of SEGI and/or MODEL to break 
> things up while still preserving the utility of certain other 
> records (SHEET, HELIX, etc.) within their existing column definitions.
> 
> However, we are still lacking a standard way of designating
> formal charges, so maybe that free column could be better
> used for encoding a formal charge, such as ["q", "t", "d",
> "-", "+", "D", "T", "Q"] over the formal charge range of
> [-4,-3,-2,-1,0,1,2,3,4] -- just an idea :)...
> 
> With valences plus formal charges along with expansion of the 
> cap on atom counts, I think we could support 
> chemically-complete PDB files and push back the date of PDB 
> demise for a few more years!
> 
> A Wiki dedicated to practical PDB file hacks and extensions 
> is a superb idea -- of course, the goal should be to 
> ultimately come up with a single well-defined standard set of 
> hacks we all agree upon by supporting them in our code.
> 
> Cheers,
> Warren 
> 
> -----Original Message-----
> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On 
> Behalf Of Joe Krahn
> Sent: Thursday, August 09, 2007 1:15 PM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] PDB format survey?
> 
> Edward A. Berry wrote:
> > Ethan A Merritt wrote:
> >> On Wednesday 08 August 2007 20:47, Ralf W. Grosse-Kunstleve wrote:
> >>> Implementations to generate intuitive, maximally backward compatible
> >>> numbers can be found here:
> >>>
> >>>   http://cci.lbl.gov/hybrid_36/
> >>
> >> From that URL:
> >>
> >> ATOM  99998  SD  MET L9999      48.231 -64.383  -9.257  1.00 11.54           S
> >> ATOM  99999  CE  MET L9999      49.398 -63.242 -10.211  1.00 14.60           C
> >> ATOM  A0000  N   VAL LA000      52.228 -67.689 -12.196  1.00  8.76           N
> >> ATOM  A0001  CA  VAL LA000      53.657 -67.774 -12.458  1.00  3.40           C
> >>
> >> Could you please clarify this example?
> >> Is that "A0000" a hexadecimal number, or is it a decimal number that
> >> just happens to have an "A" in front of it?
> >> [A-Z][0-9999] gives a larger range of values than 5 bytes of hexadecimal,
> >> so I'm guessing it's the former.  But the example is not clear.
> >>
> > I'm guessing the former also. A 5-digit hex number would not be
> > backwards compatible. With this system legacy programs can still read
> > the files with 99999 atoms or fewer, and anything more than that they
> > couldn't have handled anyway. Very nice!
> > 
> > Ed
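
On the question of what "A0000" means: as far as I understand the
hybrid-36 scheme documented at the URL above, a field that starts with a
digit is ordinary decimal, and a field that starts with a letter is a
base-36 number shifted so that it continues right after the largest
decimal value. The Python below is a sketch of that reading, not the
reference implementation (which lives at that URL); the function name
hy36_decode_sketch is my own.

```python
def hy36_decode_sketch(width, field):
    """Decode a hybrid-36 style field of the given width to an integer.

    Assumption: upper-case base-36 values follow the decimal range, and
    lower-case base-36 values follow the upper-case range.
    """
    s = field.strip()
    if not s:
        raise ValueError("empty field")
    if s[0] == "-" or s[0].isdigit():
        return int(s)  # plain decimal, as legacy readers expect
    if len(s) != width or not s.isalnum():
        raise ValueError(f"invalid field: {field!r}")
    if s.isupper():
        # 'A' followed by zeros maps to 10**width, the first value past 9...9
        return int(s, 36) - 10 * 36 ** (width - 1) + 10 ** width
    if s.islower():
        # 'a' followed by zeros maps to the value just past 'Z...Z'
        return int(s, 36) + 16 * 36 ** (width - 1) + 10 ** width
    raise ValueError(f"mixed-case field: {field!r}")
```

With width 5, "99999" decodes to 99999 and "A0000" to 100000, so the
serial numbers in the example run on without a gap; with width 4, the
residue number "A000" decodes to 10000.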
> I still prefer the idea of just truncating serial numbers,
> and using an alternative to CONECT for large structures.
> Almost nobody uses atomSerial, but it still may be parsed as
> an integer, where the above idea could cause errors.
> Furthermore, non-digit encoding still results in another
> maximum, whereas truncating the numbers has no limit. The
> truncated serial number is ambiguous only if taken out of
> context of the complete PDB file, but PDB files are by design
> sequential.
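
To make the "sequential by design" argument concrete, here is a small
Python sketch of how a reader could recover full serial numbers from
truncated ones. It rests on assumptions that are not part of any
standard: serials never decrease through the file, gaps between
consecutive serials are smaller than the wrap modulus, and truncation
keeps the low five decimal digits. The function name is hypothetical.

```python
def expand_truncated_serials(truncated, modulus=100000):
    """Recover full serials from sequentially truncated ones.

    Assumption: the true serials are non-decreasing, so any drop in the
    truncated value means the counter wrapped around once.
    """
    full = []
    wraps = 0
    previous = None
    for value in truncated:
        if previous is not None and value < previous:
            wraps += 1  # the serial wrapped past modulus - 1
        full.append(wraps * modulus + value)
        previous = value
    return full
```

For example, the truncated sequence 99998, 99999, 0, 1 expands to
99998, 99999, 100000, 100001. A record read in isolation cannot be
expanded this way, which is the ambiguity described above.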
> 
> Another alternative is to define an "atom-serial offset"
> record. It can define a number which is added to all
> subsequently parsed atom serial numbers. Every ATOM/HETATM
> record is then perfectly valid to an older program, but such a
> program may only be able to handle one chunk of atoms at a time.
> 
> Likewise, I like the idea of a ChainID map record, which maps
> single-letter chainIDs to larger named IDs. Each existing
> PDB record can then be used unchanged, but files can then
> support very long ChainID strings. The only disadvantage is
> that old PDB readers will get confused, but at least the
> individual record formats are not changed in a way that makes
> them crash.
> 
> I think that keeping the old record definitions completely
> unchanged is an important feature of any PDB format revision.
> Even if we continue to use it for another 20 years, its primary
> advantage is that it is a well-established "legacy" format. If
> we change existing records, we break that one useful feature.
> Therefore, I think that any changes to existing records
> should be limited to using character positions that are
> currently unused. (The one exception is that we need to make
> the HEADER Y2K compatible by using a 4-digit year, which means
> the existing decade+year characters have to be moved.)
> 
> Of course, the more important issue is that the final 
> decision needs community involvement, and not just a decision 
> by a small group of RCSB or wwPDB administrators.
> 
> Maybe it would be useful to set up a PDB format "Wiki" where
> alternatives can be defined, along with advantages and
> disadvantages. If there was sufficient agreement, it could be
> used as a community tool to put together a draft revision of
> the next PDB format. With any luck, some RCSB or wwPDB people
> would participate as well.
> 
> Joe Krahn
> 
> 
> 
>
