Re: [ccp4bb] PDB format survey?

Warren DeLano Fri, 10 Aug 2007 08:30:43 -0700

That's easy:  Backward compatibility, both in terms of old programs and
old data.


The idea is to maintain as much interoperability as possible.

> -----Original Message-----
> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On 
> Behalf Of Santarsiero, Bernard D.
> Sent: Friday, August 10, 2007 8:17 AM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: [ccp4bb] PDB format survey?
> 
> Can I ask a dumb question? Just curious...
> 
> Why are we now limited to 80 "columns"? In the old days, that 
> was a limit with Fortran and punched cards. Can a "record" 
> (whatever it's called now) be as long as we wish? Instead of 
> compressing a lot on a PDB record line, can we lengthen it to 
> 130 columns?
> 
> 
> Bernie Santarsiero
> 
> 
> On Fri, August 10, 2007 10:10 am, Warren DeLano wrote:
> > Correction:  Scratch what I wrote -- the PDB format does now support a
> > formal charge field in columns 79-80 (1+,2+,1- etc.).  Hooray!
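
As a rough illustration of reading that field, here is a minimal Python sketch. It assumes the charge is written as a digit followed by a sign (as in "2+" or "1-") and that a blank field means neutral; the function name parse_formal_charge is made up for this example and is not taken from any existing tool.

```python
def parse_formal_charge(record: str) -> int:
    """Read the formal charge from columns 79-80 of an ATOM/HETATM record.

    Assumption: the field is a digit followed by a sign ("2+", "1-"),
    and a blank or missing field means a charge of zero.
    """
    # Columns are 1-based in the format description, so 79-80 is [78:80].
    field = record.rstrip("\r\n").ljust(80)[78:80].strip()
    if not field:
        return 0
    if len(field) != 2 or not field[0].isdigit() or field[1] not in "+-":
        raise ValueError(f"unrecognized charge field: {field!r}")
    magnitude = int(field[0])
    return magnitude if field[1] == "+" else -magnitude
```

Short records (older files that stop before column 79) simply come back as neutral, which keeps the reader tolerant of legacy data.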
> >
> > Thus, adoption of the CONECT valency convention is all it would take
> > for us to be able to convey chemically-defined structures using the
> > PDB format.
> >
> > I'll happily add two-letter chain IDs and hybrid36 to PyMOL but would
> > really, really like to see valences included as well -- widespread
> > adoption of that simple convention would represent a major practical
> > advance for interoperability in structure-based drug discovery.
> >
> > Cheers,
> > Warren
> >
> >
> >> -----Original Message-----
> >> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] 
> On Behalf Of 
> >> Warren DeLano
> >> Sent: Thursday, August 09, 2007 5:53 PM
> >> To: CCP4BB@JISCMAIL.AC.UK
> >> Subject: Re: [ccp4bb] PDB format survey?
> >>
> >> Joe,
> >>
> >> I feel that atom serial numbers are particularly important, since
> >> they, combined with CONECT records, provide the only semi-standard
> >> convention I know of for reliably encoding bond valence information
> >> into a PDB file.
> >>
> >> single bond = bond listed once
> >> double bond = bond listed twice
> >> triple bond = bond listed thrice
> >> aromatic bond = bond listed four times.
> >>
> >> This is a convention long supported by tools like MacroModel and 
> >> PyMOL.
> >> For example, here is formamide, where the bond between atoms 1 and
> >> 3 is listed twice:
> >>
> >> HETATM    1  C01 C=O     1       0.000  -0.020   0.000  0.00  0.00           C
> >> HETATM    2  N01 C=O     1       1.268  -0.765   0.000  0.00  0.00           N
> >> HETATM    3  O02 C=O     1       0.000   1.188   0.000  0.00  0.00           O
> >> HETATM    4  H01 C=O     1       1.260  -1.775   0.000  0.00  0.00           H
> >> HETATM    5  H02 C=O     1       2.146  -0.266   0.000  0.00  0.00           H
> >> HETATM    6  H03 C=O     1      -0.946  -0.562   0.000  0.00  0.00           H
> >> CONECT    1    2
> >> CONECT    1    3
> >> CONECT    1    3
> >> CONECT    1    6
> >> CONECT    2    1    4    5
> >> CONECT    3    1
> >> CONECT    3    1
> >> CONECT    4    2
> >> CONECT    5    2
> >> CONECT    6    1
> >>
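
The convention above (a bond listed N times has order N) can be recovered with a short counting pass. The following is only a sketch in Python: the function name bond_orders and the ORDER_NAMES table are made up for illustration, serial numbers are kept as strings rather than integers, and only the five 5-character serial fields in columns 7-31 are read.

```python
from collections import Counter

# Names for the repeat counts described in the convention above.
ORDER_NAMES = {1: "single", 2: "double", 3: "triple", 4: "aromatic"}


def bond_orders(lines):
    """Map each bonded pair of serials to the number of times it is listed."""
    directed = Counter()
    for line in lines:
        if not line.startswith("CONECT"):
            continue
        # Columns 7-11 hold the source atom, 12-31 up to four partners.
        fields = [line[i:i + 5].strip() for i in range(6, 31, 5)]
        serials = [f for f in fields if f]
        if len(serials) < 2:
            continue
        source, partners = serials[0], serials[1:]
        for partner in partners:
            directed[(source, partner)] += 1
    orders = {}
    for (a, b), count in directed.items():
        # Each bond is normally listed from both ends; keep the larger
        # count in case one direction is incomplete.
        pair = tuple(sorted((a, b)))
        orders[pair] = max(orders.get(pair, 0), count)
    return orders
```

Fed the CONECT records above, this reports the 1-3 pair with a count of 2 (a double bond) and every other pair with a count of 1.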
> >> I second the proposal of treating this field as a unique string 
> >> rather than a numeric quantity.
> >>
> >> Two letter chain IDs would be fine with me, but I do think we could
> >> also make better use of SEGI and/or MODEL to break things up while
> >> still preserving the utility of certain other records (SHEET, HELIX,
> >> etc.) within their existing column definitions.
> >>
> >> However, we are still lacking a standard way of designating formal
> >> charges, so maybe that free column could be better used for encoding
> >> a formal charge, such as ["q", "t", "d", "-", "+", "D", "T", "Q"] over
> >> the formal charge range of [-4,-3,-2,-1,0,1,2,3,4] -- just an idea
> >> :)...
> >>
> >> With valences plus formal charges along with expansion of the cap on
> >> atom counts, I think we could support chemically-complete PDB files
> >> and push back the date of PDB demise for a few more years!
> >>
> >> A Wiki dedicated to practical PDB file hacks and extensions is a
> >> superb idea -- of course, the goal should be to ultimately come up
> >> with a single well-defined standard set of hacks we all agree upon by
> >> supporting them in our code.
> >>
> >> Cheers,
> >> Warren
> >>
> >> -----Original Message-----
> >> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] 
> On Behalf Of 
> >> Joe Krahn
> >> Sent: Thursday, August 09, 2007 1:15 PM
> >> To: CCP4BB@JISCMAIL.AC.UK
> >> Subject: Re: [ccp4bb] PDB format survey?
> >>
> >> Edward A. Berry wrote:
> >> > Ethan A Merritt wrote:
> >> >> On Wednesday 08 August 2007 20:47, Ralf W. Grosse-Kunstleve wrote:
> >> >>> Implementations to generate intuitive, maximally backward
> >> >>> compatible numbers can be found here:
> >> >>>
> >> >>>   http://cci.lbl.gov/hybrid_36/
> >> >>
> >> >> From that URL:
> >> >>
> >> >> ATOM  99998  SD  MET L9999      48.231 -64.383  -9.257  1.00 11.54           S
> >> >> ATOM  99999  CE  MET L9999      49.398 -63.242 -10.211  1.00 14.60           C
> >> >> ATOM  A0000  N   VAL LA000      52.228 -67.689 -12.196  1.00  8.76           N
> >> >> ATOM  A0001  CA  VAL LA000      53.657 -67.774 -12.458  1.00  3.40           C
> >> >>
> >> >> Could you please clarify this example?
> >> >> Is that "A0000" a hexadecimal number, or is it a decimal number
> >> >> that just happens to have an "A" in front of it?
> >> >> [A-Z][0-9999] gives a larger range of values than 5 bytes of
> >> >> hexadecimal, so I'm guessing it's the former.  But the example is
> >> >> not clear.
> >> >>
> >> > I'm guessing the former also. A 5-digit hex number would not be
> >> > backwards compatible. With this system legacy programs can still
> >> > read the files with 99999 atoms or fewer, and anything more than
> >> > that they couldn't have handled anyway. Very nice!
> >> >
> >> > Ed
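
For what it is worth, my reading of the hybrid-36 description at the URL above is that the field is neither hex nor "letter plus decimal": plain decimal is used while the number fits, and after that the field is read as base-36 (uppercase first, then lowercase) with an offset so that "A0000" directly follows "99999". The Python sketch below follows that reading. It is not the reference implementation, and the name decode_hybrid36 is made up for this example.

```python
def decode_hybrid36(width: int, field: str) -> int:
    """Decode a fixed-width hybrid-36 field (sketch, not the reference code)."""
    if len(field) != width:
        raise ValueError(f"expected {width} characters, got {field!r}")
    first = field[0]
    # Ordinary decimal fields, including right-justified or negative ones.
    if first in "- " or first.isdigit():
        return int(field)
    if not (field.isascii() and field.isalnum()):
        raise ValueError(f"invalid hybrid-36 field: {field!r}")
    if field == field.upper():
        # Uppercase block: "A000..." continues right after the last decimal.
        return int(field, 36) - 10 * 36 ** (width - 1) + 10 ** width
    if field == field.lower():
        # Lowercase block: "a000..." continues right after "ZZZ...".
        return int(field, 36) + 16 * 36 ** (width - 1) + 10 ** width
    raise ValueError(f"mixed-case hybrid-36 field: {field!r}")
```

Under that reading, the serial "A0000" in the example decodes to 100000 and the residue number "A000" (width 4) to 10000, so legacy files with purely decimal fields are read exactly as before.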
> >> I still prefer the idea of just truncating serial numbers, and using
> >> an alternative to CONECT for large structures.
> >> Almost nobody uses atomSerial, but it still may be parsed as an
> >> integer, where the above idea could cause errors.
> >> Furthermore, non-digit encoding still results in another maximum,
> >> whereas truncating the numbers has no limit. The truncated serial
> >> number is ambiguous only if taken out of context of the complete PDB
> >> file, but PDB files are by design sequential.
> >>
> >> Another alternative is to define an "atom-serial offset" record. It
> >> can define a number which is added to all subsequently parsed atom
> >> serial numbers. Every ATOM/HETATM record is then perfectly valid to
> >> an older program, although such a program may only be able to handle
> >> one chunk of atoms at a time.
> >>
> >> Likewise, I like the idea of a ChainID map record, which maps
> >> single-letter chainIDs to larger named IDs. Each existing PDB
> >> record can then be used unchanged, but files can then support very
> >> long ChainID strings. The only disadvantage is that old PDB readers
> >> will get confused, but at least the individual record formats are
> >> not changed in a way that makes them crash.
> >>
> >> I think that keeping the old record definitions completely unchanged
> >> is an important feature of any PDB format revision. Even if we
> >> continue to use it for another 20 years, its primary advantage is
> >> that it is a well-established "legacy" format. If we change existing
> >> records, we break that one useful feature.
> >> Therefore, I think that any changes to existing records should be
> >> limited to using character positions that are currently unused. (The
> >> one exception is that we need to make the HEADER Y2K compatible by
> >> using a 4-digit year, which means the existing decade+year
> >> characters have to be moved.)
> >>
> >> Of course, the more important issue is that the final decision needs
> >> community involvement, and not just a decision by a small group of
> >> RCSB or wwPDB administrators.
> >>
> >> Maybe it would be useful to set up a PDB format "Wiki" where
> >> alternatives can be defined, along with advantages and
> >> disadvantages. If there was sufficient agreement, it could be used
> >> as a community tool to put together a draft revision of the next PDB
> >> format. With any luck, some RCSB or wwPDB people would participate
> >> as well.
> >>
> >> Joe Krahn
> >>
> >>
> >>
> >>
> >
> 
> 
> 
>
