Re: [ccp4bb] PDB format survey?

Herbert J. Bernstein Fri, 10 Aug 2007 08:51:52 -0700

Actually, everything proposed will break some software.
The real question is one of how much value the
community gains from how much of a change.  mmCIF
was one proposal that would "solve" the problem,
but which met a lot of resistance.  The change
in atom serial numbers to strings is another
possibility.  If you want something in between
that stretches the line, but preserves the
programming style, take a look at:


  http://biomol.dowling.edu/WPDB/

that extends the line and handles 999,999,999 atoms
and 10 character chain names.

  I apologize for the server that provides sample
runs from the page being down.  We had a couple
of bad power failures, and that machine is
not back in service yet, but the spec is
available.

  Regards,
    Herbert J. Bernstein
=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 [EMAIL PROTECTED]
=====================================================

On Fri, 10 Aug 2007, Warren DeLano wrote:

> That's easy:  Backward compatibility, both in terms of old programs and
> old data.
>
> The idea is to maintain as much interoperability as possible.
>
> > -----Original Message-----
> > From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On
> > Behalf Of Santarsiero, Bernard D.
> > Sent: Friday, August 10, 2007 8:17 AM
> > To: CCP4BB@JISCMAIL.AC.UK
> > Subject: [ccp4bb] PDB format survey?
> >
> > Can I ask a dumb question? Just curious...
> >
> > Why are we now limited to 80 "columns"? In the old days, that
> > was a limit with Fortran and punched cards. Can a "record"
> > (whatever it's called now) be as long as we wish? Instead of
> > compressing a lot on a PDB record line, can we lengthen it to
> > 130 columns?
> >
> >
> > Bernie Santarsiero
> >
> >
> > On Fri, August 10, 2007 10:10 am, Warren DeLano wrote:
> > > > Correction:  Scratch what I wrote -- the PDB format does now support a
> > > > formal charge field in columns 79-80 (1+, 2+, 1-, etc.).  Hooray!
> > >
> > > > Thus, adoption of the CONECT valency convention is all it would take
> > > > for us to be able to convey chemically-defined structures using the
> > > > PDB format.
> > >
> > > > I'll happily add two-letter chain IDs and hybrid36 to PyMOL but would
> > > > really, really like to see valences included as well -- widespread
> > > > adoption of that simple convention would represent a major practical
> > > > advance for interoperability in structure-based drug discovery.
> > >
> > > Cheers,
> > > Warren
> > >
> > >
> > >> -----Original Message-----
> > >> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
> > >> Warren DeLano
> > >> Sent: Thursday, August 09, 2007 5:53 PM
> > >> To: CCP4BB@JISCMAIL.AC.UK
> > >> Subject: Re: [ccp4bb] PDB format survey?
> > >>
> > >> Joe,
> > >>
> > >> I feel that atom serial numbers are particularly important, since
> > >> they, combined with CONECT records, provide the only semi-standard
> > >> convention I know of for reliably encoding bond valence information
> > >> into a PDB file.
> > >>
> > >> single bond = bond listed once
> > >> double bond = bond listed twice
> > >> triple bond = bond listed thrice
> > >> aromatic bond = bond listed four times.
> > >>
> > >> This is a convention long supported by tools like MacroModel and
> > >> PyMOL.  For example, here is formamide, where the bond between
> > >> atoms 1 and 3 is listed twice:
> > >>
> > >> HETATM    1  C01 C=O     1       0.000  -0.020   0.000  0.00  0.00           C
> > >> HETATM    2  N01 C=O     1       1.268  -0.765   0.000  0.00  0.00           N
> > >> HETATM    3  O02 C=O     1       0.000   1.188   0.000  0.00  0.00           O
> > >> HETATM    4  H01 C=O     1       1.260  -1.775   0.000  0.00  0.00           H
> > >> HETATM    5  H02 C=O     1       2.146  -0.266   0.000  0.00  0.00           H
> > >> HETATM    6  H03 C=O     1      -0.946  -0.562   0.000  0.00  0.00           H
> > >> CONECT    1    2
> > >> CONECT    1    3
> > >> CONECT    1    3
> > >> CONECT    1    6
> > >> CONECT    2    1    4    5
> > >> CONECT    3    1
> > >> CONECT    3    1
> > >> CONECT    4    2
> > >> CONECT    5    2
> > >> CONECT    6    1
> > >>
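The repeated-listing convention above can be sketched as a small reader that
counts how often each bond is listed. This is only an illustration: the
function name is hypothetical, the column slices assume the standard CONECT
layout (source serial in columns 7-11, bonded serials in four 5-column fields
after it), and serials are assumed to be plain decimal integers.

```python
from collections import Counter


def bond_orders(lines):
    """Map each atom pair (low, high) to the number of times the bond is listed."""
    directed = Counter()
    for line in lines:
        if not line.startswith("CONECT"):
            continue
        source = int(line[6:11])
        # Up to four bonded-atom fields of five columns each follow the source.
        for start in range(11, 31, 5):
            field = line[start:start + 5].strip()
            if field:
                directed[(source, int(field))] += 1
    orders = {}
    for (a, b), count in directed.items():
        pair = (min(a, b), max(a, b))
        # A bond is normally listed from both ends; keep the larger count.
        orders[pair] = max(orders.get(pair, 0), count)
    return orders


conect = [
    "CONECT    1    2",
    "CONECT    1    3",
    "CONECT    1    3",
    "CONECT    1    6",
    "CONECT    2    1    4    5",
    "CONECT    3    1",
    "CONECT    3    1",
    "CONECT    4    2",
    "CONECT    5    2",
    "CONECT    6    1",
]
print(bond_orders(conect))
```

Under the convention, a count of 1, 2, 3 or 4 would be read as a single,
double, triple or aromatic bond respectively.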
> > >> I second the proposal of treating this field as a unique string
> > >> rather than a numeric quantity.
> > >>
> > >> Two letter chain IDs would be fine with me, but I do think we could
> > >> also make better use of SEGI and/or MODEL to break things up while
> > >> still preserving the utility of certain other records (SHEET, HELIX,
> > >> etc.) within their existing column definitions.
> > >>
> > >> However, we are still lacking a standard way of designating formal
> > >> charges, so maybe that free column could be better used for encoding
> > >> a formal charge, such as ["q", "t", "d", "-", "+", "D", "T", "Q"] over
> > >> the formal charge range of [-4,-3,-2,-1,0,1,2,3,4] -- just an idea :)...
> > >>
> > >> With valences plus formal charges along with expansion of the cap on
> > >> atom counts, I think we could support chemically-complete PDB files
> > >> and push back the date of PDB demise for a few more years!
> > >>
> > >> A Wiki dedicated to practical PDB file hacks and extensions is a
> > >> superb idea -- of course, the goal should be to ultimately come up
> > >> with a single well-defined standard set of hacks we all agree upon by
> > >> supporting them in our code.
> > >>
> > >> Cheers,
> > >> Warren
> > >>
> > >> -----Original Message-----
> > >> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
> > >> Joe Krahn
> > >> Sent: Thursday, August 09, 2007 1:15 PM
> > >> To: CCP4BB@JISCMAIL.AC.UK
> > >> Subject: Re: [ccp4bb] PDB format survey?
> > >>
> > >> Edward A. Berry wrote:
> > >> > Ethan A Merritt wrote:
> > >> >> On Wednesday 08 August 2007 20:47, Ralf W. Grosse-Kunstleve wrote:
> > >> >>> Implementations to generate intuitive, maximally backward
> > >> >>> compatible numbers can be found here:
> > >> >>>
> > >> >>>   http://cci.lbl.gov/hybrid_36/
> > >> >>
> > >> >> From that URL:
> > >> >>
> > >> >> ATOM  99998  SD  MET L9999      48.231 -64.383  -9.257  1.00 11.54           S
> > >> >> ATOM  99999  CE  MET L9999      49.398 -63.242 -10.211  1.00 14.60           C
> > >> >> ATOM  A0000  N   VAL LA000      52.228 -67.689 -12.196  1.00  8.76           N
> > >> >> ATOM  A0001  CA  VAL LA000      53.657 -67.774 -12.458  1.00  3.40           C
> > >> >>
> > >> >> Could you please clarify this example?
> > >> >> Is that "A0000" a hexadecimal number, or is it a decimal number that
> > >> >> just happens to have an "A" in front of it?
> > >> >> [A-Z][0-9999] gives a larger range of values than 5 bytes of
> > >> >> hexadecimal, so I'm guessing it's the former.  But the example is
> > >> >> not clear.
> > >> >>
> > >> > I'm guessing the former also. A 5-digit hex number would not be
> > >> > backwards compatible. With this system legacy programs can still read
> > >> > the files with 99999 atoms or fewer, and anything more than that they
> > >> > couldn't have handled anyway. Very nice!
> > >> >
> > >> > Ed
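On the question of how "A0000" should be read: as far as I understand the
hybrid-36 description at the URL cited above, it is neither hexadecimal nor a
letter in front of a decimal number. Plain decimal is used while the value
fits the field, then upper-case base-36, then lower-case base-36. The sketch
below decodes a field on that understanding. The function name is my own, and
the offsets should be checked against the reference implementation before
relying on them.

```python
def hy36_decode(field: str) -> int:
    """Decode a fixed-width hybrid-36 field (width 5 for atom serial numbers)."""
    width = len(field)
    if not field.strip():
        raise ValueError("empty field")
    first = field[0]
    if first.isdigit() or first in " -":
        return int(field)  # ordinary decimal, as legacy programs expect
    base = 10 * 36 ** (width - 1)  # numeric value of "A" followed by zeros
    if "A" <= first <= "Z" and field == field.upper():
        # Upper-case range continues directly after the largest decimal value.
        return int(field, 36) - base + 10 ** width
    if "a" <= first <= "z" and field == field.lower():
        # Lower-case range continues after the upper-case range is exhausted.
        return int(field, 36) + 16 * 36 ** (width - 1) + 10 ** width
    raise ValueError(f"not a hybrid-36 field: {field!r}")


for text in ("99999", "A0000", "A0001"):
    print(text, hy36_decode(text))
```

With this reading, "A0000" is simply the serial number that follows 99999, so
files with 99999 atoms or fewer look exactly as they did before.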
> > >> I still prefer the idea of just truncating serial numbers, and using
> > >> an alternative to CONECT for large structures.
> > >> Almost nobody uses atomSerial, but it still may be parsed as an
> > >> integer, where the above idea could cause errors.
> > >> Furthermore, non-digit encoding still results in another maximum,
> > >> whereas truncating the numbers has no limit. The truncated serial
> > >> number is ambiguous only if taken out of context of the complete PDB
> > >> file, but PDB files are by design sequential.
> > >>
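The truncation idea relies on the reader seeing the records in order. A rough
sketch of how a reader could rebuild the full serial numbers, assuming serials
strictly increase through the file, are truncated to their last five digits,
and never jump by 100000 or more between consecutive atoms (the function name
and these assumptions are mine, not part of any proposal in the thread):

```python
def recover_serials(truncated, modulus=100000):
    """Rebuild full serial numbers from their truncated (wrapped) values."""
    full = []
    previous = None
    for value in truncated:
        if previous is None:
            current = value  # assume the first atom has not wrapped yet
        else:
            # Smallest number above `previous` that ends in the same digits.
            current = previous + ((value - previous - 1) % modulus) + 1
        full.append(current)
        previous = current
    return full


print(recover_serials([99998, 99999, 0, 1, 3]))
```

A record taken out of the file on its own cannot be resolved this way, which
is the ambiguity noted above.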
> > >> Another alternative is to define an "atom-serial offset" record. It
> > >> can define a number which is added to all subsequently parsed atom
> > >> serial numbers. Every ATOM/HETATM record is then perfectly valid to an
> > >> older program, but such a program may only be able to handle one chunk
> > >> of atoms at a time.
> > >>
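To make the offset idea concrete, here is a sketch of a reader that applies
such a record. No such record exists in the PDB format: the record name
"SEROFF" and its layout (name followed by an integer) are invented here purely
for illustration.

```python
def absolute_serials(lines, offset_record="SEROFF"):
    """Return full atom serials, applying a hypothetical offset record."""
    offset = 0
    serials = []
    for line in lines:
        if line.startswith(offset_record):
            # Hypothetical layout: record name, then the offset as an integer.
            offset = int(line[len(offset_record):].split()[0])
        elif line.startswith(("ATOM  ", "HETATM")):
            # Columns 7-11 hold the (unchanged, legacy-valid) serial number.
            serials.append(offset + int(line[6:11]))
    return serials


records = [
    "ATOM  99999  CE  MET L9999",
    "SEROFF 100000",
    "ATOM      0  N   VAL L   1",
    "ATOM      1  CA  VAL L   1",
]
print(absolute_serials(records))
```

An older program would skip the unknown record and see the serials restart at
0, which is why it could only work with one chunk of atoms at a time.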
> > >> Likewise, I like the idea of a ChainID map record, which maps
> > >> single-letter chainIDs to larger named IDs. Each existing PDB
> > >> record can then be used unchanged, but files can then support very
> > >> long ChainID strings. The only disadvantage is that old PDB readers
> > >> will get confused, but at least the individual record formats are not
> > >> changed in a way that makes them crash.
> > >>
> > >> I think that keeping the old record definitions completely unchanged
> > >> is an important feature of any PDB format revision. Even if we continue
> > >> to use it for another 20 years, its primary advantage is that it is a
> > >> well-established "legacy" format. If we change existing records, we
> > >> break that one useful feature.
> > >> Therefore, I think that any changes to existing records should be
> > >> limited to using character positions that are currently unused. (The
> > >> one exception is that we need to make the HEADER Y2K compatible by
> > >> using a 4-digit year, which means the existing decade+year characters
> > >> have to be moved.)
> > >>
> > >> Of course, the more important issue is that the final decision needs
> > >> community involvement, and not just a decision by a small group of
> > >> RCSB or wwPDB administrators.
> > >>
> > >> Maybe it would be useful to set up a PDB format "Wiki" where
> > >> alternatives can be defined, along with advantages and disadvantages.
> > >> If there was sufficient agreement, it could be used as a community tool
> > >> to put together a draft revision of the next PDB format. With any
> > >> luck, some RCSB or wwPDB people would participate as well.
> > >>
> > >> Joe Krahn
> > >>
> > >>
> > >>
> > >>
> > >
> >
> >
> >
> >
>
