Edward A. Berry wrote:
Ethan A Merritt wrote:
On Wednesday 08 August 2007 20:47, Ralf W. Grosse-Kunstleve wrote:
Implementations to generate intuitive, maximally backward compatible
numbers can be found here:
http://cci.lbl.gov/hybrid_36/
From that URL:
ATOM  99998  SD  MET L9999      48.231 -64.383  -9.257  1.00 11.54           S
ATOM  99999  CE  MET L9999      49.398 -63.242 -10.211  1.00 14.60           C
ATOM  A0000  N   VAL LA000      52.228 -67.689 -12.196  1.00  8.76           N
ATOM  A0001  CA  VAL LA000      53.657 -67.774 -12.458  1.00  3.40           C
Could you please clarify this example?
Is that "A0000" a hexadecimal number, or is it a decimal number
that just happens to have an "A" in front of it?
[A-Z][0000-9999] gives a larger range of values than 5 hexadecimal digits,
so I'm guessing it's the former. But the example is not clear.
I'm guessing the former also. A 5-digit hex number would not be
backwards compatible. With this system legacy programs can still
read the files with 99999 atoms or less, and anything more than
that they couldn't have handled anyway. Very nice!
Ed
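
For what it's worth, the reading side of that scheme can be sketched in a
few lines of Python. This is my own sketch based on the example above, not
the reference implementation at the URL; the function name and the exact
offsets are my assumptions:

```python
def hy36decode(width, field):
    """Decode a hybrid-36 style serial number field of the given width.

    Plain decimal values are returned as-is. Fields starting with an
    uppercase letter are read as base 36, offset so that "A0000"
    continues directly after 99999; a lowercase range follows "ZZZZZ".
    """
    s = field.strip()
    if s.lstrip("-").isdigit():
        return int(s)  # legacy decimal serial, readable by old programs
    if s[0].isupper():
        # "A0000" is 10*36**4 in base 36; shift it down to 100000
        return int(s, 36) - 10 * 36 ** (width - 1) + 10 ** width
    if s[0].islower():
        # lowercase continuation once the uppercase range is exhausted
        return int(s.upper(), 36) + 16 * 36 ** (width - 1) + 10 ** width
    raise ValueError("not a hybrid-36 number: %r" % field)
```

So hy36decode(5, "99999") gives 99999 and hy36decode(5, "A0000") gives
100000, consistent with the records quoted above.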
I still prefer the idea of just truncating serial numbers, and using an
alternative to CONECT for large structures. Almost nobody uses the atom
serial number, but legacy code may still parse the field as an integer,
in which case the scheme above could cause errors. Furthermore, a
non-digit encoding still has a maximum, whereas truncating the numbers
imposes no limit at all. A truncated serial number is ambiguous only when
taken out of context of the complete PDB file, and PDB files are
sequential by design.
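
To make the "ambiguous only out of context" point concrete, here is a
hypothetical sketch of how a sequential reader could restore full serial
numbers from a field that simply wraps back to 00000 after 99999. The
helper name and the wrap-at-100000 convention are my assumptions, not any
agreed format:

```python
def restore_serial(prev_full, field_value):
    """Reconstruct a full serial number from a truncated (wrapped) field.

    prev_full is the full serial of the previous atom; field_value is the
    integer read from the 5-digit field, which wraps at 100000.
    """
    base = (prev_full // 100000) * 100000
    candidate = base + field_value
    if candidate < prev_full:
        candidate += 100000  # the field wrapped since the previous atom
    return candidate
```

A reader that tracks the previous atom this way never sees any ambiguity,
because serials within one file increase monotonically.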
Another alternative is to define an "atom-serial offset" record. It would
define a number which is added to all subsequently parsed atom serial
numbers. Every ATOM/HETATM record is then perfectly valid to an older
program, although such a program may only be able to handle one chunk of
atoms at a time.
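
A sketch of what parsing that might look like, using "SEROFF" as a purely
hypothetical record name (no such record exists in the current format):

```python
def parse_serials(lines):
    """Collect ATOM/HETATM serial numbers, applying a running offset.

    "SEROFF" is a hypothetical record carrying the proposed atom-serial
    offset; ATOM serials occupy the usual columns 7-11 (0-based 6:11).
    """
    offset = 0
    serials = []
    for line in lines:
        if line.startswith("SEROFF"):
            offset = int(line.split()[1])
        elif line.startswith(("ATOM", "HETATM")):
            serials.append(offset + int(line[6:11]))
    return serials
```

An old reader would silently skip the unknown record and still parse
every ATOM record, just with serials that restart from 1.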
Likewise, I like the idea of a chainID map record, which maps
single-letter chainIDs to longer names. Each existing PDB record can then
be used unchanged, yet files can support much longer chainID strings. The
only disadvantage is that old PDB readers will get confused, but at least
the individual record formats are not changed in a way that makes them
crash.
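
As a sketch, with "CHNMAP" as a hypothetical record name of my own
invention:

```python
def read_chain_map(lines):
    """Collect short -> long chain ID mappings from a hypothetical
    "CHNMAP" record, e.g. "CHNMAP A heavy_chain_1"."""
    chain_map = {}
    for line in lines:
        if line.startswith("CHNMAP"):
            _, short_id, long_id = line.split(None, 2)
            chain_map[short_id] = long_id.strip()
    return chain_map
```

Readers that know the record get the long names; everything else keeps
working on the single-letter IDs already present in the ATOM records.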
I think that keeping the old record definitions completely unchanged is
an important feature of any PDB format revision. Even if we continue to
use it for another 20 years, its primary advantage is that it is a
well-established "legacy" format. If we change existing records, we
break that one useful feature. Therefore, I think that any changes to
existing records should be limited to using character positions that
are currently unused. (The one exception is that we need to make the
HEADER record Y2K compatible by using a 4-digit year, which means the
existing decade+year characters have to be moved.)
Of course, the more important issue is that the final decision needs
community involvement, and not just a decision by a small group of RCSB
or wwPDB administrators.
Maybe it would be useful to set up a PDB format "Wiki" where
alternatives can be defined, along with advantages and disadvantages. If
there was sufficient agreement, it could be used as a community tool to
put together a draft revision of the next PDB format. With any luck,
some RCSB or wwPDB people would participate as well.
Joe Krahn