Actually, everything proposed will break some software. The real question is how much value the community gains for how much change. mmCIF was one proposal that would "solve" the problem, but it met a lot of resistance. Changing atom serial numbers to strings is another possibility. If you want something in between that stretches the line but preserves the programming style, take a look at:
http://biomol.dowling.edu/WPDB/ that extends the line and handles 999,999,999 atoms and 10-character chain names. I apologize that the server that provides sample runs from the page is down. We had a couple of bad power failures, and that machine is not back in service yet, but the spec is available.

Regards,
Herbert J. Bernstein

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
[EMAIL PROTECTED]
=====================================================

On Fri, 10 Aug 2007, Warren DeLano wrote:

> That's easy: Backward compatibility, both in terms of old programs and
> old data.
>
> The idea is to maintain as much interoperability as possible.
>
> > -----Original Message-----
> > From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On
> > Behalf Of Santarsiero, Bernard D.
> > Sent: Friday, August 10, 2007 8:17 AM
> > To: CCP4BB@JISCMAIL.AC.UK
> > Subject: [ccp4bb] PDB format survey?
> >
> > Can I ask a dumb question? Just curious...
> >
> > Why are we now limited to 80 "columns"? In the old days, that
> > was a limit with Fortran and punched cards. Can a "record"
> > (whatever it's called now) be as long as we wish? Instead of
> > compressing a lot on a PDB record line, can we lengthen it to
> > 130 columns?
> >
> > Bernie Santarsiero
> >
> > On Fri, August 10, 2007 10:10 am, Warren DeLano wrote:
> > > Correction: Scratch what I wrote -- the PDB format does now support a
> > > formal charge field in columns 79-80 (1+,2+,1- etc.). Hooray!
> > >
> > > Thus, adoption of the CONECT valency convention is all it would take
> > > for us to be able to convey chemically-defined structures using the
> > > PDB format.
> > >
> > > I'll happily add two-letter chain IDs and hybrid36 to PyMOL but would
> > > really, really like to see valences included as well -- widespread
> > > adoption of that simple convention would represent a major practical
> > > advance for interoperability in structure-based drug discovery.
> > >
> > > Cheers,
> > > Warren
> > >
> > >> -----Original Message-----
> > >> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
> > >> Warren DeLano
> > >> Sent: Thursday, August 09, 2007 5:53 PM
> > >> To: CCP4BB@JISCMAIL.AC.UK
> > >> Subject: Re: [ccp4bb] PDB format survey?
> > >>
> > >> Joe,
> > >>
> > >> I feel that atom serial numbers are particularly important, since
> > >> they, combined with CONECT records, provide the only semi-standard
> > >> convention I know of for reliably encoding bond valence information
> > >> into a PDB file.
> > >>
> > >> single bond = bond listed once
> > >> double bond = bond listed twice
> > >> triple bond = bond listed thrice
> > >> aromatic bond = bond listed four times.
> > >>
> > >> This is a convention long supported by tools like MacroModel and
> > >> PyMOL.
> > >> For example, here is formamide, where the bond between atoms 1 and
> > >> 3 is listed twice:
> > >>
> > >> HETATM    1  C01 C=O     1       0.000  -0.020   0.000  0.00  0.00           C
> > >> HETATM    2  N01 C=O     1       1.268  -0.765   0.000  0.00  0.00           N
> > >> HETATM    3  O02 C=O     1       0.000   1.188   0.000  0.00  0.00           O
> > >> HETATM    4  H01 C=O     1       1.260  -1.775   0.000  0.00  0.00           H
> > >> HETATM    5  H02 C=O     1       2.146  -0.266   0.000  0.00  0.00           H
> > >> HETATM    6  H03 C=O     1      -0.946  -0.562   0.000  0.00  0.00           H
> > >> CONECT    1    2
> > >> CONECT    1    3
> > >> CONECT    1    3
> > >> CONECT    1    6
> > >> CONECT    2    1    4    5
> > >> CONECT    3    1
> > >> CONECT    3    1
> > >> CONECT    4    2
> > >> CONECT    5    2
> > >> CONECT    6    1
> > >>
> > >> I second the proposal of treating this field as a unique string
> > >> rather than a numeric quantity.
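An aside on the CONECT valence convention described in the message above: the rule "bond order = number of times the bond is listed" can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how MacroModel or PyMOL implement it. The function name `bond_orders` is hypothetical, and the whitespace splitting assumes serial numbers short enough not to run together (real PDB files use fixed 5-column serial fields).

```python
from collections import Counter

# CONECT records copied from the example in the quoted message.
CONECT_RECORDS = """\
CONECT 1 2
CONECT 1 3
CONECT 1 3
CONECT 1 6
CONECT 2 1 4 5
CONECT 3 1
CONECT 3 1
CONECT 4 2
CONECT 5 2
CONECT 6 1
"""


def bond_orders(pdb_text):
    """Return {(low_serial, high_serial): order} using the repeat-count convention.

    Hypothetical helper: 1 = single, 2 = double, 3 = triple, 4 = aromatic.
    """
    directed = Counter()
    for line in pdb_text.splitlines():
        if not line.startswith("CONECT"):
            continue
        # Assumption: fields are separated by whitespace.
        serials = [int(tok) for tok in line[6:].split()]
        if not serials:
            continue
        origin, partners = serials[0], serials[1:]
        for partner in partners:
            directed[(origin, partner)] += 1

    orders = {}
    for (a, b), count in directed.items():
        key = (min(a, b), max(a, b))
        # A bond is usually listed from both of its atoms; take the larger
        # count so a bond listed from only one end is still recognised.
        orders[key] = max(orders.get(key, 0), count)
    return orders


if __name__ == "__main__":
    for pair, order in sorted(bond_orders(CONECT_RECORDS).items()):
        print(pair, order)
```

For the records above, the only pair reported with order 2 is (1, 3), the carbonyl bond; every other bond has order 1.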
> > >>
> > >> Two-letter chain IDs would be fine with me, but I do think we could
> > >> also make better use of SEGI and/or MODEL to break things up while
> > >> still preserving the utility of certain other records (SHEET, HELIX,
> > >> etc.) within their existing column definitions.
> > >>
> > >> However, we are still lacking a standard way of designating formal
> > >> charges, so maybe that free column could be better used for encoding
> > >> a formal charge, such as ["q", "t", "d", "-", "+", "D", "T", "Q"] over
> > >> the formal charge range of [-4,-3,-2,-1,0,1,2,3,4] -- just an idea
> > >> :)...
> > >>
> > >> With valences plus formal charges along with expansion of the cap on
> > >> atom counts, I think we could support chemically-complete PDB files
> > >> and push back the date of PDB demise for a few more years!
> > >>
> > >> A Wiki dedicated to practical PDB file hacks and extensions is a
> > >> superb idea -- of course, the goal should be to ultimately come up
> > >> with a single well-defined standard set of hacks we all agree upon by
> > >> supporting them in our code.
> > >>
> > >> Cheers,
> > >> Warren
> > >>
> > >> -----Original Message-----
> > >> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
> > >> Joe Krahn
> > >> Sent: Thursday, August 09, 2007 1:15 PM
> > >> To: CCP4BB@JISCMAIL.AC.UK
> > >> Subject: Re: [ccp4bb] PDB format survey?
> > >>
> > >> Edward A. Berry wrote:
> > >> > Ethan A Merritt wrote:
> > >> >> On Wednesday 08 August 2007 20:47, Ralf W.
> > Grosse-Kunstleve wrote:
> > >> >>> Implementations to generate intuitive, maximally backward compatible
> > >> >>> numbers can be found here:
> > >> >>>
> > >> >>> http://cci.lbl.gov/hybrid_36/
> > >> >>
> > >> >> From that URL:
> > >> >>
> > >> >> ATOM  99998  SD  MET L9999      48.231 -64.383  -9.257  1.00 11.54           S
> > >> >> ATOM  99999  CE  MET L9999      49.398 -63.242 -10.211  1.00 14.60           C
> > >> >> ATOM  A0000  N   VAL LA000      52.228 -67.689 -12.196  1.00  8.76           N
> > >> >> ATOM  A0001  CA  VAL LA000      53.657 -67.774 -12.458  1.00  3.40           C
> > >> >>
> > >> >> Could you please clarify this example?
> > >> >> Is that "A0000" a hexadecimal number, or is it a decimal number that
> > >> >> just happens to have an "A" in front of it?
> > >> >> [A-Z][0-9999] gives a larger range of values than 5 bytes of hexadecimal,
> > >> >> so I'm guessing it's the former. But the example is not clear.
> > >> >>
> > >> > I'm guessing the former also. A 5-digit hex number would not be
> > >> > backwards compatible. With this system legacy programs can still read
> > >> > the files with 99999 atoms or less, and anything more than that they
> > >> > couldn't have handled anyway. Very nice!
> > >> >
> > >> > Ed
> > >> I still prefer the idea of just truncating serial numbers, and using
> > >> an alternative to CONECT for large structures.
> > >> Almost nobody uses atomSerial, but it still may be parsed as an
> > >> integer, where the above idea could cause errors.
> > >> Furthermore, non-digit encoding still results in another maximum,
> > >> whereas truncating the numbers has no limit. The truncated serial
> > >> number is ambiguous only if taken out of context of the
> > >> complete PDB file, but PDB files are by design sequential.
> > >>
> > >> Another alternative is to define an "atom-serial offset"
> > >> record. It can define a number which is added to all subsequently
> > >> parsed atom serial numbers.
> > >> Every ATOM/HETATM record is then perfectly valid to an older
> > >> program, although that program may only be able to handle
> > >> one chunk of atoms at once.
> > >>
> > >> Likewise, I like the idea of a ChainID map record, which maps
> > >> single-letter chainIDs to larger named IDs. Each existing PDB
> > >> record can then be used unchanged, but files can then support very
> > >> long ChainID strings. The only disadvantage is that old PDB readers
> > >> will get confused, but at least the individual record formats are not
> > >> changed in a way that makes them crash.
> > >>
> > >> I think that keeping the old record definitions completely unchanged
> > >> is an important feature of any PDB format revision. Even if we continue
> > >> to use it for another 20 years, its primary advantage is that it is a
> > >> well-established "legacy" format. If we change existing records, we
> > >> break that one useful feature.
> > >> Therefore, I think that any changes to existing records should be
> > >> limited to using character positions that are currently unused. (The one
> > >> exception is that we need to make the HEADER Y2K
> > >> compatible by using a 4-digit year, which means the existing
> > >> decade+year characters have to be moved.)
> > >>
> > >> Of course, the more important issue is that the final decision needs
> > >> community involvement, and not just a decision by a small group of
> > >> RCSB or wwPDB administrators.
> > >>
> > >> Maybe it would be useful to set up a PDB format "Wiki" where
> > >> alternatives can be defined, along with advantages and disadvantages.
> > >> If there was sufficient agreement, it could be used as a community tool
> > >> to put together a draft revision of the next PDB format. With any
> > >> luck, some RCSB or wwPDB people would participate as well.
> > >>
> > >> Joe Krahn
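A footnote on the "A0000" question raised in the quoted thread: the field is neither hexadecimal nor a decimal number with a letter prefix. As the hybrid-36 scheme is commonly described, a field stays plain decimal until the digits run out (99999 for a 5-character field), then continues in upper-case base 36 starting at A0000, and then in lower-case base 36 starting at a0000. The Python sketch below illustrates that reading; `hy36_decode_sketch` is a hypothetical name, it ignores negative numbers and blank fields, and it is not the reference implementation from the URL above.

```python
def hy36_decode_sketch(field, width=5):
    """Decode a hybrid-36 field of the given width into an int.

    Illustrative only: handles non-negative values and rejects
    anything that is not plain decimal or single-case base 36.
    """
    s = field.strip()
    if not s or len(s) > width or not s.isascii() or not s.isalnum():
        raise ValueError(f"not a hybrid-36 field: {field!r}")

    # Legacy range: plain decimal, exactly what old parsers expect.
    if s.isdigit():
        return int(s)

    # Extended ranges must fill the field and start with a letter.
    if len(s) != width or not s[0].isalpha():
        raise ValueError(f"not a hybrid-36 field: {field!r}")

    if s.isupper():
        # 'A000...' is the first value after the decimal range (10**width).
        return int(s, 36) - 10 * 36 ** (width - 1) + 10 ** width
    if s.islower():
        # 'a000...' follows directly after 'ZZZ...' in the upper-case range.
        return int(s, 36) + 16 * 36 ** (width - 1) + 10 ** width

    raise ValueError(f"mixed-case field: {field!r}")


if __name__ == "__main__":
    for text in ("99999", "A0000", "A0001", "ZZZZZ", "a0000"):
        print(text, hy36_decode_sketch(text))
```

Under this reading, A0000 decodes to 100000 and A0001 to 100001, so the four ATOM records in the quoted example are simply consecutive atoms. The residue number field works the same way with `width=4`, where A000 follows 9999.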