Hi Ethan,

> Is that "A0000" a hexadecimal number, or is it a decimal number
> that just happens to have an "A" in front of it?
> [A-Z][0-9999] gives a larger range of values than 5 bytes of hexadecimal,
> so I'm guessing it's the former. But the example is not clear.
>
> (yes I could download and inspect the source, but I'm lazy tonight)
If you simply click on the "Overview and Python reference implementation:
(as text)" link, you'll get the full story immediately in your web browser.
For your convenience, it is also attached below.

Cheers,
Ralf


Prototype/reference implementation for encoding and decoding atom serial
numbers and residue sequence numbers in PDB files.

PDB ATOM and HETATM records reserve columns 7-11 for the atom serial
number. This 5-column number is used as a reference in the CONECT records,
which also reserve exactly five columns for each serial number.

With the decimal counting system only up to 99999 atoms can be stored and
uniquely referenced in a PDB file. A simple extension to enable processing
of more atoms is to adopt a counting system with more than ten digits. To
maximize backward compatibility, the counting system is only applied for
numbers greater than 99999. The "hybrid-36" counting system implemented in
this file is:

  ATOM      1
  ...
  ATOM  99999
  ATOM  A0000
  ATOM  A0001
  ...
  ATOM  A0009
  ATOM  A000A
  ...
  ATOM  A000Z
  ATOM  ZZZZZ
  ATOM  a0000
  ...
  ATOM  zzzzz

I.e. the first 99999 serial numbers are represented as usual. The following
atoms use a base-36 system (10 digits + 26 letters) with upper-case letters.
43670016 (26*36**4) additional atoms can be numbered this way. If there are
more than 43770015 (99999+43670016) atoms, a base-36 system with lower-case
letters is used, allowing for 43670016 additional atoms. I.e. in total
87440031 (99999+2*43670016) atoms can be stored and uniquely referenced via
CONECT records.

The counting system is designed to avoid lower-case letters until the range
of numbers addressable by upper-case letters is exhausted. Importantly, with
this counting system the distinction between "traditional" and "extended"
PDB files becomes evident only if there are more than 99999 atoms to be
stored. Programs that are updated to support the hybrid-36 counting system
will continue to interoperate with programs that do not as long as there
are fewer than 100000 atoms.
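As a rough illustration of the counting scheme described above, here is a
minimal Python 3 sketch. It is not the reference implementation: the names
encode_hybrid36 and decode_hybrid36 are hypothetical, and the sketch assumes
non-negative values only. It follows the rules as stated in the text: plain
decimal while the number fits in the field, then base-36 with upper-case
letters starting at "A0000", then base-36 with lower-case letters.

```python
import string

UPPER = string.digits + string.ascii_uppercase  # digits 0-9, then A-Z
LOWER = string.digits + string.ascii_lowercase  # digits 0-9, then a-z


def _to_base36(number, alphabet):
    """Render a positive integer with the given 36-character alphabet."""
    chars = []
    while number > 0:
        number, remainder = divmod(number, 36)
        chars.append(alphabet[remainder])
    return "".join(reversed(chars))


def encode_hybrid36(width, value):
    """Encode a non-negative integer into a field of `width` columns.

    Hypothetical helper for illustration; negative values are rejected.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    decimal_limit = 10 ** width       # first value that no longer fits in decimal
    block = 26 * 36 ** (width - 1)    # how many values each letter case provides
    offset = 10 * 36 ** (width - 1)   # base-36 value of "A00..." (or "a00...")
    if value < decimal_limit:
        return str(value).rjust(width)
    value -= decimal_limit
    if value < block:
        return _to_base36(value + offset, UPPER)
    value -= block
    if value < block:
        return _to_base36(value + offset, LOWER)
    raise ValueError("value out of range for a %d-column field" % width)


def decode_hybrid36(width, field):
    """Invert encode_hybrid36 for a field of exactly `width` characters."""
    if len(field) != width:
        raise ValueError("field must be exactly %d characters" % width)
    decimal_limit = 10 ** width
    block = 26 * 36 ** (width - 1)
    offset = 10 * 36 ** (width - 1)
    first = field[0]
    if first in string.ascii_uppercase:
        if not all(c in UPPER for c in field):
            raise ValueError("mixed-case or invalid field: %r" % field)
        return int(field, 36) - offset + decimal_limit
    if first in string.ascii_lowercase:
        if not all(c in LOWER for c in field):
            raise ValueError("mixed-case or invalid field: %r" % field)
        return int(field, 36) - offset + decimal_limit + block
    return int(field)  # plain decimal, possibly padded with leading spaces


if __name__ == "__main__":
    # Round-trip the boundary values mentioned in the text.
    for n in (1, 99999, 100000, 100010, 43770015, 43770016, 87440031):
        field = encode_hybrid36(5, n)
        assert decode_hybrid36(5, field) == n
        print(repr(field), n)
```

The first character of the field decides the interpretation: a digit or a
space means ordinary decimal, an upper-case letter means the first base-36
block, and a lower-case letter means the second. This is why files with
fewer than 100000 atoms look exactly as before.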
PDB ATOM and HETATM records also reserve columns 23-26 for the residue
sequence number. This 4-column number is used as a reference in other
record types (SSBOND, LINK, HYDBND, SLTBRG, CISPEP), which also reserve
exactly four columns for each sequence number.

With the decimal counting system only up to 9999 residues per chain can be
stored and uniquely referenced in a PDB file. If the hybrid-36 system is
adopted, 1213056 (26*36**3) additional residues can be numbered using
upper-case letters, and the same number again using lower-case letters.
I.e. in total each chain may contain up to 2436111 (9999+2*1213056)
residues that can be uniquely referenced from the other record types given
above.

The implementation in this file should run with Python 2.2 or higher.
There are no other requirements. Run this script without arguments to
obtain usage examples. Note that there are only about 60 lines of "real"
code. The rest is documentation and unit tests.

To update an existing program to support the hybrid-36 counting system,
simply replace the existing read/write source code for integer values
with equivalents of the hy36decode() and hy36encode() functions below.

This file is unrestricted Open Source (cctbx.sf.net).
Please send corrections and enhancements to [EMAIL PROTECTED] .

See also:
  http://cci.lbl.gov/hybrid_36/
  http://www.pdb.org/ "Dictionary & File Formats"

Ralf W. Grosse-Kunstleve, Feb 2007.