I suspect this will be throwing fuel on the fire, but what is so great about the PDB format (any version) besides familiarity? It seems to me to be outdated, inadequate and generally misused by all. I say scrap it, make a clean break and devote everyone's energies to making a format that will work for everyone. (Granted: it is inexcusable for the RCSB to be developing new formats without input from the affected parties.) mmCIF seems like a good idea that has not gotten the attention it needs (and deserves) to be formulated to meet everyone's needs. As for the "legacy program" argument: that's what translation programs like OpenBabel are for (or even a very simple python/perl/your-favorite-hammer script). Perhaps even the RCSB could be convinced to offer several formats for download... oh, wait - they already do...
Ducking behind my asbestos-free, all-natural organic firewall,
-Tom

-----Original Message-----
From: CCP4 bulletin board on behalf of Ethan Merritt
Sent: Wed 8/1/2007 3:06 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] PDB format survey?

On Wednesday 01 August 2007 14:10, Joe Krahn wrote:
> In addition to questions about the PDB standard, it is probably
> important to consider mmCIF. One thing I don't like about it is that
> columns can be randomized (i.e. X, Y, and Z can be in any column), but
> the mmCIF standards people have no interest in defining a more strict
> standard that would require files to be as human readable as RCSB's
> mmCIF files.

The important thing about mmCIF is not the precise file format, which is ultimately not of interest except as a parsable exchange medium, but rather the existence of the mmCIF dictionaries. A more productive discussion may be to revisit the definition of what information we as a community expect to be captured in the PDB database. The question of export formats is secondary.

> Does this sound useful, or have most people given up on having any
> influence on standards? Or, should the structural biology software
> developers get together and just make our own OpenPDB format?

As discussed at the PDB group discussion at the ACA meeting, some new depositions are not representable in the PDB format (including v3). Examples include:

- very large structures, for which the current 80 column PDB format runs out of space for atom numbers (5 columns -> max 99999) or for chain ids (1 column -> single char A-Z 0-9) [don't ask me why they don't want lower case]
- new classes of experiment (SAXS, EM)
- new classes of model (TLS or normal-mode displacements, ensemble models, envelope representations)

I am inclined to say that there should be a fork into two distinct formats, used for different purposes. The 80 column PDB format should be frozen, preferably at the pre-version3 state.
Freezing it would allow legacy programs to continue to read old PDB files without modification. These programs will not be able to handle certain classes of new structures, but this would be true in any case for legacy code. Churn in the 80 column PDB format would aggravate rather than relieve this limitation. This branch would serve the general community who are primarily viewers of previously deposited structures, and any programs not currently being maintained.

Currently-maintained programs should move to mmCIF or XML, whichever is convenient. These formats are intrinsically open-ended, and can handle the problematic structures mentioned above so long as the corresponding mmCIF dictionaries are updated to define the relevant entities.

The wwPDB database is already capable of exporting to any of the PDB, XML, or mmCIF formats. So this would really be a change on the user side more than on the database side.

The barrier to converting programs to mmCIF is lower than you might think. Several mmCIF parsing libraries are available to allow currently maintained programs to offer mmCIF input/output if they do not already do so. One such is the mmLib library developed by Jay Painter and hosted on SourceForge:

http://pymmlib.sourceforge.net/

J Painter and EA Merritt, J. Appl. Cryst. 37, 174-178 (2004). "mmLib Python toolkit for manipulating annotated structural models of biological macromolecules".

--
Ethan A Merritt
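
To illustrate the point about mmCIF column order raised in the thread: each loop_ declares its data names before the values, so a reader can look fields up by name, and it makes no difference to the parser which column X, Y, and Z land in. What follows is only a minimal Python sketch and is not mmLib's API. The function names and the two sample records are invented for illustration, and the parser assumes a single loop with plain whitespace-separated values (no quoted strings or multi-line values), which real mmCIF files do not guarantee.

```python
def parse_atom_site_loop(text):
    """Return atom_site rows as dicts keyed by mmCIF data name.

    Hypothetical helper: handles one loop_ with whitespace-separated
    values only (no quoted strings, no multi-line values).
    """
    tags, rows = [], []
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "loop_":
            in_loop = True
        elif in_loop and line.startswith("_atom_site."):
            # Header line: remember the item name in declaration order.
            tags.append(line.split(".", 1)[1])
        elif in_loop and tags:
            # Data line: pair each value with the name declared for its column.
            values = line.split()
            if len(values) != len(tags):
                raise ValueError(f"expected {len(tags)} values, got {len(values)}")
            rows.append(dict(zip(tags, values)))
    return rows


def xyz(row):
    """Fetch the coordinates by name, wherever their columns happen to be."""
    return tuple(float(row[key]) for key in ("Cartn_x", "Cartn_y", "Cartn_z"))


# Two made-up records describing the same atom with different column orders.
ORDER_A = """\
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
1 CA 11.5 22.25 33.0
"""

ORDER_B = """\
loop_
_atom_site.Cartn_z
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.id
_atom_site.Cartn_y
33.0 CA 11.5 1 22.25
"""

for sample in (ORDER_A, ORDER_B):
    print(xyz(parse_atom_site_loop(sample)[0]))  # (11.5, 22.25, 33.0) both times
```

A fixed-column PDB reader, by contrast, has to slice each field at a hard-wired position, which is where the width limits listed above come from. A maintained program would normally lean on an existing mmCIF library rather than a hand-rolled parser like this one.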