On Wednesday 01 August 2007 14:10, Joe Krahn wrote:
> In addition to questions about the PDB standard, it is probably
> important to consider mmCIF. One thing I don't like about it is that
> columns can be randomized (i.e. X, Y, and Z can be in any column), but
> the mmCIF standards people have no interest in defining a more strict
> standard that would require files to be as human readable as RCSB's
> mmCIF files.

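A parser does not need a fixed column order, because every mmCIF loop names its own columns. A minimal sketch in Python follows. The sample data and the helper names (`read_simple_loop`, `coordinates`) are made up for illustration, and the parser assumes plain whitespace-separated values with no quoted strings or multi-line fields, which a real mmCIF reader would have to handle.

```python
def read_simple_loop(text):
    """Parse one simple mmCIF loop_ block into a list of dicts keyed by tag.

    Assumption: values are whitespace-separated, with no quoted strings
    and no multi-line text fields.
    """
    tags, rows = [], []
    in_loop = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "loop_":
            in_loop = True
            continue
        if not in_loop:
            continue
        if line.startswith("_"):
            # Header line: one column name per line, in file order.
            tags.append(line)
        else:
            # Data line: pair each value with the tag in the same position.
            rows.append(dict(zip(tags, line.split())))
    return rows


def coordinates(rows):
    """Extract (atom name, x, y, z) by tag name, not by column position."""
    return [
        (
            row["_atom_site.label_atom_id"],
            float(row["_atom_site.Cartn_x"]),
            float(row["_atom_site.Cartn_y"]),
            float(row["_atom_site.Cartn_z"]),
        )
        for row in rows
    ]


# Hypothetical data: the same two atoms with columns in different orders.
SAMPLE_A = """\
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
1 N  10.000 20.000 30.000
2 CA 11.500 20.500 30.250
"""

SAMPLE_B = """\
loop_
_atom_site.Cartn_z
_atom_site.id
_atom_site.Cartn_x
_atom_site.label_atom_id
_atom_site.Cartn_y
30.000 1 10.000 N  20.000
30.250 2 11.500 CA 20.500
"""

if __name__ == "__main__":
    print(coordinates(read_simple_loop(SAMPLE_A)))
    print(coordinates(read_simple_loop(SAMPLE_B)))
```

Both samples yield the same coordinates, since the lookup goes through the tag names rather than the column positions.
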
The important thing about mmCIF is not the precise file format, which is ultimately not of interest except as a parsable exchange medium, but rather the existence of the mmCIF dictionaries. A more productive discussion may be to revisit the definition of what information we as a community expect to be captured in the PDB database. The question of export formats is secondary.

> Does this sound useful, or have most people given up on having any
> influence on standards? Or, should the structural biology software
> developers get together and just make our own OpenPDB format?

As came up in the PDB group discussion at the ACA meeting, some new depositions are not representable in the PDB format (including v3). Examples include:

- very large structures, for which the current 80 column PDB format runs out of space for atom numbers (5 columns -> max 99999) or for chain ids (1 column -> single char A-Z 0-9) [don't ask me why they don't want lower case]
- new classes of experiment (SAXS, EM)
- new classes of model (TLS or normal-mode displacements, ensemble models, envelope representations)

I am inclined to say that there should be a fork into two distinct formats, used for different purposes.

The 80 column PDB format should be frozen, preferably at the pre-version3 state. Freezing it would allow legacy programs to continue to read old PDB files without modification. These programs will not be able to handle certain classes of new structures, but this would be true in any case for legacy code. Churn in the 80 column PDB format would aggravate rather than relieve this limitation. This branch would serve the general community who are primarily viewers of previously deposited structures, and any programs not currently being maintained.

Currently-maintained programs should move to mmCIF or XML, whichever is convenient.
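To make the size limits in the list above concrete, here is a minimal Python sketch of what a fixed-width field can hold. The field widths are those of the 80 column ATOM record (atom serial in columns 7-11, chain id in column 22). The function names are hypothetical, and the chain id alphabet is just the A-Z 0-9 set mentioned above.

```python
import string

ATOM_SERIAL_WIDTH = 5  # columns 7-11 of an ATOM record
CHAIN_ID_WIDTH = 1     # column 22 of an ATOM record

# Chain ids restricted to upper-case letters and digits, as noted above.
CHAIN_ID_ALPHABET = string.ascii_uppercase + string.digits


def max_serial(width=ATOM_SERIAL_WIDTH):
    """Largest non-negative integer that fits in a field of `width` digits."""
    return 10 ** width - 1


def format_serial(serial, width=ATOM_SERIAL_WIDTH):
    """Right-justify an atom serial number, refusing values that overflow."""
    text = str(serial)
    if len(text) > width:
        raise ValueError(
            f"atom serial {serial} does not fit in {width} columns"
        )
    return text.rjust(width)


def max_chains(width=CHAIN_ID_WIDTH):
    """Number of distinct chain ids available in a field of `width` chars."""
    return len(CHAIN_ID_ALPHABET) ** width


if __name__ == "__main__":
    print(max_serial())          # largest representable atom serial
    print(max_chains())          # number of distinct single-char chain ids
    print(format_serial(42))     # padded to the field width
    try:
        format_serial(max_serial() + 1)
    except ValueError as err:
        print("overflow:", err)
```

A program writing the fixed-width format has to refuse, truncate, or wrap the value once it passes the limit, whereas a token-based format such as mmCIF has no such ceiling.
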
Both mmCIF and XML are intrinsically open-ended, and can handle the problematic structures mentioned above so long as the corresponding mmCIF dictionaries are updated to define the relevant entities. The wwPDB database is already capable of exporting to any of the PDB, XML, or mmCIF formats. So this would really be a change on the user side more than on the database side.

The barrier to converting programs to mmCIF is lower than you might think. Several mmCIF parsing libraries are available to allow currently maintained programs to offer mmCIF input/output if they do not already do so. One such is the mmLib library developed by Jay Painter and hosted on SourceForge:

    http://pymmlib.sourceforge.net/

    J Painter and EA Merritt, J. Appl. Cryst. 37, 174-178 (2004).
    "mmLib Python toolkit for manipulating annotated structural models
    of biological macromolecules".

-- 
Ethan A Merritt