Re: [ccp4bb] PDB format survey?

Thomas Stout Wed, 01 Aug 2007 15:41:18 -0700

I suspect this will be throwing fuel on the fire, but what is so great about 
the PDB format (any version) besides familiarity?  It seems to me to be 
outdated, inadequate and generally mis-used by all.  I say scrap it, make a 
clean break and devote everyone's energies to making a format that will work 
for everyone. (granted: it is inexcusable for the RCSB to be developing new 
formats without input from the affected parties).  mmCIF seems like a good
idea that has not gotten the attention it needs (and deserves) to be formulated 
to meet everyone's needs.  As for the "legacy program" argument: that's what 
translation programs like OpenBabel are for (or even a very simple 
python/perl/your-favorite-hammer script).  Perhaps even the RCSB could be 
convinced to offer several formats for download......oh, wait - they already 
do.....
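As a rough illustration of the "very simple script" route, here is a minimal Python sketch that reads fixed-column ATOM/HETATM records and writes them out as a single mmCIF _atom_site loop. The function names (pdb_to_atom_site, cif_value) are hypothetical. The sketch assumes the input follows the standard PDB column layout, and it writes only a subset of _atom_site items (the auth_* identifiers, no label_* items), so the result is not a complete PDBx/mmCIF entry.

```python
# Minimal sketch, assuming standard fixed-column PDB input; not a complete PDBx/mmCIF writer.

# (mmCIF _atom_site item, start, end) as 0-based slices of the PDB columns.
PDB_COLUMNS = [
    ("id", 6, 11),              # cols 7-11  atom serial number
    ("auth_atom_id", 12, 16),   # cols 13-16 atom name
    ("auth_comp_id", 17, 20),   # cols 18-20 residue name
    ("auth_asym_id", 21, 22),   # col  22    chain ID
    ("auth_seq_id", 22, 26),    # cols 23-26 residue number
    ("Cartn_x", 30, 38),        # cols 31-38
    ("Cartn_y", 38, 46),        # cols 39-46
    ("Cartn_z", 46, 54),        # cols 47-54
    ("occupancy", 54, 60),      # cols 55-60
    ("B_iso_or_equiv", 60, 66), # cols 61-66 temperature factor
    ("type_symbol", 76, 78),    # cols 77-78 element symbol
]


def cif_value(text):
    """Format one loop value: '?' if blank, double-quoted if it contains a single quote."""
    text = text.strip()
    if not text:
        return "?"
    if "'" in text:
        return '"' + text + '"'
    return text


def pdb_to_atom_site(pdb_lines, block_name="converted"):
    """Convert ATOM/HETATM records to a single mmCIF _atom_site loop (as a string)."""
    out = [f"data_{block_name}", "loop_", "_atom_site.group_PDB"]
    out += [f"_atom_site.{item}" for item, _, _ in PDB_COLUMNS]
    for line in pdb_lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END, ...
        # Slicing past the end of a short line yields "", which cif_value turns into "?".
        values = [record] + [cif_value(line[a:b]) for _, a, b in PDB_COLUMNS]
        out.append(" ".join(values))
    return "\n".join(out) + "\n"
```

For example, a standard ATOM record for atom N of residue MET A1 at (11.104, 6.134, -6.504) with occupancy 1.00 and B 25.00 becomes the loop row "ATOM 1 N MET A 1 11.104 6.134 -6.504 1.00 25.00 N"; non-coordinate records are simply skipped.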

Ducking behind my asbestos-free, all-natural organic firewall,
-Tom

-----Original Message-----
From: CCP4 bulletin board on behalf of Ethan Merritt
Sent: Wed 8/1/2007 3:06 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] PDB format survey?

On Wednesday 01 August 2007 14:10, Joe Krahn wrote:
> In addition to questions about the PDB standard, it is probably
> important to consider mmCIF. One thing I don't like about it is that
> columns can be randomized (i.e. X, Y, and Z can be in any column), but
> the mmCIF standards people have no interest in defining a more strict
> standard that would require files to be as human readable as RCSB's
> mmCIF files.

The important thing about mmCIF is not the precise file format,
which is ultimately not of interest except as a parsable exchange
medium, but rather the existence of the mmCIF dictionaries.
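To make the "parsable exchange medium" point concrete: a program that looks items up by their dictionary names, rather than by column position, does not care whether X, Y and Z come first or last in a loop. The following is a minimal Python sketch of that idea; read_loop and coordinates are hypothetical helpers, and the sketch assumes one item name per header line and no quoted strings or semicolon-delimited text fields, so it is not a full CIF parser.

```python
# Minimal sketch: reads one CIF loop by item name. Assumes one item name per
# header line and no quoted strings or ;-delimited text fields in the values.


def read_loop(text, category):
    """Return the rows of the first loop_ in `category` as dicts keyed by item name."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    prefix = category + "."
    for i, line in enumerate(lines):
        if line != "loop_" or i + 1 >= len(lines) or not lines[i + 1].startswith(prefix):
            continue
        j = i + 1
        names = []
        while j < len(lines) and lines[j].startswith(prefix):
            names.append(lines[j][len(prefix):])  # e.g. "Cartn_x"
            j += 1
        tokens = []
        while j < len(lines) and not lines[j].startswith(("_", "loop_", "data_", "#")):
            tokens.extend(lines[j].split())
            j += 1
        if len(tokens) % len(names):
            raise ValueError("value count is not a multiple of the item count")
        n = len(names)
        return [dict(zip(names, tokens[k:k + n])) for k in range(0, len(tokens), n)]
    return []


def coordinates(text):
    """Look up X, Y, Z by name, so their column order in the file is irrelevant."""
    return [
        (float(row["Cartn_x"]), float(row["Cartn_y"]), float(row["Cartn_z"]))
        for row in read_loop(text, "_atom_site")
    ]


IN_ORDER = """data_a
loop_
_atom_site.id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
1 11.104 6.134 -6.504
"""

SHUFFLED = """data_b
loop_
_atom_site.Cartn_z
_atom_site.id
_atom_site.Cartn_y
_atom_site.Cartn_x
-6.504 1 6.134 11.104
"""

print(coordinates(IN_ORDER) == coordinates(SHUFFLED))  # True
```

The cost, as Joe notes, is human readability: the shuffled file is harder to eyeball, even though a name-based reader treats both files identically.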

A more productive discussion may be to revisit the definition
of what information we as a community expect to be captured in the
PDB database.  The question of export formats is secondary.

> Does this sound useful, or have most people given up on having any
> influence on standards? Or, should the structural biology software
> developers get together and just make our own OpenPDB format?

As noted during the PDB group discussion at the ACA meeting, some new
depositions are not representable in the PDB format (including v3).

Examples include:
- very large structures, for which the current 80 column PDB format
  runs out of space for atom serial numbers (5 columns -> max 99999)
  or for chain ids (1 column -> single char A-Z 0-9)
  [don't ask me why they don't want lower case]
- new classes of experiment (SAXS, EM)
- new classes of model (TLS or normal-mode displacements,
  ensemble models, envelope representations)
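The space problem in the first item follows directly from the fixed-column layout: in the PDB ATOM record the atom serial number occupies columns 7-11, the chain ID column 22, and the residue number columns 23-26. Below is a minimal Python sketch of a writer that refuses values that will not fit; format_atom is a hypothetical helper, and its atom-name alignment is simplified (names shorter than four characters are shifted one column right, as for one-letter elements).

```python
# Minimal sketch of a fixed-column PDB ATOM writer that refuses values that do
# not fit their fields. Atom-name alignment is simplified (names shorter than 4
# characters are shifted one column right, as for one-letter elements).


def format_atom(serial, name, res_name, chain_id, res_seq, x, y, z,
                occupancy=1.0, b_factor=0.0, element=""):
    """Return a fixed-column ATOM record, or raise ValueError on overflow."""
    if not 0 <= serial <= 99999:
        raise ValueError(f"atom serial {serial} does not fit in columns 7-11")
    if len(chain_id) != 1:
        raise ValueError(f"chain ID {chain_id!r} does not fit in column 22")
    if not -999 <= res_seq <= 9999:
        raise ValueError(f"residue number {res_seq} does not fit in columns 23-26")
    if len(name) > 4:
        raise ValueError(f"atom name {name!r} does not fit in columns 13-16")
    for value in (x, y, z):
        if not -999.999 <= value <= 9999.999:
            raise ValueError(f"coordinate {value} does not fit an 8.3f field")
    name_field = name if len(name) == 4 else f" {name:<3}"
    return (
        "ATOM  "                                # cols 1-6   record name
        + f"{serial:>5} "                       # cols 7-11  serial, col 12 blank
        + name_field                            # cols 13-16 atom name
        + " "                                   # col 17     alternate location (blank)
        + f"{res_name:>3} "                     # cols 18-20 residue name, col 21 blank
        + chain_id                              # col 22     chain ID
        + f"{res_seq:>4}"                       # cols 23-26 residue number
        + "    "                                # col 27 insertion code, cols 28-30 blank
        + f"{x:8.3f}{y:8.3f}{z:8.3f}"           # cols 31-54 coordinates
        + f"{occupancy:6.2f}{b_factor:6.2f}"    # cols 55-66 occupancy, B-factor
        + " " * 10                              # cols 67-76 blank
        + f"{element:>2}"                       # cols 77-78 element symbol
    )
```

Without such checks, f"{100000:>5}" simply produces the six characters "100000", so every later field in the record shifts one column to the right and a fixed-column reader misparses it; a two-character chain ID such as "AB" does the same thing at column 22.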

I am inclined to say that there should be a fork into two distinct
formats, used for different purposes.

The 80 column PDB format should be frozen, preferably at the
pre-version-3 state. Freezing it would allow legacy programs to continue
to read old PDB files without modification. These programs will not be
able to handle certain classes of new structures, but this would be true
in any case for legacy code.  Churn in the 80 column PDB format would
aggravate rather than relieve this limitation. This branch would serve
the general community who are primarily viewers of previously deposited
structures, and any programs not currently being maintained.

Currently-maintained programs should move to mmCIF or XML, whichever
is convenient.  These formats are intrinsically open-ended, and can
handle the problematic structures mentioned above so long as the
corresponding mmCIF dictionaries are updated to define the relevant
entities.

The wwPDB database is already capable of exporting to PDB, XML,
or mmCIF format. So this would really be a change on the user
side more than on the database side.

The barrier to converting programs to mmCIF is lower than you
might think.  Several mmCIF parsing libraries are available to
allow currently maintained programs to offer mmCIF input/output
if they do not already do so.  One such is the mmLib library
developed by Jay Painter and hosted on SourceForge:

    http://pymmlib.sourceforge.net/

    J. Painter and E. A. Merritt
    J. Appl. Cryst. 37, 174-178 (2004).
    "mmLib Python toolkit for manipulating annotated structural
     models of biological macromolecules".

-- 
Ethan A Merritt

