Re: [ccp4bb] mmCIF as working format?

Eugene Krissinel Tue, 06 Aug 2013 08:40:13 -0700

Dear All,

The following post is on behalf of Kim Henrick. With his permission I have
added some editing and thoughts of my own, so the text refers to both of us.


As Herbert quite rightly notes, we need to respect each other's working
preferences and needs. I should think that much of the effort that CCP4, the
PDB, Phenix and other players put in on this ground is devoted to exactly that.
Here, providing a suitable format is an extremely important and challenging
endeavour, which will have a significant impact on the field. It may look like
a distraction from doing science "right here, right now", but in fact it is
aimed at preventing a far greater distraction, which would emerge if the issue
remained unattended and various format solutions appeared here and there in an
uncontrolled manner. I hope it can be agreed that maintaining one format,
albeit not 100% suitable for everyone, is preferable to maintaining N formats,
each one 100% suitable for 1 out of N groups of crystallographers, with N(N-1)
converters between them.

But where does this dissatisfaction with mmCIF actually originate?


1) "I CANNOT GREP IT"

- this is not correct. By a special agreement, the atom site loop in mmCIF
looks essentially the same as in PDB, or wPDB. The left-most columns are
predetermined, compliant with PDB and made mandatory, while optional (custom or
metadata) columns are shifted to the right. In fact, all existing grep scripts
should work on mmCIF already today, either unmodified or after minimal
modification.
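
To illustrate the point, here is a minimal Python sketch of a grep-style filter
applied to an atom site loop. The three-atom excerpt is invented for
illustration and uses only a subset of the usual columns; real entries carry
more columns and may order them differently. The helper name grep_lines is
hypothetical.

```python
import re

# Hypothetical excerpt of an mmCIF atom_site loop, invented for illustration.
# One atom per line, keyword first, as in a PDB file.
SAMPLE = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM   1 N N   MET A 1 27.340 24.430 2.614
ATOM   2 C CA  MET A 1 26.266 25.413 2.842
HETATM 3 O O   HOH B . 10.000 11.000 12.000
"""


def grep_lines(text, pattern):
    """Return the lines of `text` matching the regular expression, like grep."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


# Equivalent in spirit to: grep '^ATOM' file.cif
atoms = grep_lines(SAMPLE, r"^ATOM\b")
# Equivalent in spirit to: grep HOH file.cif
waters = grep_lines(SAMPLE, r"\bHOH\b")
```

Because each atom occupies one line that starts with ATOM or HETATM, the same
line-oriented patterns used on PDB files still select the same records.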


2) "I CANNOT READ IT WITH MY EYE, HATE IT"

- again, coordinate data in mmCIF is just as readable as in the PDB, because it
is arranged in exactly the same way, line-wise with keywords. Admittedly, this
is done by agreement, rather than enforced by the format, in order to minimise
distraction for all those reading PDB files from their screens, and those keen
on grepping them. What else do many people read? Yes, REMARKs are more readable
in the PDB, but they are much more difficult to parse, and they are not
suitable for keeping metadata with relational links. If one is so keen on
reading REMARKs from PDB files by eye, I suggest that this is done more
conveniently using the PDB web pages, which are designed specially for that and
offer many other useful options.


3) "I CANNOT CHANGE A CHAIN ID IN IT USING MY FAVOURITE EDITOR"

- not correct. You CAN do that in the same way as in the PDB. You CAN change
anything in the file -- of course, if you know what you are doing. Because atom
data in mmCIF is formatted in the same way as in PDB, I do not need to give you
any instructions. By doing this, you will probably break metadata in other
parts of the mmCIF file, but this cannot be helped. Did you always care about
metadata when changing a chain ID by hand in a PDB file? I can only add that
hacking into data is the last thing that can be advisable or that I would
consider it an honour to teach to students. Admittedly, we may lack tools to
edit files in a safe way at the moment, but a constructive approach here is to
raise an issue and suggest specifications for such tools, if not to offer your
own one(s).
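
For those who prefer a script to an editor, the same hand edit can be sketched
in a few lines of Python. This is only a sketch under stated assumptions: one
atom per line, whitespace-separated values, and no quoted values containing
spaces. The function name rename_chain and the sample data are hypothetical.
Like the hand edit, it touches only the atom site rows and leaves every other
category that refers to the chain unchanged, so the metadata caveat above
applies in full.

```python
# Hypothetical atom_site excerpt, invented for illustration.
SAMPLE = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.auth_asym_id
_atom_site.auth_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N  MET A 1 27.340 24.430 2.614
ATOM 2 CA MET A 1 26.266 25.413 2.842
ATOM 3 N  GLY B 1 10.000 11.000 12.000
""".splitlines()


def rename_chain(lines, old, new, column="_atom_site.auth_asym_id"):
    """Return a copy of `lines` with chain `old` renamed to `new`.

    Assumes one atom per line and whitespace-separated values without
    embedded spaces. Column alignment of edited rows is not preserved.
    """
    # The loop header lists the column names in the order the values appear.
    headers = [line.strip() for line in lines if line.startswith("_atom_site.")]
    idx = headers.index(column)
    out = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            tokens = line.split()
            # Only touch rows whose value count matches the header count.
            if len(tokens) == len(headers) and tokens[idx] == old:
                tokens[idx] = new
                line = " ".join(tokens)
        out.append(line)
    return out


renamed = rename_chain(SAMPLE, "A", "X")
```

Looking the column up by name in the loop header, rather than counting
character positions, is what keeps such a script working when optional columns
are present.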


4) "MY OWN PROGRAMS READ/WRITE ONLY PDB FORMAT AND I DON'T HAVE TIME TO REWRITE 
THEM"

- CCP4, Phenix and the PDB all provide format converters. It takes only a
superficial Python wrapper around your code to teach it mmCIF, which involves
only extra file operations, quite cheap these days. In fact, this is what CCP4
will do to those of its own codes that are not considered suitable for
rewriting for mmCIF, so we are all in the same boat here. Certainly, this
solution will remain applicable only where your program was applicable before.
If your code cannot cope with new extended data, e.g., 100 protein chains, this
is not a format issue.
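
A minimal sketch of such a wrapper is given below. It assumes that the legacy
program takes an input PDB path and an output PDB path as its last two
arguments, and that the two converters are supplied by the caller as Python
callables; run_pdb_only_tool, to_pdb and to_mmcif are hypothetical names and do
not correspond to any particular CCP4, Phenix or PDB tool.

```python
import os
import subprocess
import tempfile


def run_pdb_only_tool(mmcif_in, to_pdb, tool_cmd, to_mmcif, mmcif_out):
    """Run a PDB-only program on an mmCIF file via temporary PDB files.

    to_pdb(src, dst) and to_mmcif(src, dst) are caller-supplied converters
    (hypothetical placeholders for whichever converter you actually use).
    tool_cmd is the program's argument list; the temporary input and output
    PDB paths are appended to it (an assumption about the program's CLI).
    """
    with tempfile.TemporaryDirectory() as tmp:
        pdb_in = os.path.join(tmp, "in.pdb")
        pdb_out = os.path.join(tmp, "out.pdb")
        to_pdb(mmcif_in, pdb_in)          # mmCIF -> PDB
        subprocess.run(list(tool_cmd) + [pdb_in, pdb_out], check=True)
        to_mmcif(pdb_out, mmcif_out)      # PDB -> mmCIF
    return mmcif_out
```

The legacy code itself is not modified; the only cost is two extra conversions
and two temporary files per run. Anything the PDB format cannot represent is
lost in the round trip, which is the limitation noted above.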


5) "MMCIF IS NOT SUPPORTED BY MOLECULAR VIEWERS"

- this is not correct. The PDB has consulted major viewer vendors and an
agreement has been reached. All viewers coming from CCP4, Phenix and the PDB
will be mmCIF-compatible. In fact, this is already the case: just try Coot or
CCP4mg on mmCIF files, and if something does not work :), please report to us.


6) "MMCIF IS NOTORIOUSLY DIFFICULT FOR READING AND WRITING IN APPLICATIONS"

- not if you are using an API from one of CCP4, Phenix or the PDB. Really,
there is no honour or achievement in writing yet another parser or
reader/writer. We work hard to provide you with tools for doing something
better than primitive coding. Do you code from scratch when reading JPG or PNG,
or when printing into PDF? What about MTZ: do you still read that at a low
level, too?


7) "I DO NOT LIKE YOUR APIs"

- APIs are expandable and much more flexible than formats. It takes only a
collaborative spirit and effort to improve them. They can always be kept
backward-compatible.


8) "YOU HAVE DONE EVERYTHING WITHOUT ASKING ME, AGAIN"

- this has nothing to do with the advantages or disadvantages of mmCIF. It took
almost 20 years of discussions around mmCIF, so it is not fair to say that
absolutely nothing was done. However, 20 years is long enough to realise that a
100% ideal solution is not reachable and, with no time left, a solution is
nonetheless needed. We will try to minimise the impact on end-users, who use
the software to solve structures, and if you can anticipate that something
particular will be severely impaired by the format change, please let us know.

Is there something I forgot in this list?

Many thanks to everybody,

Eugene


On 6 Aug 2013, at 03:10, Herbert J. Bernstein wrote:

> Dear Colleagues,
> 
> This exchange is a wonderful illustration of the simple fact that different
> scientists work differently, favoring different approaches and different
> tools. For some, the latest and greatest formats and support systems are what
> they need to be productive. For a surprisingly large number of others, change
> to new methods is a pointless distraction from doing good science. What we
> need to do as a community is not to tell one another how they _must_ do their
> work, but to listen to one another, being helpful where we can, and showing
> mutual respect where we cannot.
> 
> To this end, Frances and I have revived an old idea from 2006 of creating a
> format that looks much like the old PDB format but is 132 columns wide with
> more characters allotted to fields that need them. We re-enabled the WPDB
> server at http://biomol.dowling.edu/wpdb which can produce either a
> 132-column 'PDB' entry or an 80-column PDB entry based on the mmCIF files on
> the wwPDB server. This allows people who work best with tools such as grep
> and a simple fixed-field format to have most of the newer, larger PDB entries
> in a wide version of the PDB format. If you don't need it, or don't like it,
> you should not use it. If you have need for it, and need some things changed,
> send us an email, and we'll see what we can do to oblige.
> 
> Right now it is on an old, slow server. If there is significant use, I'll
> move it to something bigger and faster.
> 
> Regards,
> Herbert and Frances Bernstein
> 
> 
> On 8/5/13 4:05 PM, Boaz Shaanan wrote:
>>
>> Boaz Shaanan, Ph.D.
>> Dept. of Life Sciences
>> Ben-Gurion University of the Negev
>> Beer-Sheva 84105
>> Israel
>>
>> E-mail: bshaa...@bgu.ac.il
>> Phone: 972-8-647-2220 Skype: boaz.shaanan
>> Fax: 972-8-647-2992 or 972-8-646-1710
>>
>> ------------------------------------------------------------------------
>> *From:* Nat Echols [nathaniel.ech...@gmail.com]
>> *Sent:* Monday, August 05, 2013 10:45 PM
>> *To:* בעז שאנן
>> *Cc:* CCP4BB@JISCMAIL.AC.UK
>> *Subject:* Re: [ccp4bb] mmCIF as working format?
>> 
>> On Mon, Aug 5, 2013 at 12:37 PM, Boaz Shaanan <bshaa...@bgu.ac.il 
>> <mailto:bshaa...@bgu.ac.il>> wrote:
>> 
>>    There seems to be some kind of a gap between users and developers
>>    as far as the eagerness to abandon PDB in favour of mmCIF goes. I myself
>>    fully agree with Jeffrey about the ease of manipulating PDB's
>>    during work, particularly when encountering unusual circumstances
>>    (and there are many of those, as we all know). And how about
>>    non-crystallographers that are using PDB's for visualization and
>>    understanding how their proteins work? I teach many such students
>>    and it's fairly easy to explain to them where to look in the PDB
>>    for particular pieces of information relevant to the structure. I
>>    can't imagine how they'll cope with the cryptic mmCIF format.
>> 
>> 
>> >I think the only gap is between developers and *expert* users - most of the
>> >community simply wants tools and formats that work with a minimum of
>> >fiddling.
>> 
>> That assumes that you can offer such software, but can you? I doubt that 
>> this goal is reachable (in fact our daily experience proves just that), with 
>> all due respect to you developers.
>> 
>> >Again, if users are having to examine the raw PDB records visually to find 
>> >information, this is a failure of the software.
>> It's not raw, it's easily readable text, very easy to interpret with very 
>> little effort.
>> 
>> Anyway, this discussion is a waste of time. The decision has been taken, 
>> mmCIF will prevail and we (expert and non-expert users) have to swallow the 
>> pill.
>> 
>> Boaz
>> 
>> -Nat
