Dear Phil,

Your observation that the refinement details in PDB format REMARKs are difficult to interpret and compare is well taken. Each refinement package produces its own set of refinement results, calculated in its own way. Both the calculation and the presentation of this information in PDB format differ between programs, and even between versions of the same program. The lack of standardization in how refinement information is reported confuses many PDB users.

In the spirit of supporting innovation, the PDB has historically tried to accommodate this diversity by providing program- and version-specific REMARK 3 formats. However, the field of structural biology has matured considerably in the past few decades, and time-tested, consensus, best-practice approaches can now be defined in many cases. In our view, adopting such approaches (rather than accommodating every variant ever implemented) would best serve the interests of both non-expert user communities and the experimental structural biology community.

As an illustration, it is interesting to note that there are at least 20 different types of R-values reported in the current archive. The subtle differences in these quantities may be of interest in understanding the evolution of refinement methodology. However, we believe that a smaller, common set of well-defined data items describing refinement results would be more useful to the broader community of PDB users.

To this end, the wwPDB maintains an Exchange Data Dictionary of community-vetted definitions and examples of each data item in the PDB archive. This is an extensible dictionary that grows with new technologies and science. For instance, wwPDB has used this extensibility to capture and define all the various R-values. While the dictionary technology provides a framework for definition and standardization, it addresses only part of the problem.

Even though we have precise definitions for the wide range of R-value types, R-value comparisons between entries are still complicated because the values are not uniformly populated across the archive. To fully address the problem, we need not only the standardization provided by the dictionary technology but also the cooperation of software developers in producing a common set of statistics and diagnostics. This does not preclude reporting novel data items, but these should be provided in addition to a common core of results.

Further information about the PDB Exchange Data Dictionary can be found at our dictionary resource site, http://mmcif.pdb.org/

Correspondence information between our PDB Exchange Data Dictionary and items in the current PDB format is also available at
http://mmcif.pdb.org/dictionaries/pdb-correspondence/pdb2mmcif-2010.html

Sincerely,

Christine Zardecki
for the wwPDB



From: Phil Jeffrey <pjeff...@princeton.edu>
Date: May 19, 2010 4:02:22 PM EDT
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] Question: Refmac5 stats reported in pdb REMARK 3
Reply-To: Phil Jeffrey <pjeff...@princeton.edu>



Compare these two lines from phenix.refine:
REMARK   3   NUMBER OF REFLECTIONS             : 46001
REMARK   3   FREE R VALUE TEST SET COUNT      : 2339

with those from refmac, ostensibly using the same data and start pdb:
REMARK   3   NUMBER OF REFLECTIONS             :   43672
REMARK   3   FREE R VALUE TEST SET COUNT      :  2339


I know there are 46011 reflections with |F|>0 in the files I used.
phenix.refine removes 10 of these as outliers. The 46001 remaining reflections reported in REMARK 3 *include* the test set.

With REFMAC, 43672 + 2339 = 46011, so it appears that Refmac reports just the *working* set count in that first line, excluding the test set.
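The arithmetic above can be checked mechanically. Below is a minimal Python sketch, not part of either program or of any PDB tool: the helper names parse_remark3 and classify_count are hypothetical, the field labels are copied from the two excerpts quoted above, and it assumes you already know how many reflections are in the input file and how many the program rejected (10 outliers for phenix.refine here; none for REFMAC, as the sum implies).

```python
import re

# Hypothetical helper: collect integer-valued "LABEL : value" fields from REMARK 3 lines.
# Only the first occurrence of each label is kept; non-integer values are ignored.
REMARK3_INT = re.compile(r"REMARK\s+3\s+(.+?)\s*:\s*(\d+)\s*$")

def parse_remark3(lines):
    fields = {}
    for line in lines:
        m = REMARK3_INT.match(line)
        if m:
            fields.setdefault(m.group(1), int(m.group(2)))
    return fields

def classify_count(fields, total_in_file, rejected=0):
    """Guess whether NUMBER OF REFLECTIONS includes the free-R test set.

    total_in_file: reflections with |F|>0 in the input, counted independently.
    rejected: reflections the program is known to discard (e.g. outliers).
    """
    n = fields["NUMBER OF REFLECTIONS"]
    free = fields["FREE R VALUE TEST SET COUNT"]
    expected = total_in_file - rejected
    if n == expected:
        return "working + test"
    if n + free == expected:
        return "working only"
    return "unknown"

phenix = parse_remark3([
    "REMARK   3   NUMBER OF REFLECTIONS             : 46001",
    "REMARK   3   FREE R VALUE TEST SET COUNT      : 2339",
])
refmac = parse_remark3([
    "REMARK   3   NUMBER OF REFLECTIONS             :   43672",
    "REMARK   3   FREE R VALUE TEST SET COUNT      :  2339",
])
print(classify_count(phenix, 46011, rejected=10))  # working + test
print(classify_count(refmac, 46011))               # working only
```

The check only works when the reflection total is known from outside the PDB file, which is exactly why a fixed definition of the REMARK 3 field matters.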

Is this a bug in one program or the other, or a bug in the PDB definition of REMARK 3? http://www.wwpdb.org/documentation/format23/remark3.html

This appears to be a source of inconsistency.

phenix.refine 1.6-289
refmac5 5.4.0077      (I'm apparently a Luddite)

Phil Jeffrey
Princeton
