Phil,

A few points:


1. The recent erroneous structures that I'm aware of had either extremely
high free R values (40-45%) or unreasonably low free R values (low
twenties) despite the presence of >80% solvent and gaps in crystal packing.

2. At low resolution (3.5 A and lower), maps can be very difficult to
interpret.  In fact, it can be impossible to correct the model based on
low-resolution maps alone. Yet the statistics (free R value, geometry, ...)
will ultimately tell you whether you have a better model.  The problem is how
to get to the better model.  High-resolution structures of fragments of the
molecule may be required to ultimately obtain a better structure of the full-length
molecule.


3. The journals and funding agencies should make it
the responsibility of the authors to maintain archives of
their raw diffraction images, and they should make them available upon
request, as is customary with other published materials.

Axel






Phil Evans wrote:
I worry a bit about some of this discussion, in that I wouldn't like the free-R-factor police to get too powerful. I imagine that many of us have struggled with datasets which are sub-optimal for all sorts of reasons (all crystals are multiple/split/twinned; substantial disordered regions; low resolution, etc.) - and it is not possible to get better data. I have certainly fought hard to get free-R below (the magic) 30%, when I know the structure is _essentially_ right, but the details are a little blurred in places, even when I have done the best I can. Anyway, the important things are not the statistics, but the maps.

Does this make the structure unpublishable? No, provided that we remember a basic tenet of science, that the conclusions drawn should be supported by the evidence available. With limited data, the conclusions may be more limited, but still often illuminate the biology, which is the reason for solving the structure in the first place.

The evidence should be available to readers & referees, so deposition of at least structure factors should be compulsory (why isn't it already?). Unmerged data or images would be nice, but I doubt that many people would use them (great for developers, though).

Phil

On 20 Aug 2007, at 08:24, George M. Sheldrick wrote:

Dear Alex,

Of course a simplified one page summary would not be the last word, but I
think that it would be a big step in the right direction. For example a
value of Rfree that is 'too good' because the reflection set for it has
been chosen wrongly can be detected statistically (Tickle et al., Acta
D56 (2000) 443-450). And it would not be too difficult to distinguish
between three possible causes of incomplete data: (a) there is a dead
cone of data because it was a single scan of a low symmetry crystal,
(b) a large number of 'overloads' were rejected (they would all have
fairly low resolution and high Fc values) or (c) the missing reflections
are fairly randomly distributed because they have been removed by hand to
improve the R-values. I think that there is a very good case for making
this information available to referees in an easily comprehensible form.
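
As a rough illustration of how the three cases (a)-(c) above might be told
apart automatically, here is a toy Python sketch. Everything in it is an
assumption made for illustration, not part of any existing program: the
`Missing` record, the function name `classify_missing`, and in particular
the thresholds (a 20 degree cone, a 4 A low-resolution cutoff, an 80%
majority) are hypothetical and would need tuning on real data.

```python
from dataclasses import dataclass


@dataclass
class Missing:
    """One unmeasured reflection (hypothetical record for this sketch)."""
    d_spacing: float   # resolution of the reflection in angstroms
    fc: float          # calculated amplitude from the current model
    axis_angle: float  # degrees between its reciprocal-lattice vector
                       # and the crystal rotation axis


def classify_missing(missing, fc_median_all,
                     cone_deg=20.0, low_res=4.0, majority=0.8):
    """Guess why reflections are missing; all thresholds are assumptions."""
    if not missing:
        return "complete"
    n = len(missing)

    # (a) dead cone: missing reflections cluster around the rotation axis
    in_cone = sum(1 for m in missing
                  if min(m.axis_angle, 180.0 - m.axis_angle) <= cone_deg)
    if in_cone / n >= majority:
        return "dead cone"

    # (b) overloads: low resolution and stronger than a typical reflection
    overload_like = sum(1 for m in missing
                        if m.d_spacing >= low_res and m.fc > fc_median_all)
    if overload_like / n >= majority:
        return "overloads"

    # (c) neither pattern dominates: scattered, worth a closer look
    return "scattered"
```

A "scattered" result does not prove anything by itself; it only flags a
data set whose missing reflections are explained by neither geometry nor
detector saturation.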

George

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-2582


On Sun, 19 Aug 2007, Alexander Aleshin wrote:

I do not think the small molecule approach proposed by George Sheldrick
is sufficient for validation of protein structures, as misrepresentation
of experimental statistics/resolution is hard to detect with it, and
these factors appear to play a crucial role in defining the fate of many
hot structures.

Bad statistics hurt publication more than mistakes in a model, and
improving the experiment is often too hard. "I know my structure is
right. Why should I spend another year growing better crystals only to
make the statistics look right?" sounds like a strong argument to a
desperate researcher. Making up an artificial data set is overkill for the
task. There are easier and "less amoral" ways, such as rejection of
outliers and incorrect assignment of the Rfree test set. Ironically, an
undereducated crystallographer may not recognize wrongdoing in such data
treatment, which makes it even more likely to occur.

Do I sound paranoid? And please do not suggest that I have shared
personal experiences.


Alex Aleshin


On Sat, 18 Aug 2007, George M. Sheldrick wrote:

There are good reasons for preserving frames, but most of all for the
crystals that appeared to diffract but did not lead to a successful
structure solution, publication, and PDB deposition. Maybe in the future
there will be improved data processing software (for example to integrate
non-merohedral twins) that will enable good structures to be obtained from
such data. At the moment most such data is thrown away. However, forcing
everyone to deposit their frames each time they deposit a structure with
the PDB would be a thorough nuisance and major logistic hassle.

It is also a complete illusion to believe that the reviewers for Nature
etc. would process or even look at frames, even if they could download
them with the manuscript.

For small molecules, many journals require an 'ORTEP plot' to be submitted
with the paper. As older readers who have experienced Dick Harlow's 'ORTEP
of the year' competition at ACA Meetings will remember, even a viewer
with little experience of small-molecule crystallography can see from the
ORTEP plot within seconds if something is seriously wrong, and many
non-crystallographic referees for e.g. the journal Inorganic Chemistry
can even make a good guess as to what is wrong (e.g. wrong element assigned
to an atom). It would be nice if we could find something similar for
macromolecules that the author would have to submit with the paper. One
immediate bonus is that the authors would look at it carefully
themselves before submitting, which could lead to an improvement of the
quality of structures being submitted. My suggestion is that the wwPDB
might provide say a one-page diagnostic summary when they allocate each
PDB ID that could be used for this purpose.

A good first pass at this would be the output that the MolProbity server
http://molprobity.biochem.duke.edu/ sends when it is given a PDB file. It
starts with a few lines of summary in which bad things are marked red
and the structure is assigned to a percentile: a percentile of 6% means
that 93% of the structures in the PDB with a similar resolution are
'better' and 5% are 'worse'. This summary can be understood with very
little crystallographic background and a similar summary can
of course be produced for NMR structures. The summary is followed by
diagnostics for each residue; normally, if the summary looks good, it
would not be necessary for the editor or referee to look at the rest.
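
To make the percentile idea concrete, here is a minimal Python sketch of
ranking one structure against a reference set of comparable-resolution
structures. It is not MolProbity's actual procedure: the function name
`percentile_rank`, the "lower score is better" convention and the made-up
reference scores are all assumptions for illustration only.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, reference_scores):
    """Return (percent_worse, percent_better) for `score`.

    Assumes a lower score is better (as for a count of bad contacts).
    Structures that tie with `score` are counted in neither group.
    """
    ordered = sorted(reference_scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("reference set is empty")
    better = bisect_left(ordered, score)      # strictly lower scores
    worse = n - bisect_right(ordered, score)  # strictly higher scores
    return 100.0 * worse / n, 100.0 * better / n


# Hypothetical reference set: 100 structures with scores 1..100.
reference = list(range(1, 101))
pct_worse, pct_better = percentile_rank(95, reference)
print(f"{pct_worse:.0f}% worse, {pct_better:.0f}% better")
```

With this made-up reference set, a score of 95 has 5% of structures worse
and 94% better; the remaining 1% is the bin the structure itself falls in.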

Although this server was intended to help us to improve our structures
rather than detect manipulated or fabricated data, I asked it for a
report on 2HR0 to see what it would do (probably many other people were
trying to do exactly the same; the server was slower than usual).
Although the structure got poor marks on most tests, MolProbity
generously assigned it overall to the 6th percentile; I suppose that
this is about par for structures submitted to Nature (!). However, there
was one feature that was unlike anything I have ever seen before,
although I have fed the MolProbity server with some pretty ropey PDB
files in the past: EVERY residue, including EVERY WATER molecule, either
made at least one bad contact or was a Ramachandran outlier or was a
rotamer outlier (or more than one of these). This surely would ring
all the alarm bells!

So I would suggest that the wwPDB could coordinate, with the help of the
validation experts, software to produce a short summary report that
would be automatically provided in the same email that allocates the PDB
ID. This email could make the strong recommendation that the report file
be submitted with the publication, and maybe in the fullness of time
even the Editors of high profile journals would require this report for
the referees (or even read it themselves!). To gain acceptance for such
a procedure the report would have to be short and comprehensible to
non-crystallographers; the MolProbity summary is an excellent first
pass in this respect, but (partially with a view to detecting
manipulation of the data) a couple of tests could be added based on the
data statistics as reported in the PDB file (or even better the
reflection data if submitted). Most of the necessary software already
exists, much of it produced by regular readers of this bb; it just needs
to be adapted so that the results can be digested by referees and
editors with little or no crystallographic experience. And most important,
a PDB ID should always be released only in combination with such a
summary.

George





--
Axel T. Brunger
Investigator,  Howard Hughes Medical Institute
Professor of Molecular and Cellular Physiology
Stanford University

Web:    http://atb.slac.stanford.edu
Email:  [EMAIL PROTECTED]
Phone:  +1 650-736-1031
Fax:    +1 650-745-1463
