Could you please expand on your statement that "small-molecule data has essentially no weak
spots"? The small-molecule data sets I've worked with have had large numbers of "unobserved"
reflections where I used 2 sigma(I) cutoffs (maybe 15-30% of the reflections). Would you consider those
"weak" spots or not? Ron
On Sun, 6 Mar 2011, James Holton wrote:
I should probably admit that I might be indirectly responsible for the
resurgence of this I/sigma > 3 idea, but I never intended this in the way
described by the original poster's reviewer!
What I have been trying to encourage people to do is calculate R factors
using only hkls for which the signal-to-noise ratio is > 3. Not refinement!
Refinement should be done against all data. I merely propose that weak data
be excluded from R-factor calculations after the
refinement/scaling/merging/etc. is done.
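As a minimal sketch of what that could look like in practice (the array names
i_obs, sig_i and i_calc are purely illustrative here, not any program's API):

```python
# A minimal sketch of the idea above: refine against everything, but report
# the R factor only over reflections with I/sigma(I) > 3.
# Array names (i_obs, sig_i, i_calc) are illustrative assumptions.
import numpy as np

def r_factor_strong(i_obs, sig_i, i_calc, cutoff=3.0):
    """R = sum|I_obs - I_calc| / sum(I_obs), restricted to I/sigma(I) > cutoff."""
    strong = (i_obs / sig_i) > cutoff      # selection used only for reporting
    num = np.abs(i_obs[strong] - i_calc[strong]).sum()
    den = i_obs[strong].sum()
    return num / den
```

The full data set (weak and "zero" reflections included) would still feed the
refinement target itself; only this summary statistic is filtered.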
This is because R factors are a metric of the FRACTIONAL error in something
(aka a "% difference"), but a "% error" is only meaningful when the thing
being measured is not zero. However, in macromolecular crystallography, we
tend to measure a lot of "zeroes". There is nothing wrong with measuring
zero! An excellent example of this is confirming that a systematic absence
is in fact "absent". The "sigma" on the intensity assigned to an absent spot
is still a useful quantity, because it reflects how confident you are in the
measurement. That is, a sigma of 10 rather than 100 means you are more sure that the
intensity really is zero. However, there is no "R factor" for systematic absences.
How could there be! This is because the definition of "% error" starts to
break down as the "true" spot intensity gets weaker, and it becomes
completely meaningless when the "true" intensity reaches zero.
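Written out explicitly (the intensity-based form is shown for concreteness; the
amplitude-based R factor has the same problem):

```latex
R \;=\; \frac{\sum_{hkl} \left| I_{\mathrm{obs}}(hkl) - I_{\mathrm{calc}}(hkl) \right|}
             {\sum_{hkl} I_{\mathrm{obs}}(hkl)}
```

The per-reflection "% error" |I_obs - I_calc| / I_obs grows without bound as the
true intensity approaches zero, which is exactly why no R factor can be quoted
for a systematic absence.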
Historically, I believe the widespread use of R factors came about because
small-molecule data has essentially no weak spots. With the exception of
absences (which are not used in refinement), spots from "salt crystals" are
strong all the way out to the edge of the detector (even out to the "limiting
sphere", which is defined by the x-ray wavelength). So, when all the data
are strong, a "% error" is an easy-to-calculate quantity that actually
describes the "sigma"s of the data very well. That is, sigma(I) of strong
spots tends to be dominated by things like beam flicker, spindle stability,
shutter accuracy, etc. All these usually add up to ~5% error, and indeed
even the Braggs could typically get +/-5% for the intensity of the diffracted
rays they were measuring. Things like Rsym were therefore created to check
that nothing "funny" happened in the measurement.
For similar reasons, the quality of a model refined against all-strong data
is described very well by a "% error", and this is why the refinement R
factors rapidly became popular. Most people intuitively know what you mean
if you say that your model fits the data to "within 5%". In fact, a widely
used criterion for the correctness of a "small molecule" structure is that
the refinement R factor must be LOWER than Rsym. This is equivalent to
saying that your curve (model) fit your data "to within experimental error".
Unfortunately, this has never been the case for macromolecular structures!
The problem with protein crystals, of course, is that we have lots of "weak"
data. And by "weak", I don't mean "bad"! Yes, it is always nicer to have
more intense spots, but there is nothing shameful about knowing that certain
intensities are actually very close to zero. In fact, from the point of view
of the refinement program, isn't describing some high-angle spot as "zero,
plus or minus 10" better than "I have no idea"? Indeed, several works
mentioned already, as well as the "free lunch" algorithm, have demonstrated
that these "zero" data can actually be useful, even when they lie well beyond
the "resolution limit".
So, what do we do? I see no reason to abandon R factors, since they have
such a long history and give us continuity of criteria going back almost a
century. However, I also see no reason to punish ourselves by including lots
of zeroes in the denominator. In fact, including weak data in an R-factor
calculation defeats its best feature. R factors are a very good estimate
of the fractional component of the total error, provided they are calculated
with strong data only.
Of course, with strong and weak data, the best thing to do is compare the
model-data disagreement with the magnitude of the error. That is, compare
|Fobs-Fcalc| to sigma(Fobs), not Fobs itself. Modern refinement programs do
this! And I say the more data the merrier.
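A minimal sketch of that sigma-based comparison (again with illustrative names,
not any refinement program's actual target function):

```python
# Score each reflection by |Fobs - Fcalc| in units of its own sigma, instead of
# as a fraction of Fobs. Names (f_obs, sig_f, f_calc) are illustrative only.
import numpy as np

def sigma_weighted_misfit(f_obs, sig_f, f_calc):
    """Mean |Fobs - Fcalc| / sigma(Fobs): values near 1 mean the model fits the
    data to within experimental error, and a near-zero Fobs with a finite sigma
    still contributes sensibly instead of blowing up a fractional error."""
    z = np.abs(f_obs - f_calc) / sig_f
    return float(z.mean())
```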
-James Holton
MAD Scientist
On 3/4/2011 5:15 AM, Marjolein Thunnissen wrote:
hi
Recently on a paper I submitted, it was the editor of the journal who
wanted exactly the same thing. I never argued with the editor about this
(maybe I should have), but it could be one cause of the epidemic that Bart
Hazes saw....
best regards
Marjolein
On Mar 3, 2011, at 12:29 PM, Roberto Battistutta wrote:
Dear all,
I got a reviewer comment that indicates the "need to refine the structures
at an appropriate resolution (I/sigmaI of > 3.0), and re-submit the revised
coordinate files to the PDB for validation". In the manuscript I present
some crystal structures determined by molecular replacement using the same
protein in a different space group as search model. Does anyone know the
origin or the theoretical basis of this "I/sigmaI>3.0" rule for an
appropriate resolution?
Thanks,
Bye,
Roberto.
Roberto Battistutta
Associate Professor
Department of Chemistry
University of Padua
via Marzolo 1, 35131 Padova - ITALY
tel. +39.049.8275265/67
fax. +39.049.8275239
roberto.battistu...@unipd.it
www.chimica.unipd.it/roberto.battistutta/
VIMM (Venetian Institute of Molecular Medicine)
via Orus 2, 35129 Padova - ITALY
tel. +39.049.7923236
fax +39.049.7923250
www.vimm.it