Could you please expand on your statement that "small-molecule data has essentially no weak
spots"? The small-molecule data sets I've worked with have had large numbers of "unobserved"
reflections where I used 2 sigma(I) cutoffs (maybe 15-30% of the reflections). Would you consider those
"weak" spots or not? Ron
On Sun, 6 Mar 2011, James Holton wrote:
I should probably admit that I might be indirectly responsible for the
resurgence of this I/sigma > 3 idea, but I never intended this in the way
described by the original poster's reviewer!
What I have been trying to encourage people to do is calculate R factors
using only hkls for which the signal-to-noise ratio is > 3. Not refinement!
Refinement should be done against all data. I merely propose that weak data
be excluded from R-factor calculations after the
refinement/scaling/merging/etc. is done.
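As a minimal sketch of what that could look like in practice (the array names
i_obs, sig_i and i_calc are purely illustrative here, not any program's API):

```python
# A minimal sketch of the idea above: refine against everything, but report
# the R factor only over reflections with I/sigma(I) > 3.
# Array names (i_obs, sig_i, i_calc) are illustrative assumptions.
import numpy as np

def r_factor_strong(i_obs, sig_i, i_calc, cutoff=3.0):
    """R = sum|I_obs - I_calc| / sum(I_obs), restricted to I/sigma(I) > cutoff."""
    strong = (i_obs / sig_i) > cutoff      # selection used only for reporting
    num = np.abs(i_obs[strong] - i_calc[strong]).sum()
    den = i_obs[strong].sum()
    return num / den
```

The full data set (weak and "zero" reflections included) would still feed the
refinement target itself; only this summary statistic is filtered.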
This is because R factors are a metric of the FRACTIONAL error in something
(aka a "% difference"), but a "% error" is only meaningful when the thing
being measured is not zero. However, in macromolecular crystallography, we
tend to measure a lot of "zeroes". There is nothing wrong with measuring
zero! An excellent example of this is confirming that a systematic absence
is in fact "absent". The "sigma" on the intensity assigned to an absent spot
is still a useful quantity, because it reflects how confident you are in the
measurement. That is, a sigma of 10 rather than 100 means you are more sure that the
intensity really is zero. However, there is no "R factor" for systematic absences.
How could there be! This is because the definition of "% error" starts to
break down as the "true" spot intensity gets weaker, and it becomes
completely meaningless when the "true" intensity reaches zero.
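Written out explicitly (the intensity-based form is shown for concreteness; the
amplitude-based R factor has the same problem):

```latex
R \;=\; \frac{\sum_{hkl} \left| I_{\mathrm{obs}}(hkl) - I_{\mathrm{calc}}(hkl) \right|}
             {\sum_{hkl} I_{\mathrm{obs}}(hkl)}
```

The per-reflection "% error" |I_obs - I_calc| / I_obs grows without bound as the
true intensity approaches zero, which is exactly why no R factor can be quoted
for a systematic absence.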
Historically, I believe the widespread use of R factors came about because
small-molecule data has essentially no weak spots. With the exception of
absences (which are not used in refinement), spots from "salt crystals" are
strong all the way out to the edge of the detector (even out to the "limiting
sphere", which is defined by the x-ray wavelength). So, when all the data
are strong, a "% error" is an easy-to-calculate quantity that actually
describes the "sigma"s of the data very well. That is, sigma(I) of strong
spots tends to be dominated by things like beam flicker, spindle stability,
shutter accuracy, etc. All these usually add up to ~5% error, and indeed
even the Braggs could typically get +/-5% for the intensity of the diffracted
rays they were measuring. Things like Rsym were therefore created to check
that nothing "funny" happened in the measurement.
For similar reasons, the quality of a model refined against all-strong data
is described very well by a "% error", and this is why the refinement R
factors rapidly became popular. Most people intuitively know what you mean
if you say that your model fits the data to "within 5%". In fact, a widely
used criterion for the correctness of a "small molecule" structure is that
the refinement R factor must be LOWER than Rsym. This is equivalent to
saying that your curve (model) fit your data "to within experimental error".
Unfortunately, this has never been the case for macromolecular structures!
The problem with protein crystals, of course, is that we have lots of "weak"
data. And by "weak", I don't mean "bad"! Yes, it is always nicer to have
more intense spots, but there is nothing shameful about knowing that certain
intensities are actually very close to zero. In fact, from the point of view
of the refinement program, isn't describing some high-angle spot as "zero,
plus or minus 10" better than "I have no idea"? Indeed, several works
mentioned already, as well as the "free lunch" algorithm, have demonstrated
that these "zero" data can actually be useful, even when they lie well beyond
the "resolution limit".
So, what do we do? I see no reason to abandon R factors, since they have
such a long history and give us continuity of criteria going back almost a
century. However, I also see no reason to punish ourselves by including lots
of zeroes in the denominator. In fact, including weak data in an R-factor
calculation defeats its best feature. R factors are a very good estimate
of the fractional component of the total error, provided they are calculated
with strong data only.
Of course, with strong and weak data, the best thing to do is compare the
model-data disagreement with the magnitude of the error. That is, compare
|Fobs-Fcalc| to sigma(Fobs), not Fobs itself. Modern refinement programs do
this! And I say the more data the merrier.
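A minimal sketch of that sigma-based comparison (again with illustrative names,
not any refinement program's actual target function):

```python
# Score each reflection by |Fobs - Fcalc| in units of its own sigma, instead of
# as a fraction of Fobs. Names (f_obs, sig_f, f_calc) are illustrative only.
import numpy as np

def sigma_weighted_misfit(f_obs, sig_f, f_calc):
    """Mean |Fobs - Fcalc| / sigma(Fobs): values near 1 mean the model fits the
    data to within experimental error, and a near-zero Fobs with a finite sigma
    still contributes sensibly instead of blowing up a fractional error."""
    z = np.abs(f_obs - f_calc) / sig_f
    return float(z.mean())
```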
-James Holton
MAD Scientist
On 3/4/2011 5:15 AM, Marjolein Thunnissen wrote:
hi
Recently on a paper I submitted, it was the editor of the journal who
wanted exactly the same thing. I never argued with the editor about this
(maybe I should have), but it could be one cause of the epidemic that Bart
Hazes saw....
best regards
Marjolein
On Mar 3, 2011, at 12:29 PM, Roberto Battistutta wrote:
Dear all,
I got a reviewer comment that indicates the "need to refine the structures
at an appropriate resolution (I/sigmaI of > 3.0), and re-submit the revised
coordinate files to the PDB for validation". In the manuscript I present
some crystal structures determined by molecular replacement using the same
protein in a different space group as search model. Does anyone know the
origin or the theoretical basis of this "I/sigmaI>3.0" rule for an
appropriate resolution?
Thanks,
Bye,
Roberto.
Roberto Battistutta
Associate Professor
Department of Chemistry
University of Padua
via Marzolo 1, 35131 Padova - ITALY
tel. +39.049.8275265/67
fax. +39.049.8275239
roberto.battistu...@unipd.it
www.chimica.unipd.it/roberto.battistutta/
VIMM (Venetian Institute of Molecular Medicine)
via Orus 2, 35129 Padova - ITALY
tel. +39.049.7923236
fax +39.049.7923250
www.vimm.it