Hi James,

May I offer a short counter-argument to your case for excluding weak reflections from the merging residuals?

Unlike many people I rather like Rmerge, not because it tells you how good the data are, but because it gives you a clue as to how well the unmerged measurements agree with one another. It has already been mentioned on this thread that Rmerge ~ 0.8 / <I/sigma>, which means that the inverse is also true: an Rmerge of 0.8 indicates that the average measurement in the shell has an I/sigma of ~1 (presuming there are sufficient multiple measurements; if the multiplicity is < 3 or so this can be nonsense). This does not depend on the error model or the multiplicity. It just talks about the average.
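The 0.8 is just the Gaussian mean absolute deviation, sqrt(2/pi) ~ 0.8, in disguise. If you want to convince yourself, here is a quick numpy sketch; the helper and all the numbers (intensities, sigma, multiplicity) are made up purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    def shell_stats(true_I, sigma, mult):
        # Simulate `mult` unmerged observations of every reflection with
        # Gaussian error `sigma`, then return the shell's Rmerge and the
        # mean I/sigma of the individual measurements.
        obs = true_I[:, None] + rng.normal(0.0, sigma, (len(true_I), mult))
        mean_I = obs.mean(axis=1, keepdims=True)
        rmerge = np.abs(obs - mean_I).sum() / obs.sum()
        return rmerge, (obs / sigma).mean()

    # A weak outer shell: true intensities comparable to sigma, so I/sigma ~ 1.
    true_I = rng.uniform(0.5, 1.5, 10000)
    rmerge, i_over_sig = shell_stats(true_I, sigma=1.0, mult=10)
    print(rmerge, 0.8 / i_over_sig)  # the two numbers come out close

With multiplicity much below 3 the per-reflection mean soaks up a good chunk of each deviation and the relation drifts, hence the caveat above.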
Now, if we exclude all measurements with an I/sigma of less than three, we have no idea how strong the reflections in the shell are on average. We're just top-slicing the good reflections and asking how well they agree. Well, with an I/sigma > 3 I would hope they agree rather well if your error model is reasonable. It would suddenly become rare to see an Rmerge > 0.3 in the outer shell.

I like Rpim. It tells you how good the average measurement should be, provided you have not too much radiation damage. However, without Rmerge I can't get a real handle on how well the measurements agree.

Personally, what I would like to see is the full contents of the Scala log file available as graphs, along with Rd from xdsstat and some other choice statistics, so you can get a relatively complete picture. However, I appreciate that this is unrealistic :o)

Just my 2c.

Cheerio,

Graeme
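P.S. Since Rpim came up: it uses the same deviations as Rmerge but weights each unique reflection by sqrt(1/(n-1)), so it tracks the expected precision of the merged average rather than the spread of the individual measurements. A small sketch of both side by side; the grouping of unmerged data by unique hkl is assumed to have been done already, and the helper name is my own:

    import numpy as np

    def rmerge_rpim(groups):
        # groups: one 1-D array per unique hkl, holding that reflection's
        # unmerged intensity measurements.
        num_merge = num_pim = denom = 0.0
        for I in groups:
            n = len(I)
            if n < 2:
                continue  # singletons contribute to neither residual
            dev = np.abs(I - I.mean()).sum()
            num_merge += dev
            num_pim += np.sqrt(1.0 / (n - 1)) * dev  # multiplicity weighting
            denom += I.sum()
        return num_merge / denom, num_pim / denom

    # e.g. three measurements of one reflection and four of another:
    print(rmerge_rpim([np.array([1.1, 0.9, 1.0]), np.array([5.2, 4.8, 5.1, 4.9])]))

At multiplicity 4, say, Rpim is Rmerge/sqrt(3), which is why Rpim keeps improving as you add passes while Rmerge stays put or slightly worsens.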
On 8 March 2011 20:07, James Holton <jmhol...@lbl.gov> wrote:

> Although George does not mention anything about data reduction programs, I take from his description that common small-molecule data processing packages (SAINT, others?) have also been modernized to record all data (no I/sigmaI > 2 or 3 cutoff). I agree with him that this is a good thing! And it is also a good thing that small-molecule refinement programs use all data. I just don't think it is a good idea to use all data in R factor calculations.
>
> Like Ron, I will probably be dating myself when I say that when I first got into the macromolecular crystallography business, it was still commonplace to use a 2-3 sigma spot intensity cutoff. In fact, this is the reason why the PDB wants to know your "completeness" in the outermost resolution shell (in those days, the outer resolution was defined by where completeness drops to ~80% after the 3 sigma spot cutoff). My experience with this, however, was brief, as the maximum-likelihood revolution was just starting to take hold, and the denzo manual specifically stated that only bad people use sigma cutoffs > -3.0. Nevertheless, like many crystallographers from this era, I have fond memories of the REALLY low R factors you can get by using this arcane and now reviled practice. Rsym values of 1-2% were common.
>
> It was only recently that I learned enough about statistics to understand the wisdom of my ancestors and that a 3-sigma cutoff is actually the "right thing to do" if you want to measure a fractional error (like an R factor). That is all I'm saying.
>
> -James Holton
> MAD Scientist
>
> On 3/6/2011 2:50 PM, Ronald E Stenkamp wrote:
>
>> My small molecule experience is old enough (maybe 20 years) that I doubt if it's even close to representing current practices (best or otherwise). Given George's comments, I suspect (and hope) that less-than cutoffs are historical artifacts at this point, kept around in software for making comparisons with older structure determinations. But a bit of scanning of Acta papers and others might be necessary to confirm that.
>>
>> Ron
>>
>> On Sun, 6 Mar 2011, James Holton wrote:
>>
>>> Yes, I would classify anything with I/sigmaI < 3 as "weak". And yes, of course it is possible to get "weak" spots from small molecule crystals. After all, there is no spot so "strong" that it cannot be defeated by a sufficient amount of background! I just meant that, relatively speaking, the intensities diffracted from a small molecule crystal are orders of magnitude brighter than those from a macromolecular crystal of the same size, and even the same quality (the 1/Vcell^2 term in Darwin's formula).
>>>
>>> I find it interesting that you point out the use of a 2 sigma(I) intensity cutoff for small molecule data sets! Is this still common practice? I am not a card-carrying "small molecule crystallographer", so I'm not sure. However, if that is the case, then by definition there are no "weak" intensities in the data set. And this is exactly the kind of data you want for least-squares refinement targets and for computing "% error" quality metrics like R factors. For likelihood targets, however, the "weak" data are actually a powerful restraint.
>>>
>>> -James Holton
>>> MAD Scientist
>>>
>>> On 3/6/2011 11:22 AM, Ronald E Stenkamp wrote:
>>>
>>>> Could you please expand on your statement that "small-molecule data has essentially no weak spots"? The small molecule data sets I've worked with have had large numbers of "unobserved" reflections where I used 2 sigma(I) cutoffs (maybe 15-30% of the reflections). Would you consider those "weak" spots or not?
>>>>
>>>> Ron
>>>>
>>>> On Sun, 6 Mar 2011, James Holton wrote:
>>>>
>>>>> I should probably admit that I might be indirectly responsible for the resurgence of this I/sigma > 3 idea, but I never intended it in the way described by the original poster's reviewer!
>>>>>
>>>>> What I have been trying to encourage people to do is calculate R factors using only hkls for which the signal-to-noise ratio is > 3. Not refinement! Refinement should be done against all data. I merely propose that weak data be excluded from R-factor calculations after the refinement/scaling/merging/etc. is done.
>>>>>
>>>>> This is because R factors are a metric of the FRACTIONAL error in something (aka a "% difference"), but a "% error" is only meaningful when the thing being measured is not zero. However, in macromolecular crystallography, we tend to measure a lot of "zeroes". There is nothing wrong with measuring zero! An excellent example of this is confirming that a systematic absence is in fact "absent". The "sigma" on the intensity assigned to an absent spot is still a useful quantity, because it reflects how confident you are in the measurement. I.e. a sigma of "10" vs "100" means you are more sure that the intensity is zero. However, there is no "R factor" for systematic absences. How could there be! The definition of "% error" starts to break down as the "true" spot intensity gets weaker, and it becomes completely meaningless when the "true" intensity reaches zero.
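To put a number on how much the zeroes hurt: a quick sketch with simulated data (a perfect model, exponential true intensities and a constant sigma, all made up for illustration), comparing the same R factor over all reflections and over only those with I/sigma > 3:

    import numpy as np

    rng = np.random.default_rng(1)

    # "True" intensities from strong down to essentially zero, measured
    # with a constant error sigma = 10 (arbitrary units).
    sigma = 10.0
    I_true = rng.exponential(scale=50.0, size=100000)
    I_obs = I_true + rng.normal(0.0, sigma, I_true.size)

    def r_factor(obs, calc):
        # The usual fractional residual, sum |obs - calc| / sum obs.
        return np.abs(obs - calc).sum() / obs.sum()

    strong = I_obs / sigma > 3.0
    print("R, all data:    ", r_factor(I_obs, I_true))
    print("R, I/sigma > 3: ", r_factor(I_obs[strong], I_true[strong]))

Nothing is wrong with the "model" here; the all-data R is higher simply because weak terms put noise in the numerator and next to nothing in the denominator.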
>>>>> Historically, I believe the widespread use of R factors came about because small-molecule data has essentially no weak spots. With the exception of absences (which are not used in refinement), spots from "salt crystals" are strong all the way out to the edge of the detector (even out to the "limiting sphere", which is defined by the x-ray wavelength). So, when all the data are strong, a "% error" is an easy-to-calculate quantity that actually describes the "sigma"s of the data very well. That is, sigma(I) of strong spots tends to be dominated by things like beam flicker, spindle stability, shutter accuracy, etc. All these usually add up to ~5% error, and indeed even the Braggs could typically get +/-5% for the intensity of the diffracted rays they were measuring. Things like Rsym were therefore created to check that nothing "funny" happened in the measurement.
>>>>>
>>>>> For similar reasons, the quality of a model refined against all-strong data is described very well by a "% error", and this is why the refinement R factors rapidly became popular. Most people intuitively know what you mean if you say that your model fits the data to "within 5%". In fact, a widely used criterion for the correctness of a "small molecule" structure is that the refinement R factor must be LOWER than Rsym. This is equivalent to saying that your curve (model) fits your data "to within experimental error". Unfortunately, this has never been the case for macromolecular structures!
>>>>>
>>>>> The problem with protein crystals, of course, is that we have lots of "weak" data. And by "weak", I don't mean "bad"! Yes, it is always nicer to have more intense spots, but there is nothing shameful about knowing that certain intensities are actually very close to zero. In fact, from the point of view of the refinement program, isn't describing some high-angle spot as "zero, plus or minus 10" better than "I have no idea"? Indeed, several works mentioned already, as well as the "free lunch algorithm", have demonstrated that these "zero" data can actually be useful, even if they are well beyond the "resolution limit".
>>>>>
>>>>> So, what do we do? I see no reason to abandon R factors, since they have such a long history and give us continuity of criteria going back almost a century. However, I also see no reason to punish ourselves by including lots of zeroes in the denominator. In fact, using weak data in an R factor calculation defeats their best feature: R factors are a very good estimate of the fractional component of the total error, provided they are calculated with strong data only.
>>>>>
>>>>> Of course, with strong and weak data, the best thing to do is compare the model-data disagreement with the magnitude of the error. That is, compare |Fobs-Fcalc| to sigma(Fobs), not Fobs itself. Modern refinement programs do this! And I say the more data the merrier.
>>>>>
>>>>> -James Holton
>>>>> MAD Scientist
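The sigma-weighted comparison James ends with can be sketched the same way; here I use the reduced chi-squared as one common form of it (again with made-up numbers, reusing the simulated-data idea from above):

    import numpy as np

    rng = np.random.default_rng(2)

    sigma = 10.0
    F_true = rng.exponential(scale=50.0, size=100000)
    F_obs = F_true + rng.normal(0.0, sigma, F_true.size)

    def r_factor(obs, calc):
        return np.abs(obs - calc).sum() / obs.sum()

    def reduced_chi2(obs, calc, sig):
        # Mean of ((obs - calc)/sigma)^2; close to 1 when the model agrees
        # with the data to within the stated measurement errors.
        return (((obs - calc) / sig) ** 2).mean()

    weak = F_obs / sigma <= 3.0
    print("R factor, weak data only:", r_factor(F_obs[weak], F_true[weak]))  # large
    print("chi^2, weak data only:   ", reduced_chi2(F_obs[weak], F_true[weak], sigma))
    print("chi^2, all data:         ", reduced_chi2(F_obs, F_true, sigma))   # both ~1

The weighted residual sits near 1 for correct errors whether or not the weak data are included, which is exactly why refinement can keep all the data while the R factor cannot.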
>>>>> On 3/4/2011 5:15 AM, Marjolein Thunnissen wrote:
>>>>>
>>>>>> hi
>>>>>>
>>>>>> Recently, on a paper I submitted, it was the editor of the journal who wanted exactly the same thing. I never argued with the editor about this (should have, maybe), but it could be one cause of the epidemic that Bart Hazes saw....
>>>>>>
>>>>>> best regards
>>>>>>
>>>>>> Marjolein
>>>>>>
>>>>>> On Mar 3, 2011, at 12:29 PM, Roberto Battistutta wrote:
>>>>>>
>>>>>>> Dear all,
>>>>>>> I got a reviewer comment that indicates the "need to refine the structures at an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised coordinate files to the PDB for validation." In the manuscript I present some crystal structures determined by molecular replacement, using the same protein in a different space group as the search model. Does anyone know the origin or the theoretical basis of this "I/sigmaI > 3.0" rule for an appropriate resolution?
>>>>>>> Thanks,
>>>>>>> Bye,
>>>>>>> Roberto.
>>>>>>>
>>>>>>> Roberto Battistutta
>>>>>>> Associate Professor
>>>>>>> Department of Chemistry
>>>>>>> University of Padua
>>>>>>> via Marzolo 1, 35131 Padova - ITALY
>>>>>>> tel. +39.049.8275265/67
>>>>>>> fax +39.049.8275239
>>>>>>> roberto.battistu...@unipd.it
>>>>>>> www.chimica.unipd.it/roberto.battistutta/
>>>>>>> VIMM (Venetian Institute of Molecular Medicine)
>>>>>>> via Orus 2, 35129 Padova - ITALY
>>>>>>> tel. +39.049.7923236
>>>>>>> fax +39.049.7923250
>>>>>>> www.vimm.it