Phil,

I was not aware of the issues discussed earlier in this thread, so I could not follow your explanation. Thanks to Dale Tronrud's email, which broke it down for me, I now understand what was going on. I am totally with you on this matter. Actually, I was preaching the same things, just calling them by different names: I am not a methods developer, and my language is "fool" of working-class jargon...

Alex

On Jun 2, 2012, at 11:00 PM, aaleshin wrote:

> Could you please give me a reference to the "K & D paper"? Without reading
> it, I do not see a problem with Rmerge going to infinity in high-resolution
> shells. Indeed, I was taught at school that the crystallographic resolution
> is defined as the minimal distance between two peaks that can be distinguished
> in the electron density map. I was also taught that under "normal conditions"
> this would occur when the data are collected up to the shell in which Rmerge
> = 0.5. One can collect more data (up to Rmerge=1.0 or even 100) but the
> resolution of the electron density map will not change significantly.
>
> I solved several structures of my own, and this simple rule worked every
> time. It failed only when the diffraction was very anisotropic, because the
> resolution was not uniform in all directions. But this obstacle can be
> easily overcome by presenting the resolution as a tensor with eigenvalues
> defined by the same simple rule (Rmerge = 0.5).
>
> Now, why should such a simple method for estimating the data resolution be
> abandoned? Is <I/sigI> a much better criterion than Rmerge? Let's look at
> the definitions:
>
> I is measured as the number of detector counts in the reflection minus
> background counts.
> sigI is measured as sq. root of I plus standard deviation (SD) for the
> background plus various deviations from ideal experiment (like noise from
> satellite crystals).
> Obviously, sigI cannot be measured accurately. Moreover, the 'resolution' is
> related to errors in the structure factors, which are averaged from several
> measurements. Errors in their scaling would affect the 'resolution', and
> <I/sigI> does not detect them, but Rmerge does!
>
> Rmerge = < (I - <I>) / n*<I> > where n is the number of measurements for the
> same structure factor (data redundancy). When n -> infinity,
> Rmerge = < sigI/I >.
> From my experience, redundancy = 3-4 gives a very good
> agreement between Rmerge and <sigI/I>. If <sigI/I> is significantly lower
> than Rmerge, it means that the symmetry-related reflections did not merge
> well. Under those conditions, Rmerge becomes a much better criterion for
> estimation of the 'resolution' than <sigI/I>.
>
> I AGREE THAT Rmerge=0.5 SHOULD NOT BE A CRITERION FOR DATA TRUNCATION. But we
> need a commonly accepted criterion to estimate the resolution, and Rmerge=0.5
> has stood the test of time. If someone decides to use <I/sigI> instead of
> Rmerge, fine, let it be 2.0. I do not know how it translates into CC...
> Alternatively, the resolution could be estimated from the electron density
> maps. But we need a commonly accepted rule for how to do it, and it should be
> related to the old Rmerge=0.5 rule.
>
> I hope everyone agrees that the resolution should not be dead..
>
> Alex
>
> On Jun 1, 2012, at 11:19 AM, Phil Evans wrote:
>
>> As the K & D paper points out, as the signal/noise declines at higher
>> resolution, Rmerge goes up to infinity, so there is no sensible way to set a
>> limiting value to determine "resolution".
>>
>> That is not to say that Rmerge has no use: as you say it's a reasonably good
>> metric to plot against image number to detect a problem. It's just not a
>> suitable metric for deciding resolution.
>>
>> I/sigI is pretty good for this, even though the sigma estimates are not very
>> reliable. CC1/2 is probably better since it is independent of sigmas and has
>> defined values from 1.0 down to 0.0 as signal/noise decreases. But we should
>> be careful of any dogma which says what data we should discard, and what the
>> cutoff limits should be: I/sigI > 3, 2, or 1? CC1/2 > 0.2, 0.3, 0.5 ...?
>> Usually it does not make a huge difference, but why discard useful data?
>> Provided the data are properly weighted in refinement by weights
>> incorporating observed sigmas (true in Refmac, not true in phenix.refine at
>> present I believe), adding extra weak data should do no harm, at least out
>> to some point. Program algorithms are improving in their treatment of weak
>> data, but are by no means perfect.
>>
>> One problem, as discussed earlier in this thread, is that we have got used to
>> the idea that nominal resolution is a single number indicating the quality
>> of a structure, but this has never been true, irrespective of the cutoff
>> method. Apart from the considerable problem of anisotropy, we all need to
>> note the wisdom of Ethan Merritt:
>>
>>> "We should also encourage people not to confuse the quality of
>>> the data with the quality of the model."
>>
>> Phil
>>
>> On 1 Jun 2012, at 18:59, aaleshin wrote:
>>
>>> Please excuse my ignorance, but I cannot understand why Rmerge is
>>> unreliable for estimation of the resolution.
>>> I mean, from a theoretical point of view, <I/sigma> is indeed a better
>>> criterion, but it is not obvious from a practical point of view.
>>>
>>> <I/sigma> depends on the method of sigma estimation, and so the same data
>>> processed by different programs may have different <I/sigma>. Moreover,
>>> HKL2000 allows users to adjust sigmas manually. Rmerge estimates sigmas
>>> from differences between measurements of the same structure factor, and
>>> hence is independent of our preferences. But it also has a very important
>>> ability to validate consistency of the merged data. If my crystal changed
>>> during the data collection, or something went wrong with the
>>> diffractometer, Rmerge will show it immediately, but <I/sigma> will not.
>>>
>>> So, please explain why we should stop using Rmerge as a criterion of data
>>> resolution?
>>>
>>> Alex
>>> Sanford-Burnham Medical Research Institute
>>> 10901 North Torrey Pines Road
>>> La Jolla, California 92037
>>>
>>> On Jun 1, 2012, at 5:07 AM, Ian Tickle wrote:
>>>
>>>> On 1 June 2012 03:22, Edward A. Berry <ber...@upstate.edu> wrote:
>>>>> Leo will probably answer better than I can, but I would say I/SigI
>>>>> counts only the present reflection, so eliminating noise by anisotropic
>>>>> truncation should improve it, raising the average I/SigI in the last
>>>>> shell.
>>>>
>>>> We always include unmeasured reflections with I/sigma(I) = 0 in the
>>>> calculation of the mean I/sigma(I) (i.e. we divide the sum of
>>>> I/sigma(I) for measureds by the predicted total no of reflections incl
>>>> unmeasureds), since for unmeasureds I is (almost) completely unknown
>>>> and therefore sigma(I) is effectively infinite (or at least finite but
>>>> large since you do have some idea of what range I must fall in). A
>>>> shell with <I/sigma(I)> = 2 and 50% completeness clearly doesn't carry
>>>> the same information content as one with the same <I/sigma(I)> and
>>>> 100% complete; therefore IMO it's very misleading to quote
>>>> <I/sigma(I)> including only the measured reflections. This also means
>>>> we can use a single cut-off criterion (we use mean I/sigma(I) > 1),
>>>> and we don't need another arbitrary cut-off criterion for
>>>> completeness. As many others seem to be doing now, we don't use
>>>> Rmerge, Rpim etc as criteria to estimate resolution, they're just too
>>>> unreliable - Rmerge is indeed dead and buried!
>>>>
>>>> Actually a mean value of I/sigma(I) of 2 is highly statistically
>>>> significant, i.e. very unlikely to have arisen by chance variations,
>>>> and the significance threshold for the mean must be much closer to 1
>>>> than to 2.
>>>> Taking an average always increases the statistical
>>>> significance, therefore it's not valid to compare an _average_ value
>>>> of I/sigma(I) = 2 with a _single_ value of I/sigma(I) = 3 (taking 3
>>>> sigma as the threshold of statistical significance of an individual
>>>> measurement): that's a case of "comparing apples with pears". In
>>>> other words in the outer shell you would need a lot of highly
>>>> significant individual values >> 3 to attain an overall average of 2
>>>> since the majority of individual values will be < 1.
>>>>
>>>>> F/sigF is expected to be better than I/sigI because d(x^2) = 2x dx,
>>>>> d(x^2)/x^2 = 2 dx/x, dI/I = 2 dF/F (or approaches that in the limit . . .)
>>>>
>>>> That depends on what you mean by 'better': every metric must be
>>>> compared with a criterion appropriate to that metric. So if we are
>>>> comparing I/sigma(I) with a criterion value = 3, then we must compare
>>>> F/sigma(F) with criterion value = 6 ('in the limit' of zero I), in
>>>> which case the comparison is no 'better' (in terms of information
>>>> content) with I than with F: they are entirely equivalent. It's
>>>> meaningless to compare F/sigma(F) with the criterion value appropriate
>>>> to I/sigma(I): again that's "comparing apples and pears"!
>>>>
>>>> Cheers
>>>>
>>>> -- Ian
>
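
For anyone who wants to play with the numbers, here is a minimal Python sketch of two points from Ian's message quoted above: the mean I/sigma(I) of a shell with unmeasured reflections counted as zero, and the factor of 2 between F/sigma(F) and I/sigma(I). The function names and the representation of a shell as a list of (I, sigma(I)) pairs are hypothetical and not taken from any program mentioned in this thread. The second function assumes plain first-order error propagation for F = sqrt(I), which I take to be the relation Ed and Ian are referring to.

```python
import math


def mean_i_over_sigma(measured, n_predicted):
    """Mean I/sigma(I) for a shell, counting unmeasured reflections as 0.

    measured    -- list of (I, sigma_I) pairs for the measured reflections
    n_predicted -- predicted total number of reflections in the shell,
                   including the unmeasured ones (hypothetical input)
    """
    if n_predicted <= 0 or n_predicted < len(measured):
        raise ValueError("n_predicted must be >= number of measured reflections")
    # Sum over measured reflections only, but divide by the predicted total,
    # so every unmeasured reflection contributes I/sigma(I) = 0.
    return sum(i / sig for i, sig in measured) / n_predicted


def f_over_sigma_from_intensity(i, sig_i):
    """F/sigma(F) from I and sigma(I), assuming first-order propagation.

    With F = sqrt(I), dF = dI / (2 F), so sigma(F) ~ sigma(I) / (2 F)
    and F/sigma(F) ~ 2 * I/sigma(I). Only meaningful for I > 0.
    """
    f = math.sqrt(i)
    sig_f = sig_i / (2.0 * f)
    return f / sig_f


if __name__ == "__main__":
    # Four measured reflections, each with I/sigma(I) = 2.
    shell = [(20.0, 10.0)] * 4
    print(mean_i_over_sigma(shell, 4))  # shell 100% complete
    print(mean_i_over_sigma(shell, 8))  # same data, shell only 50% complete
    print(f_over_sigma_from_intensity(100.0, 10.0))
```

With this convention a shell that is half complete reports half the mean I/sigma(I) of a fully measured shell with the same per-reflection values, which is why a single cut-off can stand in for a separate completeness criterion.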