Re: [ccp4bb] @Phil-2:Death of Rmerge

aaleshin Sun, 03 Jun 2012 13:45:49 -0700

Phil,
I was not aware of the issues discussed in this thread and so could not 
understand your explanation.
Thanks to Dale Tronrud's email, which chewed it up for me, I now understand 
what was going on. I am totally with you on this matter. Actually, I was 
preaching the same things, just calling them by different names: I am not a 
methods developer and my language is "fool" with working-class jargon...
 
Alex


On Jun 2, 2012, at 11:00 PM, aaleshin wrote:

> Could you please give me a reference to the "K & D paper"? Without reading 
> it, I do not see a problem with Rmerge going to infinity in high resolution 
> shells. Indeed, I was taught at school that the crystallographic resolution 
> is defined as the minimal distance between two peaks that can be distinguished 
> in the electron density map. I was also taught that under "normal conditions" 
> this would occur when the data are collected up to the shell in which Rmerge 
> = 0.5. One can collect more data (up to Rmerge = 1.0 or even 100) but the 
> resolution of the electron density map will not change significantly. 
> 
> I solved several structures of my own, and this simple rule worked every 
> time. It failed only when the diffraction was very anisotropic, because the 
> resolution was not uniform in all directions. But this obstacle can be 
> easily overcome by presenting the resolution as a tensor with eigenvalues 
> defined by the same simple rule (Rmerge = 0.5).
> 
> Now, why should such a simple method for estimating the data resolution be 
> abandoned? Is <I/sigI> a much better criterion than Rmerge? Let's look at 
> the definitions:
> 
> I is measured as the number of detector counts in the reflection minus 
> the background counts. 
> sigI is estimated as the square root of I plus the standard deviation (SD) of 
> the background plus various deviations from the ideal experiment (like noise 
> from satellite crystals). 
> Obviously, sigI cannot be measured accurately. Moreover, the 'resolution' is 
> related to errors in the structure factors, which are averaged over several 
> measurements. Errors in their scaling would affect the 'resolution', and 
> <I/sigI> does not detect them, but Rmerge does!
> 
> Rmerge = < sum |I - <I>| / (n*<I>) >, where the sum runs over the n 
> measurements of the same structure factor (n is the data redundancy). When 
> n -> infinity, Rmerge = <sigI/I>. From my experience, redundancy = 3-4 gives 
> a very good agreement between Rmerge and <sigI/I>. If <sigI/I> is 
> significantly lower than Rmerge, it means that the symmetry-related 
> reflections did not merge well. Under those conditions, Rmerge becomes a much 
> better criterion for estimation of the 'resolution' than <sigI/I>.
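A minimal sketch of the two statistics being compared, for a single resolution shell. It assumes the conventional pooled definition, Rmerge = sum |I_i - <I>| / sum I_i, rather than the shorthand written in the message; the function names, the data layout and the toy numbers are made up for illustration.

```python
from statistics import mean


def rmerge(groups):
    """Conventional Rmerge: sum |I_i - <I>| / sum I_i over all reflections.

    `groups` is a list of lists; each inner list holds the repeated
    intensity measurements of one unique reflection (assumed layout).
    """
    num = 0.0
    den = 0.0
    for obs in groups:
        if len(obs) < 2:
            continue  # a single measurement says nothing about agreement
        avg = mean(obs)
        num += sum(abs(i - avg) for i in obs)
        den += sum(obs)
    return num / den


def mean_sig_over_i(merged):
    """<sigI/I> from (I, sigI) pairs of merged reflections."""
    return mean(sig / i for i, sig in merged)


# Toy shell: two unique reflections, each measured four times.
shell = [[90.0, 110.0, 100.0, 100.0], [45.0, 55.0, 50.0, 50.0]]
print(rmerge(shell))  # 30 / 600 = 0.05
```

For Gaussian errors the mean absolute deviation is roughly 0.8 sigma, so under this definition the two numbers are expected to be comparable rather than identical.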
> 
> I AGREE THAT Rmerge=0.5 SHOULD NOT BE A CRITERION FOR DATA TRUNCATION. But we 
> need a commonly accepted criterion to estimate the resolution, and Rmerge=0.5 
> has stood the test of time. If someone decides to use <I/sigI> instead of 
> Rmerge, fine, let it be 2.0. I do not know how it translates into CC... 
> Alternatively, the resolution could be estimated from the electron density 
> maps. But we need a commonly accepted rule for how to do it, and it should be 
> related to the old Rmerge=0.5 rule. 
> 
> I hope everyone agrees that the resolution should not be dead..
> 
> Alex
> 
> 
> 
> On Jun 1, 2012, at 11:19 AM, Phil Evans wrote:
> 
>> As the K & D paper points out, as the signal/noise declines at higher 
>> resolution, Rmerge goes up to infinity, so there is no sensible way to set a 
>> limiting value to determine "resolution".
>> 
>> That is not to say that Rmerge has no use: as you say, it's a reasonably good 
>> metric to plot against image number to detect a problem. It is just not a 
>> suitable metric for deciding resolution.
>> 
>> I/sigI is pretty good for this, even though the sigma estimates are not very 
>> reliable. CC1/2 is probably better since it is independent of sigmas and has 
>> defined values from 1.0 down to 0.0 as signal/noise decreases. But we should 
>> be careful of any dogma which says what data we should discard, and what the 
>> cutoff limits should be: I/sigI > 3, 2, or 1? CC1/2 > 0.2, 0.3, 0.5 ...? 
>> Usually it does not make a huge difference, but why discard useful data? 
>> Provided the data are properly weighted in refinement by weights 
>> incorporating observed sigmas (true in  Refmac, not true in phenix.refine at 
>> present I believe), adding extra weak data should do no harm, at least out 
>> to some point. Program algorithms are improving in their treatment of weak 
>> data, but are by no means perfect.
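A rough simulation of the behaviour described above: as signal/noise drops, Rmerge grows without a natural ceiling while CC1/2 slides from about 1 towards 0. This is only a sketch under stated assumptions (exponentially distributed true intensities, independent Gaussian noise of fixed width, a fixed split of the observations into two halves instead of a random one); `simulate_shell` and its parameters are hypothetical names, not part of any processing program.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_shell(mean_intensity, n_refl=2000, n_obs=4, sigma=1.0, seed=0):
    """Return (Rmerge, CC1/2) for one simulated resolution shell.

    Assumptions: true intensities are exponentially distributed with the
    given mean; every observation carries Gaussian noise of width `sigma`.
    """
    rng = random.Random(seed)
    num = den = 0.0
    half_a, half_b = [], []
    for _ in range(n_refl):
        true_i = rng.expovariate(1.0 / mean_intensity)
        obs = [rng.gauss(true_i, sigma) for _ in range(n_obs)]
        avg = sum(obs) / n_obs
        num += sum(abs(i - avg) for i in obs)
        den += sum(obs)
        # Merge each half of the observations separately for CC1/2.
        half = n_obs // 2
        half_a.append(sum(obs[:half]) / half)
        half_b.append(sum(obs[half:]) / (n_obs - half))
    return num / den, pearson(half_a, half_b)


for s in (20.0, 2.0, 0.1):
    r, cc = simulate_shell(s)
    print(f"<I>/sigma = {s:5.1f}  Rmerge = {r:6.3f}  CC1/2 = {cc:5.2f}")
```

The exact numbers depend on the seed and on the assumed noise model; the point is only the trend across the three shells.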
>> 
>> One problem as discussed earlier in this thread is that we have got used to 
>> the idea that nominal resolution is a single number indicating the quality 
>> of a structure, but this has never been true, irrespective of the cutoff 
>> method. Apart from the considerable problem of anisotropy, we all need to 
>> note the wisdom of Ethan Merritt
>> 
>>> "We should also encourage people not to confuse the quality of 
>>> the data with the quality of the model."
>> 
>> Phil
>> 
>> 
>> 
>> On 1 Jun 2012, at 18:59, aaleshin wrote:
>> 
>>> Please excuse my ignorance, but I cannot understand why Rmerge is 
>>> unreliable for estimation of the resolution.
>>> I mean, from a theoretical point of view, <I/sigma> is indeed a better 
>>> criterion, but it is not obvious from a practical point of view.
>>> 
>>> <I/sigma> depends on the method of sigma estimation, and so the same data 
>>> processed by different programs may have different <I/sigma>. Moreover, 
>>> HKL2000 allows users to adjust sigmas manually. Rmerge estimates sigmas 
>>> from differences between measurements of the same structure factor, and 
>>> hence is independent of our preferences. But it also has a very important 
>>> ability to validate the consistency of the merged data. If my crystal 
>>> changed during the data collection, or something went wrong with the 
>>> diffractometer, Rmerge will show it immediately, but <I/sigma> will not.
>>> 
>>> So, please explain why we should stop using Rmerge as a criterion of data 
>>> resolution.
>>> 
>>> Alex
>>> Sanford-Burnham Medical Research Institute
>>> 10901 North Torrey Pines Road
>>> La Jolla, California 92037
>>> 
>>> 
>>> 
>>> On Jun 1, 2012, at 5:07 AM, Ian Tickle wrote:
>>> 
>>>> On 1 June 2012 03:22, Edward A. Berry <ber...@upstate.edu> wrote:
>>>>> Leo will probably answer better than I can, but I would say I/SigI counts
>>>>> only the present reflection, so eliminating noise by anisotropic
>>>>> truncation should improve it, raising the average I/SigI in the last shell.
>>>> 
>>>> We always include unmeasured reflections with I/sigma(I) = 0 in the
>>>> calculation of the mean I/sigma(I) (i.e. we divide the sum of
>>>> I/sigma(I) for measureds by the predicted total no of reflections incl
>>>> unmeasureds), since for unmeasureds I is (almost) completely unknown
>>>> and therefore sigma(I) is effectively infinite (or at least finite but
>>>> large since you do have some idea of what range I must fall in).  A
>>>> shell with <I/sigma(I)> = 2 and 50% completeness clearly doesn't carry
>>>> the same information content as one with the same <I/sigma(I)> and
>>>> 100% complete; therefore IMO it's very misleading to quote
>>>> <I/sigma(I)> including only the measured reflections.  This also means
>>>> we can use a single cut-off criterion (we use mean I/sigma(I) > 1),
>>>> and we don't need another arbitrary cut-off criterion for
>>>> completeness.  As many others seem to be doing now, we don't use
>>>> Rmerge, Rpim etc as criteria to estimate resolution, they're just too
>>>> unreliable - Rmerge is indeed dead and buried!
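The completeness-aware average described above can be sketched as follows: sum I/sigma(I) over the measured reflections and divide by the predicted number of reflections in the shell, so that each unmeasured reflection contributes zero. The function name and the data layout are assumptions made for illustration.

```python
def mean_i_over_sigma(measured, n_predicted):
    """Mean I/sigma(I) with unmeasured reflections counted as zero.

    `measured` holds (I, sigma) pairs for the reflections actually recorded;
    `n_predicted` is the number of reflections expected in the shell.
    """
    if n_predicted <= 0 or n_predicted < len(measured):
        raise ValueError("n_predicted must be positive and >= len(measured)")
    return sum(i / sig for i, sig in measured) / n_predicted


half_shell = [(20.0, 10.0)] * 50           # 50 reflections, each I/sigma = 2
print(mean_i_over_sigma(half_shell, 50))   # measured reflections only: 2.0
print(mean_i_over_sigma(half_shell, 100))  # same data, shell 50% complete: 1.0
```

With this convention the 50% complete shell sits exactly at a cut-off of 1, where the measured-only average would still report 2.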
>>>> 
>>>> Actually a mean value of I/sigma(I) of 2 is highly statistically
>>>> significant, i.e. very unlikely to have arisen by chance variations,
>>>> and the significance threshold for the mean must be much closer to 1
>>>> than to 2.  Taking an average always increases the statistical
>>>> significance, therefore it's not valid to compare an _average_ value
>>>> of I/sigma(I) = 2 with a _single_ value of I/sigma(I) = 3 (taking 3
>>>> sigma as the threshold of statistical significance of an individual
>>>> measurement): that's a case of "comparing apples with pears".  In
>>>> other words in the outer shell you would need a lot of highly
>>>> significant individual values >> 3 to attain an overall average of 2
>>>> since the majority of individual values will be < 1.
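A back-of-the-envelope illustration of why an averaged I/sigma(I) cannot be judged against the 3-sigma threshold for a single measurement. It assumes that the errors on the N reflections are independent and that the sigmas are correctly estimated, so each I/sigma(I) has unit variance and the mean has a standard error of 1/sqrt(N) when there is no signal; `z_of_mean` is a hypothetical helper name.

```python
import math


def z_of_mean(mean_i_over_sigma, n_reflections):
    """Significance (in standard errors) of a shell-averaged I/sigma(I).

    Under the stated assumptions the mean of N values has a standard
    error of 1/sqrt(N), so its significance is mean * sqrt(N).
    """
    return mean_i_over_sigma * math.sqrt(n_reflections)


print(z_of_mean(3.0, 1))     # a single reflection at 3 sigma -> 3.0
print(z_of_mean(2.0, 1000))  # shell average of 2 over 1000 reflections
print(z_of_mean(1.0, 1000))  # shell average of 1 over 1000 reflections
```

Real data have correlated and imperfectly estimated errors, so these idealized numbers overstate the case; they only show the direction of the effect.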
>>>> 
>>>>> F/sigF is expected to be better than I/sigI because d(x^2) = 2x dx,
>>>>> d(x^2)/x^2 = 2 dx/x, dI/I = 2*dF/F  (or approaches that in the limit . . .)
>>>> 
>>>> That depends on what you mean by 'better': every metric must be
>>>> compared with a criterion appropriate to that metric. So if we are
>>>> comparing I/sigma(I) with a criterion value = 3, then we must compare
>>>> F/sigma(F) with criterion value = 6 ('in the limit' of zero I), in
>>>> which case the comparison is no 'better' (in terms of information
>>>> content) with I than with F: they are entirely equivalent.  It's
>>>> meaningless to compare F/sigma(F) with the criterion value appropriate
>>>> to I/sigma(I): again that's "comparing apples and pears"!
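The factor of two between the two criteria follows from first-order error propagation through F = sqrt(I), which gives sigma(F) = sigma(I) / (2F) and hence F/sigma(F) = 2 * I/sigma(I). A small numerical check, with a hypothetical helper name and made-up values, and with the caveat that this is a first-order approximation only:

```python
import math


def f_and_sigf(i, sig_i):
    """First-order propagation from I to F = sqrt(I).

    sigma(F) = sigma(I) / (2 F); only meaningful for I > 0.
    """
    if i <= 0:
        raise ValueError("first-order propagation needs I > 0")
    f = math.sqrt(i)
    return f, sig_i / (2.0 * f)


i, sig_i = 100.0, 10.0
f, sig_f = f_and_sigf(i, sig_i)
print(i / sig_i, f / sig_f)  # 10.0 20.0
```

So a reflection that just passes I/sigma(I) = 3 sits at F/sigma(F) = 6 under this approximation: the same information, on a different scale.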
>>>> 
>>>> Cheers
>>>> 
>>>> -- Ian
>
