Re: [ccp4bb] Rmergicide Through Programming

Kay Diederichs Fri, 07 Jul 2017 12:13:46 -0700

James,

I cannot follow you. "n approaches 1" can only mean n = 2 because n is integer. 
And for n=2 the sqrt(n/(n-1)) factor is well-defined. For n=1, neither 
contributions to Rmeas nor Rmerge nor to any other precision indicator can be 
calculated anyway, because there's nothing this measurement can be compared 
against.
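
To make the n = 1 point concrete, here is a minimal Python sketch of how per-reflection contributions enter Rmerge and Rmeas. It assumes the usual textbook definitions (sum of absolute deviations from the per-reflection mean over the sum of intensities, with the extra sqrt(n/(n-1)) weight for Rmeas); the function name and the toy intensities are hypothetical and are not taken from any merging program.

```python
from math import sqrt


def r_merge_and_r_meas(observations):
    """Return (Rmerge, Rmeas) for a dict mapping hkl -> list of intensities.

    Reflections measured only once (n = 1) are skipped entirely: there is
    nothing to compare them against, so they contribute to neither statistic.
    """
    abs_dev_sum = 0.0       # numerator of Rmerge
    weighted_dev_sum = 0.0  # numerator of Rmeas
    intensity_sum = 0.0     # shared denominator
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue  # singleton: no deviation can be computed
        mean_i = sum(intensities) / n
        dev = sum(abs(i - mean_i) for i in intensities)
        abs_dev_sum += dev
        weighted_dev_sum += sqrt(n / (n - 1)) * dev  # well-defined for n >= 2
        intensity_sum += sum(intensities)
    return abs_dev_sum / intensity_sum, weighted_dev_sum / intensity_sum


# Toy data (made up for illustration): two multiply-measured reflections
# and one singleton that is ignored.
toy = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 40.0, 60.0],
    (0, 0, 1): [30.0],
}
rmerge, rmeas = r_merge_and_r_meas(toy)
print(f"Rmerge = {rmerge:.4f}, Rmeas = {rmeas:.4f}")
```

With these toy numbers Rmerge is 30/360 (about 0.083) and Rmeas is about 0.107; the singleton changes neither value.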


just my 2 cents,

Kay

On Fri, 7 Jul 2017 10:57:17 -0700, James Holton <jmhol...@slac.stanford.edu> 
wrote:

>I happen to be one of those people who think Rmerge is a very useful 
>statistic.  Not as a method of evaluating the resolution limit, which is 
>mathematically ridiculous, but for a host of other important things, 
>like evaluating the performance of data collection equipment, and 
>evaluating the isomorphism of different crystals, to name a few.
>
>I like Rmerge because it is a simple statistic that has a simple formula 
>and has not undergone any "corrections".  Corrections increase 
>complexity, and complexity opens the door to manipulation by the 
>desperate and/or misguided.  For example, overzealous outlier rejection 
>is a common way to abuse R factors, and it is far too often swept under 
>the rug, sometimes without the user even knowing about it.  This is 
>especially problematic when working in a regime where the statistic of 
>interest is unstable, and for R factors this is low intensity data.  
>Rejecting just the right "outliers" can make any R factor look a lot 
>better.  Why would Rmeas be any more unstable than Rmerge?  Look at the 
>formula. There is an "n-1" in the denominator, where n is the 
>multiplicity.  So, what happens when n approaches 1 ?  What happens when 
>n=1? This is not to say Rmerge is better than Rmeas. In fact, I believe 
>the latter is generally superior to the first, unless you are working 
>near n = 1. The sqrt(n/(n-1)) is trying to correct for bias in the R 
>statistic, but fighting one infinity with another infinity is a 
>dangerous game.
>
>My point is that neither Rmerge nor Rmeas is easily interpreted without 
>knowing the multiplicity.  If you see Rmeas = 10% and the multiplicity 
>is 10, then you know what that means.  Same for Rmerge, since at n=10 
>both stats have nearly the same value.  But if you have Rmeas = 45% and 
>multiplicity = 1.05, what does that mean?  Rmeas will be only 33% if the 
>multiplicity is rounded up to 1.1. This is what I mean by "numerical 
>instability": the value of the R statistic itself becomes sensitive to 
>small amounts of noise, and behaves more and more like a random number 
>generator. And if you have Rmeas = 33% and no indication of 
>multiplicity, it is hard to know what is going on.  I personally am a 
>lot more comfortable seeing qualitative agreement between Rmerge and 
>Rmeas, because that means the numerical instability of the multiplicity 
>correction didn't mess anything up.
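
The 45% versus 33% arithmetic above can be reproduced with a short Python sketch. It follows the simplification used in that paragraph, namely applying sqrt(n/(n-1)) to an average (possibly fractional) multiplicity rather than per reflection; the function names are hypothetical.

```python
from math import sqrt


def multiplicity_factor(n):
    """sqrt(n / (n - 1)), applied here to an average multiplicity n > 1."""
    if n <= 1:
        raise ValueError("factor is undefined for n <= 1")
    return sqrt(n / (n - 1))


def rescale_rmeas(rmeas, n_reported, n_alternative):
    """Rmeas that the same underlying Rmerge would give at another multiplicity."""
    underlying_rmerge = rmeas / multiplicity_factor(n_reported)
    return underlying_rmerge * multiplicity_factor(n_alternative)


# Factor near n = 1 versus at comfortable multiplicity.
for n in (1.05, 1.1, 2, 10, 100):
    print(f"n = {n:>6}: factor = {multiplicity_factor(n):.3f}")

# Rmeas = 45% at multiplicity 1.05, re-expressed at multiplicity 1.1.
print(f"{rescale_rmeas(45.0, 1.05, 1.1):.1f}%")
```

The factor is about 4.58 at n = 1.05 and about 3.32 at n = 1.1, so 45% becomes roughly 33%, whereas at n = 10 the factor is only about 1.05.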
>
>Of course, when the intensity is weak, R statistics in general are not 
>useful.  Both Rmeas and Rmerge have the sum of all intensities in the 
>denominator, so when the bin-wide sum approaches zero you have another 
>infinity to contend with.  This one starts to rear its ugly head once 
>I/sigma drops below about 3, and this is why our ancestors always 
>applied a sigma cutoff before computing an R factor.  Our small-molecule 
>colleagues still do this!  They call it "R1".  And it is an excellent 
>indicator of the overall relative error.  The relative error in the 
>outermost bin is not meaningful, and strangely enough nobody ever 
>reported the outer-resolution Rmerge before 1995.
>
>For weak signals, Correlation Coefficients are better, but for strong 
>signals CC pegs out at >95%, making it harder to see relative errors.  
>I/sigma is what we'd like to know, but the value of "sigma" is still 
>prone to manipulation by not just outlier rejection, but massaging the 
>so-called "error model".  Suffice it to say, crystallographic data 
>contain more than one type of error.  Some sources are important for 
>weak spots, others are important for strong spots, and still others are 
>only apparent in the mid-range.  Some sources of error are only 
>important at low multiplicity, and others only manifest at high 
>multiplicity. There is no single number that can be used to evaluate all 
>aspects of data quality.
>
>So, I remain a champion of reporting Rmerge.  Not in the high-angle bin, 
>because that is essentially a random number, but overall Rmerge and 
>low-angle-bin Rmerge next to multiplicity, Rmeas, CC1/2 and other 
>statistics is the only way you can glean enough information about where 
>the errors are coming from in the data.  Rmeas is a useful addition 
>because it helps us correct for multiplicity without having to do math 
>in our head.  Users generally thank you for that. Rmerge, however, has 
>served us well for more than half a century, and I believe Uli Arndt 
>knew what he was doing.  I hope we all know enough about history to 
>realize that future generations seldom thank their ancestors for 
>"protecting" them from information.
>
>-James Holton
>MAD Scientist
>
>
>On 7/5/2017 10:36 AM, Graeme Winter wrote:
>> Frank,
>>
>> you are asking me to remove features that I like, so I would feel that the 
>> challenge is for you to prove that this is harmful however:
>>
>>   - at the minimum, I find it a useful check sum that the stats are 
>> internally consistent (though I interpret it for lots of other reasons too)
>>   - it is faulty I agree, but (with caveats) still useful IMHO
>>
>> Sorry for being terse, but I remain to be convinced that removing it 
>> increases the amount of information
>>
>> CC’ing BB as requested
>>
>> Best wishes Graeme
>>
>>
>>> On 5 Jul 2017, at 17:17, Frank von Delft <frank.vonde...@sgc.ox.ac.uk> 
>>> wrote:
>>>
>>> You keep not answering the challenge.
>>>
>>> It's really simple:  what information does Rmerge provide that Rmeas 
>>> doesn't.
>>>
>>> (If you answer, email to the BB.)
>>>
>>>
>>> On 05/07/2017 16:04, graeme.win...@diamond.ac.uk wrote:
>>>> Dear Frank,
>>>>
>>>> You are forcefully arguing essentially that others are wrong if we feel an 
>>>> existing statistic continues to be useful, and instead insist that it be 
>>>> outlawed so that we may not make use of it, just in case someone 
>>>> misinterprets it.
>>>>
>>>> Very well
>>>>
>>>> I do however express disquiet that we as software developers feel 
>>>> browbeaten to remove the output we find useful because “the community” 
>>>> feel that it is obsolete.
>>>>
>>>> I feel that Jacob’s short story on this thread illustrates that educating 
>>>> the next generation of crystallographers to understand what all of the 
>>>> numbers mean is critical, and that a numerological approach of trying to 
>>>> optimise any one statistic is essentially doomed. Precisely the same 
>>>> argument could be made for people cutting the “resolution” at the wrong 
>>>> place in order to improve the average I/sig(I) of the data set.
>>>>
>>>> Denying access to information is not a solution to misinterpretation, from 
>>>> where I am sat, however I acknowledge that other points of view exist.
>>>>
>>>> Best wishes Graeme
>>>>
>>>>
>>>> On 5 Jul 2017, at 12:11, Frank von Delft 
>>>> <frank.vonde...@sgc.ox.ac.uk> wrote:
>>>>
>>>>
>>>> Graeme, Andrew
>>>>
>>>> Jacob is not arguing against an R-based statistic;  he's pointing out that 
>>>> leaving out the multiplicity-weighting is prehistoric (Diederichs & 
>>>> Karplus published it 20 years ago!).
>>>>
>>>> So indeed:   Rmerge, Rpim and I/sigI give different information.  As you 
>>>> say.
>>>>
>>>> But no:   Rmerge and Rmeas and Rcryst do NOT give different information.  
>>>> Except:
>>>>
>>>>    * Rmerge is a (potentially) misleading version of Rmeas.
>>>>
>>>>    * Rcryst and Rmerge and Rsym are terms that no longer have significance 
>>>> in the single cryo-dataset world.
>>>>
>>>> phx.
>>>>
>>>>
>>>>
>>>> On 05/07/2017 09:43, Andrew Leslie wrote:
>>>>
>>>> I would like to support Graeme in his wish to retain Rmerge in Table 1, 
>>>> essentially for exactly the same reasons.
>>>>
>>>> I also strongly support Francis Reyes' comment about the usefulness of 
>>>> Rmerge at low resolution, and I would add to his list that it can also, in 
>>>> some circumstances, be more indicative of the wrong choice of symmetry 
>>>> (too high) than the statistics that come from POINTLESS (excellent though 
>>>> that program is!).
>>>>
>>>> Andrew
>>>> On 5 Jul 2017, at 05:44, Graeme Winter 
>>>> <graeme.win...@gmail.com> wrote:
>>>>
>>>> Hi Jacob
>>>>
>>>> Yes, I got this - and I appreciate the benefit of Rmeas for dealing with 
>>>> measuring agreement for small-multiplicity observations. Having this *as 
>>>> well* is very useful and I agree Rmeas / Rpim / CC-half should be the 
>>>> primary “quality” statistics.
>>>>
>>>> However, you asked if there is any reason to *keep* rather than 
>>>> *eliminate* Rmerge, and I offered one :o)
>>>>
>>>> I do not see what harm there is reporting Rmerge, even if it is just used 
>>>> in the inner shell or just used to capture a flavour of the data set 
>>>> overall. I also appreciate that Rmeas converges to the same value for 
>>>> large multiplicity i.e.:
>>>>
>>>>                                             Overall  InnerShell  OuterShell
>>>> Low resolution limit                       39.02     39.02      1.39
>>>> High resolution limit                       1.35      6.04      1.35
>>>>
>>>> Rmerge  (within I+/I-)                     0.080     0.057     2.871
>>>> Rmerge  (all I+ and I-)                    0.081     0.059     2.922
>>>> Rmeas (within I+/I-)                       0.081     0.058     2.940
>>>> Rmeas (all I+ & I-)                        0.082     0.059     2.958
>>>> Rpim (within I+/I-)                        0.013     0.009     0.628
>>>> Rpim (all I+ & I-)                         0.009     0.007     0.453
>>>> Rmerge in top intensity bin                0.050        -         -
>>>> Total number of observations             1265512     16212     53490
>>>> Total number unique                        17515       224      1280
>>>> Mean((I)/sd(I))                             29.7     104.3       1.5
>>>> Mn(I) half-set correlation CC(1/2)         1.000     1.000     0.778
>>>> Completeness                               100.0      99.7     100.0
>>>> Multiplicity                                72.3      72.4      41.8
>>>>
>>>> Anomalous completeness                     100.0     100.0     100.0
>>>> Anomalous multiplicity                      37.2      42.7      21.0
>>>> DelAnom correlation between half-sets      0.497     0.766    -0.026
>>>> Mid-Slope of Anom Normal Probability       1.039       -         -
>>>>
>>>> (this is a good case for Rpim & CC-half as resolution limit criteria)
>>>>
>>>> If the statistics you want to use are there & some others also, what is 
>>>> the pressure to remove them? Surely we want to educate on how best to 
>>>> interpret the entire table above to get a fuller picture of the overall 
>>>> quality of the data? My 0th-order request would be to publish the three 
>>>> shells as above ;o)
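
One way to use the table above as an internal consistency check is sketched below in Python. It assumes the approximate relation Rpim ≈ Rmeas / sqrt(multiplicity), which is exact only when every reflection has the same multiplicity, and it further assumes that the "within I+/I-" rows pair with the anomalous multiplicity. The numbers are copied from the table; the names are hypothetical.

```python
from math import sqrt

# (Rmeas, Rpim, multiplicity) per shell, copied from the table above.
# Pairing "within I+/I-" with the anomalous multiplicity is an assumption.
TABLE = {
    "all I+ & I-": {
        "overall": (0.082, 0.009, 72.3),
        "inner": (0.059, 0.007, 72.4),
        "outer": (2.958, 0.453, 41.8),
    },
    "within I+/I-": {
        "overall": (0.081, 0.013, 37.2),
        "inner": (0.058, 0.009, 42.7),
        "outer": (2.940, 0.628, 21.0),
    },
}


def predicted_rpim(rmeas, multiplicity):
    """Approximate Rpim from Rmeas, assuming uniform multiplicity."""
    return rmeas / sqrt(multiplicity)


for row, shells in TABLE.items():
    for shell, (rmeas, rpim, mult) in shells.items():
        guess = predicted_rpim(rmeas, mult)
        print(f"{row:13s} {shell:8s} reported {rpim:.3f}  predicted {guess:.3f}")
```

Under these assumptions the predicted and reported Rpim values agree to within a few percent plus rounding, which is the kind of check-sum behaviour one would hope for.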
>>>>
>>>> Cheers Graeme
>>>>
>>>>
>>>>
>>>> On 4 Jul 2017, at 22:09, Keller, Jacob 
>>>> <kell...@janelia.hhmi.org> wrote:
>>>>
>>>> I suggested replacing Rmerge/sym/cryst with Rmeas, not Rpim. Rmeas is 
>>>> simply (Rmerge * sqrt(n/(n-1))) where n is the number of measurements of 
>>>> that reflection. It's merely a way of correcting for the 
>>>> multiplicity-related artifact of Rmerge, which is becoming even more of a 
>>>> problem with data sets of increasing variability in multiplicity. Consider 
>>>> the case of comparing a data set with a multiplicity of 2 versus one of 
>>>> 100: equivalent data quality would yield Rmerges diverging by a factor of 
>>>> ~1.4. But this has all been covered before in several papers. It can be 
>>>> and is reported in resolution bins, so can be used exactly as you say. So, 
>>>> why not "disappear" Rmerge from the software?
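
The factor of ~1.4 between multiplicity 2 and multiplicity 100 can be checked with a small simulation. The Python sketch below assumes the simplest possible error model (every observation of every reflection drawn from one Gaussian with the same mean and sigma), which is an illustrative assumption and not a description of real data; the function name and parameter values are hypothetical.

```python
import random


def simulated_rmerge(n, n_reflections=2000, true_i=100.0, sigma=10.0, seed=0):
    """Rmerge for simulated data in which every reflection is measured n times.

    All observations share the same true intensity and Gaussian noise, so
    the "data quality" is identical whatever the multiplicity.
    """
    rng = random.Random(seed)
    dev_sum = 0.0
    intensity_sum = 0.0
    for _ in range(n_reflections):
        obs = [rng.gauss(true_i, sigma) for _ in range(n)]
        mean_i = sum(obs) / n
        dev_sum += sum(abs(i - mean_i) for i in obs)
        intensity_sum += sum(obs)
    return dev_sum / intensity_sum


low = simulated_rmerge(2)
high = simulated_rmerge(100)
print(f"Rmerge(n=2) = {low:.4f}, Rmerge(n=100) = {high:.4f}, ratio = {high / low:.2f}")
```

For this model the expected ratio is sqrt(0.99 / 0.5), about 1.41, and the simulated ratio should land close to that: Rmerge is lower at multiplicity 2 even though the per-observation noise is unchanged.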
>>>>
>>>> The only reason I could come up with for keeping it is historical reasons 
>>>> or comparisons to previous datasets, but anyway those comparisons would be 
>>>> confounded by variabilities in multiplicity and a hundred other things, so 
>>>> come on, developers, just comment it out!
>>>>
>>>> JPK
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: graeme.win...@diamond.ac.uk
>>>> Sent: Tuesday, July 04, 2017 4:37 PM
>>>> To: Keller, Jacob <kell...@janelia.hhmi.org>
>>>> Cc: ccp4bb@jiscmail.ac.uk
>>>> Subject: Re: [ccp4bb] Rmergicide Through Programming
>>>>
>>>> Hi Jacob
>>>>
>>>> Unbiased estimate of the true unmerged I/sig(I) of your data (I find this 
>>>> particularly useful at low resolution) i.e. if your inner shell Rmerge is 
>>>> 10% your data agree very poorly; if 2% says your data agree very well 
>>>> provided you have sensible multiplicity… obviously depends on sensible 
>>>> interpretation. Rpim hides this (though tells you more about the quality 
>>>> of average measurement)
>>>>
>>>> Essentially, for I/sig(I) you can (by and large) adjust your sig(I) values 
>>>> however you like if you were so inclined. You can only adjust Rmerge by 
>>>> excluding measurements.
>>>>
>>>> I would therefore defend that - amongst the other stats you enumerate 
>>>> below - it still has a place
>>>>
>>>> Cheers Graeme
>>>>
>>>> On 4 Jul 2017, at 14:10, Keller, Jacob 
>>>> <kell...@janelia.hhmi.org> wrote:
>>>>
>>>> Rmerge does contain information which complements the others.
>>>>
>>>> What information? I was trying to think of a counterargument to what I 
>>>> proposed, but could not think of a reason in the world to keep reporting 
>>>> it.
>>>>
>>>> JPK
>>>>
>>>>
>>>> On 4 Jul 2017, at 12:00, Keller, Jacob 
>>>> <kell...@janelia.hhmi.org> wrote:
>>>>
>>>> Dear Crystallographers,
>>>>
>>>> Having been repeatedly chagrined about the continued use and reporting of 
>>>> Rmerge rather than Rmeas or similar, I thought of a potential way to 
>>>> promote the change: what if merging programs would completely omit 
>>>> Rmerge/cryst/sym? Is there some reason to continue to report these stats, 
>>>> or are they just grandfathered into the software? I doubt that any journal 
>>>> or crystallographer would insist on reporting Rmerge per se. So, I wonder 
>>>> what developers would think about commenting out a few lines of their 
>>>> code, seeing what happens? Maybe a comment to the effect of "Rmerge is now 
>>>> deprecated; use Rmeas" would be useful as well. Would something 
>>>> catastrophic happen?
>>>>
>>>> All the best,
>>>>
>>>> Jacob Keller
>>>>
>>>> *******************************************
>>>> Jacob Pearson Keller, PhD
>>>> Research Scientist
>>>> HHMI Janelia Research Campus / Looger lab
>>>> Phone: (571)209-4000 x3159
>>>> Email: kell...@janelia.hhmi.org
>>>> *******************************************
>>>>
>>>>
>>>
