Hello, I think this needs a little bit of crystarchaeology.
Rmerge and Rsym used to be different. This was at a time when data sets were typically collected from several crystals, pre-cryo-cooling, with data recorded on photographic film (Arndt-Wonacott cameras). Rmerge = agreement R-factor from data from several crystals; Rsym = agreement R-factor from symmetry-equivalents within one crystal. [I just type "agreement R-factor" in order not to have to type the formulae] At that time, people were confused about these two terms. Nowadays both are (used as) synonyms. Cheers, Fred. -----Original Message----- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Phil Evans Sent: Monday, July 10, 2017 5:43 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] AW: [ccp4bb] Rmergicide Through Programming What is the difference between Rmerge and Rsym - I thought they were the same? Rrim == Rmeas I think Phil > On 10 Jul 2017, at 15:18, John Berrisford <j...@ebi.ac.uk> wrote: > > Dear Herman > > The new PDB deposition system (OneDep) allows you to enter values for Rmerge, > Rsym, Rpim, Rrim and / or CC half. If, during deposition, you do not provide > a value for any of these metrics then we will ask you for a value for one of > them. > > Also, PDB format is a legacy format for the PDB. In 2014 mmCIF became the > archive format for the PDB and some large entries are no longer distributed > in PDB format. mmCIF is not limited by the constraints of punch cards. > > Please see > https://www.wwpdb.org/documentation/file-formats-and-the-pdb > > Regards > > John > > PDBe > > > > On 10/07/2017 09:26, herman.schreu...@sanofi.com wrote: >> Dear All, >> >> For me this whole discussion is an example of a large number of people >> barking up the wrong tree. The real issue is not whether data processing >> programs print, amongst many quality indicators, an Rmerge as well, but the >> fact that the PDB and many journals still insist on using the Rmerge as >> primary quality indicator. 
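[The "agreement R-factors" Fred declines to type out recur throughout this thread; as a reference point, a minimal Python sketch of the standard Rmerge, Rmeas and Rpim definitions (per-hkl sums, with the Diederichs & Karplus multiplicity weights). The function name and toy intensities are mine, purely illustrative:]

```python
from statistics import mean

def r_factors(reflections):
    """Standard agreement R-factors from per-hkl observation lists:
    Rmerge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i, with
    Rmeas and Rpim weighting each hkl's deviations by sqrt(n/(n-1))
    and sqrt(1/(n-1)) respectively (Diederichs & Karplus).
    Reflections measured fewer than twice are skipped."""
    num_merge = num_meas = num_pim = denom = 0.0
    for obs in reflections:
        n = len(obs)
        if n < 2:
            continue  # one measurement gives no spread to compare
        dev = sum(abs(i - mean(obs)) for i in obs)
        num_merge += dev
        num_meas += (n / (n - 1)) ** 0.5 * dev
        num_pim += (1 / (n - 1)) ** 0.5 * dev
        denom += sum(obs)
    return num_merge / denom, num_meas / denom, num_pim / denom

# toy example: two reflections, each measured twice
rmerge, rmeas, rpim = r_factors([[100.0, 110.0], [50.0, 54.0]])
# at uniform multiplicity 2, Rmeas = sqrt(2) * Rmerge and Rpim = Rmerge
```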
As long as this is true, novice scientists might >> be led to believe that Rmerge is the most important quality indicator. As >> soon as the PDB and the journals request some other indicator, this will be >> over. So that is where we should direct our efforts. >> >> I don't understand at all why the PDB still insists on an obsolete quality >> indicator. However, the PDB format for the coordinates also dates back to >> the 1960s, to be used with punch cards. >> >> My 2 cents. >> Herman >> >> >> >> -----Original Message----- >> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf >> Of Edward A. Berry >> Sent: Saturday, 8 July 2017 22:31 >> To: CCP4BB@JISCMAIL.AC.UK >> Subject: Re: [ccp4bb] Rmergicide Through Programming >> >> But R-merge is not really narrower as a fraction of the mean value; it just >> gets smaller proportionately as all the numbers get smaller: >> an RMSD of .0043 for R-meas multiplied by a factor of 0.022/.027 gives 0.0035, >> which is the RMSD for Rmerge. The same was true in the previous example. You >> could multiply R-meas by .5 or .2 and get a sharper distribution yet! And >> that factor would be constant, whereas this only applies for super-low >> redundancy. >> >> On 07/08/2017 03:23 PM, James Holton wrote: >>> The expected distribution of Rmeas values is still wider than that of >>> Rmerge for data with I/sigma=30 and average multiplicity=2.0. Graph >>> attached. >>> >>> I expect that anytime you incorporate more than one source of information >>> you run the risk of a noisier statistic, because every source of information >>> can contain noise. That is, Rmeas combines information about multiplicity >>> with the absolute deviates in the data to form a statistic that is more >>> accurate than Rmerge, but also (potentially) less precise. >>> >>> Perhaps that is what we are debating here? Which is better, accuracy or >>> precision? Personally, I prefer to know both. 
>>> >>> -James Holton >>> MAD Scientist >>> >>> On 7/8/2017 11:02 AM, Frank von Delft wrote: >>>> It is quite easy to end up with low multiplicities in the low resolution >>>> shell, especially for low symmetry and fast-decaying crystals. >>>> >>>> It is this scenario where Rmerge (lowres) is more misleading than Rmeas. >>>> >>>> phx >>>> >>>> >>>> On 08/07/2017 17:31, James Holton wrote: >>>>> What does Rmeas tell us that Rmerge doesn't, given that we know the >>>>> multiplicity? >>>>> >>>>> -James Holton >>>>> MAD Scientist >>>>> >>>>> On 7/8/2017 9:15 AM, Frank von Delft wrote: >>>>>> Anyway, back to reality: does anybody still use R statistics to >>>>>> evaluate anything other than /strong/ data? Certainly I never look at >>>>>> them except for the low-resolution bin (or strongest reflections). >>>>>> Specifically, a "2%-dataset" in that bin is probably healthy, while a >>>>>> "9%-dataset" probably Has Issues. >>>>>> >>>>>> In which case, back to Jacob's question: what does Rmerge tell us that >>>>>> Rmeas doesn't? >>>>>> >>>>>> phx >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 08/07/2017 17:02, James Holton wrote: >>>>>>> Sorry for the confusion. I was going for brevity! And failed. >>>>>>> >>>>>>> I know that the multiplicity correction is applied on a per-hkl basis >>>>>>> in the calculation of Rmeas. However, the average multiplicity over >>>>>>> the whole calculation is most likely not an integer. Some hkls may be >>>>>>> observed twice while others only once, or perhaps 3-4 times in the same >>>>>>> scaling run. >>>>>>> >>>>>>> Allow me to do the error propagation properly. Consider the scenario: >>>>>>> >>>>>>> Your outer resolution bin has a true I/sigma = 1.00 and average >>>>>>> multiplicity of 2.0. Let's say there are 100 hkl indices in this bin. >>>>>>> I choose the "true" intensities of each hkl from an exponential (aka >>>>>>> Wilson) distribution. 
Further assume the background is high, so the >>>>>>> error in each observation after background subtraction may be taken >>>>>>> from a Gaussian distribution. Let's further choose the per-hkl >>>>>>> multiplicity from a Poisson distribution with expectation value 2.0, so >>>>>>> 0 is possible, but the long-term average multiplicity is 2.0. For R >>>>>>> calculation, when the multiplicity of any given hkl is less than 2 it is >>>>>>> skipped. What I end up with after 120,000 trials is a distribution of >>>>>>> values for each R factor. See attached graph. >>>>>>> >>>>>>> What I hope is readily apparent is that the distribution of >>>>>>> Rmerge values is taller and sharper than that of the Rmeas values. The >>>>>>> most likely Rmeas is 80% and that of Rmerge is 64.6%. This is >>>>>>> expected, of course. But what I hope to impress upon you is that the >>>>>>> most likely value is not generally the one that you will get! The >>>>>>> distribution has a width. Specifically, Rmeas could be as low as 40%, >>>>>>> or as high as 209%, depending on the trial. Half of the trial results >>>>>>> fall between 71.4% and 90.3%, a range of 19 percentage points. >>>>>>> Rmerge has a middle-half range from 57.6% to 72.9% (15.3 percentage >>>>>>> points). This range of possible values of Rmerge or Rmeas from data >>>>>>> with the same intrinsic quality is what I mean when I say "numerical >>>>>>> instability". Each and every trial had the same true I/sigma and >>>>>>> multiplicity, and yet the R factors I get vary depending on the trial. >>>>>>> Unfortunately for most of us with real data, you only ever get one >>>>>>> trial, and you can't predict which Rmeas or Rmerge you'll get. >>>>>>> >>>>>>> My point here is that R statistics in general are not comparable from >>>>>>> experiment to experiment when you are looking at data with low average >>>>>>> intensity and low multiplicity, and it appears that Rmeas is less >>>>>>> stable than Rmerge. 
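[For anyone who wants to reproduce the spread James describes, a rough re-creation of his trial recipe as stated: exponential (Wilson) true intensities, Gaussian observation errors, Poisson multiplicity with mean 2.0, hkls with fewer than two observations skipped. This is my own sketch, not his actual script, so the exact percentages will differ from his graph:]

```python
import math
import random

def poisson(rng, lam):
    # Knuth's algorithm for a Poisson-distributed deviate with mean lam
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def one_trial(n_hkl=100, mult_mean=2.0, i_over_sigma=1.0, seed=1):
    """One trial: exponential (Wilson) true intensities with mean 1,
    Gaussian observation errors, Poisson per-hkl multiplicity;
    hkls observed fewer than twice are skipped."""
    rng = random.Random(seed)
    sigma = 1.0 / i_over_sigma            # true I/sigma = 1.0 on average
    num_merge = num_meas = denom = 0.0
    for _ in range(n_hkl):
        i_true = rng.expovariate(1.0)     # Wilson distribution
        n = poisson(rng, mult_mean)
        if n < 2:
            continue
        obs = [i_true + rng.gauss(0.0, sigma) for _ in range(n)]
        i_bar = sum(obs) / n
        dev = sum(abs(i - i_bar) for i in obs)
        num_merge += dev
        num_meas += math.sqrt(n / (n - 1)) * dev
        denom += sum(obs)
    return num_merge / denom, num_meas / denom

# each seed is one "trial"; both R factors jump around between trials
rmerge, rmeas = one_trial(seed=1)
```

[Looping this over many seeds and histogramming the two returned values gives distributions of the kind described, with Rmeas always the larger of the pair.]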
Not by much, mind you, but it still jumps around more. >>>>>>> >>>>>>> Hope that is clearer? >>>>>>> >>>>>>> Note that in no way am I suggesting that low multiplicity is the right >>>>>>> way to collect data. Far from it. Especially with modern detectors >>>>>>> that have negligible read-out noise. But when microcrystals only give >>>>>>> off a handful of photons each before they die, low multiplicity might >>>>>>> be all you have. >>>>>>> >>>>>>> -James Holton >>>>>>> MAD Scientist >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 7/7/2017 2:33 PM, Edward A. Berry wrote: >>>>>>>> I think the confusion here is that the "multiplicity correction" >>>>>>>> is applied on each reflection, where it will be an integer 2 or >>>>>>>> greater (you can't estimate variance with only one measurement). >>>>>>>> You can only correct in an approximate way using the >>>>>>>> average multiplicity of the dataset, since it would depend on the >>>>>>>> distribution of multiplicity over the reflections. >>>>>>>> >>>>>>>> And the correction is for R-merge. You don't need to apply a >>>>>>>> correction to R-meas. >>>>>>>> R-meas is a redundancy-independent best estimate of the variance. >>>>>>>> Whatever you would have used R-merge for (hopefully making >>>>>>>> allowance for the multiplicity) you can use R-meas for and not worry about >>>>>>>> multiplicity. >>>>>>>> Again, what information does R-merge provide that R-meas does >>>>>>>> not provide in a more accurate way? >>>>>>>> >>>>>>>> According to the Denzo manual, one way to artificially reduce >>>>>>>> R-merge is to include reflections with only one measurement >>>>>>>> (averaging in a lot of zeros always helps bring an average >>>>>>>> down), and they say there were actually some programs that did >>>>>>>> that. However, I'm quite sure none of the ones we rely on today do that. >>>>>>>> >>>>>>>> On 07/07/2017 03:12 PM, Kay Diederichs wrote: >>>>>>>>> James, >>>>>>>>> >>>>>>>>> I cannot follow you. 
"n approaches 1" can only mean n = 2, because n >>>>>>>>> is an integer. And for n=2 the sqrt(n/(n-1)) factor is well-defined. For >>>>>>>>> n=1, contributions to neither Rmeas nor Rmerge nor any other >>>>>>>>> precision indicator can be calculated anyway, because there's nothing >>>>>>>>> this measurement can be compared against. >>>>>>>>> >>>>>>>>> just my 2 cents, >>>>>>>>> >>>>>>>>> Kay >>>>>>>>> >>>>>>>>> On Fri, 7 Jul 2017 10:57:17 -0700, James Holton >>>>>>>>> <jmhol...@slac.stanford.edu> wrote: >>>>>>>>> >>>>>>>>>> I happen to be one of those people who think Rmerge is a very >>>>>>>>>> useful statistic. Not as a method of evaluating the >>>>>>>>>> resolution limit, which is mathematically ridiculous, but for >>>>>>>>>> a host of other important things, like evaluating the >>>>>>>>>> performance of data collection equipment, and evaluating the >>>>>>>>>> isomorphism of different crystals, to name a few. >>>>>>>>>> >>>>>>>>>> I like Rmerge because it is a simple statistic that has a >>>>>>>>>> simple formula and has not undergone any "corrections". >>>>>>>>>> Corrections increase complexity, and complexity opens the >>>>>>>>>> door to manipulation by the desperate and/or misguided. For >>>>>>>>>> example, overzealous outlier rejection is a common way to >>>>>>>>>> abuse R factors, and it is far too often swept under the rug, >>>>>>>>>> sometimes without the user even knowing about it. This is >>>>>>>>>> especially problematic when working in a regime where the statistic >>>>>>>>>> of interest is unstable, and for R factors this is low-intensity >>>>>>>>>> data. >>>>>>>>>> Rejecting just the right "outliers" can make any R factor >>>>>>>>>> look a lot better. Why would Rmeas be any more unstable than >>>>>>>>>> Rmerge? Look at the formula. There is an "n-1" in the >>>>>>>>>> denominator, where n is the multiplicity. So, what happens >>>>>>>>>> when n approaches 1? What happens when n=1? This is not to >>>>>>>>>> say Rmerge is better than Rmeas. 
In fact, I believe the >>>>>>>>>> latter is generally superior to the former, unless you are >>>>>>>>>> working near n = 1. The sqrt(n/(n-1)) is trying to correct >>>>>>>>>> for bias in the R statistic, but fighting one infinity with another >>>>>>>>>> infinity is a dangerous game. >>>>>>>>>> >>>>>>>>>> My point is that neither Rmerge nor Rmeas is easily >>>>>>>>>> interpreted without knowing the multiplicity. If you see >>>>>>>>>> Rmeas = 10% and the multiplicity is 10, then you know what >>>>>>>>>> that means. Same for Rmerge, since at n=10 both stats have >>>>>>>>>> nearly the same value. But if you have Rmeas = 45% and >>>>>>>>>> multiplicity = 1.05, what does that mean? Rmeas will be only >>>>>>>>>> 33% if the multiplicity is rounded up to 1.1. This is what I >>>>>>>>>> mean by "numerical instability": the value of the R statistic >>>>>>>>>> itself becomes sensitive to small amounts of noise, and >>>>>>>>>> behaves more and more like a random number generator. And if >>>>>>>>>> you have Rmeas = 33% and no indication of multiplicity, it is >>>>>>>>>> hard to know what is going on. I personally am a lot more >>>>>>>>>> comfortable seeing qualitative agreement between Rmerge and Rmeas, >>>>>>>>>> because that means the numerical instability of the multiplicity >>>>>>>>>> correction didn't mess anything up. >>>>>>>>>> >>>>>>>>>> Of course, when the intensity is weak, R statistics in general >>>>>>>>>> are not useful. Both Rmeas and Rmerge have the sum of all >>>>>>>>>> intensities in the denominator, so when the bin-wide sum >>>>>>>>>> approaches zero you have another infinity to contend with. >>>>>>>>>> This one starts to rear its ugly head once I/sigma drops >>>>>>>>>> below about 3, and this is why our ancestors always applied a >>>>>>>>>> sigma cutoff before computing an R factor. Our small-molecule >>>>>>>>>> colleagues still do this! They call it "R1". And it is an >>>>>>>>>> excellent indicator of the overall relative error. 
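[James's 45%-to-33% example can be checked in a couple of lines; this sketch applies the correction with the *average* multiplicity, which, as Ed Berry notes elsewhere in the thread, is only an approximation to the per-hkl correction:]

```python
import math

def rmeas_factor(n):
    # the multiplicity correction sqrt(n / (n - 1)); diverges as n -> 1
    return math.sqrt(n / (n - 1))

# same underlying deviations, average multiplicity nudged from 1.05 to 1.10
rmeas_at_105 = 0.45
rmeas_at_110 = rmeas_at_105 * rmeas_factor(1.10) / rmeas_factor(1.05)
# rmeas_at_110 lands near 0.33: a large move for a tiny change in n
```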
The >>>>>>>>>> relative error in the outermost bin is not meaningful, and strangely >>>>>>>>>> enough nobody ever reported the outer-resolution Rmerge before 1995. >>>>>>>>>> >>>>>>>>>> For weak signals, correlation coefficients are better, but >>>>>>>>>> for strong signals CC pegs out at >95%, making it harder to see >>>>>>>>>> relative errors. >>>>>>>>>> I/sigma is what we'd like to know, but the value of "sigma" >>>>>>>>>> is still prone to manipulation by not just outlier rejection, >>>>>>>>>> but massaging of the so-called "error model". Suffice it to >>>>>>>>>> say, crystallographic data contain more than one type of error. >>>>>>>>>> Some sources are important for weak spots, others are >>>>>>>>>> important for strong spots, and still others are only >>>>>>>>>> apparent in the mid-range. Some sources of error are only >>>>>>>>>> important at low multiplicity, and others only manifest at high >>>>>>>>>> multiplicity. >>>>>>>>>> There is no single number that can be used to evaluate all aspects >>>>>>>>>> of data quality. >>>>>>>>>> >>>>>>>>>> So, I remain a champion of reporting Rmerge. Not in the >>>>>>>>>> high-angle bin, because that is essentially a random number, >>>>>>>>>> but reporting overall Rmerge and low-angle-bin Rmerge next to >>>>>>>>>> multiplicity, Rmeas, CC1/2 and other statistics is the only >>>>>>>>>> way to glean enough information about where the errors >>>>>>>>>> are coming from in the data. Rmeas is a useful addition >>>>>>>>>> because it helps us correct for multiplicity without having >>>>>>>>>> to do the math in our heads. Users generally thank you for that. >>>>>>>>>> Rmerge, however, has served us well for more than half a >>>>>>>>>> century, and I believe Uli Arndt knew what he was doing. I >>>>>>>>>> hope we all know enough about history to realize that future >>>>>>>>>> generations seldom thank their ancestors for "protecting" them from >>>>>>>>>> information. 
>>>>>>>>>> >>>>>>>>>> -James Holton >>>>>>>>>> MAD Scientist >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 7/5/2017 10:36 AM, Graeme Winter wrote: >>>>>>>>>>> Frank, >>>>>>>>>>> >>>>>>>>>>> you are asking me to remove features that I like, so I would feel >>>>>>>>>>> that the challenge is for you to prove that this is harmful. However: >>>>>>>>>>> >>>>>>>>>>> - at the minimum, I find it a useful checksum that the stats >>>>>>>>>>> are internally consistent (though I interpret it for lots of other >>>>>>>>>>> reasons too) >>>>>>>>>>> - it is faulty, I agree, but (with caveats) still useful >>>>>>>>>>> IMHO >>>>>>>>>>> >>>>>>>>>>> Sorry for being terse, but I remain to be convinced that >>>>>>>>>>> removing it increases the amount of information >>>>>>>>>>> >>>>>>>>>>> CC’ing BB as requested >>>>>>>>>>> >>>>>>>>>>> Best wishes Graeme >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On 5 Jul 2017, at 17:17, Frank von Delft >>>>>>>>>>>> <frank.vonde...@sgc.ox.ac.uk> wrote: >>>>>>>>>>>> >>>>>>>>>>>> You keep not answering the challenge. >>>>>>>>>>>> >>>>>>>>>>>> It's really simple: what information does Rmerge provide that >>>>>>>>>>>> Rmeas doesn't? >>>>>>>>>>>> >>>>>>>>>>>> (If you answer, email to the BB.) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 05/07/2017 16:04, graeme.win...@diamond.ac.uk wrote: >>>>>>>>>>>>> Dear Frank, >>>>>>>>>>>>> >>>>>>>>>>>>> You are forcefully arguing essentially that others are wrong if >>>>>>>>>>>>> we feel an existing statistic continues to be useful, and instead >>>>>>>>>>>>> insist that it be outlawed so that we may not make use of it, >>>>>>>>>>>>> just in case someone misinterprets it. >>>>>>>>>>>>> >>>>>>>>>>>>> Very well >>>>>>>>>>>>> >>>>>>>>>>>>> I do however express disquiet that we as software developers feel >>>>>>>>>>>>> browbeaten to remove the output we find useful because “the >>>>>>>>>>>>> community” feels that it is obsolete. 
>>>>>>>>>>>>> >>>>>>>>>>>>> I feel that Jacob’s short story on this thread illustrates that >>>>>>>>>>>>> educating the next generation of crystallographers to understand >>>>>>>>>>>>> what all of the numbers mean is critical, and that a >>>>>>>>>>>>> numerological approach of trying to optimise any one statistic is >>>>>>>>>>>>> essentially doomed. Precisely the same argument could be made for >>>>>>>>>>>>> people cutting the “resolution” at the wrong place in order to >>>>>>>>>>>>> improve the average I/sig(I) of the data set. >>>>>>>>>>>>> >>>>>>>>>>>>> Denying access to information is not a solution to >>>>>>>>>>>>> misinterpretation, from where I am sat, however I acknowledge >>>>>>>>>>>>> that other points of view exist. >>>>>>>>>>>>> >>>>>>>>>>>>> Best wishes Graeme >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 5 Jul 2017, at 12:11, Frank von Delft >>>>>>>>>>>>> <frank.vonde...@sgc.ox.ac.uk<mailto:frank.vonde...@sgc.ox.ac.uk>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Graeme, Andrew >>>>>>>>>>>>> >>>>>>>>>>>>> Jacob is not arguing against an R-based statistic; he's pointing >>>>>>>>>>>>> out that leaving out the multiplicity-weighting is prehistoric >>>>>>>>>>>>> (Diederichs & Karplus published it 20 years ago!). >>>>>>>>>>>>> >>>>>>>>>>>>> So indeed: Rmerge, Rpim and I/sigI give different information. >>>>>>>>>>>>> As you say. >>>>>>>>>>>>> >>>>>>>>>>>>> But no: Rmerge and Rmeas and Rcryst do NOT give different >>>>>>>>>>>>> information. Except: >>>>>>>>>>>>> >>>>>>>>>>>>> * Rmerge is a (potentially) misleading version of Rmeas. >>>>>>>>>>>>> >>>>>>>>>>>>> * Rcryst and Rmerge and Rsym are terms that no longer have >>>>>>>>>>>>> significance in the single cryo-dataset world. >>>>>>>>>>>>> >>>>>>>>>>>>> phx. 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 05/07/2017 09:43, Andrew Leslie wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I would like to support Graeme in his wish to retain Rmerge in >>>>>>>>>>>>> Table 1, essentially for exactly the same reasons. >>>>>>>>>>>>> >>>>>>>>>>>>> I also strongly support Francis Reyes' comment about the >>>>>>>>>>>>> usefulness of Rmerge at low resolution, and I would add to his >>>>>>>>>>>>> list that it can also, in some circumstances, be more indicative >>>>>>>>>>>>> of the wrong choice of symmetry (too high) than the statistics >>>>>>>>>>>>> that come from POINTLESS (excellent though that program is!). >>>>>>>>>>>>> >>>>>>>>>>>>> Andrew >>>>>>>>>>>>> On 5 Jul 2017, at 05:44, Graeme Winter >>>>>>>>>>>>> <graeme.win...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Jacob >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, I got this - and I appreciate the benefit of Rmeas for >>>>>>>>>>>>> dealing with measuring agreement for small-multiplicity >>>>>>>>>>>>> observations. Having this *as well* is very useful, and I agree >>>>>>>>>>>>> Rmeas / Rpim / CC-half should be the primary “quality” statistics. >>>>>>>>>>>>> >>>>>>>>>>>>> However, you asked if there is any reason to *keep* rather >>>>>>>>>>>>> than *eliminate* Rmerge, and I offered one :o) >>>>>>>>>>>>> >>>>>>>>>>>>> I do not see what harm there is in reporting Rmerge, even if it is >>>>>>>>>>>>> just used in the inner shell or just used to capture a flavour of >>>>>>>>>>>>> the data set overall. 
I also appreciate that Rmeas converges to the same value for large multiplicity, i.e.:
>>>>>>>>>>>>>
>>>>>>>>>>>>>                                         Overall  InnerShell  OuterShell
>>>>>>>>>>>>> Low resolution limit                      39.02       39.02        1.39
>>>>>>>>>>>>> High resolution limit                      1.35        6.04        1.35
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rmerge (within I+/I-)                     0.080       0.057       2.871
>>>>>>>>>>>>> Rmerge (all I+ and I-)                    0.081       0.059       2.922
>>>>>>>>>>>>> Rmeas (within I+/I-)                      0.081       0.058       2.940
>>>>>>>>>>>>> Rmeas (all I+ & I-)                       0.082       0.059       2.958
>>>>>>>>>>>>> Rpim (within I+/I-)                       0.013       0.009       0.628
>>>>>>>>>>>>> Rpim (all I+ & I-)                        0.009       0.007       0.453
>>>>>>>>>>>>> Rmerge in top intensity bin               0.050           -           -
>>>>>>>>>>>>> Total number of observations            1265512       16212       53490
>>>>>>>>>>>>> Total number unique                       17515         224        1280
>>>>>>>>>>>>> Mean((I)/sd(I))                            29.7       104.3         1.5
>>>>>>>>>>>>> Mn(I) half-set correlation CC(1/2)        1.000       1.000       0.778
>>>>>>>>>>>>> Completeness                              100.0        99.7       100.0
>>>>>>>>>>>>> Multiplicity                               72.3        72.4        41.8
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anomalous completeness                    100.0       100.0       100.0
>>>>>>>>>>>>> Anomalous multiplicity                     37.2        42.7        21.0
>>>>>>>>>>>>> DelAnom correlation between half-sets     0.497       0.766      -0.026
>>>>>>>>>>>>> Mid-Slope of Anom Normal Probability      1.039           -           -
>>>>>>>>>>>>>
>>>>>>>>>>>>> (this is a good case for Rpim & CC-half as resolution limit criteria)
>>>>>>>>>>>>>
>>>>>>>>>>>>> If the statistics you want to use are there & some others also, what is the pressure to remove them? Surely we want to educate on how best to interpret the entire table above to get a fuller picture of the overall quality of the data? 
My 0th-order request would be to publish the three >>>>>>>>>>>>> shells as above ;o) >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers Graeme >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 4 Jul 2017, at 22:09, Keller, Jacob >>>>>>>>>>>>> <kell...@janelia.hhmi.org> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I suggested replacing Rmerge/sym/cryst with Rmeas, not Rpim. >>>>>>>>>>>>> Rmeas is simply (Rmerge * sqrt(n/(n-1))), where n is the number of >>>>>>>>>>>>> measurements of that reflection. It's merely a way of correcting >>>>>>>>>>>>> for the multiplicity-related artifact of Rmerge, which is >>>>>>>>>>>>> becoming even more of a problem with data sets of increasing >>>>>>>>>>>>> variability in multiplicity. Consider the case of comparing a >>>>>>>>>>>>> data set with a multiplicity of 2 versus one of 100: equivalent >>>>>>>>>>>>> data quality would yield Rmerges diverging by a factor of ~1.4. >>>>>>>>>>>>> But this has all been covered before in several papers. It can be >>>>>>>>>>>>> and is reported in resolution bins, so can be used exactly as you >>>>>>>>>>>>> say. So, why not "disappear" Rmerge from the software? >>>>>>>>>>>>> >>>>>>>>>>>>> The only reasons I could come up with for keeping it are historical >>>>>>>>>>>>> reasons or comparisons to previous datasets, but anyway those >>>>>>>>>>>>> comparisons would be confounded by variabilities in multiplicity >>>>>>>>>>>>> and a hundred other things, so come on, developers, just comment >>>>>>>>>>>>> it out! >>>>>>>>>>>>> >>>>>>>>>>>>> JPK >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: >>>>>>>>>>>>> graeme.win...@diamond.ac. 
uk> [mailto:graeme.win...@diamond.ac.uk] >>>>>>>>>>>>> Sent: Tuesday, July 04, 2017 4:37 PM >>>>>>>>>>>>> To: Keller, Jacob >>>>>>>>>>>>> <kell...@janelia.hhmi.org> >>>>>>>>>>>>> Cc: ccp4bb@jiscmail.ac.uk >>>>>>>>>>>>> Subject: Re: [ccp4bb] Rmergicide Through Programming >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Jacob >>>>>>>>>>>>> >>>>>>>>>>>>> Unbiased estimate of the true unmerged I/sig(I) of your >>>>>>>>>>>>> data (I find this particularly useful at low resolution), >>>>>>>>>>>>> i.e. if your inner-shell Rmerge is 10%, your data agree >>>>>>>>>>>>> very poorly; if it is 2%, your data agree very well, provided >>>>>>>>>>>>> you have sensible multiplicity… obviously depends on >>>>>>>>>>>>> sensible interpretation. Rpim hides this (though it tells you >>>>>>>>>>>>> more about the quality of the average measurement) >>>>>>>>>>>>> >>>>>>>>>>>>> Essentially, for I/sig(I) you can (by and large) adjust your >>>>>>>>>>>>> sig(I) values however you like if you were so inclined. You can >>>>>>>>>>>>> only adjust Rmerge by excluding measurements. >>>>>>>>>>>>> >>>>>>>>>>>>> I would therefore defend that - amongst the other stats >>>>>>>>>>>>> you enumerate below - it still has a place >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers Graeme >>>>>>>>>>>>> >>>>>>>>>>>>> On 4 Jul 2017, at 14:10, Keller, Jacob >>>>>>>>>>>>> <kell...@janelia.hhmi.org> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Rmerge does contain information which complements the others. >>>>>>>>>>>>> >>>>>>>>>>>>> What information? I was trying to think of a counterargument to >>>>>>>>>>>>> what I proposed, but could not think of a reason in the world to >>>>>>>>>>>>> keep reporting it. 
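[Jacob's factor of ~1.4 below is just the ratio of the two multiplicity corrections; a quick arithmetic check (plain Python, nothing crystallographic assumed):]

```python
import math

def correction(n):
    # Rmeas = Rmerge * sqrt(n / (n - 1)), n = multiplicity
    return math.sqrt(n / (n - 1))

# equivalent data quality at multiplicity 2 versus 100: the Rmerge
# values alone differ by the ratio of the corrections
ratio = correction(2) / correction(100)
# ratio is about 1.41, i.e. the "factor of ~1.4"
```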
>>>>>>>>>>>>> >>>>>>>>>>>>> JPK >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 4 Jul 2017, at 12:00, Keller, Jacob >>>>>>>>>>>>> <kell...@janelia.hhmi.org> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Dear Crystallographers, >>>>>>>>>>>>> >>>>>>>>>>>>> Having been repeatedly chagrined about the continued use and >>>>>>>>>>>>> reporting of Rmerge rather than Rmeas or similar, I thought of a >>>>>>>>>>>>> potential way to promote the change: what if merging programs >>>>>>>>>>>>> would completely omit Rmerge/cryst/sym? Is there some reason to >>>>>>>>>>>>> continue to report these stats, or are they just grandfathered >>>>>>>>>>>>> into the software? I doubt that any journal or crystallographer >>>>>>>>>>>>> would insist on reporting Rmerge per se. So, I wonder what >>>>>>>>>>>>> developers would think about commenting out a few lines of their >>>>>>>>>>>>> code and seeing what happens? Maybe a comment to the effect of >>>>>>>>>>>>> "Rmerge is now deprecated; use Rmeas" would be useful as well. >>>>>>>>>>>>> Would something catastrophic happen? >>>>>>>>>>>>> >>>>>>>>>>>>> All the best, >>>>>>>>>>>>> >>>>>>>>>>>>> Jacob Keller >>>>>>>>>>>>> >>>>>>>>>>>>> ******************************************* >>>>>>>>>>>>> Jacob Pearson Keller, PhD >>>>>>>>>>>>> Research Scientist >>>>>>>>>>>>> HHMI Janelia Research Campus / Looger lab >>>>>>>>>>>>> Phone: (571)209-4000 x3159 >>>>>>>>>>>>> Email: kell...@janelia.hhmi.org >>>>>>>>>>>>> ******************************************* > > -- > John Berrisford > PDBe > European Bioinformatics Institute (EMBL-EBI) European Molecular > Biology Laboratory Wellcome Trust Genome Campus Hinxton Cambridge CB10 > 1SD UK > Tel: +44 1223 492529 > > http://www.pdbe.org > http://www.facebook.com/proteindatabank > http://twitter.com/PDBeurope