Hello, I think this needs a little bit of crystarchaeology.
Rmerge and Rsym used to be different. This was at a time when data sets were typically collected from several crystals, pre-cryo-cooling, with data recorded on photographic film (Arndt-Wonacott cameras). Rmerge = agreement R-factor from data from several crystals; Rsym = agreement R-factor from symmetry-equivalents within one crystal. [I just type "agreement R-factor" in order not to have to type the formulae] At that time, people were confused about these two terms. Nowadays both are (used as) synonyms. Cheers, Fred. -----Original Message----- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Phil Evans Sent: Monday, July 10, 2017 5:43 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] AW: [ccp4bb] Rmergicide Through Programming What is the difference between Rmerge and Rsym - I thought they were the same? Rrim == Rmeas I think Phil > On 10 Jul 2017, at 15:18, John Berrisford <j...@ebi.ac.uk> wrote: > > Dear Herman > > The new PDB deposition system (OneDep) allows you to enter values for Rmerge, > Rsym, Rpim, Rrim and / or CC half. If, during deposition, you do not provide > a value for any of these metrics then we will ask you for a value for one of > them. > > Also, PDB format is a legacy format for the PDB. In 2014 mmCIF became the > archive format for the PDB and some large entries are no longer distributed > in PDB format. mmCIF is not limited by the constraints of punch cards. > > Please see > https://www.wwpdb.org/documentation/file-formats-and-the-pdb > > Regards > > John > > PDBe > > > > On 10/07/2017 09:26, herman.schreu...@sanofi.com wrote: >> Dear All, >> >> For me this whole discussion is an example of a large number of people >> barking up the wrong tree. The real issue is not whether data processing >> programs print, amongst many quality indicators, an Rmerge as well, but the >> fact that the PDB and many journals still insist on using the Rmerge as >> primary quality indicator. 
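[The "agreement R-factors" Fred declines to type out recur throughout this thread; as a reference point, a minimal Python sketch of the standard Rmerge, Rmeas and Rpim definitions (per-hkl sums, with the Diederichs & Karplus multiplicity weights). The function name and toy intensities are mine, purely illustrative:]

```python
from statistics import mean

def r_factors(reflections):
    """Standard agreement R-factors from per-hkl observation lists:
    Rmerge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i, with
    Rmeas and Rpim weighting each hkl's deviations by sqrt(n/(n-1))
    and sqrt(1/(n-1)) respectively (Diederichs & Karplus).
    Reflections measured fewer than twice are skipped."""
    num_merge = num_meas = num_pim = denom = 0.0
    for obs in reflections:
        n = len(obs)
        if n < 2:
            continue  # one measurement gives no spread to compare
        dev = sum(abs(i - mean(obs)) for i in obs)
        num_merge += dev
        num_meas += (n / (n - 1)) ** 0.5 * dev
        num_pim += (1 / (n - 1)) ** 0.5 * dev
        denom += sum(obs)
    return num_merge / denom, num_meas / denom, num_pim / denom

# toy example: two reflections, each measured twice
rmerge, rmeas, rpim = r_factors([[100.0, 110.0], [50.0, 54.0]])
# at uniform multiplicity 2, Rmeas = sqrt(2) * Rmerge and Rpim = Rmerge
```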
As long as this is true, novice scientists might >> be led to believe that Rmerge is the most important quality indicator. As >> soon as the PDB and the journals request some other indicator, this will be >> over. So that is where we should direct our efforts. >> >> I don't understand at all why the PDB still insists on an obsolete quality >> indicator. However, the PDB format for the coordinates also dates back to >> the 1960s, to be used with punch cards. >> >> My 2 cents. >> Herman >> >> >> >> -----Original Message----- >> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf >> Of Edward A. Berry >> Sent: Saturday, 8 July 2017 22:31 >> To: CCP4BB@JISCMAIL.AC.UK >> Subject: Re: [ccp4bb] Rmergicide Through Programming >> >> But R-merge is not really narrower as a fraction of the mean value; it just >> gets smaller proportionately as all the numbers get smaller: >> an RMSD of .0043 for R-meas multiplied by a factor of 0.022/.027 gives 0.0035, >> which is the RMSD for Rmerge. The same was true in the previous example. You >> could multiply R-meas by .5 or .2 and get a sharper distribution yet! And >> that factor would be constant, whereas this only applies for super-low >> redundancy. >> >> On 07/08/2017 03:23 PM, James Holton wrote: >>> The expected distribution of Rmeas values is still wider than that of >>> Rmerge for data with I/sigma=30 and average multiplicity=2.0. Graph >>> attached. >>> >>> I expect that anytime you incorporate more than one source of information >>> you run the risk of a noisier statistic, because every source of information >>> can contain noise. That is, Rmeas combines information about multiplicity >>> with the absolute deviates in the data to form a statistic that is more >>> accurate than Rmerge, but also (potentially) less precise. >>> >>> Perhaps that is what we are debating here? Which is better, accuracy or >>> precision? Personally, I prefer to know both. 
>>> >>> -James Holton >>> MAD Scientist >>> >>> On 7/8/2017 11:02 AM, Frank von Delft wrote: >>>> It is quite easy to end up with low multiplicities in the low resolution >>>> shell, especially for low symmetry and fast-decaying crystals. >>>> >>>> It is this scenario where Rmerge (lowres) is more misleading than Rmeas. >>>> >>>> phx >>>> >>>> >>>> On 08/07/2017 17:31, James Holton wrote: >>>>> What does Rmeas tell us that Rmerge doesn't, given that we know the >>>>> multiplicity? >>>>> >>>>> -James Holton >>>>> MAD Scientist >>>>> >>>>> On 7/8/2017 9:15 AM, Frank von Delft wrote: >>>>>> Anyway, back to reality: does anybody still use R statistics to >>>>>> evaluate anything other than /strong/ data? Certainly I never look at >>>>>> them except for the low-resolution bin (or strongest reflections). >>>>>> Specifically, a "2%-dataset" in that bin is probably healthy, while a >>>>>> "9%-dataset" probably Has Issues. >>>>>> >>>>>> In which case, back to Jacob's question: what does Rmerge tell us that >>>>>> Rmeas doesn't? >>>>>> >>>>>> phx >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 08/07/2017 17:02, James Holton wrote: >>>>>>> Sorry for the confusion. I was going for brevity! And failed. >>>>>>> >>>>>>> I know that the multiplicity correction is applied on a per-hkl basis >>>>>>> in the calculation of Rmeas. However, the average multiplicity over >>>>>>> the whole calculation is most likely not an integer. Some hkls may be >>>>>>> observed twice while others only once, or perhaps 3-4 times in the same >>>>>>> scaling run. >>>>>>> >>>>>>> Allow me to do the error propagation properly. Consider the scenario: >>>>>>> >>>>>>> Your outer resolution bin has a true I/sigma = 1.00 and average >>>>>>> multiplicity of 2.0. Let's say there are 100 hkl indices in this bin. >>>>>>> I choose the "true" intensities of each hkl from an exponential (aka >>>>>>> Wilson) distribution. 
Further assume the background is high, so the >>>>>>> error in each observation after background subtraction may be taken >>>>>>> from a Gaussian distribution. Let's further choose the per-hkl >>>>>>> multiplicity from a Poisson distribution with expectation value 2.0, so >>>>>>> 0 is possible, but the long-term average multiplicity is 2.0. For R >>>>>>> calculation, when the multiplicity of any given hkl is less than 2 it is >>>>>>> skipped. What I end up with after 120,000 trials is a distribution of >>>>>>> values for each R factor. See attached graph. >>>>>>> >>>>>>> What I hope is readily apparent is that the distribution of >>>>>>> Rmerge values is taller and sharper than that of the Rmeas values. The >>>>>>> most likely Rmeas is 80% and that of Rmerge is 64.6%. This is >>>>>>> expected, of course. But what I hope to impress upon you is that the >>>>>>> most likely value is not generally the one that you will get! The >>>>>>> distribution has a width. Specifically, Rmeas could be as low as 40%, >>>>>>> or as high as 209%, depending on the trial. Half of the trial results >>>>>>> fall between 71.4% and 90.3%, a range of 19 percentage points. >>>>>>> Rmerge has a middle-half range from 57.6% to 72.9% (15.3 percentage >>>>>>> points). This range of possible values of Rmerge or Rmeas from data >>>>>>> with the same intrinsic quality is what I mean when I say "numerical >>>>>>> instability". Each and every trial had the same true I/sigma and >>>>>>> multiplicity, and yet the R factors I get vary depending on the trial. >>>>>>> Unfortunately for most of us with real data, you only ever get one >>>>>>> trial, and you can't predict which Rmeas or Rmerge you'll get. >>>>>>> >>>>>>> My point here is that R statistics in general are not comparable from >>>>>>> experiment to experiment when you are looking at data with low average >>>>>>> intensity and low multiplicity, and it appears that Rmeas is less >>>>>>> stable than Rmerge. 
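[For anyone who wants to reproduce the spread James describes, a rough re-creation of his trial recipe as stated: exponential (Wilson) true intensities, Gaussian observation errors, Poisson multiplicity with mean 2.0, hkls with fewer than two observations skipped. This is my own sketch, not his actual script, so the exact percentages will differ from his graph:]

```python
import math
import random

def poisson(rng, lam):
    # Knuth's algorithm for a Poisson-distributed deviate with mean lam
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def one_trial(n_hkl=100, mult_mean=2.0, i_over_sigma=1.0, seed=1):
    """One trial: exponential (Wilson) true intensities with mean 1,
    Gaussian observation errors, Poisson per-hkl multiplicity;
    hkls observed fewer than twice are skipped."""
    rng = random.Random(seed)
    sigma = 1.0 / i_over_sigma            # true I/sigma = 1.0 on average
    num_merge = num_meas = denom = 0.0
    for _ in range(n_hkl):
        i_true = rng.expovariate(1.0)     # Wilson distribution
        n = poisson(rng, mult_mean)
        if n < 2:
            continue
        obs = [i_true + rng.gauss(0.0, sigma) for _ in range(n)]
        i_bar = sum(obs) / n
        dev = sum(abs(i - i_bar) for i in obs)
        num_merge += dev
        num_meas += math.sqrt(n / (n - 1)) * dev
        denom += sum(obs)
    return num_merge / denom, num_meas / denom

# each seed is one "trial"; both R factors jump around between trials
rmerge, rmeas = one_trial(seed=1)
```

[Looping this over many seeds and histogramming the two returned values gives distributions of the kind described, with Rmeas always the larger of the pair.]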
Not by much, mind you, but it still jumps around more. >>>>>>> >>>>>>> Hope that is clearer? >>>>>>> >>>>>>> Note that in no way am I suggesting that low multiplicity is the right >>>>>>> way to collect data. Far from it. Especially with modern detectors >>>>>>> that have negligible read-out noise. But when microcrystals only give >>>>>>> off a handful of photons each before they die, low multiplicity might >>>>>>> be all you have. >>>>>>> >>>>>>> -James Holton >>>>>>> MAD Scientist >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 7/7/2017 2:33 PM, Edward A. Berry wrote: >>>>>>>> I think the confusion here is that the "multiplicity correction" >>>>>>>> is applied on each reflection, where it will be an integer 2 or >>>>>>>> greater (you can't estimate variance with only one measurement). >>>>>>>> You can only correct in an approximate way using the >>>>>>>> average multiplicity of the dataset, since it would depend on the >>>>>>>> distribution of multiplicity over the reflections. >>>>>>>> >>>>>>>> And the correction is for R-merge. You don't need to apply a >>>>>>>> correction to R-meas. >>>>>>>> R-meas is a redundancy-independent best estimate of the variance. >>>>>>>> Whatever you would have used R-merge for (hopefully making >>>>>>>> allowance for the multiplicity) you can use R-meas for and not worry about >>>>>>>> multiplicity. >>>>>>>> Again, what information does R-merge provide that R-meas does >>>>>>>> not provide in a more accurate way? >>>>>>>> >>>>>>>> According to the Denzo manual, one way to artificially reduce >>>>>>>> R-merge is to include reflections with only one measurement >>>>>>>> (averaging in a lot of zeros always helps bring an average >>>>>>>> down), and they say there were actually some programs that did >>>>>>>> that. However, I'm quite sure none of the ones we rely on today do that. >>>>>>>> >>>>>>>> On 07/07/2017 03:12 PM, Kay Diederichs wrote: >>>>>>>>> James, >>>>>>>>> >>>>>>>>> I cannot follow you. 
"n approaches 1" can only mean n = 2, because n >>>>>>>>> is an integer. And for n=2 the sqrt(n/(n-1)) factor is well-defined. For >>>>>>>>> n=1, contributions to neither Rmeas nor Rmerge nor any other >>>>>>>>> precision indicator can be calculated anyway, because there's nothing >>>>>>>>> this measurement can be compared against. >>>>>>>>> >>>>>>>>> just my 2 cents, >>>>>>>>> >>>>>>>>> Kay >>>>>>>>> >>>>>>>>> On Fri, 7 Jul 2017 10:57:17 -0700, James Holton >>>>>>>>> <jmhol...@slac.stanford.edu> wrote: >>>>>>>>> >>>>>>>>>> I happen to be one of those people who think Rmerge is a very >>>>>>>>>> useful statistic. Not as a method of evaluating the >>>>>>>>>> resolution limit, which is mathematically ridiculous, but for >>>>>>>>>> a host of other important things, like evaluating the >>>>>>>>>> performance of data collection equipment, and evaluating the >>>>>>>>>> isomorphism of different crystals, to name a few. >>>>>>>>>> >>>>>>>>>> I like Rmerge because it is a simple statistic that has a >>>>>>>>>> simple formula and has not undergone any "corrections". >>>>>>>>>> Corrections increase complexity, and complexity opens the >>>>>>>>>> door to manipulation by the desperate and/or misguided. For >>>>>>>>>> example, overzealous outlier rejection is a common way to >>>>>>>>>> abuse R factors, and it is far too often swept under the rug, >>>>>>>>>> sometimes without the user even knowing about it. This is >>>>>>>>>> especially problematic when working in a regime where the statistic >>>>>>>>>> of interest is unstable, and for R factors this is low-intensity >>>>>>>>>> data. >>>>>>>>>> Rejecting just the right "outliers" can make any R factor >>>>>>>>>> look a lot better. Why would Rmeas be any more unstable than >>>>>>>>>> Rmerge? Look at the formula. There is an "n-1" in the >>>>>>>>>> denominator, where n is the multiplicity. So, what happens >>>>>>>>>> when n approaches 1? What happens when n=1? This is not to >>>>>>>>>> say Rmerge is better than Rmeas. 
In fact, I believe the >>>>>>>>>> latter is generally superior to the former, unless you are >>>>>>>>>> working near n = 1. The sqrt(n/(n-1)) is trying to correct >>>>>>>>>> for bias in the R statistic, but fighting one infinity with another >>>>>>>>>> infinity is a dangerous game. >>>>>>>>>> >>>>>>>>>> My point is that neither Rmerge nor Rmeas is easily >>>>>>>>>> interpreted without knowing the multiplicity. If you see >>>>>>>>>> Rmeas = 10% and the multiplicity is 10, then you know what >>>>>>>>>> that means. Same for Rmerge, since at n=10 both stats have >>>>>>>>>> nearly the same value. But if you have Rmeas = 45% and >>>>>>>>>> multiplicity = 1.05, what does that mean? Rmeas will be only >>>>>>>>>> 33% if the multiplicity is rounded up to 1.1. This is what I >>>>>>>>>> mean by "numerical instability": the value of the R statistic >>>>>>>>>> itself becomes sensitive to small amounts of noise, and >>>>>>>>>> behaves more and more like a random number generator. And if >>>>>>>>>> you have Rmeas = 33% and no indication of multiplicity, it is >>>>>>>>>> hard to know what is going on. I personally am a lot more >>>>>>>>>> comfortable seeing qualitative agreement between Rmerge and Rmeas, >>>>>>>>>> because that means the numerical instability of the multiplicity >>>>>>>>>> correction didn't mess anything up. >>>>>>>>>> >>>>>>>>>> Of course, when the intensity is weak, R statistics in general >>>>>>>>>> are not useful. Both Rmeas and Rmerge have the sum of all >>>>>>>>>> intensities in the denominator, so when the bin-wide sum >>>>>>>>>> approaches zero you have another infinity to contend with. >>>>>>>>>> This one starts to rear its ugly head once I/sigma drops >>>>>>>>>> below about 3, and this is why our ancestors always applied a >>>>>>>>>> sigma cutoff before computing an R factor. Our small-molecule >>>>>>>>>> colleagues still do this! They call it "R1". And it is an >>>>>>>>>> excellent indicator of the overall relative error. 
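[James's 45%-to-33% example can be checked in a couple of lines; this sketch applies the correction with the *average* multiplicity, which, as Ed Berry notes elsewhere in the thread, is only an approximation to the per-hkl correction:]

```python
import math

def rmeas_factor(n):
    # the multiplicity correction sqrt(n / (n - 1)); diverges as n -> 1
    return math.sqrt(n / (n - 1))

# same underlying deviations, average multiplicity nudged from 1.05 to 1.10
rmeas_at_105 = 0.45
rmeas_at_110 = rmeas_at_105 * rmeas_factor(1.10) / rmeas_factor(1.05)
# rmeas_at_110 lands near 0.33: a large move for a tiny change in n
```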
The >>>>>>>>>> relative error in the outermost bin is not meaningful, and strangely >>>>>>>>>> enough nobody ever reported the outer-resolution Rmerge before 1995. >>>>>>>>>> >>>>>>>>>> For weak signals, correlation coefficients are better, but >>>>>>>>>> for strong signals CC pegs out at >95%, making it harder to see >>>>>>>>>> relative errors. >>>>>>>>>> I/sigma is what we'd like to know, but the value of "sigma" >>>>>>>>>> is still prone to manipulation by not just outlier rejection, >>>>>>>>>> but massaging of the so-called "error model". Suffice it to >>>>>>>>>> say, crystallographic data contain more than one type of error. >>>>>>>>>> Some sources are important for weak spots, others are >>>>>>>>>> important for strong spots, and still others are only >>>>>>>>>> apparent in the mid-range. Some sources of error are only >>>>>>>>>> important at low multiplicity, and others only manifest at high >>>>>>>>>> multiplicity. >>>>>>>>>> There is no single number that can be used to evaluate all aspects >>>>>>>>>> of data quality. >>>>>>>>>> >>>>>>>>>> So, I remain a champion of reporting Rmerge. Not in the >>>>>>>>>> high-angle bin, because that is essentially a random number, >>>>>>>>>> but reporting overall Rmerge and low-angle-bin Rmerge next to >>>>>>>>>> multiplicity, Rmeas, CC1/2 and other statistics is the only >>>>>>>>>> way to glean enough information about where the errors >>>>>>>>>> are coming from in the data. Rmeas is a useful addition >>>>>>>>>> because it helps us correct for multiplicity without having >>>>>>>>>> to do the math in our heads. Users generally thank you for that. >>>>>>>>>> Rmerge, however, has served us well for more than half a >>>>>>>>>> century, and I believe Uli Arndt knew what he was doing. I >>>>>>>>>> hope we all know enough about history to realize that future >>>>>>>>>> generations seldom thank their ancestors for "protecting" them from >>>>>>>>>> information. 
>>>>>>>>>> >>>>>>>>>> -James Holton >>>>>>>>>> MAD Scientist >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 7/5/2017 10:36 AM, Graeme Winter wrote: >>>>>>>>>>> Frank, >>>>>>>>>>> >>>>>>>>>>> you are asking me to remove features that I like, so I would feel >>>>>>>>>>> that the challenge is for you to prove that this is harmful. However: >>>>>>>>>>> >>>>>>>>>>> - at the minimum, I find it a useful checksum that the stats >>>>>>>>>>> are internally consistent (though I interpret it for lots of other >>>>>>>>>>> reasons too) >>>>>>>>>>> - it is faulty, I agree, but (with caveats) still useful >>>>>>>>>>> IMHO >>>>>>>>>>> >>>>>>>>>>> Sorry for being terse, but I remain to be convinced that >>>>>>>>>>> removing it increases the amount of information >>>>>>>>>>> >>>>>>>>>>> CC’ing BB as requested >>>>>>>>>>> >>>>>>>>>>> Best wishes Graeme >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On 5 Jul 2017, at 17:17, Frank von Delft >>>>>>>>>>>> <frank.vonde...@sgc.ox.ac.uk> wrote: >>>>>>>>>>>> >>>>>>>>>>>> You keep not answering the challenge. >>>>>>>>>>>> >>>>>>>>>>>> It's really simple: what information does Rmerge provide that >>>>>>>>>>>> Rmeas doesn't? >>>>>>>>>>>> >>>>>>>>>>>> (If you answer, email to the BB.) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 05/07/2017 16:04, graeme.win...@diamond.ac.uk wrote: >>>>>>>>>>>>> Dear Frank, >>>>>>>>>>>>> >>>>>>>>>>>>> You are forcefully arguing essentially that others are wrong if >>>>>>>>>>>>> we feel an existing statistic continues to be useful, and instead >>>>>>>>>>>>> insist that it be outlawed so that we may not make use of it, >>>>>>>>>>>>> just in case someone misinterprets it. >>>>>>>>>>>>> >>>>>>>>>>>>> Very well >>>>>>>>>>>>> >>>>>>>>>>>>> I do however express disquiet that we as software developers feel >>>>>>>>>>>>> browbeaten to remove the output we find useful because “the >>>>>>>>>>>>> community” feels that it is obsolete. 
>>>>>>>>>>>>> >>>>>>>>>>>>> I feel that Jacob’s short story on this thread illustrates that >>>>>>>>>>>>> educating the next generation of crystallographers to understand >>>>>>>>>>>>> what all of the numbers mean is critical, and that a >>>>>>>>>>>>> numerological approach of trying to optimise any one statistic is >>>>>>>>>>>>> essentially doomed. Precisely the same argument could be made for >>>>>>>>>>>>> people cutting the “resolution” at the wrong place in order to >>>>>>>>>>>>> improve the average I/sig(I) of the data set. >>>>>>>>>>>>> >>>>>>>>>>>>> Denying access to information is not a solution to >>>>>>>>>>>>> misinterpretation, from where I am sat, however I acknowledge >>>>>>>>>>>>> that other points of view exist. >>>>>>>>>>>>> >>>>>>>>>>>>> Best wishes Graeme >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 5 Jul 2017, at 12:11, Frank von Delft >>>>>>>>>>>>> <frank.vonde...@sgc.ox.ac.uk<mailto:frank.vonde...@sgc.ox.ac.uk>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Graeme, Andrew >>>>>>>>>>>>> >>>>>>>>>>>>> Jacob is not arguing against an R-based statistic; he's pointing >>>>>>>>>>>>> out that leaving out the multiplicity-weighting is prehistoric >>>>>>>>>>>>> (Diederichs & Karplus published it 20 years ago!). >>>>>>>>>>>>> >>>>>>>>>>>>> So indeed: Rmerge, Rpim and I/sigI give different information. >>>>>>>>>>>>> As you say. >>>>>>>>>>>>> >>>>>>>>>>>>> But no: Rmerge and Rmeas and Rcryst do NOT give different >>>>>>>>>>>>> information. Except: >>>>>>>>>>>>> >>>>>>>>>>>>> * Rmerge is a (potentially) misleading version of Rmeas. >>>>>>>>>>>>> >>>>>>>>>>>>> * Rcryst and Rmerge and Rsym are terms that no longer have >>>>>>>>>>>>> significance in the single cryo-dataset world. >>>>>>>>>>>>> >>>>>>>>>>>>> phx. 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 05/07/2017 09:43, Andrew Leslie wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I would like to support Graeme in his wish to retain Rmerge in >>>>>>>>>>>>> Table 1, essentially for exactly the same reasons. >>>>>>>>>>>>> >>>>>>>>>>>>> I also strongly support Francis Reyes' comment about the >>>>>>>>>>>>> usefulness of Rmerge at low resolution, and I would add to his >>>>>>>>>>>>> list that it can also, in some circumstances, be more indicative >>>>>>>>>>>>> of the wrong choice of symmetry (too high) than the statistics >>>>>>>>>>>>> that come from POINTLESS (excellent though that program is!). >>>>>>>>>>>>> >>>>>>>>>>>>> Andrew >>>>>>>>>>>>> On 5 Jul 2017, at 05:44, Graeme Winter >>>>>>>>>>>>> <graeme.win...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Jacob >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, I got this - and I appreciate the benefit of Rmeas for >>>>>>>>>>>>> dealing with measuring agreement for small-multiplicity >>>>>>>>>>>>> observations. Having this *as well* is very useful, and I agree >>>>>>>>>>>>> Rmeas / Rpim / CC-half should be the primary “quality” statistics. >>>>>>>>>>>>> >>>>>>>>>>>>> However, you asked if there is any reason to *keep* rather >>>>>>>>>>>>> than *eliminate* Rmerge, and I offered one :o) >>>>>>>>>>>>> >>>>>>>>>>>>> I do not see what harm there is in reporting Rmerge, even if it is >>>>>>>>>>>>> just used in the inner shell or just used to capture a flavour of >>>>>>>>>>>>> the data set overall. 
I also appreciate that Rmeas converges to the same value for large multiplicity, i.e.:
>>>>>>>>>>>>>
>>>>>>>>>>>>>                                         Overall  InnerShell  OuterShell
>>>>>>>>>>>>> Low resolution limit                      39.02       39.02        1.39
>>>>>>>>>>>>> High resolution limit                      1.35        6.04        1.35
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rmerge (within I+/I-)                     0.080       0.057       2.871
>>>>>>>>>>>>> Rmerge (all I+ and I-)                    0.081       0.059       2.922
>>>>>>>>>>>>> Rmeas (within I+/I-)                      0.081       0.058       2.940
>>>>>>>>>>>>> Rmeas (all I+ & I-)                       0.082       0.059       2.958
>>>>>>>>>>>>> Rpim (within I+/I-)                       0.013       0.009       0.628
>>>>>>>>>>>>> Rpim (all I+ & I-)                        0.009       0.007       0.453
>>>>>>>>>>>>> Rmerge in top intensity bin               0.050           -           -
>>>>>>>>>>>>> Total number of observations            1265512       16212       53490
>>>>>>>>>>>>> Total number unique                       17515         224        1280
>>>>>>>>>>>>> Mean((I)/sd(I))                            29.7       104.3         1.5
>>>>>>>>>>>>> Mn(I) half-set correlation CC(1/2)        1.000       1.000       0.778
>>>>>>>>>>>>> Completeness                              100.0        99.7       100.0
>>>>>>>>>>>>> Multiplicity                               72.3        72.4        41.8
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anomalous completeness                    100.0       100.0       100.0
>>>>>>>>>>>>> Anomalous multiplicity                     37.2        42.7        21.0
>>>>>>>>>>>>> DelAnom correlation between half-sets     0.497       0.766      -0.026
>>>>>>>>>>>>> Mid-Slope of Anom Normal Probability      1.039           -           -
>>>>>>>>>>>>>
>>>>>>>>>>>>> (this is a good case for Rpim & CC-half as resolution limit criteria)
>>>>>>>>>>>>>
>>>>>>>>>>>>> If the statistics you want to use are there & some others also, what is the pressure to remove them? Surely we want to educate on how best to interpret the entire table above to get a fuller picture of the overall quality of the data? 
My 0th-order request would be to publish the three >>>>>>>>>>>>> shells as above ;o) >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers Graeme >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 4 Jul 2017, at 22:09, Keller, Jacob >>>>>>>>>>>>> <kell...@janelia.hhmi.org> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I suggested replacing Rmerge/sym/cryst with Rmeas, not Rpim. >>>>>>>>>>>>> Rmeas is simply (Rmerge * sqrt(n/(n-1))), where n is the number of >>>>>>>>>>>>> measurements of that reflection. It's merely a way of correcting >>>>>>>>>>>>> for the multiplicity-related artifact of Rmerge, which is >>>>>>>>>>>>> becoming even more of a problem with data sets of increasing >>>>>>>>>>>>> variability in multiplicity. Consider the case of comparing a >>>>>>>>>>>>> data set with a multiplicity of 2 versus one of 100: equivalent >>>>>>>>>>>>> data quality would yield Rmerges diverging by a factor of ~1.4. >>>>>>>>>>>>> But this has all been covered before in several papers. It can be >>>>>>>>>>>>> and is reported in resolution bins, so can be used exactly as you >>>>>>>>>>>>> say. So, why not "disappear" Rmerge from the software? >>>>>>>>>>>>> >>>>>>>>>>>>> The only reasons I could come up with for keeping it are historical >>>>>>>>>>>>> reasons or comparisons to previous datasets, but anyway those >>>>>>>>>>>>> comparisons would be confounded by variabilities in multiplicity >>>>>>>>>>>>> and a hundred other things, so come on, developers, just comment >>>>>>>>>>>>> it out! >>>>>>>>>>>>> >>>>>>>>>>>>> JPK >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: >>>>>>>>>>>>> graeme.win...@diamond.ac. 
uk> [mailto:graeme.win...@diamond.ac.uk] >>>>>>>>>>>>> Sent: Tuesday, July 04, 2017 4:37 PM >>>>>>>>>>>>> To: Keller, Jacob >>>>>>>>>>>>> <kell...@janelia.hhmi.org> >>>>>>>>>>>>> Cc: ccp4bb@jiscmail.ac.uk >>>>>>>>>>>>> Subject: Re: [ccp4bb] Rmergicide Through Programming >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Jacob >>>>>>>>>>>>> >>>>>>>>>>>>> Unbiased estimate of the true unmerged I/sig(I) of your >>>>>>>>>>>>> data (I find this particularly useful at low resolution), >>>>>>>>>>>>> i.e. if your inner-shell Rmerge is 10%, your data agree >>>>>>>>>>>>> very poorly; if it is 2%, your data agree very well, provided >>>>>>>>>>>>> you have sensible multiplicity… obviously depends on >>>>>>>>>>>>> sensible interpretation. Rpim hides this (though it tells you >>>>>>>>>>>>> more about the quality of the average measurement) >>>>>>>>>>>>> >>>>>>>>>>>>> Essentially, for I/sig(I) you can (by and large) adjust your >>>>>>>>>>>>> sig(I) values however you like if you were so inclined. You can >>>>>>>>>>>>> only adjust Rmerge by excluding measurements. >>>>>>>>>>>>> >>>>>>>>>>>>> I would therefore defend that - amongst the other stats >>>>>>>>>>>>> you enumerate below - it still has a place >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers Graeme >>>>>>>>>>>>> >>>>>>>>>>>>> On 4 Jul 2017, at 14:10, Keller, Jacob >>>>>>>>>>>>> <kell...@janelia.hhmi.org> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Rmerge does contain information which complements the others. >>>>>>>>>>>>> >>>>>>>>>>>>> What information? I was trying to think of a counterargument to >>>>>>>>>>>>> what I proposed, but could not think of a reason in the world to >>>>>>>>>>>>> keep reporting it. 
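[Jacob's factor of ~1.4 below is just the ratio of the two multiplicity corrections; a quick arithmetic check (plain Python, nothing crystallographic assumed):]

```python
import math

def correction(n):
    # Rmeas = Rmerge * sqrt(n / (n - 1)), n = multiplicity
    return math.sqrt(n / (n - 1))

# equivalent data quality at multiplicity 2 versus 100: the Rmerge
# values alone differ by the ratio of the corrections
ratio = correction(2) / correction(100)
# ratio is about 1.41, i.e. the "factor of ~1.4"
```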
>>>>>>>>>>>>> >>>>>>>>>>>>> JPK >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 4 Jul 2017, at 12:00, Keller, Jacob >>>>>>>>>>>>> <kell...@janelia.hhmi.org> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Dear Crystallographers, >>>>>>>>>>>>> >>>>>>>>>>>>> Having been repeatedly chagrined about the continued use and >>>>>>>>>>>>> reporting of Rmerge rather than Rmeas or similar, I thought of a >>>>>>>>>>>>> potential way to promote the change: what if merging programs >>>>>>>>>>>>> would completely omit Rmerge/cryst/sym? Is there some reason to >>>>>>>>>>>>> continue to report these stats, or are they just grandfathered >>>>>>>>>>>>> into the software? I doubt that any journal or crystallographer >>>>>>>>>>>>> would insist on reporting Rmerge per se. So, I wonder what >>>>>>>>>>>>> developers would think about commenting out a few lines of their >>>>>>>>>>>>> code and seeing what happens? Maybe a comment to the effect of >>>>>>>>>>>>> "Rmerge is now deprecated; use Rmeas" would be useful as well. >>>>>>>>>>>>> Would something catastrophic happen? >>>>>>>>>>>>> >>>>>>>>>>>>> All the best, >>>>>>>>>>>>> >>>>>>>>>>>>> Jacob Keller >>>>>>>>>>>>> >>>>>>>>>>>>> ******************************************* >>>>>>>>>>>>> Jacob Pearson Keller, PhD >>>>>>>>>>>>> Research Scientist >>>>>>>>>>>>> HHMI Janelia Research Campus / Looger lab >>>>>>>>>>>>> Phone: (571)209-4000 x3159 >>>>>>>>>>>>> Email: kell...@janelia.hhmi.org >>>>>>>>>>>>> ******************************************* > > -- > John Berrisford > PDBe > European Bioinformatics Institute (EMBL-EBI) European Molecular > Biology Laboratory Wellcome Trust Genome Campus Hinxton Cambridge CB10 > 1SD UK > Tel: +44 1223 492529 > > http://www.pdbe.org > http://www.facebook.com/proteindatabank > http://twitter.com/PDBeurope