Hi James,

May I just offer a short counter-argument to your case for not including
weak reflections in the merging residuals?

Unlike many people, I rather like Rmerge - not because it tells you how good
the data are, but because it gives you a clue as to how well the unmerged
measurements agree with one another. It has already been mentioned on this
thread that Rmerge is ~ 0.8 / <I/sigma>, which means the relation also works
in reverse: an Rmerge of 0.8 indicates that the average measurement in the
shell has an I/sigma of ~ 1 (presuming there are enough multiple measurements
- if the multiplicity is < 3 or so this can be nonsense).
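
The 0.8, as far as I understand it, is just sqrt(2/pi) - the mean absolute
deviation of a Gaussian is about 0.8 of its sigma. A throwaway numpy sketch
(entirely my own toy, assuming a constant true intensity and purely Gaussian
errors, nothing lifted from any real processing program) shows the relation:

```python
import numpy as np

rng = np.random.default_rng(0)

n_refl, mult = 10000, 20    # reflections in the "shell" and their multiplicity
i_true, sigma = 10.0, 10.0  # so each unmerged measurement has I/sigma ~ 1

# unmerged measurements, one row per unique reflection
i_obs = i_true + rng.normal(0.0, sigma, size=(n_refl, mult))

i_mean = i_obs.mean(axis=1, keepdims=True)
rmerge = np.abs(i_obs - i_mean).sum() / i_obs.sum()
i_over_sig = (i_obs / sigma).mean()  # average per-measurement I/sigma

print(f"<I/sigma> of unmerged measurements: {i_over_sig:.2f}")
print(f"Rmerge:                             {rmerge:.2f}")
print(f"sqrt(2/pi) / <I/sigma>:             {np.sqrt(2 / np.pi) / i_over_sig:.2f}")
```

At high multiplicity the two numbers agree closely; at a multiplicity of 2 or
3 the deviations from the group mean are damped by a factor of sqrt((m-1)/m),
which is where the caveat above comes from.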

This does not depend on the error model, or on the multiplicity beyond it
being sufficient; it just talks about the average measurement. Now, if we
exclude all measurements with an I/sigma of less than three, we have no idea
how strong the reflections in the shell are on average. We're just
top-slicing the good reflections and asking how well they agree. Well, with
an I/sigma > 3 I would hope they agree rather well if your error model is
reasonable, and it would suddenly become rare to see an Rmerge > 0.3 in the
outer shell.
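
To put a number on the top-slicing effect, here is another quick toy of my
own (exponentially distributed true intensities standing in for a weak outer
shell, again nothing to do with any real program): it computes Rmerge once
over everything and once over only those measurements that happen to come
out above 3 sigma.

```python
import numpy as np

rng = np.random.default_rng(1)

n_refl, mult, sigma = 10000, 10, 10.0
# exponentially distributed true intensities, so the shell has <I/sigma> ~ 1
i_true = rng.exponential(scale=10.0, size=(n_refl, 1))
i_obs = i_true + rng.normal(0.0, sigma, size=(n_refl, mult))

def rmerge(intensities, keep):
    """Rmerge over the kept measurements; groups with < 2 survivors are dropped."""
    num = den = 0.0
    for row, mask in zip(intensities, keep):
        vals = row[mask]
        if vals.size < 2:
            continue
        num += np.abs(vals - vals.mean()).sum()
        den += vals.sum()
    return num / den

everything = np.ones_like(i_obs, dtype=bool)
top_slice = (i_obs / sigma) > 3.0

print(f"Rmerge, all measurements:         {rmerge(i_obs, everything):.2f}")
print(f"Rmerge, only measurements > 3sig: {rmerge(i_obs, top_slice):.2f}")
```

The all-data number sits where the rule of thumb predicts for <I/sigma> ~ 1,
while the top-sliced number drops to a small fraction of it and tells you
only that the strong measurements agree with each other, not how weak the
shell actually is.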

I do like Rpim. It tells you how good the averaged (merged) measurement
should be, provided there is not too much radiation damage. Without Rmerge,
however, I can't get a real handle on how well the individual measurements
agree.
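
For completeness, the three merging residuals differ only in how each
reflection's spread is weighted by its multiplicity n. Here is a plain-Python
transcription of the usual definitions of Rmerge, Rmeas and Rpim (just the
formulas written out, not code from Scala or anything else):

```python
import math
from collections import defaultdict

def merging_residuals(observations):
    """observations: iterable of (hkl, intensity) pairs for scaled, unmerged data.
    Returns (Rmerge, Rmeas, Rpim); reflections measured only once are skipped."""
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    num_merge = num_meas = num_pim = denom = 0.0
    for ivals in groups.values():
        n = len(ivals)
        if n < 2:
            continue
        mean_i = sum(ivals) / n
        spread = sum(abs(i - mean_i) for i in ivals)
        num_merge += spread                          # Rmerge: unweighted
        num_meas += math.sqrt(n / (n - 1)) * spread  # Rmeas: multiplicity-corrected
        num_pim += math.sqrt(1 / (n - 1)) * spread   # Rpim: precision of the mean
        denom += sum(ivals)

    return num_merge / denom, num_meas / denom, num_pim / denom

# toy usage: one reflection measured four times
data = [((1, 2, 3), i) for i in (95.0, 105.0, 110.0, 90.0)]
print(merging_residuals(data))
```

The sqrt(1/(n-1)) factor is why Rpim keeps improving as you add multiplicity
- it tracks the precision of the merged value - while Rmerge stays pinned to
the scatter of the individual measurements, which is exactly the
complementary piece of information I want to keep.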

Personally, what I would like to see is the full contents of the Scala log
file available as graphs, along with Rd from xdsstat and some other choice
statistics, so you can get a relatively complete picture; however, I
appreciate that this is unrealistic :o)

Just my 2c.

Cheerio,

Graeme

On 8 March 2011 20:07, James Holton <jmhol...@lbl.gov> wrote:

> Although George does not mention anything about data reduction programs, I
> take from his description that common small-molecule data processing
> packages (SAINT, others?) have also been modernized to record all data (no
> I/sigmaI > 2 or 3 cutoff).  I agree with him that this is a good thing!  And
> it is also a good thing that small-molecule refinement programs use all
> data.  I just don't think it is a good idea to use all data in R factor
> calculations.
>
> Like Ron, I will probably be dating myself when I say that when I first got
> into the macromolecular crystallography business, it was still commonplace
> to use a 2-3 sigma spot intensity cutoff.  In fact, this is the reason why
> the PDB wants to know your "completeness" in the outermost resolution shell
> (in those days, the outer resolution was defined by where completeness drops
> to ~80% after the 3 sigma spot cutoff).  My experience with this, however,
> was brief, as the maximum-likelihood revolution was just starting to take
> hold, and the denzo manual specifically stated that only bad people use
> sigma cutoffs > -3.0.  Nevertheless, like many crystallographers from this
> era, I have fond memories of the REALLY low R factors you can get by using
> this arcane and now reviled practice.  Rsym values of 1-2% were common.
>
> It was only recently that I learned enough about statistics to understand
> the wisdom of my ancestors and that a 3-sigma cutoff is actually the "right
> thing to do" if you want to measure a fractional error (like an R factor).
>  That is all I'm saying.
>
> -James Holton
> MAD Scientist
>
>
> On 3/6/2011 2:50 PM, Ronald E Stenkamp wrote:
>
>> My small molecule experience is old enough (maybe 20 years) that I doubt
>> if it's even close to representing current practices (best or otherwise).
>>  Given George's comments, I suspect (and hope) that less-than cutoffs are
>> historical artifacts at this point, kept around in software for making
>> comparisons with older structure determinations.  But a bit of scanning of
>> Acta papers and others might be necessary to confirm that.  Ron
>>
>>
>> On Sun, 6 Mar 2011, James Holton wrote:
>>
>>
>>> Yes, I would classify anything with I/sigmaI < 3 as "weak".  And yes, of
>>> course it is possible to get "weak" spots from small molecule crystals.
>>> After all, there is no spot so "strong" that it cannot be defeated by a
>>> sufficient amount of background!  I just meant that, relatively speaking,
>>> the intensities diffracted from a small molecule crystal are orders of
>>> magnitude brighter than those from a macromolecular crystal of the same
>>> size, and even the same quality (the 1/Vcell^2 term in Darwin's formula).
>>>
>>> I find it interesting that you point out the use of a 2 sigma(I)
>>> intensity cutoff for small molecule data sets!  Is this still common
>>> practice?  I am not a card-carrying "small molecule crystallographer", so
>>> I'm not sure. However, if that is the case, then by definition there are no
>>> "weak" intensities in the data set.  And this is exactly the kind of data
>>> you want for least-squares refinement targets and computing "% error"
>>> quality metrics like R factors.  For likelihood targets, however, the "weak"
>>> data are actually a powerful restraint.
>>>
>>> -James Holton
>>> MAD Scientist
>>>
>>> On 3/6/2011 11:22 AM, Ronald E Stenkamp wrote:
>>>
>>>> Could you please expand on your statement that "small-molecule data has
>>>> essentially no weak spots"?  The small molecule data sets I've worked with
>>>> have had large numbers of "unobserved" reflections where I used 2 sigma(I)
>>>> cutoffs (maybe 15-30% of the reflections).  Would you consider those "weak"
>>>> spots or not?  Ron
>>>>
>>>> On Sun, 6 Mar 2011, James Holton wrote:
>>>>
>>>>> I should probably admit that I might be indirectly responsible for the
>>>>> resurgence of this I/sigma > 3 idea, but I never intended this in the way
>>>>> described by the original poster's reviewer!
>>>>>
>>>>> What I have been trying to encourage people to do is calculate R
>>>>> factors using only hkls for which the signal-to-noise ratio is > 3.  Not
>>>>> refinement! Refinement should be done against all data.  I merely propose
>>>>> that weak data be excluded from R-factor calculations after the
>>>>> refinement/scaling/merging/etc. is done.
>>>>>
>>>>> This is because R factors are a metric of the FRACTIONAL error in
>>>>> something (aka a "% difference"), but a "% error" is only meaningful
>>>>> when the thing being measured is not zero.  However, in macromolecular
>>>>> crystallography, we tend to measure a lot of "zeroes".  There is nothing
>>>>> wrong with measuring zero!  An excellent example of this is confirming
>>>>> that a systematic absence is in fact "absent".  The "sigma" on the
>>>>> intensity assigned to an absent spot is still a useful quantity, because
>>>>> it reflects how confident you are in the measurement.  I.e., a sigma of
>>>>> "10" rather than "100" means you are more sure that the intensity really
>>>>> is zero.  However, there is no "R factor" for systematic absences.  How
>>>>> could there be?  This is because the definition of "% error" starts to
>>>>> break down as the "true" spot intensity gets weaker, and it becomes
>>>>> completely meaningless when the "true" intensity reaches zero.
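
A small toy calculation of my own makes James's point vivid: give two shells
identical absolute measurement errors, one strong and one essentially absent,
and look at what the "fractional error" does.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 10.0  # identical measurement error in both shells

for label, i_true in (("strong shell", 1000.0), ("near-absent shell", 1.0)):
    i_obs = i_true + rng.normal(0.0, sigma, size=100000)
    frac_err = np.abs(i_obs - i_true).sum() / (i_true * i_obs.size)
    in_sigmas = np.abs(i_obs - i_true).mean() / sigma
    print(f"{label:17s}  fractional error = {frac_err:7.3f}   <|dI|/sigma> = {in_sigmas:.2f}")
```

Both shells are measured equally well in absolute terms - the sigma-scaled
error is the same ~0.8 in each - but the "% error" of the near-zero shell is
enormous, which says nothing about the data and everything about dividing by
something close to zero.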
>>>>>
>>>>> Historically, I believe the widespread use of R factors came about
>>>>> because small-molecule data has essentially no weak spots.  With the
>>>>> exception of absences (which are not used in refinement), spots from
>>>>> "salt crystals" are strong all the way out to the edge of the detector
>>>>> (even out to the "limiting sphere", which is defined by the x-ray
>>>>> wavelength).  So, when all the data are strong, a "% error" is an
>>>>> easy-to-calculate quantity that actually describes the "sigma"s of the
>>>>> data very well.  That is, sigma(I) of strong spots tends to be dominated
>>>>> by things like beam flicker, spindle stability, shutter accuracy, etc.
>>>>> All these usually add up to ~5% error, and indeed even the Braggs could
>>>>> typically get +/-5% for the intensity of the diffracted rays they were
>>>>> measuring.  Things like Rsym were therefore created to check that
>>>>> nothing "funny" happened in the measurement.
>>>>>
>>>>> For similar reasons, the quality of a model refined against all-strong
>>>>> data is described very well by a "% error", and this is why the
>>>>> refinement R factors rapidly became popular.  Most people intuitively
>>>>> know what you mean if you say that your model fits the data to "within
>>>>> 5%".  In fact, a widely used criterion for the correctness of a "small
>>>>> molecule" structure is that the refinement R factor must be LOWER than
>>>>> Rsym.  This is equivalent to saying that your curve (model) fit your
>>>>> data "to within experimental error".  Unfortunately, this has never been
>>>>> the case for macromolecular structures!
>>>>>
>>>>> The problem with protein crystals, of course, is that we have lots of
>>>>> "weak" data.  And by "weak", I don't mean "bad"!  Yes, it is always
>>>>> nicer to have more intense spots, but there is nothing shameful about
>>>>> knowing that certain intensities are actually very close to zero.  In
>>>>> fact, from the point of view of the refinement program, isn't describing
>>>>> some high-angle spot as "zero, plus or minus 10" better than "I have no
>>>>> idea"?  Indeed, several works mentioned already, as well as the "free
>>>>> lunch algorithm", have demonstrated that these "zero" data can actually
>>>>> be useful, even when they are well beyond the "resolution limit".
>>>>>
>>>>> So, what do we do?  I see no reason to abandon R factors, since they
>>>>> have such a long history and give us continuity of criteria going back
>>>>> almost a century.  However, I also see no reason to punish ourselves by
>>>>> including lots of zeroes in the denominator.  In fact, using weak data
>>>>> in an R factor calculation defeats their best feature.  R factors are a
>>>>> very good estimate of the fractional component of the total error,
>>>>> provided they are calculated with strong data only.
>>>>>
>>>>> Of course, with strong and weak data, the best thing to do is compare
>>>>> the model-data disagreement with the magnitude of the error.  That is,
>>>>> compare |Fobs-Fcalc| to sigma(Fobs), not Fobs itself.  Modern refinement
>>>>> programs do this!  And I say the more data the merrier.
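
A last toy aside of my own (not any refinement program's actual target
function, just an illustration): score a model that is exactly right against
noisy, mostly weak amplitudes, and the R factor looks terrible while the
sigma-scaled residual sits exactly where it should.

```python
import numpy as np

rng = np.random.default_rng(3)

# a "perfect" model: Fcalc equals the true F; Fobs is the truth plus noise
f_true = rng.exponential(scale=5.0, size=50000)  # lots of weak reflections
sig_f = np.full_like(f_true, 2.0)
f_obs = f_true + rng.normal(0.0, sig_f)
f_calc = f_true

r_factor = np.abs(f_obs - f_calc).sum() / np.abs(f_obs).sum()
sigma_scaled = np.mean(np.abs(f_obs - f_calc) / sig_f)

print(f"R factor over all data:         {r_factor:.3f}")
print(f"<|Fobs - Fcalc| / sigma(Fobs)>: {sigma_scaled:.3f}")
```

The sigma-scaled residual stays near sqrt(2/pi) ~ 0.8 no matter how weak the
data are, because it is normalized by the quantity that actually describes
the experimental error rather than by the (nearly zero) signal.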
>>>>>
>>>>>
>>>>> -James Holton
>>>>> MAD Scientist
>>>>>
>>>>>
>>>>> On 3/4/2011 5:15 AM, Marjolein Thunnissen wrote:
>>>>>
>>>>>> hi
>>>>>>
>>>>>> Recently on a paper I submitted, it was the editor of the journal who
>>>>>> wanted exactly the same thing. I never argued with the editor about this
>>>>>> (should have maybe), but it could be one cause of the epidemic that Bart
>>>>>> Hazes saw....
>>>>>>
>>>>>>
>>>>>> best regards
>>>>>>
>>>>>> Marjolein
>>>>>>
>>>>>> On Mar 3, 2011, at 12:29 PM, Roberto Battistutta wrote:
>>>>>>
>>>>>>> Dear all,
>>>>>>> I got a reviewer comment that indicates the "need to refine the
>>>>>>> structures at an appropriate resolution (I/sigmaI of>3.0), and
>>>>>>> re-submit the revised coordinate files to the PDB for validation."
>>>>>>> In the manuscript I present some crystal structures determined by
>>>>>>> molecular replacement using the same protein in a different space
>>>>>>> group as the search model. Does anyone know the origin or the
>>>>>>> theoretical basis of this "I/sigmaI>3.0" rule for an appropriate
>>>>>>> resolution?
>>>>>>> Thanks,
>>>>>>> Bye,
>>>>>>> Roberto.
>>>>>>>
>>>>>>>
>>>>>>> Roberto Battistutta
>>>>>>> Associate Professor
>>>>>>> Department of Chemistry
>>>>>>> University of Padua
>>>>>>> via Marzolo 1, 35131 Padova - ITALY
>>>>>>> tel. +39.049.8275265/67
>>>>>>> fax. +39.049.8275239
>>>>>>> roberto.battistu...@unipd.it
>>>>>>> www.chimica.unipd.it/roberto.battistutta/
>>>>>>> VIMM (Venetian Institute of Molecular Medicine)
>>>>>>> via Orus 2, 35129 Padova - ITALY
>>>>>>> tel. +39.049.7923236
>>>>>>> fax +39.049.7923250
>>>>>>> www.vimm.it
>>>>>>>
