Anthony,

I have used the minimum of -LLfree (i.e. the same as the maximum free likelihood) as a stopping rule for both weight optimisation and adding waters. The former seems to be well justified by theory (Gerard Bricogne's, that is); it's also obviously very similar to Axel Brunger's min(Rfree) rule for weight optimisation, which seems to work well. I use it for adding waters because it seems to give a reasonable number of waters.

Changes in Rfree seem to roughly mirror changes in -LLfree, though they don't necessarily have minima at the same points in parameter space; I guess that's not surprising, since unlike -LLfree, Rfree is unweighted.

Using the min(-LLfree) rule routinely for weight optimisation would be quite time-consuming, so now I just use a target RMS-Z(bonds) value based on a linear fit of RMS-Z(bonds) vs resolution, obtained from PDB-REDO refinements in which the min(-LLfree) rule was used.
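For what it's worth, the water-adding loop I have in mind is no more than the sketch below (Python; refine() and add_waters_once() are hypothetical stand-ins for whatever refinement and water-picking steps you actually script, not any real program's API, and the fit coefficients are placeholders, not the actual PDB-REDO numbers):

def add_waters_until_llfree_minimum(model, refine, add_waters_once):
    # Keep adding waters while -LLfree keeps dropping; stop at the minimum.
    best_model = model
    best_mll = refine(model)          # refine() returns -LLfree (hypothetical)
    while True:
        trial = add_waters_once(best_model)
        mll = refine(trial)
        if mll >= best_mll:           # -LLfree no longer decreasing: stop here
            return best_model, best_mll
        best_model, best_mll = trial, mll

def target_rmsz_bonds(resolution, slope=0.10, intercept=0.30):
    # Resolution-dependent target RMS-Z(bonds) from a linear fit;
    # slope and intercept here are placeholders, not the real fit.
    return slope * resolution + intercept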
I haven't done a systematic study to see whether it can be used to decide whether or not adding TLS parameters improves the model, but in most of the cases I looked at (though admittedly not all) using TLS reduces Rfree and -LLfree, or at least doesn't cause them to increase significantly, so now I just use TLS routinely (like most other people, I guess!). If I were being totally consistent with my rule, I should really test -LLfree after using TLS and, if it does increase, throw away the TLS model! This area could benefit from more careful investigation.

I also tried min(Rfree - Rwork) as a stopping rule for weight optimisation and adding waters, but it didn't give good results (the number of waters added seemed unrealistic). I haven't tried your rule, min(Rfree - Rwork/2), in either case, and it may indeed turn out that it works better than mine. I was just interested to know whether you had arrived at your rule by experimentation, and if so how it compared with other possible rules.

I do have one reservation about your rule (the same also applies to the min(Rfree - Rwork) rule): you can get situations where a decrease in both Rwork and Rfree corresponds to a worse model according to the rule, and conversely an increase in both Rwork and Rfree corresponds to an improved model. This looks counter-intuitive to me: intuition tells me that a model which is more consistent with all of the experimental data (i.e. both the working and test sets) is a better model, and one which is less consistent is a worse one. Admittedly intuition has been known to lead one astray, and it may be the case that the model with lower Rwork & Rfree is worse if judged by the deviations from the target geometry; however, it doesn't seem likely that one would in practice get a lower Rfree with worse geometry unless really unlucky!

For example, starting as before with Rwork = 20, Rfree = 30 (test value = 20), consider a model with Rwork = 16, Rfree = 29: the test value is 21, so a worse model by your rule. Conversely, a model with Rwork = 24, Rfree = 31 has test value 19, so a better model by your rule. (The short Python snippet after the quoted message below works through these numbers.) As I said, this behaviour is not peculiar to your rule; any rule which involves combining Rwork & Rfree is likely to exhibit it.

Cheers

-- Ian

On Tue, Oct 26, 2010 at 2:52 PM, Ian Tickle <ianj...@gmail.com> wrote:
> Anthony,
>
> Your rule actually works on the difference (Rfree - Rwork/2), not
> (Rfree - Rwork) as you said, so is rather different from what most
> people seem to be using.
>
> For example, let's say the current values are Rwork = 20, Rfree = 30,
> so your current test value is (30 - 20/2) = 20. Then according to
> your rule Rwork = 18, Rfree = 29 is equally acceptable (29 - 18/2 = 20,
> i.e. the same test value), whereas Rwork = 16, Rfree = 29 would not be
> acceptable by your rule (29 - 16/2 = 21, so the test value is higher).
> Rwork = 18, Rfree = 28 would represent an improvement by your rule
> (28 - 18/2 = 19, i.e. a lower test value).
>
> You say this criterion "provides a defined end-point", i.e. a minimum
> in the test value above. However, wouldn't other linear combinations
> of Rwork & Rfree also have a defined minimum value? In particular,
> Rfree itself always has a defined minimum with respect to adding
> parameters or changing the weights, so it would also satisfy your
> criterion. There must be some additional criterion that you are
> relying on to select the particular linear combination (Rfree -
> Rwork/2) over any of the other possible ones?
>
> Cheers
>
> -- Ian
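To make the arithmetic in the example above concrete, here it is as a few lines of Python (purely an illustration; all numbers in percent):

def test_value(rwork, rfree):
    # Anthony's proposed criterion: a lower (Rfree - Rwork/2) = a better model.
    return rfree - rwork / 2.0

# (Rwork, Rfree) for the three models discussed above.
for name, rwork, rfree in [("current", 20, 30),
                           ("both R's lower", 16, 29),
                           ("both R's higher", 24, 31)]:
    print(f"{name}: Rwork={rwork}, Rfree={rfree}, test={test_value(rwork, rfree):.0f}")

# Prints test values 20, 21 and 19: the model that fits all the data
# better scores worse by the rule, and vice versa.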
> On Tue, Oct 26, 2010 at 6:33 AM, DUFF, Anthony <a...@ansto.gov.au> wrote:
>>
>> One “rule of thumb” based on R and R-free divergence that I impress onto
>> crystallography students is this:
>>
>> If a change in refinement strategy or parameters (e.g. loosening restraints,
>> introducing TLS) or a round of addition of unimportant water molecules
>> results in a reduction of R that is more than double the reduction in
>> R-free, then don’t do it.
>>
>> This rule of thumb has proven successful in providing a defined end point
>> for building and refining a structure.
>>
>> The rule works on the differential of the R - R-free divergence. I’ve
>> noticed that some structures begin with a bigger divergence than others;
>> different Rmerge values might explain this.
>>
>> Has anyone else found a student in a dark room carefully adding large
>> numbers of partially occupied water molecules?
>>
>> Anthony
>>
>> Anthony Duff   Telephone: 02 9717 3493   Mob: 043 189 1076
>>
>> From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of
>> Artem Evdokimov
>> Sent: Tuesday, 26 October 2010 1:45 PM
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: Re: [ccp4bb] diverging Rcryst and Rfree
>>
>> Not that rules of thumb always have to have a rationale, nor that they're
>> always correct - but it would seem that noise in the data (of which Rmerge
>> is an indicator) should have a significant relationship with the R:Rfree
>> difference, since Rfree is not (should not be, if selected correctly)
>> subject to noise fitting. This rule is easily broken if one refines against
>> very noisy data (e.g. "that last shell with Rmerge of 55% and <I/sigmaI>
>> ratio of 1.3 is still good, right?") or if the structure is overfit. The
>> rule is only an indicative one (i.e. one should get really worried if
>> R-Rfree looks very different from Rmerge) and it breaks down at very high
>> and very low resolution (a more complete picture is given by GK and shown
>> in BR's book).
>>
>> Since selection of data and refinement procedures is subject to the
>> decisions of the practitioner, I suspect that the extreme divergence shown
>> in the figures that you refer to is probably the result of our own
>> collective decisions. I have no proof, but I suspect that if a large enough
>> section of the PDB were to be re-refined using the same methods and the
>> same data trimming practices, the spread would be considerably narrower.
>> That'd be somewhat hard to do - but may be doable now given the abundance
>> of auto-building and auto-correcting algorithms.
>>
>> Artem
>>
>> On Mon, Oct 25, 2010 at 9:07 PM, Bernhard Rupp (Hofkristallrat a.D.)
>> <hofkristall...@gmail.com> wrote:
>>
>> And the rationale for that rule being exactly what?
>>
>> For stats, see figures 12-23, 12-24:
>> http://www.ruppweb.org/garland/gallery/Ch12/index_2.htm
>>
>> br
>>
>> From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of
>> Artem Evdokimov
>> Sent: Monday, October 25, 2010 6:36 PM
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: Re: [ccp4bb] diverging Rcryst and Rfree
>>
>> http://www.mail-archive.com/ccp4bb@jiscmail.ac.uk/msg04677.html
>>
>> as well as some notes in the older posts :)
>>
>> As a very basic rule of thumb, Rfree-Rwork tends to be around Rmerge for
>> the dataset for refinements that are not overfitted.
>>
>> Artem
>> On Mon, Oct 25, 2010 at 4:10 PM, Rakesh Joshi <rjo...@purdue.edu> wrote:
>>
>> Hi all,
>>
>> Can anyone comment, in general, on diverging Rcryst and Rfree values
>> (say > 7%) for structures at fairly low resolution (2.5-2.9 angstroms)?
>>
>> Thanks
>> RJ
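PS: for anyone who wants to play with the two rules of thumb quoted above, here they are as a few lines of Python. The function names and the 3-point tolerance are mine, purely for illustration; all numbers in percent.

def passes_duff_rule(rwork_old, rfree_old, rwork_new, rfree_new):
    # Anthony's rule: reject the change if the reduction in Rwork is
    # more than double the reduction in Rfree.
    return (rwork_old - rwork_new) <= 2.0 * (rfree_old - rfree_new)

def overfit_warning(rwork, rfree, rmerge, tolerance=3.0):
    # Artem's rule of thumb: Rfree - Rwork should be roughly Rmerge;
    # the tolerance is an arbitrary choice for this sketch.
    return abs((rfree - rwork) - rmerge) > tolerance

print(passes_duff_rule(20, 30, 16, 29))  # False: Rwork fell by 4, Rfree by only 1
print(overfit_warning(20, 30, 5))        # True: gap of 10 vs Rmerge of 5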