Anthony,

I have used the minimum of -LLfree (i.e. the same as the maximum free likelihood) as a stopping rule for both weight optimisation and adding waters. The former seems to be well justified by theory (Gerard Bricogne's, that is); it's also obviously very similar to Axel Brunger's min(Rfree) rule for weight optimisation, which seems to work well. I use it for adding waters because it seems to give a reasonable number of waters.

Changes in Rfree seem to roughly mirror changes in -LLfree, though they don't necessarily have minima at the same points in parameter space; I guess that's not surprising, since unlike -LLfree, Rfree is unweighted.

Using the min(-LLfree) rule routinely for weight optimisation would be quite time-consuming, so now I just use a target RMS-Z(bonds) value based on a linear fit of RMS-Z(bonds) vs resolution, obtained from PDB-REDO refinements in which the min(-LLfree) rule was used.
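For what it's worth, the water-adding loop I have in mind is no more than the sketch below (Python; refine() and add_waters_once() are hypothetical stand-ins for whatever refinement and water-picking steps you actually script, not any real program's API, and the fit coefficients are placeholders, not the actual PDB-REDO numbers):

def add_waters_until_llfree_minimum(model, refine, add_waters_once):
    # Keep adding waters while -LLfree keeps dropping; stop at the minimum.
    best_model = model
    best_mll = refine(model)          # refine() returns -LLfree (hypothetical)
    while True:
        trial = add_waters_once(best_model)
        mll = refine(trial)
        if mll >= best_mll:           # -LLfree no longer decreasing: stop here
            return best_model, best_mll
        best_model, best_mll = trial, mll

def target_rmsz_bonds(resolution, slope=0.10, intercept=0.30):
    # Resolution-dependent target RMS-Z(bonds) from a linear fit;
    # slope and intercept here are placeholders, not the real fit.
    return slope * resolution + intercept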
I haven't done a systematic study to see whether it can be used to decide whether or not adding TLS parameters improves the model, but in most of the cases I looked at (though admittedly not all) using TLS reduces Rfree and -LLfree, or at least doesn't cause them to increase significantly, so now I just use TLS routinely (like most other people, I guess!). If I were being totally consistent with my rule, I should really test -LLfree after using TLS and, if it does increase, throw away the TLS model! This area could benefit from more careful investigation.

I also tried min(Rfree - Rwork) as a stopping rule for weight optimisation and adding waters, but it didn't give good results (the number of waters added seemed unrealistic). I haven't tried your rule, min(Rfree - Rwork/2), in either case, and it may indeed turn out that it works better than mine. I was just interested to know whether you had arrived at your rule by experimentation, and if so how it compared with other possible rules.

I do have one reservation about your rule (the same also applies to the min(Rfree - Rwork) rule): you can get situations where a decrease in both Rwork and Rfree corresponds to a worse model according to the rule, and conversely an increase in both Rwork and Rfree corresponds to an improved model. This looks counter-intuitive to me: intuition tells me that a model which is more consistent with all of the experimental data (i.e. both the working and test sets) is a better model, and one which is less consistent is a worse one. Admittedly intuition has been known to lead one astray, and it may be the case that the model with lower Rwork & Rfree is worse if judged by the deviations from the target geometry; however, it doesn't seem likely that one would in practice get a lower Rfree with worse geometry unless really unlucky!

For example, starting as before with Rwork = 20, Rfree = 30 (test value = 20), consider a model with Rwork = 16, Rfree = 29: the test value is 21, so a worse model by your rule. Conversely, a model with Rwork = 24, Rfree = 31 has test value 19, so a better model by your rule. (The short Python snippet after the quoted message below works through these numbers.) As I said, this behaviour is not peculiar to your rule; any rule which involves combining Rwork & Rfree is likely to exhibit it.

Cheers

-- Ian

On Tue, Oct 26, 2010 at 2:52 PM, Ian Tickle <ianj...@gmail.com> wrote:
> Anthony,
>
> Your rule actually works on the difference (Rfree - Rwork/2), not
> (Rfree - Rwork) as you said, so is rather different from what most
> people seem to be using.
>
> For example, let's say the current values are Rwork = 20, Rfree = 30,
> so your current test value is (30 - 20/2) = 20. Then according to
> your rule Rwork = 18, Rfree = 29 is equally acceptable (29 - 18/2 = 20,
> i.e. the same test value), whereas Rwork = 16, Rfree = 29 would not be
> acceptable by your rule (29 - 16/2 = 21, so the test value is higher).
> Rwork = 18, Rfree = 28 would represent an improvement by your rule
> (28 - 18/2 = 19, i.e. a lower test value).
>
> You say this criterion "provides a defined end-point", i.e. a minimum
> in the test value above. However, wouldn't other linear combinations
> of Rwork & Rfree also have a defined minimum value? In particular,
> Rfree itself always has a defined minimum with respect to adding
> parameters or changing the weights, so it would also satisfy your
> criterion. There must be some additional criterion that you are
> relying on to select the particular linear combination (Rfree -
> Rwork/2) over any of the other possible ones?
>
> Cheers
>
> -- Ian
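To make the arithmetic in the example above concrete, here it is as a few lines of Python (purely an illustration; all numbers in percent):

def test_value(rwork, rfree):
    # Anthony's proposed criterion: a lower (Rfree - Rwork/2) = a better model.
    return rfree - rwork / 2.0

# (Rwork, Rfree) for the three models discussed above.
for name, rwork, rfree in [("current", 20, 30),
                           ("both R's lower", 16, 29),
                           ("both R's higher", 24, 31)]:
    print(f"{name}: Rwork={rwork}, Rfree={rfree}, test={test_value(rwork, rfree):.0f}")

# Prints test values 20, 21 and 19: the model that fits all the data
# better scores worse by the rule, and vice versa.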
> On Tue, Oct 26, 2010 at 6:33 AM, DUFF, Anthony <a...@ansto.gov.au> wrote:
>>
>> One “rule of thumb” based on R and R-free divergence that I impress onto
>> crystallography students is this:
>>
>> If a change in refinement strategy or parameters (e.g. loosening restraints,
>> introducing TLS) or a round of addition of unimportant water molecules
>> results in a reduction of R that is more than double the reduction in
>> R-free, then don’t do it.
>>
>> This rule of thumb has proven successful in providing a defined end point
>> for building and refining a structure.
>>
>> The rule works on the differential of the R - R-free divergence. I’ve
>> noticed that some structures begin with a bigger divergence than others;
>> different Rmerge values might explain this.
>>
>> Has anyone else found a student in a dark room carefully adding large
>> numbers of partially occupied water molecules?
>>
>> Anthony
>>
>> Anthony Duff   Telephone: 02 9717 3493   Mob: 043 189 1076
>>
>> From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of
>> Artem Evdokimov
>> Sent: Tuesday, 26 October 2010 1:45 PM
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: Re: [ccp4bb] diverging Rcryst and Rfree
>>
>> Not that rules of thumb always have to have a rationale, nor that they're
>> always correct - but it would seem that noise in the data (of which Rmerge
>> is an indicator) should have a significant relationship with the R:Rfree
>> difference, since Rfree is not (should not be, if selected correctly)
>> subject to noise fitting. This rule is easily broken if one refines against
>> very noisy data (e.g. "that last shell with Rmerge of 55% and <I/sigmaI>
>> ratio of 1.3 is still good, right?") or if the structure is overfit. The
>> rule is only an indicative one (i.e. one should get really worried if
>> R-Rfree looks very different from Rmerge) and it breaks down at very high
>> and very low resolution (a more complete picture is given by GK and shown
>> in BR's book).
>>
>> Since selection of data and refinement procedures is subject to the
>> decisions of the practitioner, I suspect that the extreme divergence shown
>> in the figures that you refer to is probably the result of our own
>> collective decisions. I have no proof, but I suspect that if a large enough
>> section of the PDB were to be re-refined using the same methods and the
>> same data trimming practices, the spread would be considerably narrower.
>> That'd be somewhat hard to do - but may be doable now given the abundance
>> of auto-building and auto-correcting algorithms.
>>
>> Artem
>>
>> On Mon, Oct 25, 2010 at 9:07 PM, Bernhard Rupp (Hofkristallrat a.D.)
>> <hofkristall...@gmail.com> wrote:
>>
>> And the rationale for that rule being exactly what?
>>
>> For stats, see figures 12-23, 12-24:
>> http://www.ruppweb.org/garland/gallery/Ch12/index_2.htm
>>
>> br
>>
>> From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of
>> Artem Evdokimov
>> Sent: Monday, October 25, 2010 6:36 PM
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: Re: [ccp4bb] diverging Rcryst and Rfree
>>
>> http://www.mail-archive.com/ccp4bb@jiscmail.ac.uk/msg04677.html
>>
>> as well as some notes in the older posts :)
>>
>> As a very basic rule of thumb, Rfree-Rwork tends to be around Rmerge for
>> the dataset for refinements that are not overfitted.
>>
>> Artem
>> On Mon, Oct 25, 2010 at 4:10 PM, Rakesh Joshi <rjo...@purdue.edu> wrote:
>>
>> Hi all,
>>
>> Can anyone comment, in general, on diverging Rcryst and Rfree values
>> (say > 7%) for structures at fairly low resolution (2.5-2.9 angstroms)?
>>
>> Thanks
>> RJ
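PS: for anyone who wants to play with the two rules of thumb quoted above, here they are as a few lines of Python. The function names and the 3-point tolerance are mine, purely for illustration; all numbers in percent.

def passes_duff_rule(rwork_old, rfree_old, rwork_new, rfree_new):
    # Anthony's rule: reject the change if the reduction in Rwork is
    # more than double the reduction in Rfree.
    return (rwork_old - rwork_new) <= 2.0 * (rfree_old - rfree_new)

def overfit_warning(rwork, rfree, rmerge, tolerance=3.0):
    # Artem's rule of thumb: Rfree - Rwork should be roughly Rmerge;
    # the tolerance is an arbitrary choice for this sketch.
    return abs((rfree - rwork) - rmerge) > tolerance

print(passes_duff_rule(20, 30, 16, 29))  # False: Rwork fell by 4, Rfree by only 1
print(overfit_warning(20, 30, 5))        # True: gap of 10 vs Rmerge of 5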