[ccp4bb] Free Reflections as Percent and not a Number

Axel Brunger Thu, 27 Nov 2014 13:05:04 -0800

We just had a chance to read this most interesting discussion. We would agree 
with Ian that jiggling or SA refinement may not be needed if refinement can in 
fact be run to convergence. However, this will be difficult to achieve for 
large 
structures, especially when only moderate to low resolution data are available. 
So, jiggling or SA refinement should help in these cases. We strongly recommend 
testing refinement for a particular case with and without jiggling or simulated 
annealing to determine if the same Rfree value can be achieved by just 
running refinement to convergence.


We also would like to draw attention to the importance of resetting or jiggling 
individual atomic B-factors when switching test sets or starting from a 
structure 
that has been refined against all diffraction data. Individual atomic B-factors 
also confer model bias, and convergence of individual B-factor refinement 
can be sometimes difficult.

As regards Dusan’s recent paper in Acta Cryst D, we would like to draw 
attention to our paper in Structure 20, 957-966 (2012). There we showed that 
cross-validation is useful for certain low resolution refinements were 
additional 
restraints (DEN restraints in that case) are used to prevent overfitting and 
effectively obtain a more accurate structure. All the refinements described 
in that paper were performed in torsion angle space only, producing perfect 
bond lengths and bond angles.  Cross-validation made it possible to detect 
overfitting of the data when no DEN restraints were used. We believe this 
should also apply when other types of restraints are used (e.g., reference 
restraints in phenix.refine, REFMAC, or BUSTER).  So, while the danger of 
overfitting diffraction data with an incorrect model may not be as great as it 
was 20 years ago (at high to moderate resolution) due to the availably of 
many important validation tools, the situation is very different at low 
resolution, where overfitting (even in torsion angle space) is still a very 
real 
possibility, and use of external data or restraints is essential. So, we 
believe that cross-validation remains an important (and conceptually 
simple) method to prevent overfitting and for overall structure validation.

Best regards,
Axel and Paul


Axel T. Brunger
Investigator,  Howard Hughes Medical Institute
Professor and Chair, Dept. of Molecular and Cellular Physiology
Stanford University


Paul Adams
Deputy Division Director, Physical Biosciences Division, Lawrence Berkeley Lab
Division Deputy for Biosciences, Advanced Light Source, Lawrence Berkeley Lab
Adjunct Professor, Department of Bioengineering, U.C. Berkeley
Vice President for Technology, the Joint BioEnergy Institute
Laboratory Research Manager, ENIGMA Science Focus Area

> On Nov 26, 2014, at 8:18 PM, dusan turk <dusan.t...@ijs.si 
> <mailto:dusan.t...@ijs.si>> wrote:
> 
> Hello guys,
> 
> There is too much text in this discussion to respond to every part of it. 
> Apart from “jiggle” in certain software like PHENIX and I believe in X-PLOR 
> derivatives the word “shake” means the same. In the “MAIN” environment I use 
> the word “kick” to randomly distort coordinates. It's first use introduced in 
> the early 90’s was to improve the convergence of model refinement and 
> minimization. I have seen it as a substitute to molecular dynamics under real 
> or reciprocal crystallographic restraints (we call this simulated annealing 
> or slow cooling) as it is computationally much faster.  The procedure in MAIN 
> is called "fast cooling” because the atoms move only under the energy 
> potential energy terms with no kinetics energy present. The “fast cooled” the 
> structure is thus frozen - take from a high energy state to the one with the 
> lowest potential energy reachable. In order to reach the lowest possible 
> point in potential energy landscape the kick at the beginning of each cooling 
> cycle is lowered. The initial kick coordinate size is typically from 0.8A and 
> drops down each cycle down to 0. The experience shows values beyond 0.8 may 
> not lead to recovery of chemically reasonable structure in every part of it. 
> Towards the end of refinement the starting kick is typically reduced to 0.05. 
>  Apart from coordinates also B-factors can be kicked. 
> 
> Are the structures after “kick” cooling refinement the same as without the 
> kick? My over two decades long experience shows that by kicking convergence 
> of refinement is improved. The resulting structures can thus be different as 
> the different repeating cooling cycles may shift them to a lower energy 
> point. However, after the structure is refined (has converged), the different 
> refinements will converge to approximately the same coordinates as Ian 
> described.  I assume the this is the numerical error of the different 
> procedures. As to the use of different TEST sets  we came to a different 
> conclusion (see bellow).
> 
> As to the claim(s) that kicking/jiggling/shaking does or does not  remove the 
> model bias the color of the answer is not black and white, but it is grey. 
> Kicking namely reduces the model bias, but does not eliminate it.  We have 
> shown this in our kick map paper by Praznikar J et al (2009) "Averaged kick 
> maps: less noise, more signal... and probably less bias." Acta Crystallogr D 
> Biol Crystallogr. 921-31.  
> 
> As for the use of % or number of reflection for R-FREE and the TEST, my 
> suggestion is not use the TEST set and concept of R-free at all. Namely, 
> - excluding the data from the target changes the target, because in 
> refinement the information present in every missing reflection can not be 
> recovered from the rest of data.  (The Fourier series terms are orthogonal to 
> each other, therefore information from each reflection is not present in any 
> other reflection.) The absence of certain data thus contains a bias of their 
> absence.
> - In addition, in the ML refinement using the cross validation the shape of 
> ML function is calculated from the TEST set of structure factors of 
> chemically reasonable structure, which is regularized by chemical energy 
> terms and thus contains systematic error. Because of this propagation of this 
> effect on the whole model structure under refinement, the cross validation 
> introduces model bias in refinement.  
> For detailed explanation you are invited to read our paper "Free kick instead 
> of cross-validation in maximum-likelihood refinement of macromolecular 
> crystal structures”  which just appeared on line in Acta Crys D (2014). 
> 
> best regards,
> dusan
> 
> 
> Dr. Dusan Turk, Prof.
> Head of Structural Biology Group http://bio.ijs.si/sbl/ 
> <http://bio.ijs.si/sbl/> 
> Head of Centre for Protein  and Structure Production
> Centre of excellence for Integrated Approaches in Chemistry and Biology of 
> Proteins, Scientific Director
> http://www.cipkebip.org/ <http://www.cipkebip.org/>
> Professor of Structural Biology at IPS "Jozef Stefan"
> e-mail: dusan.t...@ijs.si <mailto:dusan.t...@ijs.si>    
> phone: +386 1 477 3857       Dept. of Biochem.& Mol.& Struct. Biol.
> fax:   +386 1 477 3984       Jozef Stefan Institute
>                        Jamova 39, 1 000 Ljubljana,Slovenia
> Skype: dusan.turk (voice over internet: www.skype.com <http://www.skype.com/>



> On Nov 25, 2014, at 10:41 AM, Tim Gruene <t...@shelx.uni-ac.gwdg.de 
> <mailto:t...@shelx.uni-ac.gwdg.de>> wrote:
> 
> Hi Ed,
> 
> it is an easy excercise to show that theory (according to "by
> definition") and reality greatly diverge - refinement is too complex to
> get back to exactly the same structure. Maybe because one often does not
> reach convergence, no matter how  many cycles of refinement you run.
> 
> Best,
> Tim
> 
> On 11/25/2014 07:29 PM, Edward A. Berry wrote:
>>> provided the jiggling keeps the structure inside the convergence
>>> radius of refinement, then by definition the refinement will produce
>>> the same result irrespective of the starting point (i.e. jiggled or
>>> not).  If the jiggling takes the structure outside the radius of
>>> convergence then the original structure will not be retrievable
>>> without manual rebuilding: I'm assuming that's not the goal here.
>> 
>> 
>> I actually agree with this, but an R-free purist might argue that you
>> have to get outside of radius of convergence to eliminate R-free bias.
>> Otherwise, by definition, "you will just refine back to the same old
>> biased structure!".
>> (but you have shown that the conventional .2A rms is within radius of
>> convergence)
>> 
>> In fact Dale's concern about low-res reflections could be put in terms
>> of radius of convergence and false minima.
>> Moving a lot of atoms by .2 A will have a significant effect on the
>> phase of a 2A reflection, but almost no effect on a 20A reflection. Say
>> you have refined against all the low resolution reflections, and got a
>> structure that fits better than it should because it is fitting the
>> noise in the free reflections. Now take away the free reflections and
>> continue to refine. It will drop into the nearest local minimum, which
>> since it is near the solution with all reflections, will still give
>> artificially low R-free.  Jiggling by 0.2 A will have no effect because
>> the local minima are are extremely broad and shallow, as far as the
>> low-res reflections go.
>> 
>> But then you could say that since any local minima are so broad, all
>> structures that are even slightly reasonable, (including the correct
>> one) will be within radius of convergence of the same minimum as far as
>> the low-res reflections are concerned. The nearest false minimum
>> involves moving atoms by 5-10 A, so within reason the convergence point
>> will be completely independent of the starting structure. Presumably
>> this is why Phenix rigid body refinement starts out at ultra-low
>> resolution: to increase the radius of convergence. From that
>> perspective, rather than being the worrisome part, the low-resolution is
>> the region where we can assume Ian's assumption is correct.
>> 
>> What about another experiment, which I think we've discussed before.
>> Take a structure refined to convergence with a pristine free set. Now
>> refine to convergence against all the data. The purist will say that the
>> free set is hopelessly corrupted. And sure enough when we take that
>> structure and calculate free-R with the original set, R-free is same as
>> R-work within statistical significance.  But- I guess adding the extra
>> 5% reflections will not change any atomic position by more than 0.2 A
>> (maybe 0.02A), and so we are still well within radius of convergence of
>> the original unbiased structure. Refining against the original working
>> set will give back that unbiased structure, and Rfree will return to it
>> original value.
>> 
>> This suggest, if the only purpose of Rfree is to get a number to deposit
>> with the pdb (which it is not), you should first solve your structure
>> using all the data, fitting the noise; then exclude a free set and back
>> off on fitting the noise of it to get the R-free.  The only problem
>> would be that during the refinement without guidance of R-free, you may
>> have engaged in some practice that hurt the structure so much that it
>> ends up out of RoC of the well-refined structure. Not because you were
>> fitting the noise (anyway you are fitting the noise in your 95% working
>> set) but because you would not have been warned that some procedure was
>> not helping.
>> 
>> Very provocative discussion!
>> eab
>> 
>> 
>>> On 11/25/2014 11:03 AM, Ian Tickle wrote:
>>> Dear All
>>> 
>>> I'd like to raise the question again of whether any of this 'jiggling'
>>> (i.e. addition of random noise to the co-ordinates) is really
>>> necessary anyway, notwithstanding Dale's valid point that even if it
>>> were necessary, jiggling in its present incarnation is unlikely to
>>> work because it's unlikely to erase the influence of low res. reflexions.
>>> 
>>> My claim is that jiggling is completely unnecessary, because I
>>> maintain that refinement to convergence is alI that is required to
>>> remove the bias when an alternate test set is selected.  In fact I
>>> claim that it's the refinement, not the jiggling, that's wholly
>>> responsible for removing the bias.  I know we thrashed this out a
>>> while back and I recall that the discussion ended with a challenge to
>>> me to prove my claim that the refine-only Rfrees are indeed unbiased. 
>>> I couldn't see an easy way of doing this which didn't involve
>>> rebuilding and re-refining the same structure 20 times over, without
>>> introducing any observer bias.
>>> 
>>> The present discussion prompted me to think again about this and I
>>> believe I can prove part of my claim quite easily, that jiggling has
>>> no effect on the results.  Proving that the resulting Rfrees are
>>> unbiased is much harder, since as we've seen there's no proof that
>>> jiggling actually removes the bias as claimed by its proponents. 
>>> However given that said proponents of jiggling+refinement have been
>>> happy to accept for many years that their results are unbiased, then
>>> they must be equally happy now to accept that the refinement-only
>>> results are also unbiased, provided I can demonstrate that the
>>> difference between the results is insignificant.
>>> 
>>> The experimental proof rests on comparison between the Rfrees and
>>> RMSDs of the jiggled+refined and the refined-only structures for the
>>> 19 possible alternate test sets (assuming 5% test-set size).  If
>>> jiggling makes no difference as I claim then there should be no
>>> significant difference between the Rfrees and insignificant RMSDs for
>>> all pairs of alternate test sets.
>>> 
>>> However, first we must be careful to establish what is a suitable
>>> value for the noise magnitude to add to the co-ordinates.  If it's too
>>> small it won't remove the bias (again notwithstanding Dale's point
>>> that it's unlikely to have any effect anyway on the low res. data);
>>> too large and you push it beyond the convergence radius of the
>>> refinement and end up damaging the structure irretrievably (at least
>>> unless you're prepared to do significant rebuilding of the model).
>>> 
>>> For the record here's the crystal info for the test data I selected:
>>> 
>>> Nres: 96   SG: P41212   Vm: 1.99   Solvent: 0.377
>>> Resol: 40-1.58 A.
>>> Working set size: 11563   Test set size: 611 (5%)   Test set: 0
>>> Refinement program:     BUSTER.
>>> Noise addition program: PDBSET.
>>> 
>>> It's wise to choose a small protein since you need to run lots of
>>> refinements!  However feel free to try the same thing with your own data.
>>> 
>>> First I took care that the starting model was refined to convergence
>>> using the original test set 0, and I performed 2 sequential runs of
>>> refinement with BUSTER (the deviations are relative to the input
>>> co-ordinates in each case):
>>> 
>>> Ncyc  Rwork   Rfree   RMSD MaxDev
>>> 82     0.181  0.230     0.005   0.072
>>> 51     0.181  0.231     0.002   0.015
>>> 
>>> The advantage of using BUSTER is that it has its own convergence test;
>>> with REFMAC you have to guess.
>>> 
>>> Then I tried a range of input noise values (0.20, 0.25. 0.30, 0.35,
>>> 0.40, 0.50 A) on the refined starting model.  Note that these are
>>> RMSDs, not maximum shifts as claimed by the PDBSET documentation.  In
>>> each case I did 4 sequential runs of BUSTER on the jiggled
>>> co-ordinates and by looking at the RMSDs and max. shifts I decided
>>> that 0.25 A RMSD was all the structure could stand without risking
>>> permanent damage (note that the default noise value in PDBSET is 0.2):
>>> 
>>> Initial RMSD: 0.248  MaxDev: 0.407
>>> 
>>> Ncyc  Rwork   Rfree   RMSD  MaxDev
>>> 358    0.183   0.230    0.052    0.454
>>> 126    0.181   0.232    0.041    0.383
>>> 65    0.181   0.232    0.040    0.368
>>> 50    0.181   0.232    0.040    0.360
>>> 
>>> The only purpose of the above refinements is to establish the most
>>> suitable noise value; the resulting refined PDB files were not used.
>>> 
>>> So then I took the co-ordinates with 0.25 A noise added and for each
>>> test set 1-19 did 2 sequential runs of BUSTER.
>>> 
>>> Finally I took the original refined starting model (i.e. without noise
>>> addition) and again refined to convergence using all 19 alternate test
>>> sets.
>>> 
>>> The results are attached.  The correlation coefficient between the 2
>>> sets of Rfrees is 0.992 and the mean RMSD between the sets is 0.04 A,
>>> so the difference between the 2 sets is indeed insignificant.
>>> 
>>> I don't find this result surprising at all: provided the jiggling
>>> keeps the structure inside the convergence radius of refinement, then
>>> by definition the refinement will produce the same result irrespective
>>> of the starting point (i.e. jiggled or not).  If the jiggling takes
>>> the structure outside the radius of convergence then the original
>>> structure will not be retrievable without manual rebuilding: I'm
>>> assuming that's not the goal here.
>>> 
>>> I suspect that the idea of jiggling may have come about because
>>> refinements have not always been carried through to convergence:
>>> clearly if you don't do a proper job of refinement then you must
>>> expect some of the original bias to remain.  Also to head off the
>>> suggestion that simulated annealing refinement would fix this I would
>>> suggest that any kind of SA refinement is only of value for initial MR
>>> models when there may be significant systematic error in the model;
>>> it's not generally advisable to perform it on final refined models
>>> (jiggled or not) when there is no such systematic error present.
>>> 
>>> Cheers
>>> 
>>> -- Ian
>>> 
>>> 
>>> On 21 November 2014 18:56, Dale Tronrud <de...@daletronrud.com 
>>> <mailto:de...@daletronrud.com>
>>> <mailto:de...@daletronrud.com <mailto:de...@daletronrud.com>>> wrote:
>> 
>> 
>>> On 11/21/2014 12:35 AM, "F.Xavier Gomis-Rüth" wrote:
>>> <snip...>
>> 
>>> As to the convenience of carrying over a test set to another
>>> dataset, Eleanor made a suggestion to circumvent this necessity
>>> some time ago: pass your coordinates through pdbset and add some
>>> noise before refinement:
>> 
>>> pdbset xyzin xx.pdb xyzout yy.pdb <<eof noise 0.4 eof
>> 
>> 
>> I've heard this "debiasing" procedure proposed before, but I've
>> never seen a proper test showing that it works.  I'm concerned that
>> this will not erase the influence of low resolution reflections that
>> were in the old working set but are now in the new test set.  While
>> adding 0.4 A gaussian noise to a model would cause large changes to
>> the 2 A structure factors I doubt it would do much to those at 10 A.
>> 
>> It seems to me that one would have to have random, but
>>>> correlated,
>> shifts in atomic parameters to affect the low resolution data - waves
>> of displacements, sometimes to the left and other times to the right.
>> You would need, of course, a superposition of such waves that span
>> all the scales of resolution in the data set.
>> 
>> Has anyone looked at the pdbset jiggling results and shown
>>>> that the
>> low resolution data are scrambled?
>> 
>> Dale Tronrud
>> 
>>> Xavier
>> 
>>>> On 20/11/14 11:43 PM, Keller, Jacob wrote:
>>>> Dear Crystallographers,
>> 
>>>> I thought that for reliable values for Rfree, one needs only to
>>>> satisfy counting statistics, and therefore using at most a couple
>>>> thousand reflections should always be sufficient. Almost always,
>>>> however, some seemingly-arbitrary percentage of reflections is
>>>> used, say 5%. Is there any rationale for using a percentage
>>>> rather than some absolute number like 1000?
>> 
>>>> All the best,
>> 
>>>> Jacob
>> 
>>>> ******************************************* Jacob Pearson Keller,
>>>> PhD Looger Lab/HHMI Janelia Research Campus 19700 Helix Dr,
>>>> Ashburn, VA 20147 email:kell...@janelia.hhmi.org 
>>>> <mailto:kell...@janelia.hhmi.org>
>>>> <mailto:kell...@janelia.hhmi.org <mailto:kell...@janelia.hhmi.org>>
>>>> ******************************************* .

>>>

[ccp4bb] Free Reflections as Percent and not a Number

Reply via email to