Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?
I see that Dale and I are in pretty well complete agreement on this subject (even though I honestly hadn't read Dale's response when I sent mine!) - I think we now have a definitive explanation, so hopefully this will be the last time that this question comes up, or if not at least we now have a useful thread that future queries on this subject can be referred to!

I would like to make one further point, and in fact caution *against* using Rfree directly as an indicator of the optimal weight as has been suggested in the literature & elsewhere. I gave some reasons why Rfree is not sufficiently accurate for this in my previous response: what theory we have suggests strongly that the free log-likelihood gain (LLGfree) is the correct statistic to use, and that the Rfree minimum approximates the LLGfree maximum only poorly.

My point is that not all SF calculation programs even compute R factors using the same formula! The 'conventional/textbook' definition of R (which I believe I'm correct in saying is the way it's defined in Refmac) is

    R = Sum|Fo-Fc| / Sum(Fo)

where Fo and Fc are the observed & calculated structure amplitudes. This is the form of R factor that is really appropriate only when least-squares is the optimisation method. The program I used (Buster-TNT) computes R factors using the phase-probability weighted F ('Fexpect') in place of Fc, which is the more appropriate form when maximum likelihood optimisation is used, and means that this form of Rfree gives a much better approximation of the LLGfree maximum (even though it is still actually quite poor!).

Clearly the solution to all this is *not* to use Rfree at all for this purpose and use LLGfree instead, which all ML-based programs can actually easily calculate.

One last point: when this subject came up last, the issue of whether it's valid at all to 'contaminate' the test set by using any kind of 'free' statistic in this way was raised.
The answer is, I think, that there is inevitably some contamination, but that it's insignificant. The reason is that the number of weighting parameters determined in this way (don't forget that the test set is also used to determine sigma-A values) is very small compared with the number of variable parameters in restrained refinement (i.e. typically 4 per atom), so that the reduction in the number of degrees of freedom is insignificant. The alternative of not using the test set in the calculation would undoubtedly lead to even bigger errors.

Cheers

-- Ian

> -----Original Message-----
> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED]] On Behalf Of Dale Tronrud
> Sent: 04 April 2007 22:33
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?
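The conventional R factor quoted in this message, R = Sum|Fo-Fc|/Sum(Fo), can be sketched in a few lines of Python. This is only an illustration: the function name `r_factor` and the sample amplitudes are made up, the amplitudes are assumed to be already paired and on a common scale, and nothing here is taken from Refmac or Buster-TNT. The Fexpect-based variant described above would pass the phase-probability weighted amplitudes in place of Fc; computing those is not attempted here.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum(|Fo - Fc|) / sum(Fo) over paired amplitudes.

    Assumes f_obs and f_calc are equal-length sequences of non-negative
    amplitudes that have already been put on the same scale.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(f_obs)
    if denominator <= 0:
        raise ValueError("sum of observed amplitudes must be positive")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


# Hypothetical amplitudes, for illustration only.
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 55.0, 30.0, 20.0]

r = r_factor(f_obs, f_calc)
print(f"R = {r:.3f}")  # (10 + 5 + 5 + 5) / 200 = 0.125
```

Calling the same function on the working-set reflections and on the test-set reflections separately would give Rwork and Rfree respectively, under the same assumptions.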
Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?
AutoBuster automatically adjusts the relative X-ray vs geometry weight to give this target value. It can be seen that the optimal LLGfree is obtained at RMSD(bonds) = 0.006, i.e. considerably less than what most people would have used for a 1.33 Ang. structure! Note that for the first entry the weight appears to be zero because it's rounded to the nearest integer for the printout; obviously it can't be exactly zero. Also note there is a loss of significant figures in the printout of small values of RMSD(bonds).

RMSD target  Weight(X-ray)  Rwork  Rfree  RMSD(bonds)  LLGfree
   0.002            0       0.183  0.221     0.004      0.00156
   0.004          112       0.181  0.220     0.004      0.00561
   0.006          228       0.178  0.218     0.006      0.00832  <<< Optimum.
   0.008          315       0.177  0.218     0.008      0.00648
   0.010          444       0.176  0.218     0.010      0        (Reference).
   0.012          581       0.175  0.218     0.012     -0.00202
   0.016          923       0.174  0.218     0.016     -0.01189
   0.020         1297       0.173  0.219     0.019     -0.01851

I also obtained these results @ 2 Ang. resolution cutoff:

RMSD target  Weight(X-ray)  Rwork  Rfree  RMSD(bonds)  LLGfree
   0.001            0       0.155  0.222     0.002     -0.15021  <<< Optimum.
   0.002          100       0.153  0.223     0.002     -0.18788
   0.003          150       0.149  0.222     0.003     -0.20075
   0.004          200       0.146  0.221     0.004     -0.21292
   0.005          250       0.143  0.221     0.005     -0.22524

i.e. optimum around RMSD = 0.001 to 0.002.

Does this mean we should aim for 0.001 - 0.002 RMSD @ 2 Ang.? Absolutely not! Firstly, with such tight restraints the structure becomes very stiff and is very likely to stick in one of a multitude of false minima (such as inverted chiral centres). Secondly, does it actually matter at 2 Ang. whether the RMSD is 0.001 or 0.02? The value 0.001 may be the formally correct one, but does the value 0.02 actually lead to significant errors in the structure? After all, the differences in Rfree are minuscule (the differences in LLGfree are actually grossly exaggerated), the maps are indistinguishable, and you don't really expect to determine bond lengths with an accuracy better than 0.02 anyway.
So my conclusion is that at below-atomic resolution this is a completely pointless argument, because it doesn't matter what RMSD you aim for, provided it's reasonable of course (say < 0.02). You will introduce some errors by having the RMSDs higher than the optimal values but these will be insignificant; the most important thing is not to have geometry restraints so tight that the refinement doesn't converge in a sensible amount of time.

Happy Easter!

-- Ian

> -----Original Message-----
> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED]] On Behalf Of Bernhard Rupp
> Sent: 04 April 2007 21:06
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?
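The comparison between Rfree and LLGfree in the 1.33 Ang. table can be made concrete with a short Python sketch. The numbers are copied from that table; the field names (in particular `target_rmsd` for the unlabelled first column) and the selection logic are assumptions made for illustration, not anything AutoBuster itself does.

```python
from typing import NamedTuple


class WeightScanRow(NamedTuple):
    target_rmsd: float   # assumed meaning of the unlabelled first column
    weight: int          # X-ray weight, rounded to an integer in the printout
    rwork: float
    rfree: float
    rmsd_bonds: float
    llg_free: float      # relative to the reference row (LLGfree = 0)


# Values from the 1.33 Ang. weight scan reported in this message.
scan = [
    WeightScanRow(0.002, 0, 0.183, 0.221, 0.004, 0.00156),
    WeightScanRow(0.004, 112, 0.181, 0.220, 0.004, 0.00561),
    WeightScanRow(0.006, 228, 0.178, 0.218, 0.006, 0.00832),
    WeightScanRow(0.008, 315, 0.177, 0.218, 0.008, 0.00648),
    WeightScanRow(0.010, 444, 0.176, 0.218, 0.010, 0.0),
    WeightScanRow(0.012, 581, 0.175, 0.218, 0.012, -0.00202),
    WeightScanRow(0.016, 923, 0.174, 0.218, 0.016, -0.01189),
    WeightScanRow(0.020, 1297, 0.173, 0.219, 0.019, -0.01851),
]

# LLGfree has a single, well-defined maximum.
best_by_llg = max(scan, key=lambda row: row.llg_free)

# Rfree, printed to three decimals, has several rows tied at its minimum.
min_rfree = min(row.rfree for row in scan)
tied_on_rfree = [row.target_rmsd for row in scan if row.rfree == min_rfree]

print("Best by LLGfree: target RMSD", best_by_llg.target_rmsd)
print("Rows tied at minimum Rfree:", tied_on_rfree)
```

With these values LLGfree picks out the 0.006 row, while Rfree is flat at 0.218 across five of the eight rows, which is the point being made about Rfree as a poor indicator of the optimal weight.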
Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?
Bernhard Rupp wrote:
>> People also felt that the RMSD bond/angle of 0.016/1.6 was still a little high.
>
> This was subject of a discussion before on the board and I still don't understand it:
>
> If I recall correctly, even in highly accurate and precise small molecule structures, the rmsd of corresponding bonds and angles are ~0.014A and 1.8deg.
>
> It always seems to me that getting these values much below is not a sign of crystallographic prowess but over-restraining them?
>
> Is it just that - given good resolution in the first place - the balance of restraints (matrix weight) vs low R (i.e., Xray data) gives the best Rfree or lowest gap at (artificially?) lower rmsd?
>
> Is that then the best model?
>
> I understand that even thermal vibration accounts for about 1.7 deg angle deviation - are lower rmsd deviations then a manifestation of low temp? But that does not seem to be much of an effect, if one looks at the tables from the CSD small mol data (shown nicely in comparison to the 91 Engh/Huber data in Tables F, pp385).

This is an on-going topic of discussion so let me put in my two cents.

We calculate libraries of "ideal geometry" based on precise, small molecule structures. When these small molecule crystal structures are compared to our derived libraries they are found to contain deviations. These deviations are larger than the uncertainty in these models and are presumed to reflect real features of the molecule; perturbations due to the local environment in the crystal.

These same perturbations are present in our crystals and we should expect to find deviations from "ideal geometry" on the same scale as that seen in the precise models. This expectation led to the practice in the 1980s of setting r.m.s. targets of 0.02A and 3 degrees for agreement to bond length and angle libraries.
While this seems quite reasonable, we are left with the question: Are the deviations from "ideal geometry" we see in a particular model in any way related to the actual deviations of the molecule in the crystal? The uncertainties (su's) of the bond lengths in a model based on 4A diffraction data are huge compared to the absolute value of the true deviation. For example, if the model had a deviation from "ideal geometry" of 0.02A but the uncertainty of the distance is 0.2A, can we say that we have detected a signal that is significantly different than zero, the null hypothesis?

If we have a model with a collection of deviations from "ideal geometry" but we have no expectation that those deviations are indicative of the true deviations of the molecule in the crystal, are those deviations serving any purpose? If they do not reflect any property of the crystal they are noise and should be filtered out.

By this argument a model based on 4A resolution diffraction data should have no deviation from "ideal geometry" while one based on 0.9A diffraction data should have no restraints on "ideal geometry" since the deviations are probably all real and significant (except for specific regions of the molecule that have problems). The problem we all face is the vast area between these extremes, compounded by our inability to calculate proper uncertainties for the parameters of our models.

The free R is our current tool-of-choice when it comes to attempting to judge the statistical significance of aspects of our model, without performing proper statistical tests which we don't know how to do. If we allow our model the freedom to deviate from our library and the free R improves a "significant" (??) amount then the resulting deviations must have some similarity to the true deviations in the crystal, but if the free R does not improve then the deviations must not be related to reality and should be suppressed.
This is the type of assumption we make whenever we use the free R to make a choice.

What we end up doing is not making a yes/no decision but instead we variably suppress the amplitude of the deviations from "ideal geometry" and that is harder to justify. I think a reasonable argument can be made, but I have already written too many words in this letter. It doesn't really matter because we left the road of mathematical rigor when we took the R free path.

Unfortunately, many people have ignored what Brunger said in Methods in Enzymology about choosing your X-ray/geometry weight based on the free R and just started saying "the rms bond length deviation must be 0.007A". The deviations from "ideal geometry" of your model should be no more or no less than what you can justifiably claim is a reflection of the true state of the molecule in your crystal.

Dale Tronrud
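The 4A example in this message (a 0.02A deviation against a 0.2A uncertainty) can be put into numbers with a small Python sketch. It assumes, purely for illustration, that the error on the bond length is normally distributed with the stated standard uncertainty; the function name `deviation_significance` is made up, and this is not the kind of proper statistical test the message says we do not yet know how to do.

```python
import math


def deviation_significance(deviation, sigma):
    """Return (z, p) for a deviation from the library value.

    z is the deviation in units of its standard uncertainty; p is the
    two-sided probability of a deviation at least this large arising by
    chance if the true deviation were zero (normal errors assumed).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = abs(deviation) / sigma
    p = math.erfc(z / math.sqrt(2.0))
    return z, p


# The example from the message: 0.02 A deviation, 0.2 A uncertainty.
z, p = deviation_significance(0.02, 0.2)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Under that assumption the deviation is only a tenth of a standard uncertainty and the two-sided probability is about 0.92, so the null hypothesis of zero deviation cannot be rejected, which is the sense in which such deviations are described above as noise.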
Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?
> People also felt that the RMSD bond/angle of 0.016/1.6 was still a little high.

This was subject of a discussion before on the board and I still don't understand it:

If I recall correctly, even in highly accurate and precise small molecule structures, the rmsd of corresponding bonds and angles are ~0.014A and 1.8deg.

It always seems to me that getting these values much below is not a sign of crystallographic prowess but over-restraining them?

Is it just that - given good resolution in the first place - the balance of restraints (matrix weight) vs low R (i.e., Xray data) gives the best Rfree or lowest gap at (artificially?) lower rmsd?

Is that then the best model?

I understand that even thermal vibration accounts for about 1.7 deg angle deviation - are lower rmsd deviations then a manifestation of low temp? But that does not seem to be much of an effect, if one looks at the tables from the CSD small mol data (shown nicely in comparison to the 91 Engh/Huber data in Tables F, pp385).

Thx, br
Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?
Hello,

Thanks very much to all who replied with thoughts and suggestions. The consensus was that my interpretation was not correct, and it is not valid to stop Refmac after TLS refinement. People also felt that the RMSD bond/angle of 0.016/1.6 was still a little high. Phenix.refine was suggested as a complete solution, with SA and TLS in the same package.

All the best,
Nick

--On 28 March 2007 16:34 +0100 "NM Burton, Biochemistry" <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I've refined a structure with CNS to Rwork/free=0.226/0.273. I switched to Refmac5.2 to take advantage of TLS refinement and set up a run with 10 cycles of TLS refinement (groups as suggested by the TLSMD server) followed by 10 cycles of restrained co-ordinate refinement. After the TLS cycles the model was improved (Rwork/free=0.222/0.241), however during co-ordinate refinement Rfree refined up (final Rwork/free=0.197/0.264).
>
> My understanding would be that the TLS refinement is modelling the ADPs most accurately, but that Refmac's co-ordinate refinement is over-fitting slightly. Would this seem correct? And if so, is it valid to run Refmac with no cycles of co-ordinate refinement and take the resulting model as the final structure?
>
> Thanks very much,
> Nick
>
> --
> NM Burton, Biochemistry
> [EMAIL PROTECTED]

--
NM Burton, Biochemistry
[EMAIL PROTECTED]