Hi Bernhard

This subject has indeed come up repeatedly, and in my view it has never been clearly explained why the optimal RMS(calc-target) value of a geometric parameter (bond length, bond angle etc.) in restrained refinements must always be <= RMS(obs-mean(obs)) = SD(obs) from unrestrained refinements, assuming that target = mean(obs), i.e. that we are using the mean(obs) values from the unrestrained refinements of small molecules as the target values for the restrained macromolecular refinements. We did in fact prove this formally in our 1998 paper (Acta D54, 243-252, see App. B) for the least-squares case and no-one has AFAIK found a flaw in our argument. If anyone thinks there's an error in our proof I challenge them to say what it is ('intuitive' arguments are not acceptable!). Note that there's no reason to believe that the conclusion will differ in the maximum-likelihood case.

This of course raises the question "what precisely do you mean by 'optimal'?". The refinement target is the total likelihood, or more precisely log(likelihood), for the geometry + X-ray terms. The optimal parameter set is then the one obtained at the global maximum of the total likelihood, and is the one most consistent with both the restraints and the X-ray data. The problem of course centres on the optimal choice of relative weighting of these terms, but we showed formally in a second paper (Acta D54, 547-557) that the optimal choice of weights is the one which gives the global maximum of the *free* likelihood, provided the refinement has converged at the global maximum of the *working* likelihood. Fortunately this corresponds exactly with intuition so there's hopefully no argument on this point! Then by dint of some hand-waving we showed that, after making some fairly drastic approximations (such as all reflections having the same weight!), this also corresponds to minimum Rfree at convergence, which was of course the same conclusion that others had come to less formally.

The amount by which the optimal RMSD(calc) value is less than SD(obs) is highly resolution-dependent: at the resolution at which most small-molecule structures are determined (~ 0.8 Ang, assuming of course that significant diffracted intensity is observed out to that resolution!), the difference will obviously be zero, because the relatively small number of restraints will have a negligible effect compared with the huge number of valid X-ray observations, to such an extent in fact that it would be completely pointless to use restraints. Then as the high resolution limit becomes poorer you have fewer reflections, so the difference will increase until, in the limit where the observation/parameter ratio <= 1, the optimal RMSD(calc) = 0. The big question is: how much less is the optimal RMSD(calc) geometry value than SD(obs) at any given resolution?
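
To make the two limits above concrete, here is a minimal Python sketch of a one-bond toy model. It is an illustration only, not the derivation from either paper. The assumptions are not from the papers: the true bond length scatters about the target with standard deviation sd_obs, the X-ray data alone would determine it with an independent Gaussian error sigma_xray, and the restrained value minimises the weighted least-squares sum (d - d_xray)^2/sigma_xray^2 + (d - target)^2/sd_obs^2. The names optimal_rmsd, simulated_rmsd and sigma_xray, the target of 1.53 Ang and the sigma_xray values are all hypothetical.

```python
import math
import random


def optimal_rmsd(sd_obs, sigma_xray):
    """RMS(calc - target) at the optimal weighting in the one-bond toy model.

    Minimising (d - d_xray)^2/sigma_xray^2 + (d - target)^2/sd_obs^2 gives
    d_calc = target + k*(d_xray - target), with the shrinkage factor k below.
    """
    k = sd_obs**2 / (sd_obs**2 + sigma_xray**2)
    return sd_obs * math.sqrt(k)


def simulated_rmsd(sd_obs, sigma_xray, n=100_000, target=1.53, seed=0):
    """Monte Carlo check of optimal_rmsd (assumed Gaussian errors throughout)."""
    rng = random.Random(seed)
    k = sd_obs**2 / (sd_obs**2 + sigma_xray**2)
    total = 0.0
    for _ in range(n):
        d_true = rng.gauss(target, sd_obs)       # real variation about the target
        d_xray = rng.gauss(d_true, sigma_xray)   # what unrestrained data would give
        d_calc = target + k * (d_xray - target)  # optimally restrained estimate
        total += (d_calc - target) ** 2
    return math.sqrt(total / n)


if __name__ == "__main__":
    sd_obs = 0.02  # illustrative SD(obs) in Ang
    # Illustrative X-ray-only errors, from well determined to poorly determined
    for sigma_xray in (0.002, 0.02, 0.05, 0.2):
        print(sigma_xray,
              round(optimal_rmsd(sd_obs, sigma_xray), 5),
              round(simulated_rmsd(sd_obs, sigma_xray), 5))
```

In this toy model RMSD(calc) = SD(obs) * sqrt(k) with k = SD(obs)^2 / (SD(obs)^2 + sigma_xray^2), so it tends to SD(obs) as sigma_xray -> 0 and to 0 as sigma_xray grows, and it never exceeds SD(obs).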

Another way of explaining it is to invoke Bayes' theorem: in the high-resolution limit the X-ray likelihood term dominates the geometry prior so the optimal RMSD(calc) = SD(obs), whereas in the low-res limit it's the other way round and RMSD(calc) = 0.

Yet another way to look at it is that at typical resolutions for macromolecules the X-ray data actually determine the bond lengths only very poorly, as is shown by unrestrained refinement. For example I did an unrestrained Shel-X refinement @ 2 Ang and obtained RMS(calc-target) = 0.22 compared with typical SD(obs) = 0.02. Hence you really have no justification in restrained refinements @ 2 Ang for claiming that d(calc) differs from d(target) by as much as you could justify at say 1 Ang, where the atoms are resolved and the bond lengths are much more accurately determined. If you allow RMSD(calc) to equal SD(obs) then most of the deviation that you're seeing is just noise from the experimental errors, unrelated to any meaningful differences between d(target) and d(true). To claim that these differences are significant would be the same as saying that the bond lengths are determined by the X-ray data equally accurately at low resolution as at high resolution, which is clearly nonsense. It's just a good example of "where freedom is given, liberties will be taken".

However these arguments are only qualitative so I did some tests with real data. The data extend to 1.33 Ang. and were previously refined anisotropically with Shel-X; I then fixed the Biso values equal to Bequiv to avoid any effects from the B-factor restraints. I used autoBuster-TNT because AFAIK other programs don't print the free log-likelihood (LLGfree) values that I need to judge the optimal refinement conditions (Rfree is not sufficiently accurate for this for reasons mentioned above).

The starting model in each case was a structure refined such that RMSD(bonds) = 0.01 ('Reference' in the table), and this was refined to convergence using a range of input values of the expected RMSD(bonds). AutoBuster automatically adjusts the relative X-ray vs geometry weight to give this target value. It can be seen that the optimal LLGfree is obtained at RMSD(bonds) = 0.006, i.e. considerably less than what most people would have used for a 1.33 Ang. structure! Note that for the first entry the weight appears to be zero because it's rounded to the nearest integer for the printout; obviously it can't be exactly zero. Also note there is a loss of significant figures in the printout of small values of RMSD(bonds).

<RMSD(bonds)>  Weight(X-ray)  Rwork  Rfree  RMSD(bonds)   LLGfree
0.002          0              0.183  0.221  0.004         0.00156
0.004          112            0.181  0.220  0.004         0.00561
0.006          228            0.178  0.218  0.006         0.00832  <<< Optimum.
0.008          315            0.177  0.218  0.008         0.00648
0.010          444            0.176  0.218  0.010         0        (Reference).
0.012          581            0.175  0.218  0.012        -0.00202
0.016          923            0.174  0.218  0.016        -0.01189
0.020          1297           0.173  0.219  0.019        -0.01851

I also obtained these results @ 2 Ang. resolution cutoff:

<RMSD(bonds)>  Weight(X-ray)  Rwork  Rfree  RMSD(bonds)   LLGfree
0.001          0              0.155  0.222  0.002        -0.15021  <<< Optimum.
0.002          100            0.153  0.223  0.002        -0.18788
0.003          150            0.149  0.222  0.003        -0.20075
0.004          200            0.146  0.221  0.004        -0.21292
0.005          250            0.143  0.221  0.005        -0.22524

i.e. optimum around RMSD = 0.001 to 0.002.

Does this mean we should aim for 0.001 - 0.002 RMSD @ 2 Ang.? Absolutely not! Firstly with such tight restraints the structure becomes very stiff and is very likely to stick in one of a multitude of false minima (such as inverted chiral centres). Secondly does it actually matter at 2 Ang. whether the RMSD is 0.001 or 0.02? The value 0.001 may be the formally correct one, but does the value 0.02 actually lead to significant errors in the structure?
After all the differences in Rfree are minuscule (the differences in LLGfree are actually grossly exaggerated), the maps are indistinguishable, and you don't really expect to determine bond lengths with an accuracy better than 0.02 anyway.

So my conclusion is that at below-atomic resolution this is a completely pointless argument, because it doesn't matter what RMSD you aim for, provided it's reasonable of course (say < 0.02). You will introduce some errors by having the RMSDs higher than the optimal values but these will be insignificant; the most important thing is not to have geometry restraints so tight that the refinement doesn't converge in a sensible amount of time.

Happy Easter!

-- Ian

> -----Original Message-----
> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of Bernhard Rupp
> Sent: 04 April 2007 21:06
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?
>
> >People also felt that the RMSD bond/angle of 0.016/1.6 was still a little high.
>
> This was subject of a discussion before on the board and I still don't understand it:
>
> If I recall correctly, even in highly accurate and precise small molecule structures, the rmsd of corresponding bonds and angles are ~0.014A and 1.8deg.
>
> It always seems to me that getting these values much below is not a sign of crystallographic prowess but over-restraining them?
>
> Is it just that - given good resolution in the first place - the balance of restraints (matrix weight) vs low R (i.e., Xray data) gives the best Rfree or lowest gap at (artificially?) lower rmsd?
>
> Is that then the best model?
>
> I understand that even thermal vibration accounts for about 1.7 deg angle deviation - are lower rmsd deviations then a manifestation of low temp?
> But that does not seem to be much of an effect, if one looks at the tables from the CSD small mol data (shown in nicely in comparison to the 91 Engh/Huber data in Tables F, pp385).
>
> Thx, br