Hi Bernhard

This subject has indeed come up repeatedly and it has never in my view
been clearly explained why the optimal RMS(calc-target) geometry value
(i.e. bond length, bond angle etc.) in restrained refinements must
always be <= RMS(obs-mean(obs)) = SD(obs) from unrestrained refinements,
assuming that target = mean(obs) value, i.e. we are using the mean(obs)
values from the unrestrained refinements of small molecules as the
target values for the restrained macromolecular refinements.  We did in
fact prove this formally in our 1998 paper (Acta D54, 243-252, see App.
B) for the least-squares case and no-one has AFAIK found a flaw in our
argument.  If anyone thinks there's an error in our proof I challenge
them to say what it is ('intuitive' arguments are not acceptable!).
Note that there's no reason to believe that the conclusion will differ
in the maximum-likelihood case.

This of course begs the question "what precisely do you mean by
'optimal'?".  The refinement target is the total likelihood, or more
precisely, log(likelihood) for the geometry + X-ray terms.  The optimal
parameter set is then the one obtained at the global maximum of the
total likelihood, and is the one most consistent with the restraints and
the X-ray data.  The problem of course centres on the optimal choice of
relative weighting of these terms, but we showed formally in a second
paper (Acta D54, 547-557) that the optimal choice of weights is that
which gives the global maximum of the *free* likelihood, provided the
refinement had converged at the global maximum of the *working*
likelihood.  Fortunately this corresponds exactly with intuition so
there's hopefully no argument on this point!  Then, by dint of some
hand-waving, we showed that with some fairly drastic approximations
(such as all reflections having the same weight!) this also
corresponds to minimum Rfree at convergence, which was of course the
same conclusion that others had come to less formally.

The amount by which the optimal RMSD(calc) value is less than SD(obs) is
highly resolution-dependent: at the resolution at which most
small-molecule structures are determined (~ 0.8 Ang, assuming of course
that significant diffracted intensity is observed out to that
resolution!), the difference will obviously be zero, because the
relatively small number of restraints will have a negligible effect
compared with the huge number of valid X-ray observations, to such an
extent in fact that it would be completely pointless to use restraints.
Then as the high resolution limit becomes poorer you have fewer
reflections, so the difference will increase until, in the limit where
the observation/parameter ratio is <= 1, the optimal RMSD(calc) = 0.  The
big question is: how much less is the optimal RMSD(calc) geometry value
compared with SD(obs) at any given resolution?

Another way of explaining it is to invoke Bayes' theorem: in the
high-resolution limit the X-ray likelihood term dominates the geometry
prior so the optimal RMSD(calc) = SD(obs), whereas in the low-resolution
limit it's the other way round and the optimal RMSD(calc) = 0.
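
A minimal sketch of this Bayesian picture, under assumptions that are not taken from the papers cited above: a single bond length, a Gaussian geometry prior centred on d(target) with width SD(obs), and a Gaussian X-ray-only estimate of that bond with standard uncertainty sigma_xray (a hypothetical stand-in for resolution). In this toy model the posterior-mean bond length has RMS(calc-target) = SD(obs)^2 / sqrt(SD(obs)^2 + sigma_xray^2), which equals SD(obs) when sigma_xray = 0 and tends to 0 as sigma_xray grows. The function names are illustrative only.

```python
import math
import random


def optimal_rmsd(sd_obs, sigma_xray):
    """Analytic RMS(calc - target) of the posterior mean in the toy model."""
    return sd_obs ** 2 / math.sqrt(sd_obs ** 2 + sigma_xray ** 2)


def simulate_rmsd(sd_obs, sigma_xray, n=20_000, seed=0):
    """Monte Carlo check of optimal_rmsd for one bond type."""
    rng = random.Random(seed)
    # Weight given to the X-ray estimate relative to the geometry prior.
    k = sd_obs ** 2 / (sd_obs ** 2 + sigma_xray ** 2)
    total = 0.0
    for _ in range(n):
        true_dev = rng.gauss(0.0, sd_obs)                  # d(true) - d(target)
        xray_dev = true_dev + rng.gauss(0.0, sigma_xray)   # X-ray-only estimate - d(target)
        calc_dev = k * xray_dev                            # posterior mean - d(target)
        total += calc_dev ** 2
    return math.sqrt(total / n)


if __name__ == "__main__":
    sd_obs = 0.02  # Ang, typical SD(obs) for bond lengths
    for sigma_xray in (0.0, 0.01, 0.05, 0.2, 1.0):
        print(sigma_xray, optimal_rmsd(sd_obs, sigma_xray),
              simulate_rmsd(sd_obs, sigma_xray))
```

The printed values never exceed SD(obs) and fall steadily as sigma_xray increases, which is the qualitative behaviour described above; a real refinement with many correlated parameters will of course not follow this one-dimensional formula exactly.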

Yet another way to look at it is that at typical resolutions for
macromolecules the X-ray data actually determines the bond lengths only
very poorly as is shown by unrestrained refinement.  For example I did
an unrestrained Shel-X refinement @ 2 Ang and obtained RMS(calc-target)
= 0.22 compared with typical SD(obs) = 0.02.  Hence you really have no
justification in restrained refinements @ 2 Ang for claiming that
d(calc) differs from d(target) by as much as you could justify at say 1
Ang where the atoms are resolved and the bond lengths are much more
accurately determined.  If you allow RMSD(calc) to equal SD(obs) then
most of the deviation that you're seeing is just noise from the
experimental errors, unrelated to any meaningful differences between
d(target) and d(true).  To claim that these differences are significant
would be the same as saying that the bond lengths are determined by the
X-ray data equally accurately at low resolution as at high resolution
which is clearly nonsense.  It's just a good example of "where freedom
is given, liberties will be taken".
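
To put rough numbers on the 2 Ang. example, here is a back-of-envelope sketch. It assumes (my simplification, not part of the formal proof) that the unrestrained scatter about the target is the genuine spread SD(obs) plus independent Gaussian X-ray noise, added in quadrature, for a single bond type. The variable names are illustrative.

```python
import math

sd_obs = 0.02            # Ang, typical SD(obs) quoted in the text
rms_unrestrained = 0.22  # Ang, RMS(calc-target) from the unrestrained run @ 2 Ang

# Assumed decomposition: rms_unrestrained^2 = sd_obs^2 + sigma_xray^2
sigma_xray = math.sqrt(rms_unrestrained ** 2 - sd_obs ** 2)

# Fraction of the unrestrained variance attributable to genuine deviations.
signal_fraction = sd_obs ** 2 / rms_unrestrained ** 2

# RMS(calc-target) if each deviation is shrunk by that fraction, i.e. the
# posterior mean under a Gaussian prior and Gaussian noise.
rmsd_shrunk = signal_fraction * rms_unrestrained

print(f"sigma_xray      ~ {sigma_xray:.3f} Ang")
print(f"signal fraction ~ {signal_fraction:.2%}")
print(f"shrunk RMSD     ~ {rmsd_shrunk:.4f} Ang")
```

Under these assumptions less than 1% of the unrestrained variance is genuine signal, and the optimally shrunk RMSD comes out well below SD(obs). The exact figure depends entirely on the toy model and should not be read as a recommended target.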

However these arguments are only qualitative so I did some tests with
real data.  The data extend to 1.33 Ang. and the model had previously
been refined anisotropically with Shel-X; I then fixed the Biso values
equal to Bequiv to avoid any effects from the B-factor restraints.  I
used autoBuster-TNT because
AFAIK other programs don't print the free log-likelihood (LLGfree)
values that I need to judge the optimal refinement conditions (Rfree is
not sufficiently accurate for this for reasons mentioned above).  The
starting model in each case was a structure refined such that
RMSD(bonds) = 0.01 ('Reference' in the table), and this was refined to
convergence using a range of input values of the expected RMSD(bonds).
AutoBuster automatically adjusts the relative X-ray vs geometry weight
to give this target value.  It can be seen that the optimal LLGfree is
obtained at RMSD(bonds) = 0.006, i.e. considerably less than what most
people would have used for a 1.33 Ang. structure!  Note that for the
first entry the weight appears to be zero only because it's rounded to
the nearest integer for the printout; obviously it can't be exactly zero.
Also note there is a loss of significant figures in the printout of
small values of RMSD(bonds).

<RMSD(bonds)>  Weight(X-ray)  Rwork  Rfree  RMSD(bonds)  LLGfree
    0.002            0        0.183  0.221     0.004     0.00156
    0.004          112        0.181  0.220     0.004     0.00561
    0.006          228        0.178  0.218     0.006     0.00832  <<< Optimum.
    0.008          315        0.177  0.218     0.008     0.00648
    0.010          444        0.176  0.218     0.010     0        (Reference).
    0.012          581        0.175  0.218     0.012    -0.00202
    0.016          923        0.174  0.218     0.016    -0.01189
    0.020         1297        0.173  0.219     0.019    -0.01851

I also obtained these results @ 2 Ang. resolution cutoff:

<RMSD(bonds)>  Weight(X-ray)  Rwork  Rfree  RMSD(bonds)  LLGfree
    0.001            0        0.155  0.222     0.002    -0.15021  <<< Optimum.
    0.002          100        0.153  0.223     0.002    -0.18788
    0.003          150        0.149  0.222     0.003    -0.20075
    0.004          200        0.146  0.221     0.004    -0.21292
    0.005          250        0.143  0.221     0.005    -0.22524

i.e. optimum around RMSD = 0.001 to 0.002.
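
As a small illustration of how the optimum is read off these tables, the sketch below copies the (target RMSD, Rfree, LLGfree) columns from both tables and picks the row with the highest LLGfree. It also lists the rows tied at the minimum printed Rfree, to show why Rfree alone is too coarse to locate the optimum. The function and variable names are illustrative and are not part of any refinement program.

```python
# (target <RMSD(bonds)>, Rfree, LLGfree), copied from the two tables.
RUNS_1P33 = [
    (0.002, 0.221, 0.00156),
    (0.004, 0.220, 0.00561),
    (0.006, 0.218, 0.00832),
    (0.008, 0.218, 0.00648),
    (0.010, 0.218, 0.0),
    (0.012, 0.218, -0.00202),
    (0.016, 0.218, -0.01189),
    (0.020, 0.219, -0.01851),
]
RUNS_2P0 = [
    (0.001, 0.222, -0.15021),
    (0.002, 0.223, -0.18788),
    (0.003, 0.222, -0.20075),
    (0.004, 0.221, -0.21292),
    (0.005, 0.221, -0.22524),
]


def best_by_llgfree(runs):
    """Target RMSD of the run with the highest free log-likelihood."""
    return max(runs, key=lambda run: run[2])[0]


def tied_at_min_rfree(runs):
    """Target RMSDs of all runs sharing the lowest printed Rfree."""
    lowest = min(run[1] for run in runs)
    return [run[0] for run in runs if run[1] == lowest]


if __name__ == "__main__":
    for label, runs in (("1.33 Ang.", RUNS_1P33), ("2 Ang.", RUNS_2P0)):
        print(label, "LLGfree optimum:", best_by_llgfree(runs),
              "| tied at min Rfree:", tied_at_min_rfree(runs))
```

At 1.33 Ang. five different targets share the lowest printed Rfree, whereas LLGfree singles out one of them.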

Does this mean we should aim for 0.001 - 0.002 RMSD @ 2 Ang.?
Absolutely not!  Firstly with such tight restraints the structure
becomes very stiff and is very likely to stick in one of a multitude of
false minima (such as inverted chiral centres).  Secondly does it
actually matter at 2 Ang. whether the RMSD is 0.001 or 0.02?  The value
0.001 may be the formally correct one, but does the value 0.02 actually
lead to significant errors in the structure?  After all the differences
in Rfree are minuscule (the differences in LLGfree are actually grossly
exaggerated), the maps are indistinguishable, and you don't really
expect to determine bond lengths with an accuracy better than 0.02
anyway.

So my conclusion is that at below-atomic resolution this is a completely
pointless argument, because it doesn't matter what RMSD you aim for,
provided it's reasonable of course (say < 0.02).  You will introduce
some errors by having the RMSDs higher than the optimal values but
these will be insignificant.  The most important thing is not to have
geometry restraints so tight that the refinement doesn't converge in a
sensible amount of time.

Happy Easter!

-- Ian


> -----Original Message-----
> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On 
> Behalf Of Bernhard Rupp
> Sent: 04 April 2007 21:06
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] Summary - Valid to stop Refmac after 
> TLS refinement?
> 
> >People also felt that the RMSD bond/angle of 0.016/1.6 was still a
> >little high.
> 
> This was subject of a discussion before on the board and I 
> still don't 
> understand it:
> 
> If I recall correctly, even in highly accurate and precise
> small molecule structures, the rmsd of corresponding
> bonds and angles are ~0.014A and 1.8deg. 
> 
> It always seems to me that getting these values much below is 
> not a sign
> of crystallographic prowess but over-restraining them?
> 
> Is it just that - given good resolution in the first place - 
> the balance 
> of restraints (matrix weight) vs low R (i.e., Xray data) 
> gives the best 
> Rfree or lowest gap at (artificially?) lower rmsd?
> 
> Is that then the best model?
> 
> I understand that even thermal vibration accounts for about 1.7 deg 
> angle deviation -  are lower rmsd deviations then a manifestation
> of low temp? But that does not seem to be much of an effect, if
> one looks at the tables from the CSD small mol data (shown 
> nicely in comparison to the 91 Engh/Huber data in Tables F, pp385). 
>  
> 
> Thx, br
> 
>  
> 
> 
