Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?

2007-04-10 Thread Ian Tickle
 
I see that Dale and I are in pretty well complete agreement on this
subject (even though I honestly hadn't read Dale's response when I sent
mine!) - I think we now have a definitive explanation, so hopefully this
will be the last time that this question comes up, or if not at least we
now have a useful thread that future queries on this subject can be
referred to!

I would like to make one further point, and in fact caution *against*
using Rfree directly as an indicator of the optimal weight as has been
suggested in the literature & elsewhere.  I gave some reasons why Rfree
is not sufficiently accurate for this in my previous response: what
theory we have suggests strongly that the free log-likelihood gain
(LLGfree) is the correct statistic to use, and that the Rfree minimum
approximates the LLGfree maximum only poorly.  My point is that not all
SF calculation programs even compute R factors using the same formula!

The 'conventional/textbook' definition of R (which I believe I'm correct
in saying is the way it's defined in Refmac) is R = Sum|Fo-Fc|/Sum(Fo)
where Fo and Fc are the observed & calculated structure amplitudes.
This is the form of R factor that is really appropriate only when
least-squares is the optimisation method.  The program I used
(Buster-TNT) computes R factors using the phase-probability weighted F
('Fexpect') in place of Fc, which is the more appropriate form when
maximum likelihood optimisation is used, and means that this form of
Rfree gives a much better approximation of the LLGfree maximum (even
though it is still actually quite poor!).
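
As a rough illustration of the definition above, here is a minimal Python sketch of the conventional R factor. The function name and the amplitudes are made up for the example, and no scale factor between Fo and Fc is applied (real programs scale first). The Buster-TNT variant would pass Fexpect values in place of Fc; how Fexpect itself is computed is not shown here.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_model: Sequence[float]) -> float:
    """Conventional R = Sum|Fo - Fm| / Sum(Fo) over all reflections.

    f_model is Fc for the textbook definition; passing Fexpect instead
    gives the likelihood-oriented variant described in the text.
    """
    if len(f_obs) != len(f_model):
        raise ValueError("f_obs and f_model must have the same length")
    denominator = sum(f_obs)
    if denominator <= 0:
        raise ValueError("sum of observed amplitudes must be positive")
    numerator = sum(abs(fo - fm) for fo, fm in zip(f_obs, f_model))
    return numerator / denominator


# Hypothetical amplitudes, already on a common scale (assumption).
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 55.0, 30.0, 25.0]
r = r_factor(f_obs, f_calc)
print(f"R = {r:.3f}")
```

The same function gives Rwork or Rfree depending on whether the working or the test reflections are passed in.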

Clearly the solution to all this is *not* to use Rfree at all for this
purpose and use LLGfree instead, which all ML-based programs can
actually easily calculate.

One last point: when this subject came up last, the issue of whether
it's valid at all to 'contaminate' the test set by using any kind of
'free' statistic in this way was raised.  The answer is I think that
there is inevitably some contamination, but that it's insignificant.
The reason is that the number of weighting parameters determined in this
way (don't forget that the test set is also used to determine sigma-A
values), is very small compared with the number of variable parameters
in restrained refinement (i.e. typically 4 per atom), so that the
reduction in the number of degrees of freedom is insignificant.  The
alternative of not using the test set in the calculation would
undoubtedly lead to even bigger errors.

Cheers

-- Ian


Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?

2007-04-05 Thread Ian Tickle
AutoBuster automatically adjusts the relative X-ray vs geometry weight
to give this target value.  It can be seen that the optimal LLGfree is
obtained at RMSD(bonds) = 0.006, i.e. considerably less than what most
people would have used for a 1.33 Ang. structure!  Note that for the
first entry the weight appears to be zero because it's rounded to the
nearest integer for the printout; obviously it can't be exactly zero.
Also note there is a loss of significant figures in the printout of
small values of RMSD(bonds).

  Target  Weight(X-ray)  Rwork  Rfree  RMSD(bonds)   LLGfree
   0.002              0  0.183  0.221        0.004   0.00156
   0.004            112  0.181  0.220        0.004   0.00561
   0.006            228  0.178  0.218        0.006   0.00832  <<< Optimum.
   0.008            315  0.177  0.218        0.008   0.00648
   0.010            444  0.176  0.218        0.010   0        (Reference).
   0.012            581  0.175  0.218        0.012  -0.00202
   0.016            923  0.174  0.218        0.016  -0.01189
   0.020           1297  0.173  0.219        0.019  -0.01851
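
To make the comparison concrete, the following Python sketch picks the optimum from the 1.33 Ang. results above, once by LLGfree and once by Rfree. The numbers are transcribed from the table; the dictionary keys, and reading the first number of each row as the target RMSD(bonds), are labels assumed for this sketch rather than program output.

```python
# Rows transcribed from the 1.33 Ang. table in this message.
# Keys are labels chosen for this sketch (assumption), not Buster output.
rows = [
    {"target": 0.002, "weight": 0,    "rwork": 0.183, "rfree": 0.221, "rmsd": 0.004, "llgfree": 0.00156},
    {"target": 0.004, "weight": 112,  "rwork": 0.181, "rfree": 0.220, "rmsd": 0.004, "llgfree": 0.00561},
    {"target": 0.006, "weight": 228,  "rwork": 0.178, "rfree": 0.218, "rmsd": 0.006, "llgfree": 0.00832},
    {"target": 0.008, "weight": 315,  "rwork": 0.177, "rfree": 0.218, "rmsd": 0.008, "llgfree": 0.00648},
    {"target": 0.010, "weight": 444,  "rwork": 0.176, "rfree": 0.218, "rmsd": 0.010, "llgfree": 0.0},
    {"target": 0.012, "weight": 581,  "rwork": 0.175, "rfree": 0.218, "rmsd": 0.012, "llgfree": -0.00202},
    {"target": 0.016, "weight": 923,  "rwork": 0.174, "rfree": 0.218, "rmsd": 0.016, "llgfree": -0.01189},
    {"target": 0.020, "weight": 1297, "rwork": 0.173, "rfree": 0.219, "rmsd": 0.019, "llgfree": -0.01851},
]

# LLGfree is maximised: it singles out one row.
best_by_llgfree = max(rows, key=lambda row: row["llgfree"])

# Rfree is minimised: at three decimal places several rows tie.
lowest_rfree = min(row["rfree"] for row in rows)
rfree_ties = [row["target"] for row in rows if row["rfree"] == lowest_rfree]

print("Optimum by LLGfree: target RMSD", best_by_llgfree["target"])
print("Targets tied at lowest Rfree:", rfree_ties)
```

With the printed precision, Rfree cannot separate the targets from 0.006 to 0.016, whereas LLGfree still ranks them.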

I also obtained these results @ 2 Ang. resolution cutoff:

  Target  Weight(X-ray)  Rwork  Rfree  RMSD(bonds)   LLGfree
   0.001              0  0.155  0.222        0.002  -0.15021  <<< Optimum.
   0.002            100  0.153  0.223        0.002  -0.18788
   0.003            150  0.149  0.222        0.003  -0.20075
   0.004            200  0.146  0.221        0.004  -0.21292
   0.005            250  0.143  0.221        0.005  -0.22524

i.e. optimum around RMSD = 0.001 to 0.002.

Does this mean we should aim for 0.001 - 0.002 RMSD @ 2 Ang.?
Absolutely not!  Firstly with such tight restraints the structure
becomes very stiff and is very likely to stick in one of a multitude of
false minima (such as inverted chiral centres).  Secondly does it
actually matter at 2 Ang. whether the RMSD is 0.001 or 0.02?  The value
0.001 may be the formally correct one, but does the value 0.02 actually
lead to significant errors in the structure?  After all the differences
in Rfree are minuscule (the differences in LLGfree are actually grossly
exaggerated), the maps are indistinguishable, and you don't really
expect to determine bond lengths with an accuracy better than 0.02
anyway.

So my conclusion is that at below-atomic resolution this is a completely
pointless argument, because it doesn't matter what RMSD you aim for,
provided it's reasonable of course (say < 0.02).  You will introduce
some errors by having the RMSDs higher than the optimal values, but
these will be insignificant; the most important thing is not to have
geometry restraints so tight that the refinement doesn't converge in a
sensible amount of time.

Happy Easter!

-- Ian




Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?

2007-04-04 Thread Dale Tronrud

Bernhard Rupp wrote:
>> People also felt that the RMSD bond/angle of 0.016/1.6 was still a
>> little high.
>
> This was subject of a discussion before on the board and I still don't
> understand it:
>
> If I recall correctly, even in highly accurate and precise
> small molecule structures, the rmsd of corresponding
> bonds and angles are ~0.014A and 1.8deg.
>
> It always seems to me that getting these values much below is not a sign
> of crystallographic prowess but over-restraining them?
>
> Is it just that - given good resolution in the first place - the balance
> of restraints (matrix weight) vs low R (i.e., Xray data) gives the best
> Rfree or lowest gap at (artificially?) lower rmsd?
>
> Is that then the best model?
>
> I understand that even thermal vibration accounts for about 1.7 deg
> angle deviation -  are lower rmsd deviations then a manifestation
> of low temp? But that does not seem to be much of an effect, if
> one looks at the tables from the CSD small mol data (shown
> nicely in comparison to the 91 Engh/Huber data in Tables F, pp385).


   This is an on-going topic of discussion so let me put in my two cents.

   We calculate libraries of "ideal geometry" based on precise, small
molecule structures.  When these small molecule crystal structures are
compared to our derived libraries they are found to contain deviations.
These deviations are larger than the uncertainty in these models and
are presumed to reflect real features of the molecule; perturbations
due to the local environment in the crystal.

   These same perturbations are present in our crystals and we should
expect to find deviations from "ideal geometry" on the same scale as
that seen in the precise models.  This expectation led to the practice
in the 1980's of setting r.m.s. targets of 0.02A and 3 degrees for
agreement to bond length and angle libraries.

   While this seems quite reasonable, we are left with the question:
Are the deviations from "ideal geometry" we see in a particular model
in any way related to the actual deviations of the molecule in the
crystal?  The uncertainties (su's) of the bond lengths in a model based
on 4A diffraction data are huge compared to the absolute value of the
true deviation.  For example, if the model had a deviation from "ideal
geometry" of 0.02A but the uncertainty of the distance is 0.2A can we
say that we have detected a signal that is significantly different than
zero, the null hypothesis?
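
   The example in the paragraph above can be put as a short Python sketch: divide the observed deviation by its standard uncertainty and compare against a cutoff. The 3-sigma cutoff and the function name are assumptions made for illustration; they are not stated anywhere in this thread.

```python
def deviation_in_sigma(deviation: float, standard_uncertainty: float) -> float:
    """Return |deviation| expressed in units of its standard uncertainty."""
    if standard_uncertainty <= 0:
        raise ValueError("standard uncertainty must be positive")
    return abs(deviation) / standard_uncertainty


# Numbers from the example in the text: 0.02 A deviation, 0.2 A uncertainty.
z = deviation_in_sigma(0.02, 0.2)

# Assumed cutoff for calling a deviation significant (illustrative only).
SIGNIFICANCE_CUTOFF = 3.0
is_significant = z >= SIGNIFICANCE_CUTOFF

print(f"deviation = {z:.2f} sigma, significant: {is_significant}")
```

   At a tenth of a standard uncertainty, the deviation is indistinguishable from the null hypothesis of zero under any usual cutoff.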

   If we have a model with a collection of deviations from "ideal geometry"
but we have no expectation that those deviations are indicative of the
true deviations of the molecule in the crystal, are those deviations
serving any purpose?  If they do not reflect any property of the crystal
they are noise and should be filtered out.

   By this argument a model based on 4A resolution diffraction data should
have no deviation from "ideal geometry" while one based on 0.9A diffraction
data should have no restraints on "ideal geometry" since the deviations
are probably all real and significant (except for specific regions of
the molecule that have problems).

   The problem we all face is the vast area between these extremes,
compounded by our inability to calculate proper uncertainties for the
parameters of our models.  The free R is our current tool-of-choice when
it comes to attempting to judge the statistical significance of aspects
of our model, without performing proper statistical tests which we don't
know how to do.  If we allow our model the freedom to deviate from our
library and the free R improves a "significant" (??) amount then the
resulting deviations must have some similarity to the true deviations
in the crystal, but if the free R does not improve then the deviations
must not be related to reality and should be suppressed.  This is the
type of assumption we make whenever we use the free R to make a choice.

   What we end up doing is not making a yes/no decision but instead we
variably suppress the amplitude of the deviations from "ideal geometry"
and that is harder to justify.  I think a reasonable argument can be
made, but I have already written too many words in this letter.  It doesn't
really matter because we left the road of mathematical rigor when we took
the R free path.

   Unfortunately, many people have ignored what Brunger said in Methods
in Enzymology about choosing your X-ray/geometry weight based on the
free R and just started saying "the rms bond length deviation must
be 0.007A".  The deviations from "ideal geometry" of your model should be
no more and no less than what you can justifiably claim is a reflection
of the true state of the molecule in your crystal.

Dale Tronrud


Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?

2007-04-04 Thread Bernhard Rupp
>People also felt that the RMSD bond/angle of 0.016/1.6 was still a little
high.

This was subject of a discussion before on the board and I still don't 
understand it:

If I recall correctly, even in highly accurate and precise
small molecule structures, the rmsd of corresponding
bonds and angles are ~0.014A and 1.8deg. 

It always seems to me that getting these values much below is not a sign
of crystallographic prowess but over-restraining them?

Is it just that - given good resolution in the first place - the balance 
of restraints (matrix weight) vs low R (i.e., Xray data) gives the best 
Rfree or lowest gap at (artificially?) lower rmsd?

Is that then the best model?

I understand that even thermal vibration accounts for about 1.7 deg 
angle deviation -  are lower rmsd deviations then a manifestation
of low temp? But that does not seem to be much of an effect, if
one looks at the tables from the CSD small mol data (shown
nicely in comparison to the 91 Engh/Huber data in Tables F, pp385). 
 

Thx, br

 


Re: [ccp4bb] Summary - Valid to stop Refmac after TLS refinement?

2007-04-04 Thread NM Burton, Biochemistry

Hello,

Thanks very much to all who replied with thoughts and suggestions.

The consensus was that my interpretation was not correct, and it is not 
valid to stop Refmac after TLS refinement.  People also felt that the RMSD 
bond/angle of 0.016/1.6 was still a little high.  Phenix.refine was 
suggested as a complete solution, with SA and TLS in the same package.


All the best,

Nick

--On 28 March 2007 16:34 +0100 "NM Burton, Biochemistry" 
<[EMAIL PROTECTED]> wrote:



Hello,

I've refined a structure with CNS to Rwork/free=0.226/0.273.  I switched
to Refmac5.2 to take advantage of TLS refinement and set up a run with 10
cycles of TLS refinement (groups as suggested by the TLSMD server)
followed by 10 cycles of restrained co-ordinate refinement.  After the
TLS cycles the model was improved (Rwork/free=0.222/0.241), however
during co-ordinate refinement Rfree refined up (final
Rwork/free=0.197/0.264).  My understanding would be that the TLS
refinement is modelling the ADPs most accurately, but that Refmac's
co-ordinate refinement is over-fitting slightly.  Would this seem
correct?  And if so, is it valid to run Refmac with no cycles of
co-ordinate refinement and take the resulting model as the final
structure?
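
For reference, the Rfree/Rwork gap at each stage quoted above can be tabulated with a few lines of Python. The stage labels are chosen for this sketch; the R values are the ones given in the message.

```python
# (stage label, Rwork, Rfree) as quoted in the message; labels are ad hoc.
stages = [
    ("CNS", 0.226, 0.273),
    ("Refmac, after TLS cycles", 0.222, 0.241),
    ("Refmac, after co-ordinate refinement", 0.197, 0.264),
]

# Gap = Rfree - Rwork, rounded to the three decimals the inputs carry.
gaps = {label: round(rfree - rwork, 3) for label, rwork, rfree in stages}

for label, gap in gaps.items():
    print(f"{label:40s} Rfree - Rwork = {gap:.3f}")
```

The gap narrows after the TLS cycles and then widens beyond its starting value during co-ordinate refinement, which is the behaviour the question is about.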

Thanks very much,

Nick

--
NM Burton, Biochemistry
[EMAIL PROTECTED]



