[ccp4bb] how to decide an ideal "Weight matrix" value in REFMAC

2010-09-21 Thread Hailiang Zhang
Hi all:

I have a question about deciding an ideal "Weight matrix" value in REFMAC.
When I change it from 0.1 to 0.001, the bond distance rmsd changes from
0.075 to 0.008, while the R changes from 0.26 to 0.33 (resolution 3.2A).
Now I am not sure what is the best balance based on these numbers. Are
there any references or empirical values? Thanks!

Best Regards, Hailiang


Re: [ccp4bb] how to decide an ideal "Weight matrix" value in REFMAC

2010-09-21 Thread Ian Tickle
Hi Hailiang

The short answer is that the optimal X-ray weighting factor minimises
Rfree, or better -LLfree.

However this is tricky to carry out in practice since it means you
have to run several jobs adjusting the weight manually each time to
find the optimum.  Also, ideally the same procedure should be
performed for the B weighting factor, but this adds yet another
dimension to the problem, and I suspect most people just go with the
default B weighting factor (though strictly speaking its optimum value
is resolution-dependent).
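
[Editorial sketch] The manual search described above amounts to running one refinement per trial weight, reading -LLfree from each log, and keeping the weight that gives the lowest value. A minimal Python sketch of that last selection step follows. The function name `best_weight` and the numbers in `example_scan` are assumptions made for illustration; the values are placeholders, not results from any real refinement.

```python
def best_weight(scan):
    """Return the (weight, -LLfree) pair with the lowest -LLfree.

    `scan` maps each trial X-ray weight to the -LLfree value read by
    hand from the log of the corresponding refinement run.
    """
    if not scan:
        raise ValueError("scan is empty: run at least one refinement first")
    weight = min(scan, key=scan.get)
    return weight, scan[weight]


# Placeholder numbers for illustration only, NOT from a real refinement.
example_scan = {
    0.001: 51200.0,
    0.003: 50950.0,
    0.01: 50870.0,
    0.03: 50910.0,
    0.1: 51300.0,
}

weight, ll_free = best_weight(example_scan)
print(weight, ll_free)
```

If the minimum sits at either end of the scanned range, the scan probably needs extending in that direction before the result can be trusted.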

Another somewhat easier way in practice is to adjust the weight to get
a particular target value for RMS-Z(bonds); however, you still have the
problem of choosing that optimal target value.  The median value of
RMS-Z(bonds) over the whole PDB is about 0.5 so you could use that for
everything, though ideally the value should be lower than that for low
resolution data and higher for high resolution.  I use this
empirically-derived formula obtained by fitting the RMS-Z(bonds)
values in the PDB to a straight line with resolution:

  RMS-Z(bonds) = 0.85 - 0.146*resolution

though this is probably valid only in the resolution range 3.5 to 1
Ang, since the number of structures outside that range is too small to
get a meaningful fit.  I'm sure others have different opinions on
this.
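
[Editorial sketch] The straight-line formula above is easy to wrap in a small helper. The Python below is only an illustration: the function name and the choice to warn (rather than fail) outside the 1 to 3.5 Ang range are assumptions, while the coefficients and the range come from the message.

```python
import warnings


def target_rmsz_bonds(resolution):
    """Target RMS-Z(bonds) from the empirical fit quoted in the thread.

    `resolution` is in Angstrom. The fit is described as probably valid
    only between 1 and 3.5 Ang, so values outside that range trigger a
    warning instead of being silently extrapolated.
    """
    if not 1.0 <= resolution <= 3.5:
        warnings.warn("resolution outside the 1-3.5 Ang range of the fit")
    return 0.85 - 0.146 * resolution


# The original poster's data set is at 3.2 Ang.
print(round(target_rmsz_bonds(3.2), 3))  # 0.383
```

For the 3.2 Ang data set in the original question this gives a target of about 0.38, below the PDB-wide median of about 0.5 mentioned above.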

One problem with the 'WEIGHT MATRIX' value is that the optimum is
resolution-dependent, i.e. the optimum value for a low-resolution
dataset is quite different from that for a high-resolution one.  The
'WEIGHT AUTO' option is much better in this respect as the optimum
value is much less resolution-dependent.  The default weight value
for 'WEIGHT AUTO' is 10 but I find this much too high, and I always
reset it to 'WEIGHT AUTO  2.5' as a first attempt.
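
[Editorial sketch] For anyone scripting a series of trial runs, the keyword lines can be generated rather than typed. The Python below is a sketch under assumptions: the helper names are hypothetical, the keyword text simply follows the 'WEIGHT AUTO 2.5' and 'WEIGHT MATRIX' forms quoted in this thread, and the log-spaced range reuses the 0.1 to 0.001 values from the original question. Check the exact keyword syntax against the documentation for your Refmac version.

```python
def weight_keyword(mode, value):
    """Format a weight keyword line in the form quoted in the thread."""
    mode = mode.upper()
    if mode not in ("AUTO", "MATRIX"):
        raise ValueError("mode must be 'AUTO' or 'MATRIX'")
    return f"WEIGHT {mode} {value:g}"


def log_spaced(start, stop, steps):
    """Return `steps` values spaced evenly on a log scale, ends included."""
    if steps < 2:
        raise ValueError("need at least two steps")
    ratio = (stop / start) ** (1.0 / (steps - 1))
    return [start * ratio ** i for i in range(steps)]


# First attempt suggested in the message above.
print(weight_keyword("auto", 2.5))

# Trial 'WEIGHT MATRIX' lines spanning the range from the question.
for w in log_spaced(0.1, 0.001, 5):
    print(weight_keyword("matrix", w))
```

A log-spaced scan is used because the two matrix weights in the question differ by two orders of magnitude; an evenly spaced scan would sample the low end too coarsely.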

Cheers

-- Ian



Re: [ccp4bb] how to decide an ideal "Weight matrix" value in REFMAC

2010-09-21 Thread Ian Tickle
To give credit where it is due I should perhaps have explained that
the formula for RMS-Z(bonds) that I quoted was derived from an
analysis of re-refinements from the PDB-REDO project
(http://www.cmbi.ru.nl/pdb_redo), not from the PDB itself.  PDB-REDO
itself uses the LLfree optimisation method that I referred to briefly.

Cheers

-- Ian



Re: [ccp4bb] how to decide an ideal "Weight matrix" value in REFMAC

2010-09-21 Thread Roger Rowlett


  
  
The automatic weighting feature in the latest releases of Refmac is
pretty good. However, if you are setting the X-ray weighting factor
manually, typical targets are an rms bond angle deviation of 1.0-2.0
degrees and an rms bond length deviation of 0.01-0.02 A.
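
[Editorial sketch] These target ranges can be turned into a quick check on the numbers a refinement reports. The Python below is a minimal illustration; the function and constant names are assumptions, the ranges are the ones given in this message, and the two bond rmsd values tested are those from the original question.

```python
# Typical target ranges quoted in this message.
BOND_RANGE = (0.01, 0.02)   # rms bond length deviation, Angstrom
ANGLE_RANGE = (1.0, 2.0)    # rms bond angle deviation, degrees


def position_in_range(value, bounds):
    """Return 'below', 'within' or 'above' relative to (low, high)."""
    low, high = bounds
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "within"


# Bond rmsd values reported in the original question.
print(position_in_range(0.075, BOND_RANGE))  # weight matrix 0.1
print(position_in_range(0.008, BOND_RANGE))  # weight matrix 0.001
```

By this yardstick the first value lies above the range and the second just below it, which suggests trying a weight somewhere between the two.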



-- 
  

Roger S. Rowlett
Professor
Department of Chemistry
Colgate University
13 Oak Drive
Hamilton, NY 13346

tel: (315)-228-7245
ofc: (315)-228-7395
fax: (315)-228-7935
email: rrowl...@colgate.edu
  

  



Re: [ccp4bb] how to decide an ideal "Weight matrix" value in REFMAC

2010-09-21 Thread Hailiang Zhang
Hi Ian:

Thanks a lot! I have 2 questions:

(1). Can I say the X-ray weighting is optimal when it yields the smallest
Rfree while RMS-Z(bonds) stays below "0.85 - 0.146*resolution" (and
perhaps similarly for angles)?

(2). Why should RMS-Z(bonds) be lower for low-resolution data and higher
for high-resolution data? Or, why can high-resolution data allow more
outliers?

Thanks again for that!

Best Regards, Hailiang



Re: [ccp4bb] how to decide an ideal "Weight matrix" value in REFMAC

2010-09-21 Thread Bernhard Rupp (Hofkristallrat a.D.)
> (2). Why should RMS-Z(bonds) be lower for low-resolution data and higher
> for high-resolution data? Or, why can high-resolution data allow more
> outliers?

Imagine torsion angle refinement only, at low resolution: The bond
lengths are fixed to target values, their rmsd will be zero.

Imagine free refinement at atomic resolution: the rmsd will approach the
established variance/sigma for the (small molecule/peptide) target values.


In between it depends on the restraint weights (or overall: matrix weight)
that are needed to keep the model within bounds of plausible
stereochemistry.
The less data, the more restraint weight, almost always.
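
[Editorial sketch] The two limiting cases can be made concrete with the usual definitions: rmsd is the root mean square of (observed - target), and RMS-Z is the root mean square of (observed - target)/sigma. The Python below is only an illustration; the bond lengths and sigmas are placeholder numbers, not taken from any restraint dictionary.

```python
import math


def rmsd(values, targets):
    """Root-mean-square deviation of values from their targets."""
    n = len(values)
    return math.sqrt(sum((v - t) ** 2 for v, t in zip(values, targets)) / n)


def rms_z(values, targets, sigmas):
    """Root-mean-square Z-score: each deviation is divided by its sigma."""
    n = len(values)
    total = sum(((v - t) / s) ** 2 for v, t, s in zip(values, targets, sigmas))
    return math.sqrt(total / n)


# Placeholder target bond lengths and sigmas (Angstrom), illustration only.
targets = [1.33, 1.46, 1.52]
sigmas = [0.02, 0.02, 0.02]

# Case 1: bonds fixed at their targets, as in torsion-angle refinement.
fixed = list(targets)
print(rmsd(fixed, targets), rms_z(fixed, targets, sigmas))

# Case 2: every bond deviates by exactly one sigma, so RMS-Z is about 1.
free = [t + s for t, s in zip(targets, sigmas)]
print(rmsd(free, targets), rms_z(free, targets, sigmas))
```

Restrained refinement at intermediate resolution lands between these two cases, with the restraint weight deciding where.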

REFMAC to my knowledge uses (or used) the same rmsd target values (close to
target variance or somewhat less) for all resolutions. Garib?

The LLfree or Rfree minimization as Ian mentioned is the correct way to go,
and I found that - at least until recently - the REFMAC default B restraints
are too tight (Garib?), at least for my structures and around 2 A (depends
on the molecule).

So there is no fixed recipe or target - one needs to try and find values 
appropriate to the specific structure. At convergence of the refinement 
of course.

Now, whether the differences between differently restrained
models are *significant* is an entire story in itself...

BR

 


Re: [ccp4bb] how to decide an ideal "Weight matrix" value in REFMAC

2010-09-22 Thread Ian Tickle
> (1). Can I say the X-ray weighting is optimal when it yields the smallest
> Rfree while RMS-Z(bonds) stays below "0.85 - 0.146*resolution" (and
> perhaps similarly for angles)?

The weighting is optimal when the free likelihood is maximised with
respect to the weights, or equivalently when the negative log of the
free likelihood (-LLfree: the number printed by Refmac) is minimised.
The practical problem is that this requires a lot of refinement runs
with different weights to locate the optimum.  Ideally also the B
weighting factor needs to be optimised by the same method, but this
makes it a 2-parameter optimisation so you would need even more runs
of Refmac to locate the optimum.  The B weighting factor is
resolution-dependent so a single value is really not suitable at all
resolutions.  I was suggesting using the PDB-REDO based
resolution-dependent RMS-Z(bonds) target value as a "quick-and-dirty"
alternative which won't be too far out.
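
[Editorial sketch] The cost of the 2-parameter optimisation is easy to see by enumerating the grid of runs. The Python below is a sketch only: the trial values for both weights are hypothetical, chosen to show how the run count multiplies, and the function name is an assumption.

```python
from itertools import product


def weight_grid(xray_weights, b_weights):
    """Return every (X-ray weight, B weight) pair to be refined."""
    return list(product(xray_weights, b_weights))


# Hypothetical trial values, for illustration only.
xray_weights = [0.5, 1.0, 2.5, 5.0, 10.0]
b_weights = [0.5, 1.0, 2.0]

grid = weight_grid(xray_weights, b_weights)
print(len(xray_weights), "runs for the 1-parameter scan")
print(len(grid), "runs for the 2-parameter scan")
```

Each pair means a separate Refmac job run to convergence, which is why a resolution-dependent RMS-Z(bonds) target is attractive as a shortcut.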

> (2). Why should RMS-Z(bonds) be lower for low-resolution data and higher
> for high-resolution data? Or, why can high-resolution data allow more
> outliers?

Bernhard's thought experiment is a good one, I would just say that if
you only have low resolution data you can't hope to estimate small
deviations from the target values accurately: there's a good chance
that half of them will be just random deviations in the wrong
direction and only produce overfitting and an increase in Rfree; hence
you won't achieve the optimal LLfree.  If you have high resolution
data then of course you are justified in claiming that the deviations
from the target values that you observe are meaningful - that's what
'resolution' means.  Your second question about only being able to
detect outliers with high resolution data answers your first question!

This is analogous to the 'D' factor in D*Fcalc for the map
coefficients: the effect of random error is to reduce the expectation.
Incidentally, changing the subject briefly to a previous thread: note
that Refmac writes out D*Fcalc in the 'FC' column, not Fcalc, so if
people deposit this column in the PDB then it cannot be used to
reproduce the R factor, which requires Fcalc.
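
[Editorial sketch] The point about the 'FC' column can be illustrated with the conventional R factor, sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|). The Python below is a simplified sketch: it omits any overall scale factor, and the amplitudes and per-reflection D values are made-up placeholders, not data from any structure.

```python
def r_factor(f_obs, f_calc):
    """Conventional R factor, with no overall scale factor applied."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Placeholder amplitudes and D values, for illustration only.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [95.0, 84.0, 57.0, 42.0]
d = [0.95, 0.90, 0.80, 0.70]   # D varies from reflection to reflection

d_f_calc = [di * fc for di, fc in zip(d, f_calc)]

print(r_factor(f_obs, f_calc))    # from the true Fcalc
print(r_factor(f_obs, d_f_calc))  # from D*Fcalc: a different number
```

Because D differs between reflections in this sketch, no single scale factor applied afterwards would turn D*Fcalc back into Fcalc.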

Cheers

-- Ian


Re: [ccp4bb] how to decide an ideal "Weight matrix" value in REFMAC

2010-09-22 Thread Bernhard Rupp (Hofkristallrat a.D.)
> Bernhard's thought experiment is a good one, 

I picked it up from Ian ... probably posted earlier.

BR