Re: [ccp4bb] bond lengths, angles, ideality and refinements

2008-01-14 Thread Morten Kjeldgaard

On 09/01/2008, at 20.48, Anastassis Perrakis wrote:


I actually think that inaccurate cells are a big source of misery
in many refinements. I have found the idea of WhatCheck to actually
check your cell by looking at the projection of bond lengths of
certain types along the cell axes most useful.
I would hardly advocate measuring your cell that way, but going
back to your data and looking at the cell again would be worth it.




However, cell parameters are quantities that have been experimentally
determined, so calculating them using the approach of WhatCheck is
methodologically incorrect. Furthermore, there is a lot of scattering
matter in the unit cell that is not accurately modelled by atoms.

Bond lengths and angles are only needed in macromolecular
crystallography to overcome the poor data-to-parameter ratio. They
are an artefact of life, so to speak. And as such, they are
artificially tightly restrained to some ideal value. The data are
normally not strong enough to tell us that a bond is _not_ ideal.

Like any measurement, cell parameters have an uncertainty associated
with them, so a better approach would be to propagate those measurement
errors in a proper statistical fashion.
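
A minimal sketch of what such a propagation could look like, under loud
assumptions: an orthorhombic cell, uncorrelated uncertainties on a, b and c,
and fractional coordinates treated as exact. The function name and all the
numbers are invented for illustration; this is not taken from any refinement
or validation program.

```python
import math

def bond_length_with_su(cell, cell_su, dfrac):
    """Distance and its standard uncertainty for an orthorhombic cell.

    cell    -- (a, b, c) edge lengths in Angstrom
    cell_su -- standard uncertainties of (a, b, c), assumed uncorrelated
    dfrac   -- fractional coordinate difference between the two atoms
    """
    d = math.sqrt(sum((edge * df) ** 2 for edge, df in zip(cell, dfrac)))
    # First-order propagation: dd/da = a * dx^2 / d, and likewise for b and c.
    var = sum((edge * df ** 2 / d * su) ** 2
              for edge, df, su in zip(cell, dfrac, cell_su))
    return d, math.sqrt(var)

# Hypothetical cell with 0.1% uncertainty on each edge.
d, su = bond_length_with_su((50.0, 60.0, 70.0), (0.05, 0.06, 0.07),
                            (0.02, 0.01, 0.01))
print(f"d = {d:.3f} A, su contributed by the cell = {su:.4f} A")
```

For a general (non-orthogonal) cell the same idea applies, but the distance
has to go through the metric tensor and the angle uncertainties enter as well.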


Cheers,
Morten

--
Morten Kjeldgaard, asc. professor, MSc, PhD
Department of Molecular Biology, Aarhus University
Gustav Wieds Vej 10 C, DK-8000 Aarhus C, Denmark.
Lab +45 89425026 * Mobile +45 51860147 * Fax +45 86123178
Home +45 86188180 * http://www.bioxray.dk/~mok


Re: [ccp4bb] bond lengths, angles, ideality and refinements

2008-01-10 Thread Dirk Kostrewa

Am 09.01.2008 um 20:48 schrieb Anastassis Perrakis:
[snip]
I actually think that inaccurate cells are a big source of misery
in many refinements. I have found the idea of WhatCheck to actually
check your cell by looking at the projection of bond lengths of
certain types along the cell axes most useful.
I would hardly advocate measuring your cell that way, but going
back to your data and looking at the cell again would be worth it.

[snip]

There was a discussion many years ago on this bulletin board about
unit cell scaling errors reported by WHAT_CHECK that resulted merely
from some different dictionary values used in the refinement program
(CNS, if I remember correctly) and in WHAT_CHECK, although both
claimed to use the Engh & Huber parameters. This difference, projected
onto the unit cell axes, resulted in a reported unit cell scaling
error that could not be fixed by iterative rescaling of the unit
cell and refinement, and thus was an artifact. So, I wouldn't even
trust the unit cells reported by WHAT_CHECK ...


Best regards,

Dirk.

***
Dirk Kostrewa
Gene Center, A 5.07
Ludwig-Maximilians-University
Feodor-Lynen-Str. 25
81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:+49-89-2180-76999
E-mail: [EMAIL PROTECTED]
***




Re: [ccp4bb] bond lengths, angles, ideality and refinements

2008-01-09 Thread William Scott
Sorry, that should have read

"because the value is established by social consensus, it is thus NOT
guaranteed to be perfectly accurate, ..."

In other words, one can imagine some source of systematic error in
establishing an ideal bond length.  For example, the crystal packing
environment of small molecules might tend to distort a bond by a couple
hundredths of an Ångström.


William Scott wrote:
 Dear Yang Li:


 Happy New Year to you, too, (ahead of Feb. 7th).

 You certainly owe us no apology; the reverse may not be true.

 Your question is an important one, as is what you have written below.

 I'm not certain I have a completely satisfactory answer.

 The reason is that ideal bond lengths may or may not be true in the
 sense that the value is established by social consensus, and is thus
 guaranteed to be perfectly accurate, even though it may be quite precise.

 Because of this, and because of natural deviations from ideality (which
 really only become trustworthy observations at extremely high resolution),
 a certain amount of wiggle room is typically allowed in terms of rmsd.

 The more conservative the refinement, the smaller the rmsd from ideality
 will be.

 Some people believe 0.02 Å deviation from ideality is reasonable, based on
 the accuracy of the dictionary values of bond lengths and angles; others
 consider that to be too sloppy and a way to artificially deflate
 Rfactors.

 I seem to have detected a tendency in the literature to aim for about 0.01
 Å deviation.  The new refinement program phenix.refine, which is supposed
 to optimize weighting between X-ray terms and stereochemical constraints
 automatically, seems to settle in at quite conservative values, such as
 0.005 Å, whereas with refmac, I can't seem to get the geometry any more
 ideal than 0.005 Å even if I try to idealize a structure in the absence of
 X-ray data.

 So, like you, I am a bit confused, and wouldn't mind hearing more from the
 experts.

 All the best,

 Bill






 yang li wrote:
 Dear All,
   I am very sorry to have involved you in such an insignificant
 discussion. I have reached agreement with Prof Gerard; please stop
 talking about things beyond science, thanks!
   I read a book today, which said "A refined model should exhibit
 rms deviations of no more than 0.02 Å for bond lengths and 4° for
 bond angles". I just wonder about the standard for the bond lengths
 and the bond angles. I think most of you have read similar words!
 But maybe I did not express myself clearly and made some phrasing
 mistakes.
   Finally, happy new year to you all, though very late!


 Sincerely!
 Yang Li




Re: [ccp4bb] bond lengths, angles, ideality and refinements

2008-01-09 Thread Nave, C (Colin)
 
The latest Acta D shows the social consensus is sometimes lacking even (or
especially) among very experienced and able crystallographers.

Experimental determination of optimal root-mean-square deviations of
macromolecular bond lengths and angles from their restrained ideal values
Ian J. Tickle
pages 1274-1281

Numerology versus reality: a voice in a recent dispute
Mariusz Jaskolski, Miroslaw Gilski, Zbigniew Dauter and Alexander Wlodawer
pages 1282-1283

Interesting debate

   Colin




Re: [ccp4bb] bond lengths, angles, ideality and refinements

2008-01-09 Thread Herbert J. Bernstein
"tables 1" is formally correct but awkward.  "table 1s" is confusing. I
would suggest that we treat "table 1" like "sheep" and make the plural the
same as the singular.  If you don't approve of revising the English
language, then a valid way to avoid the need for a plural is to say "each
table 1".


  --  Herbert

=
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

 +1-631-244-3035
 [EMAIL PROTECTED]
=

On Wed, 9 Jan 2008, Gerard DVD Kleywegt wrote:

...
and start quoting RMS-Z-scores (from whatcheck or, soon, from refmac) in your 
tables 1 (table 1s? what *is* the plural of table 1?).

...


Re: [ccp4bb] bond lengths, angles, ideality and refinements

2008-01-09 Thread Todd Holyoak
For some current thoughts on bond length and bond angle deviations you may want 
to look at the following paper:

Acta Cryst. (2007). D63, 611-620   
Stereochemical restraints revisited: how accurate are refinement targets and 
how much should protein structures be allowed to deviate from them?
M. Jaskolski, M. Gilski, Z. Dauter and A. Wlodawer


Todd Holyoak

Todd Holyoak, Assistant Professor
Dept. of Biochemistry and Molecular Biology
The University of Kansas Medical Center
3901 Rainbow Blvd.
4013A WHW, MS 3030
Kansas City, KS 66160
913-588-0795 (office)
913-588-0796 (lab)
913-588-7440 (fax)





Re: [ccp4bb] bond lengths, angles, ideality and refinements

2008-01-09 Thread Ian Tickle
Hi William & others,

Indeed, phenix.refine uses cross-validation to optimise the scaling of the
X-ray & B-factor weights.  All I did was demonstrate that you can do
essentially the same thing as phenix.refine but using Refmac instead.  I don't
claim to have done anything new, except I modified Refmac to print out the free
likelihood and used that as a target function instead of Rfree, as suggested by
Gerard Bricogne in Meth. Enzymol. (1997) 276, 361-423.  Whatever value of the
RMSD (or better the RMS Z-score) comes out of that, you can be sure that it's
based purely objectively on the experimental data, not on completely arbitrary
and unjustifiable subjective choices, which is what Jaskolski et al. appear to
be suggesting.  Cross-validation is a well-established methodology in
statistics, it's certainly not 'numerology'!
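
To make the idea concrete, here is a toy sketch of choosing a restraint weight
by cross-validation. It is emphatically not what phenix.refine or Refmac do
internally: the "experiment" is an invented linear model, the restraints pull
parameters towards made-up "dictionary" values, and a mean squared residual on
a held-out set stands in for Rfree or the free likelihood. Note that the
weight here multiplies the geometry term, so a large value means tight
restraints. All names and numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_par, n_obs = 30, 60
p_ideal = np.full(n_par, 1.5)                      # made-up "dictionary" values
p_true = p_ideal + rng.normal(0.0, 0.01, n_par)    # small real deviations
A = rng.normal(size=(n_obs, n_par))                # toy linear "experiment"
y = A @ p_true + rng.normal(0.0, 0.5, n_obs)       # noisy observations

free = np.arange(n_obs) % 10 == 0                  # hold out 10% as a free set
A_w, y_w, A_f, y_f = A[~free], y[~free], A[free], y[free]

def refine(weight):
    """Minimise |A_w p - y_w|^2 + weight * |p - p_ideal|^2 (closed form)."""
    lhs = A_w.T @ A_w + weight * np.eye(n_par)
    rhs = A_w.T @ y_w + weight * p_ideal
    return np.linalg.solve(lhs, rhs)

weights = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
work, held_out = [], []
for w in weights:
    p = refine(w)
    work.append(float(np.mean((A_w @ p - y_w) ** 2)))      # "Rwork"-like
    held_out.append(float(np.mean((A_f @ p - y_f) ** 2)))  # "Rfree"-like

# The free set, not the working set, decides how tight the restraints are.
best_w = weights[int(np.argmin(held_out))]
print("restraint weight chosen by the free set:", best_w)
```

The working-set residual can only go down as the restraints are loosened,
which is why it cannot be used to pick the weight; the held-out residual has
no such guarantee, and that is the whole point of the free set.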

Of course then you have to come up with some theory to explain the experimental
results, i.e. why the RMSD that comes out must always be <= the RMS standard
uncertainty, but actually that's not difficult since the RMSD is related to the
accuracy and the SU is related to the precision, and on the face of it there's
no reason why these should be related at all (as Gerard nicely demonstrated
with his dartboard analogy in Leeds!).  Jaskolski et al.'s theory that always
RMSD = SU regardless of resolution just doesn't fit the experimental results,
and as every good scientist knows, it only takes one ugly fact to destroy a
beautiful theory.
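
For anyone unsure how the two numbers differ, a small sketch: the RMSD is the
root-mean-square deviation of the refined values from the ideal ones, while
the RMS Z-score divides each deviation by the dictionary standard uncertainty
first. The bond values and the uniform 0.02 Ang SU below are made up for
illustration and do not come from any real dictionary or structure.

```python
import math

def rmsd_and_rmsz(observed, ideal, sigma):
    """Return (RMSD, RMS Z-score) of observed values against ideal ones."""
    devs = [o - i for o, i in zip(observed, ideal)]
    rmsd = math.sqrt(sum(d * d for d in devs) / len(devs))
    rmsz = math.sqrt(sum((d / s) ** 2 for d, s in zip(devs, sigma)) / len(devs))
    return rmsd, rmsz

# Hypothetical refined bond lengths, ideal values and SUs (Angstrom).
observed = [1.235, 1.322, 1.468, 1.521]
ideal = [1.230, 1.330, 1.460, 1.530]
sigma = [0.02, 0.02, 0.02, 0.02]

rmsd, rmsz = rmsd_and_rmsz(observed, ideal, sigma)
print(f"RMSD = {rmsd:.4f} A, RMSZ = {rmsz:.2f}")
```

An RMS Z-score below 1 says the model deviates from the dictionary by less
than the dictionary's own stated uncertainty, which is the comparison between
RMSD and SU being argued about here.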

As you point out, setting a target value of 0.02 Ang or higher for the RMSD
bonds and similarly for the angles, unless you have very high resolution data,
will inevitably result in take-up of some fraction of the random experimental
errors into the refined parameters, in order to inflate the RMSD/RMSZ's to
their target values and reduce Rwork at the expense of Rfree - otherwise known
as overfitting!  It's not recommended practice to deliberately cause random
errors (however small) to be added to your co-ordinates!  This is obvious if
you think about what happens at low resolution: there's no justification for
refining individual xyz & B's, so the optimal procedure is to use constrained
refinement with the torsion angles as parameters, or restrained refinement with
*very* tight restraints (if that's feasible).  Whether you use constrained
refinement or its restrained equivalent, it will keep the bond lengths & angles
fixed at the initial dictionary values so the RMSD's will be identically zero,
or very nearly so, throughout the refinement.

Someone mentioned 'experienced crystallographers': actually since the
distinction between RMSD & SU is purely a question of statistics not of
crystallography, any crystallographic experience is unlikely to be relevant!

The other question you raised is why Refmac doesn't refine the RMSD's much
nearer to zero - this is something I also commented on; also why the Rfree &
LLfree plots are so noisy compared with those from CNS & phenix.refine.  I
think it's to do with rounding errors in the gradient calculation and/or
optimisation code.  Refmac may be using single precision, whereas phenix.refine
may be using double - I'm just guessing, maybe the programmers could comment?
This is something I would like to see improved, in order to make
cross-validation with Refmac more reliable & useful.
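
Purely to illustrate the kind of effect being guessed at (this says nothing
about what Refmac or phenix.refine actually do internally), here is a sketch
of how a long running sum loses accuracy in single precision compared with
double. The random terms are a stand-in for per-reflection or per-atom
contributions to a gradient; everything here is an assumption for
illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
terms = rng.normal(0.0, 1.0, 200_000)      # stand-in for gradient contributions

# Reference value: NumPy's double-precision sum.
exact = float(np.sum(terms.astype(np.float64)))

# Naive running sum in single precision, as a simple loop would do it.
acc32 = np.float32(0.0)
for t in terms.astype(np.float32):
    acc32 = np.float32(acc32 + t)

# The same naive loop in double precision.
acc64 = 0.0
for t in terms:
    acc64 += float(t)

err32 = abs(float(acc32) - exact)
err64 = abs(acc64 - exact)
print(f"single precision error: {err32:.3e}")
print(f"double precision error: {err64:.3e}")
```

If an optimiser is trying to push an RMSD towards zero, noise of this kind in
the gradient would set a floor below which further steps are essentially
random, which would be consistent with the noisy plots, though that remains a
guess.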

Cheers

-- Ian


Re: [ccp4bb] bond lengths, angles, ideality and refinements

2008-01-09 Thread Anastassis Perrakis

I would only like to reiterate a small comment I posted before:

Should the cell parameters be inaccurate, optimization of weights by
cross-validation (getting the best Rfree) will result in a 'higher' RMSD.
It is easy to see why: if a cell is measured to be 1% larger than in
reality, all bonds would 'prefer' to be 1% longer than the 'correct'
dictionary values. Satisfying that results in a higher RMSD, and that
structure would have the lowest Rfree because the X-ray data would be
fitted better.
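
As a back-of-the-envelope check of the size of this effect, the sketch below
stretches a few bonds by 1% and computes the resulting RMSD from the ideal
values. The bond lengths are rounded, illustrative numbers of roughly
peptide-backbone size, not values taken from any particular dictionary, and
the model assumes fractional coordinates stay fixed so that every distance
simply scales with the cell.

```python
import math

# Illustrative ideal bond lengths in Angstrom (rounded, assumed values).
ideal = [1.23, 1.33, 1.46, 1.53]
scale = 1.01  # cell edges measured 1% too long

# With fractional coordinates unchanged, every distance scales with the cell.
stretched = [scale * d for d in ideal]
rmsd = math.sqrt(sum((s - d) ** 2 for s, d in zip(stretched, ideal)) / len(ideal))
print(f"RMSD from ideal caused by the cell alone: {rmsd:.4f} A")
```

So a 1% cell error by itself is worth an RMSD of the same order as the
0.01 to 0.02 Ang targets being debated in this thread.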

I actually think that inaccurate cells are a big source of misery in
many refinements. I have found the idea of WhatCheck to actually check
your cell by looking at the projection of bond lengths of certain types
along the cell axes most useful.
I would hardly advocate measuring your cell that way, but going back
to your data and looking at the cell again would be worth it.


To make it more fun, cells change during radiation damage, so ...

best regards, Tassos
