Re: [ccp4bb] bond lengths, angles, ideality and refinements
On 09/01/2008, at 20.48, Anastassis Perrakis wrote:

> I actually think that inaccurate cells are a big source of misery in many refinements. I have found the idea of WhatCheck to actually check your cell by looking at the projection of bond lengths of certain types along the cell axes most useful. I would hardly advocate to measure your cell that way, but going back to your data and looking at the cell again would be worth it.

However, cell parameters are quantities that have been experimentally determined, so calculating them using the approach of WhatCheck is methodologically incorrect. Furthermore, there is lots of scattering matter in the unit cell that is not accurately modelled by atoms.

Bond lengths and angles are only needed in macromolecular crystallography to overcome the poor data-to-parameter ratio. They are an artefact of life, so to speak. And as such, they are artificially tightly restrained to some ideal value. The data are normally not strong enough to tell us that a bond is _not_ ideal.

Like any measurement, cell parameters have an uncertainty associated with them, so a better approach would be to propagate those measurement errors in a proper statistical fashion.

Cheers,
Morten

--
Morten Kjeldgaard, asc. professor, MSc, PhD
Department of Molecular Biology, Aarhus University
Gustav Wieds Vej 10 C, DK-8000 Aarhus C, Denmark.
Lab +45 89425026 * Mobile +45 51860147 * Fax +45 86123178
Home +45 86188180 * http://www.bioxray.dk/~mok
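As a rough illustration of what propagating cell-parameter errors into a bond length could look like, here is a minimal Python sketch. It is not taken from any refinement program: the function name `bond_length_sigma` is hypothetical, and it assumes an orthogonal cell, uncorrelated uncertainties on a, b and c, exact fractional coordinates, and a nonzero separation between the two atoms.

```python
import math

def bond_length_sigma(cell, cell_sigma, dfrac):
    """Return (d, sigma_d) for one interatomic distance.

    cell       -- (a, b, c) in Å, orthogonal cell assumed
    cell_sigma -- standard uncertainties of a, b, c in Å (assumed uncorrelated)
    dfrac      -- fractional coordinate differences between the two atoms
    """
    # Cartesian components of the interatomic vector, in Å.
    comps = [edge * df for edge, df in zip(cell, dfrac)]
    d = math.sqrt(sum(c * c for c in comps))
    # d^2 = sum (edge * df)^2, so dd/d(edge) = edge * df^2 / d.
    var = sum((edge * df * df / d * s) ** 2
              for edge, df, s in zip(cell, dfrac, cell_sigma))
    return d, math.sqrt(var)

# Illustrative numbers only: a 1.5 Å bond lying along a, with 0.1% cell errors.
d, sigma_d = bond_length_sigma((50.0, 60.0, 70.0), (0.05, 0.06, 0.07), (0.03, 0.0, 0.0))
print(d, sigma_d)
```

In a real analysis the coordinate uncertainties would normally dominate and would have to be added to this variance; the sketch isolates the cell contribution only.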
Re: [ccp4bb] bond lengths, angles, ideality and refinements
On 09.01.2008 at 20:48, Anastassis Perrakis wrote:

> I actually think that inaccurate cells are a big source of misery in many refinements. I have found the idea of WhatCheck to actually check your cell by looking at the projection of bond lengths of certain types along the cell axes most useful. I would hardly advocate to measure your cell that way, but going back to your data and looking at the cell again would be worth it.

There was a discussion many years ago on this bulletin board about unit cell scaling errors reported by WHAT_CHECK that resulted merely from different dictionary values used in the refinement program (CNS, if I remember correctly) and in WHAT_CHECK, although both claimed to use the Engh & Huber parameters. This difference, projected onto the unit cell axes, resulted in a reported unit cell scaling error that could not be fixed by iterative rescaling of the unit cell and refinement, and thus was an artifact. So, I wouldn't even trust the unit cells reported by WHAT_CHECK ...

Best regards,
Dirk.

***
Dirk Kostrewa
Gene Center, A 5.07
Ludwig-Maximilians-University
Feodor-Lynen-Str. 25
81377 Munich
Germany
Phone: +49-89-2180-76845
Fax: +49-89-2180-76999
E-mail: [EMAIL PROTECTED]
***
Re: [ccp4bb] bond lengths, angles, ideality and refinements
I would only like to reiterate a small comment I posted before:

Should the cell parameters be inaccurate, optimization of weights by cross-validation (getting the best Rfree) will result in a 'higher' RMSD. It is easy to see why: if a cell is measured to be 1% larger than it is in reality, all bonds would 'prefer' to be 1% longer than the 'correct' dictionary values. Satisfying that raises the RMSD, and the resulting structure would have the lowest Rfree because the X-ray data would be fitted better.

I actually think that inaccurate cells are a big source of misery in many refinements. I have found the idea of WhatCheck to actually check your cell by looking at the projection of bond lengths of certain types along the cell axes most useful. I would hardly advocate to measure your cell that way, but going back to your data and looking at the cell again would be worth it.

To make it more fun, cells change during radiation damage, so ...

best regards,
Tassos
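Tassos's 1% argument can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: `scale_error_rmsd` is a hypothetical helper, and the bond lengths are round approximate backbone values chosen for the example, not entries from any restraint dictionary. It assumes every bond is stretched by exactly the same fraction as the cell.

```python
import math

def scale_error_rmsd(ideal_lengths, scale_error):
    """RMS deviation from ideal if every bond is stretched by the same
    fractional cell-scale error (0.01 means the cell is 1% too large)."""
    devs = [length * scale_error for length in ideal_lengths]
    return math.sqrt(sum(d * d for d in devs) / len(devs))

# Approximate, illustrative backbone bond lengths in Å (C-N, N-CA, CA-C, C=O).
bonds = [1.33, 1.46, 1.52, 1.23]
print(round(scale_error_rmsd(bonds, 0.01), 4))
```

Under these assumptions a 1% cell error alone contributes roughly 0.014 Å to the bond RMSD, which is of the same order as the 0.01 to 0.02 Å targets debated in this thread.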
Re: [ccp4bb] bond lengths, angles, ideality and refinements
Hi William & others,

Indeed, phenix.refine uses cross-validation to optimise the scaling of the X-ray & B-factor weights. All I did was demonstrate that you can do essentially the same thing as phenix.refine but using Refmac instead. I don't claim to have done anything new, except that I modified Refmac to print out the free likelihood and used that as a target function instead of Rfree, as suggested by Gerard Bricogne in Meth. Enzymol. (1997) 276, 361-423. Whatever value of the RMSD (or better, the RMS Z-score) comes out of that, you can be sure that it's based purely objectively on the experimental data, not on completely arbitrary and unjustifiable subjective choices, which is what Jaskolski et al. appear to be suggesting. Cross-validation is a well-established methodology in statistics; it's certainly not 'numerology'!

Of course then you have to come up with some theory to explain the experimental results, i.e. why the RMSD that comes out must always be <= the RMS standard uncertainty, but actually that's not difficult since the RMSD is related to the accuracy and the SU is related to the precision, and on the face of it there's no reason why these should be related at all (as Gerard nicely demonstrated with his dartboard analogy in Leeds!). Jaskolski et al.'s theory that always RMSD = SU regardless of resolution just doesn't fit the experimental results, and as every good scientist knows, it only takes one ugly fact to destroy a beautiful theory.

As you point out, setting a target value of 0.02 Å or higher for the RMSD of bonds, and similarly for the angles, unless you have very high resolution data, will inevitably result in take-up of some fraction of the random experimental errors into the refined parameters, in order to inflate the RMSDs/RMSZs to their target values and reduce Rwork at the expense of Rfree - otherwise known as overfitting! It's not recommended practice to deliberately cause random errors (however small) to be added to your co-ordinates!

This is obvious if you think about what happens at low resolution: there's no justification for refining individual xyz & B's, so the optimal procedure is to use constrained refinement with the torsion angles as parameters, or restrained refinement with *very* tight restraints (if that's feasible). Whether you use constrained refinement or its restrained equivalent, it will keep the bond lengths & angles fixed at the initial dictionary values, so the RMSDs will be identically zero, or very nearly so, throughout the refinement.

Someone mentioned 'experienced crystallographers': actually, since the distinction between RMSD & SU is purely a question of statistics, not of crystallography, any crystallographic experience is unlikely to be relevant!

The other question you raised is why Refmac doesn't refine the RMSDs much nearer to zero - this is something I also commented on; also why the Rfree & LLfree plots are so noisy compared with those from CNS & phenix.refine. I think it's to do with rounding errors in the gradient calculation and/or optimisation code. Refmac may be using single precision, whereas phenix.refine may be using double - I'm just guessing, maybe the programmers could comment? This is something I would like to see improved, in order to make cross-validation with Refmac more reliable & useful.

Cheers

-- Ian
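For readers unsure how the RMSD and the RMS Z-score mentioned in Ian's message differ, here is a minimal Python sketch. The function `rmsd_and_rmsz` is a hypothetical helper, not the Refmac or WHAT_CHECK implementation, and the numbers in the usage line are invented for illustration. It assumes each restraint has an ideal value and a standard uncertainty (sigma) taken from the restraint dictionary.

```python
import math

def rmsd_and_rmsz(observed, ideal, sigma):
    """Return (RMSD, RMS Z-score) for a set of restrained bond lengths.

    RMSD averages the raw deviations in Å; the RMS Z-score first divides each
    deviation by that restraint's dictionary standard uncertainty.
    """
    n = len(observed)
    rmsd = math.sqrt(sum((o - i) ** 2 for o, i in zip(observed, ideal)) / n)
    rmsz = math.sqrt(sum(((o - i) / s) ** 2
                         for o, i, s in zip(observed, ideal, sigma)) / n)
    return rmsd, rmsz

# Invented example: two bonds, each deviating by exactly one sigma.
print(rmsd_and_rmsz([1.54, 1.32], [1.52, 1.33], [0.02, 0.01]))
```

Because each deviation is scaled by its own sigma, the RMS Z-score stays comparable across bond types with different dictionary uncertainties, which is one reason it is preferred over the plain RMSD as a reporting statistic.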
Re: [ccp4bb] bond lengths, angles, ideality and refinements
"tables 1" is formally correct but awkward. "table 1s" is confusing. I would suggest that we treat "table 1" like sheep and make the plural the same as the singular. If you don't approve of revising the English language, then a valid way to avoid the need for a plural is to say "each table 1".

-- Herbert

Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
[EMAIL PROTECTED]

On Wed, 9 Jan 2008, Gerard DVD Kleywegt wrote:

> ... and start quoting RMS-Z-scores (from whatcheck or, soon, from refmac) in your tables 1 ("table 1s"? what *is* the plural of "table 1"?). ...
Re: [ccp4bb] bond lengths, angles, ideality and refinements
> For some current thoughts on bond length and bond angle deviations you may want to look at the following paper: ...

but that would be rather a waste of your time (sorry!). if you're only going to read one paper about this subject this year, make it ian tickle's!

http://journals.iucr.org/d/issues/2007/12/00/gx5119/gx5119.pdf

and start quoting RMS-Z-scores (from whatcheck or, soon, from refmac) in your tables 1 ("table 1s"? what *is* the plural of "table 1"?). i should have mentioned this in my talk at the ccp4 study weekend last saturday

--dvd

(if sweden is invaded tomorrow by china and poland, you'll know whom to blame!)

**
Gerard J. Kleywegt
[Research Fellow of the Royal Swedish Academy of Sciences]
Dept. of Cell & Molecular Biology, University of Uppsala
Biomedical Centre, Box 596
SE-751 24 Uppsala, SWEDEN
http://xray.bmc.uu.se/gerard/
mailto:[EMAIL PROTECTED]
**
The opinions in this message are fictional. Any similarity to actual opinions, living or dead, is purely coincidental.
**
Re: [ccp4bb] bond lengths, angles, ideality and refinements
The latest Acta D shows the "social consensus" is sometimes lacking even (or especially) among very experienced and able crystallographers.

Experimental determination of optimal root-mean-square deviations of macromolecular bond lengths and angles from their restrained ideal values
Ian J. Tickle
pages 1274-1281

Numerology versus reality: a voice in a recent dispute
Mariusz Jaskolski, Miroslaw Gilski, Zbigniew Dauter and Alexander Wlodawer
pages 1282-1283

Interesting debate

Colin
Re: [ccp4bb] bond lengths, angles, ideality and refinements
For some current thoughts on bond length and bond angle deviations you may want to look at the following paper:

Acta Cryst. (2007). D63, 611-620
Stereochemical restraints revisited: how accurate are refinement targets and how much should protein structures be allowed to deviate from them?
M. Jaskolski, M. Gilski, Z. Dauter and A. Wlodawer

Todd Holyoak

Todd Holyoak, Assistant Professor
Dept. of Biochemistry and Molecular Biology
The University of Kansas Medical Center
3901 Rainbow Blvd. 4013A WHW, MS 3030
Kansas City, KS 66160
913-588-0795 (office)
913-588-0796 (lab)
913-588-7440 (fax)
Re: [ccp4bb] bond lengths, angles, ideality and refinements
Sorry, that should have read

"because the value is established by social consensus, it is thus NOT guaranteed to be perfectly accurate, ..."

In other words, one can imagine some source of systematic error in establishing an ideal bond length. For example, the crystal packing environment of small molecules might tend to distort a bond by a couple hundredths of an Ångstrom.

William Scott wrote:
> Dear Yang Li:
>
> Happy New Year to you, too, (ahead of Feb. 7th).
>
> You certainly owe us no apology; the reverse may not be true.
>
> Your question is an important one, as is what you have written below.
>
> I'm not certain I have a completely satisfactory answer.
>
> The reason is that ideal bond lengths may or may not be "true" in the sense that the value is established by social consensus, and is thus guaranteed to be perfectly accurate, even though it may be quite precise.
>
> Because of this, and because of natural deviations from ideality (which really only become trustworthy observations at extremely high resolution), a certain amount of "wiggle room" is typically allowed in terms of rmsd.
>
> The more conservative the refinement, the smaller the rmsd from ideality will be.
>
> Some people believe 0.02 Å deviation from ideality is reasonable, based on the accuracy of the dictionary values of bond lengths and angles; others consider that to be "too sloppy" and a way to artificially deflate Rfactors.
>
> I seem to have detected a tendency in the literature to aim for about 0.01 Å deviation. The new refinement program phenix.refine, which is supposed to optimize weighting between X-ray terms and stereochemical constraints automatically, seems to settle in at quite conservative values, such as 0.005 Å, whereas with refmac, I can't seem to get the geometry any more ideal than 0.005 Å even if I try to idealize a structure in the absence of X-ray data.
>
> So, like you, I am a bit confused, and wouldn't mind hearing more from the experts.
>
> All the best,
>
> Bill
>
> yang li wrote:
>> Dear All,
>> I am very sorry to involve you in such an insignificant discussion. I have reached agreement with Prof Gerard, please stop talking about things beyond science, thanks!
>> I read a book today, which said "A refined model should exhibit rms deviations of no more than 0.02 Å for bond lengths and 4° for bond angles". I just wonder about the standard for the bond length and the bond angle. I think most of you have read similar words! But maybe I did not express myself clearly and made some phrasal mistakes.
>> At last, happy new year to you all--though very late!
>>
>> Sincerely!
>> Yang Li