I would only like to reiterate a small comment I posted before:

Should the cell parameters be inaccurate, optimization of weights by cross-validation (getting the best Rfree) will result in a 'higher' RMSD. It is easy to see why: if a cell is measured to be 1% larger than it is in reality, all bonds would 'prefer' to be 1% longer than the 'correct' dictionary values, resulting in a higher RMSD. The structure that satisfies that would have the lowest Rfree, because the X-ray data would be fitted better.
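
A quick back-of-the-envelope check of this argument, as a minimal Python sketch. It assumes an isotropic 1% cell error (every Cartesian coordinate, and hence every bond, stretched by a factor of 1.01) and uses three approximate textbook bond lengths purely for illustration; they are not taken from any particular restraint dictionary. Even with an otherwise perfect model, the cell error alone produces an RMSD comparable to the 0.01-0.02 Å targets debated in this thread.

```python
import math

# Approximate ideal bond lengths in Angstrom (illustrative values only,
# not copied from any specific restraint dictionary).
IDEAL_BONDS = {"C-C": 1.53, "C-N": 1.33, "C=O": 1.23}

def rmsd_from_cell_error(ideal_lengths, scale):
    """RMSD (Angstrom) from the ideal values when every bond is stretched by
    `scale`, i.e. assuming the cell is off by the same isotropic factor."""
    devs = [length * scale - length for length in ideal_lengths]
    return math.sqrt(sum(d * d for d in devs) / len(devs))

rmsd = rmsd_from_cell_error(IDEAL_BONDS.values(), 1.01)
print(f"RMSD caused by a 1% cell error: {rmsd:.4f} A")  # prints 0.0137 A
```

A real cell error need not be isotropic, so in practice the effect shows up differently along each axis.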

I actually think that inaccurate cells are a big source of misery in many refinements. I have found WhatCheck's idea of checking your cell, by looking at the projection of bond lengths of certain types along the cell axes, most useful. I would hardly advocate measuring your cell that way, but going back to your data and looking at the cell again would be worth it.
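
The principle of checking the cell from the geometry can be sketched as a small least-squares fit. This is a hypothetical illustration, not WhatCheck's actual algorithm: it assumes an orthogonal cell, bond vectors already in Cartesian Å, and ideal lengths from a dictionary, and all function names are made up for the example. To first order, a fractional error e_i in cell axis i lengthens a bond of length L by e_i * d_i^2 / L, where d_i is the bond vector's component along that axis, so the per-axis errors can be fitted to the bond-length residuals.

```python
import math
import random

def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def _solve3(m, b):
    """Solve m x = b for a 3x3 system by Cramer's rule."""
    d = _det3(m)
    x = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = b[r]
        x.append(_det3(mc) / d)
    return x

def estimate_axis_errors(bonds):
    """Least-squares estimate of fractional cell-axis errors (e_a, e_b, e_c).

    bonds: list of ((dx, dy, dz), ideal_length), bond vectors in Angstrom,
    orthogonal axes assumed. Linearised model:
    observed - ideal ~= sum_i e_i * d_i**2 / L.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atr = [0.0] * 3
    for vec, ideal in bonds:
        length = math.sqrt(sum(c * c for c in vec))
        row = [c * c / length for c in vec]
        resid = length - ideal
        for i in range(3):
            atr[i] += row[i] * resid
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return _solve3(ata, atr)  # normal equations (A^T A) e = A^T r

def synthetic_bonds(scale, n=500, ideal=1.5, seed=0):
    """Hypothetical test data: ideal-length bonds in random directions,
    then stretched per axis as a mis-measured cell would do."""
    rng = random.Random(seed)
    bonds = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        vec = tuple(ideal * c / norm * s for c, s in zip(v, scale))
        bonds.append((vec, ideal))
    return bonds

errors = estimate_axis_errors(synthetic_bonds((1.01, 1.0, 0.995)))
print("estimated fractional axis errors:", [round(e, 4) for e in errors])
```

With real structures the residuals also contain genuine deviations from ideality and coordinate error, which is why restricting the fit to well-determined bond types, as the WhatCheck idea does, matters.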

To make it more fun, cells change during radiation damage, so ...

best regards, Tassos

On 9 Jan 2008, at 20:15, Ian Tickle wrote:

Hi William & others,

Indeed, phenix.refine uses cross-validation to optimise the scaling of the X-ray & B-factor weights. All I did was demonstrate that you can do essentially the same thing as phenix.refine but using Refmac instead. I don't claim to have done anything new, except that I modified Refmac to print out the free likelihood and used that as a target function instead of Rfree, as suggested by Gerard Bricogne in Meth. Enzymol. (1997) 276, 361-423. Whatever value of the RMSD (or better the RMS Z-score) comes out of that, you can be sure that it's based purely and objectively on the experimental data, not on completely arbitrary and unjustifiable subjective choices, which is what Jaskolski et al. appear to be suggesting. Cross-validation is a well-established methodology in statistics; it's certainly not 'numerology'!
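
For readers who have not met the RMS Z-score: it is the RMS of each restraint deviation divided by that restraint's own dictionary sigma, so restraints with different tolerances are put on a common scale (an RMSZ near 1 means the deviations are, on average, about the size the sigmas allow). A minimal Python sketch; the model values and sigmas below are invented for illustration only.

```python
import math

def rmsd(model, ideal):
    """Plain RMS deviation from the ideal (dictionary) values."""
    return math.sqrt(sum((m - i) ** 2 for m, i in zip(model, ideal)) / len(model))

def rmsz(model, ideal, sigma):
    """RMS Z-score: each deviation is scaled by its own dictionary sigma."""
    return math.sqrt(
        sum(((m - i) / s) ** 2 for m, i, s in zip(model, ideal, sigma)) / len(model)
    )

# Invented example: three bonds (Angstrom) with hypothetical dictionary sigmas.
ideal = [1.530, 1.330, 1.230]
model = [1.545, 1.322, 1.240]
sigma = [0.020, 0.010, 0.020]

print(f"RMSD = {rmsd(model, ideal):.4f} A, RMSZ = {rmsz(model, ideal, sigma):.3f}")
# prints: RMSD = 0.0114 A, RMSZ = 0.696
```

When all sigmas are equal, RMSZ is just RMSD divided by that sigma; the two differ once the restraints have different tolerances.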

Of course then you have to come up with some theory to explain the experimental results, i.e. why the RMSD that comes out must always be <= the RMS standard uncertainty, but actually that's not difficult since the RMSD is related to the accuracy and the SU is related to the precision, and on the face of it there's no reason why these should be related at all (as Gerard nicely demonstrated with his dartboard analogy in Leeds!). Jaskolski et al.'s theory that always RMSD = <SU> regardless of resolution just doesn't fit the experimental results, and as every good scientist knows, it only takes one ugly fact to destroy a beautiful theory.

As you point out, setting a target value of 0.02 Ang or higher for the RMSD bonds and similarly for the angles, unless you have very high resolution data, will inevitably result in take-up of some fraction of the random experimental errors into the refined parameters, in order to inflate the RMSD/RMSZ's to their target values and reduce Rwork at the expense of Rfree - otherwise known as overfitting! It's not recommended practice to deliberately cause random errors (however small) to be added to your co-ordinates! This is obvious if you think about what happens at low resolution: there's no justification for refining individual xyz & B's, so the optimal procedure is to use constrained refinement with the torsion angles as parameters, or restrained refinement with *very* tight restraints (if that's feasible). Whether you use constrained refinement or its restrained equivalent, it will keep the bond lengths & angles fixed at the initial dictionary values so the RMSD's will be identically zero, or very nearly so, throughout the refinement.
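
The cross-validation argument can be illustrated with a deliberately simplified toy model in Python. It is an analogy only, not how Refmac or phenix.refine work: each "parameter" is a coordinate-like value restrained to a dictionary value b0 = 0 with weight w and fitted to a few noisy work observations, and one extra observation per parameter is held out as the free set. The free residual is a least-squares stand-in for Rfree or the free likelihood; the sizes, noise levels and weight grid are all made-up assumptions. Loosening the restraint (smaller w) always raises the RMSD from the dictionary, but past the cross-validated optimum it also raises the free residual, i.e. the extra deviation is fitted noise.

```python
import math
import random

def simulate(n_params=5000, n_work=4, obs_sd=0.05, true_sd=0.01, seed=1):
    """Toy data: true values scatter around the dictionary value (0.0)
    by true_sd; each is observed n_work times (work set) plus once (free set)."""
    rng = random.Random(seed)
    work, free = [], []
    for _ in range(n_params):
        true = rng.gauss(0.0, true_sd)
        work.append([true + rng.gauss(0.0, obs_sd) for _ in range(n_work)])
        free.append(true + rng.gauss(0.0, obs_sd))
    return work, free

def refine(work_obs, w, b0=0.0):
    """Minimise sum((y - b)^2) + w * (b - b0)^2 for one parameter (closed form)."""
    return (sum(work_obs) + w * b0) / (len(work_obs) + w)

def evaluate(work, free, w):
    """Return (RMSD from the dictionary value, mean squared free residual)."""
    est = [refine(obs, w) for obs in work]
    rmsd_dict = math.sqrt(sum(b * b for b in est) / len(est))
    free_resid = sum((f - b) ** 2 for f, b in zip(free, est)) / len(est)
    return rmsd_dict, free_resid

work, free = simulate()
weights = [0, 1, 2, 5, 10, 20, 50, 100, 1000]
results = {w: evaluate(work, free, w) for w in weights}
best_w = min(weights, key=lambda w: results[w][1])
for w in weights:
    r, f = results[w]
    print(f"w={w:5d}  RMSD from dictionary={r:.4f}  free residual={f:.6f}")
print("cross-validated weight:", best_w)
```

In this toy the expected optimum is w = obs_sd**2 / true_sd**2 = 25, so the grid value actually picked should land nearby, depending on the random draw.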

Someone mentioned 'experienced crystallographers': actually since the distinction between RMSD & SU is purely a question of statistics not of crystallography, any crystallographic experience is unlikely to be relevant!

The other question you raised is why Refmac doesn't refine the RMSD's much nearer to zero; this is something I also commented on, along with why the Rfree & LLfree plots are so noisy compared with those from CNS & phenix.refine. I think it's to do with rounding errors in the gradient calculation and/or optimisation code. Refmac may be using single precision, whereas phenix.refine may be using double. I'm just guessing; maybe the programmers could comment? This is something I would like to see improved, in order to make cross-validation with Refmac more reliable & useful.
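
The rounding-error guess can be illustrated in general terms. This Python sketch is generic and makes no claim about how Refmac or phenix.refine actually accumulate their sums: it emulates single precision by rounding to IEEE-754 float32 after every addition (using the standard struct module) and compares the accumulated sum of many small, identical contributions with the same sum done in double precision. The float32 relative error comes out orders of magnitude larger, the kind of effect that could make gradients, and hence the refinement trajectory, noisier.

```python
import struct

def to_float32(x):
    """Round a Python float (IEEE-754 double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(term, n, single_precision):
    """Sum `term` n times, optionally rounding to float32 after each step."""
    total = 0.0
    t = to_float32(term) if single_precision else term
    for _ in range(n):
        total += t
        if single_precision:
            total = to_float32(total)  # emulate a float32 accumulator
    return total

n, term = 200_000, 0.1
exact = n * term  # reference value, correct to double precision
err32 = abs(accumulate(term, n, True) - exact) / exact
err64 = abs(accumulate(term, n, False) - exact) / exact
print(f"relative error, float32 accumulation: {err32:.2e}")
print(f"relative error, float64 accumulation: {err64:.2e}")
```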

Cheers

-- Ian

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of William Scott
Sent: 09 January 2008 17:32
To: William Scott
Cc: ccp4bb@jiscmail.ac.uk
Subject: Re: [ccp4bb] bond lengths, angles, ideality and refinements

Sorry, that should have read

"because the value is established by social consensus, it is thus NOT
guaranteed to be perfectly accurate, ..."

In other words, one can imagine some source of systematic error in
establishing an ideal bond length. For example, the crystal packing
environment of small molecules might tend to distort a bond by a couple
of hundredths of an Ångstrom.


William Scott wrote:
Dear Yang Li:


Happy New Year to you, too (ahead of Feb. 7th).

You certainly owe us no apology; the reverse may not be true.

Your question is an important one, as is what you have written below.

I'm not certain I have a completely satisfactory answer.

The reason is that ideal bond lengths may or may not be "true" in the
sense that the value is established by social consensus, and is thus
guaranteed to be perfectly accurate, even though it may be quite precise.

Because of this, and because of natural deviations from ideality (which
really only become trustworthy observations at extremely high resolution),
a certain amount of "wiggle room" is typically allowed in terms of rmsd.

The more conservative the refinement, the smaller the rmsd from ideality
will be.

Some people believe 0.02 Å deviation from ideality is reasonable, based on
the accuracy of the dictionary values of bond lengths and angles; others
consider that to be "too sloppy" and a way to artificially deflate
Rfactors.

I seem to have detected a tendency in the literature to aim for about
0.01 Å deviation. The new refinement program phenix.refine, which is
supposed to optimize weighting between X-ray terms and stereochemical
constraints automatically, seems to settle in at quite conservative
values, such as 0.005 Å, whereas with refmac, I can't seem to get the
geometry any more ideal than 0.005 Å even if I try to idealize a
structure in the absence of X-ray data.

So, like you, I am a bit confused, and wouldn't mind hearing more from
the experts.

All the best,

Bill






yang li wrote:
Dear All,
      I am very sorry to involve you in such an insignificant discussion.
I have reached agreement with Prof. Gerard; please stop talking about
things beyond science, thanks!
      I read a book today which said "A refined model should exhibit rms
deviations of no more than 0.02 Å for bond lengths and 4° for bond
angles". I just wonder about the standard for bond lengths and bond
angles. I think most of you have read similar words! But maybe I did not
express myself clearly and made some phrasing mistakes.
      Lastly, happy new year to you all, though very late!


Sincerely,
Yang Li






