Hi William & others,

Indeed, phenix.refine uses cross-validation to optimise the scaling of the 
X-ray & B-factor weights.  All I did was demonstrate that you can do 
essentially the same thing with Refmac instead.  I don't claim to have done 
anything new, except that I modified Refmac to print out the free likelihood 
and used that as the target function instead of Rfree, as suggested by 
Gerard Bricogne in Meth. Enzymol. (1997) 276, 361-423.  Whatever value of the 
RMSD (or better, the RMS Z-score) comes out of that, you can be sure that it's 
based objectively on the experimental data, not on completely arbitrary 
and unjustifiable subjective choices, which is what Jaskolski et al. appear to 
be suggesting.  Cross-validation is a well-established methodology in 
statistics; it's certainly not 'numerology'!
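
To make the idea concrete, here is a toy sketch in Python.  It is not Refmac 
or phenix.refine code: the "refinement" is a closed-form one-parameter-per-bond 
fit, the free score is a Gaussian negative log-likelihood standing in for the 
free likelihood, and every name and number in it (refine, free_score, 
sigma_obs, sigma_geom, the weight grid) is an assumption made up for the 
illustration.  It only shows the principle: scan the restraint weight, score 
each result against held-out data, and see what RMSD comes out.

```python
import math
import random

def refine(work_obs, ideal, weight, sigma_obs, sigma_geom):
    """Closed-form minimiser of (data term + weight * restraint term), per bond."""
    a = 1.0 / sigma_obs ** 2          # weight of the experimental observation
    b = weight / sigma_geom ** 2      # weight of the restraint to the ideal value
    return [(a * o + b * d0) / (a + b) for o, d0 in zip(work_obs, ideal)]

def free_score(model, free_obs, sigma_obs):
    """Gaussian negative log-likelihood (up to a constant) against held-out data."""
    return sum((f - m) ** 2 for f, m in zip(free_obs, model)) / (2 * sigma_obs ** 2)

def rmsd(model, ideal):
    """RMS deviation of the model values from the ideal values."""
    return math.sqrt(sum((m - d0) ** 2 for m, d0 in zip(model, ideal)) / len(model))

random.seed(0)
n = 500
sigma_obs, sigma_geom = 0.05, 0.02    # hypothetical: noisy data, loose dictionary sigma
ideal = [1.5] * n                     # hypothetical ideal bond length
truth = [d0 + random.gauss(0, 0.01) for d0 in ideal]     # small real deviations
work = [t + random.gauss(0, sigma_obs) for t in truth]   # data used in the fit
free = [t + random.gauss(0, sigma_obs) for t in truth]   # data held out of the fit

weights = [0.01 * 2 ** k for k in range(16)]
scan = []
for w in weights:
    model = refine(work, ideal, w, sigma_obs, sigma_geom)
    scan.append((w, free_score(model, free, sigma_obs), rmsd(model, ideal)))

# The weight is chosen by the free score alone; the RMSD is an outcome, not a target.
best_weight, best_score, best_rmsd = min(scan, key=lambda row: row[1])
print(f"best weight {best_weight:g}: free score {best_score:.2f}, RMSD {best_rmsd:.4f}")
```

Note that the RMSD is never set by hand here; it simply falls out of whichever 
weight scores best against the data that were left out of the fit.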

Of course then you have to come up with some theory to explain the experimental 
results, i.e. why the RMSD that comes out must always be <= the RMS standard 
uncertainty, but actually that's not difficult since the RMSD is related to the 
accuracy and the SU is related to the precision, and on the face of it there's 
no reason why these should be related at all (as Gerard nicely demonstrated 
with his dartboard analogy in Leeds!).  Jaskolski et al.'s theory that 
RMSD = <SU> always holds, regardless of resolution, just doesn't fit the experimental results, 
and as every good scientist knows, it only takes one ugly fact to destroy a 
beautiful theory.
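
The accuracy/precision distinction can be shown with a few lines of Python. 
This is a generic illustration, not a reconstruction of Gerard's talk: the 
two sets of "throws", their offsets and their spreads are invented numbers, 
and summarize is a hypothetical helper.

```python
import math
import random

def summarize(throws, target=(0.0, 0.0)):
    """Return (offset, scatter) for a set of 2D throws.

    offset  : distance of the mean throw from the target (lack of accuracy)
    scatter : RMS spread of the throws about their own mean (lack of precision)
    """
    n = len(throws)
    mx = sum(x for x, _ in throws) / n
    my = sum(y for _, y in throws) / n
    offset = math.hypot(mx - target[0], my - target[1])
    scatter = math.sqrt(sum((x - mx) ** 2 + (y - my) ** 2 for x, y in throws) / n)
    return offset, scatter

random.seed(1)
# Tightly grouped throws that all land well away from the bullseye.
tight_but_off = [(random.gauss(5.0, 0.2), random.gauss(0.0, 0.2)) for _ in range(1000)]
# Widely scattered throws that are centred on the bullseye on average.
loose_but_centred = [(random.gauss(0.0, 3.0), random.gauss(0.0, 3.0)) for _ in range(1000)]

offset_a, scatter_a = summarize(tight_but_off)
offset_b, scatter_b = summarize(loose_but_centred)
print(f"tight but off:     offset {offset_a:.2f}, scatter {scatter_a:.2f}")
print(f"loose but centred: offset {offset_b:.2f}, scatter {scatter_b:.2f}")
```

The first set is precise but inaccurate, the second accurate but imprecise; 
knowing one of the two numbers tells you nothing about the other.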

As you point out, setting a target value of 0.02 Ang or higher for the RMSD 
bonds and similarly for the angles, unless you have very high resolution data, 
will inevitably result in take-up of some fraction of the random experimental 
errors into the refined parameters, in order to inflate the RMSD/RMSZ's to 
their target values and reduce Rwork at the expense of Rfree - otherwise known 
as overfitting!  It's not recommended practice to deliberately cause random 
errors (however small) to be added to your co-ordinates!  This is obvious if 
you think about what happens at low resolution: there's no justification for 
refining individual xyz & B's, so the optimal procedure is to use constrained 
refinement with the torsion angles as parameters, or restrained refinement with 
*very* tight restraints (if that's feasible).  Whether you use constrained 
refinement or its restrained equivalent, it will keep the bond lengths & angles 
fixed at the initial dictionary values so the RMSD's will be identically zero, 
or very nearly so, throughout the refinement.
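
For anyone unsure how the two statistics are defined, here is a minimal 
Python sketch of the RMSD and the RMS Z-score of a set of bond lengths.  The 
ideal values and sigmas below are illustrative numbers only, not taken from 
any particular dictionary, and the function names are my own.

```python
import math

def rmsd(values, ideals):
    """Root-mean-square deviation from the ideal values (same units as the input)."""
    return math.sqrt(sum((v - d0) ** 2 for v, d0 in zip(values, ideals)) / len(values))

def rmsz(values, ideals, sigmas):
    """Root-mean-square Z-score: each deviation is divided by its own sigma first."""
    return math.sqrt(
        sum(((v - d0) / s) ** 2 for v, d0, s in zip(values, ideals, sigmas)) / len(values)
    )

# Illustrative ideal bond lengths and their standard deviations, in Angstrom.
ideal = [1.525, 1.329, 1.231]
sigma = [0.021, 0.014, 0.020]

# Constrained (or very tightly restrained) model: bonds sit at the ideal values.
constrained = list(ideal)
# Model in which every bond is off by exactly one sigma.
one_sigma_off = [d0 + s for d0, s in zip(ideal, sigma)]

print(rmsd(constrained, ideal), rmsz(constrained, ideal, sigma))
print(rmsd(one_sigma_off, ideal), rmsz(one_sigma_off, ideal, sigma))
```

The RMS Z-score is dimensionless and puts bonds with different sigmas on the 
same footing, which is why I prefer it to the raw RMSD.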

Someone mentioned 'experienced crystallographers': actually since the 
distinction between RMSD & SU is purely a question of statistics not of 
crystallography, any crystallographic experience is unlikely to be relevant!

The other question you raised is why Refmac doesn't refine the RMSD's much 
nearer to zero - this is something I also commented on; also why the Rfree & 
LLfree plots are so noisy compared with those from CNS & phenix.refine.  I 
think it's to do with rounding errors in the gradient calculation and/or 
optimisation code.  Refmac may be using single precision, whereas phenix.refine 
may be using double - I'm just guessing, maybe the programmers could comment?  
This is something I would like to see improved, in order to make 
cross-validation with Refmac more reliable & useful.
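
As a generic illustration of how much rounding noise single precision can 
introduce into a long accumulation, here is a small Python sketch.  It says 
nothing about what Refmac or phenix.refine actually do internally (as I said, 
I'm guessing); it just emulates a single-precision running sum by rounding 
to 32 bits after every addition.  The helper names and the choice of summing 
0.1 a hundred thousand times are arbitrary.

```python
import struct

def to_f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(term, n, single):
    """Add `term` to a running total n times, optionally rounding to single precision."""
    t = to_f32(term) if single else term
    total = 0.0
    for _ in range(n):
        total += t
        if single:
            total = to_f32(total)   # emulate a 32-bit accumulator
    return total

n = 100_000
exact = n / 10                      # the sum we are trying to reproduce
err32 = accumulate(0.1, n, single=True) - exact
err64 = accumulate(0.1, n, single=False) - exact
print(f"single precision error: {err32:.6g}")
print(f"double precision error: {err64:.6g}")
```

If sums of that kind feed into gradients or target values, errors of this 
size could plausibly show up as noise in the Rfree and LLfree plots.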

Cheers

-- Ian

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of William Scott
> Sent: 09 January 2008 17:32
> To: William Scott
> Cc: ccp4bb@jiscmail.ac.uk
> Subject: Re: [ccp4bb] bond lengths, angles, ideality and refinements
> 
> Sorry, that should have read
> 
> "because the value is established by social consensus, it is thus NOT
> guaranteed to be perfectly accurate, ..."
> 
> In other words, one can imagine some source of systematic error in
> establishing an ideal bond length.  For example, the crystal packing
> environment of small molecules might tend to distort a bond by a couple
> hundredths of an Ångstrom.
> 
> 
> William Scott wrote:
> > Dear Yang Li:
> >
> >
> > Happy New Year to you, too, (ahead of Feb. 7th).
> >
> > You certainly owe us no apology; the reverse may not be true.
> >
> > Your question is an important one, as is what you have written below.
> >
> > I'm not certain I have a completely satisfactory answer.
> >
> > The reason is that ideal bond lengths may or may not be "true" in the
> > sense that the value is established by social consensus, and is thus
> > guaranteed to be perfectly accurate, even though it may be quite precise.
> >
> > Because of this, and because of natural deviations from ideality (which
> > really only become trustworthy observations at extremely high resolution),
> > a certain amount of "wiggle room" is typically allowed in terms of rmsd.
> >
> > The more conservative the refinement, the smaller the rmsd from ideality
> > will be.
> >
> > Some people believe 0.02 Å deviation from ideality is reasonable, based on
> > the accuracy of the dictionary values of bond lengths and angles; others
> > consider that to be "too sloppy" and a way to artificially deflate
> > Rfactors.
> >
> > I seem to have detected a tendency in the literature to aim for about 0.01
> > Å deviation.  The new refinement program phenix.refine, which is supposed
> > to optimize weighting between X-ray terms and stereochemical constraints
> > automatically, seems to settle in at quite conservative values, such as
> > 0.005 Å, whereas with refmac, I can't seem to get the geometry any more
> > ideal than 0.005 Å even if I try to idealize a structure in the absence of
> > X-ray data.
> >
> > So, like you, I am a bit confused, and wouldn't mind hearing more from the
> > experts.
> >
> > All the best,
> >
> > Bill
> >
> >
> >
> >
> >
> >
> > yang li wrote:
> >> Dear All,
> >>       I am very sorry to have involved you in such an insignificant
> >> discussion. I have reached agreement with Prof Gerard; please stop
> >> talking about things beyond science, thanks!
> >>       I read a book today, which said "A refined model should exhibit
> >> rms deviations of no more than 0.02 Å for bond length and 4 for bond
> >> angles". I just wonder about the standard for the bond lengths and the
> >> bond angles. I think most of you have read similar words! But maybe I
> >> did not express myself clearly and made some phrasing mistakes.
> >>       At last, happy new year to you all--though very late!
> >>
> >>
> >> Sincerely!
> >> Yang Li
> >>
> >
> 
> 

