Here are my 2-3 cents worth on the topic:

The first thing to keep in mind is that the goal of a structure determination is not to get the best stats or to claim the highest possible resolution.
The goal is to get the best possible structure and to be confident that
observed features in a structure are real and not the result of noise.

From that perspective, if any of the conclusions one draws from a structure change depending on whether the highest-resolution shell is cut at an I/sigI of 2 or of 1, one is probably treading on thin ice.

The general guideline that one should include only data for which the shell's average I/sigI > 2 comes from the following simple consideration.


F/sigF = 2 I/sigI

So if you include data with an I/sigI of 2, then your F/sigF = 4. In other words, you will have roughly a 25% experimental uncertainty in your F.
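
The factor of two comes from first-order error propagation on I = |F|^2; written out (my notation, not part of the original post):

I = |F|^2
\sigma_I \approx \left|\frac{dI}{dF}\right| \sigma_F = 2\,|F|\,\sigma_F
\frac{F}{\sigma_F} \approx \frac{2\,|F|^2}{\sigma_I} = 2\,\frac{I}{\sigma_I}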
Now assume that you actually knew the structure of your protein and calculated the crystallographic R-factor between the Fcalcs from that true structure and the observed F.
In this situation you would expect a crystallographic R-factor of around 25%, simply because of the average error in your experimental structure factors. Since most macromolecular structures have R-factors around 20%, it makes little
sense to include data where the experimental uncertainty alone will
guarantee that your R-factor gets worse.
Of course, these days maximum-likelihood refinement will just down-weight
such data, and all you do is burn CPU cycles.
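
That back-of-the-envelope argument is easy to check with a toy simulation (my sketch, with made-up amplitudes; it only assumes Gaussian errors on F):

import numpy as np

rng = np.random.default_rng(0)

# "True" structure-factor amplitudes, and observations carrying a 25% relative
# error, mimicking data included at I/sigI ~ 2 (i.e. F/sigF ~ 4).
f_true = rng.gamma(shape=2.0, scale=100.0, size=100_000)  # arbitrary positive amplitudes
f_obs = f_true + rng.normal(0.0, 0.25 * f_true)           # 25% Gaussian error on each F

# Crystallographic-style R-factor between the error-free Fcalc and the noisy Fobs.
r = np.sum(np.abs(f_obs - f_true)) / np.sum(np.abs(f_obs))
print(f"R-factor from the 25% measurement error alone: {r:.2f}")  # comes out near 0.20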


If you actually want to do a semi-rigorous test of where you should stop
including data, simply include increasingly higher-resolution data in your
refinement and see whether your structure improves.
If you have really high-resolution data (i.e. better than 1.2 Angstrom),
you can do matrix inversion in SHELX and get estimated standard deviations (esds) for your refined parameters. As you include more and more data, the esds should initially decrease. Simply keep including higher-resolution data until the esds
start to increase again.
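
In pseudo-recipe form, the stopping rule looks something like this (the esd numbers below are invented; in practice they would come from SHELXL matrix inversion at each trial cutoff):

# Pick the high-resolution cutoff at which the mean esd stops improving.
mean_esd_by_cutoff = {        # resolution cutoff (A) -> mean coordinate esd (A), made up
    1.4: 0.021,
    1.3: 0.018,
    1.2: 0.016,
    1.1: 0.015,
    1.0: 0.017,               # esds get worse again: too much weak data included
}

cutoffs = sorted(mean_esd_by_cutoff, reverse=True)   # 1.4, 1.3, ..., 1.0
best = cutoffs[0]
for cutoff in cutoffs[1:]:
    if mean_esd_by_cutoff[cutoff] >= mean_esd_by_cutoff[best]:
        break                 # no further improvement; keep the previous cutoff
    best = cutoff
print(f"Suggested high-resolution cutoff: {best} A")  # 1.1 A with these toy numbers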

Similarly, for lower-resolution data you can monitor molecular parameters that are not included in the stereochemical restraints and see whether the inclusion of higher-resolution data improves the agreement between observed and expected values. For example, SHELX does not restrain torsion angles in the aliphatic portions of side chains. If your structure improves, those
angles should cluster more tightly around +60, -60 and 180 degrees...
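
One crude way to put a number on that clustering (again my own sketch, with made-up chi angles; the real ones would be measured from the refined models):

import numpy as np

def torsion_spread(chi_angles_deg):
    """RMS circular deviation of torsion angles from the nearest of +60/-60/180.

    A smaller value means the unrestrained angles cluster more tightly around
    the expected staggered rotamers, i.e. the model has probably improved.
    """
    chi = np.asarray(chi_angles_deg, dtype=float)
    targets = np.array([60.0, -60.0, 180.0])
    # Circular difference in degrees, wrapped into (-180, 180].
    diff = (chi[:, None] - targets[None, :] + 180.0) % 360.0 - 180.0
    nearest = np.min(np.abs(diff), axis=1)          # distance to the closest rotamer
    return float(np.sqrt(np.mean(nearest ** 2)))

# Hypothetical chi1 angles from two refinements of the same structure.
chi_low_res = [55.0, -75.0, 170.0, 95.0, -50.0]     # made-up numbers
chi_high_res = [58.0, -62.0, 178.0, 65.0, -59.0]
print(torsion_spread(chi_low_res), torsion_spread(chi_high_res))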




Cheers,

Ulrich


Could someone point me to some standards for data quality, especially for publishing structures? I'm wondering in particular about highest shell completeness, multiplicity, sigma and Rmerge.

A co-worker pointed me to a '97 article by Kleywegt and Jones:

http://xray.bmc.uu.se/gerard/gmrp/gmrp.html

"To decide at which shell to cut off the resolution, we nowadays tend to use the following criteria for the highest shell: completeness > 80 %, multiplicity > 2, more than 60 % of the reflections with I > 3 sigma(I), and Rmerge < 40 %. In our opinion, it is better to have a good 1.8 Å structure, than a poor 1.637 Å structure."

Are these recommendations still valid with maximum-likelihood methods? We tend to use more data, especially in terms of the Rmerge and sigma cutoffs.
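
For what it's worth, those four thresholds translate into a simple mechanical check (a sketch; the function and field names are made up, the numbers are the ones quoted above):

def passes_highest_shell(stats):
    """Check a highest-resolution shell against the quoted criteria.

    stats: dict with keys 'completeness' (%), 'multiplicity',
    'frac_i_over_3sig' (fraction of reflections with I > 3 sigma(I))
    and 'rmerge' (as a fraction). All names here are made up.
    """
    return (stats["completeness"] > 80.0
            and stats["multiplicity"] > 2.0
            and stats["frac_i_over_3sig"] > 0.60
            and stats["rmerge"] < 0.40)

# Example shell from a hypothetical data set:
shell = {"completeness": 92.0, "multiplicity": 3.1,
         "frac_i_over_3sig": 0.55, "rmerge": 0.35}
print(passes_highest_shell(shell))   # False: too few reflections with I > 3 sigma(I)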

Thanks in advance,

Shane Atwell
