Here are my 2-3 cents worth on the topic:

The first thing to keep in mind is that the goal of a structure determination is not to get the best stats or to claim the highest possible resolution.
The goal is to get the best possible structure and to be confident that
observed features in a structure are real and not the result of noise.

From that perspective, if any of the conclusions one draws from a structure change depending on whether the highest-resolution shell is cut at an I/sigI of 2 or of 1, one is probably treading on thin ice.

The general guideline that one should include only data for which the shell's average I/sigI > 2 comes from the following simple consideration.


F/sigF = 2 I/sigI

So if you include data with an I/sigI of 2, then your F/sigF = 4. In other words, you will have roughly a 25% experimental uncertainty in your F.
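
The factor of two comes from first-order error propagation on I = |F|^2; written out (my notation, not part of the original post):

I = |F|^2
\sigma_I \approx \left|\frac{dI}{dF}\right| \sigma_F = 2\,|F|\,\sigma_F
\frac{F}{\sigma_F} \approx \frac{2\,|F|^2}{\sigma_I} = 2\,\frac{I}{\sigma_I}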
Now assume that you actually knew the structure of your protein and calculated the crystallographic R-factor between the Fcalcs from that true structure and the observed F.
In this situation you would expect a crystallographic R-factor of around 25%, simply because of the average error in your experimental structure factors. Since most macromolecular structures have R-factors around 20%, it makes little
sense to include data where the experimental uncertainty alone will
guarantee that your R-factor gets worse.
Of course, these days maximum-likelihood refinement will just down-weight
such data, and all you do is burn CPU cycles.
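
That back-of-the-envelope argument is easy to check with a toy simulation (my sketch, with made-up amplitudes; it only assumes Gaussian errors on F):

import numpy as np

rng = np.random.default_rng(0)

# "True" structure-factor amplitudes, and observations carrying a 25% relative
# error, mimicking data included at I/sigI ~ 2 (i.e. F/sigF ~ 4).
f_true = rng.gamma(shape=2.0, scale=100.0, size=100_000)  # arbitrary positive amplitudes
f_obs = f_true + rng.normal(0.0, 0.25 * f_true)           # 25% Gaussian error on each F

# Crystallographic-style R-factor between the error-free Fcalc and the noisy Fobs.
r = np.sum(np.abs(f_obs - f_true)) / np.sum(np.abs(f_obs))
print(f"R-factor from the 25% measurement error alone: {r:.2f}")  # comes out near 0.20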


If you actually want to do a semi-rigorous test of where you should stop
including data, simply include increasingly higher-resolution data in your
refinement and see whether your structure improves.
If you have really high-resolution data (i.e. better than 1.2 Angstrom),
you can do matrix inversion in SHELX and get estimated standard deviations (esds) for your refined parameters. As you include more and more data, the esds should initially decrease. Simply keep including higher-resolution data until the esds
start to increase again.
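
In pseudo-recipe form, the stopping rule looks something like this (the esd numbers below are invented; in practice they would come from SHELXL matrix inversion at each trial cutoff):

# Pick the high-resolution cutoff at which the mean esd stops improving.
mean_esd_by_cutoff = {        # resolution cutoff (A) -> mean coordinate esd (A), made up
    1.4: 0.021,
    1.3: 0.018,
    1.2: 0.016,
    1.1: 0.015,
    1.0: 0.017,               # esds get worse again: too much weak data included
}

cutoffs = sorted(mean_esd_by_cutoff, reverse=True)   # 1.4, 1.3, ..., 1.0
best = cutoffs[0]
for cutoff in cutoffs[1:]:
    if mean_esd_by_cutoff[cutoff] >= mean_esd_by_cutoff[best]:
        break                 # no further improvement; keep the previous cutoff
    best = cutoff
print(f"Suggested high-resolution cutoff: {best} A")  # 1.1 A with these toy numbers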

Similarly, for lower-resolution data you can monitor molecular parameters that are not included in the stereochemical restraints and see whether the inclusion of higher-resolution data improves the agreement between observed and expected values. For example, SHELX does not restrain torsion angles in the aliphatic portions of side chains. If your structure improves, those
angles should cluster more tightly around +60, -60 and 180 degrees...
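
One crude way to put a number on that clustering (again my own sketch, with made-up chi angles; the real ones would be measured from the refined models):

import numpy as np

def torsion_spread(chi_angles_deg):
    """RMS circular deviation of torsion angles from the nearest of +60/-60/180.

    A smaller value means the unrestrained angles cluster more tightly around
    the expected staggered rotamers, i.e. the model has probably improved.
    """
    chi = np.asarray(chi_angles_deg, dtype=float)
    targets = np.array([60.0, -60.0, 180.0])
    # Circular difference in degrees, wrapped into (-180, 180].
    diff = (chi[:, None] - targets[None, :] + 180.0) % 360.0 - 180.0
    nearest = np.min(np.abs(diff), axis=1)          # distance to the closest rotamer
    return float(np.sqrt(np.mean(nearest ** 2)))

# Hypothetical chi1 angles from two refinements of the same structure.
chi_low_res = [55.0, -75.0, 170.0, 95.0, -50.0]     # made-up numbers
chi_high_res = [58.0, -62.0, 178.0, 65.0, -59.0]
print(torsion_spread(chi_low_res), torsion_spread(chi_high_res))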




Cheers,

Ulrich


Could someone point me to some standards for data quality, especially for publishing structures? I'm wondering in particular about highest shell completeness, multiplicity, sigma and Rmerge.

A co-worker pointed me to a '97 article by Kleywegt and Jones:

http://xray.bmc.uu.se/gerard/gmrp/gmrp.html

"To decide at which shell to cut off the resolution, we nowadays tend to use the following criteria for the highest shell: completeness > 80 %, multiplicity > 2, more than 60 % of the reflections with I > 3 sigma(I), and Rmerge < 40 %. In our opinion, it is better to have a good 1.8 Å structure, than a poor 1.637 Å structure."

Are these recommendations still valid with maximum-likelihood methods? We tend to use more data, especially in terms of the Rmerge and sigma cutoffs.
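
For what it's worth, those four thresholds translate into a simple mechanical check (a sketch; the function and field names are made up, the numbers are the ones quoted above):

def passes_highest_shell(stats):
    """Check a highest-resolution shell against the quoted criteria.

    stats: dict with keys 'completeness' (%), 'multiplicity',
    'frac_i_over_3sig' (fraction of reflections with I > 3 sigma(I))
    and 'rmerge' (as a fraction). All names here are made up.
    """
    return (stats["completeness"] > 80.0
            and stats["multiplicity"] > 2.0
            and stats["frac_i_over_3sig"] > 0.60
            and stats["rmerge"] < 0.40)

# Example shell from a hypothetical data set:
shell = {"completeness": 92.0, "multiplicity": 3.1,
         "frac_i_over_3sig": 0.55, "rmerge": 0.35}
print(passes_highest_shell(shell))   # False: too few reflections with I > 3 sigma(I)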

Thanks in advance,

Shane Atwell
