Bernhard Rupp wrote:
People also felt that the RMSD bond/angle of 0.016/1.6 was still a little
high.
This was the subject of a discussion before on the board and I still don't
understand it:
If I recall correctly, even in highly accurate and precise
small molecule structures, the rmsd of corresponding
bonds and angles are ~0.014A and 1.8deg.
It always seems to me that getting these values much below that is not a
sign of crystallographic prowess but of over-restraining them?
Is it just that - given good resolution in the first place - the balance
of restraints (matrix weight) vs low R (i.e., Xray data) gives the best
Rfree or lowest gap at (artificially?) lower rmsd?
Is that then the best model?
I understand that even thermal vibration accounts for about 1.7 deg
angle deviation - are lower rmsd deviations then a manifestation
of low temp? But that does not seem to be much of an effect, if
one looks at the tables from the CSD small mol data (shown
nicely in comparison to the 91 Engh/Huber data in Tables F, pp385).
This is an on-going topic of discussion so let me put in my two cents.
We calculate libraries of "ideal geometry" based on precise, small
molecule structures. When these small molecule crystal structures are
compared to our derived libraries they are found to contain deviations.
These deviations are larger than the uncertainty in these models and
are presumed to reflect real features of the molecule; perturbations
due to the local environment in the crystal.
These same perturbations are present in our crystals and we should
expect to find deviations from "ideal geometry" on the same scale as
that seen in the precise models. This expectation led to the practice
in the 1980s of setting r.m.s. targets of 0.02A and 3 degrees for
agreement to bond length and angle libraries.
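To make the r.m.s. figure concrete, here is a minimal sketch of how such a number is computed. The function name and the bond lengths are hypothetical and chosen only for illustration; they are not taken from any real structure or from any refinement program.

```python
import math

def rms_deviation(observed, ideal):
    """Root-mean-square deviation between model values and library targets."""
    if len(observed) != len(ideal) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - i) ** 2 for o, i in zip(observed, ideal)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical bond lengths in Angstrom (made-up numbers for illustration).
model_bonds = [1.540, 1.320, 1.250, 1.470]
ideal_bonds = [1.525, 1.329, 1.231, 1.458]

bond_rmsd = rms_deviation(model_bonds, ideal_bonds)
print(f"r.m.s. bond deviation: {bond_rmsd:.3f} A (1980s-style target: 0.02 A)")
```

The same function applies to angles in degrees. Note that a single r.m.s. value says nothing about whether the individual deviations are real.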
While this seems quite reasonable, we are left with the question:
Are the deviations from "ideal geometry" we see in a particular model
in any way related to the actual deviations of the molecule in the
crystal? The uncertainties (su's) of the bond lengths in a model based
on 4A diffraction data are huge compared to the absolute value of the
true deviation. For example, if the model had a deviation from "ideal
geometry" of 0.02A but the uncertainty of the distance is 0.2A, can we
say that we have detected a signal that is significantly different from
zero (the null hypothesis)?
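The significance question can be put as a ratio of the deviation to its standard uncertainty. The sketch below assumes a simple "3 sigma" cutoff, which is a convention chosen here for illustration and not something stated above. The high-resolution su of 0.005A is likewise a hypothetical value.

```python
def deviation_z_score(deviation, su):
    """Deviation from the library value, in units of its standard uncertainty."""
    if su <= 0:
        raise ValueError("standard uncertainty must be positive")
    return abs(deviation) / su

# The case from the text: 0.02 A deviation with a 0.2 A uncertainty.
z_low_res = deviation_z_score(0.02, 0.2)
# Hypothetical contrast: the same deviation with an assumed su of 0.005 A.
z_high_res = deviation_z_score(0.02, 0.005)

THRESHOLD = 3.0  # assumed "3 sigma" cutoff, for illustration only
for label, z in [("su = 0.2 A", z_low_res), ("su = 0.005 A", z_high_res)]:
    verdict = "distinguishable from zero" if z > THRESHOLD else "consistent with zero"
    print(f"{label}: z = {z:.1f}, {verdict}")
```

With the numbers from the text the deviation is a tenth of its uncertainty, so the null hypothesis cannot be rejected.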
If we have a model with a collection of deviations from "ideal geometry"
but we have no expectation that those deviations are indicative of the
true deviations of the molecule in the crystal, are those deviations
serving any purpose? If they do not reflect any property of the crystal
they are noise and should be filtered out.
By this argument a model based on 4A resolution diffraction data should
have no deviation from "ideal geometry" while one based on 0.9A diffraction
data should have no restraints on "ideal geometry" since the deviations
are probably all real and significant (except for specific regions of
the molecule that have problems).
The problem we all face is the vast area between these extremes,
compounded by our inability to calculate proper uncertainties for the
parameters of our models. The free R is our current tool-of-choice when
it comes to attempting to judge the statistical significance of aspects
of our model, without performing proper statistical tests which we don't
know how to do. If we allow our model the freedom to deviate from our
library and the free R improves by a "significant" (??) amount then the
resulting deviations must have some similarity to the true deviations
in the crystal, but if the free R does not improve then the deviations
must not be related to reality and should be suppressed. This is the
type of assumption we make whenever we use the free R to make a choice.
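In practice this kind of choice is often made by running the refinement at several X-ray/geometry weights and comparing the free R. The following is only a sketch of that selection step. The `Trial` record, the `pick_weight` helper, and every number in the scan are hypothetical; no particular refinement program or published protocol is being reproduced.

```python
from typing import NamedTuple

class Trial(NamedTuple):
    """Summary of one refinement run at a given X-ray/geometry weight."""
    xray_weight: float
    r_work: float
    r_free: float
    rmsd_bonds: float  # Angstrom

def pick_weight(trials):
    """Return the trial with the lowest free R.

    Ties are broken in favour of the tighter geometry (lower rmsd_bonds).
    """
    if not trials:
        raise ValueError("no refinement trials supplied")
    return min(trials, key=lambda t: (t.r_free, t.rmsd_bonds))

# Hypothetical weight scan (made-up numbers for illustration only).
scan = [
    Trial(0.25, 0.248, 0.271, 0.004),
    Trial(0.5, 0.231, 0.262, 0.007),
    Trial(1.0, 0.219, 0.258, 0.011),
    Trial(2.0, 0.208, 0.261, 0.017),
    Trial(4.0, 0.197, 0.269, 0.026),
]

best = pick_weight(scan)
print(f"weight {best.xray_weight}: Rfree {best.r_free:.3f}, "
      f"rmsd bonds {best.rmsd_bonds:.3f} A")
```

The point of the sketch is that the r.m.s. bond deviation falls out of the scan as a result; it is not fixed in advance. Whether a given difference in free R is "significant" is exactly the question the sketch leaves open.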
What we end up doing is not making a yes/no decision but instead we
variably suppress the amplitude of the deviations from "ideal geometry"
and that is harder to justify. I think a reasonable argument can be
made, but I have already written too many words in this letter. It doesn't
really matter because we left the road of mathematical rigor when we took
the R free path.
Unfortunately, many people have ignored what Brunger said in Methods
in Enzymology about choosing your X-ray/geometry weight based on the
free R and just started saying "the rms bond length deviation must
be 0.007A". The deviations from "ideal geometry" of your model should be
no more or no less than what you can justifiably claim is a reflection
of the true state of the molecule in your crystal.
Dale Tronrud