Dear all,
what I am missing in this whole thread is the question
of what the "true" value of a given bond distance is. So
far, everybody seems to assume that the "ideal" value
is equivalent to the "true" value, and that deviations
from the ideal values must therefore be outliers.
I challenge this notion. Most protein structures are
strained in some sense, which is not surprising given
that the degrees of freedom to fold a linear chain into
a tertiary structure are limited. This strain will inevitably
lead to deviations of geometric parameters from their
ideal values.
This is best illustrated by Ramachandran "outliers"
that are perfectly supported by electron density.
The strain caused by any one of them will distribute itself
over all neighbouring bond lengths and angles as well as
over the torsion angles.
In this context, the current definition of what an outlier
is does not really make sense to me.
Best, Manfred
On 09.11.2022 at 09:17, Dale Tronrud wrote:
And now it is time for an "old man story". Back in the early 1990s
the Brookhaven PDB started to worry about "validating" the models
being deposited. One of the things they implemented was to add to the
header of the PDB a complete list of all bond lengths and angles that
deviated from the library value by more than 3 sigma.
In Brian Matthews' lab a student solved the structure of
beta-galactosidase, which is composed of over a thousand residues, and
the crystal has 16-fold NCS. The model had over 130,000 atoms, a
record for the time. The PDB declared that this was one of the worst
models they had ever seen because it had hundreds of geometry
restraints violated by greater than 3 sigma. The list in their header
went on and on.
Our response, of course, was that this model had over 130,000 bonds
and 180,000 angles and, if you assume a Normal distribution, the number
of 3 sigma deviants was exactly the number expected, which is what
the geometry rmsds were saying.
Dale E. Tronrud
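As a rough check of the count described above, here is a minimal sketch of the expected number of deviations beyond 3 sigma for a model of that size. It takes the 130,000 bonds and 180,000 angles from the message and assumes, purely for illustration, that every restraint deviation is an independent draw from a Normal distribution with the library sigma. The function name two_sided_tail is made up for this example.

```python
import math

def two_sided_tail(z):
    """P(|x| > z) for a standard Normal deviate."""
    return math.erfc(z / math.sqrt(2.0))

# Counts quoted in the message; independence of restraints is an assumption.
n_bonds = 130_000
n_angles = 180_000
n_restraints = n_bonds + n_angles

p3 = two_sided_tail(3.0)                 # about 0.0027
expected = n_restraints * p3             # mean of a binomial count
spread = math.sqrt(n_restraints * p3 * (1.0 - p3))  # its standard deviation

print(f"P(|z| > 3)             = {p3:.5f}")
print(f"expected > 3 sigma     = {expected:.0f} +/- {spread:.0f}")
```

Under these assumptions the expected count comes out at several hundred, which is the same order as the "hundreds" of flagged restraints mentioned in the story.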
On 11/8/2022 3:25 PM, James Holton wrote:
Thank you Ian for your quick response!
I suppose what I'm really trying to do is put a p-value on the
"geometry" of a given PDB file. As in: what are the odds the
deviations from ideality of this model are due to chance?
I am leaning toward the need to take all the deviations in the
structure together as a set, but, as Joao just noted, it just
"feels wrong" to tolerate a 3-sigma deviate. Even more wrong to
tolerate 4 sigma, 5 sigma. And 6 sigma deviates are really difficult
to swallow unless you have trillions of data points.
To put it down in equations, is the p-value of a structure with 1000
bonds in it with one 3-sigma deviate given by:
a) p = 1-erf(3/sqrt(2))
or
b) p = 1-erf(3/sqrt(2))**1000
or
c) something else?
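For reference, options (a) and (b) can be evaluated numerically. This is only a sketch: it reads (b) as 1 - [erf(3/sqrt(2))]**1000, which is how Python parses the expression, and it assumes the 1000 deviates are independent and Normal. The variable names are chosen for this example.

```python
import math

z = 3.0      # threshold in units of sigma
n = 1000     # number of bonds in the hypothetical structure

# Probability that one Normal deviate lies within +/- z sigma.
inside = math.erf(z / math.sqrt(2.0))

# (a) chance that one particular, pre-selected bond deviates by more than z sigma
p_a = 1.0 - inside

# (b) chance that at least one of n independent bonds deviates by more than z sigma
p_b = 1.0 - inside ** n

print(f"(a) p = {p_a:.4f}")
print(f"(b) p = {p_b:.4f}")
```

With these assumptions (a) is about 0.0027 and (b) is about 0.93, so the two expressions answer different questions: (a) is about a bond chosen in advance, (b) is about the worst bond found after looking at all of them.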
On 11/8/2022 2:56 PM, Ian Tickle wrote:
Hi James
I don't think it's meaningful to ask whether the deviation of a
single bond length (or anything else that's single) from its
expected value is significant, since as you say there's always some
finite probability that it occurred purely by chance. Statistics can
only meaningfully be applied to samples of a 'reasonable' size. I
know there are statistics designed for small samples but not for
samples of size 1! It's more meaningful to talk about
distributions. For example if 1% of the sample contained deviations
> 3 sigma when you expected there to be only 0.3 %, that is probably
significant (but it still has a finite probability of occurring by
chance), as would be finding no deviations > 3 sigma (for a
reasonably large sample to avoid sampling errors).
Cheers
-- Ian
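One way to put numbers on the distribution argument above is a binomial tail probability. The sketch below assumes a sample of 1000 independent deviates (the sample size is an assumption, not something stated in the message) and asks how likely it is to see 1% of them, i.e. 10, beyond 3 sigma when about 0.27% is expected, and how likely it is to see none at all. The helper binom_tail_ge is a name invented here; math.comb needs Python 3.8 or later.

```python
import math

def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of the lower sum."""
    lower = sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k))
    return 1.0 - lower

n = 1000                                   # assumed sample size
p3 = math.erfc(3.0 / math.sqrt(2.0))       # expected fraction beyond 3 sigma

k_observed = 10                            # 1% of the sample
p_many = binom_tail_ge(k_observed, n, p3)  # chance of 10 or more by luck
p_none = (1.0 - p3) ** n                   # chance of no 3-sigma deviate at all

print(f"P(>= {k_observed} of {n} beyond 3 sigma) = {p_many:.2e}")
print(f"P(none of {n} beyond 3 sigma)    = {p_none:.3f}")
```

Both probabilities depend strongly on n: the chance of finding no 3-sigma deviate at all is (1 - p3)**n, which shrinks quickly as the sample grows, in line with the caveat about needing a reasonably large sample.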
On Tue, Nov 8, 2022, 22:22 James Holton <jmhol...@lbl.gov> wrote:
OK, so let's suppose there is this bond in your structure that is
stretched a bit. Is that for real? Or just a random fluke? Let's say,
for example, it's a CA-CB bond that is supposed to be 1.529 A long, but
in your model it's 1.579 A. This is 0.05 A too long. Doesn't seem like
much, right? But the "sigma" given to such a bond in our geometry
libraries is 0.016 A. These sigmas are typically derived from a
database of observed bonds of similar type found in highly accurate
structures, like small molecules. So, that makes this a 3-sigma
outlier.
Assuming the distribution of deviations is Gaussian, that's a pretty
unlikely thing to happen. You expect 3-sigma deviates to appear less
than 0.3% of the time. So, is that significant?
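The arithmetic for this single bond can be written out as a short sketch. The ideal length, modelled length and sigma are the values quoted above; the Gaussian error model is the assumption already stated in the text, and the variable names are only for illustration.

```python
import math

ideal = 1.529      # library CA-CB bond length, in Angstrom
observed = 1.579   # bond length in the model, in Angstrom
sigma = 0.016      # library standard deviation, in Angstrom

# Deviation from ideality in units of sigma.
z = (observed - ideal) / sigma

# Two-sided Gaussian tail: chance that one deviate is at least this far out.
p_single = math.erfc(abs(z) / math.sqrt(2.0))

print(f"z = {z:.3f} sigma")
print(f"P(|deviate| >= {abs(z):.3f} sigma) = {p_single:.4f}")
```

The deviation works out to slightly more than 3 sigma, so its tail probability is a little below the 0.27% that applies at exactly 3 sigma.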
But, then again, there are lots of other bonds in the structure. Let's
say there are 1000. With that many samplings from a Gaussian
distribution you generally expect to see a 3-sigma deviate at least
once. That is, do an "experiment" where you pick 1000 Gaussian-random
numbers from a distribution with a standard deviation of 1.0. Then, look
for the maximum over all 1000 trials. Is that one > 3 sigma? It probably
is. If you do this "experiment" millions of times it turns out seeing at
least one 3-sigma deviate in 1000 tries is very common. Specifically,
about 93% of the time. It is rare indeed to have every member of a
1000-deviate set all lie within 3 sigmas. So, we have gone from one
3-sigma deviate being highly unlikely to being a virtual certainty if
you look at enough samples.
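The "experiment" described above can be sketched in a few lines. This version counts a deviate as an outlier when its absolute value exceeds 3, since the two-sided reading is the one that gives a figure near 93%; a one-sided maximum would give a lower fraction. The function name, the seed and the number of repeats (2000 rather than millions, to keep the run short) are choices made for this example.

```python
import random

def fraction_with_outlier(n_deviates=1000, n_trials=2000, threshold=3.0, seed=0):
    """Fraction of trials in which at least one of n_deviates unit-Gaussian
    numbers lies more than `threshold` sigma from zero (either side)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_trials):
        largest = max(abs(rng.gauss(0.0, 1.0)) for _ in range(n_deviates))
        if largest > threshold:
            hits += 1
    return hits / n_trials

frac = fraction_with_outlier()
print(f"trials with at least one 3-sigma deviate: {100.0 * frac:.1f}%")
```

With only 2000 repeats the estimate carries a statistical uncertainty of roughly half a percentage point, so it should land close to, but not exactly on, the analytic value 1 - erf(3/sqrt(2))**1000.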
So, my question is: is a 3-sigma deviate significant? Is it significant
only if you have one bond in the structure? What about angles? What if
you have 500 bonds and 500 angles? Do they count as 1000 deviates
together? Or separately?
I'm sure the more mathematically inclined out there will have some
intelligent answers for the rest of us. However, if you are not a
mathematician, how about a vote? Is a 3-sigma bond length deviation
significant? Or not?
Looking forward to both kinds of responses,
-James Holton
MAD Scientist
########################################################################
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a
mailing list hosted by www.jiscmail.ac.uk, terms & conditions are
available at https://www.jiscmail.ac.uk/policyandsecurity/
--
Dr. Manfred S. Weiss
Macromolecular Crystallography
Helmholtz-Zentrum Berlin
Albert-Einstein-Str. 15
D-12489 Berlin