*** For details on how to be removed from this list visit the ***
*** CCP4 home page http://www.ccp4.ac.uk ***
Hi Tassos,
Thanks for the comments; I'm pleasantly surprised at the level of
interest the superposition problem has elicited here.
On Oct 12, 2006, at 4:58 AM, Anastassis Perrakis wrote:
On the specific issue of superposition, I feel it's much more
important to take into account the coordinate errors.
The short answer is that Theseus of course estimates the coordinate
errors, as they are necessary for the maximum likelihood
calculation. Technical answer below ...
In a proper ML formulation, I think this is (part of) your prior,
so if you don't know these (and as far as I could see Theseus does
not use these) your ML function would be fundamentally flawed by
definition, at least in the context of X-ray crystallography as
your experimental method for deriving model coordinates. (Sorry if
indeed you do estimate proper coordinate errors and I missed that,
but I could not see it in an admittedly quick scan of your web site
and paper.)
In a Bayesian and likelihoodist framework, there are two
(nonexclusive) ways to treat the coordinate errors: one is based on
direct estimation from the data, the other is through prior
information. The Schneider paper you cite below estimates the errors
only via the latter method, by assuming that the errors are
proportional to the B-factors. Theseus does both, although currently
using B-factors is an undocumented feature [1]. If you wish to
include the prior information from the B-factors in Theseus, you can
use the '-b2' command line option.
However, contrary to your intuition (and also mine initially), the
prior coordinate error is not nearly as important as the error
component estimated from structural differences. In my experience,
prior B-factor information has a relatively minor effect on the final
ML superposition [2]. As is often the case in likelihood and
Bayesian treatments, the data dominate the prior.
In the case of protein superpositions, regions with high B-factors
are very unlikely to have similar conformations by chance. Thus
estimates of the coordinate error based on only structural
differences naturally assign a high variance to those regions with
high B-factors. On the other hand, if a region has low B-factors yet
large structural differences, the error component from conformational
differences overwhelms the component from the prior. In the end, the
prior B-factors provide mostly redundant information that can be
inferred from structural differences alone. You can verify this
yourself using the '-b2' flag in Theseus.
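The point can be illustrated with a toy numerical sketch (entirely hypothetical data and a simplified moment-based estimator, not Theseus's actual ML machinery): per-atom variances computed directly from an ensemble of superposed structures already single out the mobile region, with no B-factor input at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n_structs, n_atoms = 10, 20

# Hypothetical "true" structure with a flexible loop at atoms 8-12.
mean_coords = rng.normal(size=(n_atoms, 3))
true_sd = np.full(n_atoms, 0.1)
true_sd[8:13] = 1.5

# Simulate an ensemble of already-superposed structures.
ensemble = mean_coords + rng.normal(size=(n_structs, n_atoms, 3)) * true_sd[None, :, None]

# Per-atom variance estimated purely from structural differences:
# mean squared deviation from the ensemble average, per coordinate.
avg = ensemble.mean(axis=0)
est_var = ((ensemble - avg) ** 2).sum(axis=2).mean(axis=0) / 3.0

# The flexible loop is flagged with high variance; no B-factor prior needed.
print(np.round(est_var[:8].mean(), 3), np.round(est_var[8:13].mean(), 3))
```

The estimated variance in the loop dwarfs that of the well-ordered core, which is exactly the information a B-factor prior would have supplied.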
I am surprised no-one brought up ESCET as a very useful tool for
objective superposition, especially in a crystallographic context.
Please read:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=11807243&query_hl=4&itool=pubmed_docsum
Thanks for the ref. There are many similar treatments for finding
the "conformationally invariant" regions of structures so that one
can subsequently perform a reasonable least-squares (LS)
superposition. However, they all depend on arbitrary cutoffs (in
Schneider's case, the penalty term p and the distance parameter
\epsilon_l). Again, this is one of the great advantages of the ML
method vs LS -- there is no need to specify subjective cutoffs, as
the ML method already accounts for structural variability in a
statistically proper manner. I'll also note that the objective ML
solution provides a nice formal justification for the common and
intuitive practice of structural biologists, that conformationally
variable regions should be down-weighted when superposing
macromolecules.
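The contrast can be sketched numerically. The variances below are made-up values and the weighting is a simplified stand-in for the full ML treatment, but it shows the qualitative difference: cutoff methods make a binary in/out decision that changes with the chosen threshold, while inverse-variance weighting degrades smoothly with no tunable parameter.

```python
import numpy as np

# Hypothetical per-atom variances estimated from structural differences;
# atoms 3-5 sit in a conformationally variable region.
var = np.array([0.01, 0.02, 0.015, 0.5, 2.0, 3.0, 0.02, 0.01])

# Cutoff-based LS schemes: each atom is either fully in or fully out,
# and the answer depends on the arbitrary cutoff chosen.
cutoff = 0.1
ls_weights = (var < cutoff).astype(float)

# ML-style weighting: smooth inverse-variance weights, no cutoff parameter.
ml_weights = (1.0 / var) / (1.0 / var).max()

print(ls_weights)
print(np.round(ml_weights, 3))
```

Moving the cutoff from 0.1 to 1.0 flips three atoms from excluded to fully included, whereas the inverse-variance weights change not at all.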
Cheers,
Douglas
http://www.theseus3d.org
[1] The ML solution using crystallographic B-factors (and/or NMR
order parameters) as prior variances is a non-trivial problem, and it
will be given in a forthcoming publication.
[2] In contrast, prior B-factor information usually has a large
effect when used as a basis for weights in a weighted LS
superposition, which may explain our intuition.
On a more general note, one can 'hide' many things behind ML
formalisms. Understanding the source of errors in your experimental
quantities, i.e. your experiment in depth, is also very
important. And, sometimes, it lets you get away with bad
statistics: structures were solved with LS methods, and no wonder ML
was a massive step forward by all means, but that's because it was
implemented for crystallographers by crystallographers with a
clear understanding of the nature of the errors and the exact
experimental methods in use, not only of Bayes's theorem.
Tassos - newly appointed press officer for Thomas R Schneider ;-)
On Oct 11, 2006, at 21:40, Douglas L. Theobald wrote:
On Oct 11, 2006, at 2:52 PM, Santarsiero, Bernard D. wrote:
It really has nothing to do with least-squares itself.
But it does. When I've used the term 'least-squares', I mean
ordinary (unweighted) least-squares (OLS), which has been used
historically for superpositions. OLS is fundamentally predicated
on two strong statistical assumptions (as stated in the Gauss-
Markov theorem): (1) all atoms in the superposition have the same
variance and (2) none of the atoms are correlated with each
other. Both of these OLS assumptions are false, in general, with
macromolecular superpositions.
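A small simulation makes the cost of the violated equal-variance assumption concrete. It uses a one-dimensional location estimate as a stand-in for the superposition problem (purely illustrative, hypothetical numbers): when variances differ across "atoms", inverse-variance weighting beats the equal weights that OLS implicitly uses.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu, n_rep = 5.0, 500

# Heteroscedastic "atoms": half well-ordered, half very mobile.
# This violates the equal-variance assumption behind OLS.
sd = np.array([0.1] * 50 + [10.0] * 50)

ols_err, wls_err = [], []
for _ in range(n_rep):
    y = true_mu + rng.normal(size=sd.size) * sd
    ols_err.append((y.mean() - true_mu) ** 2)               # OLS: equal weights
    w = 1.0 / sd**2                                         # inverse-variance weights
    wls_err.append(((w * y).sum() / w.sum() - true_mu) ** 2)

print(np.mean(ols_err), np.mean(wls_err))
```

The mean squared error of the weighted estimate is orders of magnitude smaller, because the mobile points are allowed to say less about the answer.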
It's the implementation, and how you want to define the function
being minimized. You don't want to superimpose elements that are
really different.
Sort of. The problem is knowing what is "really different" before
doing a superposition. Probably the primary motivation for
performing a superposition is to get an idea of what is different
and what is similar among multiple structures. In most cases,
regions of structures that are different are also similar to some
extent. Thus different regions carry at least some structural
information that should be incorporated in the superposition.
Similar vs different is a matter of degree; it is not an absolute
category. The maximum likelihood (ML) method implemented in
THESEUS naturally accounts for these issues by weighting
structural regions according to their structural similarity.
And not to argue the point much further, but least-squares IS a
maximum likelihood estimator.
The method of least squares can be justified in terms of
likelihood, given some additional assumptions, but least squares
is not maximum likelihood. Specifically, the OLS solution is
identical to the ML solution if you assume a Gaussian distribution
for the data, and if all data points have the same variance and
are uncorrelated. On the other hand, the statistical
justification for OLS is given by the Gauss-Markov theorem, which
guarantees the optimality of the OLS solution (by frequentist
measures, not likelihoodist or Bayesian measures) whether or not
the data have a Gaussian distribution.
IOW, it is best to realize that LS and ML are very different
methods, historically, philosophically, and practically. In
general, they will lead to different solutions to the same
problem. In certain special cases they can lead to the same
solution, as can Bayesian methods, but that doesn't mean that OLS
is ML or Bayesian.
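The distinction is easy to verify numerically. In the sketch below (a one-parameter location problem, purely illustrative), the Gaussian log-likelihood is a monotone transform of the OLS objective, so the two estimates coincide exactly; swap in a Laplace likelihood and ML returns the median instead, a generally different answer from OLS.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=3.0, scale=1.0, size=25)

# OLS estimate of a location parameter: minimize the sum of squared residuals.
mu = np.linspace(0.0, 6.0, 6001)
sse = ((y[:, None] - mu[None, :]) ** 2).sum(axis=0)
ols_est = mu[sse.argmin()]

# ML under an iid Gaussian model: maximize the log-likelihood, which
# (dropping constants) is just -0.5 * SSE, a monotone transform of it.
loglik = -0.5 * sse
ml_gauss = mu[loglik.argmax()]

# ML under a Laplace (double-exponential) model minimizes |residuals|
# instead, giving the median rather than the mean.
sabs = np.abs(y[:, None] - mu[None, :]).sum(axis=0)
ml_laplace = mu[sabs.argmin()]

print(ols_est == ml_gauss, ols_est, ml_laplace)
```

So OLS agrees with ML only under the Gaussian, equal-variance, uncorrelated assumptions; change the error model and the two part ways.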
Cheers,
Douglas
http://www.theseus3d.org/
bds
On Wed, October 11, 2006 1:34 pm, Douglas L. Theobald wrote:
On Oct 11, 2006, at 1:49 PM, Santarsiero, Bernard D. wrote:
the graphics program "O" is excellent at things like this.
Basically you do a rough superposition with lsq-exp (explicit)
and then improve it with lsq-imp. It leaves out calpha's that
aren't close, but pulls everything else in.
Yes, the way that O does it with lsq is much preferred over many
other methods. However, it is really just a kludge to work around
the problems with least-squares, an optimization method that is
fundamentally inappropriate for the macromolecular superposition
problem. The advantage of the maximum likelihood method that
THESEUS implements is that it removes the arbitrary and
subjective parameters for deciding what "isn't close". Maximum
likelihood instead inherently down-weights the variable parts by
exactly the (statistically) proper amount.
Additionally, lsq does not do a bona fide simultaneous
superposition with more than two structures, whereas THESEUS does.
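As a rough illustration of what such down-weighting looks like in practice, here is a weighted Kabsch/SVD fit in numpy. This is a fixed-weight sketch, not the full ML algorithm in THESEUS (which estimates the weights iteratively from the data); the structures, weights, and the helper name `weighted_rmsd_fit` are all hypothetical.

```python
import numpy as np

def weighted_rmsd_fit(X, Y, w):
    """Rotate/translate Y onto X, minimizing the weighted sum of squared
    deviations (weights w), via a weighted Kabsch/SVD superposition."""
    w = w / w.sum()
    xc = (w[:, None] * X).sum(axis=0)       # weighted centroids
    yc = (w[:, None] * Y).sum(axis=0)
    A, B = X - xc, Y - yc
    H = (w[:, None] * B).T @ A              # weighted covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T # guard against reflections
    return (Y - yc) @ R.T + xc

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))

# Y is X rotated 30 degrees about z, plus a noisy "loop" (atoms 20-29).
t = np.pi / 6
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t),  np.cos(t), 0.0],
               [0.0, 0.0, 1.0]])
Y = X @ Rz.T
Y[20:] += rng.normal(scale=2.0, size=(10, 3))

w = np.ones(30)
w[20:] = 0.05                               # down-weight the variable loop
Yfit = weighted_rmsd_fit(X, Y, w)
core_rmsd = np.sqrt(((Yfit[:20] - X[:20]) ** 2).sum(axis=1).mean())
print(core_rmsd)
```

With the loop down-weighted, the rotation is recovered from the core and the core RMSD is close to zero; with equal weights the noisy loop would drag the fit off.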
Cheers,
Douglas
Bernie Santarsiero
On Wed, October 11, 2006 10:38 am, Douglas L. Theobald wrote:
On Oct 11, 2006, at 11:08 AM, Jenny wrote:
Hi, All,
I have three proteins that only differ in one big loop (residues
46-59). So I'm trying to superimpose the three proteins and keep
the same part fixed (basically, superimpose by residues 1-45
and 60-120). Is there an easy way to do this?
Hi Jenny,
It's easy to do with THESEUS from the command line:
http://www.theseus3d.org/
For a least squares fit, just use something like:
theseus -l -s1-45:60-120 protein1.pdb protein2.pdb protein3.pdb
or equivalently, exclude the range with:
theseus -l -S46-59 protein1.pdb protein2.pdb protein3.pdb
However, since THESEUS uses maximum likelihood you shouldn't
even have to specify a residue range, just do:
theseus protein1.pdb protein2.pdb protein3.pdb
and it should work well.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Douglas L. Theobald
Department of Biochemistry
Brandeis University
Waltham, MA 02454-9110
[EMAIL PROTECTED]
GPG key ID: 38E9EB53
https://www.molevo.org/keys/38E9EB53.gpgkey