Re: [ccp4bb]: superimpose

Douglas L. Theobald Wed, 11 Oct 2006 12:46:21 -0700




On Oct 11, 2006, at 2:52 PM, Santarsiero, Bernard D. wrote:

It really has nothing to do with least-squares itself.

But it does. When I've used the term 'least-squares', I mean ordinary (unweighted) least-squares (OLS), which has been used historically for superpositions. OLS is fundamentally predicated on two strong statistical assumptions (as stated in the Gauss-Markov theorem): (1) all atoms in the superposition have the same variance and (2) none of the atoms are correlated with each other. Both of these OLS assumptions are false, in general, with macromolecular superpositions.

It's the implementation, and how you want to define the minimal function. You don't want to superimpose elements that are really different.

Sort of. The problem is knowing what is "really different" before doing a superposition. Probably the primary motivation for performing a superposition is to get an idea of what is different and what is similar among multiple structures. In most cases, regions of structures that are different are also similar to some extent. Thus different regions carry at least some structural information that should be incorporated in the superposition. Similar vs different is a matter of degree; it is not an absolute category. The maximum likelihood (ML) method implemented in THESEUS naturally accounts for these issues by weighting structural regions according to their structural similarity.
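
A toy sketch of the weighting idea, for illustration only. It is not THESEUS's algorithm: it works in 2-D, uses a closed-form rotation angle, and simply assumes the per-point variances are known, whereas a real ML method has to estimate them from the data. The names superpose_2d, rmsd, and all coordinates and variances are made up for this example.

```python
import math

def superpose_2d(mobile, target, weights=None):
    """Rigidly fit `mobile` onto `target` by (weighted) least squares in 2-D."""
    w = weights if weights is not None else [1.0] * len(mobile)
    wsum = sum(w)
    # Weighted centroids; aligning them gives the optimal translation.
    mcx = sum(wi * x for wi, (x, _) in zip(w, mobile)) / wsum
    mcy = sum(wi * y for wi, (_, y) in zip(w, mobile)) / wsum
    tcx = sum(wi * x for wi, (x, _) in zip(w, target)) / wsum
    tcy = sum(wi * y for wi, (_, y) in zip(w, target)) / wsum
    # Closed-form optimal rotation angle for the 2-D problem.
    dot = cross = 0.0
    for wi, (mx, my), (tx, ty) in zip(w, mobile, target):
        ax, ay, bx, by = mx - mcx, my - mcy, tx - tcx, ty - tcy
        dot += wi * (ax * bx + ay * by)
        cross += wi * (ax * by - ay * bx)
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate about the mobile centroid, then move onto the target centroid.
    return [(c * (x - mcx) - s * (y - mcy) + tcx,
             s * (x - mcx) + c * (y - mcy) + tcy) for x, y in mobile]

def rmsd(a, b):
    """Root-mean-square distance between paired points."""
    return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2
                         for (ax, ay), (bx, by) in zip(a, b)) / len(a))

# Four "core" points that match exactly, plus one "loop" point that differs.
target = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (2.0, 2.0)]
mobile = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (4.0, -1.0)]
core = slice(0, 4)

# Unweighted (OLS) fit: every point counts equally.
ols_fit = superpose_2d(mobile, target)

# Inverse-variance weights; the variances are assumed, not estimated.
variances = [0.01, 0.01, 0.01, 0.01, 100.0]
wls_fit = superpose_2d(mobile, target, [1.0 / v for v in variances])

print("core RMSD, unweighted:", rmsd(ols_fit[core], target[core]))
print("core RMSD, weighted:  ", rmsd(wls_fit[core], target[core]))
```

In this toy case the single divergent point drags the unweighted fit away from the core, while the weighted fit keeps the core nearly superimposed. The loop point still contributes to the weighted fit, just with a small weight, so nothing has to be declared "really different" in advance.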

And not to argue the point much further, but least-squares IS a maximum likelihood estimator.

The method of least squares can be justified in terms of likelihood, given some additional assumptions, but least squares is not maximum likelihood. Specifically, the OLS solution is identical to the ML solution if you assume a Gaussian distribution for the data, and if all data points have the same variance and are uncorrelated. On the other hand, the statistical justification for OLS is given by the Gauss-Markov theorem, and it guarantees the optimality of the OLS solution (by frequentist measures, not likelihoodist or Bayesian measures) regardless of whether the data have a Gaussian distribution or not.
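
Spelled out for a generic model, the special case looks like this. The notation is assumed here for illustration and does not come from the thread: r_i(\theta) is the residual of data point i given parameters \theta, and \sigma_i^2 is its variance. For independent Gaussian errors:

```latex
\ln L(\theta)
  = -\frac{1}{2}\sum_{i=1}^{n}\ln\!\left(2\pi\sigma_i^{2}\right)
    - \sum_{i=1}^{n}\frac{r_i(\theta)^{2}}{2\sigma_i^{2}}
% Equal variances, \sigma_i = \sigma for all i:
\hat{\theta}_{\mathrm{ML}}
  = \arg\max_{\theta}\,\ln L(\theta)
  = \arg\min_{\theta}\,\sum_{i=1}^{n} r_i(\theta)^{2}
  = \hat{\theta}_{\mathrm{OLS}}
% Unequal (known) variances give weighted least squares instead:
\hat{\theta}_{\mathrm{ML}}
  = \arg\min_{\theta}\,\sum_{i=1}^{n}\frac{r_i(\theta)^{2}}{\sigma_i^{2}}
```

The ML and OLS estimates of \theta coincide only when the variances are equal and the errors are uncorrelated. Once the variances differ, the ML criterion carries the 1/\sigma_i^2 weights, and with correlated errors the sum becomes a quadratic form in the inverse covariance matrix.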

IOW, it is best to realize that LS and ML are very different methods, historically, philosophically, and practically. In general, they will lead to different solutions to the same problem. In certain special cases they can lead to the same solution, as can Bayesian methods, but that doesn't mean that OLS is ML or Bayesian.


Cheers,

Douglas

http://www.theseus3d.org/

bds

On Wed, October 11, 2006 1:34 pm, Douglas L. Theobald wrote:
On Oct 11, 2006, at 1:49 PM, Santarsiero, Bernard D. wrote:
the graphics program "O" is excellent at things like this. Basically you do a rough superposition with lsq-exp (explicit) and then improve it with lsq-imp. It leaves out calpha's that aren't close, but pulls everything else in.
Yes, the way that O does it with lsq is much preferred over many other methods. However, it really is just a kludge to overcome the problems with least-squares, an optimization method that really is inappropriate for the macromolecular superposition problem. The advantage of the maximum likelihood method that THESEUS implements is that it removes the arbitrary and subjective parameters for deciding what "isn't close". Maximum likelihood instead inherently down-weights the variable parts exactly by the (statistically) proper amount.
Additionally, lsq does not do a bona fide simultaneous superposition with more than two structures, whereas THESEUS does.
Cheers,

Douglas
Bernie Santarsiero

On Wed, October 11, 2006 10:38 am, Douglas L. Theobald wrote:
On Oct 11, 2006, at 11:08 AM, Jenny wrote:
Hi, All,
I have three proteins that only differ in one big loop (resi 46-59). So I'm trying to superimpose the three proteins and keep the same part fixed (basically, superimpose by residues 1-45 and residues 60-120). Is there an easy way to do this?
Hi Jenny,

It's easy to do with THESEUS from the command line:

http://www.theseus3d.org/

For a least squares fit, just use something like:

theseus -l -s1-45:60-120 protein1.pdb protein2.pdb protein3.pdb

or equivalently, exclude the range with:

theseus -l -S46-59 protein1.pdb protein2.pdb protein3.pdb
However, since THESEUS uses maximum likelihood, you shouldn't even have to specify a residue range; just do:
theseus protein1.pdb protein2.pdb protein3.pdb

and it should work well.



^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
Douglas L. Theobald
Department of Biochemistry
Brandeis University
Waltham, MA  02454-9110

[EMAIL PROTECTED]
GPG key ID: 38E9EB53
https://www.molevo.org/keys/38E9EB53.gpgkey

             ^\
   /`  /^.  / /\
  / / /`/  / . /`
 / /  '   '
'
