*** For details on how to be removed from this list visit the ***
*** CCP4 home page http://www.ccp4.ac.uk ***
Hi Tassos,
Thanks for the comments; I'm pleasantly surprised at the level of
interest the superposition problem has elicited here.
On Oct 12, 2006, at 4:58 AM, Anastassis Perrakis wrote:
On the specific issue of superposition, I feel it's much more
important to take into account the coordinate errors.
The short answer is that Theseus of course estimates the coordinate
errors, as they are necessary for the maximum likelihood
calculation. Technical answer below ...
In a proper ML formulation, I think this is (part of) your prior,
so if you don't know these (and as far as I could see Theseus does
not use these) your ML function would be fundamentally flawed by
definition, at least in the context of X-ray crystallography as
your experimental method for deriving model coordinates. (Sorry if
indeed you do estimate proper coordinate errors and I missed that,
but I could not see it in an admittedly quick scan of your web site
and paper.)
In a Bayesian and likelihoodist framework, there are two
(nonexclusive) ways to treat the coordinate errors: one is based on
direct estimation from the data, the other is through prior
information. The Schneider paper you cite below estimates the errors
only via the latter method, by assuming that the errors are
proportional to the B-factors. Theseus does both, although currently
using B-factors is an undocumented feature [1]. If you wish to
include the prior information from the B-factors in Theseus, you can
use the '-b2' command line option.
However, contrary to your intuition (and also mine initially), the
prior coordinate error is not nearly as important as the error
component estimated from structural differences. In my experience,
prior B-factor information has a relatively minor effect on the final
ML superposition [2]. As is often the case in likelihood and
Bayesian treatments, the data dominate the prior.
In the case of protein superpositions, regions with high B-factors
are very unlikely to have similar conformations by chance. Thus
estimates of the coordinate error based on only structural
differences naturally assign a high variance to those regions with
high B-factors. On the other hand, if a region has low B-factors yet
large structural differences, the error component from conformational
differences overwhelms the component from the prior. In the end, the
prior B-factors provide mostly redundant information that can be
inferred from structural differences alone. You can verify this
yourself using the '-b2' flag in Theseus.
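The point can be illustrated with a toy numerical sketch (entirely hypothetical data and a simplified moment-based estimator, not Theseus's actual ML machinery): per-atom variances computed directly from an ensemble of superposed structures already single out the mobile region, with no B-factor input at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n_structs, n_atoms = 10, 20

# Hypothetical "true" structure with a flexible loop at atoms 8-12.
mean_coords = rng.normal(size=(n_atoms, 3))
true_sd = np.full(n_atoms, 0.1)
true_sd[8:13] = 1.5

# Simulate an ensemble of already-superposed structures.
ensemble = mean_coords + rng.normal(size=(n_structs, n_atoms, 3)) * true_sd[None, :, None]

# Per-atom variance estimated purely from structural differences:
# mean squared deviation from the ensemble average, per coordinate.
avg = ensemble.mean(axis=0)
est_var = ((ensemble - avg) ** 2).sum(axis=2).mean(axis=0) / 3.0

# The flexible loop is flagged with high variance; no B-factor prior needed.
print(np.round(est_var[:8].mean(), 3), np.round(est_var[8:13].mean(), 3))
```

The estimated variance in the loop dwarfs that of the well-ordered core, which is exactly the information a B-factor prior would have supplied.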
I am surprised no-one brought up ESCET as a very useful tool for
objective superposition, especially in a crystallographic context.
Please read:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=11807243&query_hl=4&itool=pubmed_docsum
Thanks for the ref. There are many similar treatments for finding
the "conformationally invariant" regions of structures so that one
can subsequently perform a reasonable least-squares (LS)
superposition. However, they all depend on arbitrary cutoffs (in
Schneider's case, the penalty term p and the distance parameter
\epsilon_l). Again, this is one of the great advantages of the ML
method vs LS -- there is no need to specify subjective cutoffs, as
the ML method already accounts for structural variability in a
statistically proper manner. I'll also note that the objective ML
solution provides a nice formal justification for the common and
intuitive practice of structural biologists, that conformationally
variable regions should be down-weighted when superposing
macromolecules.
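The contrast can be sketched numerically. The variances below are made-up values and the weighting is a simplified stand-in for the full ML treatment, but it shows the qualitative difference: cutoff methods make a binary in/out decision that changes with the chosen threshold, while inverse-variance weighting degrades smoothly with no tunable parameter.

```python
import numpy as np

# Hypothetical per-atom variances estimated from structural differences;
# atoms 3-5 sit in a conformationally variable region.
var = np.array([0.01, 0.02, 0.015, 0.5, 2.0, 3.0, 0.02, 0.01])

# Cutoff-based LS schemes: each atom is either fully in or fully out,
# and the answer depends on the arbitrary cutoff chosen.
cutoff = 0.1
ls_weights = (var < cutoff).astype(float)

# ML-style weighting: smooth inverse-variance weights, no cutoff parameter.
ml_weights = (1.0 / var) / (1.0 / var).max()

print(ls_weights)
print(np.round(ml_weights, 3))
```

Moving the cutoff from 0.1 to 1.0 flips three atoms from excluded to fully included, whereas the inverse-variance weights change not at all.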
Cheers,
Douglas
http://www.theseus3d.org
[1] The ML solution using crystallographic B-factors (and/or NMR
order parameters) as prior variances is a non-trivial problem, and it
will be given in a forthcoming publication.
[2] In contrast, prior B-factor information usually has a large
effect when used as a basis for weights in a weighted LS
superposition, which may explain our intuition.
On a more general note, one can 'hide' many things behind ML
formalisms. Understanding the source of errors in your experimental
quantities, i.e. your experiment in depth, is also very
important. And, sometimes, it lets you get away with bad
statistics: structures were solved with LS methods, and no wonder ML
was a massive step forward by all means, but that's because it was
implemented for crystallographers by crystallographers with a
clear understanding of the nature of the errors and the exact
experimental methods in use, not only of Bayes's theorem.
Tassos - newly appointed press officer for Thomas R Schneider ;-)
On Oct 11, 2006, at 21:40, Douglas L. Theobald wrote:
On Oct 11, 2006, at 2:52 PM, Santarsiero, Bernard D. wrote:
It really has nothing to do with least-squares itself.
But it does. When I've used the term 'least-squares', I mean
ordinary (unweighted) least-squares (OLS), which has been used
historically for superpositions. OLS is fundamentally predicated
on two strong statistical assumptions (as stated in the Gauss-
Markov theorem): (1) all atoms in the superposition have the same
variance and (2) none of the atoms are correlated with each
other. Both of these OLS assumptions are false, in general, with
macromolecular superpositions.
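A small simulation makes the cost of the violated equal-variance assumption concrete. It uses a one-dimensional location estimate as a stand-in for the superposition problem (purely illustrative, hypothetical numbers): when variances differ across "atoms", inverse-variance weighting beats the equal weights that OLS implicitly uses.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu, n_rep = 5.0, 500

# Heteroscedastic "atoms": half well-ordered, half very mobile.
# This violates the equal-variance assumption behind OLS.
sd = np.array([0.1] * 50 + [10.0] * 50)

ols_err, wls_err = [], []
for _ in range(n_rep):
    y = true_mu + rng.normal(size=sd.size) * sd
    ols_err.append((y.mean() - true_mu) ** 2)               # OLS: equal weights
    w = 1.0 / sd**2                                         # inverse-variance weights
    wls_err.append(((w * y).sum() / w.sum() - true_mu) ** 2)

print(np.mean(ols_err), np.mean(wls_err))
```

The mean squared error of the weighted estimate is orders of magnitude smaller, because the mobile points are allowed to say less about the answer.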
It's the implementation, and how you want to define the function
being minimized. You don't want to superimpose elements that are
really different.
Sort of. The problem is knowing what is "really different" before
doing a superposition. Probably the primary motivation for
performing a superposition is to get an idea of what is different
and what is similar among multiple structures. In most cases,
regions of structures that are different are also similar to some
extent. Thus different regions carry at least some structural
information that should be incorporated in the superposition.
Similar vs different is a matter of degree; it is not an absolute
category. The maximum likelihood (ML) method implemented in
THESEUS naturally accounts for these issues by weighting
structural regions according to their structural similarity.
And not to argue the point much further, but least-squares IS a
maximum likelihood estimator.
The method of least squares can be justified in terms of
likelihood, given some additional assumptions, but least squares
is not maximum likelihood. Specifically, the OLS solution is
identical to the ML solution if you assume a Gaussian distribution
for the data, and if all data points have the same variance and
are uncorrelated. On the other hand, the statistical
justification for OLS is given by the Gauss-Markov theorem, which
guarantees the optimality of the OLS solution (by frequentist
measures, not likelihoodist or Bayesian measures) whether or not
the data have a Gaussian distribution.
IOW, it is best to realize that LS and ML are very different
methods, historically, philosophically, and practically. In
general, they will lead to different solutions to the same
problem. In certain special cases they can lead to the same
solution, as can Bayesian methods, but that doesn't mean that OLS
is ML or Bayesian.
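The distinction is easy to verify numerically. In the sketch below (a one-parameter location problem, purely illustrative), the Gaussian log-likelihood is a monotone transform of the OLS objective, so the two estimates coincide exactly; swap in a Laplace likelihood and ML returns the median instead, a generally different answer from OLS.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=3.0, scale=1.0, size=25)

# OLS estimate of a location parameter: minimize the sum of squared residuals.
mu = np.linspace(0.0, 6.0, 6001)
sse = ((y[:, None] - mu[None, :]) ** 2).sum(axis=0)
ols_est = mu[sse.argmin()]

# ML under an iid Gaussian model: maximize the log-likelihood, which
# (dropping constants) is just -0.5 * SSE, a monotone transform of it.
loglik = -0.5 * sse
ml_gauss = mu[loglik.argmax()]

# ML under a Laplace (double-exponential) model minimizes |residuals|
# instead, giving the median rather than the mean.
sabs = np.abs(y[:, None] - mu[None, :]).sum(axis=0)
ml_laplace = mu[sabs.argmin()]

print(ols_est == ml_gauss, ols_est, ml_laplace)
```

So OLS agrees with ML only under the Gaussian, equal-variance, uncorrelated assumptions; change the error model and the two part ways.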
Cheers,
Douglas
http://www.theseus3d.org/
bds
On Wed, October 11, 2006 1:34 pm, Douglas L. Theobald wrote:
On Oct 11, 2006, at 1:49 PM, Santarsiero, Bernard D. wrote:
the graphics program "O" is excellent at things like this.
Basically you do a rough superposition with lsq-exp (explicit)
and then improve it with lsq-imp. It leaves out calpha's that
aren't close, but pulls everything else in.
Yes, the way that O does it with lsq is much preferred over many
other methods. However, it is really just a kludge to work around
the problems with least-squares, an optimization method that is
fundamentally inappropriate for the macromolecular superposition
problem. The advantage of the maximum likelihood method that
THESEUS implements is that it removes the arbitrary and
subjective parameters for deciding what "isn't close". Maximum
likelihood instead inherently down-weights the variable parts by
exactly the (statistically) proper amount.
Additionally, lsq does not do a bona fide simultaneous
superposition with more than two structures, whereas THESEUS does.
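As a rough illustration of what such down-weighting looks like in practice, here is a weighted Kabsch/SVD fit in numpy. This is a fixed-weight sketch, not the full ML algorithm in THESEUS (which estimates the weights iteratively from the data); the structures, weights, and the helper name `weighted_rmsd_fit` are all hypothetical.

```python
import numpy as np

def weighted_rmsd_fit(X, Y, w):
    """Rotate/translate Y onto X, minimizing the weighted sum of squared
    deviations (weights w), via a weighted Kabsch/SVD superposition."""
    w = w / w.sum()
    xc = (w[:, None] * X).sum(axis=0)       # weighted centroids
    yc = (w[:, None] * Y).sum(axis=0)
    A, B = X - xc, Y - yc
    H = (w[:, None] * B).T @ A              # weighted covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T # guard against reflections
    return (Y - yc) @ R.T + xc

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))

# Y is X rotated 30 degrees about z, plus a noisy "loop" (atoms 20-29).
t = np.pi / 6
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t),  np.cos(t), 0.0],
               [0.0, 0.0, 1.0]])
Y = X @ Rz.T
Y[20:] += rng.normal(scale=2.0, size=(10, 3))

w = np.ones(30)
w[20:] = 0.05                               # down-weight the variable loop
Yfit = weighted_rmsd_fit(X, Y, w)
core_rmsd = np.sqrt(((Yfit[:20] - X[:20]) ** 2).sum(axis=1).mean())
print(core_rmsd)
```

With the loop down-weighted, the rotation is recovered from the core and the core RMSD is close to zero; with equal weights the noisy loop would drag the fit off.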
Cheers,
Douglas
Bernie Santarsiero
On Wed, October 11, 2006 10:38 am, Douglas L. Theobald wrote:
On Oct 11, 2006, at 11:08 AM, Jenny wrote:
Hi, All,
I have three proteins that only differ in one big loop (residues
46-59). So I'm trying to superimpose the three proteins and keep
the same part fixed (basically, superimpose by residues 1-45
and 60-120). Is there an easy way to do this?
Hi Jenny,
It's easy to do with THESEUS from the command line:
http://www.theseus3d.org/
For a least squares fit, just use something like:
theseus -l -s1-45:60-120 protein1.pdb protein2.pdb protein3.pdb
or equivalently, exclude the range with:
theseus -l -S46-59 protein1.pdb protein2.pdb protein3.pdb
However, since THESEUS uses maximum likelihood you shouldn't
even have to specify a residue range, just do:
theseus protein1.pdb protein2.pdb protein3.pdb
and it should work well.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Douglas L. Theobald
Department of Biochemistry
Brandeis University
Waltham, MA 02454-9110
[EMAIL PROTECTED]
GPG key ID: 38E9EB53
https://www.molevo.org/keys/38E9EB53.gpgkey