Re: [ccp4bb] Rfree in similar data set

Tom Terwilliger Thu, 24 Sep 2009 08:58:49 -0700

Hi Ian,

Surely you are correct that "...once all issues of local optima are resolved, by whatever means it takes, you will end up at the same unique global optimum no matter where you started from." However the key here is "by whatever means it takes". I think that in practice there are a vast number of local minima in this problem. You can rebuild a model from the PDB that is highly refined and find many other models that have R-factors that are the same or better, and all can be refined to a stable "minimum". All of course are very similar and differ principally in side-chain conformations and small main chain differences. I think that means it is very difficult to find the global minimum.
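The point about many nearby local minima can be illustrated with a toy problem. The sketch below is only an analogy and is not a crystallographic refinement: the one-dimensional `objective`, the step size, and the starting points are all assumptions chosen for illustration. Plain gradient descent from different starting points settles into different stable minima whose function values differ only slightly.

```python
import math

def objective(x):
    # Toy target (an assumption): a ripple with a minimum near every integer,
    # plus a shallow bowl so that the minimum at x = 0 is the global one.
    return 1.0 - math.cos(2.0 * math.pi * x) + 0.01 * x * x

def gradient(x):
    # Analytical derivative of objective().
    return 2.0 * math.pi * math.sin(2.0 * math.pi * x) + 0.02 * x

def descend(x, step=0.01, n_steps=2000):
    # Plain gradient descent: it only moves downhill, so it stays in the
    # basin it starts in and cannot cross a barrier to a better minimum.
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x

for start in (0.2, 1.3, -2.4):
    x_min = descend(start)
    print(f"start {start:+.1f} -> minimum at {x_min:+.4f}, value {objective(x_min):.4f}")
```

Each run converges to a stable minimum, but only the start near zero reaches the global one. Getting from the others to the global minimum needs something other than downhill refinement, which is the "by whatever means it takes" part.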

In practice, relative to the Rfree set discussion that started this, I think this also means that once an Rfree set is chosen and a model has been refined using that Rfree set, the Rfree set should be kept.
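One way to keep an Rfree set when moving to a new data set from the same crystal form is to carry the flags over by Miller index. The following is a minimal sketch under stated assumptions, not the procedure of any particular program: the function name `transfer_free_flags`, the boolean flag representation, and the 5% default fraction for reflections absent from the old set are all hypothetical, and both data sets are assumed to be indexed consistently.

```python
import random

def transfer_free_flags(old_flags, new_hkls, free_fraction=0.05, seed=0):
    """Carry free-set flags over to a new reflection list by Miller index.

    old_flags: dict mapping (h, k, l) -> True if the reflection is in the free set.
    new_hkls:  iterable of (h, k, l) tuples in the new data set.
    Reflections not present in old_flags (for example, those beyond the old
    resolution limit) are assigned at random with probability free_fraction.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    new_flags = {}
    for hkl in new_hkls:
        if hkl in old_flags:
            new_flags[hkl] = old_flags[hkl]  # keep the original choice
        else:
            new_flags[hkl] = rng.random() < free_fraction
    return new_flags

# Hypothetical example: three reflections in common, one new reflection.
old = {(1, 0, 0): False, (1, 1, 0): True, (2, 0, 1): False}
new = transfer_free_flags(old, [(1, 0, 0), (1, 1, 0), (2, 0, 1), (5, 3, 2)])
print(new)
```

The reflections that were free for the first data set stay free for the second, so the model never gets refined against them.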


All the best,
Tom T

On Sep 24, 2009, at 9:41 AM, Ian Tickle wrote:

-----Original Message-----
From: owner-ccp...@jiscmail.ac.uk [mailto:owner-ccp...@jiscmail.ac.uk] On Behalf Of Eric Bennett
Sent: 24 September 2009 13:31
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Rfree in similar data set

Ian Tickle wrote:
For that to be true it would have to be possible to arrive at a different
unbiased Rfree from another starting point.  But provided your starting
point wasn't a local maximum LL and you haven't gotten into a local
maximum along the way, convergence will be to a unique global maximum of
the LL, so the Rfree must be the same whatever starting point is used
(within the radius of convergence of course).

But if you're using a different set of data the minima and maxima of
the function aren't necessarily going to be in the same place.  Rfree
is supposed to inform about overfitting.  In an overfitting situation
there are multiple possible models which describe the data well and
which overfit solution you end up with could be sensitive to the data
set used.  The provisions that you haven't gotten stuck in a local
maximum and are within radius of convergence don't seem safe
considering historical situations that led to the introduction of
Rfree.  What algorithm is going to converge main chain tracing errors
to the correct maximum?  Thinking about that situation, isn't part of
the goal of Rfree to give you a hint in situations where you have, in
fact, gotten stuck in a local maximum due to a significant error in
the model that places it outside the radius of convergence of the
refinement algorithm?

Hi Eric,

Yes clearly the function optima won't necessarily be in the same place
for different datasets; the question is whether the distance between the
optima is less than the convergence radius.  This will depend largely on
whether the datasets have similar dmin; if they do then the differences
will be largely random measurement errors (I'm assuming that there's
nothing fundamentally wrong with the data).  Then there should be no
problem re-refining against the 2nd dataset, and the Rfree will be
unbiased at the global optimum.  The more common situation perhaps is
that the 2nd dataset is at much higher resolution; in that case it's
quite likely that there are undetected local optima in the model from
the 1st dataset that only become apparent in the maps when the 2nd
dataset is used.  In that case refinement is almost certainly not the
answer (or at least not the whole answer), you're going to have to go
back to the maps and model building.

On the question of overfitting, again any problems of local optima
(possibly indicated by a higher than expected Rfree as you say) have to
be resolved first for each of your candidate parameterizations of the
model, as best as the data will allow.  Then if you find that Rfree at
convergence is higher (or LLfree lower) for one parameterization than
another, you choose the parameterization with the lower Rfree (higher
LLfree) to go forward.  You cannot safely reject a model as being
overfitted if the refinement generating the Rfree didn't converge, so
that the Rfree is unbiased.  I don't see the problem there (except of
course in choosing which parameterizations to try).
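The selection rule described above can be sketched in a few lines. This is an illustration under assumptions, not any program's implementation: the `RefinementResult` record and the `choose_parameterization` helper are hypothetical names, and the Rfree values in the example are invented. The `r_factor` function uses the standard definition R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|); evaluated over the free reflections only, it gives Rfree.

```python
from dataclasses import dataclass

@dataclass
class RefinementResult:
    name: str         # label for the parameterization (hypothetical)
    converged: bool   # did the refinement reach convergence?
    r_free: float     # R-factor over the free reflections at the end

def r_factor(f_obs, f_calc):
    """R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def choose_parameterization(results):
    """Pick the lowest Rfree, comparing only refinements that converged."""
    converged = [r for r in results if r.converged]
    if not converged:
        raise ValueError("no converged refinement; Rfree values are not comparable")
    return min(converged, key=lambda r: r.r_free)

# Invented numbers, for illustration only.
candidates = [
    RefinementResult("A", converged=True, r_free=0.262),
    RefinementResult("B", converged=True, r_free=0.248),
    RefinementResult("C", converged=False, r_free=0.231),
]
print(choose_parameterization(candidates).name)
```

Candidate C is skipped despite its lower Rfree because its refinement did not converge, so its Rfree cannot be used to accept or reject it.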

I think you misunderstood my provisos, I was only doing that to simplify
the argument; if there are local optima then they have to be resolved,
most likely by means other than refinement, but their presence does not
affect the argument about Rfree bias.  My contention is that once all
issues of local optima are resolved, by whatever means it takes, you
will end up at the same unique global optimum no matter where you
started from (unless of course you're very unlucky and there are
multiple global optima with identical likelihoods but I think we can
discount that as unlikely!), and therefore Rfree must be unbiased at
that point.  At intermediate points in this process (i.e. on the paths
connecting optima), Rfree has no meaning or indeed usefulness and
therefore the question whether it's biased or not is also meaningless.

Cheers

-- Ian





Thomas C. Terwilliger
Mail Stop M888
Los Alamos National Laboratory
Los Alamos, NM 87545

Tel:  505-667-0072                 email: terwilli...@lanl.gov
Fax: 505-665-3024                 SOLVE web site: http://solve.lanl.gov
PHENIX web site: http://www.phenix-online.org
ISFI Integrated Center for Structure and Function Innovation web site: http://techcenter.mbi.ucla.edu
TB Structural Genomics Consortium web site: http://www.doe-mbi.ucla.edu/TB
CBSS Center for Bio-Security Science web site: http://www.lanl.gov/cbss
