Hi Ian,
I did not follow up on our recent debate about the respective merits of
various "truncate" procedures, in particular the comparison of the
French & Wilson (1978) and Sivia & David (1994) methods. Since you
have extended the previous simulations, I feel that the discussion
should be continued.
I still cannot understand why you are so focused on the shell
averages (i.e. intensity averages over many reflections in intensity
or resolution shells). This is apparently the criterion that you use
to claim that the F&W method is superior to all others.
The truncate method (whichever prior is used) is a procedure to
estimate amplitudes for individual reflections. This is the goal of
the whole procedure. So we have an estimator E(Im,Sm) for the true
intensity J (or amplitude sqrt(J)) of one reflection, given its
"measured" intensity Im and sigma Sm. According to the usual
statistical definition of bias, this estimator is unbiased if its
expectation value (NOT its average over many different reflections) is
equal to the true value J (or sqrt(J)), for all values of J. We seem
to agree that neither the estimator proposed by F&W, nor the one
proposed by S&D are unbiased, and I gave a simple example in an
earlier mail: for J=0, both estimators will return an estimate E > 0,
whatever the measured data (Im,Sm) are. They are thus biased.
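To make the J=0 example concrete, here is a minimal numerical sketch (my own notation and parameters, not taken from either paper) of the flat-prior case: with a Gaussian likelihood and a prior that is uniform on J >= 0, the posterior for J is a normal truncated at zero, whose mean Im + Sm*phi(Im/Sm)/Phi(Im/Sm) is strictly positive whatever Im is.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    """Standard normal pdf."""
    return np.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf (vectorised via math.erf)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def flat_prior_estimate(Im, Sm):
    """Posterior mean of J given Im ~ N(J, Sm^2) and a flat prior on J >= 0:
    the mean of a normal truncated at zero. Strictly positive for any Im."""
    t = Im / Sm
    return Im + Sm * phi(t) / Phi(t)

# True intensity J = 0 for every reflection: the measurements are pure noise.
Sm = 1.0
Im = rng.normal(0.0, Sm, size=100_000)
E = flat_prior_estimate(Im, Sm)

print("smallest estimate:", E.min())   # > 0 for every single reflection
print("mean estimate    :", E.mean())  # > 0, i.e. the estimator is biased at J = 0
```

The same argument applies to any prior supported on J >= 0, including the acentric Wilson prior of F&W: the posterior mean is always pushed above zero.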
Now, you seem to be highly preoccupied by the intensity averages
over many different reflections in a resolution/intensity shell. Of
course, both the F&W and the S&D methods also return an estimate for
such shell averages, and your simulations indicate that the F&W
estimates for these shell averages are unbiased. But the quantities
that are used in all subsequent computations are individual reflection
intensities/amplitudes. The shell averages are not used in any
important crystallographic computation (except perhaps the Wilson plot).
So what matters really is to get good estimates for individual
reflection intensities/amplitudes. Shell averages are not used in
crystallographic refinements: we refine against individual reflections!
They are not used in phasing: we phase individual reflections. So
what matters is the bias on individual reflections, not on shell
averages. I think that you cannot simply claim that, because the F&W
method returns unbiased shell averages, it is necessarily superior to
other methods.
In that sense, your previous statement that
"the average bias of <J> is the same as the bias of the average <J>"
may be formally correct, but I doubt that it is of any relevance.
Because even if the average bias is zero, this does not mean that the
estimator is unbiased. Otherwise, I would suggest a truncate procedure
where the intensities of all reflections are simply set equal to their
shell averages. Clearly, this would yield unbiased estimates for the
shell averages (and would therefore be a "good" procedure according to
your criteria), but the estimates for individual reflection
intensities would be highly biased.
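This degenerate procedure is easy to simulate; a toy sketch (all parameters are mine, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# One shell: true intensities J drawn from an acentric Wilson (exponential)
# distribution of mean S, measured with Gaussian noise of sigma = 1.
S, sigma, n = 2.0, 1.0, 100_000
J = rng.exponential(S, size=n)
Im = J + rng.normal(0.0, sigma, size=n)

# The degenerate "truncate": assign every reflection its shell average.
est = np.full(n, Im.mean())

# Trivially unbiased for the shell average ...
print("shell average, measured :", Im.mean())
print("shell average, estimated:", est.mean())

# ... but terrible per reflection: its RMS error is the whole spread of the
# true intensities (~S for an exponential), while the raw measurements sit
# only ~sigma away from the truth.
rmse_est = np.sqrt(((est - J) ** 2).mean())
rmse_raw = np.sqrt(((Im - J) ** 2).mean())
print("per-reflection RMSE, shell-average estimator:", rmse_est)
print("per-reflection RMSE, raw measurements       :", rmse_raw)
```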
Also, your simulations are flawed by the fact that you assume that you know
S exactly. However, this is not the case in reality. In the F&W
procedure, S is estimated in shells of resolution from the measured
intensities. This turns S into a random variable and you will never
have S=0 exactly. If you now imagine the case of data collected and
integrated when there is no diffraction at all, you would get some
random number for S in each shell. Half of the time it would be
negative and so you could not even apply the F&W procedure. I have
tested this: TRUNCATE (with the option truncate yes) fails on such data.
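The no-diffraction case can be checked directly; a sketch using my own toy moments-style estimate of S (the shell mean of the measured intensities, not the exact TRUNCATE implementation):

```python
import numpy as np

rng = np.random.default_rng(2)

# No diffraction at all: every measured intensity is pure Gaussian noise.
n_shells, n_per_shell, sigma = 1000, 100, 1.0
Im = rng.normal(0.0, sigma, size=(n_shells, n_per_shell))

# Estimate the Wilson parameter S in each shell from the measured
# intensities (here simply the shell mean). With no signal, the estimate
# fluctuates around zero and comes out negative about half of the time,
# in which case the F&W prior is not even defined.
S_hat = Im.mean(axis=1)
frac_negative = (S_hat < 0).mean()
print("fraction of shells with S_hat < 0:", frac_negative)
```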
More importantly, to decide whether your crystal is diffracting up to
a certain resolution (or whether there is any diffraction at all), no
one would look at the average I or average F. This is not a meaningful
quantity because the I's or F's are not on an absolute scale and
because, in any event, they would have to be gauged against a measure
of their uncertainty. Therefore, what really matters is the average
I/sigI. This is the sensible criterion to decide whether you have a
signal (or crystal) or not. The average I is of little use for this
purpose. Now, if there is no crystal at all, the S&D procedure returns an
average I/sigI (for the corrected I's) of 1, and I believe that most
crystallographers would then conclude that the data are at the noise
level (i.e. there is no measurable diffraction). No one would look at
the average I. So, contrary to what you wrote earlier, the S&D results
are perfectly acceptable, even in the case when there is no
diffraction (or crystal) at all.
Regarding your simulations, I am much more concerned about the columns
which you label by "rmsE". You state that this is the RMS error for
each average. I presume (but I am not sure; please provide a formula)
that this is the sample second moment about the sample mean (i.e. the
sample variance); the sample being all reflections in a given shell?
If so, then the F&W procedure appears to be somewhat problematic for
weak data. It seems that the F&W procedure completely distorts the
second moment about the mean of the intensity distributions. For weak
data it is close to 0, whereas the true second moment about the mean
of the distribution is 1.0. Now, I would find this much more worrisome
than a slightly distorted shell average.
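The second-moment compression can be illustrated directly. With an acentric Wilson (exponential, mean S) prior and a Gaussian likelihood, the F&W posterior is, if I have reconstructed it correctly, a normal of mean Im - Sm^2/S and sd Sm, truncated at zero. A sketch for weak data (S = 0.5, sigma = 1; the parameters are mine):

```python
import math
import numpy as np

rng = np.random.default_rng(3)

def phi(x):
    """Standard normal pdf."""
    return np.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf (vectorised via math.erf)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def fw_estimate(Im, Sm, S):
    """Posterior mean with prior p(J) ~ exp(-J/S) on J >= 0 and likelihood
    Im ~ N(J, Sm^2): the mean of a normal with mean Im - Sm^2/S and sd Sm,
    truncated at zero."""
    mu = Im - Sm * Sm / S
    t = mu / Sm
    return mu + Sm * phi(t) / Phi(t)

# Weak data.
S, Sm, n = 0.5, 1.0, 100_000
J = rng.exponential(S, size=n)
Im = J + rng.normal(0.0, Sm, size=n)
Jb = fw_estimate(Im, Sm, S)

# The shell average survives (law of total expectation: E[Jb] = E[J] = S),
# but the spread of the corrected intensities is strongly compressed
# relative to the spread of either the true or the measured ones.
print("mean:  true J =", J.mean(), "  corrected Jb =", Jb.mean())
print("sd:    true J =", J.std(),  "  corrected Jb =", Jb.std())
print("sd:    measured Im =", Im.std())
```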
So I am not yet convinced that your simulations demonstrate in any way
the superiority of the F&W method over other methods.
Cheers,
Marc
Ian J. Tickle wrote:
From: "Ian Tickle" <[EMAIL PROTECTED]>
To: "Frank von Delft" <[EMAIL PROTECTED]>, "Pavel Afonine" <[EMAIL PROTECTED]>
Cc: <CCP4BB@JISCMAIL.AC.UK>
Date: 03 Oct 2008 16:38:18 (UTC)

I agree completely with Frank, IMO this is not something you should be
doing, particularly as the likelihood function for intensities can
handle negative & zero intensities perfectly well (Randy assures me).
Out of interest I've updated my simulation where I calculate the average
intensity after correction by the various methods in use, to include the
case where you simply drop I <= 0. I didn't include this case before
because I didn't think anyone would be using it!

Here:
S is the Wilson distribution parameter (true intensity), assuming an
acentric distribution;
Iav is the average uncorrected (raw) intensity;
I'av is the average intensity after thresholding at zero (i.e. I' =
max(I,0) );
I"av is the new case, the average ignoring I <= 0;
<Ja>av is the average of the Bayesian estimate assuming a uniform
distribution of the true intensity as the prior (Sivia & David);
<Jb>av is the average of the Bayesian estimate assuming an acentric
Wilson distribution of the true intensity as the prior (French & Wilson
a la TRUNCATE);
rmsE are the respective RMS errors (RMS difference between the
respective 'corrected' intensity and the true intensity).

sigma(I) = 1 throughout.

S     Iav   rmsE   I'av  rmsE   I"av  rmsE   <Ja>av rmsE   <Jb>av rmsE
0.0   0.00  1.00   0.40  0.71   0.80  1.00   0.90   1.00   0.00   0.00
0.5   0.50  1.01   0.74  0.78   1.06  0.92   1.17   0.89   0.51   0.43
1.0   1.00  1.00   1.16  0.84   1.40  0.90   1.50   0.86   1.00   0.64
1.5   1.50  0.99   1.62  0.87   1.80  0.91   1.90   0.86   1.49   0.74
2.0   2.00  1.00   2.10  0.90   2.25  0.93   2.34   0.88   2.00   0.80
2.5   2.50  1.00   2.58  0.91   2.71  0.94   2.79   0.89   2.50   0.83
3.0   3.00  1.00   3.07  0.93   3.19  0.95   3.26   0.91   3.00   0.86
3.5   3.50  1.00   3.56  0.93   3.66  0.95   3.72   0.91   3.50   0.88
4.0   4.00  0.99   4.05  0.94   4.13  0.95   4.20   0.91   4.00   0.89
4.5   4.50  0.99   4.55  0.94   4.62  0.95   4.68   0.92   4.50   0.90
5.0   5.00  1.00   5.04  0.95   5.08  0.96   5.17   0.93   5.00   0.91

It can be seen that the new case (I"av) is worse than using all the
uncorrected intensities (Iav) in terms of average bias (difference
between average corrected I and true average S), and only marginally
better in terms of rmsE, and is significantly worse than including the
negative intensities as zero (I'av) on both counts, particularly for low
average intensity (<= 1 sigma). The Bayesian-corrected intensities are
not needed in practice for refinement (but may be better for other
purposes such as twinning tests) because the likelihood function can
handle the uncorrected negative & zero intensities.

Cheers

-- Ian
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frank von Delft
Sent: 03 October 2008 10:41
To: Pavel Afonine
Cc: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Reading the old literature / truncate /
refinement programs

I mentioned previously phenix.refine tosses your weak data if IMEAN,
SIGIMEAN are chosen during refinement.

phenix.refine does not automatically remove the data based on sigma
(it does it by user's request only). phenix.refine removes only
negative or zero values for Iobs (Fobs).

That is in fact the same as removing based on sigma: standard practice
has been for some time that no data is removed, ever.

At the refinement stage, that is. Of course, we do remove data at
merging, for various reasons which probably also need investigating
(e.g. "high res cutoff" = truncation; cf. Free Lunch).

phx.
--
Marc SCHILTZ http://lcr.epfl.ch