Hi Ian,

I did not follow up on our recent debate about the respective merits of the various "truncate" procedures, in particular the comparison of the French & Wilson (1978) and Sivia & David (1994) methods. Since you have now extended your previous simulations, I feel that the discussion should be continued.

I still cannot understand why you are so focused on shell averages (i.e. intensity averages over many reflections in intensity or resolution shells). This is apparently the criterion you use to claim that the F&W method is superior to all others.

The truncate method (whichever prior is used) is a procedure for estimating amplitudes for individual reflections; this is the goal of the whole procedure. So we have an estimator E(Im,Sm) for the true intensity J (or amplitude sqrt(J)) of one reflection, given its "measured" intensity Im and its sigma Sm. According to the usual statistical definition of bias, this estimator is unbiased if its expectation value (NOT its average over many different reflections) is equal to the true value J (or sqrt(J)), for all values of J. We seem to agree that neither the estimator proposed by F&W nor the one proposed by S&D is unbiased, and I gave a simple example in an earlier mail: for J=0, both estimators will return an estimate E > 0, whatever the measured data (Im,Sm) are. They are thus biased.
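To make the J=0 example concrete, here is a minimal numerical sketch. The sample size and seed are arbitrary, and I take the S&D estimate to be the posterior mean of the Gaussian truncated at zero (i.e. a uniform prior on J >= 0), with Sm = 1 as in your simulations:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
Sm = 1.0                               # sigma of the measurement, as in your table
Im = rng.normal(0.0, Sm, 100_000)      # true J = 0: measurements are pure noise

# Uniform prior on J >= 0 gives a posterior N(Im, Sm^2) truncated at zero,
# whose mean is Im + Sm * phi(Im/Sm) / Phi(Im/Sm).
t = Im / Sm
Ja = Im + Sm * norm.pdf(t) / norm.cdf(t)

print(Ja.min() > 0)   # True: every single estimate is positive, although J = 0
print(Ja.mean())      # ~0.90 * Sm: the expectation of the estimator is not 0
```

The mean of ~0.90 is consistent with the <Ja>av entry for S = 0.0 in your table below.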

Now, you seem to be highly preoccupied by the intensity averages over many different reflections in a resolution/intensity shell. Of course, both the F&W and the S&D methods also return an estimate for such shell averages, and your simulations indicate that the F&W estimates of these shell averages are unbiased. But the quantities used in all subsequent computations are individual reflection intensities/amplitudes; the shell averages enter no important crystallographic computation (apart, maybe, from the Wilson plot). They are not used in refinement: we refine against individual reflections! They are not used in phasing: we phase individual reflections. So what really matters is the bias on individual reflections, not on shell averages, and I think you cannot simply claim that, because the F&W method returns unbiased shell averages, it is necessarily superior to other methods.

In that sense, your previous statement that

    the average bias of <J> is the same as the bias of the average <J>

may be formally correct, but I doubt that it is of any relevance: even if the average bias is zero, this does not mean that the estimator is unbiased. Otherwise, I would suggest a truncate procedure where the intensities of all reflections are simply set equal to their shell average. Clearly, this would yield unbiased estimates for the shell averages (and would therefore be a "good" procedure according to your criterion), but the estimates for the individual reflection intensities would be highly biased.
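A quick sketch of this deliberately absurd procedure makes the point; the acentric Wilson statistics with S = 2 and sigma = 1 are arbitrary assumptions, chosen to resemble your setup:

```python
import numpy as np

rng = np.random.default_rng(1)
S, sig, n = 2.0, 1.0, 100_000
J = rng.exponential(S, n)              # acentric true intensities, <J> = S
Im = J + rng.normal(0.0, sig, n)       # measured intensities

est = np.full(n, Im.mean())            # degenerate estimator: every reflection
                                       # is assigned the shell average
print(est.mean() - J.mean())           # ~0: the shell average is unbiased
print(np.sqrt(np.mean((Im - J)**2)))   # ~1.0: rms error of the raw intensities
print(np.sqrt(np.mean((est - J)**2)))  # ~2.0: per-reflection error is far worse
```

The shell average is reproduced essentially exactly, yet each individual estimate is worse than simply keeping the raw measurement.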

Also, your simulations are flawed by the fact that you assume that S is known exactly. This is not the case in reality: in the F&W procedure, S is estimated in resolution shells from the measured intensities themselves. This turns S into a random variable, and you will never have S=0 exactly. If you now imagine data collected and integrated when there is no diffraction at all, you would get some random number for S in each shell. Half of the time it would be negative, and then you could not even apply the F&W procedure. I have tested this: TRUNCATE (with the option truncate yes) fails on such data.
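This is easy to see with a sketch. I assume, for illustration only, shells of 100 reflections and a naive per-shell mean as the estimate of S; any plain per-shell average behaves the same way:

```python
import numpy as np

rng = np.random.default_rng(2)
# No diffraction at all: every "intensity" is pure Gaussian noise, sigma = 1.
n_shells, n_per_shell = 10_000, 100
Im = rng.normal(0.0, 1.0, (n_shells, n_per_shell))

S_hat = Im.mean(axis=1)    # per-shell estimate of the mean intensity S
print((S_hat < 0).mean())  # ~0.5: the estimate is negative in half the shells
```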

More importantly, to decide whether your crystal diffracts up to a certain resolution (or whether there is any diffraction at all), no one would look at the average I or the average F. These are not meaningful quantities, because the I's and F's are not on an absolute scale and because, in any event, they would have to be gauged against a measure of their uncertainty. What really matters is the average I/sigI: this is the sensible criterion for deciding whether you have a signal (or a crystal) or not; the average I is of little use for this purpose. Now, if there is no crystal at all, the S&D procedure returns corrected I's with an average I/sigI of about 1, and I believe that most crystallographers would then conclude that the data are at the noise level (i.e. there is no measurable diffraction). No one would look at the average I. So, contrary to what you wrote earlier, the S&D results are perfectly acceptable, even when there is no diffraction (or no crystal) at all.
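A small sketch illustrates the scale argument; the weak signal level S = 0.5 and the overall scale factors are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
# Weak acentric signal, sigma(I) = 1, observed on some arbitrary overall scale k.
J = rng.exponential(0.5, 100_000)
I = J + rng.normal(0.0, 1.0, J.size)
sigI = np.ones_like(I)

for k in (1.0, 100.0):
    print((k * I).mean(), (k * I / (k * sigI)).mean())
# <I> changes with the arbitrary scale k; <I/sigI> stays at ~0.5 regardless.
```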

Regarding your simulations, I am much more concerned about the columns which you label "rmsE". You state that this is the RMS error for each average. I presume (but I am not sure; please provide a formula) that this is the sample second moment about the sample mean (i.e. the sample variance), the sample being all reflections in a given shell? If so, then the F&W procedure appears to be problematic for weak data: it completely distorts the second moment about the mean of the intensity distribution. For weak data this moment is close to 0, whereas the true second moment about the mean of the distribution is 1.0. I would find this much more worrisome than a slightly distorted shell average.
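Whatever the exact definition of rmsE, the collapse of the second moment is easy to demonstrate. Here is a minimal sketch of an acentric F&W-style posterior mean, assuming the Wilson parameter is known exactly and taking a deliberately weak shell (S = 0.1, sigma = 1, both values assumed for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
S, sig = 0.1, 1.0                       # weak shell: Wilson parameter << sigma(I)
J = rng.exponential(S, 100_000)         # acentric true intensities
Im = J + rng.normal(0.0, sig, J.size)   # measured intensities

# The acentric Wilson prior exp(-J/S) times the Gaussian likelihood gives a
# posterior N(Im - sig**2/S, sig**2) truncated at J = 0; its mean is the
# F&W-style corrected intensity.
mu = Im - sig**2 / S
t = mu / sig
Jb = mu + sig * norm.pdf(t) / norm.cdf(t)

print(Im.var())   # ~1.0: second moment about the mean of the measured I's
print(Jb.var())   # close to 0: the corrected I's collapse toward the prior mean
```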

So I am not yet convinced that your simulations demonstrate in any way the superiority of the F&W method over other methods.

Cheers,

Marc

Ian J. Tickle wrote:

From: "Ian Tickle" <[EMAIL PROTECTED]>
To: "Frank von Delft" <[EMAIL PROTECTED]>, "Pavel Afonine" <[EMAIL PROTECTED]>
Cc: <CCP4BB@JISCMAIL.AC.UK>
Date: 03 Oct 2008 16:38:18 (UTC)

I agree completely with Frank, IMO this is not something you should be doing, particularly as the likelihood function for intensities can handle negative & zero intensities perfectly well (Randy assures me). Out of interest I've updated my simulation where I calculate the average intensity after correction by the various methods in use, to include the case where you simply drop I <= 0. I didn't include this case before because I didn't think anyone would be using it!

Here:

S is the Wilson distribution parameter (true intensity), assuming an acentric distribution;
Iav is the average uncorrected (raw) intensity;
I'av is the average intensity after thresholding at zero (i.e. I' = max(I,0));
I"av is the new case, the average ignoring I <= 0;
<Ja>av is the average of the Bayesian estimate assuming a uniform distribution of the true intensity as the prior (Sivia & David);
<Jb>av is the average of the Bayesian estimate assuming an acentric Wilson distribution of the true intensity as the prior (French & Wilson a la TRUNCATE);
rmsE are the respective RMS errors (RMS difference between the respective 'corrected' intensity and the true intensity).

sigma(I) = 1 throughout.

 S     Iav   rmsE    I'av  rmsE    I"av  rmsE   <Ja>av rmsE   <Jb>av rmsE
0.0    0.00  1.00    0.40  0.71    0.80  1.00    0.90  1.00    0.00  0.00
0.5    0.50  1.01    0.74  0.78    1.06  0.92    1.17  0.89    0.51  0.43
1.0    1.00  1.00    1.16  0.84    1.40  0.90    1.50  0.86    1.00  0.64
1.5    1.50  0.99    1.62  0.87    1.80  0.91    1.90  0.86    1.49  0.74
2.0    2.00  1.00    2.10  0.90    2.25  0.93    2.34  0.88    2.00  0.80
2.5    2.50  1.00    2.58  0.91    2.71  0.94    2.79  0.89    2.50  0.83
3.0    3.00  1.00    3.07  0.93    3.19  0.95    3.26  0.91    3.00  0.86
3.5    3.50  1.00    3.56  0.93    3.66  0.95    3.72  0.91    3.50  0.88
4.0    4.00  0.99    4.05  0.94    4.13  0.95    4.20  0.91    4.00  0.89
4.5    4.50  0.99    4.55  0.94    4.62  0.95    4.68  0.92    4.50  0.90
5.0    5.00  1.00    5.04  0.95    5.08  0.96    5.17  0.93    5.00  0.91

It can be seen that the new case (I"av) is worse than using all the uncorrected intensities (Iav) in terms of average bias (difference between average corrected I and true average S), and only marginally better in terms of rmsE, and is significantly worse than including the negative intensities as zero (I'av) on both counts, particularly for low average intensity (<= 1 sigma). The Bayesian-corrected intensities are not needed in practice for refinement (but may be better for other purposes such as twinning tests) because the likelihood function can handle the uncorrected negative & zero intensities.

Cheers

-- Ian

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Frank von Delft
Sent: 03 October 2008 10:41
To: Pavel Afonine
Cc: [EMAIL PROTECTED]
Subject: Re: [ccp4bb] Reading the old literature / truncate / refinement programs

I mentioned previously phenix.refine tosses your weak data if IMEAN, SIGIMEAN are chosen during refinement.

phenix.refine does not automatically remove the data based on sigma (it does it by user's request only). phenix.refine removes only negative or zero values for Iobs (Fobs).

That is in fact the same as removing based on sigma: standard practice has been for some time that no data is removed, ever.

At the refinement stage, that is. Of course, we do remove data at merging, for various reasons which probably also need investigating (e.g. "high res cutoff" = truncation; cf. Free Lunch).

phx.


-- Marc SCHILTZ      http://lcr.epfl.ch

