Hi Ian,
I did not follow up on our recent debate about the respective merits of
various "truncate" procedures, in particular the comparison of the
French & Wilson (1978) and Sivia & David (1994) methods. Since you
have extended the previous simulations, I feel that the discussion
should be continued.
I still cannot understand why you are so focused on the shell
averages (i.e. intensity averages over many reflections in intensity
or resolution shells). This is apparently the criterion that you use
to claim that the F&W method is superior to all others.
The truncate method (whichever prior is used) is a procedure to
estimate amplitudes for individual reflections. This is the goal of
the whole procedure. So we have an estimator E(Im,Sm) for the true
intensity J (or amplitude sqrt(J)) of one reflection, given its
"measured" intensity Im and sigma Sm. According to the usual
statistical definition of bias, this estimator is unbiased if its
expectation value (NOT its average over many different reflections) is
equal to the true value J (or sqrt(J)), for all values of J. We seem
to agree that neither the estimator proposed by F&W, nor the one
proposed by S&D are unbiased, and I gave a simple example in an
earlier mail: for J=0, both estimators will return an estimate E > 0,
whatever the measured data (Im,Sm) are. They are thus biased.
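To make the J=0 example concrete, here is a minimal numerical sketch (my own notation and parameters, not taken from either paper) of the flat-prior case: with a Gaussian likelihood and a prior that is uniform on J >= 0, the posterior for J is a normal truncated at zero, whose mean Im + Sm*phi(Im/Sm)/Phi(Im/Sm) is strictly positive whatever Im is.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    """Standard normal pdf."""
    return np.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf (vectorised via math.erf)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def flat_prior_estimate(Im, Sm):
    """Posterior mean of J given Im ~ N(J, Sm^2) and a flat prior on J >= 0:
    the mean of a normal truncated at zero. Strictly positive for any Im."""
    t = Im / Sm
    return Im + Sm * phi(t) / Phi(t)

# True intensity J = 0 for every reflection: the measurements are pure noise.
Sm = 1.0
Im = rng.normal(0.0, Sm, size=100_000)
E = flat_prior_estimate(Im, Sm)

print("smallest estimate:", E.min())   # > 0 for every single reflection
print("mean estimate    :", E.mean())  # > 0, i.e. the estimator is biased at J = 0
```

The same argument applies to any prior supported on J >= 0, including the acentric Wilson prior of F&W: the posterior mean is always pushed above zero.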
Now, you seem to be highly preoccupied by the intensity averages
over many different reflections in a resolution/intensity shell. Of
course, both the F&W and the S&D methods also return an estimate for
such shell averages, and your simulations indicate that the F&W
estimates for these shell averages are unbiased. But the quantities
that are used in all subsequent computations are individual reflection
intensities/amplitudes. The shell averages are not used in any
important crystallographic computation (except perhaps the Wilson plot).
So what matters really is to get good estimates for individual
reflection intensities/amplitudes. Shell averages are not used in
crystallographic refinements: we refine against individual reflections!
They are not used in phasing: we phase individual reflections. So
what matters is the bias on individual reflections, not on shell
averages. I think that you cannot simply claim that, because the F&W
method returns unbiased shell averages, it is necessarily superior to
other methods.
In that sense, your previous statement that
"the average bias of <J> is the same as the bias of the average <J>"
may be formally correct, but I doubt that it is of any relevance.
Because even if the average bias is zero, this does not mean that the
estimator is unbiased. Otherwise, I would suggest a truncate procedure
where the intensities of all reflections are simply set equal to their
shell averages. Clearly, this would yield unbiased estimates for the
shell averages (and would therefore be a "good" procedure according to
your criteria), but the estimates for individual reflection
intensities would be highly biased.
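This degenerate procedure is easy to simulate; a toy sketch (all parameters are mine, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# One shell: true intensities J drawn from an acentric Wilson (exponential)
# distribution of mean S, measured with Gaussian noise of sigma = 1.
S, sigma, n = 2.0, 1.0, 100_000
J = rng.exponential(S, size=n)
Im = J + rng.normal(0.0, sigma, size=n)

# The degenerate "truncate": assign every reflection its shell average.
est = np.full(n, Im.mean())

# Trivially unbiased for the shell average ...
print("shell average, measured :", Im.mean())
print("shell average, estimated:", est.mean())

# ... but terrible per reflection: its RMS error is the whole spread of the
# true intensities (~S for an exponential), while the raw measurements sit
# only ~sigma away from the truth.
rmse_est = np.sqrt(((est - J) ** 2).mean())
rmse_raw = np.sqrt(((Im - J) ** 2).mean())
print("per-reflection RMSE, shell-average estimator:", rmse_est)
print("per-reflection RMSE, raw measurements       :", rmse_raw)
```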
Also, your simulations are flawed by the fact that you assume that you know
S exactly. However, this is not the case in reality. In the F&W
procedure, S is estimated in shells of resolution from the measured
intensities. This turns S into a random variable and you will never
have S=0 exactly. If you now imagine the case of data collected and
integrated when there is no diffraction at all, you would get some
random number for S in each shell. Half of the time it would be
negative and so you could not even apply the F&W procedure. I have
tested this: TRUNCATE (with the option truncate yes) fails on such data.
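The no-diffraction case can be checked directly; a sketch using my own toy moments-style estimate of S (the shell mean of the measured intensities, not the exact TRUNCATE implementation):

```python
import numpy as np

rng = np.random.default_rng(2)

# No diffraction at all: every measured intensity is pure Gaussian noise.
n_shells, n_per_shell, sigma = 1000, 100, 1.0
Im = rng.normal(0.0, sigma, size=(n_shells, n_per_shell))

# Estimate the Wilson parameter S in each shell from the measured
# intensities (here simply the shell mean). With no signal, the estimate
# fluctuates around zero and comes out negative about half of the time,
# in which case the F&W prior is not even defined.
S_hat = Im.mean(axis=1)
frac_negative = (S_hat < 0).mean()
print("fraction of shells with S_hat < 0:", frac_negative)
```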
More importantly, to decide whether your crystal is diffracting up to
a certain resolution (or whether there is any diffraction at all), no
one would look at the average I or average F. This is not a meaningful
quantity because the I's or F's are not on an absolute scale and
because, in any event, they would have to be gauged against a measure
of their uncertainty. Therefore, what really matters is the average
I/sigI. This is the sensible criterion to decide whether you have a
signal (or crystal) or not. The average I is of little use for this
purpose. Now, if there is no crystal at all, the S&D procedure returns an
average I/sigI (for the corrected I's) of 1, and I believe that most
crystallographers would then conclude that the data are at the noise
level (i.e. there is no measurable diffraction). No one would look at
the average I. So, contrary to what you wrote earlier, the S&D results
are perfectly acceptable, even in the case when there is no
diffraction (or crystal) at all.
Regarding your simulations, I am much more concerned about the columns
which you label by "rmsE". You state that this is the RMS error for
each average. I presume (but I am not sure; please provide a formula)
that this is the sample second moment about the sample mean (i.e. the
sample variance); the sample being all reflections in a given shell?
If so, then the F&W procedure appears to be somewhat problematic for
weak data. It seems that the F&W procedure completely distorts the
second moment about the mean of the intensity distributions. For weak
data it is close to 0, whereas the true second moment about the mean
of the distribution is 1.0. Now, I would find this much more worrisome
than a slightly distorted shell average.
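The second-moment compression can be illustrated directly. With an acentric Wilson (exponential, mean S) prior and a Gaussian likelihood, the F&W posterior is, if I have reconstructed it correctly, a normal of mean Im - Sm^2/S and sd Sm, truncated at zero. A sketch for weak data (S = 0.5, sigma = 1; the parameters are mine):

```python
import math
import numpy as np

rng = np.random.default_rng(3)

def phi(x):
    """Standard normal pdf."""
    return np.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf (vectorised via math.erf)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def fw_estimate(Im, Sm, S):
    """Posterior mean with prior p(J) ~ exp(-J/S) on J >= 0 and likelihood
    Im ~ N(J, Sm^2): the mean of a normal with mean Im - Sm^2/S and sd Sm,
    truncated at zero."""
    mu = Im - Sm * Sm / S
    t = mu / Sm
    return mu + Sm * phi(t) / Phi(t)

# Weak data.
S, Sm, n = 0.5, 1.0, 100_000
J = rng.exponential(S, size=n)
Im = J + rng.normal(0.0, Sm, size=n)
Jb = fw_estimate(Im, Sm, S)

# The shell average survives (law of total expectation: E[Jb] = E[J] = S),
# but the spread of the corrected intensities is strongly compressed
# relative to the spread of either the true or the measured ones.
print("mean:  true J =", J.mean(), "  corrected Jb =", Jb.mean())
print("sd:    true J =", J.std(),  "  corrected Jb =", Jb.std())
print("sd:    measured Im =", Im.std())
```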
So I am not yet convinced that your simulations demonstrate in any way
the superiority of the F&W method over other methods.
Cheers,
Marc
Ian J. Tickle wrote:
From: "Ian Tickle" <[EMAIL PROTECTED]>
To: "Frank von Delft" <[EMAIL PROTECTED]>, "Pavel Afonine" <[EMAIL PROTECTED]>
Cc: <CCP4BB@JISCMAIL.AC.UK>
Date: 03 Oct 2008 16:38:18 (UTC)

I agree completely with Frank, IMO this is not something you should be
doing, particularly as the likelihood function for intensities can
handle negative & zero intensities perfectly well (Randy assures me).
Out of interest I've updated my simulation where I calculate the average
intensity after correction by the various methods in use, to include the
case where you simply drop I <= 0. I didn't include this case before
because I didn't think anyone would be using it!

Here:
S is the Wilson distribution parameter (true intensity), assuming an
acentric distribution;
Iav is the average uncorrected (raw) intensity;
I'av is the average intensity after thresholding at zero (i.e. I' =
max(I,0) );
I"av is the new case, the average ignoring I <= 0;
<Ja>av is the average of the Bayesian estimate assuming a uniform
distribution of the true intensity as the prior (Sivia & David);
<Jb>av is the average of the Bayesian estimate assuming an acentric
Wilson distribution of the true intensity as the prior (French & Wilson
a la TRUNCATE);
rmsE are the respective RMS errors (RMS difference between the
respective 'corrected' intensity and the true intensity).

sigma(I) = 1 throughout.

S     Iav   rmsE   I'av  rmsE   I"av  rmsE   <Ja>av rmsE   <Jb>av rmsE
0.0   0.00  1.00   0.40  0.71   0.80  1.00   0.90   1.00   0.00   0.00
0.5   0.50  1.01   0.74  0.78   1.06  0.92   1.17   0.89   0.51   0.43
1.0   1.00  1.00   1.16  0.84   1.40  0.90   1.50   0.86   1.00   0.64
1.5   1.50  0.99   1.62  0.87   1.80  0.91   1.90   0.86   1.49   0.74
2.0   2.00  1.00   2.10  0.90   2.25  0.93   2.34   0.88   2.00   0.80
2.5   2.50  1.00   2.58  0.91   2.71  0.94   2.79   0.89   2.50   0.83
3.0   3.00  1.00   3.07  0.93   3.19  0.95   3.26   0.91   3.00   0.86
3.5   3.50  1.00   3.56  0.93   3.66  0.95   3.72   0.91   3.50   0.88
4.0   4.00  0.99   4.05  0.94   4.13  0.95   4.20   0.91   4.00   0.89
4.5   4.50  0.99   4.55  0.94   4.62  0.95   4.68   0.92   4.50   0.90
5.0   5.00  1.00   5.04  0.95   5.08  0.96   5.17   0.93   5.00   0.91

It can be seen that the new case (I"av) is worse than using all the
uncorrected intensities (Iav) in terms of average bias (difference
between average corrected I and true average S), and only marginally
better in terms of rmsE, and is significantly worse than including the
negative intensities as zero (I'av) on both counts, particularly for low
average intensity (<= 1 sigma). The Bayesian-corrected intensities are
not needed in practice for refinement (but may be better for other
purposes such as twinning tests) because the likelihood function can
handle the uncorrected negative & zero intensities.

Cheers

-- Ian
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frank von Delft
Sent: 03 October 2008 10:41
To: Pavel Afonine
Cc: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Reading the old literature / truncate /
refinement programs

I mentioned previously phenix.refine tosses your weak data if IMEAN,
SIGIMEAN are chosen during refinement.

phenix.refine does not automatically remove the data based on sigma
(it does it by user's request only). phenix.refine removes only
negative or zero values for Iobs (Fobs).

That is in fact the same as removing based on sigma: standard practice
has been for some time that no data is removed, ever.

At the refinement stage, that is. Of course, we do remove data at
merging, for various reasons which probably also need investigating
(e.g. "high res cutoff" = truncation; cf. Free Lunch).

phx.
--
Marc SCHILTZ http://lcr.epfl.ch