Re: [ccp4bb] truncate ignorance

Bart Hazes Mon, 08 Sep 2008 15:54:36 -0700

How a seemingly innocent question can explode ...

I actually thought I understood this, but little of what has been discussed matches my "mental picture" of the truncate process.

Truncate can do multiple things, but the truncate part, I believe, really just deals with converting I to F and the inherent problems due to experimental error and mathematical problems in deriving SigF from SigI when I is near zero. This only depends on how close I is to zero (relative to SigI), and not on the Wilson distribution itself.
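To make the SigF problem concrete, here is a minimal Python sketch of the naive conversion, F = sqrt(I) with first-order error propagation sigF = SigI/(2F). The function name is hypothetical and this is not what truncate does; it only shows why the naive route breaks down as I approaches zero.

```python
import math


def naive_f_from_i(i_obs: float, sig_i: float) -> tuple[float, float]:
    """Naive I -> F conversion (hypothetical helper, not truncate's method).

    F = sqrt(I) and, by first-order error propagation, sigF = sigI / (2 F).
    """
    if i_obs <= 0.0:
        # sigF diverges at I = 0 and no real amplitude exists for I < 0.
        raise ValueError(f"naive conversion undefined for I = {i_obs}")
    f = math.sqrt(i_obs)
    return f, sig_i / (2.0 * f)


for i_obs in (100.0, 10.0, 1.0, 0.01):
    f, sig_f = naive_f_from_i(i_obs, sig_i=10.0)
    print(f"I = {i_obs:7.2f}  F = {f:6.3f}  sigF = {sig_f:7.3f}")
```

With SigI = 10, sigF grows from 0.5 at I = 100 to 50 at I = 0.01, and a negative I has no real amplitude at all.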


My mental picture is as follows:

Visualize a gaussian distribution representing I and its standard deviation, with I being close to zero (either positive or negative). Part of the gaussian will stretch into negative-I territory, which is fine for the experimental I (because of experimental error) but not for the true I. Given this prior knowledge you can re-estimate I by TRUNCATEing the negative tail of the gaussian and integrating just the positive part to find the new mean and standard deviation. As a result any reflection will become positive (including those starting out with negative I). The extent to which the method affects the intensity depends on how much of a negative tail it has, so nearly no effect on I/SigI>=2 reflections and not really that much on even I/SigI=2 reflections.
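The truncated-gaussian picture above has a closed form. A minimal Python sketch, assuming a Gaussian measurement error and a flat prior on I >= 0 (function names are illustrative, not taken from truncate):

```python
import math


def _norm_pdf(x: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution (erfc form keeps precision for x < 0)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def truncated_gaussian_estimate(i_obs: float, sig_i: float) -> tuple[float, float]:
    """Mean and s.d. of N(i_obs, sig_i**2) restricted to I >= 0 and renormalised."""
    x = i_obs / sig_i
    z = _norm_cdf(x)                  # probability mass left on the positive side
    if z == 0.0:
        raise ValueError("I/SigI too negative for this double-precision sketch")
    lam = _norm_pdf(x) / z            # inverse Mills ratio
    mean = i_obs + sig_i * lam
    var = sig_i ** 2 * (1.0 - x * lam - lam ** 2)
    return mean, math.sqrt(var)


for ratio in (-2.0, 0.0, 1.0, 2.0, 3.0):
    mean, sd = truncated_gaussian_estimate(ratio, 1.0)
    print(f"I/SigI = {ratio:4.1f} -> I' = {mean:.3f}, SigI' = {sd:.3f}")
```

For I/SigI = 0 this gives I' = sqrt(2/pi) SigI (about 0.80 SigI); the upward shift is roughly 0.29 SigI at I/SigI = 1, about 0.055 SigI at I/SigI = 2 and under 0.005 SigI at I/SigI = 3.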

I actually think this is a very elegant solution. The only thing better is to use I directly and avoid the entire issue. I personally think you want to use the experimental I without correcting it as explained above, since correcting it will introduce bias and the refinement procedure should take proper care of random experimental error, unless you mess around with it. However, when you need amplitudes, truncate is the way to go.


Bart

Ian Tickle wrote:

But there's a fundamental difference in approach: the authors here
assume the apparently simpler prior distribution P(I) = 0 for I < 0 &
P(I) = const for I >= 0.  As users of Bayesian priors well know this is
an improper prior since it integrates to infinity instead of unity.
This means that, unlike the case I described for the French & Wilson
formula based on the Wilson distribution which gives unbiased estimates
of the true I's and their average, the effect on the corrected
intensities of using this prior really will be to increase all
intensities (since the mean I for this prior PDF is also infinite!),
hence the intensities and their average must be biased (& I'm sure the
same goes for the corresponding F's).  But as you say in practice the
errors introduced may well not be significant compared with those
introduced by (for example) deconvoluting the overlapping peaks in the
powder pattern.  Also I'm not sure the F vs I argument can be carried
over from the powder to the single crystal case because the kinds of
errors encountered in each case are quite different.
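The contrast between the two priors can be checked numerically. The sketch below assumes acentric reflections, Gaussian errors and a known shell mean S = <I>. With the Wilson prior exp(-I/S)/S, the posterior for I is the same Gaussian shifted down by SigI^2/S and cut at zero, whereas the flat prior just cuts the Gaussian at zero. Function names and simulation parameters are hypothetical.

```python
import math
import random


def truncated_mean(mu: float, sigma: float) -> float:
    """Mean of N(mu, sigma**2) restricted to values >= 0."""
    x = mu / sigma
    z = 0.5 * math.erfc(-x / math.sqrt(2.0))                 # Phi(x)
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)  # phi(x)
    return mu + sigma * pdf / z


def flat_prior_estimate(i_obs: float, sig_i: float) -> float:
    """Posterior mean of I for the improper prior P(I) = const for I >= 0."""
    return truncated_mean(i_obs, sig_i)


def wilson_prior_estimate(i_obs: float, sig_i: float, mean_i_shell: float) -> float:
    """Posterior mean of I for an acentric Wilson prior exp(-I/S)/S."""
    # Prior times Gaussian likelihood is a Gaussian of the same width,
    # centred at i_obs - sig_i**2 / S, restricted to I >= 0.
    return truncated_mean(i_obs - sig_i ** 2 / mean_i_shell, sig_i)


def simulate(n: int = 20000, mean_i_shell: float = 1.0, sig_i: float = 1.0, seed: int = 1):
    """Draw true I from the Wilson prior, add Gaussian noise, average both estimates."""
    rng = random.Random(seed)
    flat_sum = wilson_sum = 0.0
    for _ in range(n):
        i_true = rng.expovariate(1.0 / mean_i_shell)
        i_obs = i_true + rng.gauss(0.0, sig_i)
        flat_sum += flat_prior_estimate(i_obs, sig_i)
        wilson_sum += wilson_prior_estimate(i_obs, sig_i, mean_i_shell)
    return flat_sum / n, wilson_sum / n


flat_avg, wilson_avg = simulate()
print(f"true <I> = 1.000, flat-prior <I'> = {flat_avg:.3f}, Wilson-prior <I'> = {wilson_avg:.3f}")
```

Because the flat-prior estimate is larger for every single reflection, its average necessarily comes out above the Wilson-prior average. When the simulated data really are drawn from the Wilson distribution, the Wilson-prior average should land close to the true <I>, which is the unbiasedness described above.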

-- Ian

-----Original Message-----

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]

Sent: 08 September 2008 22:20
To: Jacob Keller
Cc: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] truncate ignorance

I would also recommend reading of the following paper:

D.S. Sivia & W.I.F. David (1994), Acta Cryst. A50, 703-714. A Bayesian Approach to Extracting Structure-Factor Amplitudes from Powder Diffraction Data.

Despite the title, most of the analysis presented in this paper applies equally well to single-crystal data (see especially sections 3 and 5). If you are not interested in the specific powder-diffraction problems (i.e. overlapping peaks), you can simply skip sections 4 and 6.


A few interesting points from this paper:

(1) The conversion from I's to F's can be done (in a Bayesian way) by applying two simple formulae (equations 11 and 12 in the paper), which, for all practical purposes, are as valid as the more complicated French & Wilson procedure (see discussion in section 5).
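One way to arrive at simple closed-form expressions of this kind: take the flat prior on I >= 0 that Ian Tickle describes, change variable to F (I = F^2, so the posterior picks up a factor F), and take the mode and curvature of the posterior in F. The Python sketch below does exactly that. It is believed to have the same form as Sivia & David's equations 11 and 12, but check it against the paper before relying on it; the function name is hypothetical.

```python
import math


def amplitude_flat_intensity_prior(i_obs: float, sig_i: float) -> tuple[float, float]:
    """Most probable F and a Gaussian error bar, assuming a flat prior on I >= 0.

    Posterior for F >= 0 is proportional to
        F * exp(-(F**2 - I_obs)**2 / (2 * sig_i**2))
    d ln P / dF = 0  gives  F**2 = (I_obs + sqrt(I_obs**2 + 2 sig_i**2)) / 2
    -d2 ln P / dF2   gives  1/sigF**2 = 1/F**2 + 2 (3 F**2 - I_obs) / sig_i**2
    """
    root = math.sqrt(i_obs ** 2 + 2.0 * sig_i ** 2)
    if i_obs >= 0.0:
        f_sq = 0.5 * (i_obs + root)
    else:
        # Same value, rewritten to avoid cancellation when I_obs is strongly negative.
        f_sq = sig_i ** 2 / (root - i_obs)
    inv_var = 1.0 / f_sq + 2.0 * (3.0 * f_sq - i_obs) / sig_i ** 2
    return math.sqrt(f_sq), 1.0 / math.sqrt(inv_var)
```

For strong reflections this reduces to F = sqrt(I) and sigF = SigI/(2F); for I <= 0 it still returns a finite, positive F and sigF.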

(2) Re. the use of I's rather than F's: this is discussed on page 710 (final part of section 5). The authors seem to be more in favor of using F's.




Marc Schiltz





Quoting Jacob Keller <[EMAIL PROTECTED]>:

Does somebody have a .pdf of that French and Wilson paper?

Thanks in advance,

Jacob

*******************************************
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
Dallos Laboratory
F. Searle 1-240
2240 Campus Drive
Evanston IL 60208
lab: 847.491.2438
cel: 773.608.9185
email: [EMAIL PROTECTED]
*******************************************

----- Original Message -----
From: "Ethan Merritt" <[EMAIL PROTECTED]>
To: <CCP4BB@JISCMAIL.AC.UK>
Sent: Monday, September 08, 2008 3:03 PM
Subject: Re: [ccp4bb] truncate ignorance

On Monday 08 September 2008 12:30:29 Phoebe Rice wrote:

Dear Experts,

At the risk of exposing excess ignorance, truncate makes me
very nervous because I don't quite get exactly what it is
doing with my data and what its assumptions are.

From the documentation:
========================================================
... the "truncate" procedure (keyword TRUNCATE YES, the
default) calculates a best estimate of F from I, sd(I), and
the distribution of intensities in resolution shells (see
below). This has the effect of forcing all negative
observations to be positive, and inflating the weakest
reflections (less than about 3 sd), because an observation
significantly smaller than the average intensity is likely
to be underestimated.
=========================================================
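The "best estimate of F from I, sd(I), and the distribution of intensities" described in the documentation can be sketched as a posterior expectation. The Python below assumes an acentric reflection, Gaussian errors and a known shell mean S, ignores centric reflections, and simply integrates on a grid, which need not be how truncate itself evaluates these integrals. Treat it as an illustration of the idea, not a reimplementation; the function name is hypothetical.

```python
import math


def french_wilson_acentric(i_obs: float, sig_i: float, mean_i_shell: float, n: int = 4000):
    """Posterior moments of I and F for one acentric reflection (illustrative sketch).

    Prior:      P(I) = exp(-I/S)/S for I >= 0, with S = mean_i_shell
    Likelihood: Gaussian in I_obs with standard deviation sig_i
    Integrates in F = sqrt(I) (dI = 2F dF) with the midpoint rule.
    Returns (E[I], E[F], sd(F)).
    """
    mu = i_obs - sig_i ** 2 / mean_i_shell      # centre of the Gaussian posterior in I
    lo = math.sqrt(max(mu - 10.0 * sig_i, 0.0))
    hi = math.sqrt(max(mu, 0.0) + 10.0 * sig_i)
    h = (hi - lo) / n
    fs = [lo + (k + 0.5) * h for k in range(n)]
    # log posterior density in F, up to a constant: ln(2F) + ln P(I) + ln L(I)
    logp = [math.log(2.0 * f) - f * f / mean_i_shell
            - (f * f - i_obs) ** 2 / (2.0 * sig_i ** 2) for f in fs]
    top = max(logp)
    w = [math.exp(v - top) for v in logp]       # rescaled to avoid underflow
    norm = sum(w)
    e_f = sum(wi * f for wi, f in zip(w, fs)) / norm
    e_i = sum(wi * f * f for wi, f in zip(w, fs)) / norm
    return e_i, e_f, math.sqrt(max(e_i - e_f ** 2, 0.0))
```

Even for I_obs = -2 SigI it returns a positive E[F], and for a strong reflection E[F] and sd(F) approach sqrt(I_obs) and SigI/(2 sqrt(I_obs)).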

But is it really true, with data from nice modern detectors,
that the weaklings are underestimated?


It isn't really an issue of the detector per se, although in
principle you could worry about non-linear response to the
input rate of arriving photons.

In practice the issue, now as it was in 1978 (French & Wilson),
arises from the background estimation, profile fitting, and
rescaling that are applied to the individual pixel contents
before they are bundled up into a nice "Iobs".

I will try to restate the original French & Wilson argument,
avoiding the terminology of maximum likelihood and Bayesian statistics:

1) We know the true intensity cannot be negative.
2) The existence of Iobs<0 reflections in the data set means
 that whatever we are doing is producing some values of
 Iobs that are too low.
3) Assuming that all weak-ish reflections are being processed
 equivalently, then whatever we are doing wrong for reflections with
 Iobs near zero on the negative side surely is also going wrong
 for their neighbors that happen to be near Iobs=0 on the positive
 side.
4) So if we "correct" the values of Iobs that went negative, for
 consistency we should also correct the values that are nearly
 the same but didn't quite tip over into the negative range.
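Points 2 to 4 can be illustrated with a small simulation under hypothetical assumptions: true intensities drawn from an acentric Wilson distribution, constant Gaussian errors, and nothing else wrong with the processing. Selecting reflections by a weak Iobs preferentially picks those whose error happened to be negative, so their average true intensity exceeds their average Iobs, on the positive side of zero as well as the negative side.

```python
import random


def weak_reflection_bias(n=100_000, mean_i_shell=2.0, sig_i=1.0, seed=7):
    """Simulate I_obs = I_true + noise and compare averages for weak observations.

    Hypothetical set-up: acentric Wilson intensities, Gaussian errors of constant width.
    Returns {bin label: (mean I_obs, mean I_true, count)}.
    """
    rng = random.Random(seed)
    bins = {"I_obs < 0": [0.0, 0.0, 0], "0 <= I_obs < sigI": [0.0, 0.0, 0]}
    for _ in range(n):
        i_true = rng.expovariate(1.0 / mean_i_shell)   # never negative
        i_obs = i_true + rng.gauss(0.0, sig_i)
        if i_obs < 0.0:
            key = "I_obs < 0"
        elif i_obs < sig_i:
            key = "0 <= I_obs < sigI"
        else:
            continue                                   # ignore stronger reflections
        acc = bins[key]
        acc[0] += i_obs
        acc[1] += i_true
        acc[2] += 1
    return {k: (s_obs / c, s_true / c, c) for k, (s_obs, s_true, c) in bins.items()}


for label, (mean_obs, mean_true, count) in weak_reflection_bias().items():
    print(f"{label:>18}: <I_obs> = {mean_obs:6.3f}  <I_true> = {mean_true:6.3f}  ({count} refl.)")
```

This only models random error; the systematic effects from background estimation, profile fitting and rescaling mentioned above are not simulated.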

Do I really want to inflate them?


Yes.

Exactly what assumptions is it making about the expected
distributions?


Primarily that
1) The histogram of true Iobs is smooth
2) No true Iobs are negative

How compatible are those assumptions with serious anisotropy
and the weird Wilson plots that nucleic acids give?


Not relevant

Note the original 1978 French and Wilson paper says:
"It is nevertheless important to validate this agreement for
each set of data independently, as the presence of atoms in
special positions or the existence of noncrystallographic
elements of symmetry (or pseudosymmetry) may abrogate the
application of these prior beliefs for some crystal
structures."


It is true that such things matter when you get down to the
nitty-gritty details of what to use as the "expected distribution".
But *all* plausible expected distributions will be non-negative
and smooth.

Please help truncate my ignorance ...

   Phoebe

==========================================================
Phoebe A. Rice
Assoc. Prof., Dept. of Biochemistry & Molecular Biology
The University of Chicago
phone 773 834 1723


http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123

RNA is really nifty
DNA is over fifty
We have put them
 both in one book
Please do take a
 really good look
http://www.rsc.org/shop/books/2008/9780854042722.asp




--
Ethan A Merritt
Biomolecular Structure Center
University of Washington, Seattle 98195-7742






--

Bart Hazes (Associate Professor)
Dept. of Medical Microbiology & Immunology
University of Alberta
1-15 Medical Sciences Building
Edmonton, Alberta
Canada, T6G 2H7
phone:  1-780-492-0042
fax:    1-780-492-7521
