Re: [ccp4bb] ctruncate bug?

2014-08-15 Thread Eleanor Dodson
Hmm - Phaser doesn't usually use such high resolution data? Surprised you
are getting anything useful from resolutions higher than 2A.

Whether the intensity at that resolution is meaningful would need careful
inspection of the truncate logs - is the Wilson plot reasonable? Are the
4th moments linear, etc. etc.?
Let's look on Monday.

Eleanor


On 15 August 2014 08:14, Huw Jenkins  wrote:

> Hi
>
> I seem to be getting a lot of outliers rejected by Phaser with data
> processed with the latest ctruncate which are not present when data is
> processed with  the older version (or old truncate) - has something been
> changed in the code that would cause this?
>
> With CCP4 6.4: ctruncate version 1.15.9 : 15/07/14
>
> Phaser logfile:
>
> Outliers with a probability less than 1e-06 will be rejected
>There were 489 (0.5620%) reflections rejected
>
>    H    K    L   reso        F  probability
>    9    4   49   1.65  126.969    8.862e-41
>   -2    5   50   1.66  124.322    2.404e-30
>    0    5   50   1.66  131.982    3.516e-35
>  -14    7   47   1.66  123.548    3.541e-27
>   20   20   35   1.66   87.870    5.191e-88
>   -1   15   46   1.66  114.059   8.326e-130
>   -6   21   41   1.67  104.887    7.047e-50
>   -9    4   49   1.67  120.765    5.821e-22
>    3   34   20   1.67   93.244    6.844e-41
>    7    4   49   1.67  131.702    2.985e-36
>   -3   34   20   1.67  106.603    5.786e-57
>    4   37    0   1.67  119.036    3.934e-48
>    1   16   45   1.67  107.599    2.356e-55
>  -14    9   46   1.67  122.033    2.371e-24
>   -5   33   22   1.67  110.802    3.492e-60
>    3   33   22   1.68  102.663    6.189e-46
>   -7    4   49   1.68  120.855    1.875e-20
>   -2    6   49   1.68  132.650    6.233e-84
>    9   19   41   1.68   91.717    2.857e-21
>  -17   12   43   1.69  112.959    5.298e-37
>More than 20 outliers (see VERBOSE output)
>
>
> With  CCP4 6.4: ctruncate version 1.15.5 : 05/06/14
>
> Phaser logfile:
>
> Outliers with a probability less than 1e-06 will be rejected
>There were 1 (0.0011%) reflections rejected
>
>    H    K    L   reso        F  probability
>    0    2    0  31.00  913.757    1.369e-08
>
> Thanks,
>
>
> Huw


[ccp4bb] ctruncate bug?

2014-08-15 Thread Huw Jenkins
Hi

I seem to be getting a lot of outliers rejected by Phaser with data processed 
with the latest ctruncate which are not present when data is processed with  
the older version (or old truncate) - has something been changed in the code 
that would cause this?

With CCP4 6.4: ctruncate version 1.15.9 : 15/07/14

Phaser logfile:

Outliers with a probability less than 1e-06 will be rejected
   There were 489 (0.5620%) reflections rejected

   H    K    L   reso        F  probability
   9    4   49   1.65  126.969    8.862e-41
  -2    5   50   1.66  124.322    2.404e-30
   0    5   50   1.66  131.982    3.516e-35
 -14    7   47   1.66  123.548    3.541e-27
  20   20   35   1.66   87.870    5.191e-88
  -1   15   46   1.66  114.059   8.326e-130
  -6   21   41   1.67  104.887    7.047e-50
  -9    4   49   1.67  120.765    5.821e-22
   3   34   20   1.67   93.244    6.844e-41
   7    4   49   1.67  131.702    2.985e-36
  -3   34   20   1.67  106.603    5.786e-57
   4   37    0   1.67  119.036    3.934e-48
   1   16   45   1.67  107.599    2.356e-55
 -14    9   46   1.67  122.033    2.371e-24
  -5   33   22   1.67  110.802    3.492e-60
   3   33   22   1.68  102.663    6.189e-46
  -7    4   49   1.68  120.855    1.875e-20
  -2    6   49   1.68  132.650    6.233e-84
   9   19   41   1.68   91.717    2.857e-21
 -17   12   43   1.69  112.959    5.298e-37
   More than 20 outliers (see VERBOSE output)


With  CCP4 6.4: ctruncate version 1.15.5 : 05/06/14

Phaser logfile:

Outliers with a probability less than 1e-06 will be rejected
   There were 1 (0.0011%) reflections rejected

   H    K    L   reso        F  probability
   0    2    0  31.00  913.757    1.369e-08

Thanks,


Huw

Re: [ccp4bb] ctruncate bug?

2013-07-25 Thread Douglas Theobald
On Jul 13, 2013, at 5:36 PM, Ian Tickle  wrote:

> On 8 July 2013 18:29, Douglas Theobald  wrote:
> 
> > > Photons only have a Poisson distribution when you can count them:
> > > QM says it meaningless to talk about something you can't observe.  
> >
> > Aw, come on --- QM is a theory, it says no such thing.  The claim
> > that "it's meaningless to talk about something you can't observe" is
> > a philosophical principle, not science.  There are many
> > interpretations of QM, some involving hidden variables, which are
> > precisely things that exist that you can't observe.  Heck, I'd say
> > all of science is *exactly* about the existence of things that we
> > only infer and cannot observe directly.  Remember, when you get the
> > readout from a detector, you are not directly observing photons even
> > then --- you are formally inferring things that you can't observe.
> > There's a whole chain of theory and interpretation that gets you
> > from the electronic readout to the claim that the readout actually
> > represents some number of photons.
> 
> In science we make a clear separation of what is 'data' from what is
> 'model' (or 'inference').  

Often we do, but it is not all that clear cut.  Especially in Bayesian
analyses, "data" and inference are often interchangeable. One man's
inference is another man's data.  As a very pertinent example, the true
intensities are inference, but in terms of the Wilson distribution they
are data (p(J|sdw)).  

> Of course one can be pedantic and argue that everything is really
> inference since our brains interpret everything that goes on 'out
> there' by means of inference from its sensory inputs. Obviously I
> don't take seriously the premise of the 'Matrix' movies (excellent
> though they may be!) that such inputs are just a simulation!  

You can accept that everything is inference without slipping down the
slope to the invalid conclusion that the Matrix is true.  

> At some point you have to believe what your eyes are telling you, as
> long as there is a clear chain of believable cause-and-effect between
> the observation and the inference of that observation.  However, we
> are rightly suspicious of any model that is not supported by data (in
> fact inference in science requires data). Note that I always mean
> 'observation' to be synonymous with 'measurement' (as in 'Fobs'), not
> 'observation' in the weaker sense of 'seeing'.

So you agree that all scientific measurements/observations, aside from
trivially "seeing" something, are indirect.  My argument is that you can
in fact measure (or estimate) the background under the intensity using
the model:

Is = Ib' + Ij

Ib  (measured separately)

where we experimentally measure Is (the spot) and Ib (the background
around the spot), and we assume that both Ib' and Ib come from the same
(Poisson) distribution with mean B, and that Ij (a sample from the true
spot intensity J) comes from a Poisson as well.

Given that model, you can actually measure Ib' and Ij.  It's an indirect
measurement, but all measurements/observations are.  Here there's really 
no conceptual difference from how we measure photons with a detector:
we assume some model for how photons interact with our detector and we
behave as if we are measuring photons with the output from the detector
(in your words, the model is a "clear chain of believable
cause-and-effect"). All (non-seeing) measurements are ultimately
indirect and model-based, a form of inference.  Your claim that we can't
measure Ib' or Ij is based on your rejection of the above model ---
but it is a circular argument to reject that model by saying we can't
measure Ib' or Ij.

Bringing in QM is of no help, since QM is consistent with (non-local)
hidden variable interpretations where photons are in fact
distinguishable and exist as real things with definite physical
properties before we measure them.  

> > Again, this is your own personal philosophical interpretation of QM
> > --- QM itself says nothing of the sort.  For instance, Bohm's pilot
> > wave interpretation of QM, which is completely consistent with
> > observation and QM theory and calculation, states that individual
> > photons *do* go through one slit or the other.  But this is really
> > off point here, I think --- as I said, I don't want to get into a QM
> > debate.
>
> Neither do I, I would just observe that under no QM interpretation can
> you determine which slit any individual photon went through and I
> would argue that it's therefore not even a meaningful question to ask
> which slit it went through.

Again, that is one of many possible QM philosophies.  I think, however,
that if we are to ever find a better theory than QM, then we must ask
questions like "which slit did the photon go through".  Otherwise we
have a science-stopper. Just because QM, in its present form, can't tell
us which slit a photon went through does not logically imply that the
photon did not go through a slit.  

> > I disagree.  Following that logic, we could not t

Re: [ccp4bb] ctruncate bug?

2013-07-24 Thread Douglas Theobald
Hi Randy,

So I've been playing around with equations myself, and I have some alternative 
results.  

As I understand your Mathematica stuff, you are using the data model:

ip = ij + ib'

ib  (measured separately)

where ip is the measured peak (before any background correction), and ij is a 
random sample from the true intensity j.  Here ib is the measured background, 
whereas ib' is the background absorbed into ip.  Here ib is the measured background, 
and both ib and ib' are random samples from the background jb.  Again, only ip and ib are observed; ij and ib' are 
"hidden" variables.  

Now let me recap your treatment of that model (hopefully I get this right).

You assume Poisson distributions for ip, ij, ib, and ib', and find the joint 
probability of observed ip and ib given j and jb, p(ip,ib|j,jb).  You can 
consider ip and ib as statistically independent, since ip depends on ib', not 
ib.  You then marginalize over jb (the true background intensity) using a flat 
uninformative prior, giving p(ip,ib|j).  You find that p(ip,ib|j) is similar to 
F&W's p(ip-ib|j, sdj), where sdj=sqrt(ip+ib).  
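
As a concrete illustration of that marginalization, here is a short Python
sketch (an editorial addition, not code from the thread; it also ignores the
pixel-area scaling discussed in the next paragraph): ip is modelled as
Poisson(j + jb), ib as Poisson(jb), and jb is integrated out numerically under
a flat prior.

# Numerical sketch of p(ip, ib | j): ip ~ Poisson(j + jb), ib ~ Poisson(jb),
# with the true background jb integrated out under a flat prior on (0, inf).
import numpy as np
from scipy.stats import poisson
from scipy.integrate import quad

def marginal_likelihood(ip, ib, j):
    """p(ip, ib | j) with jb marginalized under a flat prior."""
    integrand = lambda jb: poisson.pmf(ip, j + jb) * poisson.pmf(ib, jb)
    value, _ = quad(integrand, 0.0, np.inf, limit=200)
    return value

# Example: scan the likelihood of j for a weak spot (ip = 45, ib = 55).
js = np.linspace(0.0, 30.0, 61)
lik = [marginal_likelihood(45, 55, j) for j in js]
print(js[int(np.argmax(lik))])   # j maximizing the marginal likelihood on this grid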

Some sort of scaling is necessary, since in practice ib and ip are counted from 
different numbers of pixels.  You find that, for roughly equal scaling, the 
Poisson version is similar to F&W's Gaussian approximation for even moderate 
counts.

However, in practice, we measure the background from a much larger area than 
the spot.  For example, in the mosflm window I have open now, the background 
area is > 20 times the spot area, for high res, low SNR spots.  Similarly, in 
xds the background-to-spot ratio, in terms of pixel #, is > 10 on average and > 
5 for the great majority of spots.  Therefore, we typically know the value of 
jb to a much better precision than what we can get from ip (which is 
essentially an estimate of j+jb).  

If the relative sd of the background is about 2 or 3 times less than that of 
the spot ip, we can approximate the background estimate of jb as a constant 
(ie, ignore the uncertainty in its value).  This will be valid if the total 
area used for the background measurement is roughly >5 times the area of the 
spot (even less for "negative" peaks).  So what we can do is estimate jb using 
ib, and then find the conditional distribution of j given ip and jb.  Using 
your notation, this distribution is given by:

p(j|ip,jb) = exp(-(jb+j)) (jb+j)^ip / Gamma(ip+1,jb)

where Gamma(.,.) is the upper incomplete gamma function.  

The moments of this distribution have nice analytical forms (well, at least as 
nice as F&W's).  Here's a table comparing the F&W estimates to this Poisson 
treatment, using Randy's ip and jb values, plus some others:

  ip    jb   Exp[j]_fw  SD[j]_fw     h   Exp[j]_dt  SD[j]_dt  %diff
----  ----   ---------  --------  ----   ---------  --------  -----
  55    45      11.3       6.3     1.3      11.9       6.8      5.3
  45    55       3.0       2.6    -1.5       3.7       3.3      5.4
  35    65       1.1       1.1    -5.1       2.0       2.0       86
   6    10       1.0       0.91   -1.6       1.8       1.7       80
   1     3       0.37      0.34   -2.0       1.3       1.2      240
   4    12       0.45      0.43   -4.0       1.4       1.3      210

 100   100       8.0       6.0     0         8.6       6.6      7.4
  85   100       3.9       3.4    -1.6       4.7       4.2       20
  75   100       2.5       2.4    -2.9       3.4       3.2       35
 500   500      17.8      13.5     0        18.4      14.0      3.3
 440   500       6.2       5.8    -2.9       7.0       6.6       14
1000  1000      25.2      19.1     0        25.8      21        2.3
 920  1000       9.4       8.8    -2.6      10.3       9.5      9.1
 940  1000      11.6      10.5    -2.0      12.4      11          7

In this table I've used sdj=sqrt(ip) for F&W, since I'm ignoring the 
uncertainty in jb --- Randy used sqrt(ip+ib).  

h = (ip-jb)/sdj  

%diff = (Exp[j]_dt - Exp[j]_fw)/Exp[j]_fw  

Here jb is the # background counts normalized to have the same pixel area as 
ip.  
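
For reference, the moments of that posterior reduce to ratios of upper
incomplete gamma functions, which scipy exposes in regularized form as
Q(a, x) = gammaincc(a, x).  The following short Python sketch is an editorial
addition (not code from the thread) that reproduces the Exp[j]_dt and SD[j]_dt
columns above:

# Posterior mean and SD of j under
#   p(j|ip,jb) = exp(-(jb+j)) (jb+j)^ip / Gamma(ip+1, jb),
# written via the regularized upper incomplete gamma Q(a, x):
#   E[j]   = (ip+1) Q(ip+2, jb)/Q(ip+1, jb) - jb
#   E[j^2] = (ip+1)(ip+2) Q(ip+3, jb)/Q(ip+1, jb)
#            - 2 jb (ip+1) Q(ip+2, jb)/Q(ip+1, jb) + jb^2
from math import sqrt
from scipy.special import gammaincc

def poisson_posterior_moments(ip, jb):
    q1, q2, q3 = (gammaincc(ip + n, jb) for n in (1, 2, 3))
    ej = (ip + 1) * q2 / q1 - jb
    ej2 = (ip + 1) * (ip + 2) * q3 / q1 - 2 * jb * (ip + 1) * q2 / q1 + jb ** 2
    return ej, sqrt(ej2 - ej ** 2)

for ip, jb in [(55, 45), (45, 55), (1, 3), (4, 12), (920, 1000)]:
    ej, sd = poisson_posterior_moments(ip, jb)
    print(ip, jb, round(ej, 1), round(sd, 1))   # compare with Exp[j]_dt / SD[j]_dt above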

Whether these would be considered important differences, I'm not sure.  The 
differences are greatest when ip < jb.

Randy wrote:

> Hi,
> 
> I've been following this discussion, and I was particularly interested by the 
> suggestion that some information might be lost by turning the separate peak 
> and background measurements into a single difference.  I accept the point 
> that there might be value in, e.g., TDS models that pay explicit attention to 
> non-Bragg intensities, but this whole discussion started from the point of 
> what estimates to use for diffracted Bragg intensities in processes such as 
> molecular replacement, refinement, and map calculations.
> 
> I thought I'd run this past the two of you, in case I've missed something.  
> What I decided to look at is the probability distribution for the true 
> diffraction intensity, given the peak and background measurements.  I'm 
> assuming that the peak and background measurements have a Poisson 
> distribution from counting statistics, which seems fine because I'm comparing 
> the Poisso

Re: [ccp4bb] ctruncate bug?

2013-07-13 Thread Ian Tickle
On 8 July 2013 18:29, Douglas Theobald  wrote:

> That's all very interesting --- do you have a good ref for TDS where I
> can read up on the theory/practice?  My protein xtallography books say
> even less than S&J about TDS.  Anyway, this appears to be a problem
> beyond the scope of this present discussion --- in an ideal world we'd
> be modeling all the forms of TDS, and Bragg diffraction, and comparing
> those predictions to the intensity pattern over the entire detector ---
> not just integrating near the reciprocal lattice points.  Going on what
> you said above, it seems the acoustic component can't really be measured
> independently of the Bragg peak, while the optic and Einstein components
> can, or at least can be estimated pretty well from the intensity around the
> Bragg peak (which means we can treat it as "background").  In any case,
> I'm going to ignore the TDS complications for now. :).
>

James gave one good reference (Welberry).  Also there's some info here:
http://people.cryst.bbk.ac.uk/~tickle/iucr99/s60.html and ditto with s61,
s62, s63, s64 & s70 (last one is a reference list: ref nos. 31-36 are for
TDS).  I agree with everything you said, as others have also said, about
the need to compare the model of the total coherent scattering (Bragg +
TDS), also including the incoherent contribution, with the actual data
(i.e. the image from the detector).

> That's all true, but you can detect peaks independently of one another
> on a detector, so obviously there is some minimal distance away from a
> crystal where you could completely block any given reflection and
> nothing else. Clearly the "reflection stop" would have to be the size of
> the crystal (or at least the beam).
>

As James pointed out most of the background comes from the same place as
the Bragg diffraction (i.e. the crystal) so your reflection stop would
inevitably block both.  There is no distance at which you could block the
Bragg but not at the same time block the TDS.  In fact the theory shows
that Bragg & TDS are simply different terms in the total coherent
scattering (see 's61' page above for details) that are really separated
completely arbitrarily.  This separation of the terms is an artifact of our
attempt to model the Bragg term alone.  However the only viable models of
MX structures are Bragg ones so we have no option but to work with the
Bragg component of the data alone.  In reality there's no distinction
between Bragg and TDS: they are both parts of the same coherent scattering
and it's meaningless to ask whether a particular photon 'belongs' to one
rather than the other, just as with the slit experiment.  The best you can
say is that it belongs to both.  F & W is really just a work-around of our
initial false assumption that the data consist of Bragg diffraction alone.

> If Iback' and Iback" come from the same process, then one informs the
> other. Of course you'd have to account for statistical fluctuations.
> This is exactly the same principle behind using Iback to give us
> information about Iback' in French and Wilson's method.
>
> Aw, come on --- QM is a theory, it says no such thing.  The claim that
> "it's meaningless to talk about something you can't observe" is a
> philosophical principle, not science.  There are many interpretations of
> QM, some involving hidden variables, which are precisely things that
> exist that you can't observe.  Heck, I'd say all of science is *exactly*
> about the existence of things that we only infer and cannot observe
> directly.  Remember, when you get the readout from a detector, you are
> not directly observing photons even then --- you are formally inferring
> things that you can't observe.  There's a whole chain of theory and
> interpretation that gets you from the electronic readout to the claim
> that the readout actually represents some number of photons.
>

In science we make a clear separation of what is 'data' from what is
'model' (or 'inference').  Of course one can be pedantic and argue that
everything is really inference since our brains interpret everything that
goes on 'out there' by means of inference from its sensory inputs.
Obviously I don't take seriously the premise of the 'Matrix' movies
(excellent though they may be!) that such inputs are just a simulation!  At
some point you have to believe what your eyes are telling you, as long as
there is a clear chain of believable cause-and-effect between the
observation and the inference of that observation.  However, we are rightly
suspicious of any model that is not supported by data (in fact inference in
science requires data).  Note that I always mean 'observation' to be
synonymous with 'measurement' (as in 'Fobs'), not 'observation' in the
weaker sense of 'seeing'.


> Again, this is your own personal philosophical interpretation of QM ---
> QM itself says nothing of the sort.  For instance, Bohm's pilot wave
> interpretation of QM, which is completely consistent with observation
> and QM theory and calculatio

Re: [ccp4bb] ctruncate bug?

2013-07-09 Thread James Holton

On 6/28/2013 5:13 PM, Douglas Theobald wrote:
I admittedly don't understand TDS well. But I thought it was generally 
assumed that TDS contributes rather little to the conventional 
background measurement outside of the spot (so Stout and Jensen tells 
me :). So I was not even really considering TDS, which I see as a 
different problem from measuring background (am I mistaken here?). I 
thought the background we measure (in the area surrounding the spot) 
mostly came from diffuse solvent scatter, air scatter, loop scatter, 
etc. If so, then we can just consider Itrue = Ibragg + Itds, and worry 
about modeling the different components of Itrue at a different stage. 
And then it would make sense to think about blocking a reflection 
(say, with a minuscule, precisely positioned beam stop very near the 
crystal) and measuring the background in the spot where the reflection 
would hit. That background should be approximated pretty well by 
Iback, the background around the spot (especially if we move far 
enough away from the spot so that TDS is negligible there).


Actually, almost by definition, the resolution at which the disorder in 
the crystal is enough to make the Bragg peaks fade away is also the 
"resolution" where the background due to diffuse scatter is maximized.  
Basically, it's conservation of scattered photons.  The interaction 
cross section is fixed, and the photons that don't go into Bragg peaks 
have to go somewhere.  For those who like equations, the Bragg peaks 
fade with:


Ibragg = I0 * exp(-2*B*s^2)

where "B" is the average atomic B factor (aka "Wilson B"), "s" is 
sin(theta)/lambda (0.5/d), and "I0" is the spot intensity you would see 
if the B factor was zero (perfect crystal).


The background however, goes as:

Ibg = Igas * (1 - exp(-2*B*s^2))

Where Igas is the background intensity you would see if all the atoms in 
the crystal were converted into a gas (infinite B factor) but still 
somehow remained contained within the x-ray beam.  At the so-called 
"resolution limit", the 1-exp() thing is pretty much equal to 1.


In the diffuse scattering field this 1-exp() thing is called the 
"centrosymmetric term", and the first step of data processing is to 
"subtract it out".  What is left over is signatures of correlated 
motions, like "TDS", although strictly speaking TDS is the component due 
to thermally-induced motions only.  At 100K, there is not much "TDS" 
left, but there is still plenty of "diffuse scattering" (DS) due to a 
myriad of other things.


As long as the path the incident x-ray beam takes through "loop" 
(solvent, nylon, etc) is less than the path through the crystal itself, 
and the "air path" (exposed to incident beam and visible from the 
detector) is less than 1000x the path through the crystal, then most of 
the background is actually coming from the crystal "lattice" itself.  
You could put a little spot-specific beamstop up, but all that would do 
is make what we beamline scientists call a "shadow".  Best possible case 
would be to mask off everything coming from the crystal, but since most 
of the background you need to subtract is coming from the crystal itself 
anyway, the "spot specific beamstop" experiment is not really going to 
tell you much. Unless, of course, you are trying to study the diffuse 
scattering.  For these experiments, spots are annoying because they are 
thousands of times brighter than the effect you are trying to measure.  
Some DS studies have actually taken great pains to avoid putting any 
Bragg peaks on the Ewald sphere.  You can read all about it in T. R. 
Welberry's Oxford University Press book: "Diffuse Scattering and Models 
of Disorder".  Apparently, urea is a classic model system for DS.


A common misconception, however, is that "TDS" can somehow "build up 
under the spot" and give it some "extra" intensity that doesn't have 
anything to do with the average electron density in a unit cell. This is 
absolutely impossible.  Anything that contributes intensity to the 
regions of reciprocal space "under the spots" must have a repeat that is 
identical to the unit cell repeat, and it must also repeat many times in 
a row to make the feature "sharp" enough to hide itself "under the 
spot".  That sounds like a unit cell repeat to me.  Yes, there is such a 
thing as modulated lattices, and also something called "Huang 
scattering" where long-range correlations (cracks and other mechanical 
effects) can give Bragg spots "tails", but where the "tails" end and the 
"unit cell" begins is really just a matter of semantics.  The molecules 
don't actually care what you think the "unit cell" is.


-James Holton
MAD Scientist


Re: [ccp4bb] ctruncate bug?

2013-07-08 Thread Douglas Theobald
On Jul 7, 2013, at 1:44 PM, Ian Tickle  wrote:
>
> On 29 June 2013 01:13, Douglas Theobald 
> wrote:
> 
> > I admittedly don't understand TDS well.  But I thought it was
> > generally assumed that TDS contributes rather little to the
> > conventional background measurement outside of the spot (so Stout
> > and Jensen tells me :).  So I was not even really considering TDS,
> > which I see as a different problem from measuring background (am I
> > mistaken here?).  I thought the background we measure (in the area
> > surrounding the spot) mostly came from diffuse solvent scatter, air
> > scatter, loop scatter, etc.  If so, then we can just consider Itrue
> > = Ibragg + Itds, and worry about modeling the different components
> > of Itrue at a different stage.  And then it would make sense to
> > think about blocking a reflection (say, with a minuscule, precisely
> > positioned beam stop very near the crystal) and measuring the
> > background in the spot where the reflection would hit.  That
> > background should be approximated pretty well by Iback, the
> > background around the spot (especially if we move far enough away
> > from the spot so that TDS is negligible there).
> 
> Stout & Jensen would not be my first choice to learn about TDS!  It's
> a textbook of small-molecule crystallography (I know, it was my main
> textbook during my doctorate on small-molecule structures), and small
> molecules are generally more highly ordered than macromolecules and
> therefore exhibit TDS on a much smaller scale (there are exceptions of
> course).  I think what you are talking about is "acoustic mode" TDS
> (so-called because of its relationship with sound transmission through
> a crystal), which peaks under the Bragg spots and is therefore very
> hard to distinguish from it.  The other two contributors to TDS that
> are often observed in MX are "optic mode" and "Einstein model".  TDS
> arises from correlated motions within the crystal, for acoustic mode
> it's correlated motions of whole unit cells within the lattice, for
> optic mode it's correlations of different parts of a unit cell (e.g.
> correlated domain motions in a protein), and for Einstein model it's
> correlations of the movement of electrons as they are carried along by
> vibrating atoms (an "Einstein solid" is a simple model of a crystal
> proposed by A. Einstein consisting of a collection of independent
> quantised harmonic-isotropic oscillators; I doubt he was aware of its
> relevance to TDS, that came later).  Here's an example of TDS:
> http://people.cryst.bbk.ac.uk/~tickle/iucr99/tds2f.gif .  The acoustic
> mode gives the haloes around the Bragg spots (but as I said mainly
> coincides with the spots), the optic mode gives the nebulous blobs,
> wisps and streaks that are uncorrelated with the Bragg spots (you can
> make out an inner ring of 14 blobs due to the 7-fold NCS), and the
> Einstein model gives the isotropic uniform greying increasing towards
> the outer edge (makes it look like the diffraction pattern has been
> projected onto a sphere).  So I leave you to decide whether TDS
> contributes to the background!

That's all very interesting --- do you have a good ref for TDS where I
can read up on the theory/practice?  My protein xtallography books say
even less than S&J about TDS.  Anyway, this appears to be a problem
beyond the scope of this present discussion --- in an ideal world we'd
be modeling all the forms of TDS, and Bragg diffraction, and comparing
those predictions to the intensity pattern over the entire detector ---
not just integrating near the reciprocal lattice points.  Going on what
you said above, it seems the acoustic component can't really be measured
independently of the Bragg peak, while the optic and Einstein components
can, or at least can be estimated pretty well from the intensity around the
Bragg peak (which means we can treat it as "background").  In any case,
I'm going to ignore the TDS complications for now. :)

> As for the blocking beam stop, every part of the crystal (or at least
> every part that's in the beam) contributes to every part of the
> diffraction pattern (i.e. Fourier transform).  This means that your
> beam stop would have to mask the whole crystal - any small bit of the
> crystal left unmasked and exposed to the beam would give a complete
> diffraction pattern!  That means you wouldn't see anything, not even
> the background!  

That's all true, but you can detect peaks independently of one another
on a detector, so obviously there is some minimal distance away from a
crystal where you could completely block any given reflection and
nothing else. Clearly the "reflection stop" would have to be the size of
the crystal (or at least the beam).

> You could leave a small hole in the centre for the direct beam and
> that would give you the air scatter contribution, but usually the air
> path is minimal anyway so that's only a very small contribution to the
> total background.  But let's say by some magic you

Re: [ccp4bb] ctruncate bug?

2013-07-07 Thread Ian Tickle
On 29 June 2013 01:13, Douglas Theobald  wrote:

> Just because the detectors spit out positive numbers (unsigned ints) does
> not mean that those values are Poisson distributed.  As I understand it,
> the readout can introduce non-Poisson noise, which is usually modeled as
> Gaussian.
>

OK but positive numbers would seem to rule out a Gaussian model.  I wonder
has anyone actually done the experiment of obtaining the distribution of
photon counts from a source at various intensities and using different
types of detectors?  My suspicion is that the distributions would all be
pretty close to Poisson.


> I think you mean that the Poisson has the property that mean(x) = var(x)
> (and since the ML estimate of the mean = count, you get your equation).
>  Many other distributions can approximate that (most of the binomial
> variants with small p).  Also, the standard gamma distribution with scale
> parameter=1 has that exact property.
>

Yes.


> Maybe it is, but that has its own problems.  I imagine that most people
> who collect an X-ray dataset think that the intensities in their mtz are
> indeed estimates of the true intensities from their crystal.  Seems like a
> reasonable thing to expect, especially since the Fourier transform of our model is
> supposed to predict Itrue.  If Iobs is not an estimate of Itrue, what
> exactly is its relevance to the structure inference problem?  Maybe it only
> serves as a way-station on the road to the French-Wilson correction?  As I
> understand it, not everyone uses ctruncate.
>

I assumed from the subject line that we were talking about the case where
(c)truncate is used.  Those who don't are on their own AFAIC!


> I admittedly don't understand TDS well.  But I thought it was generally
> assumed that TDS contributes rather little to the conventional background
> measurement outside of the spot (so Stout and Jensen tells me :).  So I was
> not even really considering TDS, which I see as a different problem from
> measuring background (am I mistaken here?).  I thought the background we
> measure (in the area surrounding the spot) mostly came from diffuse solvent
> scatter, air scatter, loop scatter, etc.  If so, then we can just consider
> Itrue = Ibragg + Itds, and worry about modeling the different components of
> Itrue at a different stage.  And then it would make sense to think about
> blocking a reflection (say, with a minuscule, precisely positioned beam
> stop very near the crystal) and measuring the background in the spot where
> the reflection would hit.  That background should be approximated pretty
> well by Iback, the background around the spot (especially if we move far
> enough away from the spot so that TDS is negligible there).
>

Stout & Jensen would not be my first choice to learn about TDS!  It's a
textbook of small-molecule crystallography (I know, it was my main textbook
during my doctorate on small-molecule structures), and small molecules are
generally more highly ordered than macromolecules and therefore exhibit TDS
on a much smaller scale (there are exceptions of course).  I think what you
are talking about is "acoustic mode" TDS (so-called because of its
relationship with sound transmission through a crystal), which peaks under
the Bragg spots and is therefore very hard to distinguish from it.  The
other two contributors to TDS that are often observed in MX are "optic
mode" and "Einstein model".  TDS arises from correlated motions within the
crystal, for acoustic mode it's correlated motions of whole unit cells
within the lattice, for optic mode it's correlations of different parts of
a unit cell (e.g. correlated domain motions in a protein), and for Einstein
model it's correlations of the movement of electrons as they are carried
along by vibrating atoms (an "Einstein solid" is a simple model of a
crystal proposed by A. Einstein consisting of a collection of independent
quantised harmonic-isotropic oscillators; I doubt he was aware of its
relevance to TDS, that came later).  Here's an example of TDS:
http://people.cryst.bbk.ac.uk/~tickle/iucr99/tds2f.gif .  The acoustic mode
gives the haloes around the Bragg spots (but as I said mainly coincides
with the spots), the optic mode gives the nebulous blobs, wisps and streaks
that are uncorrelated with the Bragg spots (you can make out an inner ring
of 14 blobs due to the 7-fold NCS), and the Einstein model gives the
isotropic uniform greying increasing towards the outer edge (makes it look
like the diffraction pattern has been projected onto a sphere).  So I leave
you to decide whether TDS contributes to the background!

As for the blocking beam stop, every part of the crystal (or at least every
part that's in the beam) contributes to every part of the diffraction
pattern (i.e. Fourier transform).  This means that your beam stop would
have to mask the whole crystal - any small bit of the crystal left unmasked
and exposed to the beam would give a complete diffraction pattern!  That
means you wouldn't see anyth

Re: [ccp4bb] ctruncate bug?

2013-07-06 Thread Pavel Afonine
Hi James,


On Sat, Jul 6, 2013 at 6:31 PM, James Holton  wrote:

>
> I think it is also important to point out here that the "resolution
> cutoff" of the data you provide to refmac or phenix.refine is not
> necessarily the "resolution of the structure".  This latter quantity,
> although emotionally charged, really does need to be more well-defined
>

I guess something along these lines is in upcoming Acta D:

http://journals.iucr.org/d/services/readerservices.html

Pavel


Re: [ccp4bb] ctruncate bug?

2013-07-06 Thread James Holton


The dominant source of error in an intensity measurement actually 
depends on the magnitude of the intensity.  For intensities near zero 
and with zero background, the "read-out noise" of image plate or 
CCD-based detectors becomes important.  On most modern CCD detectors, 
however, the read-out noise is quite low: equivalent to the noise 
induced by having only a few "extra" photons/pixel (if any).  For 
intensities of more than ~1000 photons, the calibration of the detector 
(~2-3% error) starts to dominate.  It is only for a "midrange" between 
~2 photons/pixel and 1000 integrated photons that "shot noise" (aka 
"photon counting error" or "Poisson statistics") plays the major role.  
So it is perhaps a bit ironic that the "photon counting error" we worry 
so much about is only significant for a very narrow range of intensities 
in any given data set.
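
A rough sketch of that error budget (an editorial illustration; the read noise
of ~3 photon equivalents and the 3% calibration error are assumed round
numbers, not measured detector specifications):

# Compare the three error terms for a range of integrated intensities (in photons).
import math

read_noise = 3.0     # assumed read-out noise, in photon equivalents
calib = 0.03         # assumed fractional calibration/gain error

for counts in [1, 10, 100, 1000, 10000, 100000]:
    shot = math.sqrt(counts)          # Poisson ("shot") noise
    gain = calib * counts             # calibration term
    total = math.sqrt(counts + read_noise ** 2 + gain ** 2)
    print(f"I = {counts:6d}   shot = {shot:7.1f}   read = {read_noise:4.1f}   "
          f"calib = {gain:7.1f}   total = {total:8.1f}")

With these numbers the read-out term only matters for the weakest intensities,
the calibration term overtakes shot noise at roughly 10^3 counts, and shot
noise dominates only in between, i.e. the narrow midrange described above.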


But yes, there does seem to be something "wrong" with ctruncate. It can 
throw out a great many hkls that both xdsconv and the "old truncate" 
keep.  Graph of the resulting Wilson plots here:

http://bl831.als.lbl.gov/~jamesh/bugreports/ctruncate/truncated_wilsons.png
and the script for producing the data for this plot from "scratch":
http://bl831.als.lbl.gov/~jamesh/bugreports/ctruncate/truncate_notes.com

Note that only 3 bins are even populated in the ctruncate result, 
whereas "truncate" and "xdsconv" seem to reproduce the true Wilson plot 
faithfully down to well below the noise, which in this case is a 
Gaussian deviate with RMS = 1.0 added to each F^2.


The "plateau" in the result from xdsconv is something I've been working 
with Kay to understand, but it seems to be a problem with the 
French-Wilson algorithm itself, and not any particular implementation of 
it.  Basically, French and Wilson did not want to assume that the Wilson 
plot was straight and therefore don't use the "prior information" that 
if the intensities dropped into the noise at 2.0 A then the average 
value of "F" and 1.0 A is much much less than "sigma"!  As a result, the 
French-Wilson values for "F" far above the traditional "resolution 
limit" can be overestimated by as much as a factor of a million.  
Perhaps this is why truncate and ctruncate complain bitterly about "data 
beyond useful resolution limit".


A shame really, because if the Wilson plot of the "truncated" data is 
made to follow the linear trend we see in the low-angle data, then we 
wouldn't need to argue so much.  After all, the only reason we apply a 
resolution cutoff is to try and suppress the "noise" coming from all 
those background-only spots at high angle.  But, on the other hand, we 
don't want to cut the data too harshly or we will get series-termination 
errors.  So, we must strike a compromise between these two sources of 
error and call that the "resolution cutoff".  But, if the conversion of 
I to F actually used the "prior knowledge" of the fall-off of the Wilson 
plot with resolution, then there would be no need for a "resolution 
cutoff" at all.  The current situation is portrayed in this graph:


http://bl831.als.lbl.gov/~jamesh/wilson/error_breakdown.png

which shows the noise induced in an electron density map by 
applying a resolution cutoff to otherwise "perfect" data, vs the error 
due to adding noise and running truncate.  If the noisy data were 
down-weighted only a little bit, then the "total noise" curve would 
continue to drop, even at "infinite resolution".


I think it is also important to point out here that the "resolution 
cutoff" of the data you provide to refmac or phenix.refine is not 
necessarily the "resolution of the structure".  This latter quantity, 
although emotionally charged, really does need to be more well-defined 
by this community and preferably in a way that is historically 
"stable".  You can't just take data that goes to 5.0A and call it "4.5A 
data" by changing your criterion.  Yes, it is "better" to refine out to 
4.5A when the intensities drop into the noise at 5A, but that is never 
going to be as good as using data that does not drop into the noise 
until 4.5A.


-James Holton
MAD Scientist

On 6/27/2013 9:30 AM, Ian Tickle wrote:
On 22 June 2013 19:39, Douglas Theobald wrote:



So I'm no detector expert by any means, but I have been assured by
those who are that there are non-Poissonian sources of noise --- I
believe mostly in the readout, when photon counts get amplified.
 Of course this will depend on the exact type of detector, maybe
the newest have only Poisson noise.


Sorry for delay in responding, I've been thinking about it.  It's 
indeed possible that the older detectors had non-Poissonian noise as 
you say, but AFAIK all detectors return _unsigned_ integers (unless 
possibly the number is to be interpreted as a flag to indicate some 
error condition, but then obviously you wouldn't interpret it as a 
count).  So whatever the detector AFAIK it's physically impossible for 
it to r

Re: [ccp4bb] ctruncate bug?

2013-06-30 Thread Ian Tickle
Ed, sorry, not sure what happened to the 1st attachment, it seems to have
vanished!

Cheers

-- Ian

Re: [ccp4bb] ctruncate bug?

2013-06-30 Thread Ian Tickle
On 21 June 2013 13:36, Ed Pozharski  wrote:

> Replacing Iobs with E(J) is not only unnecessary, it's ill-advised as it
will distort intensity statistics.

On 21 June 2013 18:40, Ed Pozharski  wrote:

> I think this is exactly what I was trying to emphasize, that applying
some conversion to raw intensities may have negative impact when conversion
is based on incorrect or incomplete assumptions.

Ed, I think you may have missed the point I was trying to make (or more
likely I didn't make it sufficiently explicit).

Let me re-phrase your first response above slightly (I know you didn't say
this, but it's equivalent to what you did say): "Replacing sqrt(I) with
E(F) is not only unnecessary, it's ill-advised as it will distort the
structure refinement.".   Does that make sense?  I assume you're using
(c)truncate for all datasets, or do you only use it where t-NCS is absent?
If you use (c)truncate with t-NCS then you are already having a "negative
impact" via use of an incorrectly estimated Epost(F)  ('post' = 'posterior'
to distinguish from the prior).  If not (i.e. you only use (c)truncate when
t-NCS is absent) then clearly the use of Epost(J) in place of J cannot
"distort intensity statistics".

The negative impact comes from the failure to properly account for t-NCS,
not from use of Epost(J), since Epost(F) is equally affected.  AFAIK all
current software that performs F & W conversion takes no account of t-NCS
and averages Iobs in spherical shells of constant d-spacing (or possibly
more sophisticated ellipsoidal shells, but that doesn't affect the
argument), in order to estimate Eprior(J) as <I/e> for use in the Wilson
prior (e = symmetry enhancement factor).  T-NCS will affect both Eprior(J)
and Epost(J) (and Epost(F)) equally, since the only difference between
these is the factor P(I|J,sigmaI) (Gaussian experimental error), which
doesn't depend on t-NCS.  So if you are using (c)truncate with t-NCS you
are already using incorrect estimates of Eprior(J) and hence Epost(F)!

Your statement "replacing Iobs with E(J) is not only unnecessary, it's
ill-advised" neglects the fact that there are always 2 sides to an
argument, both pros and cons.  Let me illustrate a couple of examples of
severe negative impacts of the use of Iobs in place of Epost(J) in
intensity stats: I leave you to judge which will "distort intensity
statistics" more!

The first concerns the P & Y L test for twinning I mentioned previously as
an example where problems arise from use of Iobs.  L is defined as |I1 -
I2| / (I1 + I2) where I1 & I2 are unrelated intensities close in reciprocal
space (i.e. where |h1-h2| + |k1-k2| + |l1-l2| <= 4 and no |index
difference| equals 1 to try to avoid t-NCS issues).  The distribution of L
is confined to the range 0 to 1 so clearly if you have an L outside that
you have a problem.  Let's say we have a -ve intensity I2 = -1 and we vary
I1.  The output from gnuplot (x = I1, y = L) is attached (Ltest-1.png).
Note that L never falls _inside_ the allowed range (and there's a
singularity going off to - & +inf at I1 = 1).  Now say we use E(J) in place
of Iobs.  Now I2 = -1 will become (say) E(J2) = 0.1 and E(J1) can't be <=
0.  See attached plot (Ltest-2.png) for the result: no value of L is now
_outside_ the allowed range.  Of course you could "fix" the first case by
ignoring all I <= 0 but then you wouldn't need to use F & W!
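
Since the Ltest-1.png and Ltest-2.png attachments did not survive in the
archive, here is a small sketch (an editorial addition, not Ian's gnuplot
script) that reproduces the behaviour numerically, using the same illustrative
values I2 = -1 and E(J2) = 0.1:

# L = |I1 - I2| / (I1 + I2); with I2 = -1 the value never lands inside [0, 1].
def L(i1, i2):
    return abs(i1 - i2) / (i1 + i2)

for i1 in [-0.5, 0.0, 0.5, 0.9, 1.1, 2.0, 5.0, 50.0]:       # raw intensities, I2 = -1
    print(f"I1 = {i1:5.1f}   L = {L(i1, -1.0):8.3f}")        # all outside [0, 1]

for ej1 in [0.2, 0.5, 2.0, 50.0]:                            # posterior expectations, E(J2) = 0.1
    print(f"E(J1) = {ej1:5.1f}   L = {L(ej1, 0.1):6.3f}")    # all inside [0, 1]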

The second example concerns the moments of Iobs (or the moments of Z where
Z = (Iobs/e)/<I/e>).  The n'th moment of Z is <Z^n> and (c)truncate
calculates it for n = 1 to 4.  Let's say we have 2 reflections Z = -2 and Z
= 2.  So the 4th moments are both +16.  Does that make any intuitive
sense?  A -ve intensity contributes the same as the corresponding +ve one?
Now say we use E(J) or E(Z) instead.  Z = -2 will become (say) +0.2 and Z
=2 will become 2.1.  Now the moments are intuitive & sensible: the -ve
intensity barely contributes to the stats.

Finally a point arising from a response to one of Douglas's posts:

On 21 June 2013 19:48, Ed Pozharski  wrote:

> If you replace negative Iobs with E(J), you would systematically inflate
the averages, which may turn problematic in some cases.  It is probably
better to stick with "raw intensities" and construct theoretical
predictions properly to account for their properties.

This is simply wrong: the corrected intensities are unbiased, so the
average E(J) is exactly equal to the average I (as I demonstrated some time
back in a previous discussion of this topic).

Cheers

-- Ian

Re: [ccp4bb] ctruncate bug?

2013-06-28 Thread Douglas Theobald
On Jun 27, 2013, at 12:30 PM, Ian Tickle  wrote:

> On 22 June 2013 19:39, Douglas Theobald  wrote:
> 
>> So I'm no detector expert by any means, but I have been assured by those who 
>> are that there are non-Poissonian sources of noise --- I believe mostly in 
>> the readout, when photon counts get amplified.  Of course this will depend 
>> on the exact type of detector, maybe the newest have only Poisson noise.
> 
> Sorry for delay in responding, I've been thinking about it.  It's indeed 
> possible that the older detectors had non-Poissonian noise as you say, but 
> AFAIK all detectors return _unsigned_ integers (unless possibly the number is 
> to be interpreted as a flag to indicate some error condition, but then 
> obviously you wouldn't interpret it as a count).  So whatever the detector 
> AFAIK it's physically impossible for it to return a negative number that is 
> to be interpreted as a photon count (of course the integration program may 
> interpret the count as a _signed_ integer but that's purely a technical 
> software issue).  

Just because the detectors spit out positive numbers (unsigned ints) does not 
mean that those values are Poisson distributed.  As I understand it, the 
readout can introduce non-Poisson noise, which is usually modeled as Gaussian.  

> I think we're all at least agreed that, whatever the true distribution of 
> Ispot (and Iback) is, it's not in general Gaussian, except as an 
> approximation in the limit of large Ispot and Iback (with the proviso that 
> under this approximation Ispot & Iback can never be negative).  Certainly the 
> assumption (again AFAIK) has always been that var(count) = count and I think 
> I'm right in saying that only a Poisson distribution has that property?

I think you mean that the Poisson has the property that mean(x) = var(x) (and 
since the ML estimate of the mean = count, you get your equation).  Many other 
distributions can approximate that (most of the binomial variants with small 
p).  Also, the standard gamma distribution with scale parameter=1 has that 
exact property.  
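
A quick numerical check of that property (an editorial sketch, not from the
thread):

# Both a Poisson and a gamma distribution with scale parameter 1 have var(x) = mean(x).
from scipy import stats

for mu in [0.5, 3.0, 50.0]:
    p = stats.poisson(mu)
    g = stats.gamma(a=mu, scale=1.0)     # shape = mu, scale = 1
    print(f"mu = {mu:5.1f}   Poisson mean, var = {p.mean():.2f}, {p.var():.2f}   "
          f"gamma mean, var = {g.mean():.2f}, {g.var():.2f}")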

>> No, its just terminology.  For you, Iobs is defined as Ispot-Iback, and 
>> that's fine.  (As an aside, assuming the Poisson model, this Iobs will have 
>> a Skellam distribution, which can take negative values and asymptotically 
>> approaches a Gaussian.)  The photons contributed to Ispot from Itrue will 
>> still be Poisson.  Let's call them something besides Iobs, how about Ireal?  
>> Then, the Poisson model is
>> 
>> Ispot = Ireal + Iback'
>> 
>> where Ireal comes from a Poisson with mean Itrue, and Iback' comes from a 
>> Poisson with mean Iback_true.  The same likelihood function follows, as well 
>> as the same points.  You're correct that we can't directly estimate Iback', 
>> but I assume that Iback (the counts around the spot) come from the same 
>> Poisson with mean Iback_true (as usual).  
>> 
>> So I would say, sure, you have defined Iobs, and it has a Skellam 
>> distribution, but what, if anything, does that Iobs have to do with Itrue?  
>> My point still holds, that your Iobs is not a valid estimate of Itrue when 
>> Ispot < Iback, unless you make an unphysical assumption, namely that photon 
>> counts can be negative.  It is impossible to derive Ispot-Iback as an estimate 
>> for Itrue (when Ispot < Iback) unless you make that unphysical assumption (like 
>> the Gaussian model).
> 
> Please note that I have never claimed that Iobs = Ispot - Iback is to be 
> interpreted as an estimate of Itrue, indeed quite the opposite: I agree 
> completely that Iobs has little to do with Itrue when Iobs is negative.  In 
> fact I don't believe anyone else is claiming that Iobs is to be interpreted 
> as an estimate of Itrue either, so maybe this is the source of the 
> misunderstanding?  

Maybe it is, but that has its own problems.  I imagine that most people who 
collect an X-ray dataset think that the intensities in their mtz are indeed 
estimates of the true intensities from their crystal.  Seems like a reasonable 
thing to expect, especially since the fourier of our model is supposed to 
predict Itrue.  If Iobs is not an estimate of Itrue, what exactly is its 
relevance to the structure inference problem?  Maybe it only serves as a 
way-station on the road to the French-Wilson correction?  As I understand it, 
not everyone uses ctruncate.  

> Certainly for me Ispot - Iback is merely the difference between the two 
> measurements, nothing more.  Maybe if we called it something other than Iobs 
> (say Idiff), or even avoided giving it a name altogether that would avoid any 
> further confusion?  Perhaps this whole discussion has been merely about 
> terminology?
>  
>> I'm also puzzled as to your claim that Iback' is not Poisson.  I don't think 
>> your QM argument is relevant, since we can imagine what we would have 
>> detected at the spot if we'd blocked the reflection, and that # of photon 
>> counts would be Poisson.  That is precisely the conventional logic behind 
>> estimating Iback' with Iback (from around the spot), i

Re: [ccp4bb] ctruncate bug?

2013-06-27 Thread Ian Tickle
On 22 June 2013 19:39, Douglas Theobald  wrote:

>
> So I'm no detector expert by any means, but I have been assured by those
> who are that there are non-Poissonian sources of noise --- I believe mostly
> in the readout, when photon counts get amplified.  Of course this will
> depend on the exact type of detector, maybe the newest have only Poisson
> noise.
>

Sorry for delay in responding, I've been thinking about it.  It's indeed
possible that the older detectors had non-Poissonian noise as you say, but
AFAIK all detectors return _unsigned_ integers (unless possibly the number
is to be interpreted as a flag to indicate some error condition, but then
obviously you wouldn't interpret it as a count).  So whatever the detector
AFAIK it's physically impossible for it to return a negative number that is
to be interpreted as a photon count (of course the integration program may
interpret the count as a _signed_ integer but that's purely a technical
software issue).  I think we're all at least agreed that, whatever the true
distribution of Ispot (and Iback) is, it's not in general Gaussian, except
as an approximation in the limit of large Ispot and Iback (with the proviso
that under this approximation Ispot & Iback can never be negative).
Certainly the assumption (again AFAIK) has always been that var(count) =
count and I think I'm right in saying that only a Poisson distribution has
that property?

No, its just terminology.  For you, Iobs is defined as Ispot-Iback, and
> that's fine.  (As an aside, assuming the Poisson model, this Iobs will have
> a Skellam distribution, which can take negative values and asymptotically
> approaches a Gaussian.)  The photons contributed to Ispot from Itrue will
> still be Poisson.  Let's call them something besides Iobs, how about Ireal?
>  Then, the Poisson model is
>
> Ispot = Ireal + Iback'
>
> where Ireal comes from a Poisson with mean Itrue, and Iback' comes from a
> Poisson with mean Iback_true.  The same likelihood function follows, as
> well as the same points.  You're correct that we can't directly estimate
> Iback', but I assume that Iback (the counts around the spot) come from the
> same Poisson with mean Iback_true (as usual).
>
> So I would say, sure, you have defined Iobs, and it has a Skellam
> distribution, but what, if anything, does that Iobs have to do with Itrue?
>  My point still holds, that your Iobs is not a valid estimate of Itrue when
> Ispot < Iback, unless you make an unphysical assumption, namely that photon
> counts can be negative.  It is impossible to derive Ispot-Iback as an estimate
> for Itrue (when Ispot < Iback) unless you make that unphysical assumption (like
> the Gaussian model).
>

Please note that I have never claimed that Iobs = Ispot - Iback is to be
interpreted as an estimate of Itrue, indeed quite the opposite: I agree
completely that Iobs has little to do with Itrue when Iobs is negative.  In
fact I don't believe anyone else is claiming that Iobs is to be interpreted
as an estimate of Itrue either, so maybe this is the source of the
misunderstanding?  Certainly for me Ispot - Iback is merely the difference
between the two measurements, nothing more.  Maybe if we called it
something other than Iobs (say Idiff), or even avoided giving it a name
altogether that would avoid any further confusion?  Perhaps this whole
discussion has been merely about terminology?


> I'm also puzzled as to your claim that Iback' is not Poisson.  I don't
> think your QM argument is relevant, since we can imagine what we would have
> detected at the spot if we'd blocked the reflection, and that # of photon
> counts would be Poisson.  That is precisely the conventional logic behind
> estimating Iback' with Iback (from around the spot), it's supposedly a
> reasonable control.  It doesn't matter that in reality the photons are
> indistinguishable --- that's exactly what the probability model is for.
>

I'm not clear how you would "block the reflection"?  How could you do that
without also blocking the background under it?  A large part of the
background comes from the TDS which is coming from the same place that the
Bragg diffraction is coming from, i.e. the crystal.  I know of no way of
stopping the Bragg diffraction without also stopping the TDS (or vice
versa).  Indeed the theory shows that there is in reality no distinction
between Bragg diffraction and TDS; they are just components of the total
scattering that we find convenient to imagine as separate in the dynamical
model of scattering (see
http://people.cryst.bbk.ac.uk/~tickle/iucr99/s61.html for the relevant
equations).

Any given photon "experiences" the whole crystal on its way from the source
to the detector (in fact it experiences more than that: it traverses all
possible trajectories simultaneously, it's just that the vast majority
cancel by destructive interference).  The resulting wave function of the
photon only collapses to a single point on hitting the detector, with a
frequency proportional to the square of the wave function at that point, so
it's meaningless to 

Re: [ccp4bb] ctruncate bug?

2013-06-24 Thread Jrh
Dear Pavel,
Diffuse scattering is probably the most difficult topic I have worked on.
Reading Peter Moore's new book and his insights give me renewed hope we could 
make much more of it, as I mentioned to Tim re 'structure and dynamics'. 
You describe more aspects below obviously.
Greetings,
John
Prof John R Helliwell DSc 
 
 

On 24 Jun 2013, at 17:12, Pavel Afonine  wrote:

> Refinement against images is a nice old idea. 
> From refinement technical point of view it's going to be challenging. 
> Refining just two flat bulk solvent model ksol&Bsol simultaneously may be 
> tricky, or occupancy + individual B-factor + TLS, or ask multipolar 
> refinement folk about whole slew of magic they use to refine different 
> multipolar parameters at different stages of refinement proces and in 
> different order and applied to different atom types (H vs non-H) 
> ...etc...etc. Now if you convolute all this with the whole diffraction 
> experiment parameters through using images in refinement that will be big 
> fun, I'm sure.
> Pavel
> 
> 
> 
> On Sun, Jun 23, 2013 at 11:13 PM, Jrh  wrote:
> Dear Tom,
> I find this suggestion of using the full images an excellent and visionary 
> one.
> So, how to implement it?
> We are part way along the path with James Holton's reverse Mosflm.
> The computer memory challenge could be ameliorated by simple pixel averaging 
> at least initially.
> The diffuse scattering would be the ultimate gold at the end of the rainbow. 
> Peter Moore's new book, inter alia, carries many splendid insights into the 
> diffuse scattering in our diffraction patterns.
> Fullprof analyses have become a firm trend in other fields, admittedly with 
> simpler computing overheads.
> Greetings,
> John
> 
> Prof John R Helliwell DSc FInstP
> 
> 
> 
> On 21 Jun 2013, at 23:16, "Terwilliger, Thomas C"  
> wrote:
> 
> > I hope I am not duplicating too much of this fascinating discussion with 
> > these comments:  perhaps the main reason there is confusion about what to 
> > do is that neither F nor I is really the most suitable thing to use in 
> > refinement.  As pointed out several times in different ways, we don't 
> > measure F or I, we only measure counts on a detector.  As a convenience, we 
> > "process" our diffraction images to estimate I or F and their uncertainties 
> > and model these uncertainties as simple functions (e.g., a Gaussian).  
> > There is no need in principle to do that, and if we were to refine instead 
> > against the raw image data these issues about positivity would disappear 
> > and our structures might even be a little better.
> >
> > Our standard procedure is to estimate F or I from counts on the detector, 
> > then to use these estimates of F or I in refinement.  This is not so easy 
> > to do right because F or I contain many terms coming from many pixels and 
> > it is hard to model their statistics in detail.  Further, attempts we make 
> > to estimate either F or I as physically plausible values (e.g., using the 
> > fact that they are not negative) will generally be biased (the values after 
> > correction will generally be systematically low or systematically high, as 
> > is true for the French and Wilson correction and as would be true for the 
> > truncation of I at zero or above).
> >
> > Randy's method for intensity refinement is an improvement because the 
> > statistics are treated more fully than just using an estimate of F or I and 
> > assuming its uncertainty has a simple distribution.  So why not avoid all 
> > the problems with modeling the statistics of processed data and instead 
> > refine against the raw data.  From the structural model you calculate F, 
> > from F and a detailed model of the experiment (the same model that is 
> > currently used in data processing) you calculate the counts expected on 
> > each pixel. Then you calculate the likelihood of the data given your models 
> > of the structure and of the experiment.  This would have lots of benefits 
> > because it would allow improved descriptions of the experiment (decay, 
> > absorption, detector sensitivity, diffuse scattering and other "background" 
> on the images, on and on) that could lead to more accurate structures in 
> > the end.  Of course there are some minor issues about putting all this in 
> > computer memory for refinement
> >
> > -Tom T
> > 
> > From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Phil 
> > [p...@mrc-lmb.cam.ac.uk]
> > Sent: Friday, June 21, 2013 2:50 PM
> > To:

Re: [ccp4bb] ctruncate bug?

2013-06-24 Thread Pavel Afonine
Refinement against images is a nice old idea.
From a refinement technical point of view it's going to be challenging.
Refining just two flat bulk solvent model ksol&Bsol simultaneously may be
tricky, or occupancy + individual B-factor + TLS, or ask multipolar
refinement folk about whole slew of magic they use to refine different
multipolar parameters at different stages of the refinement process and in
different order and applied to different atom types (H vs non-H)
...etc...etc. Now if you convolute all this with the whole diffraction
experiment parameters through using images in refinement that will be big
fun, I'm sure.
Pavel



On Sun, Jun 23, 2013 at 11:13 PM, Jrh  wrote:

> Dear Tom,
> I find this suggestion of using the full images an excellent and visionary
> one.
> So, how to implement it?
> We are part way along the path with James Holton's reverse Mosflm.
> The computer memory challenge could be ameliorated by simple pixel
> averaging at least initially.
> The diffuse scattering would be the ultimate gold at the end of the
> rainbow. Peter Moore's new book, inter alia, carries many splendid insights
> into the diffuse scattering in our diffraction patterns.
> Fullprof analyses have become a firm trend in other fields, admittedly
> with simpler computing overheads.
> Greetings,
> John
>
> Prof John R Helliwell DSc FInstP
>
>
>
> On 21 Jun 2013, at 23:16, "Terwilliger, Thomas C" 
> wrote:
>
> > I hope I am not duplicating too much of this fascinating discussion with
> these comments:  perhaps the main reason there is confusion about what to
> do is that neither F nor I is really the most suitable thing to use in
> refinement.  As pointed out several times in different ways, we don't
> measure F or I, we only measure counts on a detector.  As a convenience, we
> "process" our diffraction images to estimate I or F and their uncertainties
> and model these uncertainties as simple functions (e.g., a Gaussian).
>  There is no need in principle to do that, and if we were to refine instead
> against the raw image data these issues about positivity would disappear
> and our structures might even be a little better.
> >
> > Our standard procedure is to estimate F or I from counts on the
> detector, then to use these estimates of F or I in refinement.  This is not
> so easy to do right because F or I contain many terms coming from many
> pixels and it is hard to model their statistics in detail.  Further,
> attempts we make to estimate either F or I as physically plausible values
> (e.g., using the fact that they are not negative) will generally be biased
> (the values after correction will generally be systematically low or
> systematically high, as is true for the French and Wilson correction and as
> would be true for the truncation of I at zero or above).
> >
> > Randy's method for intensity refinement is an improvement because the
> statistics are treated more fully than just using an estimate of F or I and
> assuming its uncertainty has a simple distribution.  So why not avoid all
> the problems with modeling the statistics of processed data and instead
> refine against the raw data.  From the structural model you calculate F,
> from F and a detailed model of the experiment (the same model that is
> currently used in data processing) you calculate the counts expected on
> each pixel. Then you calculate the likelihood of the data given your models
> of the structure and of the experiment.  This would have lots of benefits
> because it would allow improved descriptions of the experiment (decay,
> absorption, detector sensitivity, diffuse scattering and other "background"
> on the images, on and on) that could lead to more accurate structures in
> the end.  Of course there are some minor issues about putting all this in
> computer memory for refinement
> >
> > -Tom T
> > 
> > From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Phil [
> p...@mrc-lmb.cam.ac.uk]
> > Sent: Friday, June 21, 2013 2:50 PM
> > To: CCP4BB@JISCMAIL.AC.UK
> > Subject: Re: [ccp4bb] ctruncate bug?
> >
> > However you decide to argue the point, you must consider _all_ the
> observations of a reflection (replicates and symmetry related) together
> when you infer Itrue or F etc, otherwise you will bias the result even
> more. Thus you cannot (easily) do it during integration
> >
> > Phil
> >
> > Sent from my iPad
> >
> > On 21 Jun 2013, at 20:30, Douglas Theobald 
> wrote:
> >
> >> On Jun 21, 2013, at 2:48 PM, Ed Pozharski 
> wrote:
> >>
> >>> Douglas,
> >>>>> Observed intens

Re: [ccp4bb] ctruncate bug?

2013-06-24 Thread Terwilliger, Thomas C
Implementing refinement against images will be pretty challenging.  As far as I 
know the problem isn't in saying what has to happen, but rather in the enormous 
amount of bookkeeping necessary to relate a model of a structure and a model of 
the entire experiment (including such details as parameters defining spot 
shape, absorption etc) to a very long list of counts on pixels...and to 
calculate derivatives so as to optimize likelihood.   As you suggest, there 
could be payoff in modeling diffuse scattering.  Also I imagine that the 
structure factors could be estimated more accurately by refining against the 
raw images.  
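
As a toy illustration of the bookkeeping involved (a minimal sketch in plain numpy, 
not code from any integration or refinement package; the linear "experiment model" 
and the helper loglik_and_gradient are made up for this example), the per-pixel 
Poisson log-likelihood and its gradient follow directly from the chain rule:

import numpy as np

def loglik_and_gradient(counts, theta, jacobian):
    # Toy model: expected counts per pixel are linear in the parameters,
    # lam = jacobian @ theta.  A real experiment model (spot shape, absorption,
    # decay, ...) would be far more elaborate, but the bookkeeping is the same.
    lam = jacobian @ theta
    loglik = np.sum(counts * np.log(lam) - lam)     # Poisson log-likelihood, k! term dropped
    grad = jacobian.T @ (counts / lam - 1.0)        # chain rule: one term per pixel
    return loglik, grad

rng = np.random.default_rng(0)
jacobian = rng.uniform(0.5, 2.0, size=(10_000, 2))  # d(lam_i)/d(theta_j), made up
theta_true = np.array([50.0, 20.0])
counts = rng.poisson(jacobian @ theta_true)         # simulated pixel counts

theta = np.array([40.0, 30.0])                      # deliberately wrong starting guess
for _ in range(5000):                               # crude fixed-step gradient ascent
    _, grad = loglik_and_gradient(counts, theta, jacobian)
    theta += 1e-4 * grad

print("true:", theta_true, " fitted:", np.round(theta, 2))  # fitted values should land close to the true ones

The hard part is not this arithmetic but doing it for every pixel of every image 
with a realistic model of the experiment.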

One question will be whether all this would make a lot of difference with 
today's models. My guess is it won't make a substantial difference in most 
cases because our biggest problem is the inadequacy of these models and not 
deficiencies in our analysis of the data. However there might be some cases 
where it could help.  The bigger question is whether it will make a difference 
in the future when we have more advanced models that have the potential to 
explain the data better. I think that yes, at that point all the effort will be 
worth it.

Tom T

From: Jrh [jrhelliw...@gmail.com]
Sent: Monday, June 24, 2013 12:13 AM
To: Terwilliger, Thomas C
Cc: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] ctruncate bug?

Dear Tom,
I find this suggestion of using the full images an excellent and visionary one.
So, how to implement it?
We are part way along the path with James Holton's reverse Mosflm.
The computer memory challenge could be ameliorated by simple pixel averaging at 
least initially.
The diffuse scattering would be the ultimate gold at the end of the rainbow. 
Peter Moore's new book, inter alia, carries many splendid insights into the 
diffuse scattering in our diffraction patterns.
Fullprof analyses have become a firm trend in other fields, admittedly with 
simpler computing overheads.
Greetings,
John

Prof John R Helliwell DSc FInstP



On 21 Jun 2013, at 23:16, "Terwilliger, Thomas C"  wrote:

> I hope I am not duplicating too much of this fascinating discussion with 
> these comments:  perhaps the main reason there is confusion about what to do 
> is that neither F nor I is really the most suitable thing to use in 
> refinement.  As pointed out several times in different ways, we don't measure 
> F or I, we only measure counts on a detector.  As a convenience, we "process" 
> our diffraction images to estimate I or F and their uncertainties and model 
> these uncertainties as simple functions (e.g., a Gaussian).  There is no need 
> in principle to do that, and if we were to refine instead against the raw 
> image data these issues about positivity would disappear and our structures 
> might even be a little better.
>
> Our standard procedure is to estimate F or I from counts on the detector, 
> then to use these estimates of F or I in refinement.  This is not so easy to 
> do right because F or I contain many terms coming from many pixels and it is 
> hard to model their statistics in detail.  Further, attempts we make to 
> estimate either F or I as physically plausible values (e.g., using the fact 
> that they are not negative) will generally be biased (the values after 
> correction will generally be systematically low or systematically high, as is 
> true for the French and Wilson correction and as would be true for the 
> truncation of I at zero or above).
>
> Randy's method for intensity refinement is an improvement because the 
> statistics are treated more fully than just using an estimate of F or I and 
> assuming its uncertainty has a simple distribution.  So why not avoid all the 
> problems with modeling the statistics of processed data and instead refine 
> against the raw data.  From the structural model you calculate F, from F and 
> a detailed model of the experiment (the same model that is currently used in 
> data processing) you calculate the counts expected on each pixel. Then you 
> calculate the likelihood of the data given your models of the structure and 
> of the experiment.  This would have lots of benefits because it would allow 
> improved descriptions of the experiment (decay, absorption, detector 
> sensitivity, diffuse scattering and other "background" on the images, on 
> and on) that could lead to more accurate structures in the end.  Of course 
> there are some minor issues about putting all this in computer memory for 
> refinement
>
> -Tom T
> 
> From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Phil 
> [p...@mrc-lmb.cam.ac.uk]
> Sent: Friday, June 21, 2013 2:50 PM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] ctruncate bug?
>
> However yo

Re: [ccp4bb] ctruncate bug?

2013-06-23 Thread Jrh
Dear Tom,
I find this suggestion of using the full images an excellent and visionary one.
So, how to implement it? 
We are part way along the path with James Holton's reverse Mosflm.
The computer memory challenge could be ameliorated by simple pixel averaging at 
least initially.
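As a minimal sketch of that pixel-averaging idea (plain numpy; bin2x2 is a made-up 
helper and the frame is just a roughly Pilatus 6M-sized array of simulated counts), 
2x2 binning already cuts the memory per image by a factor of four:

import numpy as np

def bin2x2(image):
    # Downsample a detector frame by averaging 2x2 pixel blocks -- a crude
    # illustration only; real integration programs treat pixels far more carefully.
    h, w = image.shape
    return image[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(1)
frame = rng.poisson(5.0, size=(2527, 2463)).astype(np.float32)  # roughly Pilatus 6M-sized toy frame
print(frame.nbytes / 1e6, "MB ->", bin2x2(frame).nbytes / 1e6, "MB")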
The diffuse scattering would be the ultimate gold at the end of the rainbow. 
Peter Moore's new book, inter alia, carries many splendid insights into the 
diffuse scattering in our diffraction patterns.
Fullprof analyses have become a firm trend in other fields, admittedly with 
simpler computing overheads.
Greetings,
John

Prof John R Helliwell DSc FInstP 
 
 

On 21 Jun 2013, at 23:16, "Terwilliger, Thomas C"  wrote:

> I hope I am not duplicating too much of this fascinating discussion with 
> these comments:  perhaps the main reason there is confusion about what to do 
> is that neither F nor I is really the most suitable thing to use in 
> refinement.  As pointed out several times in different ways, we don't measure 
> F or I, we only measure counts on a detector.  As a convenience, we "process" 
> our diffraction images to estimate I or F and their uncertainties and model 
> these uncertainties as simple functions (e.g., a Gaussian).  There is no need 
> in principle to do that, and if we were to refine instead against the raw 
> image data these issues about positivity would disappear and our structures 
> might even be a little better.
> 
> Our standard procedure is to estimate F or I from counts on the detector, 
> then to use these estimates of F or I in refinement.  This is not so easy to 
> do right because F or I contain many terms coming from many pixels and it is 
> hard to model their statistics in detail.  Further, attempts we make to 
> estimate either F or I as physically plausible values (e.g., using the fact 
> that they are not negative) will generally be biased (the values after 
> correction will generally be systematically low or systematically high, as is 
> true for the French and Wilson correction and as would be true for the 
> truncation of I at zero or above).
> 
> Randy's method for intensity refinement is an improvement because the 
> statistics are treated more fully than just using an estimate of F or I and 
> assuming its uncertainty has a simple distribution.  So why not avoid all the 
> problems with modeling the statistics of processed data and instead refine 
> against the raw data.  From the structural model you calculate F, from F and 
> a detailed model of the experiment (the same model that is currently used in 
> data processing) you calculate the counts expected on each pixel. Then you 
> calculate the likelihood of the data given your models of the structure and 
> of the experiment.  This would have lots of benefits because it would allow 
> improved descriptions of the experiment (decay, absorption, detector 
> sensitivity, diffuse scattering and other "background" on the images, on 
> and on) that could lead to more accurate structures in the end.  Of course 
> there are some minor issues about putting all this in computer memory for 
> refinement
> 
> -Tom T
> 
> From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Phil 
> [p...@mrc-lmb.cam.ac.uk]
> Sent: Friday, June 21, 2013 2:50 PM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] ctruncate bug?
> 
> However you decide to argue the point, you must consider _all_ the 
> observations of a reflection (replicates and symmetry related) together when 
> you infer Itrue or F etc, otherwise you will bias the result even more. Thus 
> you cannot (easily) do it during integration
> 
> Phil
> 
> Sent from my iPad
> 
> On 21 Jun 2013, at 20:30, Douglas Theobald  wrote:
> 
>> On Jun 21, 2013, at 2:48 PM, Ed Pozharski  wrote:
>> 
>>> Douglas,
>>>>> Observed intensities are the best estimates that we can come up with in 
>>>>> an experiment.
>>>> I also agree with this, and this is the clincher.  You are arguing that 
>>>> Ispot-Iback=Iobs is the best estimate we can come up with.  I claim that 
>>>> is absurd.  How are you quantifying "best"?  Usually we have some sort of 
>>>> discrepancy measure between true and estimate, like RMSD, mean absolute 
>>>> distance, log distance, or somesuch.  Here is the important point --- by 
>>>> any measure of discrepancy you care to use, the person who estimates Iobs 
>>>> as 0 when Iback>Ispot will *always*, in *every case*, beat the person who 
>>>> estimates Iobs with a negative value.   This is an indisputable fact.
>>> 
>>> First off, you may find it u

Re: [ccp4bb] ctruncate bug?

2013-06-23 Thread Boaz Shaanan
Hi Douglas,

So will you and/or other participants in this fascinating and informative 
thread pick up the glove and implement the suggestions made here? At least 
we'd know whether it makes a difference to our data. In any case I doubt that 
it could do any harm.

 Cheers,

          Boaz


Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel

E-mail: bshaa...@bgu.ac.il
Phone: 972-8-647-2220  Skype: boaz.shaanan
Fax:   972-8-647-2992 or 972-8-646-1710






From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Douglas Theobald 
[dtheob...@brandeis.edu]
Sent: Sunday, June 23, 2013 1:52 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] ctruncate bug?

On Jun 22, 2013, at 6:18 PM, Frank von Delft  
wrote:

> A fascinating discussion (I've learnt a lot!);  a quick sanity check, though:
>
> In what scenarios would these improved estimates make a significant 
> difference?

Who knows?  I always think that improved estimates are always a good thing, 
ignoring computational complexity (by "improved" I mean making more accurate 
physical assumptions).  This may all be academic --- estimating Itrue with 
unphysical negative values, and then later correcting w/French-Wilson, may give 
approximately the same answers and make no tangible difference in the models.  
But that all seems a bit convoluted, ad hoc, and unnecessary, esp. now with the 
available computational power.  It might make a difference.

> Or rather:  are there any existing programs (as opposed to vapourware) that 
> would benefit significantly?
>
> Cheers
> phx
>
>
>
> On 22/06/2013 18:04, Douglas Theobald wrote:
>> Ian, I really do think we are almost saying the same thing.  Let me try to 
>> clarify.
>>
>> You say that the Gaussian model is not the "correct" data model, and that 
>> the Poisson is correct.  I more-or-less agree.  If I were being pedantic 
>> (me?) I would say that the Poisson is *more* physically realistic than the 
>> Gaussian, and more realistic in a very important and relevant way --- but in 
>> truth the Poisson model does not account for other physical sources of error 
>> that arise from real crystals and real detectors, such as dark noise and 
>> read noise (that's why I would prefer a gamma distribution).  I also agree 
>> that for x>10 the Gaussian is a good approximation to the Poisson.  I 
>> basically agree with every point you make about the Poisson vs the Gaussian, 
>> except for the following.
>>
>> The Iobs=Ispot-Iback equation cannot be derived from a Poisson assumption, 
>> except as an approximation when  Ispot > Iback.  It *can* be derived from 
>> the Gaussian assumption (and in fact I think that is probably the *only* 
>> justification it has).   It is true that the difference between two Poissons 
>> can be negative.  It is also true that for moderate # of counts, the 
>> Gaussian is a good approximation to the Poisson.  But we are trying to 
>> estimate Itrue, and both of those points are irrelevant to estimating Itrue 
>> when Ispot < Iback.  Contrary to your assertion, we are not concerned with 
>> differences of Poissonians, only sums.  Here is why:
>>
>> In the Poisson model you outline, Ispot is the sum of two Poisson variables, 
>> Iback and Iobs.  That means Ispot is also Poisson and can never be negative. 
>>  Again --- the observed data (Ispot) is a *sum*, so that is what we must 
>> deal with.  The likelihood function for this model is:
>>
>> L(a) = (a+b)^k exp(-a-b)
>>
>> where 'k' is the # of counts in Ispot, 'a' is the mean of the Iobs Poisson 
>> (i.e., a = Itrue), and 'b' is the   mean of the Iback Poisson.  Of 
>> course k>=0, and both parameters a>0 and b>0.  Our job is to estimate 'a', 
>> Itrue.  Given the likelihood function above, there is no valid estimate of 
>> 'a' that will give a negative value.  For example, the ML estimate of 'a' is 
>> always non-negative.  Specifically, if we assume 'b' is known from 
>> background extrapolation, the ML estimate of 'a' is:
>>
>> a = k-b   if k>b
>>
>> a = 0   if k<=b
>>
>> You can verify this visually by plotting the likelihood function (vs 'a' as 
>> variable) for any combination of k and b you want.  The SD is a bit more 
>> difficult, but it is approximately (a+b)/sqrt(k), where 'a' is now the ML 
>> estimate of 'a'.
>>
>> Note that the ML estimate of 'a', when k>b (Ispot>Iback), is equivalent to 
>> Ispot-Iback.
>>
>> Now, t

Re: [ccp4bb] ctruncate bug?

2013-06-22 Thread Ronald E Stenkamp

I agree with Frank.  This thread has been fascinating and educational.  Thanks 
to all.  Ron

On Sat, 22 Jun 2013, Douglas Theobald wrote:


On Jun 22, 2013, at 6:18 PM, Frank von Delft  
wrote:


A fascinating discussion (I've learnt a lot!);  a quick sanity check, though:

In what scenarios would these improved estimates make a significant difference?


Who knows?  I always think that improved estimates are always a good thing, ignoring 
computational complexity (by "improved" I mean making more accurate physical 
assumptions).  This may all be academic --- estimating Itrue with unphysical negative 
values, and then later correcting w/French-Wilson, may give approximately the same 
answers and make no tangible difference in the models.  But that all seems a bit 
convoluted, ad hoc, and unnecessary, esp. now with the available computational power.  It 
might make a difference.


Or rather:  are there any existing programs (as opposed to vapourware) that 
would benefit significantly?

Cheers
phx



On 22/06/2013 18:04, Douglas Theobald wrote:

Ian, I really do think we are almost saying the same thing.  Let me try to 
clarify.

You say that the Gaussian model is not the "correct" data model, and that the 
Poisson is correct.  I more-or-less agree.  If I were being pedantic (me?) I would say that 
the Poisson is *more* physically realistic than the Gaussian, and more realistic in a very 
important and relevant way --- but in truth the Poisson model does not account for other 
physical sources of error that arise from real crystals and real detectors, such as dark 
noise and read noise (that's why I would prefer a gamma distribution).  I also agree that 
for x>10 the Gaussian is a good approximation to the Poisson.  I basically agree with 
every point you make about the Poisson vs the Gaussian, except for the following.

The Iobs=Ispot-Iback equation cannot be derived from a Poisson assumption, except as 
an approximation when  Ispot > Iback.  It *can* be derived from the Gaussian 
assumption (and in fact I think that is probably the *only* justification it has).   
It is true that the difference between two Poissons can be negative.  It is also true 
that for moderate # of counts, the Gaussian is a good approximation to the Poisson.  
But we are trying to estimate Itrue, and both of those points are irrelevant to 
estimating Itrue when Ispot < Iback.  Contrary to your assertion, we are not 
concerned with differences of Poissonians, only sums.  Here is why:

In the Poisson model you outline, Ispot is the sum of two Poisson variables, 
Iback and Iobs.  That means Ispot is also Poisson and can never be negative.  
Again --- the observed data (Ispot) is a *sum*, so that is what we must deal 
with.  The likelihood function for this model is:

L(a) = (a+b)^k exp(-a-b)

where 'k' is the # of counts in Ispot, 'a' is the mean of the Iobs Poisson (i.e., a = 
Itrue), and 'b' is the   mean of the Iback Poisson.  Of course k>=0, and both 
parameters a>0 and b>0.  Our job is to estimate 'a', Itrue.  Given the likelihood 
function above, there is no valid estimate of 'a' that will give a negative value.  For 
example, the ML estimate of 'a' is always non-negative.  Specifically, if we assume 'b' 
is known from background extrapolation, the ML estimate of 'a' is:

a = k-b   if k>b

a = 0   if k<=b

You can verify this visually by plotting the likelihood function (vs 'a' as 
variable) for any combination of k and b you want.  The SD is a bit more 
difficult, but it is approximately (a+b)/sqrt(k), where 'a' is now the ML 
estimate of 'a'.

Note that the ML estimate of 'a', when k>b (Ispot>Iback), is equivalent to 
Ispot-Iback.

Now, to restate:  as an estimate of Itrue, Ispot-Iback cannot be derived from the 
Poisson model.  In contrast, Ispot-Iback *can* be derived from a Gaussian model 
(as the ML and LS estimate of Itrue).  In fact, I'll wager the Gaussian is the 
only reasonable model that gives Ispot-Iback as an estimate of Itrue.  This is why 
I claim that using Ispot-Iback as an estimate of Itrue, even when Ispot < Iback, implicitly 
means you are using a (non-physical) Gaussian model.  Feel free to prove me wrong --- can you 
derive Ispot-Iback, as an estimate of Itrue, from anything besides a Gaussian?

Cheers,

Douglas

On Sat, Jun 22, 2013 at 12:06 PM, Ian Tickle  wrote:
On 21 June 2013 19:45, Douglas Theobald  wrote:

The current way of doing things is summarized by Ed's equation: 
Ispot-Iback=Iobs.  Here Ispot is the # of counts in the spot (the area 
encompassing the predicted reflection), and Iback is # of counts in the 
background (usu. some area around the spot).  Our job is to estimate the true 
intensity Itrue.  Ed and others argue that Iobs is a reasonable estimate of 
Itrue, but I say it isn't because Itrue can never be negative, whereas Iobs can.

Now where does the Ispot-Iback=Iobs equation come from?  It implicitly assumes 
that both Iobs and Iback come from a Gaussian distribution, in which Iobs and 
Iback can have negative values.  Here's the implicit data model:

Ispot = Iobs + Iback

There is an Itrue, to which we add some Gaussian noise and randomly generate an Iobs.  To 
that is added some background noise, Iback, which is also randomly gen

Re: [ccp4bb] ctruncate bug?

2013-06-22 Thread Nat Echols
On Sat, Jun 22, 2013 at 3:18 PM, Frank von Delft <
frank.vonde...@sgc.ox.ac.uk> wrote:

>  In what scenarios would these improved estimates make a significant
> difference?
>

Perhaps datasets where an unusually large number of reflections are very
weak, for instance where TNCS is present, or where the intensity falls off
quickly at lower resolution (but remains detectable much further)?

-Nat


Re: [ccp4bb] ctruncate bug?

2013-06-22 Thread Douglas Theobald
On Jun 22, 2013, at 6:18 PM, Frank von Delft  
wrote:

> A fascinating discussion (I've learnt a lot!);  a quick sanity check, though: 
> 
> In what scenarios would these improved estimates make a significant 
> difference?  

Who knows?  I always think that improved estimates are always a good thing, 
ignoring computational complexity (by "improved" I mean making more accurate 
physical assumptions).  This may all be academic --- estimating Itrue with 
unphysical negative values, and then later correcting w/French-Wilson, may give 
approximately the same answers and make no tangible difference in the models.  
But that all seems a bit convoluted, ad hoc, and unnecessary, esp. now with the 
available computational power.  It might make a difference.  

> Or rather:  are there any existing programs (as opposed to vapourware) that 
> would benefit significantly?
> 
> Cheers
> phx
> 
> 
> 
> On 22/06/2013 18:04, Douglas Theobald wrote:
>> Ian, I really do think we are almost saying the same thing.  Let me try to 
>> clarify.
>> 
>> You say that the Gaussian model is not the "correct" data model, and that 
>> the Poisson is correct.  I more-or-less agree.  If I were being pedantic 
>> (me?) I would say that the Poisson is *more* physically realistic than the 
>> Gaussian, and more realistic in a very important and relevant way --- but in 
>> truth the Poisson model does not account for other physical sources of error 
>> that arise from real crystals and real detectors, such as dark noise and 
>> read noise (that's why I would prefer a gamma distribution).  I also agree 
>> that for x>10 the Gaussian is a good approximation to the Poisson.  I 
>> basically agree with every point you make about the Poisson vs the Gaussian, 
>> except for the following.
>> 
>> The Iobs=Ispot-Iback equation cannot be derived from a Poisson assumption, 
>> except as an approximation when  Ispot > Iback.  It *can* be derived from 
>> the Gaussian assumption (and in fact I think that is probably the *only* 
>> justification it has).   It is true that the difference between two Poissons 
>> can be negative.  It is also true that for moderate # of counts, the 
>> Gaussian is a good approximation to the Poisson.  But we are trying to 
>> estimate Itrue, and both of those points are irrelevant to estimating Itrue 
>> when Ispot < Iback.  Contrary to your assertion, we are not concerned with 
>> differences of Poissonians, only sums.  Here is why:
>> 
>> In the Poisson model you outline, Ispot is the sum of two Poisson variables, 
>> Iback and Iobs.  That means Ispot is also Poisson and can never be negative. 
>>  Again --- the observed data (Ispot) is a *sum*, so that is what we must 
>> deal with.  The likelihood function for this model is:
>> 
>> L(a) = (a+b)^k exp(-a-b)
>> 
>> where 'k' is the # of counts in Ispot, 'a' is the mean of the Iobs Poisson 
>> (i.e., a = Itrue), and 'b' is the   mean of the Iback Poisson.  Of 
>> course k>=0, and both parameters a>0 and b>0.  Our job is to estimate 'a', 
>> Itrue.  Given the likelihood function above, there is no valid estimate of 
>> 'a' that will give a negative value.  For example, the ML estimate of 'a' is 
>> always non-negative.  Specifically, if we assume 'b' is known from 
>> background extrapolation, the ML estimate of 'a' is:
>> 
>> a = k-b   if k>b
>> 
>> a = 0   if k<=b
>> 
>> You can verify this visually by plotting the likelihood function (vs 'a' as 
>> variable) for any combination of k and b you want.  The SD is a bit more 
>> difficult, but it is approximately (a+b)/sqrt(k), where 'a' is now the ML 
>> estimate of 'a'.  
>> 
>> Note that the ML estimate of 'a', when k>b (Ispot>Iback), is equivalent to 
>> Ispot-Iback.  
>> 
>> Now, to restate:  as an estimate of Itrue, Ispot-Iback cannot be derived 
>> from the Poisson model.  In contrast, Ispot-Iback *can* be derived from a 
>> Gaussian model (as the ML and LS estimate of Itrue).  In fact, I'll wager 
>> the Gaussian is the only reasonable model that gives Ispot-Iback as an 
>> estimate of Itrue.  This is why I claim that using Ispot-Iback as an 
>> estimate of Itrue, even when Ispot < Iback, implicitly means you are using a 
>> (non-physical) Gaussian model.  Feel free to prove me wrong --- can you 
>> derive Ispot-Iback, as an estimate of Itrue, from anything besides a 
>> Gaussian?
>> 
>> Cheers,
>> 
>> Douglas
>> 
>> 
>> 
>> 
>> On Sat, Jun 22, 2013 at 12:06 PM, Ian Tickle  wrote:
>> On 21 June 2013 19:45, Douglas Theobald  wrote:
>> 
>> The current way of doing things is summarized by Ed's equation: 
>> Ispot-Iback=Iobs.  Here Ispot is the # of counts in the spot (the area 
>> encompassing the predicted reflection), and Iback is # of counts in the 
>> background (usu. some area around the spot).  Our job is to estimate the 
>> true intensity Itrue.  Ed and others argue that Iobs is a reasonable 
>> estimate of Itrue, but I say it isn't because Itrue can never be negative, 
>> whereas Iobs can.
>> 
>> Now where does the Ispot-Iback=Iobs equation come

Re: [ccp4bb] ctruncate bug?

2013-06-22 Thread Frank von Delft
A fascinating discussion (I've learnt a lot!);  a quick sanity check, 
though:


In what scenarios would these improved estimates make a significant 
difference?


Or rather:  are there any existing programs (as opposed to vapourware) 
that would benefit significantly?


Cheers
phx



On 22/06/2013 18:04, Douglas Theobald wrote:
Ian, I really do think we are almost saying the same thing.  Let me 
try to clarify.


You say that the Gaussian model is not the "correct" data model, and 
that the Poisson is correct.  I more-or-less agree.  If I were being 
pedantic (me?) I would say that the Poisson is *more* physically 
realistic than the Gaussian, and more realistic in a very important 
and relevant way --- but in truth the Poisson model does not account 
for other physical sources of error that arise from real crystals and 
real detectors, such as dark noise and read noise (that's why I would 
prefer a gamma distribution).  I also agree that for x>10 the Gaussian 
is a good approximation to the Poisson.  I basically agree with every 
point you make about the Poisson vs the Gaussian, except for the 
following.


The Iobs=Ispot-Iback equation cannot be derived from a Poisson 
assumption, except as an approximation when Ispot > Iback.  It *can* 
be derived from the Gaussian assumption (and in fact I think that is 
probably the *only* justification it has).   It is true that the 
difference between two Poissons can be negative.  It is also true that 
for moderate # of counts, the Gaussian is a good approximation to the 
Poisson.  But we are trying to estimate Itrue, and both of those 
points are irrelevant to estimating Itrue when Ispot < Iback. 
 Contrary to your assertion, we are not concerned with differences of 
Poissonians, only sums.  Here is why:


In the Poisson model you outline, Ispot is the sum of two Poisson 
variables, Iback and Iobs.  That means Ispot is also Poisson and can 
never be negative.  Again --- the observed data (Ispot) is a *sum*, so 
that is what we must deal with.  The likelihood function for this 
model is:


L(a) = (a+b)^k exp(-a-b)

where 'k' is the # of counts in Ispot, 'a' is the mean of the Iobs 
Poisson (i.e., a = Itrue), and 'b' is the mean of the Iback Poisson. 
 Of course k>=0, and both parameters a>0 and b>0.  Our job is to 
estimate 'a', Itrue.  Given the likelihood function above, there is no 
valid estimate of 'a' that will give a negative value.  For example, 
the ML estimate of 'a' is always non-negative.  Specifically, if we 
assume 'b' is known from background extrapolation, the ML estimate of 
'a' is:


a = k-b   if k>b

a = 0   if k<=b

You can verify this visually by plotting the likelihood function (vs 
'a' as variable) for any combination of k and b you want.  The SD is a 
bit more difficult, but it is approximately (a+b)/sqrt(k), where 'a' 
is now the ML estimate of 'a'.


Note that the ML estimate of 'a', when k>b (Ispot>Iback), is 
equivalent to Ispot-Iback.


Now, to restate:  as an estimate of Itrue, Ispot-Iback cannot be 
derived from the Poisson model.  In contrast, Ispot-Iback *can* be 
derived from a Gaussian model (as the ML and LS estimate of Itrue). 
 In fact, I'll wager the Gaussian is the only reasonable model that 
gives Ispot-Iback as an estimate of Itrue.  This is why I claim that 
using Ispot-Iback as an estimate of Itrue, even when Ispot < Iback, 
implicitly means you are using a (non-physical) Gaussian model.  Feel 
free to prove me wrong --- can you derive Ispot-Iback, as an estimate 
of Itrue, from anything besides a Gaussian?


Cheers,

Douglas




On Sat, Jun 22, 2013 at 12:06 PM, Ian Tickle  wrote:


On 21 June 2013 19:45, Douglas Theobald  wrote:


The current way of doing things is summarized by Ed's
equation: Ispot-Iback=Iobs.  Here Ispot is the # of counts in
the spot (the area encompassing the predicted reflection), and
Iback is # of counts in the background (usu. some area around
the spot).  Our job is to estimate the true intensity Itrue.
 Ed and others argue that Iobs is a reasonable estimate of
Itrue, but I say it isn't because Itrue can never be negative,
whereas Iobs can.

Now where does the Ispot-Iback=Iobs equation come from?  It
implicitly assumes that both Iobs and Iback come from a
Gaussian distribution, in which Iobs and Iback can have
negative values.  Here's the implicit data model:

Ispot = Iobs + Iback

There is an Itrue, to which we add some Gaussian noise and
randomly generate an Iobs.  To that is added some background
noise, Iback, which is also randomly generated from a Gaussian
with a "true" mean of Ibtrue.  This gives us the Ispot, the
measured intensity in our spot.  Given this data model, Ispot
will also have a Gaussian distribution, with mean equal to the
sum of Itrue + Ibtrue.  From the properties of

Re: [ccp4bb] ctruncate bug?

2013-06-22 Thread Douglas Theobald
On Sat, Jun 22, 2013 at 1:56 PM, Ian Tickle  wrote:

> On 22 June 2013 18:04, Douglas Theobald  wrote:
>
>>  --- but in truth the Poisson model does not account for other physical
>> sources of error that arise from real crystals and real detectors, such as
>> dark noise and read noise (that's why I would prefer a gamma distribution).
>>
>
> A photon counter is a digital device, not an analogue one.  It starts at
> zero and adds 1 every time it detects a photon (or what it thinks is a
> photon).  Once added, it is physically impossible for it to subtract 1 from
> its accumulated count: it contains no circuit to do that.  It can certainly
> miss photons, so you end up with less than you should, and it can certainly
> 'see' photons where there were none (e.g. from instrumental noise), so you
> end up with more than you should.  However once a count has been
> accumulated in the digital memory it stays there until the memory is
> cleared for the next measurement, and you can never end up with less than
> that accumulated count and in particular not less than zero; the bits of
> memory where the counts are accumulated are simply not programmed to return
> negative numbers.  It has nothing to do with whether the crystal is real or
> not, all that matters is that photons from "somewhere" are arriving at and
> being counted by the detector.  The accumulated counts at any moment in
> time have a Poisson distribution since the photons arrive completely
> randomly in time.
>

I might add that if you are correct --- that the naive Poisson model is
appropriate (perhaps true for the latest and greatest detectors, evidently
Pilatus has no read-out noise or dark current) --- then the ML solution I
outlined is a good one (much better than the crude Ispot-Iback background
subtraction), and it provides rigorous SD estimates too.
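
A toy simulation under the idealized Poisson model being debated here (made-up 
signal and background rates, plain numpy, nothing to do with ctruncate or Phaser) 
illustrates both sides at once: clamping Ispot-Iback at zero never increases the 
per-reflection error, but it does inflate the average, as Ed Pozharski points out 
elsewhere in the thread:

import numpy as np

rng = np.random.default_rng(2)
a_true, b_true = 3.0, 25.0                 # made-up weak signal and strong background rates
n = 200_000

k = rng.poisson(a_true + b_true, size=n)   # Ispot: counts in the spot (signal + background)
b = b_true                                 # background mean assumed known, as in the ML argument

raw = k - b                                # crude Ispot - Iback estimate (can be negative)
ml = np.maximum(raw, 0.0)                  # ML estimate under the Poisson model: max(k - b, 0)

print("RMS error, raw subtraction:", np.sqrt(np.mean((raw - a_true) ** 2)))
print("RMS error, clamped ML     :", np.sqrt(np.mean((ml - a_true) ** 2)))
print("mean of raw (unbiased)    :", raw.mean())
print("mean of ML (biased high)  :", ml.mean())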


Re: [ccp4bb] ctruncate bug?

2013-06-22 Thread Ian Tickle
On 22 June 2013 18:04, Douglas Theobald  wrote:

> Ian, I really do think we are almost saying the same thing.  Let me try to
> clarify.
>

I agree, but still only "almost"!


>  --- but in truth the Poisson model does not account for other physical
> sources of error that arise from real crystals and real detectors, such as
> dark noise and read noise (that's why I would prefer a gamma distribution).
>

A photon counter is a digital device, not an analogue one.  It starts at
zero and adds 1 every time it detects a photon (or what it thinks is a
photon).  Once added, it is physically impossible for it to subtract 1 from
its accumulated count: it contains no circuit to do that.  It can certainly
miss photons, so you end up with less than you should, and it can certainly
'see' photons where there were none (e.g. from instrumental noise), so you
end up with more than you should.  However once a count has been
accumulated in the digital memory it stays there until the memory is
cleared for the next measurement, and you can never end up with less than
that accumulated count and in particular not less than zero; the bits of
memory where the counts are accumulated are simply not programmed to return
negative numbers.  It has nothing to do with whether the crystal is real or
not, all that matters is that photons from "somewhere" are arriving at and
being counted by the detector.  The accumulated counts at any moment in
time have a Poisson distribution since the photons arrive completely
randomly in time.


> In the Poisson model you outline, Ispot is the sum of two Poisson
> variables, Iback and Iobs.  That means Ispot is also Poisson and can never
> be negative.  Again --- the observed data (Ispot) is a *sum*, so that is
> what we must deal with.  The likelihood function for this model is:
>

No, Iobs is _not_ a Poisson variable, indeed I never said it was: I
explained that it's the difference of 2 Poissonians Ispot and Iback and
therefore approximately Gaussian (please re-read my previous email).  So
the sum of Poissonians does not come into it.  The only Poissonian variates
here are Ispot and Iback.  Neither is the background under Ispot a
Poissonian (let's call it Iback', so strictly speaking Ispot = Iobs +
Iback' and Iback is an estimate of Iback', quite possibly with a non-random
error).  This is because Iobs and Iback' are not observable photon counts.
QM does not allow you to separate Ispot into separate photon counts,
because photons are indistinguishable.  If the photons were labelled
'spot', 'back' and 'obs' then you could count Iobs independently and it
would be a Poissonian (and that would indeed solve all our problems!).
But, sadly, photons are indistinguishable, they don't arrive with handy
labels!
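
A quick numerical sketch of this point (plain numpy, with made-up rates for a weak 
reflection on a strong background): both counts are non-negative Poissonians, yet 
their difference is often negative, and its spread matches the var(count) = count 
rule, i.e. variance roughly Ispot + Iback:

import numpy as np

rng = np.random.default_rng(3)
true_signal, true_back = 5.0, 20.0                          # made-up rates
ispot = rng.poisson(true_signal + true_back, size=100_000)  # spot count: Bragg + background under the spot
iback = rng.poisson(true_back, size=100_000)                # independent estimate of the background
iobs = ispot - iback                                        # traditional Iobs = Ispot - Iback

print("fraction of negative Iobs :", np.mean(iobs < 0))
print("mean, variance of Iobs    :", iobs.mean(), iobs.var())
# var(Iobs) = var(Ispot) + var(Iback) = (signal + back) + back
print("expected mean and variance:", true_signal, true_signal + 2 * true_back)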

Does any of that change your view?

Cheers

-- Ian


Re: [ccp4bb] ctruncate bug?

2013-06-22 Thread Douglas Theobald
On Sat, Jun 22, 2013 at 1:04 PM, Douglas Theobald wrote:

> Feel free to prove me wrong --- can you derive Ispot-Iback, as an estimate
> of Itrue, from anything besides a Gaussian?
>

OK, I'll prove myself wrong.   Ispot-Iback can be derived as an estimate of
Itrue, even when Ispot < Iback

Re: [ccp4bb] ctruncate bug?

2013-06-22 Thread Douglas Theobald
Ian, I really do think we are almost saying the same thing.  Let me try to
clarify.

You say that the Gaussian model is not the "correct" data model, and that
the Poisson is correct.  I more-or-less agree.  If I were being pedantic
(me?) I would say that the Poisson is *more* physically realistic than the
Gaussian, and more realistic in a very important and relevant way --- but
in truth the Poisson model does not account for other physical sources of
error that arise from real crystals and real detectors, such as dark noise
and read noise (that's why I would prefer a gamma distribution).  I also
agree that for x>10 the Gaussian is a good approximation to the Poisson.  I
basically agree with every point you make about the Poisson vs the
Gaussian, except for the following.

The Iobs=Ispot-Iback equation cannot be derived from a Poisson assumption,
except as an approximation when  Ispot > Iback.  It *can* be derived from
the Gaussian assumption (and in fact I think that is probably the *only*
justification it has).   It is true that the difference between two
Poissons can be negative.  It is also true that for moderate # of counts,
the Gaussian is a good approximation to the Poisson.  But we are trying to
estimate Itrue, and both of those points are irrelevant to estimating Itrue
when Ispot < Iback.  Contrary to your assertion, we are not concerned with
differences of Poissonians, only sums.  Here is why:

In the Poisson model you outline, Ispot is the sum of two Poisson
variables, Iback and Iobs.  That means Ispot is also Poisson and can never
be negative.  Again --- the observed data (Ispot) is a *sum*, so that is
what we must deal with.  The likelihood function for this model is:

L(a) = (a+b)^k exp(-a-b)

where 'k' is the # of counts in Ispot, 'a' is the mean of the Iobs Poisson
(i.e., a = Itrue), and 'b' is the mean of the Iback Poisson.  Of course
k>=0, and both parameters a>0 and b>0.  Our job is to estimate 'a', Itrue.
 Given the likelihood function above, there is no valid estimate of 'a'
that will give a negative value.  For example, the ML estimate of 'a' is
always non-negative.  Specifically, if we assume 'b' is known from
background extrapolation, the ML estimate of 'a' is:

a = k-b   if k>b

a = 0   if k<=b

You can verify this visually by plotting the likelihood function (vs 'a' as
variable) for any combination of k and b you want.  The SD is a bit more
difficult, but it is approximately (a+b)/sqrt(k), where 'a' is now the ML
estimate of 'a'.

Note that the ML estimate of 'a', when k>b (Ispot>Iback), is equivalent to
Ispot-Iback.
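
A short numerical check of this (plain numpy; 'b' treated as known, exactly as in 
the derivation above) confirms that the likelihood L(a) = (a+b)^k exp(-a-b) peaks 
at max(k-b, 0), both for a spot above background and for one below it:

import numpy as np

def loglik(a, k, b):
    # log of L(a) = (a+b)^k exp(-a-b), dropping constants that do not depend on a
    return k * np.log(a + b) - (a + b)

for k, b in [(40, 25), (15, 25)]:              # one spot above background, one below
    a_grid = np.linspace(0.0, 60.0, 600_001)   # fine grid over the allowed range a >= 0
    a_numeric = a_grid[np.argmax(loglik(a_grid, k, b))]
    print(f"k={k}, b={b}:  numeric ML = {a_numeric:.3f},  max(k-b, 0) = {max(k - b, 0)}")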

Now, to restate:  as an estimate of Itrue, Ispot-Iback cannot be derived
from the Poisson model.  In contrast, Ispot-Iback *can* be derived from a
Gaussian model (as the ML and LS estimate of Itrue).  In fact, I'll wager
the Gaussian is the only reasonable model that gives Ispot-Iback as an
estimate of Itrue.  This is why I claim that using Ispot-Iback as an
estimate of Itrue, even when Ispot < Iback, implicitly means you are using a
(non-physical) Gaussian model.  Feel free to prove me wrong --- can you derive
Ispot-Iback, as an estimate of Itrue, from anything besides a Gaussian?

Cheers,

Douglas



On Sat, Jun 22, 2013 at 12:06 PM, Ian Tickle  wrote:

> On 21 June 2013 19:45, Douglas Theobald  wrote:
>
>>
>> The current way of doing things is summarized by Ed's equation:
>> Ispot-Iback=Iobs.  Here Ispot is the # of counts in the spot (the area
>> encompassing the predicted reflection), and Iback is # of counts in the
>> background (usu. some area around the spot).  Our job is to estimate the
>> true intensity Itrue.  Ed and others argue that Iobs is a reasonable
>> estimate of Itrue, but I say it isn't because Itrue can never be negative,
>> whereas Iobs can.
>>
>> Now where does the Ispot-Iback=Iobs equation come from?  It implicitly
>> assumes that both Iobs and Iback come from a Gaussian distribution, in
>> which Iobs and Iback can have negative values.  Here's the implicit data
>> model:
>>
>> Ispot = Iobs + Iback
>>
>> There is an Itrue, to which we add some Gaussian noise and randomly
>> generate an Iobs.  To that is added some background noise, Iback, which is
>> also randomly generated from a Gaussian with a "true" mean of Ibtrue.  This
>> gives us the Ispot, the measured intensity in our spot.  Given this data
>> model, Ispot will also have a Gaussian distribution, with mean equal to the
>> sum of Itrue + Ibtrue.  From the properties of Gaussians, then, the ML
>> estimate of Itrue will be Ispot-Iback, or Iobs.
>>
>
> Douglas, sorry I still disagree with your model.  Please note that I do
> actually support your position, that Ispot-Iback is not the best estimate
> of Itrue.  I stress that I am not arguing against this conclusion, merely
> (!) with your data model, i.e. you are arriving at the correct conclusion
> despite using the wrong model!  So I think it's worth clearing that up.
>
> First off, I can assure you that there is no assumption, either implicit
> or explicit, that Ispot and Iback come from a Gaussian distribution.  They
> are both essentially measured photon counts (perhaps indirectly), so it is
> logically impossible that they could ever be negative, even with any
> experimental error you can i

Re: [ccp4bb] ctruncate bug?

2013-06-22 Thread Ian Tickle
On 21 June 2013 19:45, Douglas Theobald  wrote:

>
> The current way of doing things is summarized by Ed's equation:
> Ispot-Iback=Iobs.  Here Ispot is the # of counts in the spot (the area
> encompassing the predicted reflection), and Iback is # of counts in the
> background (usu. some area around the spot).  Our job is to estimate the
> true intensity Itrue.  Ed and others argue that Iobs is a reasonable
> estimate of Itrue, but I say it isn't because Itrue can never be negative,
> whereas Iobs can.
>
> Now where does the Ispot-Iback=Iobs equation come from?  It implicitly
> assumes that both Iobs and Iback come from a Gaussian distribution, in
> which Iobs and Iback can have negative values.  Here's the implicit data
> model:
>
> Ispot = Iobs + Iback
>
> There is an Itrue, to which we add some Gaussian noise and randomly
> generate an Iobs.  To that is added some background noise, Iback, which is
> also randomly generated from a Gaussian with a "true" mean of Ibtrue.  This
> gives us the Ispot, the measured intensity in our spot.  Given this data
> model, Ispot will also have a Gaussian distribution, with mean equal to the
> sum of Itrue + Ibtrue.  From the properties of Gaussians, then, the ML
> estimate of Itrue will be Ispot-Iback, or Iobs.
>

Douglas, sorry I still disagree with your model.  Please note that I do
actually support your position, that Ispot-Iback is not the best estimate
of Itrue.  I stress that I am not arguing against this conclusion, merely
(!) with your data model, i.e. you are arriving at the correct conclusion
despite using the wrong model!  So I think it's worth clearing that up.

First off, I can assure you that there is no assumption, either implicit or
explicit, that Ispot and Iback come from a Gaussian distribution.  They are
both essentially measured photon counts (perhaps indirectly), so it is
logically impossible that they could ever be negative, even with any
experimental error you can imagine.  The concept of a photon counter
counting a negative number of photons is simply a logical impossibility (it
would be like counting the coins in your pocket and coming up with a
negative number, even allowing for mistakes in counting!).  This
immediately rules out the idea that they are Gaussian.  Photon counting
where the photons appear completely randomly in time (essentially as a
consequence of the Heisenberg Uncertainly Principle) obeys a Poisson
distribution.  In fact we routinely estimate the standard uncertainties of
Ispot & Iback on the basis that they are Poissonian, i.e. using var(count)
= count.  That is hardly a Gaussian assumption for the uncertainty!

Here is the correct data model: there is a true Ispot which is (or is
proportional to) the diffracted energy from the _sum_ of the Bragg
diffraction spot and the background under the spot (this is not the same as
Iback).  This energy ends up as individual photons being counted at the
detector (I know there's a complication that some detectors are not
actually photon counters, but the result is the same: you end up with a
photon count, or something proportional to it).  However photons are
indistinguishable (they do not carry labels telling us where they came
from), so quantum mechanics doesn't even allow us to talk about photons
coming from different places: all we see are indistinguishable photons
arriving at the detector and literally being counted.  Therefore the
estimated Ispot being the total number of photons counted from Bragg +
background has a Poisson distribution.  There will be some experimental
error associated with the random-in-time appearance of photons and also
instrumental errors (e.g we might simply fail to count some of the photons,
or we might count extra photons coming from somewhere else), but whatever
the source of the error there is no way that the measured count of photons
can ever be negative.

Now obviously we want to estimate the background under the spot but we
can't do that by looking at the spot itself (because the photons are
indistinguishable).  So completely independently of the Ispot measurement
we look at a nearby representative (hopefully!) area where there are no
Bragg spots and count that also: there is a true Iback associated with this
and our estimate of it from counting photons.  Again, being a photon count
it is also Poissonian and will have some experimental error associated with
it, but regardless of what the error is Iback, like Ispot, can never be
negative.

Now we have two Poissonian variables Ispot & Iback and traditionally we
perform the calculation Iobs = Ispot - Iback (whatever meaning you want to
attach to Iobs).  Provided Ispot and Iback are 'sufficiently' large numbers
a Poisson distribution can be approximated by a Gaussian with the same mean
and standard deviation, but with the proviso that the variate of this
approximate Gaussian can never be negative.  In fact you only need about 10
counts or more in _both_ Ispot and Iback for the approximation to be pretty
good.
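
One quick way to see why roughly 10 counts is where the Gaussian approximation 
starts to become reasonable (a toy check in plain numpy, not tied to any program 
in this thread): the skewness of a Poisson distribution is 1/sqrt(mean), so the 
asymmetry that distinguishes it from a Gaussian fades as the counts grow:

import numpy as np

rng = np.random.default_rng(4)
for mean in (2, 10, 100):
    sample = rng.poisson(mean, size=1_000_000)
    skew = np.mean(((sample - sample.mean()) / sample.std()) ** 3)   # sample skewness
    print(f"mean = {mean:3d}:  skewness = {skew:.3f}   (theory: 1/sqrt(mean) = {1 / np.sqrt(mean):.3f})")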

Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Terwilliger, Thomas C
I hope I am not duplicating too much of this fascinating discussion with these 
comments:  perhaps the main reason there is confusion about what to do is that 
neither F nor I is really the most suitable thing to use in refinement.  As 
pointed out several times in different ways, we don't measure F or I, we only 
measure counts on a detector.  As a convenience, we "process" our diffraction 
images to estimate I or F and their uncertainties and model these uncertainties 
as simple functions (e.g., a Gaussian).  There is no need in principle to do 
that, and if we were to refine instead against the raw image data these issues 
about positivity would disappear and our structures might even be a little 
better.

Our standard procedure is to estimate F or I from counts on the detector, then 
to use these estimates of F or I in refinement.  This is not so easy to do 
right because F or I contain many terms coming from many pixels and it is hard 
to model their statistics in detail.  Further, attempts we make to estimate 
either F or I as physically plausible values (e.g., using the fact that they 
are not negative) will generally be biased (the values after correction will 
generally be systematically low or systematically high, as is true for the 
French and Wilson correction and as would be true for the truncation of I at 
zero or above).

Randy's method for intensity refinement is an improvement because the 
statistics are treated more fully than just using an estimate of F or I and 
assuming its uncertainty has a simple distribution.  So why not avoid all the 
problems with modeling the statistics of processed data and instead refine 
against the raw data.  From the structural model you calculate F, from F and a 
detailed model of the experiment (the same model that is currently used in data 
processing) you calculate the counts expected on each pixel. Then you calculate 
the likelihood of the data given your models of the structure and of the 
experiment.  This would have lots of benefits because it would allow improved 
descriptions of the experiment (decay, absorption, detector sensitivity, 
diffuse scattering and other "background" on the images, on and on) that 
could lead to more accurate structures in the end.  Of course there are some 
minor issues about putting all this in computer memory for refinement

-Tom T
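
As a toy sketch of the forward calculation described above (hypothetical function 
names and made-up numbers; the "experiment model" here is reduced to a scale factor, 
a flat background and a fixed spot profile, nothing like the full model of decay, 
absorption and detector response): predict the expected counts on a small pixel 
patch from a trial |F| and score the observed counts with a Poisson likelihood:

import numpy as np

def expected_counts(F, scale, background, profile):
    # Toy forward model for one reflection: a scaled |F|^2 spread over a fixed
    # spot profile, sitting on a flat background (all hypothetical).
    return scale * F ** 2 * profile + background

def poisson_loglik(counts, lam):
    # log P(counts | lam), dropping the k! term that does not depend on the model
    return np.sum(counts * np.log(lam) - lam)

rng = np.random.default_rng(5)
g = np.exp(-0.5 * ((np.arange(9) - 4) / 1.5) ** 2)     # 9x9 Gaussian spot profile
profile = np.outer(g, g)
profile /= profile.sum()

observed = rng.poisson(expected_counts(12.0, 0.5, 3.0, profile))   # simulate a weak spot with |F| = 12

for F_trial in (6.0, 12.0, 24.0):
    ll = poisson_loglik(observed, expected_counts(F_trial, 0.5, 3.0, profile))
    print(f"|F| = {F_trial:5.1f}   log-likelihood = {ll:.1f}")      # should usually peak near |F| = 12

In a real implementation this likelihood (and its derivatives) would be accumulated 
over every pixel of every image, which is exactly the memory issue mentioned above.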

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Phil 
[p...@mrc-lmb.cam.ac.uk]
Sent: Friday, June 21, 2013 2:50 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] ctruncate bug?

However you decide to argue the point, you must consider _all_ the observations 
of a reflection (replicates and symmetry related) together when you infer Itrue 
or F etc, otherwise you will bias the result even more. Thus you cannot 
(easily) do it during integration

Phil

Sent from my iPad

On 21 Jun 2013, at 20:30, Douglas Theobald  wrote:

> On Jun 21, 2013, at 2:48 PM, Ed Pozharski  wrote:
>
>> Douglas,
>>>> Observed intensities are the best estimates that we can come up with in an 
>>>> experiment.
>>> I also agree with this, and this is the clincher.  You are arguing that 
>>> Ispot-Iback=Iobs is the best estimate we can come up with.  I claim that is 
>>> absurd.  How are you quantifying "best"?  Usually we have some sort of 
>>> discrepancy measure between true and estimate, like RMSD, mean absolute 
>>> distance, log distance, or somesuch.  Here is the important point --- by 
>>> any measure of discrepancy you care to use, the person who estimates Iobs 
>>> as 0 when Iback>Ispot will *always*, in *every case*, beat the person who 
>>> estimates Iobs with a negative value.   This is an indisputable fact.
>>
>> First off, you may find it useful to avoid such words as absurd and 
>> indisputable fact.  I know political correctness may be sometimes overrated, 
>> but if you actually plan to have meaningful discussion, let's assume that 
>> everyone responding to your posts is just trying to help figure this out.
>
> I apologize for offending and using the strong words --- my intention was not 
> to offend.  This is just how I talk when brainstorming with my colleagues 
> around a blackboard, but of course then you can see that I smile when I say 
> it.
>
>> To address your point, you are right that J=0 is closer to "true intensity" 
>>> than a negative value.  The problem is that we are not after a single 
>> intensity, but rather all of them, as they all contribute to electron 
>> density reconstruction.  If you replace negative Iobs with E(J), you would 
>> systematically inflate the averages, which may turn problematic in some 
>> cases.
>
> So, I get the point.  But even then, using any reas

Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Phil
However you decide to argue the point, you must consider _all_ the observations 
of a reflection (replicates and symmetry related) together when you infer Itrue 
or F etc, otherwise you will bias the result even more. Thus you cannot 
(easily) do it during integration

Phil

Sent from my iPad

On 21 Jun 2013, at 20:30, Douglas Theobald  wrote:

> On Jun 21, 2013, at 2:48 PM, Ed Pozharski  wrote:
> 
>> Douglas,
>>>> Observed intensities are the best estimates that we can come up with in an 
>>>> experiment.
>>> I also agree with this, and this is the clincher.  You are arguing that 
>>> Ispot-Iback=Iobs is the best estimate we can come up with.  I claim that is 
>>> absurd.  How are you quantifying "best"?  Usually we have some sort of 
>>> discrepancy measure between true and estimate, like RMSD, mean absolute 
>>> distance, log distance, or somesuch.  Here is the important point --- by 
>>> any measure of discrepancy you care to use, the person who estimates Iobs 
>>> as 0 when Iback>Ispot will *always*, in *every case*, beat the person who 
>>> estimates Iobs with a negative value.   This is an indisputable fact.
>> 
>> First off, you may find it useful to avoid such words as absurd and 
>> indisputable fact.  I know political correctness may be sometimes overrated, 
>> but if you actually plan to have meaningful discussion, let's assume that 
>> everyone responding to your posts is just trying to help figure this out.
> 
> I apologize for offending and using the strong words --- my intention was not 
> to offend.  This is just how I talk when brainstorming with my colleagues 
> around a blackboard, but of course then you can see that I smile when I say 
> it.  
> 
>> To address your point, you are right that J=0 is closer to "true intensity" 
>> than a negative value.  The problem is that we are not after a single 
>> intensity, but rather all of them, as they all contribute to electron 
>> density reconstruction.  If you replace negative Iobs with E(J), you would 
>> systematically inflate the averages, which may turn problematic in some 
>> cases.  
> 
> So, I get the point.  But even then, using any reasonable criterion, the 
> whole estimated dataset will be closer to the true data if you set all 
> "negative" intensity estimates to 0.  
> 
>> It is probably better to stick with "raw intensities" and construct 
>> theoretical predictions properly to account for their properties.
>> 
>> What I was trying to tell you is that observed intensities is what we get 
>> from experiment.  
> 
> But they are not what you get from the detector.  The detector spits out a 
> positive value for what's inside the spot.  It is we, as human agents, who 
> later manipulate and massage that data value by subtracting the background 
> estimate.  A value that has been subjected to a crude background subtraction 
> is not the raw experimental value.  It has been modified, and there must be 
> some logic to why we massage the data in that particular manner.  I agree, of 
> course, that the background should be accounted for somehow.  But why just 
> subtract it away?  There are other ways to massage the data --- see my other 
> post to Ian.  My argument is that however we massage the experimentally 
> observed value should be physically informed, and allowing negative intensity 
> estimates violates the basic physics.  
> 
> [snip]
> 
 These observed intensities can be negative because while their true 
 underlying value is positive, random errors may result in Iback>Ispot.  
 There is absolutely nothing unphysical here.
>>> Yes there is.  The only way you can get a negative estimate is to make 
>>> unphysical assumptions.  Namely, the estimate Ispot-Iback=Iobs assumes that 
>>> both the true value of I and the background noise come from a Gaussian 
>>> distribution that is allowed to have negative values.  Both of those 
>>> assumptions are unphysical.
>> 
>> See, I have a problem with this.  Both common sense and laws of physics 
>> dictate that number of photons hitting spot on a detector is a positive 
>> number.  There is no law of physics that dictates that under no 
>> circumstances there could be Ispot < Iback. 
> That's not what I'm saying.  Sure, Ispot can be less than Iback randomly.  
> That does not mean we have to estimate the detected intensity as negative, 
> after accounting for background.
> 
>> Yes, E(Ispot)>=E(Iback).  Yes, E(Ispot-Iback)>=0.  But P(Ispot-Iback<0)>0, 
>> and therefore experimental sampling of Ispot-Iback is bound to occasionally 
>> produce negative values.  What law of physics is broken when for a given 
>> reflection total number of photons in spot pixels is less than total number 
>> of photons in equal number of pixels in the surrounding background mask?
>> 
>> Cheers,
>> 
>> Ed.
>> 
>> -- 
>> Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
>>   Julian, King of Lemurs


Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Douglas Theobald
On Jun 21, 2013, at 2:52 PM, James Holton  wrote:

> Yes, but the DIFFERENCE between two Poisson-distributed values can be 
> negative.  This is, unfortunately, what you get when you subtract the 
> background out from under a spot.  Perhaps this is the source of confusion 
> here?

Maybe, but if you assume Poisson background and intensities, the ML estimate 
when background > measured intensity is not negative, nor is it the difference 
Ispot-Iback.  The ML estimate is 0 (with a finite, non-zero SD; the smaller the 
Ispot/Iback ratio, the smaller the SD).
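
For concreteness, a minimal numerical sketch of that claim (Python; the counts, the rates 
and the brute-force grid search are purely illustrative, and the background rate is 
treated as known, which is a simplification):

    import numpy as np

    def poisson_loglik(lam_s, n_spot, lam_back):
        # log P(n_spot | lam_s + lam_back), dropping the constant -log(n_spot!)
        rate = lam_s + lam_back
        return n_spot * np.log(rate) - rate

    # weak reflection: the background rate exceeds the spot count
    n_spot, lam_back = 50, 60.0
    grid = np.linspace(0.0, 100.0, 100001)   # candidate signal rates, constrained >= 0
    ml = grid[np.argmax(poisson_loglik(grid, n_spot, lam_back))]
    print(ml)                                 # -> 0.0, not n_spot - lam_back = -10

The unconstrained optimum is n_spot - lam_back, so for strong reflections this reduces 
to the usual subtraction; the non-negativity constraint only bites when Iback > Ispot.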

> On Fri, Jun 21, 2013 at 11:34 AM, Douglas Theobald  
> wrote:
> I kinda think we're saying the same thing, sort of.
> 
> You don't like the Gaussian assumption, and neither do I.  If you make the 
> reasonable Poisson assumptions, then you don't get the Ispot-Iback=Iobs for 
> the best estimate of Itrue.  Except as an approximation for large values, but 
> we are talking about the case when Iback>Ispot, where the Gaussian 
> approximation to the Poisson no longer holds.  The sum of two Poisson 
> variates is also Poisson, which also can never be negative, unlike the 
> Gaussian.
> 
> So I reiterate: the Ispot-Iback=Iobs equation assumes Gaussians and hence 
> negativity.  The Ispot-Iback=Iobs does not follow from a Poisson assumption.
> 
> 
> On Jun 21, 2013, at 1:13 PM, Ian Tickle  wrote:
> 
> > On 21 June 2013 17:10, Douglas Theobald  wrote:
> >> Yes there is.  The only way you can get a negative estimate is to make 
> >> unphysical assumptions.  Namely, the estimate Ispot-Iback=Iobs assumes 
> >> that both the true value of I and the background noise come from a 
> >> Gaussian distribution that is allowed to have negative values.  Both of 
> >> those assumptions are unphysical.
> >
> > Actually that's not correct: Ispot and Iback are both assumed to come from 
> > a _Poisson_ distribution which by definition is zero for negative values of 
> > its argument (you can't have a negative number of photons), so are _not_ 
> > allowed to have negative values.  For large values of the argument (in fact 
> > the approximation is pretty good even for x ~ 10) a Poisson approximates to 
> > a Gaussian, and then of course the difference Ispot-Iback is also 
> > approximately Gaussian.
> >
> > But I think that doesn't affect your argument.
> >
> > Cheers
> >
> > -- Ian
> 


Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Douglas Theobald
On Jun 21, 2013, at 2:48 PM, Ed Pozharski  wrote:

> Douglas,
>>> Observed intensities are the best estimates that we can come up with in an 
>>> experiment.
>> I also agree with this, and this is the clincher.  You are arguing that 
>> Ispot-Iback=Iobs is the best estimate we can come up with.  I claim that is 
>> absurd.  How are you quantifying "best"?  Usually we have some sort of 
>> discrepancy measure between true and estimate, like RMSD, mean absolute 
>> distance, log distance, or somesuch.  Here is the important point --- by any 
>> measure of discrepancy you care to use, the person who estimates Iobs as 0 
>> when Iback>Ispot will *always*, in *every case*, beat the person who 
>> estimates Iobs with a negative value.   This is an indisputable fact.
> 
> First off, you may find it useful to avoid such words as absurd and 
> indisputable fact.  I know political correctness may be sometimes overrated, 
> but if you actually plan to have meaningful discussion, let's assume that 
> everyone responding to your posts is just trying to help figure this out.

I apologize for offending and using the strong words --- my intention was not 
to offend.  This is just how I talk when brainstorming with my colleagues 
around a blackboard, but of course then you can see that I smile when I say it. 
 

> To address your point, you are right that J=0 is closer to "true intensity" 
> than a negative value.  The problem is that we are not after a single 
> intensity, but rather all of them, as they all contribute to electron density 
> reconstruction.  If you replace negative Iobs with E(J), you would 
> systematically inflate the averages, which may turn problematic in some 
> cases.  

So, I get the point.  But even then, using any reasonable criterion, the whole 
estimated dataset will be closer to the true data if you set all "negative" 
intensity estimates to 0.  
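
A toy simulation of the two positions (Python; the exponential "true" intensities and 
Gaussian errors are assumed purely for illustration): clipping at zero does bring every 
estimate at least as close to the truth, as argued here, but it also inflates the 
average, which is exactly the inflation objection quoted above.

    import numpy as np

    rng = np.random.default_rng(0)
    I_true = rng.exponential(5.0, 100000)                # assumed Wilson-like true intensities
    I_obs = I_true + rng.normal(0.0, 10.0, I_true.size)  # noisy Ispot - Iback style estimates
    I_clip = np.clip(I_obs, 0.0, None)                   # negative estimates replaced by zero

    rmse = lambda est: np.sqrt(np.mean((est - I_true) ** 2))
    print(rmse(I_obs), rmse(I_clip))     # the clipped values are closer to the truth...
    print(I_obs.mean(), I_clip.mean())   # ...but I_obs is unbiased (~5) while the clipped mean is inflated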

> It is probably better to stick with "raw intensities" and construct 
> theoretical predictions properly to account for their properties.
> 
> What I was trying to tell you is that observed intensities is what we get 
> from experiment.  

But they are not what you get from the detector.  The detector spits out a 
positive value for what's inside the spot.  It is we, as human agents, who 
later manipulate and massage that data value by subtracting the background 
estimate.  A value that has been subjected to a crude background subtraction is 
not the raw experimental value.  It has been modified, and there must be some 
logic to why we massage the data in that particular manner.  I agree, of 
course, that the background should be accounted for somehow.  But why just 
subtract it away?  There are other ways to massage the data --- see my other 
post to Ian.  My argument is that however we massage the experimentally 
observed value should be physically informed, and allowing negative intensity 
estimates violates the basic physics.  

[snip]

>>> These observed intensities can be negative because while their true 
>>> underlying value is positive, random errors may result in Iback>Ispot.  
>>> There is absolutely nothing unphysical here.
>> Yes there is.  The only way you can get a negative estimate is to make 
>> unphysical assumptions.  Namely, the estimate Ispot-Iback=Iobs assumes that 
>> both the true value of I and the background noise come from a Gaussian 
>> distribution that is allowed to have negative values.  Both of those 
>> assumptions are unphysical.
> 
> See, I have a problem with this.  Both common sense and laws of physics 
> dictate that number of photons hitting spot on a detector is a positive 
> number.  There is no law of physics that dictates that under no circumstances 
> there could be Ispot < Iback.

That's not what I'm saying.  Sure, Ispot can be less than Iback randomly.  That 
does not mean we have to estimate the detected intensity as negative, after 
accounting for background.

> Yes, E(Ispot)>=E(Iback).  Yes, E(Ispot-Iback)>=0.  But P(Ispot-Iback<0)>0, 
> and therefore experimental sampling of Ispot-Iback is bound to occasionally 
> produce negative values.  What law of physics is broken when for a given 
> reflection total number of photons in spot pixels is less than total number 
> of photons in equal number of pixels in the surrounding background mask?
> 
> Cheers,
> 
> Ed.
> 
> -- 
> Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
>Julian, King of Lemurs


Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Ed Pozharski

Douglas,

Observed intensities are the best estimates that we can come up with in an 
experiment.

I also agree with this, and this is the clincher.  You are arguing that Ispot-Iback=Iobs is 
the best estimate we can come up with.  I claim that is absurd.  How are you quantifying 
"best"?  Usually we have some sort of discrepancy measure between true and 
estimate, like RMSD, mean absolute distance, log distance, or somesuch.  Here is the 
important point --- by any measure of discrepancy you care to use, the person who estimates 
Iobs as 0 when Iback>Ispot will *always*, in *every case*, beat the person who estimates 
Iobs with a negative value.   This is an indisputable fact.


First off, you may find it useful to avoid such words as absurd and 
indisputable fact.  I know political correctness may be sometimes 
overrated, but if you actually plan to have meaningful discussion, let's 
assume that everyone responding to your posts is just trying to help 
figure this out.


To address your point, you are right that J=0 is closer to "true 
intensity" than a negative value.  The problem is that we are not after 
a single intensity, but rather all of them, as they all contribute to 
electron density reconstruction.  If you replace negative Iobs with 
E(J), you would systematically inflate the averages, which may turn 
problematic in some cases.  It is probably better to stick with "raw 
intensities" and construct theoretical predictions properly to account 
for their properties.


What I was trying to tell you is that observed intensities is what we 
get from experiment.  They may be negative, and there is nothing 
unphysical about it.  Then you build a theoretical estimate of observed 
intensities, and if you do it right (i.e. by including experimental 
errors), they will actually have some probability of being negative.

This background has to be subtracted and what is perhaps the most useful form 
of observation is Ispot-Iback=Iobs.

How can that be the most useful form, when 0 is always a better estimate than a 
negative value, by any criterion?


Given your propensity to refer to what others might say as absurd, I am 
tempted to encourage *you* to come up with a better estimate. 
Nevertheless, let me try to clarify my point.


What is measured in the experiment is Ispot.  It contains Iback which 
our theoretical models cannot possibly account for (because we have no 
information at the refinement stage about crystal shape and other 
parameters that define background).  Strategy that has been in use for 
decades is to obtain estimates of Iback from pixels surrounding the 
integration spot.  I hope you find that reasonable.


Once we have Iback estimated, Ispot-Iback becomes Iobs - observed 
intensity.  There is no need to convert that value simply to avoid bad 
feeling brought by negative values.  Correctly formulated theoretical 
model predicts Iobs and accounts for error in it.


Let me state this again - Iobs are not true intensities and not 
estimates of true intensities.  They are experimental values sampling 
Ispot-Iback.  These can be negative.  If a theoretical model that 
approximates Iobs does not allow for negative Iobs, the model is flawed.

These observed intensities can be negative because while their true underlying 
value is positive, random errors may result in Iback>Ispot.  There is absolutely 
nothing unphysical here.

Yes there is.  The only way you can get a negative estimate is to make 
unphysical assumptions.  Namely, the estimate Ispot-Iback=Iobs assumes that 
both the true value of I and the background noise come from a Gaussian 
distribution that is allowed to have negative values.  Both of those 
assumptions are unphysical.


See, I have a problem with this.  Both common sense and laws of physics 
dictate that number of photons hitting spot on a detector is a positive 
number.  There is no law of physics that dictates that under no 
circumstances there could be Ispot < Iback.  Yes, E(Ispot)>=E(Iback).  
Yes, E(Ispot-Iback)>=0.  But P(Ispot-Iback<0)>0, and therefore 
experimental sampling of Ispot-Iback is bound to occasionally produce 
negative values.  What law of physics is broken when for a given 
reflection total number of photons in spot pixels is less than total 
number of photons in equal number of pixels in the surrounding 
background mask?


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Douglas Theobald
> ...to having a negative measurement (as the measurement is actually positive, and sometimes 
> things are randomly less positive than background).  If you are using a 
> proper statistical model, after background correction you will end up with a 
> positive (or 0) value for the integrated intensity.
> 
> 
> On Jun 20, 2013, at 1:08 PM, Andrew Leslie  wrote:
> 
> >
> > The integration programs report a negative intensity simply because that is 
> > the observation.
> >
> > Because of noise in the Xray background, in a large sample of intensity 
> > estimates for reflections whose true intensity is very very small one will 
> > inevitably get some measurements that are negative. These must not be 
> > rejected because this will lead to bias (because some of these intensities 
> > for symmetry mates will be estimated too large rather than too small). It 
> > is not unusual for the intensity to remain negative even after averaging 
> > symmetry mates.
> >
> > Andrew
> >
> >
> > On 20 Jun 2013, at 11:49, Douglas Theobald  wrote:
> >
> >> Seems to me that the negative Is should be dealt with early on, in the 
> >> integration step.  Why exactly do integration programs report negative Is 
> >> to begin with?
> >>
> >>
> >> On Jun 20, 2013, at 12:45 PM, Dom Bellini  
> >> wrote:
> >>
> >>> Wouldn't it be possible to take advantage of negative Is to 
> >>> extrapolate/estimate the decay of scattering background (kind of Wilson 
> >>> plot of background scattering) to flatten out the background and push all 
> >>> the Is to positive values?
> >>>
> >>> More of a question rather than a suggestion ...
> >>>
> >>> D
> >>>
> >>>
> >>>
> >>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ian 
> >>> Tickle
> >>> Sent: 20 June 2013 17:34
> >>> To: ccp4bb
> >>> Subject: Re: [ccp4bb] ctruncate bug?
> >>>
> >>> Yes higher R factors is the usual reason people don't like I-based 
> >>> refinement!
> >>>
> >>> Anyway, refining against Is doesn't solve the problem, it only postpones 
> >>> it: you still need the Fs for maps! (though errors in Fs may be less 
> >>> critical then).
> >>> -- Ian
> >>>
> >>> On 20 June 2013 17:20, Dale Tronrud 
> >>> mailto:det...@uoxray.uoregon.edu>> wrote:
> >>> If you are refining against F's you have to find some way to avoid
> >>> calculating the square root of a negative number.  That is why people
> >>> have historically rejected negative I's and why Truncate and cTruncate
> >>> were invented.
> >>>
> >>> When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
> >>> care less if Iobs happens to be negative.
> >>>
> >>> As for why people still refine against F...  When I was distributing
> >>> a refinement package it could refine against I but no one wanted to do
> >>> that.  The "R values" ended up higher, but they were looking at R
> >>> values calculated from F's.  Of course the F based R values are lower
> >>> when you refine against F's, that means nothing.
> >>>
> >>> If we could get the PDB to report both the F and I based R values
> >>> for all models maybe we could get a start toward moving to intensity
> >>> refinement.
> >>>
> >>> Dale Tronrud
> >>>
> >>>
> >>> On 06/20/2013 09:06 AM, Douglas Theobald wrote:
> >>> Just trying to understand the basic issues here.  How could refining 
> >>> directly against intensities solve the fundamental problem of negative 
> >>> intensity values?
> >>>
> >>>
> >>> On Jun 20, 2013, at 11:34 AM, Bernhard Rupp 
> >>> mailto:hofkristall...@gmail.com>> wrote:
> >>> As a maybe better alternative, we should (once again) consider to refine 
> >>> against intensities (and I guess George Sheldrick would agree here).
> >>>
> >>> I have a simple question - what exactly, short of some sort of historic 
> >>> inertia (or memory lapse), is the reason NOT to refine against 
> >>> intensities?
> >>>
> >>> Best, BR
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >
> 


Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Douglas Theobald
I kinda think we're saying the same thing, sort of.

You don't like the Gaussian assumption, and neither do I.  If you make the 
reasonable Poisson assumptions, then you don't get the Ispot-Iback=Iobs for the 
best estimate of Itrue.  Except as an approximation for large values, but we 
are talking about the case when Iback>Ispot, where the Gaussian approximation 
to the Poisson no longer holds.  The sum of two Poisson variates is also 
Poisson, which also can never be negative, unlike the Gaussian.  

So I reiterate: the Ispot-Iback=Iobs equation assumes Gaussians and hence 
negativity.  The Ispot-Iback=Iobs does not follow from a Poisson assumption.  
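
A small check of that point (Python; the rates are made up): the raw difference of two 
Poisson counts follows a Skellam distribution, which is frequently negative for weak 
reflections, so the subtraction recipe is not itself a Poisson-constrained estimate.

    import numpy as np

    rng = np.random.default_rng(0)
    true_I, back = 2.0, 50.0                   # weak reflection on a strong background

    spot = rng.poisson(true_I + back, 100000)  # photons in the spot pixels
    bkg  = rng.poisson(back, 100000)           # photons in an equal background area
    diff = spot - bkg                          # the Ispot - Iback recipe (Skellam, not Poisson)

    print((diff < 0).mean())                   # roughly 40% of these estimates are negative
    print(diff.mean())                         # yet their average is still ~2, i.e. unbiased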


On Jun 21, 2013, at 1:13 PM, Ian Tickle  wrote:

> On 21 June 2013 17:10, Douglas Theobald  wrote:
>> Yes there is.  The only way you can get a negative estimate is to make 
>> unphysical assumptions.  Namely, the estimate Ispot-Iback=Iobs assumes that 
>> both the true value of I and the background noise come from a Gaussian 
>> distribution that is allowed to have negative values.  Both of those 
>> assumptions are unphysical.
> 
> Actually that's not correct: Ispot and Iback are both assumed to come from a 
> _Poisson_ distribution which by definition is zero for negative values of its 
> argument (you can't have a negative number of photons), so are _not_ allowed 
> to have negative values.  For large values of the argument (in fact the 
> approximation is pretty good even for x ~ 10) a Poisson approximates to a 
> Gaussian, and then of course the difference Ispot-Iback is also approximately 
> Gaussian.
> 
> But I think that doesn't affect your argument.
> 
> Cheers
> 
> -- Ian 


Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Ed Pozharski

On 06/21/2013 10:19 AM, Ian Tickle wrote:
If you observe the symptoms of translational NCS in the diffraction 
pattern (i.e. systematically weak zones of reflections) you must take 
it into account when calculating the averages, i.e. if you do it 
properly parity groups should be normalised separately (though I 
concede there may be a practical issue in that I'm not aware of any 
software that currently has this feature). 


Ian,

I think this is exactly what I was trying to emphasize, that applying 
some conversion to raw intensities may have negative impact when 
conversion is based on incorrect or incomplete assumptions.


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Ian Tickle
On 21 June 2013 17:10, Douglas Theobald  wrote:

> Yes there is.  The only way you can get a negative estimate is to make
> unphysical assumptions.  Namely, the estimate Ispot-Iback=Iobs assumes that
> both the true value of I and the background noise come from a Gaussian
> distribution that is allowed to have negative values.  Both of those
> assumptions are unphysical.
>

Actually that's not correct: Ispot and Iback are both assumed to come from
a _Poisson_ distribution which by definition is zero for negative values of
its argument (you can't have a negative number of photons), so are _not_
allowed to have negative values.  For large values of the argument (in fact
the approximation is pretty good even for x ~ 10) a Poisson approximates to
a Gaussian, and then of course the difference Ispot-Iback is also
approximately Gaussian.

But I think that doesn't affect your argument.

Cheers

-- Ian


Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Douglas Theobald
On Jun 21, 2013, at 8:36 AM, Ed Pozharski  wrote:

> On 06/20/2013 01:07 PM, Douglas Theobald wrote:
>> How can there be nothing "wrong" with something that is unphysical?  
>> Intensities cannot be negative.
> 
> I think you are confusing two things - the true intensities and observed 
> intensities.

But I'm not.  Let me try to convince you ...

> True intensities represent the number of photons that diffract off a crystal 
> in a specific direction or, for QED-minded, relative probabilities of a 
> single photon being found in a particular area of the detector when its 
> probability wave function finally collapses.

I agree. 

> True intensities certainly cannot be negative and in crystallographic method 
> they never are. They are represented by the best theoretical estimates 
> possible, Icalc.  These are always positive.

I also very much agree.  

> Observed intensities are the best estimates that we can come up with in an 
> experiment.  

I also agree with this, and this is the clincher.  You are arguing that 
Ispot-Iback=Iobs is the best estimate we can come up with.  I claim that is 
absurd.  How are you quantifying "best"?  Usually we have some sort of 
discrepancy measure between true and estimate, like RMSD, mean absolute 
distance, log distance, or somesuch.  Here is the important point --- by any 
measure of discrepancy you care to use, the person who estimates Iobs as 0 when 
Iback>Ispot will *always*, in *every case*, beat the person who estimates Iobs 
with a negative value.   This is an indisputable fact.  

> These are determined by integrating pixels around the spot where particular 
> reflection is expected to hit the detector.  Unfortunately, science has not 
> yet invented a method that would allow us to suspend a crystal in vacuum while 
> also removing all of the outside solvent.  Nor have we included diffuse 
> scatter in our theoretical model.  Because of that, full reflection intensity 
> contains background signal in addition to the Icalc.  This background has to 
> be subtracted and what is perhaps the most useful form of observation is 
> Ispot-Iback=Iobs.

How can that be the most useful form, when 0 is always a better estimate than a 
negative value, by any criterion?

> These observed intensities can be negative because while their true 
> underlying value is positive, random errors may result in Iback>Ispot.  There 
> is absolutely nothing unphysical here.

Yes there is.  The only way you can get a negative estimate is to make 
unphysical assumptions.  Namely, the estimate Ispot-Iback=Iobs assumes that 
both the true value of I and the background noise come from a Gaussian 
distribution that is allowed to have negative values.  Both of those 
assumptions are unphysical.  

> Replacing Iobs with E(J) is not only unnecessary, it's ill-advised as it will 
> distort intensity statistics.  For example, let's say you have translational 
> NCS aligned with crystallographic axes, and hence some set of reflections is 
> systematically absent.  If all is well, <I> ~ 0 for the subset while <E(J)> 
> is systematically positive.  This obviously happens because the standard 
> Wilson prior is wrong for these reflections, but I digress, as usual.
> 
> In summary, there is indeed nothing wrong, imho, with negative Iobs.  The 
> fact that some of these may become negative is correctly accounted for once 
> sigI is factored into the ML target.
> 
> Cheers,
> 
> Ed.
> 
> -- 
> Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
>Julian, King of Lemurs
> 


Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Ian Tickle
On 21 June 2013 13:36, Ed Pozharski  wrote:

> Replacing Iobs with E(J) is not only unnecessary, it's ill-advised as it
> will distort intensity statistics.  For example, let's say you have
> translational NCS aligned with crystallographic axes, and hence some set of
> reflections is systematically absent.  If all is well, <I> ~ 0 for the
> subset while <E(J)> is systematically positive.  This obviously happens
> because the standard Wilson prior is wrong for these reflections, but I
> digress, as usual.
>

Ed,

If you observe the symptoms of translational NCS in the diffraction pattern
(i.e. systematically weak zones of reflections) you must take it into
account when calculating the averages, i.e. if you do it properly parity
groups should be normalised separately (though I concede there may be a
practical issue in that I'm not aware of any software that currently has
this feature).  In that case <I> will be ~ 0, as expected.  If you don't
do that then clearly you can't expect to get the right answer!  The
theoretical intensities are based on the assumption that the intensity
distributions are all positive, so it makes no sense to compare them with
an experimental distribution where a significant fraction are negative.
How exactly do you propose to deal properly with the P-Y L test that I
described? - because of course that also inherently assumes that the
intensities are all positive and it's certainly not valid to assume that
E(J) = E(F)^2 !
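
Normalising parity groups separately might look something like the sketch below (Python; 
the h-parity split and the simple mean-intensity scaling are illustrative assumptions, 
not what any particular program does):

    import numpy as np

    def normalise_by_parity(hkl, I, parity=lambda hkl: hkl[:, 0] % 2):
        """Scale intensities so that <I> = 1 within each parity group separately,
        rather than normalising the weak and strong zones together."""
        hkl, I = np.asarray(hkl), np.asarray(I, float)
        groups = parity(hkl)
        I_norm = np.empty_like(I)
        for g in np.unique(groups):
            sel = groups == g
            I_norm[sel] = I[sel] / I[sel].mean()   # per-group average, not the global one
        return I_norm

With the weak and strong zones scaled separately, the intensity statistics are no longer 
computed on a mixture of two very different distributions.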

Another point is that (to paraphrase G. Orwell) "not all reflections are
created equal, just some are more equal than others".  What I mean is that
in counting reflections for the cumulative distributions (i.e. you count
the number of reflections in ranges of intensity or in ranges of L), a weak
reflection should be counted as fractional with a contribution to the total
which is less than 1, on a continuous scale from 0 to 1 related to
I/sigma(I).  In fact referring to your original posting reflections with h
< -4 will get such a small weight that it will be effectively zero and they
won't be counted at all (or it won't make the slightest difference whether
you count them or not).  Of course when it comes to outputting reflections
you can't have a fractional reflection, it's either included or it isn't.
So then you may have to have an arbitrary cutoff, though such reflections
would likely end up with zero intensity and large SD (but the programs may
not currently be good at estimating the latter in such a situation, which
is probably why they are currently rejected).
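
The fractional-counting idea could be sketched like this (Python; the specific weight 
norm.cdf(I/sigI) and the crude normalisation are illustrative choices only, the posting 
does not prescribe a particular weighting function):

    import numpy as np
    from scipy.stats import norm

    def weighted_cumulative(I, sigI, edges):
        """Cumulative distribution in which each reflection contributes a
        fractional count between 0 and 1 instead of exactly 1."""
        I, sigI = np.asarray(I, float), np.asarray(sigI, float)
        w = norm.cdf(I / sigI)              # ~0 for I/sigI << 0, ~1 for strong reflections
        z = I / np.average(I, weights=w)    # crude normalisation, just for the sketch
        return np.array([w[z <= e].sum() for e in edges]) / w.sum()

Reflections with very negative I/sigma(I) then contribute essentially nothing to the 
count, which is the behaviour described above.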

Another point worth mentioning is that the observed distributions of E^n (E
= normalised structure amplitude) tend to be very noisy, particularly for
large n, and I have a suspicion (as yet untested) that this may come from
weak reflections which have made a full contribution to the count when it
should have been fractional (or even zero).

I'm currently working on a revised version of TRUNCATE where some or all of
the above issues will be addressed.

Cheers

-- Ian


Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Ed Pozharski

On 06/20/2013 01:07 PM, Douglas Theobald wrote:

How can there be nothing "wrong" with something that is unphysical?  
Intensities cannot be negative.


I think you are confusing two things - the true intensities and observed 
intensities.


True intensities represent the number of photons that diffract off a 
crystal in a specific direction or, for QED-minded, relative 
probabilities of a single photon being found in a particular area of the 
detector when its probability wave function finally collapses.


True intensities certainly cannot be negative and in crystallographic 
method they never are. They are represented by the best theoretical 
estimates possible, Icalc.  These are always positive.


Observed intensities are the best estimates that we can come up with in 
an experiment.  These are determined by integrating pixels around the 
spot where particular reflection is expected to hit the detector.  
Unfortunately, science has not yet invented a method that would allow us to 
suspend a crystal in vacuum while also removing all of the outside 
solvent.  Nor have we included diffuse scatter in our theoretical 
model.  Because of that, full reflection intensity contains background 
signal in addition to the Icalc.  This background has to be subtracted 
and what is perhaps the most useful form of observation is Ispot-Iback=Iobs.


These observed intensities can be negative because while their true 
underlying value is positive, random errors may result in Iback>Ispot.  
There is absolutely nothing unphysical here. Replacing Iobs with E(J) is 
not only unnecessary, it's ill-advised as it will distort intensity 
statistics.  For example, let's say you have translational NCS aligned 
with crystallographic axes, and hence some set of reflections is 
systematically absent.  If all is well, <I> ~ 0 for the subset while 
<E(J)> is systematically positive.  This obviously happens because the 
standard Wilson prior is wrong for these reflections, but I digress, as 
usual.
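
For concreteness, the E(J) under discussion is a posterior mean of the true intensity 
given the measurement.  A minimal sketch assuming a flat prior on J >= 0 and Gaussian 
errors (the full French & Wilson treatment uses a Wilson prior instead, so this is only 
an approximation):

    import numpy as np
    from scipy.stats import norm

    def e_j_flat_prior(i_obs, sig_i):
        """E(J | Iobs, sigI) for J >= 0 with a flat prior and Gaussian errors:
        the mean of a normal distribution truncated at zero.  It is always
        positive, which is exactly why averaging E(J) over a systematically
        absent class of reflections gives a positive number while <Iobs> ~ 0."""
        t = np.asarray(i_obs, float) / sig_i
        return np.asarray(i_obs, float) + sig_i * norm.pdf(t) / norm.cdf(t)

    print(e_j_flat_prior(-5.0, 2.0))   # ~0.65: small but strictly positive
    print(e_j_flat_prior(10.0, 2.0))   # ~10.0: strong data are essentially unchanged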


In summary, there is indeed nothing wrong, imho, with negative Iobs.  
The fact that some of these may become negative is correctly accounted 
for once sigI is factored into the ML target.


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Randy Read
>>> Prior 2) is available as an option in XDSCONV
>>> Prior 1) seems to be used, or is available, in ctruncate in certain cases 
>>> (I don't know the details)
>>> 
>>> Using intensities instead of amplitudes in refinement would avoid having to 
>>> choose a prior, and refinement would therefore not be compromised in case 
>>> of data violating the assumptions underlying the prior. 
>>> 
>>> By the way, it is not (Iobs-Icalc)^2 that would be optimized in refinement 
>>> against intensities, but rather the corresponding maximum likelihood 
>>> formula (which I seem to remember is more complicated than the amplitude ML 
>>> formula, or is not an analytical formula at all, but maybe somebody knows 
>>> better).
>>> 
>>> best,
>>> 
>>> Kay
>>> 
>>> 
>>> On Thu, 20 Jun 2013 13:14:28 -0400, Douglas Theobald 
>>>  wrote:
>>> 
>>>> I still don't see how you get a negative intensity from that.  It seems 
>>>> you are saying that in many cases of a low intensity reflection, the 
>>>> integrated spot will be lower than the background.  That is not equivalent 
>>>> to having a negative measurement (as the measurement is actually positive, 
>>>> and sometimes things are randomly less positive than background).  If you 
>>>> are using a proper statistical model, after background correction you will 
>>>> end up with a positive (or 0) value for the integrated intensity.  
>>>> 
>>>> 
>>>> On Jun 20, 2013, at 1:08 PM, Andrew Leslie  
>>>> wrote:
>>>> 
>>>>> 
>>>>> The integration programs report a negative intensity simply because that 
>>>>> is the observation. 
>>>>> 
>>>>> Because of noise in the Xray background, in a large sample of intensity 
>>>>> estimates for reflections whose true intensity is very very small one 
>>>>> will inevitably get some measurements that are negative. These must not 
>>>>> be rejected because this will lead to bias (because some of these 
>>>>> intensities for symmetry mates will be estimated too large rather than 
>>>>> too small). It is not unusual for the intensity to remain negative even 
>>>>> after averaging symmetry mates.
>>>>> 
>>>>> Andrew
>>>>> 
>>>>> 
>>>>> On 20 Jun 2013, at 11:49, Douglas Theobald  wrote:
>>>>> 
>>>>>> Seems to me that the negative Is should be dealt with early on, in the 
>>>>>> integration step.  Why exactly do integration programs report negative 
>>>>>> Is to begin with?
>>>>>> 
>>>>>> 
>>>>>> On Jun 20, 2013, at 12:45 PM, Dom Bellini  
>>>>>> wrote:
>>>>>> 
>>>>>>> Wouldn't it be possible to take advantage of negative Is to 
>>>>>>> extrapolate/estimate the decay of scattering background (kind of Wilson 
>>>>>>> plot of background scattering) to flatten out the background and push all 
>>>>>>> the Is to positive values?
>>>>>>> 
>>>>>>> More of a question rather than a suggestion ...
>>>>>>> 
>>>>>>> D
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of 
>>>>>>> Ian Tickle
>>>>>>> Sent: 20 June 2013 17:34
>>>>>>> To: ccp4bb
>>>>>>> Subject: Re: [ccp4bb] ctruncate bug?
>>>>>>> 
>>>>>>> Yes higher R factors is the usual reason people don't like I-based 
>>>>>>> refinement!
>>>>>>> 
>>>>>>> Anyway, refining against Is doesn't solve the problem, it only 
>>>>>>> postpones it: you still need the Fs for maps! (though errors in Fs may 
>>>>>>> be less critical then).
>>>>>>> -- Ian
>>>>>>> 
>>>>>>> On 20 June 2013 17:20, Dale Tronrud 
>>>>>>> mailto:det...@uoxray.uoregon.edu>> wrote:
>>>>>>> If you are refining against F's you have to find some way to avoid
>>>>>>> calculating the square root of a negative number.  That is why people
>>>>>>> have historical

Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Randy Read
>> On Jun 20, 2013, at 1:08 PM, Andrew Leslie  wrote:
>> 
>>> 
>>> The integration programs report a negative intensity simply because that is 
>>> the observation. 
>>> 
>>> Because of noise in the Xray background, in a large sample of intensity 
>>> estimates for reflections whose true intensity is very very small one will 
>>> inevitably get some measurements that are negative. These must not be 
>>> rejected because this will lead to bias (because some of these intensities 
>>> for symmetry mates will be estimated too large rather than too small). It 
>>> is not unusual for the intensity to remain negative even after averaging 
>>> symmetry mates.
>>> 
>>> Andrew
>>> 
>>> 
>>> On 20 Jun 2013, at 11:49, Douglas Theobald  wrote:
>>> 
>>>> Seems to me that the negative Is should be dealt with early on, in the 
>>>> integration step.  Why exactly do integration programs report negative Is 
>>>> to begin with?
>>>> 
>>>> 
>>>> On Jun 20, 2013, at 12:45 PM, Dom Bellini  
>>>> wrote:
>>>> 
>>>>> Wouldn't it be possible to take advantage of negative Is to 
>>>>> extrapolate/estimate the decay of scattering background (kind of Wilson 
>>>>> plot of background scattering) to flatten out the background and push all 
>>>>> the Is to positive values?
>>>>> 
>>>>> More of a question rather than a suggestion ...
>>>>> 
>>>>> D
>>>>> 
>>>>> 
>>>>> 
>>>>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ian 
>>>>> Tickle
>>>>> Sent: 20 June 2013 17:34
>>>>> To: ccp4bb
>>>>> Subject: Re: [ccp4bb] ctruncate bug?
>>>>> 
>>>>> Yes higher R factors is the usual reason people don't like I-based 
>>>>> refinement!
>>>>> 
>>>>> Anyway, refining against Is doesn't solve the problem, it only postpones 
>>>>> it: you still need the Fs for maps! (though errors in Fs may be less 
>>>>> critical then).
>>>>> -- Ian
>>>>> 
>>>>> On 20 June 2013 17:20, Dale Tronrud 
>>>>> mailto:det...@uoxray.uoregon.edu>> wrote:
>>>>> If you are refining against F's you have to find some way to avoid
>>>>> calculating the square root of a negative number.  That is why people
>>>>> have historically rejected negative I's and why Truncate and cTruncate
>>>>> were invented.
>>>>> 
>>>>> When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
>>>>> care less if Iobs happens to be negative.
>>>>> 
>>>>> As for why people still refine against F...  When I was distributing
>>>>> a refinement package it could refine against I but no one wanted to do
>>>>> that.  The "R values" ended up higher, but they were looking at R
>>>>> values calculated from F's.  Of course the F based R values are lower
>>>>> when you refine against F's, that means nothing.
>>>>> 
>>>>> If we could get the PDB to report both the F and I based R values
>>>>> for all models maybe we could get a start toward moving to intensity
>>>>> refinement.
>>>>> 
>>>>> Dale Tronrud
>>>>> 
>>>>> 
>>>>> On 06/20/2013 09:06 AM, Douglas Theobald wrote:
>>>>> Just trying to understand the basic issues here.  How could refining 
>>>>> directly against intensities solve the fundamental problem of negative 
>>>>> intensity values?
>>>>> 
>>>>> 
>>>>> On Jun 20, 2013, at 11:34 AM, Bernhard Rupp 
>>>>> mailto:hofkristall...@gmail.com>> wrote:
>>>>> As a maybe better alternative, we should (once again) consider to refine 
>>>>> against intensities (and I guess George Sheldrick would agree here).
>>>>> 
>>>>> I have a simple question - what exactly, short of some sort of historic 
>>>>> inertia (or memory lapse), is the reason NOT to refine against 
>>>>> intensities?
>>>>> 
>>>>> Best, BR
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Ian Tickle
On 20 June 2013 20:46, Douglas Theobald  wrote:

> Well, I tend to think Ian is probably right, that doing things the
> "proper" way (vs French-Wilson) will not make much of a difference in the
> end.
>
> Nevertheless, I don't think refining against the (possibly negative)
> intensities is a good solution to dealing with negative intensities ---
> that just ignores the problem, and will end up overweighting large negative
> intensities.  Wouldn't it be better to correct the negative intensities
> with FW and then refine against that?
>
>
Hmmm, I seem to recall suggesting that a while back (but there were no
takers!).

I also think that using corrected Is, as opposed to corrected Fs, (however
you choose to do it) is the right way to do twinning & other statistical
tests.  For example the Padilla/Yeates L test uses the cumulative
distribution of |I1 - I2| / (I1 + I2) where I1 & I2 are intensities of
unrelated reflections (but close in reciprocal space).  The denominator of
this expression is clearly going to have problems if you feed it negative
intensities!  Also I believe (my apologies if I'm wrong!) that the UCLA
twinning server obtains the Is by squaring the Fs (presumably obtained by
F-W).  This is a formally invalid procedure (the expectation of I is not
the square of the expectation of F).  See here for an explanation of the
difference: http://xtal.sourceforge.net/man/bayest-desc.html .
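
A sketch of the statistic being described (Python; the exponential intensities simply 
stand in for untwinned acentric data, and the pairing of "unrelated" reflections is 
idealised):

    import numpy as np

    def l_stat(i1, i2):
        """Padilla-Yeates L = |I1 - I2| / (I1 + I2) for pairs of local, unrelated
        intensities.  Note the denominator: negative input intensities can make
        it vanish or change sign, which is the problem pointed out above."""
        i1, i2 = np.asarray(i1, float), np.asarray(i2, float)
        keep = (i1 + i2) > 0
        return np.abs(i1[keep] - i2[keep]) / (i1[keep] + i2[keep])

    rng = np.random.default_rng(1)
    i1, i2 = rng.exponential(1.0, 50000), rng.exponential(1.0, 50000)
    print(l_stat(i1, i2).mean())   # ~0.5 for untwinned acentric data (~0.375 for a perfect twin)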

Cheers

-- Ian


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Felix Frolow
 you used a derived esd since they can't 
>>>> be formally generated from the sigma's on I, and are very much 
>>>> undetermined for small intensities and small F's. 
>>>> 
>>>> Small molecule crystallographers routinely refine on F^2 and use all of 
>>>> the data, even if the F^2's are negative.
>>>> 
>>>> Bernie
>>>> 
>>>> On Jun 20, 2013, at 11:49 AM, Douglas Theobald wrote:
>>>> 
>>>>> Seems to me that the negative Is should be dealt with early on, in the 
>>>>> integration step.  Why exactly do integration programs report negative Is 
>>>>> to begin with?
>>>>> 
>>>>> 
>>>>> On Jun 20, 2013, at 12:45 PM, Dom Bellini  
>>>>> wrote:
>>>>> 
>>>>>> Wouldn't it be possible to take advantage of negative Is to 
>>>>>> extrapolate/estimate the decay of scattering background (kind of Wilson 
>>>>>> plot of background scattering) to flatten out the background and push all 
>>>>>> the Is to positive values?
>>>>>> 
>>>>>> More of a question rather than a suggestion ...
>>>>>> 
>>>>>> D
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of 
>>>>>> Ian Tickle
>>>>>> Sent: 20 June 2013 17:34
>>>>>> To: ccp4bb
>>>>>> Subject: Re: [ccp4bb] ctruncate bug?
>>>>>> 
>>>>>> Yes higher R factors is the usual reason people don't like I-based 
>>>>>> refinement!
>>>>>> 
>>>>>> Anyway, refining against Is doesn't solve the problem, it only postpones 
>>>>>> it: you still need the Fs for maps! (though errors in Fs may be less 
>>>>>> critical then).
>>>>>> -- Ian
>>>>>> 
>>>>>> On 20 June 2013 17:20, Dale Tronrud 
>>>>>> mailto:det...@uoxray.uoregon.edu>> wrote:
>>>>>> If you are refining against F's you have to find some way to avoid
>>>>>> calculating the square root of a negative number.  That is why people
>>>>>> have historically rejected negative I's and why Truncate and cTruncate
>>>>>> were invented.
>>>>>> 
>>>>>> When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
>>>>>> care less if Iobs happens to be negative.
>>>>>> 
>>>>>> As for why people still refine against F...  When I was distributing
>>>>>> a refinement package it could refine against I but no one wanted to do
>>>>>> that.  The "R values" ended up higher, but they were looking at R
>>>>>> values calculated from F's.  Of course the F based R values are lower
>>>>>> when you refine against F's, that means nothing.
>>>>>> 
>>>>>> If we could get the PDB to report both the F and I based R values
>>>>>> for all models maybe we could get a start toward moving to intensity
>>>>>> refinement.
>>>>>> 
>>>>>> Dale Tronrud
>>>>>> 
>>>>>> 
>>>>>> On 06/20/2013 09:06 AM, Douglas Theobald wrote:
>>>>>> Just trying to understand the basic issues here.  How could refining 
>>>>>> directly against intensities solve the fundamental problem of negative 
>>>>>> intensity values?
>>>>>> 
>>>>>> 
>>>>>> On Jun 20, 2013, at 11:34 AM, Bernhard Rupp 
>>>>>> mailto:hofkristall...@gmail.com>> wrote:
>>>>>> As a maybe better alternative, we should (once again) consider to refine 
>>>>>> against intensities (and I guess George Sheldrick would agree here).
>>>>>> 
>>>>>> I have a simple question - what exactly, short of some sort of historic 
>>>>>> inertia (or memory lapse), is the reason NOT to refine against 
>>>>>> intensities?
>>>>>> 
>>>>>> Best, BR
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Tim Gruene
ils)
>>>> 
>>>> Using intensities instead of amplitudes in refinement would
>>>> avoid having to choose a prior, and refinement would
>>>> therefore not be compromised in case of data violating the
>>>> assumptions underlying the prior.
>>>> 
>>>> By the way, it is not (Iobs-Icalc)^2 that would be optimized
>>>> in refinement against intensities, but rather the
>>>> corresponding maximum likelihood formula (which I seem to
>>>> remember is more complicated than the amplitude ML formula,
>>>> or is not an analytical formula at all, but maybe somebody
>>>> knows better).
>>>> 
>>>> best,
>>>> 
>>>> Kay
>>>> 
>>>> 
>>>> On Thu, 20 Jun 2013 13:14:28 -0400, Douglas Theobald
>>>>  wrote:
>>>> 
>>>>> I still don't see how you get a negative intensity from
>>>>> that.  It seems you are saying that in many cases of a low
>>>>> intensity reflection, the integrated spot will be lower
>>>>> than the background.  That is not equivalent to having a
>>>>> negative measurement (as the measurement is actually
>>>>> positive, and sometimes things are randomly less positive
>>>>> than background).  If you are using a proper statistical
>>>>> model, after background correction you will end up with a
>>>>> positive (or 0) value for the integrated intensity.
>>>>> 
>>>>> 
>>>>> On Jun 20, 2013, at 1:08 PM, Andrew Leslie
>>>>>  wrote:
>>>>> 
>>>>>> 
>>>>>> The integration programs report a negative intensity
>>>>>> simply because that is the observation.
>>>>>> 
>>>>>> Because of noise in the Xray background, in a large
>>>>>> sample of intensity estimates for reflections whose true
>>>>>> intensity is very very small one will inevitably get some
>>>>>> measurements that are negative. These must not be
>>>>>> rejected because this will lead to bias (because some of
>>>>>> these intensities for symmetry mates will be estimated
>>>>>> too large rather than too small). It is not unusual for
>>>>>> the intensity to remain negative even after averaging
>>>>>> symmetry mates.
>>>>>> 
>>>>>> Andrew
>>>>>> 
>>>>>> 
>>>>>> On 20 Jun 2013, at 11:49, Douglas Theobald
>>>>>>  wrote:
>>>>>> 
>>>>>>> Seems to me that the negative Is should be dealt with
>>>>>>> early on, in the integration step.  Why exactly do
>>>>>>> integration programs report negative Is to begin with?
>>>>>>> 
>>>>>>> 
>>>>>>> On Jun 20, 2013, at 12:45 PM, Dom Bellini
>>>>>>>  wrote:
>>>>>>> 
>>>>>>>> Wouldn't it be possible to take advantage of negative Is
>>>>>>>> to extrapolate/estimate the decay of scattering
>>>>>>>> background (kind of Wilson plot of background
>>>>>>>> scattering) to flatten out the background and push all
>>>>>>>> the Is to positive values?
>>>>>>>> 
>>>>>>>> More of a question rather than a suggestion ...
>>>>>>>> 
>>>>>>>> D
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> From: CCP4 bulletin board
>>>>>>>> [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ian
>>>>>>>> Tickle Sent: 20 June 2013 17:34 To: ccp4bb Subject:
>>>>>>>> Re: [ccp4bb] ctruncate bug?
>>>>>>>> 
>>>>>>>> Yes higher R factors is the usual reason people don't
>>>>>>>> like I-based refinement!
>>>>>>>> 
>>>>>>>> Anyway, refining against Is doesn't solve the
>>>>>>>> problem, it only postpones it: you still need the Fs
>>>>>>>> for maps! (though errors in Fs may be less critical
>>>>>>>> then). -- Ian
>>>>>>>> 
>>>>>>>> On 20 June 2013 17:20, Dale Tronrud
>>>>>>>> mailt

Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Douglas Theobald
>>> On Thu, 20 Jun 2013 13:14:28 -0400, Douglas Theobald 
>>>  wrote:
>>> 
>>>> I still don't see how you get a negative intensity from that.  It seems 
>>>> you are saying that in many cases of a low intensity reflection, the 
>>>> integrated spot will be lower than the background.  That is not equivalent 
>>>> to having a negative measurement (as the measurement is actually positive, 
>>>> and sometimes things are randomly less positive than background).  If you 
>>>> are using a proper statistical model, after background correction you will 
>>>> end up with a positive (or 0) value for the integrated intensity.
>>>> 
>>>> 
>>>> On Jun 20, 2013, at 1:08 PM, Andrew Leslie  
>>>> wrote:
>>>> 
>>>>> 
>>>>> The integration programs report a negative intensity simply because that 
>>>>> is the observation.
>>>>> 
>>>>> Because of noise in the Xray background, in a large sample of intensity 
>>>>> estimates for reflections whose true intensity is very very small one 
>>>>> will inevitably get some measurements that are negative. These must not 
>>>>> be rejected because this will lead to bias (because some of these 
>>>>> intensities for symmetry mates will be estimated too large rather than 
>>>>> too small). It is not unusual for the intensity to remain negative even 
>>>>> after averaging symmetry mates.
>>>>> 
>>>>> Andrew
>>>>> 
>>>>> 
>>>>> On 20 Jun 2013, at 11:49, Douglas Theobald  wrote:
>>>>> 
>>>>>> Seems to me that the negative Is should be dealt with early on, in the 
>>>>>> integration step.  Why exactly do integration programs report negative 
>>>>>> Is to begin with?
>>>>>> 
>>>>>> 
>>>>>> On Jun 20, 2013, at 12:45 PM, Dom Bellini  
>>>>>> wrote:
>>>>>> 
>>>>>>> Wouldn't it be possible to take advantage of negative Is to 
>>>>>>> extrapolate/estimate the decay of scattering background (kind of Wilson 
>>>>>>> plot of background scattering) to flatten out the background and push all 
>>>>>>> the Is to positive values?
>>>>>>> 
>>>>>>> More of a question rather than a suggestion ...
>>>>>>> 
>>>>>>> D
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of 
>>>>>>> Ian Tickle
>>>>>>> Sent: 20 June 2013 17:34
>>>>>>> To: ccp4bb
>>>>>>> Subject: Re: [ccp4bb] ctruncate bug?
>>>>>>> 
>>>>>>> Yes higher R factors is the usual reason people don't like I-based 
>>>>>>> refinement!
>>>>>>> 
>>>>>>> Anyway, refining against Is doesn't solve the problem, it only 
>>>>>>> postpones it: you still need the Fs for maps! (though errors in Fs may 
>>>>>>> be less critical then).
>>>>>>> -- Ian
>>>>>>> 
>>>>>>> On 20 June 2013 17:20, Dale Tronrud 
>>>>>>> mailto:det...@uoxray.uoregon.edu>> wrote:
>>>>>>> If you are refining against F's you have to find some way to avoid
>>>>>>> calculating the square root of a negative number.  That is why people
>>>>>>> have historically rejected negative I's and why Truncate and cTruncate
>>>>>>> were invented.
>>>>>>> 
>>>>>>> When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
>>>>>>> care less if Iobs happens to be negative.
>>>>>>> 
>>>>>>> As for why people still refine against F...  When I was distributing
>>>>>>> a refinement package it could refine against I but no one wanted to do
>>>>>>> that.  The "R values" ended up higher, but they were looking at R
>>>>>>> values calculated from F's.  Of course the F based R values are lower
>>>>>>> when you refine against F's, that means nothing.
>>>>>>> 
>>>>>>> If we could get the PDB to report both the F and I based R values
>>>>>>> for all models maybe we could get a start toward moving to intensity
>>>>>>> refinement.
>>>>>>> 
>>>>>>> Dale Tronrud
>>>>>>> 
>>>>>>> 
>>>>>>> On 06/20/2013 09:06 AM, Douglas Theobald wrote:
>>>>>>> Just trying to understand the basic issues here.  How could refining 
>>>>>>> directly against intensities solve the fundamental problem of negative 
>>>>>>> intensity values?
>>>>>>> 
>>>>>>> 
>>>>>>> On Jun 20, 2013, at 11:34 AM, Bernhard Rupp 
>>>>>>> mailto:hofkristall...@gmail.com>> wrote:
>>>>>>> As a maybe better alternative, we should (once again) consider to 
>>>>>>> refine against intensities (and I guess George Sheldrick would agree 
>>>>>>> here).
>>>>>>> 
>>>>>>> I have a simple question - what exactly, short of some sort of historic 
>>>>>>> inertia (or memory lapse), is the reason NOT to refine against 
>>>>>>> intensities?
>>>>>>> 
>>>>>>> Best, BR
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>> 


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Kay Diederichs
Why exactly do integration programs report negative Is to 
begin with?


On Jun 20, 2013, at 12:45 PM, Dom Bellini  wrote:


Wouldn't it be possible to take advantage of negative Is to extrapolate/estimate 
the decay of scattering background (kind of Wilson plot of background 
scattering) to flatten out the background and push all the Is to positive values?

More of a question rather than a suggestion ...

D



From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ian Tickle
Sent: 20 June 2013 17:34
To: ccp4bb
Subject: Re: [ccp4bb] ctruncate bug?

Yes higher R factors is the usual reason people don't like I-based refinement!

Anyway, refining against Is doesn't solve the problem, it only postpones it: 
you still need the Fs for maps! (though errors in Fs may be less critical then).
-- Ian

On 20 June 2013 17:20, Dale Tronrud 
mailto:det...@uoxray.uoregon.edu>> wrote:
If you are refining against F's you have to find some way to avoid
calculating the square root of a negative number.  That is why people
have historically rejected negative I's and why Truncate and cTruncate
were invented.

When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
care less if Iobs happens to be negative.

As for why people still refine against F...  When I was distributing
a refinement package it could refine against I but no one wanted to do
that.  The "R values" ended up higher, but they were looking at R
values calculated from F's.  Of course the F based R values are lower
when you refine against F's, that means nothing.

If we could get the PDB to report both the F and I based R values
for all models maybe we could get a start toward moving to intensity
refinement.

Dale Tronrud


On 06/20/2013 09:06 AM, Douglas Theobald wrote:
Just trying to understand the basic issues here.  How could refining directly 
against intensities solve the fundamental problem of negative intensity values?


On Jun 20, 2013, at 11:34 AM, Bernhard Rupp 
mailto:hofkristall...@gmail.com>> wrote:
As a maybe better alternative, we should (once again) consider to refine 
against intensities (and I guess George Sheldrick would agree here).

I have a simple question - what exactly, short of some sort of historic inertia 
(or memory lapse), is the reason NOT to refine against intensities?

Best, BR








Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Douglas Theobald
>>> On 20 Jun 2013, at 11:49, Douglas Theobald  wrote:
>>> 
>>>> Seems to me that the negative Is should be dealt with early on, in the 
>>>> integration step.  Why exactly do integration programs report negative Is 
>>>> to begin with?
>>>> 
>>>> 
>>>> On Jun 20, 2013, at 12:45 PM, Dom Bellini  
>>>> wrote:
>>>> 
>>>>> Wouldnt be possible to take advantage of negative Is to 
>>>>> extrapolate/estimate the decay of scattering background (kind of Wilson 
>>>>> plot of background scattering) to flat out the background and push all 
>>>>> the Is to positive values?
>>>>> 
>>>>> More of a question rather than a suggestion ...
>>>>> 
>>>>> D
>>>>> 
>>>>> 
>>>>> 
>>>>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ian 
>>>>> Tickle
>>>>> Sent: 20 June 2013 17:34
>>>>> To: ccp4bb
>>>>> Subject: Re: [ccp4bb] ctruncate bug?
>>>>> 
>>>>> Yes higher R factors is the usual reason people don't like I-based 
>>>>> refinement!
>>>>> 
>>>>> Anyway, refining against Is doesn't solve the problem, it only postpones 
>>>>> it: you still need the Fs for maps! (though errors in Fs may be less 
>>>>> critical then).
>>>>> -- Ian
>>>>> 
>>>>> On 20 June 2013 17:20, Dale Tronrud 
>>>>> mailto:det...@uoxray.uoregon.edu>> wrote:
>>>>> If you are refining against F's you have to find some way to avoid
>>>>> calculating the square root of a negative number.  That is why people
>>>>> have historically rejected negative I's and why Truncate and cTruncate
>>>>> were invented.
>>>>> 
>>>>> When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
>>>>> care less if Iobs happens to be negative.
>>>>> 
>>>>> As for why people still refine against F...  When I was distributing
>>>>> a refinement package it could refine against I but no one wanted to do
>>>>> that.  The "R values" ended up higher, but they were looking at R
>>>>> values calculated from F's.  Of course the F based R values are lower
>>>>> when you refine against F's, that means nothing.
>>>>> 
>>>>> If we could get the PDB to report both the F and I based R values
>>>>> for all models maybe we could get a start toward moving to intensity
>>>>> refinement.
>>>>> 
>>>>> Dale Tronrud
>>>>> 
>>>>> 
>>>>> On 06/20/2013 09:06 AM, Douglas Theobald wrote:
>>>>> Just trying to understand the basic issues here.  How could refining 
>>>>> directly against intensities solve the fundamental problem of negative 
>>>>> intensity values?
>>>>> 
>>>>> 
>>>>> On Jun 20, 2013, at 11:34 AM, Bernhard Rupp 
>>>>> mailto:hofkristall...@gmail.com>> wrote:
>>>>> As a maybe better alternative, we should (once again) consider to refine 
>>>>> against intensities (and I guess George Sheldrick would agree here).
>>>>> 
>>>>> I have a simple question - what exactly, short of some sort of historic 
>>>>> inertia (or memory lapse), is the reason NOT to refine against 
>>>>> intensities?
>>>>> 
>>>>> Best, BR
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Kay Diederichs
Douglas,

the intensity is negative if the integrated spot has a lower intensity than the 
estimate of the background under the spot. So yes, we are not _measuring_ 
negative intensities, rather we are estimating intensities, and that estimate 
may turn out to be negative. In a later step we try to "correct" for this, 
because it is non-physical, as you say. At that point, the "proper statistical 
model" comes into play. Essentially we use this as a "prior". In the order of 
increasing information, we can have more or less informative priors for weak 
reflections:
1) I > 0
2) I has a distribution looking like the right half of a Gaussian, and we 
estimate its width from the variance of the intensities in a resolution shell
3) I follows a Wilson distribution, and we estimate its parameters from the 
data in a resolution shell
4) I must be related to Fcalc^2 (i.e. once the structure is solved, we 
re-integrate using the Fcalc as prior)
For a given experiment, the problem is chicken-and-egg in the sense that only 
if you know the characteristics of the data can you choose the correct prior.
I guess that using prior 4) would be heavily frowned upon because there is a 
danger of model bias. You could say: A Bayesian analysis done properly should 
not suffer from model bias. This is probably true, but the theory to ensure the 
word "properly" is not available at the moment.
Crystallographers usually use prior 3) which, as I tried to point out, also has 
its weak spots, namely if the data do not behave like those of an ideal crystal 
- and today's projects often result in data that would have been discarded ten 
years ago, so they are far from ideal.
Prior 2) is available as an option in XDSCONV.
Prior 1) seems to be used, or is available, in ctruncate in certain cases (I don't know the details).
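
To make prior 3) concrete: under an acentric Wilson prior p(J) proportional to exp(-J/S) for 
J >= 0 (S = mean intensity of the resolution shell) and a Gaussian error sigma on the measured 
intensity, the posterior for the true intensity J is a Gaussian truncated at zero, so its mean 
has a closed form and is always positive. This is essentially the acentric case of the 
French & Wilson correction. A minimal sketch in Python (illustrative only, not the actual 
ctruncate code, ignoring the centric case and numerical safeguards):

    import math

    def posterior_mean_intensity(i_obs, sigma, S):
        """Posterior mean of the true intensity J, given a measurement i_obs with
        Gaussian error sigma and an acentric Wilson prior p(J) ~ exp(-J/S), J >= 0."""
        m = i_obs - sigma**2 / S                 # location of the zero-truncated Gaussian posterior
        z = m / sigma
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal pdf at z
        cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal cdf at z
        return m + sigma * pdf / cdf             # mean of a Gaussian truncated at zero

    # a weak reflection measured as negative: the corrected estimate is small but positive
    print(posterior_mean_intensity(i_obs=-5.0, sigma=10.0, S=50.0))   # about 5.9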

Using intensities instead of amplitudes in refinement would avoid having to 
choose a prior, and refinement would therefore not be compromised in case of 
data violating the assumptions underlying the prior. 

By the way, it is not (Iobs-Icalc)^2 that would be optimized in refinement 
against intensities, but rather the corresponding maximum likelihood formula 
(which I seem to remember is more complicated than the amplitude ML formula, or 
is not an analytical formula at all, but maybe somebody knows better).

best,

Kay


On Thu, 20 Jun 2013 13:14:28 -0400, Douglas Theobald  
wrote:

>I still don't see how you get a negative intensity from that.  It seems you 
>are saying that in many cases of a low intensity reflection, the integrated 
>spot will be lower than the background.  That is not equivalent to having a 
>negative measurement (as the measurement is actually positive, and sometimes 
>things are randomly less positive than backgroiund).  If you are using a 
>proper statistical model, after background correction you will end up with a 
>positive (or 0) value for the integrated intensity.  
>
>
>On Jun 20, 2013, at 1:08 PM, Andrew Leslie  wrote:
>
>> 
>> The integration programs report a negative intensity simply because that is 
>> the observation. 
>> 
>> Because of noise in the Xray background, in a large sample of intensity 
>> estimates for reflections whose true intensity is very very small one will 
>> inevitably get some measurements that are negative. These must not be 
>> rejected because this will lead to bias (because some of these intensities 
>> for symmetry mates will be estimated too large rather than too small). It is 
>> not unusual for the intensity to remain negative even after averaging 
>> symmetry mates.
>> 
>> Andrew
>> 
>> 
>> On 20 Jun 2013, at 11:49, Douglas Theobald  wrote:
>> 
>>> Seems to me that the negative Is should be dealt with early on, in the 
>>> integration step.  Why exactly do integration programs report negative Is 
>>> to begin with?
>>> 
>>> 
>>> On Jun 20, 2013, at 12:45 PM, Dom Bellini  wrote:
>>> 
>>>> Wouldnt be possible to take advantage of negative Is to 
>>>> extrapolate/estimate the decay of scattering background (kind of Wilson 
>>>> plot of background scattering) to flat out the background and push all the 
>>>> Is to positive values?
>>>> 
>>>> More of a question rather than a suggestion ...
>>>> 
>>>> D
>>>> 
>>>> 
>>>> 
>>>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ian 
>>>> Tickle
>>>> Sent: 20 June 2013 17:34
>>>> To: ccp4bb
>>>> Subject: Re: [ccp4bb] ctruncate bug?
>>>> 
>>>> Yes higher R factors is the usual reason people don't like I-based 
>>>>

Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Ian Tickle
Douglas, I think you are missing the point that estimation of the
parameters of the proper Bayesian statistical model (i.e. the Wilson prior)
in order to perform the integration in the manner you are suggesting
requires knowledge of the already integrated intensities!  I suppose we
could iterate, i.e. assume an approximate prior, integrate, calculate a
better prior, re-do the integration with the new prior and so on (hoping of
course that the whole process converges), but I think most people would
regard that as overkill.  Also dealing with the issue of averaging
estimates of intensities that no longer have a Gaussian error distribution,
and also crucially outlier rejection, would require some rethinking of the
algorithms. The question is would it make any difference in the end
compared with the 'post-correction' we're doing now?
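
As a toy illustration of this chicken-and-egg iteration (using already-integrated intensities as
a stand-in for a full re-integration, with made-up numbers, a Gaussian error and an acentric
Wilson/exponential prior whose mean S is itself re-estimated from the corrected intensities):

    import math
    import numpy as np

    def posterior_mean(i_obs, sigma, S):
        # mean of the zero-truncated Gaussian posterior under an exponential (Wilson) prior
        m = i_obs - sigma**2 / S
        z = m / sigma
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        return m + sigma * pdf / cdf

    rng = np.random.default_rng(0)
    sigma, true_S = 10.0, 20.0
    # simulated raw estimates for one shell: Wilson-distributed true intensities + Gaussian noise
    i_raw = rng.exponential(true_S, 2000) + rng.normal(0.0, sigma, 2000)

    # the prior parameter S is estimated from the corrected intensities, which depend on S:
    # start from the raw (partly negative) estimates and iterate towards a fixed point
    S = max(i_raw.mean(), 1e-3)
    for cycle in range(10):
        S_new = float(np.mean([posterior_mean(i, sigma, S) for i in i_raw]))
        print(f"cycle {cycle}: S = {S_new:.2f}")
        if abs(S_new - S) < 0.01:
            break
        S = S_new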

Cheers

-- Ian


On 20 June 2013 18:14, Douglas Theobald  wrote:

> I still don't see how you get a negative intensity from that.  It seems
> you are saying that in many cases of a low intensity reflection, the
> integrated spot will be lower than the background.  That is not equivalent
> to having a negative measurement (as the measurement is actually positive,
> and sometimes things are randomly less positive than backgroiund).  If you
> are using a proper statistical model, after background correction you will
> end up with a positive (or 0) value for the integrated intensity.
>
>
> On Jun 20, 2013, at 1:08 PM, Andrew Leslie 
> wrote:
>
> >
> > The integration programs report a negative intensity simply because that
> is the observation.
> >
> > Because of noise in the Xray background, in a large sample of intensity
> estimates for reflections whose true intensity is very very small one will
> inevitably get some measurements that are negative. These must not be
> rejected because this will lead to bias (because some of these intensities
> for symmetry mates will be estimated too large rather than too small). It
> is not unusual for the intensity to remain negative even after averaging
> symmetry mates.
> >
> > Andrew
> >
> >
> > On 20 Jun 2013, at 11:49, Douglas Theobald 
> wrote:
> >
> >> Seems to me that the negative Is should be dealt with early on, in the
> integration step.  Why exactly do integration programs report negative Is
> to begin with?
> >>
> >>
> >> On Jun 20, 2013, at 12:45 PM, Dom Bellini 
> wrote:
> >>
> >>> Wouldnt be possible to take advantage of negative Is to
> extrapolate/estimate the decay of scattering background (kind of Wilson
> plot of background scattering) to flat out the background and push all the
> Is to positive values?
> >>>
> >>> More of a question rather than a suggestion ...
> >>>
> >>> D
> >>>
> >>>
> >>>
> >>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
> Ian Tickle
> >>> Sent: 20 June 2013 17:34
> >>> To: ccp4bb
> >>> Subject: Re: [ccp4bb] ctruncate bug?
> >>>
> >>> Yes higher R factors is the usual reason people don't like I-based
> refinement!
> >>>
> >>> Anyway, refining against Is doesn't solve the problem, it only
> postpones it: you still need the Fs for maps! (though errors in Fs may be
> less critical then).
> >>> -- Ian
> >>>
> >>> On 20 June 2013 17:20, Dale Tronrud  det...@uoxray.uoregon.edu>> wrote:
> >>> If you are refining against F's you have to find some way to avoid
> >>> calculating the square root of a negative number.  That is why people
> >>> have historically rejected negative I's and why Truncate and cTruncate
> >>> were invented.
> >>>
> >>> When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
> >>> care less if Iobs happens to be negative.
> >>>
> >>> As for why people still refine against F...  When I was distributing
> >>> a refinement package it could refine against I but no one wanted to do
> >>> that.  The "R values" ended up higher, but they were looking at R
> >>> values calculated from F's.  Of course the F based R values are lower
> >>> when you refine against F's, that means nothing.
> >>>
> >>> If we could get the PDB to report both the F and I based R values
> >>> for all models maybe we could get a start toward moving to intensity
> >>> refinement.
> >>>
> >>> Dale Tronrud
> >>>
> >>>
> >>> On 06/20/2013 09:06

Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Douglas Theobald
On Jun 20, 2013, at 1:47 PM, Felix Frolow  wrote:

> Intensity is subtraction:  Inet=Iobs - Ibackground.  Iobs and Ibackground can 
> not be negative.  Inet CAN be negative if background is higher than Iobs. 

Just to reiterate, we know that the true value of Inet cannot be negative.  
Hence, the equation you quote is invalid and illogical --- it has no physical 
or statistical justification (except as an approximation for large Iobs and low 
Iback, when ironically background correction is unnecessary).  That equation 
does not account for random statistical fluctuations (e.g., simple Poisson 
counting statistics of shot noise).  
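
To put numbers on the shot-noise point: with purely Poisson counting noise and a strictly
positive true spot intensity, the plain Iobs - Ibackground estimate still comes out negative a
large fraction of the time for a weak reflection, even though it is unbiased on average.
A minimal sketch (illustrative numbers only):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    true_intensity = 4.0     # photons actually diffracted into the spot
    background = 100.0       # expected background photons under the spot

    spot_counts = rng.poisson(true_intensity + background, n)   # counts in the spot box
    bkg_counts = rng.poisson(background, n)                     # counts in an equal background area

    i_net = spot_counts - bkg_counts                  # plain background-subtracted estimate
    print("fraction negative:", np.mean(i_net < 0))   # roughly 0.4
    print("mean estimate:    ", i_net.mean())         # close to the true value of 4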


> We do not know how to model background scattering modulated my molecular 
> transform and mechanical motion of the molecule, 
> I recall we have called it TDS - thermal diffuse scattering. Many years ago 
> Boaz Shaanan and JH were fascinated by it.
> If we would know how deal with TDS, we would go to much nicer structures some 
> of us like and for sure to much lower 
> R factors all of us love excluding maybe referees who will claim over 
> refinement :-\
> Dr Felix Frolow   
> Professor of Structural Biology and Biotechnology, 
> Department of Molecular Microbiology and Biotechnology
> Tel Aviv University 69978, Israel
> 
> Acta Crystallographica F, co-editor
> 
> e-mail: mbfro...@post.tau.ac.il
> Tel:  ++972-3640-8723
> Fax: ++972-3640-9407
> Cellular: 0547 459 608
> 
> On Jun 20, 2013, at 20:07 , Douglas Theobald  wrote:
> 
>> How can there be nothing "wrong" with something that is unphysical?  
>> Intensities cannot be negative.  How could you measure a negative number of 
>> photons?  You can only have a Gaussian distribution around I=0 if you are 
>> using an incorrect, unphysical statistical model.  As I understand it, the 
>> physics predicts that intensities from diffraction should be gamma 
>> distributed (i.e., the square of a Gaussian variate), which makes sense as 
>> the gamma distribution assigns probability 0 to negative values.  
>> 
>> 
>> On Jun 20, 2013, at 1:00 PM, Bernard D Santarsiero  wrote:
>> 
>>> There's absolutely nothing "wrong" with negative intensities. They are 
>>> measurements of intensities that are near zero, and some will be negative, 
>>> and others positive.  The distribution around I=0 can still be Gaussian, 
>>> and you have true esd's.  With F's you used a derived esd since they can't 
>>> be formally generated from the sigma's on I, and are very much undetermined 
>>> for small intensities and small F's. 
>>> 
>>> Small molecule crystallographers routinely refine on F^2 and use all of the 
>>> data, even if the F^2's are negative.
>>> 
>>> Bernie
>>> 
>>> On Jun 20, 2013, at 11:49 AM, Douglas Theobald wrote:
>>> 
>>>> Seems to me that the negative Is should be dealt with early on, in the 
>>>> integration step.  Why exactly do integration programs report negative Is 
>>>> to begin with?
>>>> 
>>>> 
>>>> On Jun 20, 2013, at 12:45 PM, Dom Bellini  
>>>> wrote:
>>>> 
>>>>> Wouldnt be possible to take advantage of negative Is to 
>>>>> extrapolate/estimate the decay of scattering background (kind of Wilson 
>>>>> plot of background scattering) to flat out the background and push all 
>>>>> the Is to positive values?
>>>>> 
>>>>> More of a question rather than a suggestion ...
>>>>> 
>>>>> D
>>>>> 
>>>>> 
>>>>> 
>>>>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ian 
>>>>> Tickle
>>>>> Sent: 20 June 2013 17:34
>>>>> To: ccp4bb
>>>>> Subject: Re: [ccp4bb] ctruncate bug?
>>>>> 
>>>>> Yes higher R factors is the usual reason people don't like I-based 
>>>>> refinement!
>>>>> 
>>>>> Anyway, refining against Is doesn't solve the problem, it only postpones 
>>>>> it: you still need the Fs for maps! (though errors in Fs may be less 
>>>>> critical then).
>>>>> -- Ian
>>>>> 
>>>>> On 20 June 2013 17:20, Dale Tronrud 
>>>>> mailto:det...@uoxray.uoregon.edu>> wrote:
>>>>> If you are refining against F's you have to find some way to avoid
>>>>> calculating the square root of a negative number.  That is why people
>>>>> have historically rejected neg

Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Felix Frolow
Intensity is a subtraction: Inet = Iobs - Ibackground. Iobs and Ibackground cannot be negative. 
Inet CAN be negative if the background is higher than Iobs. 
We do not know how to model the background scattering modulated by the molecular transform and 
the mechanical motion of the molecule; 
I recall we called it TDS - thermal diffuse scattering. Many years ago 
Boaz Shaanan and JH were fascinated by it.
If we knew how to deal with TDS, we would get the much nicer structures some of us like, 
and certainly the much lower 
R factors all of us love (excluding maybe the referees who would claim over 
refinement) :-\
Dr Felix Frolow   
Professor of Structural Biology and Biotechnology, 
Department of Molecular Microbiology and Biotechnology
Tel Aviv University 69978, Israel

Acta Crystallographica F, co-editor

e-mail: mbfro...@post.tau.ac.il
Tel:  ++972-3640-8723
Fax: ++972-3640-9407
Cellular: 0547 459 608

On Jun 20, 2013, at 20:07 , Douglas Theobald  wrote:

> How can there be nothing "wrong" with something that is unphysical?  
> Intensities cannot be negative.  How could you measure a negative number of 
> photons?  You can only have a Gaussian distribution around I=0 if you are 
> using an incorrect, unphysical statistical model.  As I understand it, the 
> physics predicts that intensities from diffraction should be gamma 
> distributed (i.e., the square of a Gaussian variate), which makes sense as 
> the gamma distribution assigns probability 0 to negative values.  
> 
> 
> On Jun 20, 2013, at 1:00 PM, Bernard D Santarsiero  wrote:
> 
>> There's absolutely nothing "wrong" with negative intensities. They are 
>> measurements of intensities that are near zero, and some will be negative, 
>> and others positive.  The distribution around I=0 can still be Gaussian, and 
>> you have true esd's.  With F's you used a derived esd since they can't be 
>> formally generated from the sigma's on I, and are very much undetermined for 
>> small intensities and small F's. 
>> 
>> Small molecule crystallographers routinely refine on F^2 and use all of the 
>> data, even if the F^2's are negative.
>> 
>> Bernie
>> 
>> On Jun 20, 2013, at 11:49 AM, Douglas Theobald wrote:
>> 
>>> Seems to me that the negative Is should be dealt with early on, in the 
>>> integration step.  Why exactly do integration programs report negative Is 
>>> to begin with?
>>> 
>>> 
>>> On Jun 20, 2013, at 12:45 PM, Dom Bellini  wrote:
>>> 
>>>> Wouldnt be possible to take advantage of negative Is to 
>>>> extrapolate/estimate the decay of scattering background (kind of Wilson 
>>>> plot of background scattering) to flat out the background and push all the 
>>>> Is to positive values?
>>>> 
>>>> More of a question rather than a suggestion ...
>>>> 
>>>> D
>>>> 
>>>> 
>>>> 
>>>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ian 
>>>> Tickle
>>>> Sent: 20 June 2013 17:34
>>>> To: ccp4bb
>>>> Subject: Re: [ccp4bb] ctruncate bug?
>>>> 
>>>> Yes higher R factors is the usual reason people don't like I-based 
>>>> refinement!
>>>> 
>>>> Anyway, refining against Is doesn't solve the problem, it only postpones 
>>>> it: you still need the Fs for maps! (though errors in Fs may be less 
>>>> critical then).
>>>> -- Ian
>>>> 
>>>> On 20 June 2013 17:20, Dale Tronrud 
>>>> mailto:det...@uoxray.uoregon.edu>> wrote:
>>>> If you are refining against F's you have to find some way to avoid
>>>> calculating the square root of a negative number.  That is why people
>>>> have historically rejected negative I's and why Truncate and cTruncate
>>>> were invented.
>>>> 
>>>> When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
>>>> care less if Iobs happens to be negative.
>>>> 
>>>> As for why people still refine against F...  When I was distributing
>>>> a refinement package it could refine against I but no one wanted to do
>>>> that.  The "R values" ended up higher, but they were looking at R
>>>> values calculated from F's.  Of course the F based R values are lower
>>>> when you refine against F's, that means nothing.
>>>> 
>>>> If we could get the PDB to report both the F and I based R values
>>>> for all models maybe we could get a start toward moving t

Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Douglas Theobald
I still don't see how you get a negative intensity from that.  It seems you are 
saying that in many cases of a low intensity reflection, the integrated spot 
will be lower than the background.  That is not equivalent to having a negative 
measurement (as the measurement is actually positive, and sometimes things are 
randomly less positive than backgroiund).  If you are using a proper 
statistical model, after background correction you will end up with a positive 
(or 0) value for the integrated intensity.  


On Jun 20, 2013, at 1:08 PM, Andrew Leslie  wrote:

> 
> The integration programs report a negative intensity simply because that is 
> the observation. 
> 
> Because of noise in the Xray background, in a large sample of intensity 
> estimates for reflections whose true intensity is very very small one will 
> inevitably get some measurements that are negative. These must not be 
> rejected because this will lead to bias (because some of these intensities 
> for symmetry mates will be estimated too large rather than too small). It is 
> not unusual for the intensity to remain negative even after averaging 
> symmetry mates.
> 
> Andrew
> 
> 
> On 20 Jun 2013, at 11:49, Douglas Theobald  wrote:
> 
>> Seems to me that the negative Is should be dealt with early on, in the 
>> integration step.  Why exactly do integration programs report negative Is to 
>> begin with?
>> 
>> 
>> On Jun 20, 2013, at 12:45 PM, Dom Bellini  wrote:
>> 
>>> Wouldnt be possible to take advantage of negative Is to 
>>> extrapolate/estimate the decay of scattering background (kind of Wilson 
>>> plot of background scattering) to flat out the background and push all the 
>>> Is to positive values?
>>> 
>>> More of a question rather than a suggestion ...
>>> 
>>> D
>>> 
>>> 
>>> 
>>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ian 
>>> Tickle
>>> Sent: 20 June 2013 17:34
>>> To: ccp4bb
>>> Subject: Re: [ccp4bb] ctruncate bug?
>>> 
>>> Yes higher R factors is the usual reason people don't like I-based 
>>> refinement!
>>> 
>>> Anyway, refining against Is doesn't solve the problem, it only postpones 
>>> it: you still need the Fs for maps! (though errors in Fs may be less 
>>> critical then).
>>> -- Ian
>>> 
>>> On 20 June 2013 17:20, Dale Tronrud 
>>> mailto:det...@uoxray.uoregon.edu>> wrote:
>>> If you are refining against F's you have to find some way to avoid
>>> calculating the square root of a negative number.  That is why people
>>> have historically rejected negative I's and why Truncate and cTruncate
>>> were invented.
>>> 
>>> When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
>>> care less if Iobs happens to be negative.
>>> 
>>> As for why people still refine against F...  When I was distributing
>>> a refinement package it could refine against I but no one wanted to do
>>> that.  The "R values" ended up higher, but they were looking at R
>>> values calculated from F's.  Of course the F based R values are lower
>>> when you refine against F's, that means nothing.
>>> 
>>> If we could get the PDB to report both the F and I based R values
>>> for all models maybe we could get a start toward moving to intensity
>>> refinement.
>>> 
>>> Dale Tronrud
>>> 
>>> 
>>> On 06/20/2013 09:06 AM, Douglas Theobald wrote:
>>> Just trying to understand the basic issues here.  How could refining 
>>> directly against intensities solve the fundamental problem of negative 
>>> intensity values?
>>> 
>>> 
>>> On Jun 20, 2013, at 11:34 AM, Bernhard Rupp 
>>> mailto:hofkristall...@gmail.com>> wrote:
>>> As a maybe better alternative, we should (once again) consider to refine 
>>> against intensities (and I guess George Sheldrick would agree here).
>>> 
>>> I have a simple question - what exactly, short of some sort of historic 
>>> inertia (or memory lapse), is the reason NOT to refine against intensities?
>>> 
>>> Best, BR
>>> 
>>> 
>>> 
>>> 
> 


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Douglas Theobald
How can there be nothing "wrong" with something that is unphysical?  
Intensities cannot be negative.  How could you measure a negative number of 
photons?  You can only have a Gaussian distribution around I=0 if you are using 
an incorrect, unphysical statistical model.  As I understand it, the physics 
predicts that intensities from diffraction should be gamma distributed (i.e., 
the square of a Gaussian variate), which makes sense as the gamma distribution 
assigns probability 0 to negative values.  


On Jun 20, 2013, at 1:00 PM, Bernard D Santarsiero  wrote:

> There's absolutely nothing "wrong" with negative intensities. They are 
> measurements of intensities that are near zero, and some will be negative, 
> and others positive.  The distribution around I=0 can still be Gaussian, and 
> you have true esd's.  With F's you used a derived esd since they can't be 
> formally generated from the sigma's on I, and are very much undetermined for 
> small intensities and small F's. 
> 
> Small molecule crystallographers routinely refine on F^2 and use all of the 
> data, even if the F^2's are negative.
> 
> Bernie
> 
> On Jun 20, 2013, at 11:49 AM, Douglas Theobald wrote:
> 
>> Seems to me that the negative Is should be dealt with early on, in the 
>> integration step.  Why exactly do integration programs report negative Is to 
>> begin with?
>> 
>> 
>> On Jun 20, 2013, at 12:45 PM, Dom Bellini  wrote:
>> 
>>> Wouldnt be possible to take advantage of negative Is to 
>>> extrapolate/estimate the decay of scattering background (kind of Wilson 
>>> plot of background scattering) to flat out the background and push all the 
>>> Is to positive values?
>>> 
>>> More of a question rather than a suggestion ...
>>> 
>>> D
>>> 
>>> 
>>> 
>>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ian 
>>> Tickle
>>> Sent: 20 June 2013 17:34
>>> To: ccp4bb
>>> Subject: Re: [ccp4bb] ctruncate bug?
>>> 
>>> Yes higher R factors is the usual reason people don't like I-based 
>>> refinement!
>>> 
>>> Anyway, refining against Is doesn't solve the problem, it only postpones 
>>> it: you still need the Fs for maps! (though errors in Fs may be less 
>>> critical then).
>>> -- Ian
>>> 
>>> On 20 June 2013 17:20, Dale Tronrud 
>>> mailto:det...@uoxray.uoregon.edu>> wrote:
>>> If you are refining against F's you have to find some way to avoid
>>> calculating the square root of a negative number.  That is why people
>>> have historically rejected negative I's and why Truncate and cTruncate
>>> were invented.
>>> 
>>> When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
>>> care less if Iobs happens to be negative.
>>> 
>>> As for why people still refine against F...  When I was distributing
>>> a refinement package it could refine against I but no one wanted to do
>>> that.  The "R values" ended up higher, but they were looking at R
>>> values calculated from F's.  Of course the F based R values are lower
>>> when you refine against F's, that means nothing.
>>> 
>>> If we could get the PDB to report both the F and I based R values
>>> for all models maybe we could get a start toward moving to intensity
>>> refinement.
>>> 
>>> Dale Tronrud
>>> 
>>> 
>>> On 06/20/2013 09:06 AM, Douglas Theobald wrote:
>>> Just trying to understand the basic issues here.  How could refining 
>>> directly against intensities solve the fundamental problem of negative 
>>> intensity values?
>>> 
>>> 
>>> On Jun 20, 2013, at 11:34 AM, Bernhard Rupp 
>>> mailto:hofkristall...@gmail.com>> wrote:
>>> As a maybe better alternative, we should (once again) consider to refine 
>>> against intensities (and I guess George Sheldrick would agree here).
>>> 
>>> I have a simple question - what exactly, short of some sort of historic 
>>> inertia (or memory lapse), is the reason NOT to refine against intensities?
>>> 
>>> Best, BR
>>> 
>>> 
>>> 
>>> 
>> 
> 


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Andrew Leslie
The integration programs report a negative intensity simply because that is the 
observation. 

Because of noise in the X-ray background, in a large sample of intensity estimates for 
reflections whose true intensity is very, very small, one will inevitably get some estimates 
that are negative. These must not be rejected, because rejection leads to bias (some of the 
intensities for symmetry mates will be estimated too large rather than too small, and only the 
too-small ones would be thrown away). It is not unusual for the intensity to remain negative 
even after averaging symmetry mates.
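
The bias from rejecting negative estimates is easy to demonstrate. A minimal sketch with
made-up numbers (Gaussian noise for simplicity): merging the symmetry mates of a truly very
weak reflection gives an essentially unbiased average if every estimate is kept, but a grossly
inflated one if the negative estimates are thrown away first.

    import numpy as np

    rng = np.random.default_rng(1)
    true_i, sigma = 1.0, 10.0          # a truly very weak reflection, noisy background
    n_mates, n_trials = 12, 50_000     # multiplicity and number of simulated reflections

    obs = rng.normal(true_i, sigma, size=(n_trials, n_mates))

    keep_all = obs.mean(axis=1)
    reject_neg = np.array([m[m > 0].mean() for m in obs if np.any(m > 0)])

    print("merged <I>, keeping every estimate:      ", keep_all.mean())    # close to 1
    print("merged <I>, rejecting negative estimates:", reject_neg.mean())  # around 8, badly biased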

Andrew


On 20 Jun 2013, at 11:49, Douglas Theobald  wrote:

> Seems to me that the negative Is should be dealt with early on, in the 
> integration step.  Why exactly do integration programs report negative Is to 
> begin with?
> 
> 
> On Jun 20, 2013, at 12:45 PM, Dom Bellini  wrote:
> 
>> Wouldnt be possible to take advantage of negative Is to extrapolate/estimate 
>> the decay of scattering background (kind of Wilson plot of background 
>> scattering) to flat out the background and push all the Is to positive 
>> values?
>> 
>> More of a question rather than a suggestion ...
>> 
>> D
>> 
>> 
>> 
>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ian 
>> Tickle
>> Sent: 20 June 2013 17:34
>> To: ccp4bb
>> Subject: Re: [ccp4bb] ctruncate bug?
>> 
>> Yes higher R factors is the usual reason people don't like I-based 
>> refinement!
>> 
>> Anyway, refining against Is doesn't solve the problem, it only postpones it: 
>> you still need the Fs for maps! (though errors in Fs may be less critical 
>> then).
>> -- Ian
>> 
>> On 20 June 2013 17:20, Dale Tronrud 
>> mailto:det...@uoxray.uoregon.edu>> wrote:
>>  If you are refining against F's you have to find some way to avoid
>> calculating the square root of a negative number.  That is why people
>> have historically rejected negative I's and why Truncate and cTruncate
>> were invented.
>> 
>>  When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
>> care less if Iobs happens to be negative.
>> 
>>  As for why people still refine against F...  When I was distributing
>> a refinement package it could refine against I but no one wanted to do
>> that.  The "R values" ended up higher, but they were looking at R
>> values calculated from F's.  Of course the F based R values are lower
>> when you refine against F's, that means nothing.
>> 
>>  If we could get the PDB to report both the F and I based R values
>> for all models maybe we could get a start toward moving to intensity
>> refinement.
>> 
>> Dale Tronrud
>> 
>> 
>> On 06/20/2013 09:06 AM, Douglas Theobald wrote:
>> Just trying to understand the basic issues here.  How could refining 
>> directly against intensities solve the fundamental problem of negative 
>> intensity values?
>> 
>> 
>> On Jun 20, 2013, at 11:34 AM, Bernhard Rupp 
>> mailto:hofkristall...@gmail.com>> wrote:
>> As a maybe better alternative, we should (once again) consider to refine 
>> against intensities (and I guess George Sheldrick would agree here).
>> 
>> I have a simple question - what exactly, short of some sort of historic 
>> inertia (or memory lapse), is the reason NOT to refine against intensities?
>> 
>> Best, BR
>> 
>> 
>> 
>> 


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Ian Tickle
The prior knowledge about Is is not merely that they are >= 0, it's more
than that: we know they have an (approximate) Wilson distribution.  AFAICS
incorporating that information at the integration stage would be almost
equivalent to the F&W procedure.  In fact it would probably not be as good
since the experimental estimates of I do have an (approximate) Gaussian
distribution, being the difference of 2 Poisson distributions with large
means (or at least >~ 10).  The corrected Is, being the best estimates of
the true Is would as you point out not have a Gaussian distribution, and
some of the assumptions made in averaging equivalent reflections would not
be valid.  You could still use the corrected Is instead of the experimental
Is in refinement but I suspect it would not make any difference to the
results (except you would get lower R factors!).
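
To illustrate the Gaussian point: the difference of two Poisson counts (a Skellam distribution)
is already very close to a Gaussian of mean mu_spot - mu_bkg and variance mu_spot + mu_bkg once
the means are larger than about 10. A minimal sketch with made-up numbers:

    import numpy as np

    rng = np.random.default_rng(2)
    mu_spot, mu_bkg = 110.0, 100.0     # spot and background counts, both "large"

    diff = rng.poisson(mu_spot, 200_000) - rng.poisson(mu_bkg, 200_000)

    print("mean:     ", diff.mean(), " expected:", mu_spot - mu_bkg)
    print("variance: ", diff.var(),  " expected:", mu_spot + mu_bkg)
    # the skewness of an exact Gaussian would be 0; here it is tiny
    print("skewness: ", ((diff - diff.mean())**3).mean() / diff.std()**3)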

-- Ian


On 20 June 2013 17:49, Douglas Theobald  wrote:

> Seems to me that the negative Is should be dealt with early on, in the
> integration step.  Why exactly do integration programs report negative Is
> to begin with?
>
>
> On Jun 20, 2013, at 12:45 PM, Dom Bellini 
> wrote:
>
> > Wouldnt be possible to take advantage of negative Is to
> extrapolate/estimate the decay of scattering background (kind of Wilson
> plot of background scattering) to flat out the background and push all the
> Is to positive values?
> >
> > More of a question rather than a suggestion ...
> >
> > D
> >
> >
> >
> > From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
> Ian Tickle
> > Sent: 20 June 2013 17:34
> > To: ccp4bb
> > Subject: Re: [ccp4bb] ctruncate bug?
> >
> > Yes higher R factors is the usual reason people don't like I-based
> refinement!
> >
> > Anyway, refining against Is doesn't solve the problem, it only postpones
> it: you still need the Fs for maps! (though errors in Fs may be less
> critical then).
> > -- Ian
> >
> > On 20 June 2013 17:20, Dale Tronrud  det...@uoxray.uoregon.edu>> wrote:
> >   If you are refining against F's you have to find some way to avoid
> > calculating the square root of a negative number.  That is why people
> > have historically rejected negative I's and why Truncate and cTruncate
> > were invented.
> >
> >   When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
> > care less if Iobs happens to be negative.
> >
> >   As for why people still refine against F...  When I was distributing
> > a refinement package it could refine against I but no one wanted to do
> > that.  The "R values" ended up higher, but they were looking at R
> > values calculated from F's.  Of course the F based R values are lower
> > when you refine against F's, that means nothing.
> >
> >   If we could get the PDB to report both the F and I based R values
> > for all models maybe we could get a start toward moving to intensity
> > refinement.
> >
> > Dale Tronrud
> >
> >
> > On 06/20/2013 09:06 AM, Douglas Theobald wrote:
> > Just trying to understand the basic issues here.  How could refining
> directly against intensities solve the fundamental problem of negative
> intensity values?
> >
> >
> > On Jun 20, 2013, at 11:34 AM, Bernhard Rupp  <mailto:hofkristall...@gmail.com>> wrote:
> > As a maybe better alternative, we should (once again) consider to refine
> against intensities (and I guess George Sheldrick would agree here).
> >
> > I have a simple question - what exactly, short of some sort of historic
> inertia (or memory lapse), is the reason NOT to refine against intensities?
> >
> > Best, BR
> >
> >
> >
> >
>


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Dom Bellini
Sorry, perhaps what I was thinking of was to use the Icalc to proportionally push up the Iobs, 
so as to bring the negative Is up to positive numbers.

But I guess that would bias the Iobs?

Again just questions rather than suggestions.

D 

-Original Message-
From: Douglas Theobald [mailto:dtheob...@brandeis.edu] 
Sent: 20 June 2013 17:49
To: Bellini, Domenico (DLSLtd,RAL,DIA); ccp4bb
Subject: Re: [ccp4bb] ctruncate bug?

Seems to me that the negative Is should be dealt with early on, in the 
integration step.  Why exactly do integration programs report negative Is to 
begin with?


On Jun 20, 2013, at 12:45 PM, Dom Bellini  wrote:

> Wouldnt be possible to take advantage of negative Is to extrapolate/estimate 
> the decay of scattering background (kind of Wilson plot of background 
> scattering) to flat out the background and push all the Is to positive values?
> 
> More of a question rather than a suggestion ...
> 
> D
> 
> 
> 
> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of 
> Ian Tickle
> Sent: 20 June 2013 17:34
> To: ccp4bb
> Subject: Re: [ccp4bb] ctruncate bug?
> 
> Yes higher R factors is the usual reason people don't like I-based refinement!
> 
> Anyway, refining against Is doesn't solve the problem, it only postpones it: 
> you still need the Fs for maps! (though errors in Fs may be less critical 
> then).
> -- Ian
> 
> On 20 June 2013 17:20, Dale Tronrud 
> mailto:det...@uoxray.uoregon.edu>> wrote:
>   If you are refining against F's you have to find some way to avoid 
> calculating the square root of a negative number.  That is why people 
> have historically rejected negative I's and why Truncate and cTruncate 
> were invented.
> 
>   When refining against I, the calculation of (Iobs - Icalc)^2 
> couldn't care less if Iobs happens to be negative.
> 
>   As for why people still refine against F...  When I was distributing 
> a refinement package it could refine against I but no one wanted to do 
> that.  The "R values" ended up higher, but they were looking at R 
> values calculated from F's.  Of course the F based R values are lower 
> when you refine against F's, that means nothing.
> 
>   If we could get the PDB to report both the F and I based R values 
> for all models maybe we could get a start toward moving to intensity 
> refinement.
> 
> Dale Tronrud
> 
> 
> On 06/20/2013 09:06 AM, Douglas Theobald wrote:
> Just trying to understand the basic issues here.  How could refining directly 
> against intensities solve the fundamental problem of negative intensity 
> values?
> 
> 
> On Jun 20, 2013, at 11:34 AM, Bernhard Rupp 
> mailto:hofkristall...@gmail.com>> wrote:
> As a maybe better alternative, we should (once again) consider to refine 
> against intensities (and I guess George Sheldrick would agree here).
> 
> I have a simple question - what exactly, short of some sort of historic 
> inertia (or memory lapse), is the reason NOT to refine against intensities?
> 
> Best, BR
> 
> 
> 
> 





Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Douglas Theobald
Seems to me that the negative Is should be dealt with early on, in the 
integration step.  Why exactly do integration programs report negative Is to 
begin with?


On Jun 20, 2013, at 12:45 PM, Dom Bellini  wrote:

> Wouldnt be possible to take advantage of negative Is to extrapolate/estimate 
> the decay of scattering background (kind of Wilson plot of background 
> scattering) to flat out the background and push all the Is to positive values?
> 
> More of a question rather than a suggestion ...
> 
> D
> 
> 
> 
> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ian 
> Tickle
> Sent: 20 June 2013 17:34
> To: ccp4bb
> Subject: Re: [ccp4bb] ctruncate bug?
> 
> Yes higher R factors is the usual reason people don't like I-based refinement!
> 
> Anyway, refining against Is doesn't solve the problem, it only postpones it: 
> you still need the Fs for maps! (though errors in Fs may be less critical 
> then).
> -- Ian
> 
> On 20 June 2013 17:20, Dale Tronrud 
> mailto:det...@uoxray.uoregon.edu>> wrote:
>   If you are refining against F's you have to find some way to avoid
> calculating the square root of a negative number.  That is why people
> have historically rejected negative I's and why Truncate and cTruncate
> were invented.
> 
>   When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
> care less if Iobs happens to be negative.
> 
>   As for why people still refine against F...  When I was distributing
> a refinement package it could refine against I but no one wanted to do
> that.  The "R values" ended up higher, but they were looking at R
> values calculated from F's.  Of course the F based R values are lower
> when you refine against F's, that means nothing.
> 
>   If we could get the PDB to report both the F and I based R values
> for all models maybe we could get a start toward moving to intensity
> refinement.
> 
> Dale Tronrud
> 
> 
> On 06/20/2013 09:06 AM, Douglas Theobald wrote:
> Just trying to understand the basic issues here.  How could refining directly 
> against intensities solve the fundamental problem of negative intensity 
> values?
> 
> 
> On Jun 20, 2013, at 11:34 AM, Bernhard Rupp 
> mailto:hofkristall...@gmail.com>> wrote:
> As a maybe better alternative, we should (once again) consider to refine 
> against intensities (and I guess George Sheldrick would agree here).
> 
> I have a simple question - what exactly, short of some sort of historic 
> inertia (or memory lapse), is the reason NOT to refine against intensities?
> 
> Best, BR
> 
> 
> 
> 


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Douglas Theobald
On Jun 20, 2013, at 12:20 PM, Dale Tronrud  wrote:

>   If you are refining against F's you have to find some way to avoid
> calculating the square root of a negative number.  That is why people
> have historically rejected negative I's and why Truncate and cTruncate
> were invented.
> 
>   When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
> care less if Iobs happens to be negative.

But we know that Is can't be negative.  Using (Iobs - Icalc)^2 does not 
incorporate that bit of physics, and it implicitly assumes a Gaussian 
distribution for the Is, which is impossible for a variable that is positive 
semi-definite.  Refining against (Iobs - Icalc)^2 is mathematically equivalent 
to shifting every I by the most negative I and refining against that, a crude 
baseline correction that I doubt most people would consider valid.  
Transforming the data to Fs at least makes the Gaussian assumption plausible, 
and I always assumed that was one main reason for working with Fs (since all 
the refinement programs assume Gaussians).  

>   As for why people still refine against F...  When I was distributing
> a refinement package it could refine against I but no one wanted to do
> that.  The "R values" ended up higher, but they were looking at R
> values calculated from F's.  Of course the F based R values are lower
> when you refine against F's, that means nothing.

R-values also implicitly assume a Gaussian, right?

> 
>   If we could get the PDB to report both the F and I based R values
> for all models maybe we could get a start toward moving to intensity
> refinement.
> 
> Dale Tronrud
> 
> On 06/20/2013 09:06 AM, Douglas Theobald wrote:
>> Just trying to understand the basic issues here.  How could refining 
>> directly against intensities solve the fundamental problem of negative 
>> intensity values?
>> 
>> 
>> On Jun 20, 2013, at 11:34 AM, Bernhard Rupp  wrote:
>> 
 As a maybe better alternative, we should (once again) consider to refine 
 against intensities (and I guess George Sheldrick would agree here).
>>> 
>>> I have a simple question - what exactly, short of some sort of historic 
>>> inertia (or memory lapse), is the reason NOT to refine against intensities?
>>> 
>>> Best, BR


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Dom Bellini
Wouldn't it be possible to take advantage of negative Is to extrapolate/estimate 
the decay of the scattering background (a kind of Wilson plot of the background 
scattering), to flatten out the background and push all the Is to positive values?

More of a question rather than a suggestion ...

D



From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ian Tickle
Sent: 20 June 2013 17:34
To: ccp4bb
Subject: Re: [ccp4bb] ctruncate bug?

Yes higher R factors is the usual reason people don't like I-based refinement!

Anyway, refining against Is doesn't solve the problem, it only postpones it: 
you still need the Fs for maps! (though errors in Fs may be less critical then).
-- Ian

On 20 June 2013 17:20, Dale Tronrud 
mailto:det...@uoxray.uoregon.edu>> wrote:
   If you are refining against F's you have to find some way to avoid
calculating the square root of a negative number.  That is why people
have historically rejected negative I's and why Truncate and cTruncate
were invented.

   When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
care less if Iobs happens to be negative.

   As for why people still refine against F...  When I was distributing
a refinement package it could refine against I but no one wanted to do
that.  The "R values" ended up higher, but they were looking at R
values calculated from F's.  Of course the F based R values are lower
when you refine against F's, that means nothing.

   If we could get the PDB to report both the F and I based R values
for all models maybe we could get a start toward moving to intensity
refinement.

Dale Tronrud


On 06/20/2013 09:06 AM, Douglas Theobald wrote:
Just trying to understand the basic issues here.  How could refining directly 
against intensities solve the fundamental problem of negative intensity values?


On Jun 20, 2013, at 11:34 AM, Bernhard Rupp 
mailto:hofkristall...@gmail.com>> wrote:
As a maybe better alternative, we should (once again) consider to refine 
against intensities (and I guess George Sheldrick would agree here).

I have a simple question - what exactly, short of some sort of historic inertia 
(or memory lapse), is the reason NOT to refine against intensities?

Best, BR













Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Ian Tickle
Yes, higher R factors are the usual reason people don't like I-based
refinement!

Anyway, refining against Is doesn't solve the problem, it only postpones
it: you still need the Fs for maps! (though errors in Fs may be less
critical then).

-- Ian


On 20 June 2013 17:20, Dale Tronrud  wrote:

>If you are refining against F's you have to find some way to avoid
> calculating the square root of a negative number.  That is why people
> have historically rejected negative I's and why Truncate and cTruncate
> were invented.
>
>When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
> care less if Iobs happens to be negative.
>
>As for why people still refine against F...  When I was distributing
> a refinement package it could refine against I but no one wanted to do
> that.  The "R values" ended up higher, but they were looking at R
> values calculated from F's.  Of course the F based R values are lower
> when you refine against F's, that means nothing.
>
>If we could get the PDB to report both the F and I based R values
> for all models maybe we could get a start toward moving to intensity
> refinement.
>
> Dale Tronrud
>
>
> On 06/20/2013 09:06 AM, Douglas Theobald wrote:
>
>> Just trying to understand the basic issues here.  How could refining
>> directly against intensities solve the fundamental problem of negative
>> intensity values?
>>
>>
>> On Jun 20, 2013, at 11:34 AM, Bernhard Rupp 
>> wrote:
>>
>>  As a maybe better alternative, we should (once again) consider to refine
 against intensities (and I guess George Sheldrick would agree here).

>>>
>>> I have a simple question - what exactly, short of some sort of historic
>>> inertia (or memory lapse), is the reason NOT to refine against intensities?
>>>
>>> Best, BR
>>>
>>


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Dale Tronrud

   If you are refining against F's you have to find some way to avoid
calculating the square root of a negative number.  That is why people
have historically rejected negative I's and why Truncate and cTruncate
were invented.

   When refining against I, the calculation of (Iobs - Icalc)^2 couldn't
care less if Iobs happens to be negative.

   As for why people still refine against F...  When I was distributing
a refinement package it could refine against I but no one wanted to do
that.  The "R values" ended up higher, but they were looking at R
values calculated from F's.  Of course the F based R values are lower
when you refine against F's, that means nothing.

   If we could get the PDB to report both the F and I based R values
for all models maybe we could get a start toward moving to intensity
refinement.

Dale Tronrud

On 06/20/2013 09:06 AM, Douglas Theobald wrote:

Just trying to understand the basic issues here.  How could refining directly 
against intensities solve the fundamental problem of negative intensity values?


On Jun 20, 2013, at 11:34 AM, Bernhard Rupp  wrote:


As a maybe better alternative, we should (once again) consider to refine 
against intensities (and I guess George Sheldrick would agree here).


I have a simple question - what exactly, short of some sort of historic inertia 
(or memory lapse), is the reason NOT to refine against intensities?

Best, BR
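
A minimal sketch of the distinction Dale describes above (illustrative only, not 
code from any refinement program; the error propagation for sigF is deliberately 
crude). For a negative measurement the intensity-based term is perfectly well 
defined, while the amplitude-based term cannot even be formed without first 
rejecting or massaging the observation:

#include <cmath>
#include <cstdio>

// Intensity-based least-squares term: well defined even when iobs < 0.
double residual_I(double iobs, double sig_iobs, double icalc) {
    double d = (iobs - icalc) / sig_iobs;
    return d * d;
}

// Amplitude-based term: needs sqrt(iobs), so a negative measurement has to be
// rejected or massaged (e.g. by French & Wilson) before it can be used at all.
bool residual_F(double iobs, double sig_iobs, double fcalc, double* term) {
    if (iobs < 0.0) return false;                    // sqrt of a negative number
    double fobs  = std::sqrt(iobs);
    double sig_f = sig_iobs / (2.0 * fobs + 1.0e-9); // crude error propagation
    double d = (fobs - fcalc) / sig_f;
    *term = d * d;
    return true;
}

int main() {
    double icalc = 0.5, fcalc = std::sqrt(icalc), term;
    for (double iobs : {2.0, 0.1, -0.8}) {           // weak reflections can scatter below zero
        std::printf("Iobs = %5.1f   I-term = %8.3f   ", iobs, residual_I(iobs, 1.0, icalc));
        if (residual_F(iobs, 1.0, fcalc, &term)) std::printf("F-term = %8.3f\n", term);
        else std::printf("F-term undefined (negative intensity)\n");
    }
    return 0;
}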


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Douglas Theobald
Just trying to understand the basic issues here.  How could refining directly 
against intensities solve the fundamental problem of negative intensity values?


On Jun 20, 2013, at 11:34 AM, Bernhard Rupp  wrote:

>> As a maybe better alternative, we should (once again) consider to refine 
>> against intensities (and I guess George Sheldrick would agree here).
> 
> I have a simple question - what exactly, short of some sort of historic 
> inertia (or memory lapse), is the reason NOT to refine against intensities? 
> 
> Best, BR


Re: [ccp4bb] ctruncate bug?

2013-06-20 Thread Bernhard Rupp
>As a maybe better alternative, we should (once again) consider to refine 
>against intensities (and I guess George Sheldrick would agree here).

I have a simple question - what exactly, short of some sort of historic inertia 
(or memory lapse), is the reason NOT to refine against intensities? 

Best, BR


Re: [ccp4bb] ctruncate bug?

2013-06-19 Thread Kay Diederichs
On Wed, 19 Jun 2013 11:01:22 -0400, Ed Pozharski  wrote:

>Dear Kay and Jeff,
>
>frankly, I do not see much justification for any rejection based on
>h-cutoff.

I agree

>
>French&Wilson only talk about I/sigI cutoff, which also warrants further
>scrutiny.  It probably could be argued that reflections with I/sigI<-4
>are still more likely to be weak than strong so F~0 seems to make more
>sense than rejection.  The nature of these outliers should probably be
>resolved at the integration stage, but these really aren't that
>numerous.
>
>As for h>-4 requirement, I don't see French&Wilson even arguing for that
>anywhere in the paper.  h variable does not reflect any physical
>quantity that would come with prior expectation of being non-negative
>and while the posterior of the true intensity (for acentric reflections)
>is distributed according to the truncated normal distribution N(sigma*h,
>sigma^2), I don't really see why h<-4 is "bad".
>
>From what I understand, Kay has removed h-cutoff from XDSCONV (or never
>included it in the first place).  Perhaps ctruncate/phenix should change

there was a h<-4 cutoff in previous versions of XDSCONV which has been removed.

Concerning removal of negative observations/reflections, it may be justified to 
refer to a very recent paper - see 
http://journals.iucr.org/d/issues/2013/07/00/ba5192/index.html  

best,

Kay

>too?  Or am I misunderstanding something and there is some rationale for
>h<-4 cutoff?
>
>Cheers,
>
>Ed.
>
>
>On Wed, 2013-06-19 at 06:47 +0100, Kay Diederichs wrote:
>> Hi Jeff,
>>
>> what I did in XDSCONV is to mitigate the numerical difficulties associated 
>> with low h (called "Score" in XDSCONV output) values, and I removed the h < 
>> -4 cutoff. The more negative h becomes, the closer to zero is the resulting 
>> amplitude, so not applying a h cutoff makes sense (to me, anyway).
>> XDSCONV still applies the I < -3*sigma cutoff, by default.
>>
>> thanks,
>>
>> Kay
>
>--
>I don't know why the sacrifice thing didn't work.
>Science behind it seemed so solid.
>Julian, King of Lemurs


Re: [ccp4bb] ctruncate bug?

2013-06-19 Thread Jeff Headd
Hi Ed,

While I don't think French and Wilson argue explicitly for the h>-4.0
requirement in their main manuscript, if you look at the source code
included in the supplementary material for this paper, they include this in
their implementation, which is what I worked from.

Charles, do you happen to know why this was included in the first place,
other than it limits the size of the look-up table?

Jeff


On Wed, Jun 19, 2013 at 11:01 AM, Ed Pozharski wrote:

> Dear Kay and Jeff,
>
> frankly, I do not see much justification for any rejection based on
> h-cutoff.
>
> French&Wilson only talk about I/sigI cutoff, which also warrants further
> scrutiny.  It probably could be argued that reflections with I/sigI<-4
> are still more likely to be weak than strong so F~0 seems to make more
> sense than rejection.  The nature of these outliers should probably be
> resolved at the integration stage, but these really aren't that
> numerous.
>
> As for h>-4 requirement, I don't see French&Wilson even arguing for that
> anywhere in the paper.  h variable does not reflect any physical
> quantity that would come with prior expectation of being non-negative
> and while the posterior of the true intensity (for acentric reflections)
> is distributed according to the truncated normal distribution N(sigma*h,
> sigma^2), I don't really see why h<-4 is "bad".
>
> From what I understand, Kay has removed h-cutoff from XDSCONV (or never
> included it in the first place).  Perhaps ctruncate/phenix should change
> too?  Or am I misunderstanding something and there is some rationale for
> h<-4 cutoff?
>
> Cheers,
>
> Ed.
>
>
> On Wed, 2013-06-19 at 06:47 +0100, Kay Diederichs wrote:
> > Hi Jeff,
> >
> > what I did in XDSCONV is to mitigate the numerical difficulties
> associated with low h (called "Score" in XDSCONV output) values, and I
> removed the h < -4 cutoff. The more negative h becomes, the closer to zero
> is the resulting amplitude, so not applying a h cutoff makes sense (to me,
> anyway).
> > XDSCONV still applies the I < -3*sigma cutoff, by default.
> >
> > thanks,
> >
> > Kay
>
> --
> I don't know why the sacrifice thing didn't work.
> Science behind it seemed so solid.
> Julian, King of Lemurs
>


Re: [ccp4bb] ctruncate bug?

2013-06-19 Thread Ed Pozharski
Dear Kay and Jeff,

frankly, I do not see much justification for any rejection based on
h-cutoff.  

French&Wilson only talk about I/sigI cutoff, which also warrants further
scrutiny.  It probably could be argued that reflections with I/sigI<-4
are still more likely to be weak than strong so F~0 seems to make more
sense than rejection.  The nature of these outliers should probably be
resolved at the integration stage, but these really aren't that
numerous.

As for h>-4 requirement, I don't see French&Wilson even arguing for that
anywhere in the paper.  h variable does not reflect any physical
quantity that would come with prior expectation of being non-negative
and while the posterior of the true intensity (for acentric reflections)
is distributed according to the truncated normal distribution N(sigma*h,
sigma^2), I don't really see why h<-4 is "bad".

From what I understand, Kay has removed h-cutoff from XDSCONV (or never
included it in the first place).  Perhaps ctruncate/phenix should change
too?  Or am I misunderstanding something and there is some rationale for
h<-4 cutoff?

Cheers,

Ed.


On Wed, 2013-06-19 at 06:47 +0100, Kay Diederichs wrote:
> Hi Jeff,
> 
> what I did in XDSCONV is to mitigate the numerical difficulties associated 
> with low h (called "Score" in XDSCONV output) values, and I removed the h < 
> -4 cutoff. The more negative h becomes, the closer to zero is the resulting 
> amplitude, so not applying a h cutoff makes sense (to me, anyway).
> XDSCONV still applies the I < -3*sigma cutoff, by default.
> 
> thanks,
> 
> Kay

-- 
I don't know why the sacrifice thing didn't work.  
Science behind it seemed so solid.
Julian, King of Lemurs
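
To put a number on the question above: taking the truncated normal N(sigma*h, 
sigma^2) posterior at face value (a sketch under that normal approximation, not 
ctruncate or Phenix code), its mean stays small and positive however negative h 
becomes, so nothing qualitatively special happens at h = -4.

#include <cmath>
#include <cstdio>

const double PI = 3.14159265358979323846;

double phi(double x) { return std::exp(-0.5 * x * x) / std::sqrt(2.0 * PI); }  // standard normal pdf
double Phi(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }          // standard normal cdf

// Mean of N(sigma*h, sigma^2) truncated to [0, inf): sigma * (h + phi(h)/Phi(h)).
double posterior_mean_J(double h, double sigma) {
    return sigma * (h + phi(h) / Phi(h));
}

int main() {
    for (double h : {0.0, -2.0, -4.0, -6.0, -10.0})
        std::printf("h = %6.1f   E[J]/sigma = %.4f\n", h, posterior_mean_J(h, 1.0));
    return 0;
}

With sigma = 1 this comes out at roughly 0.8, 0.37, 0.23, 0.16 and 0.10 for 
h = 0, -2, -4, -6 and -10: a smoothly shrinking, but never "unphysical", estimate.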


Re: [ccp4bb] ctruncate bug?

2013-06-19 Thread Charles Ballard
To add to the discussion, a plot of the acentric F&W output for h from -10 to 10 
(normalised wrt sqrt(sigma)): ftp://ftp.ccp4.ac.uk/ccb/aZF2.pdf
Black dots are F/sqrt(sigma), while blue is the corresponding plot for sigF.

The value drops from 0.42 to 0.28 going from h = -4 to h = -10.

Note:  for this we are heading for F/sigF of ~1.92.


In ctruncate the norm is corrected (somewhat) for anisotropy, while for cases 
with twinning or NCS the default is to use a flat prior (in intensity).

Charles
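
Those two endpoint figures (and the F/sigF of ~1.9) can be checked against the 
truncated-normal form of the acentric posterior discussed elsewhere in this 
thread with a few lines of numerical integration. A sketch of that check only, 
not Charles's script:

#include <cmath>
#include <cstdio>
#include <algorithm>

// E[sqrt(J)]/sqrt(sigma) for a posterior N(sigma*h, sigma^2) truncated to J >= 0,
// evaluated by simple trapezoidal integration in units where sigma = 1.
double F_over_sqrt_sigma(double h) {
    const int n = 400000;
    const double top = std::max(0.0, h) + 12.0, dj = top / n;
    double num = 0.0, den = 0.0;
    for (int i = 0; i <= n; ++i) {
        double j = i * dj;
        double w = std::exp(-0.5 * (j - h) * (j - h)) * ((i == 0 || i == n) ? 0.5 : 1.0);
        num += w * std::sqrt(j);
        den += w;
    }
    return num / den;
}

int main() {
    for (double h : {-10.0, -4.0, 0.0, 4.0})
        std::printf("h = %6.1f   F/sqrt(sigma) ~ %.3f\n", h, F_over_sqrt_sigma(h));
    return 0;
}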


On 19 Jun 2013, at 14:29, Kay Diederichs wrote:

> On Wed, 19 Jun 2013 14:19:19 +0100, Kay Diederichs 
>  wrote:
> 
>> I wonder if problem b) is why Evans and Murshudov  observe little 
>> contribution of reflections in shells with CC1/2 below 0.27 in one of their 
>> test cases, which had very anisotropic data.
> 
> sorry, forgot the reference. The paper is "How good are my data and what is 
> the resolution?" by PR Evans and GN Murshudov (2013) ActaD 69, 1204-1214, 
> accessible at http://journals.iucr.org/d/issues/2013/07/00/ba5190/index.html
> 
> Kay



Re: [ccp4bb] ctruncate bug?

2013-06-19 Thread Kay Diederichs
On Wed, 19 Jun 2013 14:19:19 +0100, Kay Diederichs 
 wrote:

>I wonder if problem b) is why Evans and Murshudov  observe little contribution 
>of reflections in shells with CC1/2 below 0.27 in one of their test cases, 
>which had very anisotropic data.

sorry, forgot the reference. The paper is "How good are my data and what is the 
resolution?" by PR Evans and GN Murshudov (2013) ActaD 69, 1204-1214, 
accessible at http://journals.iucr.org/d/issues/2013/07/00/ba5190/index.html

Kay


Re: [ccp4bb] ctruncate bug?

2013-06-19 Thread Kay Diederichs
Hi James,

Concerning XDSCONV, I cannot reproduce your plot. A Linux (64bit) program 
"test_xdsconv", which allows one to input I, sigI, <I>, and mode, where
I: measured intensity
sigI: sigma(I)
<I>: average I in resolution shell
mode: -1/0/1 for truncated normal/acentric/centric prior
is at ftp://turn5.biologie.uni-konstanz.de/pub/test_xdsconv . It allows one to 
test individual combinations of I, sigI, <I> and find out what, according to 
French&Wilson 1978, the value of J, sigJ, F, sigF is (where J stands for the 
posterior intensity), according to the XDSCONV implementation. The user has to 
enter the numbers. Some example input and output is:
1 1 1 0
 0.79788 0.60281 0.82218 0.34915
1 1 0.1 0
 0.10851 0.10731 0.29235 0.15180
1 1 0.01 0
 0.00993 0.01025 0.08854 0.04568
0.1 0.1 0.01 0
 0.01085 0.01073 0.09245 0.04800
100 1 0.01 0
 0.79788 0.60281 0.82218 0.34915
These results show that e.g. in a resolution shell with <I> = 0.01, the 
posterior (i.e. most likely given the prior) intensity is 0.00993. In other 
words, a strong intensity (I,sigI = 1,1) in a weak-<I> shell is unexpected and 
not believable, and is thus strongly weighted down. 
The output of test_xdsconv is meant to help understand F&W1978 and its XDSCONV 
implementation, and I cannot find a problem with it. If anybody does, I'd love 
to hear about it.
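
For readers without the binary, the following is a sketch of the acentric part 
of that calculation as it is described in this thread - a Gaussian likelihood 
times the Wilson prior exp(-J/<I>) on J >= 0, integrated numerically. It is not 
the XDSCONV/test_xdsconv source; the centric case, the lookup tables and any 
anisotropy handling are omitted, and the third test case is my own. For the 
first example line above (I, sigI, <I> = 1, 1, 1) it should reproduce the 
quoted J, sigJ, F, sigF to within the integration error.

#include <cmath>
#include <cstdio>
#include <algorithm>

struct FW { double J, sigJ, F, sigF; };

// Acentric case only: posterior density on the true intensity J >= 0 is
// proportional to exp(-(I-J)^2 / (2*sigI^2)) * exp(-J/S), with S = <I>.
FW french_wilson_acentric(double I, double sigI, double S) {
    const double h    = I / sigI - sigI / S;      // the "Score" in the XDSCONV output
    const double mode = std::max(0.0, sigI * h);  // mode of the truncated posterior
    const double top  = mode + 12.0 * sigI;       // integration upper limit
    const int    n    = 200000;
    const double dJ   = top / n;

    auto logw = [&](double J) {                   // unnormalised log posterior
        double d = (I - J) / sigI;
        return -0.5 * d * d - J / S;
    };
    const double l0 = logw(mode);                 // subtracted for numerical stability

    double w0 = 0, w1 = 0, w2 = 0, wf = 0;        // sums of w, w*J, w*J^2, w*sqrt(J)
    for (int i = 0; i <= n; ++i) {
        double J = i * dJ;
        double w = std::exp(logw(J) - l0) * ((i == 0 || i == n) ? 0.5 : 1.0);
        w0 += w; w1 += w * J; w2 += w * J * J; wf += w * std::sqrt(J);
    }
    FW r;
    r.J    = w1 / w0;                                        // posterior mean intensity
    r.sigJ = std::sqrt(std::max(0.0, w2 / w0 - r.J * r.J));  // and its spread
    r.F    = wf / w0;                                        // posterior mean amplitude
    r.sigF = std::sqrt(std::max(0.0, r.J - r.F * r.F));      // since <F^2> = <J>
    return r;
}

int main() {
    // same style of input as test_xdsconv: I, sigI, <I>  (acentric only here)
    double cases[][3] = { {1.0, 1.0, 1.0}, {1.0, 1.0, 0.1}, {-2.0, 1.0, 0.5} };
    for (auto& c : cases) {
        FW r = french_wilson_acentric(c[0], c[1], c[2]);
        std::printf("I=%6.2f sigI=%4.2f <I>=%5.2f  ->  J=%7.5f sigJ=%7.5f F=%7.5f sigF=%7.5f\n",
                    c[0], c[1], c[2], r.J, r.sigJ, r.F, r.sigF);
    }
    return 0;
}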

What definitively _is_ a problem is the fact that the Wilson prior is not a 
good one in several situations that I can think of:
a) twinning: the distribution of intensities is no longer a Wilson distribution
b) anisotropy: the overall <I> will be used as a prior instead of using the 
anisotropic <I>, in the direction of reciprocal space where the reflection 
resides that we're interested in. The effect of this is that the amplitudes 
resulting from the F&W procedure will be "more equal" than they should be, thus 
the amount of apparent anisotropy will be less for the weak high-resolution 
shells than it is in the stronger shells.
c) translational NCS or otherwise modulated intensities: this will greatly 
screw up the amplitudes because the prior is the overall mean intensity instead 
of the mean intensity of the "intensity class" a reflection belongs to.

I wonder if problem b) is why Evans and Murshudov  observe little contribution 
of reflections in shells with CC1/2 below 0.27 in one of their test cases, 
which had very anisotropic data.

Anyway, it seems to me that the results from an analysis of the data should be 
fed back into the F&W procedure. As a maybe better alternative, we should (once 
again) consider to refine against intensities (and I guess George Sheldrick 
would agree here).

best,

Kay

On Wed, 19 Jun 2013 12:11:25 +1000, James M Holton  wrote:

>Actually, Jeff, the problem goes even deeper than that. Have a look at these 
>Wilson plots:
>http://bl831.als.lbl.gov/~jamesh/wilson/wilsons.png
>
>For these plots I took Fs from a unit cell full of a random collection of 
>atoms, squared them, added Gaussian noise with RMS = 1, and then ran them back 
>through various programs. The "plateau" at F ~ 1 which overestimates some 
>"true intensities" by almost a factor of a million arises because French & 
>Wilson did not think it "right" to use the slope of the Wilson plot as a 
>source of prior information. A bit naive, I suppose, because we can actually 
>be REALLY sure that 1.0 A intensities are "zero" if the data drop into the 
>noise at 3A. Nevertheless, no one has ever augmented the F&W procedure to take 
>this prior knowledge into account. 
>
>A shame! Because if they did there would be no need for a resolution cut-off 
>at all. 
>
>Sent from a tiny virtual keyboard on a plane about to take off
>
>On Jun 19, 2013, at 1:08 AM, Jeff Headd  wrote:
>
>> Hi Ed,
>> 
>> Thanks for including the code block.
>> 
>> I've looked back over the F&W paper, and the reason for the h<-4.0 cutoff is 
>> that the entire premise assumes that the true intensities are normally 
>> distributed, and the formulation breaks down for an "outlier" that far out. 
>> For most datasets I haven't seen this assumption to be a huge 
>> problem, but in some cases the assumption of a normal distribution is not 
>> reasonable, and you'll end up with a higher percentage of rejected weak 
>> intensities.
>> 
>> Kay, does the new XDSCONV method treat the negative intensities in some way 
>> to make them positive, or does this just work with very weak positive 
>> intensities?
>> 
>> Jeff
>> 
>> 
>> On Tue, Jun 18, 2013 at 12:15 AM, Ed Pozharski  
>> wrote:
>> Jeff,
>> 
>> thanks - I can see the same equation and cutoff applied in ctruncate source. 
>>Here is the relevant part of the code
>> 
>>> // Bayesian statistics tells us to modify I/sigma by subtracting 
>>> off sigma/S
>>> // where S is the mean intensity in the resolution shell
>>> h = I/sigma - sigma/S;
>>> // reject as unphysical reflections for which I < -3.7 sigma, or h 

Re: [ccp4bb] ctruncate bug?

2013-06-18 Thread Kay Diederichs
Hi Jeff,

what I did in XDSCONV is to mitigate the numerical difficulties associated with 
low h (called "Score" in XDSCONV output) values, and I removed the h < -4 
cutoff. The more negative h becomes, the closer to zero is the resulting 
amplitude, so not applying a h cutoff makes sense (to me, anyway).
XDSCONV still applies the I < -3*sigma cutoff, by default.

thanks,

Kay


Re: [ccp4bb] ctruncate bug?

2013-06-18 Thread James M Holton
Actually, Jeff, the problem goes even deeper than that. Have a look at these 
Wilson plots:
http://bl831.als.lbl.gov/~jamesh/wilson/wilsons.png

For these plots I took Fs from a unit cell full of a random collection of 
atoms, squared them, added Gaussian noise with RMS = 1, and then ran them back 
through various programs. The "plateau" at F ~ 1 which overestimates some "true 
intensities" by almost a factor of a million arises because French & Wilson did 
not think it "right" to use the slope of the Wilson plot as a source of prior 
information. A bit naive, I suppose, because we can actually be REALLY sure 
that 1.0 A intensities are "zero" if the data drop into the noise at 3A. 
Nevertheless, no one has ever augmented the F&W procedure to take this prior 
knowledge into account. 

A shame! Because if they did there would be no need for a resolution cut-off at 
all. 

Sent from a tiny virtual keyboard on a plane about to take off
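
A simplified stand-in for that numerical test, with the shortcuts flagged: 
instead of an explicit unit cell of random atoms, true acentric intensities are 
drawn from a Wilson (exponential) distribution whose mean falls off with an 
assumed overall B factor, and Gaussian noise with RMS = 1 is added. The output 
is the kind of synthetic I/sigI data one could push through ctruncate, XDSCONV 
or phenix to compare recovered and true Wilson plots; it does not by itself 
reproduce the plateau in the plot linked above.

#include <cmath>
#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(42);
    std::normal_distribution<double> noise(0.0, 1.0);       // RMS = 1, as in the test
    const double B = 30.0;                                   // assumed overall B factor
    const int shells = 20, per_shell = 500;

    std::printf("# (sin th/lambda)^2    <Itrue>     <Iobs>   frac(Iobs<0)\n");
    for (int s = 1; s <= shells; ++s) {
        double stol2  = 0.025 * s;                           // (sin(theta)/lambda)^2
        double mean_I = 1.0e4 * std::exp(-2.0 * B * stol2);  // Wilson-type falloff
        std::exponential_distribution<double> wilson(1.0 / mean_I);
        double sum_t = 0, sum_o = 0; int neg = 0;
        for (int i = 0; i < per_shell; ++i) {
            double Itrue = wilson(rng);                      // acentric Wilson statistics
            double Iobs  = Itrue + noise(rng);               // measurement with sigI = 1
            sum_t += Itrue; sum_o += Iobs;
            if (Iobs < 0) ++neg;
        }
        std::printf("%19.3f %10.3g %10.3g %10.2f\n",
                    stol2, sum_t / per_shell, sum_o / per_shell, double(neg) / per_shell);
    }
    return 0;
}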

On Jun 19, 2013, at 1:08 AM, Jeff Headd  wrote:

> Hi Ed,
> 
> Thanks for including the code block.
> 
> I've looked back over the F&W paper, and the reason for the h<-4.0 cutoff is 
> that the entire premise assumes that the true intensities are normally 
> distributed, and the formulation breaks down at that far out of an "outlier". 
> For most datasets I haven't seen this assumption to be a huge problem, but in 
> some cases the assumption of a normal distribution is not reasonable, and 
> you'll end up with a higher percentage of rejected weak intensities.
> 
> Kay, does the new XDSCONV method treat the negative intensities in some way 
> to make them positive, or does this just work with very weak positive 
> intensities?
> 
> Jeff
> 
> 
> On Tue, Jun 18, 2013 at 12:15 AM, Ed Pozharski  wrote:
> Jeff,
> 
> thanks - I can see the same equation and cutoff applied in ctruncate source.  
>   Here is the relevant part of the code
> 
>> // Bayesian statistics tells us to modify I/sigma by subtracting off 
>> sigma/S
>> // where S is the mean intensity in the resolution shell
>> h = I/sigma - sigma/S;
>> // reject as unphysical reflections for which I < -3.7 sigma, or h < 
>> -4.0
>> if (I/sigma < -3.7 || h < -4.0 ) {
>> nrej++;
>> if (debug) printf("unphys: %f %f %f %f\n",I,sigma,S,h);
>> return(0);
>> }
> 
> This seems to be an arbitrary cutoff choice, given that they are hard-coded.  At 
> the very least, cutoffs should depend on the total number of reflections to 
> represent familywise error rates.
> 
> It is however the h-based rejection that seems most problematic to me.  In 
> the dataset in question, up to 20% reflections are rejected in the highest 
> resolution shell (granted, I/sigI there is 0.33).  I would expect that 
> reflections are rejected when they are deemed to be outliers due to reasons 
> other than statistical errors (e.g. streaks, secondary lattice spots in the 
> background, etc).  I must say that this was done with extremely good quality 
> data, so I   doubt that 1 out of 5 reflections returns some physically 
> impossible measurement.
> 
> What is happening is that <sigI>=3S in the highest resolution shell, and for 
> many reflections h<-4.0.  This does not mean that reflections are "unphysical" 
> though, just that shell as a whole has mostly weak data (in this case 89% 
> with I/sigI<2 and 73% with I/sigI<1).
> 
> What is counterintuitive is why do I have to discard reflections that are 
> just plain weak, and not really outliers?
> 
> Cheers,
> 
> Ed.
> 
> 
> 
> 
> On 06/17/2013 10:29 PM, Jeff Headd wrote:
>> Hi Ed,
>> 
>> I'm not directly familiar with the ctruncate implementation of French and 
>> Wilson, but from the implementation that I put into Phenix (based on the 
>> original F&W paper) I can tell you that any reflection where (I/sigI) - 
>> (sigI/mean_intensity) is less than a defined cutoff (in our case -4.0), then 
>> it is rejected. Depending on sigI and the mean intensity for a given shell, 
>> this can result in positive intensities that are also rejected. Typically 
>> this will affect very small positive intensities as you've observed.
>> 
>> I don't recall the mathematical justification for this and don't have a copy 
>> of F&W here at home, but I can have a look in the morning when I get into 
>> the lab and let you know.
>> 
>> Jeff
>> 
>> 
>> On Mon, Jun 17, 2013 at 5:04 PM, Ed Pozharski  wrote:
>> I noticed something strange when processing a dataset with imosflm.  The
>> final output ctruncate_etc.mtz, contains IMEAN and F columns, which
>> should be the conversion according to French&Wilson.  Problem is that
>> IMEAN has no missing values (100% complete) while F has about 1500
>> missing (~97% complete)!
>> 
>> About half of the reflections that go missing are negative, but half are
>> positive.  About 5x more negative intensities are successfully
>> converted.  Most impacted are high resolution shells with weak signal,
>> so I am sure impact on "n

Re: [ccp4bb] ctruncate bug?

2013-06-18 Thread Jeff Headd
Hi Ed,

Thanks for including the code block.

I've looked back over the F&W paper, and the reason for the h<-4.0 cutoff
is that the entire premise assumes that the true intensities are normally
distributed, and the formulation breaks down for an "outlier" that far out.
For most datasets I haven't seen this assumption to be a huge
problem, but in some cases the assumption of a normal distribution is not
reasonable, and you'll end up with a higher percentage of rejected weak
intensities.

Kay, does the new XDSCONV method treat the negative intensities in some way
to make them positive, or does this just work with very weak positive
intensities?

Jeff


On Tue, Jun 18, 2013 at 12:15 AM, Ed Pozharski wrote:

>  Jeff,
>
> thanks - I can see the same equation and cutoff applied in ctruncate
> source.Here is the relevant part of the code
>
> // Bayesian statistics tells us to modify I/sigma by subtracting
> off sigma/S
> // where S is the mean intensity in the resolution shell
> h = I/sigma - sigma/S;
> // reject as unphysical reflections for which I < -3.7 sigma, or h
> < -4.0
> if (I/sigma < -3.7 || h < -4.0 ) {
> nrej++;
> if (debug) printf("unphys: %f %f %f %f\n",I,sigma,S,h);
> return(0);
> }
>
>
> This seems to be an arbitrary cutoff choice, given that they are hard-coded.
> At the very least, cutoffs should depend on the total number of reflections
> to represent familywise error rates.
>
> It is however the h-based rejection that seems most problematic to me.  In
> the dataset in question, up to 20% reflections are rejected in the highest
> resolution shell (granted, I/sigI there is 0.33).  I would expect that
> reflections are rejected when they are deemed to be outliers due to reasons
> other than statistical errors (e.g. streaks, secondary lattice spots in the
> background, etc).  I must say that this was done with extremely good
> quality data, so I doubt that 1 out of 5 reflections returns some
> physically impossible measurement.
>
> What is happening is that <sigI>=3S in the highest resolution shell, and
> for many reflections h<-4.0.  This does not mean that reflections are
> "unphysical" though, just that shell as a whole has mostly weak data (in
> this case 89% with I/sigI<2 and 73% with I/sigI<1).
>
> What is counterintuitive is why do I have to discard reflections that are
> just plain weak, and not really outliers?
>
> Cheers,
>
> Ed.
>
>
>
>
> On 06/17/2013 10:29 PM, Jeff Headd wrote:
>
> Hi Ed,
>
>  I'm not directly familiar with the ctruncate implementation of French
> and Wilson, but from the implementation that I put into Phenix (based on
> the original F&W paper) I can tell you that any reflection where (I/sigI) -
> (sigI/mean_intensity) is less than a defined cutoff (in our case -4.0),
> then it is rejected. Depending on sigI and the mean intensity for a given
> shell, this can result in positive intensities that are also rejected.
> Typically this will affect very small positive intensities as you've
> observed.
>
>  I don't recall the mathematical justification for this and don't have a
> copy of F&W here at home, but I can have a look in the morning when I get
> into the lab and let you know.
>
>  Jeff
>
>
> On Mon, Jun 17, 2013 at 5:04 PM, Ed Pozharski wrote:
>
>> I noticed something strange when processing a dataset with imosflm.  The
>> final output ctruncate_etc.mtz, contains IMEAN and F columns, which
>> should be the conversion according to French&Wilson.  Problem is that
>> IMEAN has no missing values (100% complete) while F has about 1500
>> missing (~97% complete)!
>>
>> About half of the reflections that go missing are negative, but half are
>> positive.  About 5x more negative intensities are successfully
>> converted.  Most impacted are high resolution shells with weak signal,
>> so I am sure impact on "normal" refinement would be minimal.
>>
>> However, I am just puzzled why would ctruncate reject positive
>> intensities (or negative for that matter - I don't see any cutoff
>> described in the manual and the lowest I/sigI for successfully converted
>> reflection is -18).
>>
>> Is this a bug or feature?
>>
>> Cheers,
>>
>> Ed.
>>
>> --
>> I don't know why the sacrifice thing didn't work.
>> Science behind it seemed so solid.
>> Julian, King of Lemurs
>>
>
>
>
> --
> Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
> Julian, King of Lemurs
>
>


Re: [ccp4bb] ctruncate bug?

2013-06-18 Thread Kay Diederichs
Hi Frank,

older versions of XDSCONV, for datasets with weak high-resolution data, printed 
a long list starting with:

 SUSPICIOUS REFLECTIONS NOT INCLUDED IN OUTPUT DATA SET
(at most 100 are listed below)


Type  = 2 for centric, 1 for acentric reflections
<I>   = Mean intensity in resolution shell
Score = Intensity/Sigma - Sigma/(Type*<I>)
        Score < -4 indicates a suspicious reflection

    h    k    l  Type   Intensity       Sigma         <I>   Score

  -11    5  -65     1     0.E+00  0.1184E+05  0.2316E+04   -5.11
  -12    6  -64     1     0.E+00  0.1236E+05  0.2561E+04   -4.83
  -13   12  -64     1     0.E+00  0.7067E+04  0.1745E+04   -4.05
  -14    8  -64     1     0.E+00  0.1467E+05  0.3092E+04   -4.75
...
...
  TOTAL NUMBER OF SUSPICIOUS REFLECTIONS REMOVED 115


For the latest version, I have re-worked the code such that the amplitude can 
still be obtained from the intensities in even weaker shells. (eventually some 
reflections may _still_ be rejected as suspicious but those are _much_ weaker 
than those rejected by previous versions)

I hope this answers your question ...

best,

Kay


Re: [ccp4bb] ctruncate bug?

2013-06-18 Thread Frank von Delft
Hi Kay - could you elaborate on "the latest version of XDSCONV has a fix 
for it"?   (A look around The Google did not help me.)


Cheers
Frank


On 18/06/2013 11:38, Kay Diederichs wrote:

Dear Ed,

AFAIK James Holton found the same issue, and a similar problem also existed in 
XDSCONV. In my view, it is an example of the problem that most programs so far 
have dealt with weak data in a suboptimal way, and have undergone little 
testing with such data.
The latest version of XDSCONV (March 30, 2013) has a fix for this.
I take the opportunity to announce that a bugfix build of the March 30 XDS package 
release is on Wolfgang Kabsch's server. The bug this fixes is that the "GENERATED 
BY" line in the output file from XSCALE did not mention XSCALE; this made pointless 
fail.

thanks,

Kay


Re: [ccp4bb] ctruncate bug?

2013-06-18 Thread Kay Diederichs
Dear Ed,

AFAIK James Holton found the same issue, and a similar problem also existed in 
XDSCONV. In my view, it is an example of the problem that most programs so far 
have dealt with weak data in a suboptimal way, and have undergone little 
testing with such data.
The latest version of XDSCONV (March 30, 2013) has a fix for this.
I take the opportunity to announce that a bugfix build of the March 30 XDS 
package release is on Wolfgang Kabsch's server. The bug this fixes is that the 
"GENERATED BY" line in the output file from XSCALE did not mention XSCALE; this 
made pointless fail.

thanks,

Kay


Re: [ccp4bb] ctruncate bug?

2013-06-17 Thread Ed Pozharski

Jeff,

thanks - I can see the same equation and cutoff applied in ctruncate 
source.Here is the relevant part of the code


// Bayesian statistics tells us to modify I/sigma by subtracting off sigma/S
// where S is the mean intensity in the resolution shell
h = I/sigma - sigma/S;
// reject as unphysical reflections for which I < -3.7 sigma, or h < -4.0
if (I/sigma < -3.7 || h < -4.0 ) {
    nrej++;
    if (debug) printf("unphys: %f %f %f %f\n",I,sigma,S,h);
    return(0);
}


This seems to be an arbitrary cutoff choice, given that they are 
hard-coded.  At the very least, cutoffs should depend on the total 
number of reflections to represent familywise error rates.


It is however the h-based rejection that seems most problematic to me.  
In the dataset in question, up to 20% reflections are rejected in the 
highest resolution shell (granted, I/sigI there is 0.33).  I would 
expect that reflections are rejected when they are deemed to be outliers 
due to reasons other than statistical errors (e.g. streaks, secondary 
lattice spots in the background, etc).  I must say that this was done 
with extremely good quality data, so I doubt that 1 out of 5 reflections 
returns some physically impossible measurement.


What is happening is that <sigI>=3S in the highest resolution shell, 
and for many reflections h<-4.0.  This does not mean that reflections are 
"unphysical" though, just that shell as a whole has mostly weak data (in 
this case 89% with I/sigI<2 and 73% with I/sigI<1).


What is counterintuitive is why do I have to discard reflections that 
are just plain weak, and not really outliers?


Cheers,

Ed.
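
One way to make Ed's familywise-error suggestion concrete, sketched purely as an 
illustration (this is not what ctruncate or French & Wilson do; the alpha value 
and the assumption of purely Gaussian errors are mine): choose the I/sigI 
rejection threshold so that the expected number of false rejections across all 
N reflections is about alpha, Bonferroni style.

#include <cmath>
#include <cstdio>

double Phi(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }  // standard normal cdf

// Smallest z with Phi(-z) <= p, found by bisection (an inverse-cdf lookup).
double cutoff_for_tail(double p) {
    double lo = 0.0, hi = 40.0;
    for (int i = 0; i < 100; ++i) {
        double mid = 0.5 * (lo + hi);
        if (Phi(-mid) > p) lo = mid; else hi = mid;
    }
    return hi;
}

int main() {
    const double alpha = 0.01;   // tolerated expected number of false rejections
    for (long N : {1000L, 100000L, 10000000L})
        std::printf("N = %9ld reflections  ->  reject only below I/sigI = -%.2f\n",
                    N, cutoff_for_tail(alpha / N));
    return 0;
}

With alpha = 0.01 the threshold moves from about -4.3 at a thousand reflections 
to about -6.0 at ten million, i.e. it tightens slowly with dataset size instead 
of being fixed at -3.7.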



On 06/17/2013 10:29 PM, Jeff Headd wrote:

Hi Ed,

I'm not directly familiar with the ctruncate implementation of French 
and Wilson, but from the implementation that I put into Phenix (based 
on the original F&W paper) I can tell you that any reflection where 
(I/sigI) - (sigI/mean_intensity) is less than a defined cutoff (in our 
case -4.0), then it is rejected. Depending on sigI and the mean 
intensity for a given shell, this can result in positive intensities 
that are also rejected. Typically this will affect very small positive 
intensities as you've observed.


I don't recall the mathematical justification for this and don't have 
a copy of F&W here at home, but I can have a look in the morning when 
I get into the lab and let you know.


Jeff


On Mon, Jun 17, 2013 at 5:04 PM, Ed Pozharski wrote:


I noticed something strange when processing a dataset with
imosflm.  The
final output ctruncate_etc.mtz, contains IMEAN and F columns, which
should be the conversion according to French&Wilson.  Problem is that
IMEAN has no missing values (100% complete) while F has about 1500
missing (~97% complete)!

About half of the reflections that go missing are negative, but
half are
positive.  About 5x more negative intensities are successfully
converted.  Most impacted are high resolution shells with weak signal,
so I am sure impact on "normal" refinement would be minimal.

However, I am just puzzled why would ctruncate reject positive
intensities (or negative for that matter - I don't see any cutoff
described in the manual and the lowest I/sigI for successfully
converted
reflection is -18).

Is this a bug or feature?

Cheers,

Ed.

--
I don't know why the sacrifice thing didn't work.
Science behind it seemed so solid.
Julian, King of Lemurs





--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs



Re: [ccp4bb] ctruncate bug?

2013-06-17 Thread Jeff Headd
Hi Ed,

I'm not directly familiar with the ctruncate implementation of French and
Wilson, but from the implementation that I put into Phenix (based on the
original F&W paper) I can tell you that any reflection where (I/sigI) -
(sigI/mean_intensity) is less than a defined cutoff (in our case -4.0),
then it is rejected. Depending on sigI and the mean intensity for a given
shell, this can result in positive intensities that are also rejected.
Typically this will affect very small positive 
observed.

I don't recall the mathematical justification for this and don't have a
copy of F&W here at home, but I can have a look in the morning when I get
into the lab and let you know.

Jeff


On Mon, Jun 17, 2013 at 5:04 PM, Ed Pozharski wrote:

> I noticed something strange when processing a dataset with imosflm.  The
> final output ctruncate_etc.mtz, contains IMEAN and F columns, which
> should be the conversion according to French&Wilson.  Problem is that
> IMEAN has no missing values (100% complete) while F has about 1500
> missing (~97% complete)!
>
> About half of the reflections that go missing are negative, but half are
> positive.  About 5x more negative intensities are successfully
> converted.  Most impacted are high resolution shells with weak signal,
> so I am sure impact on "normal" refinement would be minimal.
>
> However, I am just puzzled why would ctruncate reject positive
> intensities (or negative for that matter - I don't see any cutoff
> described in the manual and the lowest I/sigI for successfully converted
> reflection is -18).
>
> Is this a bug or feature?
>
> Cheers,
>
> Ed.
>
> --
> I don't know why the sacrifice thing didn't work.
> Science behind it seemed so solid.
> Julian, King of Lemurs
>


[ccp4bb] ctruncate bug?

2013-06-17 Thread Ed Pozharski
I noticed something strange when processing a dataset with imosflm.  The
final output ctruncate_etc.mtz, contains IMEAN and F columns, which
should be the conversion according to French&Wilson.  Problem is that
IMEAN has no missing values (100% complete) while F has about 1500
missing (~97% complete)!

About half of the reflections that go missing are negative, but half are
positive.  About 5x more negative intensities are successfully
converted.  Most impacted are high resolution shells with weak signal,
so I am sure impact on "normal" refinement would be minimal.

However, I am just puzzled why would ctruncate reject positive
intensities (or negative for that matter - I don't see any cutoff
described in the manual and the lowest I/sigI for successfully converted
reflection is -18).

Is this a bug or feature?

Cheers,

Ed.

-- 
I don't know why the sacrifice thing didn't work.  
Science behind it seemed so solid.
Julian, King of Lemurs