[ccp4bb] mosflm gain

2011-03-03 Thread Bryan Lepore
wondering if mosflm can automatically estimate the gain.

i.e. i gather it is still estimated the usual way.

-Bryan


Re: [ccp4bb] mosflm gain

2011-03-03 Thread David Waterman
Usually Mosflm will use a default value for the gain that depends on the
type of detector used. This value is not realistic for CCD detectors; that
is, it is not really equal to the ratio of ADUs to incident X-ray photons.
However, it is consistent with typical images under the assumptions of pixel
independence and Poisson distribution, which are not true either. Inasmuch
as the gain is just a scale factor in the data, it doesn't really matter
that it isn't physically meaningful in the way you might expect from its
name. However, the procedure of calculating gain from the variance to mean
ratio from a background region of the image, which is the only simple
automatic approach available if all you have is an image, should be avoided
if you are looking for the gain in "real" units.
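
For readers unfamiliar with the variance-to-mean estimate mentioned above, here is a minimal Python sketch of it. It assumes an idealised detector (independent pixels, pure Poisson counting, a known ADC offset), which is exactly the set of assumptions that does not hold for real CCD images. The function names and simulated numbers are illustrative only and are not taken from Mosflm.

```python
import math
import random
import statistics


def poisson_sample(rng, lam):
    """Draw one Poisson variate with mean lam (Knuth's method, fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


def simulate_background(n_pixels, photons_per_pixel, gain, adc_offset, seed=0):
    """Ideal detector: independent pixels, Poisson photons, no read-out noise."""
    rng = random.Random(seed)
    return [adc_offset + gain * poisson_sample(rng, photons_per_pixel)
            for _ in range(n_pixels)]


def variance_to_mean_gain(pixels, adc_offset=0.0):
    """Estimate gain (ADU/photon) as variance / (mean - offset) of a flat background."""
    mean_signal = statistics.fmean(pixels) - adc_offset
    return statistics.pvariance(pixels) / mean_signal


# Simulated flat background: 20 photons/pixel, gain 1.8 ADU/photon, offset 40 ADU.
pixels = simulate_background(20000, 20.0, gain=1.8, adc_offset=40.0)
print(variance_to_mean_gain(pixels, adc_offset=40.0))
```

On such ideal data the estimate comes back close to the gain used in the simulation, because the mean above the offset is gain * N and the variance is gain**2 * N for N photons per pixel.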

I realise that didn't answer the question, but I thought it might be worth
pointing out.


-- David


On 3 March 2011 20:34, Bryan Lepore  wrote:

> wondering if mosflm can automatically estimate the gain.
>
> i.e. i gather it is still estimated the usual way.
>
> -Bryan
>


Re: [ccp4bb] mosflm gain

2011-03-04 Thread A Leslie

Dear Bryan,

The quick answer is no. As David Waterman  
mentioned, it has a default value for the gain for each type of  
detector that it can deal with.


A more detailed answer. An incorrect value for the gain can be  
indicated by values of the BGRATIO which differ significantly from  
unity (1.0). BGRATIO  is the ratio of the rms variation in the  
background to the variation expected on the basis of Poisson  
statistics, using the gain to convert from digitised values in the  
image to X-ray photons. This is calculated for all measured spots, and  
binned as a function of intensity for each image measured (and printed  
in the full logfile). Mosflm prints a warning message if this differs  
from 1.0 by more than 10% and will suggest an "improved" value for the  
gain that should be used.
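
The BGRATIO definition above can be sketched in a few lines of Python. This is only an illustration of the arithmetic under a pure Poisson model, not Mosflm's actual code: the function names are hypothetical, and the rule used for the "implied" gain (current gain times BGRATIO squared) is an assumption that follows from that model rather than something stated in this thread.

```python
import math


def bgratio(observed_rms_adu, mean_background_adu, gain, adc_offset=0.0):
    """Ratio of observed background rms to the rms expected from Poisson statistics."""
    photons = (mean_background_adu - adc_offset) / gain  # ADU converted to photons
    expected_rms_adu = gain * math.sqrt(photons)         # Poisson rms, back in ADU
    return observed_rms_adu / expected_rms_adu


def gain_implied_by_bgratio(current_gain, ratio):
    """Gain that would bring BGRATIO to 1.0 (assumes the pure Poisson model)."""
    return current_gain * ratio ** 2


def gain_warning(ratio, tolerance=0.10):
    """True when BGRATIO differs from 1.0 by more than 10%, as described above."""
    return abs(ratio - 1.0) > tolerance


# Background of 76 ADU with a 40 ADU offset and an observed rms of 6 ADU:
print(bgratio(6.0, 76.0, gain=1.0, adc_offset=40.0))  # consistent with gain 1.0
print(bgratio(6.0, 76.0, gain=4.0, adc_offset=40.0))  # gain 4.0 predicts too much noise
```

The caveats in the following paragraphs still apply: a BGRATIO away from 1.0 does not by itself show that the gain is wrong.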


There are a host of caveats in this procedure. For example, if the  
images contain significant diffuse scatter around the Bragg spots,  
the BGRATIO may be above 1.0 ... this is probably the commonest  
effect, but does not mean the gain is wrong. If for any reason the  
mask definition (defining the boundary between background and spot)  
has not worked correctly so the spot extends into the background, this  
will also give a BGRATIO of one (in this case, the BGRATIO will tend  
to be close to 1.0 for weak spots but greater than 1.0 for strong  
ones). The boundary is controlled by the "Profile tolerance"  
parameters, which are sometimes set artificially high to help process  
images where the spots are not fully resolved.


This is why Mosflm does not automatically update its default value for  
the gain based on the BGRATIO.


As David has mentioned, this procedure also assumes that adjacent  
pixels are independent, which they most certainly are not (except  
possibly for some pixel detectors), due to the point spread function  
of the detectors and corrections that are applied to the raw images.


Does it matter? The gain is used to identify outliers in the  
background plane determination (eg due to zingers, shadows, ice spots,  
hot pixels etc) which are rejected from the calculation, so if it is  
significantly in error this will introduce systematic errors in the  
integrated intensities. This can show up in the cumulative intensity  
distribution in Truncate if the gain is a very long way off. I have  
not done a proper study of this, but I think it would need to be out  
by more than 20% to have a significant effect. The gain is also used  
to calculate sig(I), however, the sig(I) values from Mosflm are  
adjusted in SCALA to reflect the true variation between symmetry  
related reflections so that providing the multiplicity is high enough  
for this to work correctly this will not have any real effect on the  
final merged data.


The bottom line is that the estimates for sig(I) that emerge for this  
procedure seem to be quite good, in that the correction factors that  
are subsequently applied in SCALA for cases where other systematic  
errors are small (ie no radiation damage, absorption etc etc) are very  
close to 1.0.


Best wishes,

Andrew






Re: [ccp4bb] mosflm gain

2011-03-04 Thread A Leslie

Dear All,

spotted a mistake in my response, please see the correction below (in  
capitals):


There are a host of caveats in this procedure. For example, if the  
images contain significant diffuse scatter around the Bragg spots,  
the BGRATIO may be above 1.0 ... this is probably the commonest  
effect, but does not mean the gain is wrong. If for any reason the  
mask definition (defining the boundary between background and spot)  
has not worked correctly so the spot extends into the background, this  
will also give a BGRATIO of GREATER THAN one (in this case, the  
BGRATIO will tend to be close to 1.0 for weak spots but greater than  
1.0 for strong ones). The boundary is controlled by the "Profile  
tolerance" parameters, which are sometimes set artificially high to  
help process images where the spots are not fully resolved.


Andrew





Re: [ccp4bb] mosflm gain

2011-03-04 Thread James Holton
I have found that the best way to get the GAIN "right" in MOSFLM is to 
have a look at the optimum "Sdfac" parameter at the end of SCALA (the 
first of the three SDCORRection values).  Specifically, if SDFac is > 1, 
then you need to increase the GAIN.  This is because SDFac>1 means that 
the spots were noisier than MOSFLM thought they should be, and if a 
given number of ADU is noisier than expected, then there must have been 
fewer photons involved in generating the signal.  This means that the 
"true gain" was higher.  Yes, there are other sources of error, like 
shutter jitter, beam flicker, calibration errors, absorption effects, 
scale factor errors, etc.  But these are all directly proportional to 
the intensity, and therefore accounted for by adjusting SDadd (the last 
of the three SDCORR values).  SDfac accounts for noise proportional to 
the square root of intensity, and only shot noise (like photon counting) 
behaves like that.
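
The direction of this argument can be checked with a short Python sketch. It assumes that only shot noise contributes (SdB = 0, and SdAdd handled separately), in which case the variance of a signal in ADU is proportional to the gain, so scaling sigma by SdFac is equivalent to scaling the gain by SdFac squared. The function names are hypothetical and this is not how MOSFLM or SCALA compute anything internally.

```python
import math


def shot_noise_sigma_adu(intensity_adu, gain):
    """Sigma (in ADU) expected from photon counting alone for a signal given in ADU."""
    photons = intensity_adu / gain
    return gain * math.sqrt(photons)


def gain_from_sdfac(current_gain, sdfac):
    """Gain that would absorb SdFac, assuming shot noise is the only error source."""
    return current_gain * sdfac ** 2


sdfac = 1.3
sigma_old = shot_noise_sigma_adu(1000.0, gain=1.0)
sigma_new = shot_noise_sigma_adu(1000.0, gain=gain_from_sdfac(1.0, sdfac))
# SdFac > 1 corresponds to a higher gain: both numbers printed below agree.
print(sdfac * sigma_old, sigma_new)
```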


David Waterman makes an excellent point that the point-spread function 
(PSF) acts like a smoothing filter and makes the background look less 
noisy than photon-counting error permits.  This makes the 
BGRATIO-estimated GAIN lower than the "true" GAIN.  However, one can 
argue that this is not always a bad thing, since the error in measuring 
the intensity of a given area of flat background really is "better than 
photon counting".  This is because you have the smoothing effect of the 
PSF working "for you": bringing in signal from areas outside the region 
you are measuring (prior knowledge of "flatness" if you will).  However, 
this smoothing effect of the PSF does not apply to spots because spot 
photons all arrive in essentially the same place, and no "smoothing" 
will change the intrinsic noise of the total number of photons that 
actually arrived.  The upshot of this is that we really need two 
different values for GAIN, one for the background and one for the 
background-subtracted spot intensity.  The influence on sigma(I) would 
depend on the relative contributions from the spot vs the background 
under it.  I am pretty sure this is not implemented.
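
The smoothing argument can be illustrated with a toy one-dimensional simulation in Python. The three-tap kernel below is a hypothetical stand-in for a real detector PSF, and the numbers are made up; the sketch only shows that blurring lowers the apparent variance-to-mean "gain" of a flat background while leaving the summed signal unchanged.

```python
import math
import random
import statistics

KERNEL = (0.25, 0.5, 0.25)  # hypothetical 1-D PSF, normalised to sum to 1


def poisson_sample(rng, lam):
    """Draw one Poisson variate with mean lam (Knuth's method, fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


def smooth(row, kernel=KERNEL):
    """Circular convolution of a pixel row with a short symmetric kernel."""
    n = len(row)
    half = len(kernel) // 2
    return [sum(w * row[(i + j - half) % n] for j, w in enumerate(kernel))
            for i in range(n)]


def apparent_gain(row):
    """Variance-to-mean ratio of a flat background (no ADC offset in this toy)."""
    return statistics.pvariance(row) / statistics.fmean(row)


def demo(n_pixels=20000, photons=20.0, gain=1.8, seed=1):
    rng = random.Random(seed)
    raw = [gain * poisson_sample(rng, photons) for _ in range(n_pixels)]
    blurred = smooth(raw)
    return apparent_gain(raw), apparent_gain(blurred), sum(raw), sum(blurred)


raw_gain, blurred_gain, raw_total, blurred_total = demo()
print(raw_gain, blurred_gain)    # the blurred background looks "quieter"
print(raw_total, blurred_total)  # but the total signal is unchanged
```

For independent pixels the blurred variance is reduced by the sum of the squared kernel weights (0.375 for this kernel), which is why the background-derived gain comes out too low, while a normalised kernel cannot change the total counts in a spot.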


It is perhaps interesting that there is also a third type of noise which 
is independent of the spot intensity: "read-out noise".  This used to be 
called "fog" on film detectors.  Despite all the money we spend on 
detectors that minimize it, there is no specific accounting for read-out 
noise in MOSFLM or any other integration package I am aware of.  
However, a "trick" to account for it is to simply lower the ADCOFFSET.  
For example, using 1 Å X-rays on an ADSC Q315r detector in hwbin mode, 
the true GAIN is 1.8 ADU/photon, the ADCOFFSET is 40 ADU, and the 
read-out noise is equivalent to the noise deposited by ~2 photons/pixel 
of x-ray background.  This means that a blank image has an average value 
of 40 ADU and rms variation of ~2.5 ADU, but this is equivalent to an 
image from a detector with the same gain, no read-out noise, and 
ADCOFFSET of 36 that was "fogged" by 2 photons/pixel (regardless of 
exposure time).  Yes, this is a small change in ADCOFFSET, and I doubt 
you will notice the difference.  I think this speaks to the fact that, 
on modern detectors at least, read-out noise is essentially negligible.
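
The arithmetic behind the Q315r example can be written out as a small Python sketch. It uses only the numbers quoted above (GAIN 1.8 ADU/photon, ADCOFFSET 40 ADU, read-out noise equivalent to about 2 photons/pixel); the function names are hypothetical.

```python
import math


def readout_rms_adu(gain, readout_photons):
    """rms (ADU) of a blank image if read-out noise acts like this many Poisson photons."""
    return gain * math.sqrt(readout_photons)


def equivalent_fogged_offset(gain, adc_offset, readout_photons):
    """ADCOFFSET of a noise-free detector 'fogged' by the same number of photons/pixel."""
    return adc_offset - gain * readout_photons


print(readout_rms_adu(1.8, 2.0))                 # about 2.5 ADU
print(equivalent_fogged_offset(1.8, 40.0, 2.0))  # about 36 ADU
```

The fogged detector has the same mean (36.4 + 2 * 1.8 = 40 ADU) and the same rms as the real blank image, which is why lowering the ADCOFFSET mimics the read-out noise.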


Another way to get the GAIN, of course, is to measure it directly.  I 
did this on an ADSC Q315 detector in swbin mode by comparison to a 
NaI:Tl scintillator (after accounting for the window and sensor 
thickness of the latter device):

http://bl831.als.lbl.gov/~jamesh/pickup/Q315_gain.png
You can see how the GAIN changes appreciably with photon energy, and 
this is largely because lower-energy photons generate less signal.  GAIN 
also changes with the detector read-out mode.  For example, this number 
is 3 times higher for a Q315r in hwbin mode.  I have listed my best 
information on the typical GAIN and read-out noise of common detectors 
on my "minimum crystal size" page here:

http://bl831.als.lbl.gov/xtalsize.html
You can extract the parameters by selecting the "detector type = " you 
want, and then switching it again to "Custom..."


-James Holton
MAD Scientist



Re: [ccp4bb] mosflm gain

2011-03-07 Thread A Leslie
I have to say that I don't fully agree with James' recommendation to  
adjust the GAIN in MOSFLM until the calculated SDFAC parameter in  
SCALA is 1.0.


(Background information: the sigmas from Mosflm, sd(I), are corrected in  
SCALA according to
   sd(I) corrected = SdFac * sqrt{sd(I)**2 + SdB*Ihl + (SdAdd*Ihl)**2}
in order to get the best agreement between corrected sigmas and the  
observed differences between symmetry/Friedel related intensities)
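
As a worked illustration, the correction formula quoted above translates directly into Python. Ihl is taken here to be the intensity of the reflection (an assumption, since the formula does not define it), and the function name and example numbers are hypothetical.

```python
import math


def corrected_sigma(sd, ihl, sdfac, sdb, sdadd):
    """sd(I) corrected = SdFac * sqrt(sd(I)**2 + SdB*Ihl + (SdAdd*Ihl)**2)."""
    return sdfac * math.sqrt(sd ** 2 + sdb * ihl + (sdadd * ihl) ** 2)


# With SdFac = 1, SdB = 0 and SdAdd = 0 the Mosflm sigma is returned unchanged.
print(corrected_sigma(3.0, 100.0, sdfac=1.0, sdb=0.0, sdadd=0.0))

# A 4% error proportional to intensity dominates for a strong reflection.
print(corrected_sigma(3.0, 100.0, sdfac=1.0, sdb=0.0, sdadd=0.04))
```

The SdAdd term grows linearly with intensity, the SdB term with its square root, and SdFac scales everything, which is one way to see why the three parameters can trade off against each other.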



While I fully agree with his argument that systematic errors such as  
absorption, etc give an error proportional to the intensity, and  
therefore should be corrected by the SDADD term rather than SDFAC, in  
any "real world" data set that I have come across the situation is not  
so simple. Indeed, according to the usual treatment of errors there  
should be no need for the SDB term in SCALA, but in practice it is  
essential to have this term to be able to match corrected sigmas with  
the observed differences between symmetry related reflections. It also  
turns out that the three variable parameters SDFAC, SDB and SDADD are  
highly correlated, so one can get rather different values for any  
individual parameter from very similar datasets. Radiation damage is  
certainly one source of error which would not be expected to follow a  
simple error model, or non-isomorphism if multiple crystals have been  
used.


Phil Evans is not entirely happy with the behaviour of the refinement  
of these parameters and is in fact currently looking at this, but  
there is a  basic problem here that one is trying to use a simple  
error model for a situation where (for whatever reason) it does not  
really apply.


The sigma estimates from MOSFLM are only intended to give an estimate  
of the random error in the intensities. In my opinion, trying to  
account for systematic errors is best done at the point of merging the  
data where much more information is available (ie symmetry related  
measurements).


I would be most interested to hear of any examples where the default  
value of the GAIN in MOSFLM is clearly wrong, but to the best of my  
current knowledge the default GAIN is perfectly adequate.


Best wishes

Andrew

Re: [ccp4bb] mosflm gain

2011-03-08 Thread James Holton
Andrew!  You don't believe me?  Well, I suppose it serves me right for 
not explaining where the idea came from (see below).


 I do, however, agree with Andrew's assessment that the default-chosen 
gain in MOSFLM is adequate for all practical purposes.  Any error in 
GAIN will be almost exactly compensated for by a corresponding change in 
Sdfac in SCALA, and the final value of sigma(I) will be essentially the 
same.  The only possible difference will be in the sigma-based outlier 
rejection within MOSFLM, but since the typical errors in the sigma are 
only ~30%, I predict it will be hard to find a situation where this 
makes or breaks a structure determination.


So, by way of explanation: there are three things that led me to this 
conclusion:

1) The control: fake data with all pixels independent.
   Adjusting the GAIN as MOSFLM recommends from the BGRATIO analysis
   does, in fact, reproduce the "correct" value of the gain used to
   generate the fake data.  In SCALA, Sdfac refines to ~1.0, SdB refines
   to 0, and Sdadd refines to the actual magnitude of fractional error
   (introduced by beam flicker, shutter jitter, etc.).  No surprises here.
2) "Blur" the fake data with the point-spread function (PSF) empirically
   derived for my detector.
   In this case, the "MOSFLM-refined gain" is too low.  In SCALA, Sdfac
   refines to ~1.3, SdB refines to 3-5, and Sdadd is a bit low.  These
   parameters are about what I see processing good real data.
3) Use real data, but force MOSFLM to use the GAIN calibrated
   independently for the detector.
   MOSFLM grumbles a lot about the BGRATIO.  In SCALA, Sdfac refines to
   ~1, and SdB refines to ~0.  Sdadd is consistent with my
   independently-measured fractional error sources.


Now, I have not evaluated this approach on a huge number of data sets, 
but in this case the PSF was both necessary and sufficient to explain 
the "mystery of SdB".  That is: the need for SdB arises because using an 
"incorrect" gain creates a correlation between Sdfac and Sdadd.  I 
imagine there are other ways to get a non-zero SdB as well, but for 
"good data" I suspect this is the dominant mechanism.  I never wrote 
this up because I am fairly certain the article would do nothing to 
improve the impact factor of the journal in which it was published, but 
this anecdote might perhaps be useful to Andrew, Phil, and a few other 
readers of this list.


-James Holton
MAD Scientist



Re: [ccp4bb] mosflm gain

2011-03-13 Thread Andrew Leslie
Dear James,

 Many thanks for the detailed explanation. I do find your results very
interesting and (when time allows!) I will certainly investigate this
effect in more detail and see if I find similar results for data that
shows significant levels of radiation damage (as mine invariably seem to
do). I have to admit that it is not entirely clear to me why PSF would
result in a correlation between SDFAC and SDADD, although this is
clearly what you see. It would be rewarding to get to the bottom of
this. As I mentioned earlier, Phil Evans is currently looking at the
refinement of the SD parameters in relation to Aimless (the imminent
replacement for SCALA) so he is also very interested in figuring out
exactly what is going on here (but does not have any answers as yet).

Best wishes,

Andrew
