Re: [ccp4bb] am I doing this right?

2021-10-18 Thread Guillaume Gaullier
I am not qualified to comment on anything in the rest of this discussion, but 
regarding the excerpt quoted below: this way of recording an "image" seems very 
similar to the EER (electron event representation) of the Falcon 4 direct 
electron detector used in cryoEM. See https://doi.org/10.1107/S205225252000929X

Guillaume


On 18 Oct 2021, at 08:30, Frank von Delft <frank.vonde...@cmd.ox.ac.uk> wrote:

Also:  should the detectors change how they read out things, then?  Just write 
out the events with timestamp, rather than dumping all pixels all the time into 
these arbitrary containers called "image".  Or is that what's already happening 
in HDF5 (which I don't understand one bit, I should add).










E-mailing Uppsala University means that we will process your personal data. For 
more information on how this is performed, please read here: 
http://www.uu.se/en/about-uu/data-protection-policy



To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/


Re: [ccp4bb] am I doing this right?

2021-10-18 Thread James Holton
HDF5 is still "framing", but with better compression than the "byte offset" 
scheme implemented in Pilatus CBFs, which requires a minimum of one byte per 
pixel. That scheme is very fast, but not designed for near-blank images.


Assuming entropy-limited compression, the ultimate data rate is the 
number of photons/s hitting the detector multiplied by log2(Npix), where 
Npix is the number of pixels. The reason it is log2() is that this is 
the number of bits needed to store the address of which pixel got the 
photon, and since the arrival of each photon is essentially random, further 
compression is generally not possible without loss of information.  
There might be some additional bits for the time interval, but it 
might be more efficient to store that implicitly in the framing. As long 
as storing "no photons" takes up only one bit, that would probably be 
more efficient.


So, for a 100 micron thick sample, a flux of 1e12 photons/s, and ~4000 
pixels, you get ~3.4 GB/s of perfectly and losslessly compressed data.  
Making it smaller than that requires throwing away information.
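
As a rough sketch of that arithmetic in Python (the pixel grid and the 
fraction of the incident beam that ends up scattered onto the detector are 
assumed, illustrative numbers, not values taken from this thread):

import math

# back-of-envelope estimate: each detected photon costs ~log2(Npix) bits,
# the address of the pixel it hit (all numbers below are assumptions)
n_fast, n_slow = 4096, 4096        # assumed pixel grid of an Eiger-like detector
incident_flux = 1e12               # photons/s incident on the sample (assumed)
scattered_fraction = 1e-3          # assumed fraction that lands on the detector

photons_per_s = incident_flux * scattered_fraction
bits_per_photon = math.log2(n_fast * n_slow)
rate_bytes_per_s = photons_per_s * bits_per_photon / 8
print(f"{bits_per_photon:.1f} bits/photon, {rate_bytes_per_s/1e9:.2f} GB/s")

With these assumed inputs the estimate lands in the few-GB/s range; the exact 
figure depends on the scattered fraction and the detector geometry.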


I'm starting to think this might be the best prior. If you start out 
assuming nothing (not even uniform), then the variance of 0 photons may 
well be infinite. However, it is perhaps safe to assume that the dataset 
as a whole has at least one photon in it. And then if you happen to know 
the whole data set contains N photons and you have F images of Q pixels, 
then maybe a reasonable prior distribution is Poissonian with 
mean = variance = N/(F*Q) photons/pixel?


-James Holton
MAD Scientist

On 10/17/2021 11:30 PM, Frank von Delft wrote:
Thanks, I learnt two things now - one of which being that I'm credited 
with coining that word!  Stap me vittals...


If it's single photon events you're after, isn't it quantum statistics 
where you need to go find that prior?  (Or is that what you're doing 
in this thread - I wouldn't be able to tell.)


Also:  should the detectors change how they read out things, then?  
Just write out the events with timestamp, rather than dumping all 
pixels all the time into these arbitrary containers called "image".  
Or is that what's already happening in HDF5 (which I don't understand 
one bit, I should add).


Frank




On 17/10/2021 18:12, James Holton wrote:


Well Frank, I think it comes down to something I believe you were the 
first to call "dose slicing".


Like fine phi slicing, collecting a larger number of weaker images 
records the same photons, but with more information about the sample 
before it dies. In fine phi slicing the extra information allows you 
to do better background rejection, and in "dose slicing" the extra 
information is about radiation damage. We lose that information when 
we use longer exposures per image, and if you burn up the entire 
useful life of your crystal in one shot, then all information about 
how the spots decayed during the exposure is lost. Your data are also 
rather incomplete.


How much information is lost? Well, how much more disk space would be 
taken up, even after compression, if you collected only 1 photon per 
image?  And kept collecting all the way out to 30 MGy in dose? That's 
about 1 million photons (images) per cubic micron of crystal.  So, 
I'd say the amount of information lost is "quite a bit".


But what makes matters worse is that if you did collect this data set 
and preserved all information available from your crystal you'd have 
no way to process it. This is not because it's impossible, it's just 
that we don't have the software. Your only choice would be to go find 
images with the same "phi" value and add them together until you have 
enough photons/pixel to index it. Once you've got an indexing 
solution you can map every photon hit to a position in reciprocal 
space as well as give it a time/dose stamp. What do you do with 
that?  You can do zero-dose extrapolation, of course!  Damage-free 
data! Wouldn't that be nice. Or can you?  The data you will have in 
hand for each reciprocal-space pixel might look something like:
tic tic .. tic . tic ... tic tictic ... 
tictic.


So. Eight photons.  With time-of-arrival information.  How do you fit 
a straight line to that?  You could "bin" the data or do some kind of 
smoothing thing, but then you are losing information again. Perhaps 
also making ill-founded assumptions. You need error bars of some 
kind, and, better yet, the shape of the distribution implied by those 
error bars.


And all this makes me think somebody must have already done this. I'm 
willing to bet probably some time in the late 1700s to early 1800s. 
All we're really talking about here is augmenting maximum-likelihood 
estimation of an average value to maximum-likelihood estimation of a 
straight line. That is, slope and intercept, with sigmas on both. I 
suspect the proper approach is to first bring everything down to the 
exact information content of a single photon (or lack of a photon), 
and build up from there. 

Re: [ccp4bb] am I doing this right?

2021-10-18 Thread James Holton

Thank you very much for this Kay!

So, to summarize, you are saying the answer to my question "what is the 
expectation and variance if I observe a 10x10 patch of pixels with zero 
counts?" is:

Iobs = 0.01
sigIobs = 0.01 (defining sigIobs = sqrt(variance(Iobs)))

And for the one-pixel case:
Iobs = 1
sigIobs = 1

but in both cases the distribution is NOT Gaussian, but rather 
exponential. And that means adding variances may not be the way to 
propagate error.


Is that right?
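
As a quick sanity check of that summary, here is a small Monte Carlo sketch 
(it simply draws samples from the exponential posterior f(l) = n*e^(-n*l) 
derived below, for n = 1 and n = 100 zero-count pixels):

import numpy as np

rng = np.random.default_rng(0)
for n in (1, 100):
    # posterior on the per-pixel rate l is exponential with mean 1/n
    l_samples = rng.exponential(scale=1.0 / n, size=1_000_000)
    print(f"n={n:3d}  mean={l_samples.mean():.4f}  sd={l_samples.std():.4f}")

Both the mean and the standard deviation come out at ~1/n (0.01 for the 10x10 
patch), consistent with Iobs and sigIobs above, and the distribution is 
exponential rather than Gaussian.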

-James Holton
MAD Scientist



On 10/18/2021 7:00 AM, Kay Diederichs wrote:

Hi James,

I'm a bit behind ...

My answer about the basic question ("a patch of 100 pixels each with zero counts - 
what is the variance?") you ask is the following:

1) we all know the Poisson PDF (Probability Distribution Function) P(k|l) = 
l^k*e^(-l)/k! (where k stands for an integer >= 0 and l is lambda), which 
tells us the probability of observing k counts if we know l. The PDF is 
normalized: SUM_over_k P(k|l) = 1 for k = 0...infinity.
2) you don't know before the experiment what l is, and you assume it is some number x 
with 0<=x<=xmax (the xmax limit can be calculated by looking at the physics of 
the experiment; it is finite and less than the overload value of the pixel, otherwise 
you should do a different experiment). Since you don't know that number, all the x 
values are equally likely - you use a uniform prior.
3) what is the PDF P(l|k) of l if we observe k counts?  That can be found with Bayes' 
theorem, and it turns out that (due to the uniform prior) the right-hand side of the 
formula looks the same as in 1): P(l|k) = l^k*e^(-l)/k! (again, the ! stands for the 
factorial; it is not an exclamation mark). These are eqs. 7.42 and 7.43 in D'Agostini, 
"Bayesian Reasoning in Data Analysis".
3a) side note: if we calculate the expectation value of l, by multiplying by 
l and integrating over l from 0 to infinity, we obtain E(l|k) = k+1, and 
similarly for the variance (D'Agostini eqs. 7.45 and 7.46).
4) for k=0 (zero counts observed in a single pixel), this reduces to 
P(l|0) = e^(-l) for a single observation (pixel). (This is basic math; see also 
§7.4.1 of D'Agostini.)
5) since we have n = 100 independent pixels, we must multiply the individual PDFs 
to get the overall PDF f, and also normalize so that the integral over that PDF 
is 1: the result is f(l | all 100 pixels are 0) = n*e^(-n*l) (basic math). A 
more Bayesian procedure would be to realize that the posterior PDF 
P(l|0) = e^(-l) of the first pixel should be used as the prior for the second 
pixel, and so forth until the 100th pixel. This gives the same result f(l | all 100 
pixels are 0) = n*e^(-n*l) (D'Agostini §7.7.2)!
6) the expectation value INTEGRAL_0_to_infinity of l*n*e^(-n*l) dl is 1/n.  
This is 1 if n=1, as we know from 3a), and 1/100 for 100 pixels with 0 counts.
7) the variance is then INTEGRAL_0_to_infinity of (l-1/n)^2*n*e^(-n*l) dl. 
This is 1/n^2.

I find these results quite satisfactory. Please note that they deviate from the 
MLE result: expectation value=0, variance=0 . The problem appears to be that a 
Maximum Likelihood Estimator may give wrong results for small n; something that 
I've read a couple of times but which appears not to be universally 
known/taught. Clearly, the result in 6) and 7) for large n converges towards 0, 
as it should be.
What this also means is that one should really work out the PDF instead of just 
adding expectation values and variances (and arriving at 100 if all 100 pixels 
have zero counts) because it is contradictory to use a uniform prior for all 
the pixels if OTOH these agree perfectly in being 0!

What this means for zero-dose extrapolation I have not thought about. At least 
it prevents infinite weights!

Best,
Kay










[ccp4bb] Call for Registration to virtually attend Dr. Michael Rossmann Symposium on Oct 24-26, 2021

2021-10-18 Thread Xiao, Chuan
Dear CCP4 X-ray crystallographers,

I am writing to invite you to attend the Dr. Michael Rossmann Symposium, 
which will be held on Oct 24-26, 2021.  Dr. Michael Rossmann was a pioneer in 
protein crystallography, and we hope you will be interested in attending the 
symposium in his memory and listening to talks from his friends, colleagues, and 
former trainees.

The symposium will be a hybrid event with both in-person and online attendance. 
In-person registration has closed, but registration for virtual attendance 
is still open until Oct 22nd for a fee of $10. Keynote speakers include Dr. 
Felix Ray, Dr. Michael Diamond, Dr. Joachim Frank, Dr. Polly Roy, and Dr. Hao 
Wu. The full agenda can be found at the URL below:

https://web.cvent.com/event/35b1092e-1536-4fdd-9be0-071dadd3bfca/websitePage:17a44d10-f31a-4522-9701-9e8e2d617a86?locale=en-US=p2kY6DqPITXEWCvuLrfvbWtrD_xow-NH9jEWHA0mREw

The registration URL is copied below for you.

https://web.cvent.com/event/35b1092e-1536-4fdd-9be0-071dadd3bfca/summary?locale=en-US=p2kY6DqPITXEWCvuLrfvbWtrD_xow-NH9jEWHA0mREw

Thank you.

[UTEP]

Chuan (River) Xiao
Professor of Biochemistry
2020 UT System Regents Outstanding Teaching Award Recipient

Department of Chemistry and Biochemistry
The University of Texas at El Paso
500 W. University Ave.
El Paso, TX 79968
Office: 915-747-8657
Fax: 915-747-5996
http://utminers.utep.edu/cxiao/








[ccp4bb] 4-year PhD position in Bergen, Norway

2021-10-18 Thread Petri Kursula
Hi,

we have an opening for a fully funded 4-year PhD position in my group at the 
University of Bergen, Norway. Please see the full ad below for details. 
Informal queries are welcome, but applications must be made through the link in 
the advert.

https://www.jobbnorge.no/en/available-jobs/job/213744/phd-position 

Best regards,
Petri

Petri Kursula
--
Professor 
--
Department of Biomedicine
University of Bergen, Norway
https://link.uib.no/petri
petri.kurs...@uib.no
--
Faculty of Biochemistry and Molecular Medicine
Biocenter Oulu
University of Oulu, Finland
--












Re: [ccp4bb] am I doing this right?

2021-10-18 Thread Ian Tickle
All, this was my reply to one of James' emails which I just noticed was to
me only, and I ought to have CC'd it to the BB, since it has relevance to
others' contributions to the discussion.

Cheers

-- Ian


On Mon, 18 Oct 2021 at 11:27, Ian Tickle  wrote:

>
> James, no I don't think so, what does it have to do with the number of
> pixels?  All that matters is the photon flux and the area of measurement.
> The only purpose of the detector is to make the measurement: it cannot
> possibly change the measurement unless of course it's less than ideal.
> Assuming we are using ideal quantum detectors with DQE = 1 you can take
> away the detector and replace it with a different ideal detector with
> different-sized pixels.  The result must be the same for any ideal detector
> assuming a fixed photon flux and measurement box.  Remember that for the
> Poisson distribution (and the true count is Poisson-distributed), the
> expectation equals the variance.  If you're saying that the variance is 100
> then so is the expectation: does that sound sensible given that no counts
> were observed?
>
> I would take heed of Gergely's wise counsel: "It is easy to fall into the
> trap of frequentist thinking and reduce data one step at a time."  Your
> argument is step 1: estimate expectation and variance for each pixel; step
> 2: add up the expectations and variances for all the pixels.  It doesn't
> work like that!
>
> I stick with my original suggestion that for no observed counts in some
> area the best estimate of expectation and variance is 1 in that area (I'm
> probably assuming an uninformative prior but as I said I haven't been
> through the algebra in detail).
>
> Cheers
>
> -- Ian
>
>
> On Sat, 16 Oct 2021 at 16:04, James Holton  wrote:
>
>> Sorry to be unclear.  I am actually referring to the background.  Assume
>> the spot is a systematic absence, but we still need a variance.  The
>> "units" (if you will pemit) will be photons/pixel.  Not photons/spot.
>>
>> I think in that case my 10x10 patch of independent pixels, all with zero
>> observed counts, would have a variance of 100.  The sum of two patches of
>> 50 would also have a variance of 100.
>>
>> Right?
>>
>>
>> On 10/16/2021 5:35 AM, Ian Tickle wrote:
>>
>>
>> PS Note also that the prior is for the integrated spot intensity, not the
>> individual pixel counts, so we should integrate before applying the +1
>> correction.
>>
>> I.
>>
>>
>> On Sat, 16 Oct 2021 at 10:16, Ian Tickle  wrote:
>>
>>>
>>> James, you're now talking about additivity of the observed or true
>>> counts for different spots whereas I assumed your question still concerned
>>> estimation of the expectation and variance of the true count of a given
>>> spot, assuming some prior distribution of the true count.  As we've already
>>> seen, the latter does not behave in an intuitive way.
>>>
>>> We can use argument by reductio ad absurdum (reduction to absurdity) to
>>> demonstrate this, i.e. assume the contrary (i.e. that the expected counts
>>> and variances are additive), and show that it leads inevitably to a logical
>>> contradiction.  First it should be clear that the result must be
>>> independent of the pixellation of the detector surface, i.e. the pixel size
>>> (it needn't correspond to the detector's hardware pixels), provided it's not
>>> smaller than the hardware pixel and not greater than the spot size.
>>>
>>> That means that we can subdivide the 10x10 area any way we like and we
>>> should always get the same answer for the total expected value and
>>> variance, so 100 1x1 pixels, or 50 2x1, or 25 2x2, or 4 5x5, or 1 10x10
>>> etc.  Given that we accept the estimate of 1 for the expectation and
>>> variance of the true count for zero observed count and assume additivity of
>>> the expected values and variances, these choices give 100, 50, 25, 4 and 1
>>> as the answer!  We can accept possible alternative solutions to a problem
>>> where each solution is correct in its own universe, but not different
>>> solutions that are simultaneously correct in the same universe: that's a
>>> logical contradiction.  So we are forced to abandon additivity for the
>>> expected value and variance of the true count for a single spot.  That of
>>> course has nothing to do with additivity of those values for multiple
>>> different spots: that is still valid.
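
To make that concrete, here is a tiny sketch (the tilings are hypothetical) 
showing what happens if one naively assigns an expected true count of 1 to 
every zero-count region and then adds the contributions up:

# five ways to tile the same 10x10 zero-count patch
tilings = {"100 x (1x1)": 100, "50 x (2x1)": 50, "25 x (2x2)": 25,
           "4 x (5x5)": 4, "1 x (10x10)": 1}
for name, n_regions in tilings.items():
    naive_total = n_regions * 1   # "+1 per zero-count region", summed
    print(f"{name:>12}: naive total expectation = {naive_total}")

Five tilings of the same patch give five different totals, which is exactly 
the contradiction described above.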
>>>
>>> BTW I found this paper on Bayesian priors for the Poisson distribution:
>>> "Inferring the intensity of Poisson processes at the limit of the detector
>>> sensitivity": https://arxiv.org/pdf/hep-ex/9909047.pdf .
>>>
>>> Cheers
>>>
>>> -- Ian
>>>
>>>
>>> On Sat, 16 Oct 2021 at 00:25, James Holton  wrote:
>>>
 I don't follow.  How is it that variances are not additive?  We do this
 all the time when we merge data together.


 On 10/15/2021 4:22 PM, Ian Tickle wrote:

 James, also as the question is posed none of the answers is correct
 because a photon count must be an integer (there's no such thing as a
 

[ccp4bb] Postdoc position Oxford Univ - Structural and Chemical Biology of E3 Ligases

2021-10-18 Thread Alex Bullock
A postdoc position is available in the group of Alex Bullock, University of 
Oxford, to study the structural and chemical biology of E3 Ligases as part of 
the EU-funded "EUbOpen" consortium, with funding until 30 April 2025. The 
project will support the next generation of PROTACs/molecular glues for 
targeted protein degradation.
Nature Reviews Drug Discovery volume 18, pages 949-963 (2019)
https://www.nature.com/articles/s41573-019-0047-y

The postholder will optimise purification and crystallisation systems for human 
Cullin-RING E3 ligase substrate adaptor proteins, solve their structures and 
study their interactions with substrate proteins, peptides and low molecular 
weight compounds. The post will involve significant collaborative teamwork 
internally, as well as with external academic and industry partners.

Applications close at noon on 10 November 2021. Full details are available here:
Job Details (corehr.com)

Informal enquiries can be sent to 
alex.bull...@cmd.ox.ac.uk

https://www.cmd.ox.ac.uk/team/alex-bullock
https://www.cmd.ox.ac.uk/research/growth-factor-signalling-and-ubiquitination







Re: [ccp4bb] am I doing this right?

2021-10-18 Thread Kay Diederichs
Hi James,

I'm a bit behind ...

My answer about the basic question ("a patch of 100 pixels each with zero 
counts - what is the variance?") you ask is the following:

1) we all know the Poisson PDF (Probability Distribution Function) P(k|l) = 
l^k*e^(-l)/k! (where k stands for an integer >= 0 and l is lambda), which 
tells us the probability of observing k counts if we know l. The PDF is 
normalized: SUM_over_k P(k|l) = 1 for k = 0...infinity.
2) you don't know before the experiment what l is, and you assume it is some 
number x with 0<=x<=xmax (the xmax limit can be calculated by looking at the 
physics of the experiment; it is finite and less than the overload value of the 
pixel, otherwise you should do a different experiment). Since you don't know 
that number, all the x values are equally likely - you use a uniform prior.
3) what is the PDF P(l|k) of l if we observe k counts?  That can be found with 
Bayes' theorem, and it turns out that (due to the uniform prior) the right-hand 
side of the formula looks the same as in 1): P(l|k) = l^k*e^(-l)/k! (again, 
the ! stands for the factorial; it is not an exclamation mark). These are 
eqs. 7.42 and 7.43 in D'Agostini, "Bayesian Reasoning in Data Analysis".
3a) side note: if we calculate the expectation value of l, by multiplying by 
l and integrating over l from 0 to infinity, we obtain E(l|k) = k+1, and 
similarly for the variance (D'Agostini eqs. 7.45 and 7.46).
4) for k=0 (zero counts observed in a single pixel), this reduces to 
P(l|0) = e^(-l) for a single observation (pixel). (This is basic math; see also 
§7.4.1 of D'Agostini.)
5) since we have n = 100 independent pixels, we must multiply the individual PDFs 
to get the overall PDF f, and also normalize so that the integral over that PDF 
is 1: the result is f(l | all 100 pixels are 0) = n*e^(-n*l) (basic math). A 
more Bayesian procedure would be to realize that the posterior PDF 
P(l|0) = e^(-l) of the first pixel should be used as the prior for the second 
pixel, and so forth until the 100th pixel. This gives the same result f(l | all 100 
pixels are 0) = n*e^(-n*l) (D'Agostini §7.7.2)!
6) the expectation value INTEGRAL_0_to_infinity of l*n*e^(-n*l) dl is 1/n.  
This is 1 if n=1, as we know from 3a), and 1/100 for 100 pixels with 0 counts.
7) the variance is then INTEGRAL_0_to_infinity of (l-1/n)^2*n*e^(-n*l) dl. 
This is 1/n^2.
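
A quick numerical check of 6) and 7), integrating the posterior directly 
(a sketch using scipy; n is the number of zero-count pixels):

import numpy as np
from scipy.integrate import quad

for n in (1, 100):
    f = lambda l: n * np.exp(-n * l)                    # posterior f(l) = n*e^(-n*l)
    norm, _ = quad(f, 0, np.inf)                        # should be 1
    mean, _ = quad(lambda l: l * f(l), 0, np.inf)       # expectation, should be 1/n
    var, _ = quad(lambda l: (l - 1.0/n)**2 * f(l), 0, np.inf)  # variance, 1/n^2
    print(f"n={n:3d}: norm={norm:.3f}  mean={mean:.4f}  var={var:.6f}")

For n = 100 this gives an expectation of 0.01 and a variance of 1e-4 (sigma = 0.01).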

I find these results quite satisfactory. Please note that they deviate from the 
MLE result: expectation value=0, variance=0 . The problem appears to be that a 
Maximum Likelihood Estimator may give wrong results for small n; something that 
I've read a couple of times but which appears not to be universally 
known/taught. Clearly, the result in 6) and 7) for large n converges towards 0, 
as it should be.
What this also means is that one should really work out the PDF instead of just 
adding expectation values and variances (and arriving at 100 if all 100 pixels 
have zero counts) because it is contradictory to use a uniform prior for all 
the pixels if OTOH these agree perfectly in being 0!

What this means for zero-dose extrapolation I have not thought about. At least 
it prevents infinite weights!

Best,
Kay 





[ccp4bb] Bioinformatician position at PDBe in collaboration with CCP4

2021-10-18 Thread John Berrisford
Dear Colleagues,

 

We have a bioinformatician position available in the PDBe team at the
European Bioinformatics Institute (EBI) on the Wellcome Genome Campus near
Cambridge.

 

We are looking for a structural bioinformatician who is interested in
developing methods for protein structure analysis to identify biologically
relevant conformational states and link these to macromolecular function.

 

The successful candidate will work in collaboration with Eugene Krissinel,
and the CCP4 core team at the Science and Technology Facilities Council, to
implement new data analysis methods, improve biological data processing
pipelines, perform data analysis, and contribute to developing user-facing
web pages.

 

The closing date for applications is 8th November 2021. For more information
on the position, please visit:

 

 
https://www.embl.org/jobs/position/EBI01923 

 

Kind regards,

 

John Berrisford

 

--

John Berrisford

PDBe

European Bioinformatics Institute (EMBL-EBI)

European Molecular Biology Laboratory

Wellcome Trust Genome Campus

Hinxton

Cambridge CB10 1SD UK

Tel: +44 1223 492529

 

http://www.pdbe.org

http://www.facebook.com/proteindatabank

http://twitter.com/PDBeurope

 






Re: [ccp4bb] am I doing this right?

2021-10-18 Thread Frank von Delft
Thanks, I learnt two things now - one of which being that I'm credited 
with coining that word!  Stap me vittals...


If it's single photon events you're after, isn't it quantum statistics 
where you need to go find that prior?  (Or is that what you're doing in 
this thread - I wouldn't be able to tell.)


Also:  should the detectors change how they read out things, then? Just 
write out the events with timestamp, rather than dumping all pixels all 
the time into these arbitrary containers called "image". Or is that 
what's already happening in HDF5 (which I don't understand one bit, I 
should add).


Frank




On 17/10/2021 18:12, James Holton wrote:


Well Frank, I think it comes down to something I believe you were the 
first to call "dose slicing".


Like fine phi slicing, collecting a larger number of weaker images 
records the same photons, but with more information about the sample 
before it dies. In fine phi slicing the extra information allows you 
to do better background rejection, and in "dose slicing" the extra 
information is about radiation damage. We lose that information when 
we use longer exposures per image, and if you burn up the entire 
useful life of your crystal in one shot, then all information about 
how the spots decayed during the exposure is lost. Your data are also 
rather incomplete.


How much information is lost? Well, how much more disk space would be 
taken up, even after compression, if you collected only 1 photon per 
image?  And kept collecting all the way out to 30 MGy in dose? That's 
about 1 million photons (images) per cubic micron of crystal.  So, I'd 
say the amount of information lost is "quite a bit".


But what makes matters worse is that if you did collect this data set 
and preserved all information available from your crystal you'd have 
no way to process it. This is not because it's impossible, it's just 
that we don't have the software. Your only choice would be to go find 
images with the same "phi" value and add them together until you have 
enough photons/pixel to index it. Once you've got an indexing solution 
you can map every photon hit to a position in reciprocal space as well 
as give it a time/dose stamp. What do you do with that?  You can do 
zero-dose extrapolation, of course!  Damage-free data! Wouldn't that 
be nice. Or can you?  The data you will have in hand for each 
reciprocal-space pixel might look something like:
tic tic .. tic . tic ... tic tictic ... 
tictic.


So. Eight photons.  With time-of-arrival information.  How do you fit 
a straight line to that?  You could "bin" the data or do some kind of 
smoothing thing, but then you are losing information again. Perhaps 
also making ill-founded assumptions. You need error bars of some kind, 
and, better yet, the shape of the distribution implied by those error 
bars.


And all this makes me think somebody must have already done this. I'm 
willing to bet probably some time in the late 1700s to early 1800s. 
All we're really talking about here is augmenting maximum-likelihood 
estimation of an average value to maximum-likelihood estimation of a 
straight line. That is, slope and intercept, with sigmas on both. I 
suspect the proper approach is to first bring everything down to the 
exact information content of a single photon (or lack of a photon), 
and build up from there.  If you are lucky enough to have a large 
number of photons then linear regression will work, and you are back 
to Diederichs (2003). But when you're photon-starved the statistics of 
single photons become more and more important.  This led me to the question: 
is it k, or k+1?  When k=0, getting this wrong could introduce a factor of 
infinity.
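
For what it is worth, here is one way such a photon-by-photon fit could look - 
only a sketch with made-up numbers, not a worked-out method: maximum-likelihood 
estimation of the intercept a and slope b of a linear rate lambda(t) = a + b*t 
directly from individual arrival times (no binning), using the 
inhomogeneous-Poisson log-likelihood sum(log lambda(t_i)) - integral(lambda):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
T = 1.0                         # total dose/time window (arbitrary units)
a_true, b_true = 10.0, -8.0     # hypothetical decaying rate

# simulate photon arrival times by thinning a homogeneous Poisson process
t = rng.uniform(0, T, rng.poisson(a_true * T))
t = np.sort(t[rng.uniform(0, 1, t.size) < (a_true + b_true * t) / a_true])

def nll(p):                     # negative log-likelihood of the arrival times
    a, b = p
    lam = a + b * t
    if a <= 0 or a + b * T < 0 or np.any(lam <= 0):
        return np.inf           # keep the rate non-negative over [0, T]
    return -(np.sum(np.log(lam)) - (a * T + 0.5 * b * T**2))

fit = minimize(nll, x0=[max(t.size, 1) / T, 0.0], method="Nelder-Mead")
print(f"{t.size} photons; MLE intercept = {fit.x[0]:.2f}, slope = {fit.x[1]:.2f}")

With only a handful of photons the estimates are of course noisy, which is 
where the error bars (and the shape of the distribution behind them) come in.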


So, perhaps the big "consequence of getting it wrong" is embarrassing 
myself by re-making a 200-year-old mistake I am not currently aware 
of. I am confident a solution exists, but only recently started 
working on this.  So, I figured ... ask the world?


-James Holton
MAD Scientist


On 10/17/2021 1:51 AM, Frank Von Delft wrote:
James, I've been watching the thread with fascination, but also the 
confusion of wild ignorance. I've finally realised why.


What I've missed is: what exactly makes the question so important?  
I've understood what brought it up, of course, but not the 
consequence of getting it wrong.


Frank

Sent from tiny silly touch screen

*From:* James Holton 
*Sent:* Saturday, 16 October 2021 20:01
*To:* CCP4BB@JISCMAIL.AC.UK
*Subject:* Re: [ccp4bb] am I doing this right?

Thank you everyone for your thoughtful and thought-provoking responses!

But, I am starting to think I was not as clear as I could have been
about my question.  I am actually concerning myself with background, not
necessarily Bragg peaks.  With Bragg photons you want the sum, but for
background you want the average.

What I'm getting at is: how does one properly weight a zero-photon