Re: [ccp4bb] [3dem] Which resolution?

Kay Diederichs Thu, 05 Mar 2020 12:40:22 -0800

Dear James,

important and educational points! This triggers some thoughts ...


The one point where I don't quite agree is with "What about filtering out the 
noise?  An ideal noise suppression filter has the same shape as the signal (I 
found that in Numerical Recipes), and the shape of the signal from a 
macromolecule is a Gaussian in reciprocal space (aka straight line on a Wilson 
plot). This is true, by the way, for both a molecule packed into a crystal or 
free in solution.  So, the ideal noise-suppression filter is simply applying a 
B factor.  "

I think we can do better than that. We should use the knowledge about the 
actual signal and its noise (which we measure) as a weighting factor, rather 
than that of the theoretical signal (the straight line in the Wilson plot), for 
the purpose of noise suppression. Formula 13.3.6 in Numerical Recipes (3rd ed., 
2007), which gives the optimal (Wiener) filter to be used for weighting, is 
phi(f) = S(f)^2/(S(f)^2 + N(f)^2). But at high resolution, this is just 
CC1/2: see Box 1 of reference (1) - the formula CC1/2 = 1/(1 + 2/(I/sigma)^2) 
can be written as CC1/2 = <I>^2/(<I>^2 + 2 <sigma>^2), where sigma is the 
estimate of the noise in I (I don't know right now why there is a factor of 2). 
This goes to zero at the resolution where the signal goes to zero, and is near 
one in the resolution range in which we have good knowledge of the signal. 
(I only thought about this today, and I also considered CC* as a weighting 
factor, as I understand is suggested by Rosenthal and Henderson, J.Mol.Biol. 
2003, but I cannot convince myself currently that this is right. Anyway, the 
shape of the CC* curve as a function of resolution matches that of CC1/2)
In other words, we should be able to suppress the noise by multiplying the 
Fourier coefficients used for map calculation with (a smooth 
resolution-dependent approximation of) CC1/2. This should allow us to sharpen 
with the best noise suppression we can get.
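
A minimal sketch of the weighting described above, in Python. The function names and the per-shell tuple layout are hypothetical, chosen only for illustration; the smoothing of CC1/2 over resolution is not shown, and the factor of 2 is simply taken over from the formula as quoted.

```python
def wiener_weight(signal_power, noise_power):
    """Optimal (Wiener) filter weight, phi = S^2 / (S^2 + N^2)."""
    return signal_power / (signal_power + noise_power)


def cc_half_from_i_over_sigma(i_over_sigma):
    """High-resolution form CC1/2 = 1 / (1 + 2 / (I/sigma)^2)."""
    if i_over_sigma == 0:
        return 0.0  # no signal: the weight goes to zero
    return 1.0 / (1.0 + 2.0 / i_over_sigma ** 2)


def cc_half_from_means(mean_i, mean_sigma):
    """Same quantity written as <I>^2 / (<I>^2 + 2 <sigma>^2)."""
    return wiener_weight(mean_i ** 2, 2.0 * mean_sigma ** 2)


def weight_coefficients(shells):
    """Scale each shell's Fourier amplitude by its CC1/2-like weight.

    `shells` is a list of (amplitude, mean_i, mean_sigma) tuples,
    a hypothetical layout used for this sketch only.
    """
    return [f * cc_half_from_means(i, s) for f, i, s in shells]


if __name__ == "__main__":
    # Strong, marginal and very weak shells (made-up numbers).
    for i_sig in (20.0, 2.0, 0.2):
        print(i_sig, round(cc_half_from_i_over_sigma(i_sig), 4))
```

The weight stays near one where I/sigma is large and falls smoothly towards zero as the signal vanishes, which is the behaviour wanted from a noise-suppression filter.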

Thinking about this, we are already typically using weighted Fourier 
coefficients of the form 2mFobs-DFcalc for map calculation. Aren't these 
already weighted in the correct way? I think not - those m and D weights are 
calculated from estimates of model (in-)accuracy and (in-)completeness, but 
don't properly take the measurement errors into account. Of course, since noisy 
data make the sigmaA values worse, the noise in the data influences sigmaA, but 
not in the functionally correct form. To my understanding, the correct way to 
take account of both model and data errors is given by reference (2), which - 
to my knowledge - is not yet implemented except in PHASER. 

Hope this makes sense!

Kay

References:
(1) Karplus & Diederichs (2015) Assessing and maximizing data quality in 
macromolecular crystallography.
Curr. Opin. Struct. Biol. 34, 60-68 . PDF at 
https://www.biologie.uni-konstanz.de/typo3temp/secure_downloads/82815/0/2b10c9e6f9a28129e1b119d21aeeab217c918bb1/Karplus2015_CurrOpinStructBiol.pdf
(2) RJ Read, AJ McCoy (2016) A log-likelihood-gain intensity target for 
crystallographic phasing that accounts for experimental error. Acta 
Crystallographica Section D: Structural Biology 72 (3), 375-387 
https://scripts.iucr.org/cgi-bin/paper?dz5382

On Thu, 5 Mar 2020 01:11:33 +0100, James Holton <jmhol...@lbl.gov> wrote:

>
>The funny thing is, although we generally regard resolution as a primary
>indicator of data quality, the appearance of a density map at the classic
>"1-sigma" contour has very little to do with resolution, and everything
>to do with the B factor.
>
>Seriously, try it. Take any structure you like, set all the B factors to
>30 with PDBSET, calculate a map with SFALL or phenix.fmodel and have a
>look at the density of tyrosine (Tyr) side chains.  Even if you
>calculate structure factors all the way out to 1.0 A the holes in the
>Tyr rings look exactly the same: just barely starting to form.  This is
>because the structure factors from atoms with B=30 are essentially zero
>out at 1.0 A, and adding zeroes does not change the map.  You can adjust
>the contour level, of course, and solvent content will have some effect
>on where the "1-sigma" contour lies, but generally B=30 is the point
>where Tyr side chains start to form their holes.  Traditionally, this is
>attributed to 1.8A resolution, but it is really at B=30.  The point
>where waters first start to poke out above the 1-sigma contour is at
>B=60, despite being generally attributed to d=2.7A.
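
To put a number on "essentially zero", here is a small Python sketch using the standard isotropic Debye-Waller factor exp(-B/(4 d^2)) on structure-factor amplitudes. It is an illustration only: the function name is made up, and the fall-off of the atomic form factors themselves is ignored.

```python
import math


def b_factor_attenuation(b_factor, d_spacing):
    """Amplitude scale factor exp(-B / (4 d^2)) for an isotropic B factor.

    b_factor is in A^2 and d_spacing in A. Intensities scale with the
    square of this value.
    """
    return math.exp(-b_factor / (4.0 * d_spacing ** 2))


if __name__ == "__main__":
    for d in (3.5, 1.8, 1.0):
        scale = b_factor_attenuation(30.0, d)
        print(f"B=30, d={d} A: amplitude scaled by {scale:.4f}")
```

At B=30 the amplitude at 1.0 A is scaled by exp(-7.5), well under one part in a thousand, while at 1.8 A roughly a tenth of it survives.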
>
>Now, of course, if you cut off this B=30 data at 3.5A then the Tyr side
>chains become blobs, but that is equivalent to collecting data with the
>detector way too far away and losing your high-resolution spots off the
>edges.  I have seen a few people do that, but not usually for a
>published structure.  Most people fight very hard for those faint,
>barely-existing high-angle spots.  But why do we do that if the map is
>going to look the same anyway?  The reason is because resolution and B
>factors are linked.
>
>Resolution is about separation vs width, and the width of the density
>peak from any atom is set by its B factor.  Yes, atoms have an intrinsic
>width, but it is very quickly washed out by even modest B factors (B >
>10).  This is true for both x-ray and electron form factors. To a very
>good approximation, the FWHM of C, N and O atoms is given by:
>FWHM= sqrt(B*log(2))/pi+0.15
>
>where "B" is the B factor assigned to the atom and the 0.15 fudge factor
>accounts for its intrinsic width when B=0.  Now that we know the peak
>width, we can start to ask if two peaks are "resolved".
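
The FWHM formula above is easy to tabulate. A short Python sketch (the function name is hypothetical; `math.log` is the natural logarithm, as in gnuplot):

```python
import math


def atom_fwhm(b_factor, intrinsic_width=0.15):
    """Approximate FWHM (in A) of a C, N or O density peak.

    Implements FWHM = sqrt(B*log(2))/pi + 0.15 from the text, where the
    0.15 A term is the empirical allowance for the width at B=0.
    """
    return math.sqrt(b_factor * math.log(2.0)) / math.pi + intrinsic_width


if __name__ == "__main__":
    for b in (0, 10, 30, 60):
        print(f"B={b:3d}  FWHM={atom_fwhm(b):.2f} A")
```

For B=30 this gives a peak width of about 1.6 A, and for B=60 about 2.2 A.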
>
>Start with the classical definition of "resolution" (call it after Airy,
>Rayleigh, Dawes, or whatever famous person you like), but essentially you
>are asking the question: "how close can two peaks be before they merge
>into one peak?".  For Gaussian peaks this is 0.849*FWHM. Simple enough.
>However, when you look at the density of two atoms this far apart you
>will see the peak is highly oblong. Yes, the density has one maximum,
>but there are clearly two atoms in there.  It is also pretty obvious the
>long axis of the peak is the line between the two atoms, and if you fit
>two round atoms into this peak you recover the distance between them
>quite accurately.  Are they really not "resolved" if it is so clear
>where they are?
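
The 0.849*FWHM figure can be checked numerically. The sketch below assumes two equal one-dimensional Gaussian profiles along the line joining the atoms; the function names are made up for this illustration. Two such peaks merge when their separation drops to 2 sigma, and FWHM = 2*sqrt(2*ln 2)*sigma.

```python
import math

# Conversion between FWHM and the Gaussian standard deviation.
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def merge_distance(fwhm):
    """Separation at which two equal Gaussian peaks become one maximum."""
    sigma = fwhm / FWHM_PER_SIGMA
    return 2.0 * sigma


def has_dip_at_midpoint(separation, fwhm):
    """True if the summed profile has a local minimum between the centres."""
    sigma = fwhm / FWHM_PER_SIGMA
    step = 1e-3 * fwhm  # small offset from the midpoint

    def profile(x):
        left = math.exp(-(x + separation / 2.0) ** 2 / (2.0 * sigma ** 2))
        right = math.exp(-(x - separation / 2.0) ** 2 / (2.0 * sigma ** 2))
        return left + right

    # A dip means the density rises as we move away from the midpoint.
    return profile(0.0) < profile(step)


if __name__ == "__main__":
    print(round(merge_distance(1.0), 3))
    print(has_dip_at_midpoint(0.9, 1.0), has_dip_at_midpoint(0.8, 1.0))
```

Just above the merge distance the two maxima reappear, and just below it the profile has a single, elongated peak.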
>
>In such cases you usually want to sharpen, as that will make the oblong
>blob turn into two resolved peaks.  Sharpening reduces the B factor and
>therefore FWHM of every atom, making the "resolution" (0.849*FWHM) a
>shorter distance.  So, we have improved resolution with sharpening!  Why
>don't we always do this?  Well, the reason is because of noise.  
>Sharpening up-weights the noise of high-order Fourier terms and
>therefore degrades the overall signal-to-noise (SNR) of the map.  This
>is what I believe Colin would call reduced "contrast".  Of course, since
>we view maps with a threshold (aka contour) a map with SNR=5 will look
>almost identical to a map with SNR=500. The "noise floor" is generally
>well below the 1-sigma threshold, or even the 0-sigma threshold
>(https://doi.org/10.1073/pnas.1302823110).  As you turn up the
>sharpening you will see blobs split apart and also see new peaks rising
>above your map contouring threshold.  Are these new peaks real?  Or are
>they noise?  That is the difference between SNR=500 and SNR=5,
>respectively.  The tricky part of sharpening is knowing when you have
>reached the point where you are introducing more noise than signal. 
>There are some good methods out there, but none of them are perfect.
>
>What about filtering out the noise?  An ideal noise suppression filter
>has the same shape as the signal (I found that in Numerical Recipes),
>and the shape of the signal from a macromolecule is a Gaussian in
>reciprocal space (aka straight line on a Wilson plot). This is true, by
>the way, for both a molecule packed into a crystal or free in solution. 
>So, the ideal noise-suppression filter is simply applying a B factor. 
>Only problem is: sharpening is generally done by applying a negative B
>factor, so applying a Gaussian blur is equivalent to just not sharpening
>as much. So, we are back to "optimal sharpening" again.
>
>Why not use a filter that is non-Gaussian?  We do this all the time! 
>Cutting off the data at a given resolution (d) is equivalent to blurring
>the map with this function:
>
>kernel_d(r) = 4./3*pi/d**3*sinc3(2*pi*r/d)
>sinc3(x) = (x==0?1:3*(sin(x)/x-cos(x))/(x*x))
>
>where kernel_d(r) is the normalized weight given to a point "r" Angstrom
>away from the center of each blurring operation, and "sinc3" is the
>Fourier synthesis of a solid sphere.  That is, if you make an HKL file
>with all F=1 and PHI=0 out to a resolution d, then effectively all hkls
>beyond the resolution limit are zero. If you calculate a map with those
>Fs, you will find the kernel_d(r) function at the origin.  What that
>means is: by applying a resolution cutoff, you are effectively
>multiplying your data by this sphere of unit Fs, and since a
>multiplication in reciprocal space is a convolution in real space, the
>effect is convolving (blurring) with kernel_d(r).
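
As a check on the statement that the resolution-cutoff kernel is the Fourier synthesis of a solid sphere of unit Fs, here is a Python sketch that compares the closed form with a direct radial integral over |s| < 1/d. The numerical routine and its name are my own for this illustration; it treats reciprocal space as a continuum rather than summing over discrete hkls.

```python
import math


def sinc3(x):
    """Normalised Fourier transform of a solid sphere, sinc3(0) = 1."""
    if abs(x) < 1e-3:
        return 1.0 - x * x / 10.0  # series expansion avoids cancellation
    return 3.0 * (math.sin(x) / x - math.cos(x)) / (x * x)


def kernel_d(r, d):
    """Closed-form blurring kernel for a resolution cutoff at d (in A)."""
    return 4.0 / 3.0 * math.pi / d ** 3 * sinc3(2.0 * math.pi * r / d)


def kernel_d_numeric(r, d, steps=20000):
    """Midpoint-rule integral of F=1, PHI=0 over the sphere |s| < 1/d."""
    ds = (1.0 / d) / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * ds
        x = 2.0 * math.pi * s * r
        radial = 1.0 if x == 0.0 else math.sin(x) / x
        total += 4.0 * math.pi * s * s * radial * ds
    return total


if __name__ == "__main__":
    for r in (0.0, 0.5, 1.7, 3.0):
        print(r, kernel_d(r, 2.0), kernel_d_numeric(r, 2.0))
```

The two columns should agree closely, including the negative ripples beyond the first zero of the kernel.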
>
>For comparison, if you apply a B factor, the real-space blurring kernel
>is this:
>kernel_B(r) = (4*pi/B)**1.5*exp(-4*pi**2/B*r*r)
>
>If you graph these two kernels (format is for gnuplot) you will find
>that they have the same FWHM whenever B=80*(d/3)**2.  This "rule" is the
>one I used for my resolution demonstration movie I made back in the late
>20th century:
>https://bl831.als.lbl.gov/~jamesh/movies/index.html#resolution
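
The B=80*(d/3)**2 rule can be checked without plotting. This Python sketch finds the half-height point of each kernel by bisection; the helper names are hypothetical, and the bisection relies on each kernel crossing half height only once, which holds for both functions here.

```python
import math


def sinc3(x):
    """Normalised Fourier transform of a solid sphere, sinc3(0) = 1."""
    if abs(x) < 1e-3:
        return 1.0 - x * x / 10.0
    return 3.0 * (math.sin(x) / x - math.cos(x)) / (x * x)


def kernel_d(r, d):
    """Real-space kernel for a resolution cutoff at d."""
    return 4.0 / 3.0 * math.pi / d ** 3 * sinc3(2.0 * math.pi * r / d)


def kernel_b(r, b):
    """Real-space kernel for an applied B factor."""
    return (4.0 * math.pi / b) ** 1.5 * math.exp(-4.0 * math.pi ** 2 / b * r * r)


def fwhm(kernel, r_hi):
    """Full width at half maximum of a radial kernel, by bisection."""
    half = 0.5 * kernel(0.0)
    lo, hi = 0.0, r_hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kernel(mid) > half:
            lo = mid
        else:
            hi = mid
    return 2.0 * (0.5 * (lo + hi))  # twice the half-height radius


d = 3.0
b = 80.0 * (d / 3.0) ** 2
fwhm_d = fwhm(lambda r: kernel_d(r, d), 5.0 * d)
fwhm_b = fwhm(lambda r: kernel_b(r, b), 5.0 * d)

if __name__ == "__main__":
    print(f"d={d} A: FWHM(cutoff)={fwhm_d:.3f} A, FWHM(B={b:.0f})={fwhm_b:.3f} A")
```

For d = 3 A the two widths agree to within about one percent, so the rule is a good approximation rather than an exact identity.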
>
>What I did then was set all atomic B factors to B = 80*(d/3)^2 and then
>cut the resolution at "d".  Seemed sensible at the time.  I suppose I
>could have used the PDB-wide average atomic B factor reported for
>structures with resolution "d", which roughly follows:
>B = 4*d**2+12
>https://bl831.als.lbl.gov/~jamesh/pickup/reso_vs_avgB.png
>
>The reason I didn't use this formula for the movie is because I didn't
>figure it out until about 10 years later.  These two curves cross at
>1.5A, but diverge significantly at poor resolution.  So, which one is
>right?  It depends on how well you can measure really really faint
>spots, and we've been getting better at that in recent decades.
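
For reference, the two B-versus-resolution rules can be compared directly. A small Python sketch with made-up function names; the crossing point follows from setting 80*(d/3)**2 equal to 4*d**2+12.

```python
import math


def b_from_fwhm_rule(d):
    """B factor whose blur matches a cutoff at d: B = 80*(d/3)**2."""
    return 80.0 * (d / 3.0) ** 2


def b_from_pdb_average(d):
    """Rough PDB-wide average B at resolution d: B = 4*d**2 + 12."""
    return 4.0 * d ** 2 + 12.0


def crossing_resolution():
    """Resolution at which the two rules give the same B factor."""
    return math.sqrt(12.0 / (80.0 / 9.0 - 4.0))


if __name__ == "__main__":
    print(f"curves cross at d = {crossing_resolution():.2f} A")
    for d in (1.0, 1.5, 2.0, 3.0, 4.0):
        print(f"d={d}: {b_from_fwhm_rule(d):6.1f}  {b_from_pdb_average(d):6.1f}")
```

The curves meet a little above 1.5 A, and by 3 A the FWHM-matching rule gives B=80 against 48 from the PDB-average fit.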
>
>So, what I'm trying to say here is that just because your data has CC1/2
>or FSC dropping off to insignificance at 1.8 A doesn't mean you are
>going to see holes in Tyr side chains.  However, if you measure your
>weak, high-res data really well (high multiplicity), you might be able
>to sharpen your way to a much clearer map.
>
>-James Holton
>MAD Scientist
>
>On 2/27/2020 11:01 AM, Nave, Colin (DLSLtd,RAL,LSCI) wrote:
>>
>> James
>>
>> All you say seems sensible to me but there is the possibility of
>> confusion regarding the use of the word threshold. I fully agree that
>> a half bit information threshold is inappropriate if it is taken to
>> mean that the data should be truncated at that resolution. The ever
>> more sophisticated refinement programs are becoming adept at handling
>> the noisy data.
>>
>> The half bit information threshold I was discussing refers to a
>> nominal resolution. This is not just for trivial reporting purposes.
>> The half bit threshold is being used to compare imaging methods and
>> perhaps demonstrate that significant information is present with a
>> dose below any radiation damage threshold (that word again). The
>> justification for doing this appears to come from the fact it has been
>> adopted for protein structure determination by single particle
>> electron microscopy. However, low contrast features might not be
>> visible at this nominal resolution.
>>
>> The analogy with protein crystallography might be to collect data
>> below an absorption edge to give a nominal resolution of 2 angstrom.
>> Then do it again well above the absorption edge. The second one gives
>> much greater Bijvoet differences despite the fact that the nominal
>> resolution is the same. I doubt whether anyone doing this would be
>> misled by this as they would examine the statistics for the Bijvoet
>> differences instead. However, it does indicate the relationship
>> between contrast and resolution.
>>
>> The question, if referring to an information threshold for nominal
>> resolution, could be “Is there significant information in the data at
>> the required contrast and resolution?”. Then “Can one obtain this
>> information at a dose below any radiation damage limit”
>>
>> Keep posting!
>>
>> Regards
>>
>> Colin
>>
>> *From:*James Holton <jmhol...@lbl.gov>
>> *Sent:* 27 February 2020 01:14
>> *To:* CCP4BB@JISCMAIL.AC.UK
>> *Cc:* Nave, Colin (DLSLtd,RAL,LSCI) <colin.n...@diamond.ac.uk>
>> *Subject:* Re: [ccp4bb] [3dem] Which resolution?
>>
>> In my opinion the threshold should be zero bits.  Yes, this is where
>> CC1/2 = 0 (or FSC = 0).  If there is correlation then there is
>> information, and why throw out information if there is information to
>> be had?  Yes, this information comes with noise attached, but that is
>> why we have weights.
>>
>> It is also important to remember that zero intensity is still useful
>> information.  Systematic absences are an excellent example.  They have
>> no intensity at all, but they speak volumes about the structure.  In a
>> similar way, high-angle zero-intensity observations also tell us
>> something.  Ever tried unrestrained B factor refinement at poor
>> resolution?  It is hard to do nowadays because of all the safety
>> catches in modern software, but you can get great R factors this way. 
>> A telltale sign of this kind of "over fitting" is remarkably large
>> Fcalc values beyond the resolution cutoff.  These don't contribute to
>> the R factor, however, because Fobs is missing for these hkls. So,
>> including zero-intensity data suppresses at least some types of
>> over-fitting.
>>
>> The thing I like most about the zero-information resolution cutoff is
>> that it forces us to address the real problem: what do you mean by
>> "resolution" ?  Not long ago, claiming your resolution was 3.0 A meant
>> that after discarding all spots with individual I/sigI < 3 you still
>> have 80% completeness in the 3.0 A bin.  Now we are saying we have a
>> 3.0 A data set when we can prove statistically that a few
>> non-background counts fell into the sum of all spot areas at 3.0 A. 
>> These are not the same thing.
>>
>> Don't get me wrong, including the weak high-resolution information
>> makes the model better, and indeed I am even advocating including all
>> the noisy zeroes.  However, weak data at 3.0 A is never going to be as
>> good as having strong data at 3.0 A.  So, how do we decide?  I
>> personally think that the resolution assigned to the PDB deposition
>> should remain the classical I/sigI > 3 at 80% rule.  This is really
>> the only way to have meaningful comparison of resolution between very
>> old and very new structures.  One should, of course, deposit all the
>> data, but don't claim that cut-off as your "resolution".  That is just
>> plain unfair to those who came before.
>>
>> Oh yeah, and I also have a session on "interpreting low-resolution
>> maps" at the GRC this year.
>> https://www.grc.org/diffraction-methods-in-structural-biology-conference/2020/
>>
>> So, please, let the discussion continue!
>>
>> -James Holton
>> MAD Scientist
>>
>> On 2/22/2020 11:06 AM, Nave, Colin (DLSLtd,RAL,LSCI) wrote:
>>
>>     Alexis
>>
>>     This is a very useful summary.
>>
>>     You say you were not convinced by Marin's derivation in 2005. Are
>>     you convinced now and, if not, why?
>>
>>     My interest in this is that the FSC with half bit thresholds have
>>     the danger of being adopted elsewhere because they are becoming
>>     standard for protein structure determination (by EM or MX). If it
>>     is used for these mature techniques it must be right!
>>
>>     It is the adoption of the ½ bit threshold I worry about. I gave a
>>     rather weak example for MX which consisted of partial occupancy of
>>     side chains, substrates etc. For x-ray imaging a wide range of
>>     contrasts can occur and, if you want to see features with only a
>>     small contrast above the surroundings then I think the half bit
>>     threshold would be inappropriate.
>>
>>     It would be good to see a clear message from the MX and EM
>>     communities as to why an information content threshold of ½ a bit
>>     is generally appropriate for these techniques and an
>>     acknowledgement that this threshold is technique/problem dependent.
>>
>>     We might then progress from the bronze age to the iron age.
>>
>>     Regards
>>
>>     Colin
>>
>>     *From:*CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK>
>>     *On Behalf Of *Alexis Rohou
>>     *Sent:* 21 February 2020 16:35
>>     *To:* CCP4BB@JISCMAIL.AC.UK
>>     *Subject:* Re: [ccp4bb] [3dem] Which resolution?
>>
>>     Hi all,
>>
>>     For those bewildered by Marin's insistence that everyone's been
>>     messing up their stats since the bronze age, I'd like to offer
>>     my understanding of the situation. More details in this
>>     thread from a few years ago on the exact same topic:
>>
>>     https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html
>>
>>     https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html
>>
>>     Notwithstanding notational problems (e.g. strict equations as
>>     opposed to approximation symbols, or omission of symbols to denote
>>     estimation), I believe Frank & Al-Ali and "descendent" papers
>>     (e.g. appendix of Rosenthal & Henderson 2003) are fine. The cross
>>     terms that Marin is agitated about do in fact have an
>>     expectation value of 0.0 (in the ensemble; if the experiment were
>>     performed an infinite number of times with different realizations
>>     of noise). I don't believe Pawel or Jose Maria or any of the other
>>     authors really believe that the cross-terms are orthogonal.
>>
>>     When N (the number of independent Fourier voxels in a shell) is
>>     large enough, mean(Signal x Noise) ~ 0.0 is only an approximation,
>>     but a pretty good one, even for a single FSC experiment. This is
>>     why, in my book, derivations that depend on Frank & Al-Ali are OK,
>>     under the strict assumption that N is large. Numerically, this
>>     becomes apparent when Marin's half-bit criterion is plotted -
>>     asymptotically it has the same behavior as a constant threshold.
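
The large-N argument above can be illustrated with a toy simulation in Python. It rests on assumptions that are mine, not the thread's: signal and noise are independent, real-valued, unit-variance Gaussian numbers, whereas real Fourier voxels are complex and not identically distributed. The function names are hypothetical.

```python
import math
import random


def mean_cross_term(n_voxels, seed=0):
    """Average of signal*noise over one shell of n_voxels samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_voxels):
        signal = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        total += signal * noise
    return total / n_voxels


def rms_cross_term(n_voxels, trials=20):
    """RMS of the shell-averaged cross term over repeated experiments."""
    squares = [mean_cross_term(n_voxels, seed=t) ** 2 for t in range(trials)]
    return math.sqrt(sum(squares) / trials)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(f"N={n:6d}  rms cross term = {rms_cross_term(n):.4f}")
```

Under these assumptions the expectation of the cross term is zero for any N, but its scatter in a single experiment falls off roughly as 1/sqrt(N), so it is only negligible when the shell holds many independent voxels.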
>>
>>     So, is Marin wrong to worry about this? No, I don't think so.
>>     There are indeed cases where the assumption of large N is broken.
>>     And under those circumstances, any fixed threshold (0.143, 0.5,
>>     whatever) is dangerous. This is illustrated in figures of van Heel
>>     & Schatz (2005). Small boxes, high-symmetry, small objects in
>>     large boxes, and a number of other conditions can make fixed
>>     thresholds dangerous.
>>
>>     It would indeed be better to use a non-fixed threshold. So why am
>>     I not using the 1/2-bit criterion in my own work? While
>>     numerically it behaves well at most resolution ranges, I was not
>>     convinced by Marin's derivation in 2005. Philosophically though, I
>>     think he's right - we should aim for FSC thresholds that are more
>>     robust to the kinds of edge cases mentioned above. It would be the
>>     right thing to do.
>>
>>     Hope this helps,
>>
>>     Alexis
>>
>>     On Sun, Feb 16, 2020 at 9:00 AM Penczek, Pawel A
>>     <pawel.a.penc...@uth.tmc.edu> wrote:
>>
>>         Marin,
>>
>>         The statistics in the 2010 review is fine. You may disagree with
>>         assumptions, but I can assure you the “statistics” (as you
>>         call it) is fine. Careful reading of the paper would reveal to
>>         you this much.
>>
>>         Regards,
>>
>>         Pawel
>>
>>
>>
>>
>>             On Feb 16, 2020, at 10:38 AM, Marin van Heel
>>             <marin.vanh...@googlemail.com> wrote:
>>
>>             Dear Pawel and All others ....
>>
>>             This 2010 review is - unfortunately - largely based on the
>>             flawed statistics I mentioned before, namely on the a
>>             priori assumption that the inner product of a signal
>>             vector and a noise vector is ZERO (an orthogonality
>>             assumption).  The (Frank & Al-Ali 1975) paper we have
>>             refuted on a number of occasions (for example in 2005, and
>>             most recently in our BioRxiv paper) but you still take
>>             that as the correct relation between SNR and FRC (and you
>>             never cite the criticism...).
>>
>>             Sorry
>>
>>             Marin
>>
>>             On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A
>>             <pawel.a.penc...@uth.tmc.edu> wrote:
>>
>>                 Dear Teige,
>>
>>                 I am wondering whether you are familiar with
>>
>>                 Penczek PA. Resolution measures in molecular electron
>>                 microscopy. Methods Enzymol. 2010;482:73-100. doi:
>>                 10.1016/S0076-6879(10)82003-8.
>>
>>                 You will find there answers to all questions you asked
>>                 and much more.
>>
>>                 Regards,
>>
>>                 Pawel Penczek
>>
>>                 Regards,
>>
>>                 Pawel
>>
>>                 _______________________________________________
>>                 3dem mailing list
>>                 3...@ncmir.ucsd.edu <mailto:3...@ncmir.ucsd.edu>
>>                 https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
>>
>>     ------------------------------------------------------------------------
>>
>>     To unsubscribe from the CCP4BB list, click the following link:
>>     https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1
>>
>>     --
>>
>>     This e-mail and any attachments may contain confidential,
>>     copyright and or privileged material, and are for the use of the
>>     intended addressee only. If you are not the intended addressee or
>>     an authorised recipient of the addressee please notify us of
>>     receipt by returning the e-mail and do not use, copy, retain,
>>     distribute or disclose the information in or attached to the e-mail.
>>     Any opinions expressed within this e-mail are those of the
>>     individual and not necessarily of Diamond Light Source Ltd.
>>     Diamond Light Source Ltd. cannot guarantee that this e-mail or any
>>     attachments are free from viruses and we cannot accept liability
>>     for any damage which you may sustain as a result of software
>>     viruses which may be transmitted in or with the message.
>>     Diamond Light Source Limited (company no. 4375679). Registered in
>>     England and Wales with its registered office at Diamond House,
>>     Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11
>>     0DE, United Kingdom
>>
