Thank you Pavel, responses inline below.  There are indeed a lot of possible approaches, and my experience is far from complete, but I will share my impressions.

On 8/15/2022 8:15 PM, Pavel Afonine wrote:
Hi James,

I like your approach with dummy atoms and occupancy refinement. Dealing with actual maps sounds like hell to me indeed (especially given that we deal with weighted Fourier maps!).
I agree that dealing with maps directly can be tricky, but I have found that CCP4 support for this is extensive, if you know the right keywords!

Yes, the standard maps are weighted, but I've found the weighted maps are closer to the "true density" than any other kind of map. At least, in terms of correlation coefficients. I have tried scaling the 2mFo-DFc map to an Fcalc map (with solvent) to try and recover the true, absolute scale. Usually the scale factor is not worse than ~0.8 overall. This is probably because the average value of "m" is close to that?  One could argue that you should scale up all maps by the FOM, but then again isn't the maximum-likelihood number of electrons reflected in the weighted map?  Such re-scaling is not possible with the mFo-DFc map, of course.  So, yes, another good argument for doing occupancy refinement instead of integrating density!
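In code terms, that scale check is just a one-parameter least-squares fit between two maps sampled on a common grid. A toy sketch with made-up numbers standing in for real map voxel values (not actual 2mFo-DFc or Fcalc data):

```python
# Least-squares scale k minimizing sum((map_a - k*map_b)^2):
# k = sum(a*b) / sum(b*b).  Toy 1D "maps" stand in for a weighted
# 2mFo-DFc map and an Fcalc-with-solvent map on the same grid.
def ls_scale(map_a, map_b):
    """Scale factor k such that map_a ~= k * map_b."""
    num = sum(a * b for a, b in zip(map_a, map_b))
    den = sum(b * b for b in map_b)
    return num / den

fcalc    = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0]
weighted = [0.8 * v for v in fcalc]   # a "weighted" map at ~80% scale

print(round(ls_scale(weighted, fcalc), 3))
```

On real maps you would feed in the voxel values of the two maps instead; the ~0.8 I quoted is what this kind of fit typically returns.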
Reading this as someone who immediately translates this into a computer code (in my mind), a few things that aren't clear to me:
- Where exactly inside the blob of density do you place these dummy atoms?
Where? At the peaks. What I usually do is pick peaks, put atoms at the highest ones, then either refine for a bit or simply subtract a "divot" of density around each added atom and then look for new peaks. Once the height of the after-refinement positive difference features drops to an insignificant level, I stop.  NB that the "significance" of any sigma level depends on the number of voxels in the map, but it's usually between 3.5 and 4.5 sigmas.  One can also start with a 3D "grid" of dummy atoms.  Gaussians spaced within 0.8*fwhm of each other form a pretty flat density profile.  In such cases, refining just one overall B factor and individual occupancies with no xyz motion is pretty effective all by itself.
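To make the loop explicit, here is a toy 1D sketch of the pick-the-peak / subtract-a-divot cycle (invented Gaussian blobs and cutoff, not a real map, and the real "divot" comes from the refinement program recalculating the difference map):

```python
import math

def gauss(x, center, height, width=0.5):
    # Toy 1D "atom": the Gaussian divot subtracted around a placed atom.
    return height * math.exp(-((x - center) / width) ** 2)

# Made-up "difference density": two blobs of unmodelled stuff.
xs = [i * 0.1 for i in range(100)]
density = [gauss(x, 3.0, 5.0) + gauss(x, 6.0, 2.0) for x in xs]

def add_atoms_until_flat(rho, xs, cutoff=0.5):
    """Put a dummy atom at the tallest peak, subtract its divot,
    and repeat until the largest remaining peak is insignificant."""
    rho = list(rho)
    atoms = []
    while max(rho) > cutoff:
        height = max(rho)
        i = rho.index(height)
        atoms.append((xs[i], height))       # position and peak height
        rho = [r - gauss(x, xs[i], height) for r, x in zip(rho, xs)]
    return atoms, rho

atoms, residual = add_atoms_until_flat(density, xs)
print(len(atoms), round(max(residual), 3))
```

The cutoff here plays the role of the significance level above: once nothing sticks up past it, you stop adding atoms.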
- How many do you place?
Enough for the largest remaining peak to become insignificant.  In the case of occupancy refinement one might also use the final total recovered occupancy as a convergence criterion.  I find it best to use an iterative process, but this is also slower. 10-sigma peaks are pretty obviously not noise, but once you get down into the ~4 sigma level the likelihood that any given one of them is a "noise peak" gets higher. Eventually, it is hard to avoid having at least a few "noise peaks". This is why I wrote my "watershed" program, which is run after the initial adding/refining procedure has converged.  The watershed procedure is: take each occupancy-refined atom in turn, delete it, and re-refine everything else to see if losing that one atom made any difference.  Using multiple CPUs, you can do all the "minus one" refinements in parallel. In some cases Rwork will actually improve after deleting an atom, in which case you should definitely throw it out.  Once the least useful dummy atom has been eliminated, you can repeat the process with the new, improved constellation of atoms.  In the end, you have established that every single atom in your dummy constellation is "needed".  By "needed" I mean that without it the Rwork goes up. You may want to use other criteria, such as the total occupancy dropping, or a significant difference peak appearing.  In general, this "watershed" procedure finds at least a few waters in most structures that don't need to be there.  I have found that just looking at Rwork is a pretty good way to eliminate waters that were built into "noise peaks".  So, if I know I'm going to "watershed" anyway, I can be pretty aggressive about how low I go in peak height in the water-adding protocol.
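The minus-one logic is easy to sketch. Below is a toy 1D version: the "refinement" is a stand-in coordinate-descent occupancy fit, and the profiles are invented Gaussians (the real procedure of course launches a full refinement program for each minus-one job):

```python
import math

def profile(xs, c, w=0.5):
    # Toy 1D atom shape at center c.
    return [math.exp(-((x - c) / w) ** 2) for x in xs]

def fit_occs(xs, target, centers, ncyc=40):
    """Occupancy-only 'refinement' by coordinate descent: fit each
    atom's occupancy to the residual left by the others."""
    profs = [profile(xs, c) for c in centers]
    occs = [0.0] * len(centers)
    for _ in range(ncyc):
        for j, p in enumerate(profs):
            model = [sum(o * q[i] for o, q in zip(occs, profs))
                     for i in range(len(xs))]
            resid = [target[i] - model[i] + occs[j] * p[i]
                     for i in range(len(xs))]
            occs[j] = max(0.0, sum(r * v for r, v in zip(resid, p))
                               / sum(v * v for v in p))
    return occs

def rfactor(xs, target, centers, occs):
    # Crude R-factor-like misfit between target density and model.
    profs = [profile(xs, c) for c in centers]
    model = [sum(o * p[i] for o, p in zip(occs, profs))
             for i in range(len(xs))]
    return sum(abs(t - m) for t, m in zip(target, model)) / sum(target)

xs = [i * 0.1 for i in range(110)]
# "Observed" density: two real blobs; nothing at x = 9.5.
target = [a + b for a, b in zip(profile(xs, 3.0), profile(xs, 6.0))]

centers = [3.0, 6.0, 9.5]      # the atom at 9.5 was built into "noise"
base = rfactor(xs, target, centers, fit_occs(xs, target, centers))

# "Watershed": delete each atom in turn, re-refine the rest, and see
# whether losing it made any difference.
for j, c in enumerate(centers):
    rest = centers[:j] + centers[j + 1:]
    r = rfactor(xs, target, rest, fit_occs(xs, target, rest))
    print(f"without atom at x={c}: delta R = {r - base:+.4f}")
```

Deleting either real atom makes R jump, while deleting the atom over nothing leaves R essentially unchanged: that is the "needed" test, and the minus-one jobs are independent, so they parallelize trivially.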
- Is there a risk of placing them too close to the boundary of the blob (in which case the question remains: what is the boundary?)?
Placing atoms only at the tallest peaks is one way to avoid this.  A grid layout is not.  However, one hopes that atoms placed in vacuum and refined in occupancy will drop to occ=0.  Either that or B=500.  In practice, with a "grid of atoms" I tend to refine occupancy first, with no B or xyz refinement. The atoms then act like a pinscreen, capturing the density shape rather well. Adding B factors then lets the slope of the edges of the blob match better.  Refining occ, B and xyz all at the same time on a grid usually blows up. And by "blow up", I mean that Rwork and Rfree rapidly rise into the 50%s. I can stabilize it by using heavy damping and lots of cycles, but this is very slow.  I think this blowup happens because the default step size in refinement programs is too large for this "crowded atom" situation.
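A quick numerical check of the "Gaussians within 0.8*fwhm sum to a pretty flat profile" claim from above (pure arithmetic, no crystallography; the grid extent and sampling are arbitrary):

```python
import math

def gauss_fwhm(x, center, fwhm):
    # Gaussian parameterized by its full width at half maximum.
    return math.exp(-4.0 * math.log(2.0) * ((x - center) / fwhm) ** 2)

fwhm = 1.0
spacing = 0.8 * fwhm
centers = [i * spacing for i in range(-20, 21)]

# Sample the summed profile over the central region, far from the
# edges of the grid, and measure its peak-to-trough ripple.
xs = [i * 0.01 for i in range(-200, 201)]
total = [sum(gauss_fwhm(x, c, fwhm) for c in centers) for x in xs]

ripple = (max(total) - min(total)) / (sum(total) / len(total))
print(f"peak-to-trough ripple: {ripple:.1%}")
```

The ripple comes out well under a few percent of the mean, which is why occupancy-only refinement of such a grid can act like a pinscreen and mimic an essentially arbitrary smooth blob.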
- I guess O or C as an atom type should do it, but what about B factor (would you refine B as well?)?
I find oxygen will do in peak-picking cases, but for grids of closely spaced atoms under very smooth density I have found the individual occupancies get rather small; the round-off of occupancy from 0.01 to 0.00 then creates a granularity of ~0.1 electron, and this can start creating artificial noise in the fit.  For cases like this I have gone to "liquid helium", i.e. modelling the dummy atoms as He. Yes, you might think that hydrogen would be better, but H atoms have so many special checks and whiz-bangs for how they are treated that I eventually gave up and went to He.  (One could also argue that at low enough temperature He atoms are allowed to "overlap" anyway. ;) )

Formally, it shouldn't matter (much) what the B factors refine to, since B does not affect the number of electrons in the atom. However, B factors most certainly do affect peak heights, and the "tails" can start to be lost when the B factor gets big.  This is because the program that generates the calculated map used to make Fcalc only plots each atom out to a certain radius.  I'm not sure what it is in phenix or refmac, but in CCP4's sfall it is 4.5 A. Sfall doesn't support B>99 because atoms with higher B factors start to lose significant density beyond that radius. For B=500 you need a radius of at least 10 A. You'd think that deleting B=500 atoms, or slowly lowering their occupancies, would be a good strategy, but oddly enough I have not found that to be the case in practice.  Sometimes a big, flat atom is what the density wants. I have had some limited success with breaking up high-B atoms into a 2x2x2 grid of new atoms with lower B factors, and that seems to work fairly well. Exactly what threshold to use is still a good question.  SHELX, I believe, has a mechanism for splitting highly anisotropic B factors into two atoms, so there is precedent.
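The radius-versus-B tradeoff is easy to quantify if you treat the B-smeared atom as a pure 3D Gaussian with sigma^2 = B/(8*pi^2) per coordinate. (This ignores the intrinsic width of the atomic form factor, so it slightly overestimates the loss, but it shows the trend.)

```python
import math

def fraction_inside(radius, bfac):
    """Fraction of a B-smeared point atom's electrons lying within
    `radius`, modelling the atom as an isotropic 3D Gaussian with
    per-coordinate variance sigma^2 = B / (8 pi^2)."""
    sigma = math.sqrt(bfac / (8.0 * math.pi ** 2))
    t = radius / (sigma * math.sqrt(2.0))
    # Radial CDF of a 3D isotropic Gaussian:
    return math.erf(t) - (2.0 / math.sqrt(math.pi)) * t * math.exp(-t * t)

for b in (20, 99, 500):
    for r in (4.5, 10.0):
        lost = 1.0 - fraction_inside(r, b)
        print(f"B={b:3d}  radius={r:4.1f} A  lost: {lost:.1%}")
```

In this approximation, a 4.5 A radius is safe up to B~99 (a fraction of a percent lost) but drops roughly a third of a B=500 atom, while a ~10 A radius captures B=500 essentially completely, consistent with the numbers above.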
- if you refine B, how do you deconvolute occupancy from refined B values (and eventually from effects of positional errors of your DAs)?
For the grid-of-atoms approach I've found that fixing the overall B factor and the xyz positions at first is a good place to start. I then add B refinement, then xyz.  For a small blob this can be stable, but if you try to do this for the whole bulk solvent region, for example, it blows up.
-....
- How all these choices are going to affect the result?
I always recommend doing multi-start refinement, trying a variety of strategies to see how much the result jumps around.  One might even want to compare the integrated density to the occupancy-refined electron count.  There is the old adage that "a person with two thermocouples never knows what temperature it is", but personally, I'd rather know the error bars.

And the overall accuracy? I doubt it will ever be less than Rfree, which for macromolecules is still in the 20%s.  For now....

Cheers!

-James Holton
MAD Scientist



On Mon, Aug 15, 2022 at 4:38 PM James Holton <jmhol...@lbl.gov> wrote:

    There are several programs for integrating electron density, but
    please let me assure you that it is almost always the wrong thing
    to do.

    A much better strategy is occupancy refinement.  Throw in dummy
    atoms, turn off non-bonded interactions to them, and refine their
    occupancy until it a) stops changing (may be more than one round),
    and b) there are no Fo-Fc differences left in the region of
    interest.  Then all you do is add up the occupancies, multiply by
    the relevant atomic number (usually 8), and voila! you get the
    best-fit number of electrons in your blob. You may want to try
    re-running with random starting points to get an idea of the error
    bars.

    What is wrong with integrating density?  Well, for one, it is hard
    to know where to set the boundaries. Integrated density can be
    VERY sensitive to the choice of radius, making your arbitrary
    decision of which radius to use a source of error. Too small and
    you miss stuff. Too big and you add unnecessary noise. Also,
    neighboring atoms have tails, and if you don't subtract them
    properly, that is another source of error. Also, because of the
    missing F000 term, there is an offset, which adds a term
    proportional to the integration volume.  For example, an integral
    resulting in zero "electrons" does NOT mean you have vacuum. It
    just means that the area you integrated has the same average
    density as the entire map. This may not be the number you want.
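Here is a minimal numeric illustration of that F000 offset, using a made-up 1D "cell" with a blob sitting on a flat solvent background (the numbers are invented; only the principle matters):

```python
import math

# Toy 1D "cell": 20 A long, 0.1 A voxels, solvent at 0.3 e/A
# plus one Gaussian blob.  Density here is electrons per Angstrom.
xs = [i * 0.1 for i in range(200)]
true = [0.3 + math.exp(-((x - 5.0) / 0.7) ** 2) for x in xs]

# A Fourier map missing F000 has the cell-average density subtracted:
mean = sum(true) / len(true)
fmap = [v - mean for v in true]

# Integrate a region of pure solvent (x in 12..16 A):
voxel = 0.1
region = range(120, 160)
true_int = sum(true[i] for i in region) * voxel   # real electrons
map_int = sum(fmap[i] for i in region) * voxel    # what the map says
print(round(true_int, 2), round(map_int, 2))
```

The map integral over pure solvent comes out negative even though the region really contains electrons: zero in the map means "equal to the cell average", not "vacuum".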

    The beauty of occupancy refinement is that it automatically
    handles all these problems. The "vacuum level" and F000 are known
    quantities in the calculated map. The B factors given to the dummy
    atoms also allow the borders of your integration region to be
    "soft": down-weighting the contribution of map noise far from your
    region of interest. And, finally, by including atoms in the green
    density, neighboring atoms won't be sucked into it.

    Think of it as fitting a smooth curve to noisy data and the number
    of electrons is just a parameter in that fit, rather than trying
    to integrate the noisy data itself.  This is not an analogy.
    Refinement programs are really just very sophisticated
    curve-fitting programs.  And if you have a forest of overlapping
    peaks and you are trying to de-convolute the area/volume of one
    peak, it is best to include that peak in the fit, rather than
    leave it out. Shoulder peaks especially tend to get "eaten" by
    large neighboring peaks.

    How do you turn off non-bonds? Well, there is documentation for
    refmac:
    http://www.ysbl.york.ac.uk/refmac/data/refmac_keywords.html
    and phenix:
    https://phenix-online.org/documentation/reference/refinement.html

    All that said, to answer the original question:
     One very easy thing to do within the CCP4 suite is to use
    "mapmask" to make a mask corresponding to your "sphere", or other
    region of interest.  Perhaps place a water at the center of your
    peak, and either use the "border" feature of mapmask, or use
    "sfall" to compute a calculated map and convert that to a mask
    using "threshold" in mapmask.  This mask should have values of 0
    or 1 at every voxel. (or, if you feel like being clever, something
    between 0 and 1 to reflect how much you want to weight a given
    voxel). You can check it in coot. If you then multiply this mask
    by your mFo-DFc map the result will have a non-zero average value.
    This will be printed out in the log file. Multiply this average
    value by the volume of the unit cell and you have your integrated
    number of electrons. Yes, it's that simple.
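The arithmetic behind that mask-multiply trick can be sanity-checked in a few lines. This is a toy 1D stand-in (real maps are 3D and come from mapmask/fft, but the identity is the same): the mean of mask-times-map, times the cell volume, equals the direct sum over masked voxels.

```python
import math

n = 400                          # voxels in the toy 1D "cell"
cell = 40.0                      # cell "volume" (a length here), A
voxel = cell / n
xs = [i * voxel for i in range(n)]
rho = [math.exp(-((x - 10.0) / 0.8) ** 2) for x in xs]   # e/A

# 0-or-1 mask around the peak of interest
mask = [1.0 if 7.0 <= x <= 13.0 else 0.0 for x in xs]

product = [m * r for m, r in zip(mask, rho)]
avg = sum(product) / n           # the "mean map value" in the log file
electrons = avg * cell           # mean value times cell volume
direct = sum(r for m, r in zip(mask, rho) if m > 0) * voxel
print(round(electrons, 3), round(direct, 3))
```

The two numbers agree exactly, because mean x cell volume is just the voxel sum in disguise; that is all the mask-multiplied map's average is telling you.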
    One issue you may have is map parameter compatibility (grid
    spacing, axis order, xyz limits, etc.). You get around these by
    using the same grid in all your fft or sfall runs, and then use
    mapmask to make the axis and limits match before you multiply the
    map and mask.  The only other issue here might be the average
    value being a very small number and rounded off by the default
    print precision. You can fix this by multiplying the map by a
    large constant (again, using mapmask), then the printed value will
    have lots of digits.

    This may seem complicated, but the use of masks can be a very
    valuable skill to develop.  In fact, one way to simplify,
    stabilize and accelerate the occupancy refinement described above
    is to use a mask to isolate the region of interest. That is, take
    the mFo-DFc map, zero out everything far away from your peak, and
    convert the result to structure factors. You can then call these
    structure factors "Fobs" (alongside the original sigma(Fobs)) in a
    new refinement. The Rwork/Rfree then becomes a local statistic,
    indicative of the % error in your refined total occupancy. One
    caveat is that if every atom in the new refinement is having its
    occupancy refined you will lose the absolute scale. To fix this,
    you need to add back at least one well-ordered atom into "Fobs",
    and also include it in the model.  For example, take a
    well-ordered helix, extract those atoms, calculate a map using
    "sfall", and add it to the masked-off difference map before doing
    the "new Fobs" structure factor calculation. Include these same
    atoms in the new refinement. They won't move, but they will keep
    the scale stable.  Oh, and don't forget to turn off the bulk
    solvent correction! The bulk solvent has already been subtracted
    in the mFo-DFc map.

    Hope this all makes sense and feel free to ask follow-up questions,

    -James Holton
    MAD Scientist


    On 8/10/2022 9:59 AM, Neno Vuksanovic wrote:
    Dear All,

    I would like to quantify electron density inside of positive
    Fo-Fc blobs in active sites of multiple protomers in the map and
    compare them. I am aware that I can interpolate maps and obtain
    density values at coordinate points using either MapMan, Chimera
    or Coot, but I would like to know if there is a tool that would
    let me designate a sphere of a certain volume and calculate total
    electron density inside it?

    Best Regards,
    Neno

    ------------------------------------------------------------------------

    To unsubscribe from the CCP4BB list, click the following link:
    https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1




