Thank you Pavel, responses inline below.  There are indeed a lot of possible approaches, and my experience is far from complete, but I will share my impressions.

On 8/15/2022 8:15 PM, Pavel Afonine wrote:
Hi James,

I like your approach with dummy atoms and occupancy refinement. Dealing with actual maps sounds like hell to me indeed (especially given that we deal with weighted Fourier maps!).
I agree that dealing with maps directly can be tricky, but I have found that CCP4 support for this is extensive, if you know the right keywords!

Yes, the standard maps are weighted, but I've found the weighted maps are closer to the "true density" than any other kind of map. At least, in terms of correlation coefficients. I have tried scaling the 2mFo-DFc map to an Fcalc map (with solvent) to try and recover the true, absolute scale. Usually the scale factor is not worse than ~0.8 overall. This is probably because the average value of "m" is close to that?  One could argue that you should scale up all maps by the FOM, but then again isn't the maximum-likelihood number of electrons reflected in the weighted map?  Such re-scaling is not possible with the mFo-DFc map, of course.  So, yes, another good argument for doing occupancy refinement instead of integrating density!
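In code terms, that scale check is just a one-parameter least-squares fit between two maps sampled on a common grid. A toy sketch with made-up numbers standing in for real map voxel values (not actual 2mFo-DFc or Fcalc data):

```python
# Least-squares scale k minimizing sum((map_a - k*map_b)^2):
# k = sum(a*b) / sum(b*b).  Toy 1D "maps" stand in for a weighted
# 2mFo-DFc map and an Fcalc-with-solvent map on the same grid.
def ls_scale(map_a, map_b):
    """Scale factor k such that map_a ~= k * map_b."""
    num = sum(a * b for a, b in zip(map_a, map_b))
    den = sum(b * b for b in map_b)
    return num / den

fcalc    = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0]
weighted = [0.8 * v for v in fcalc]   # a "weighted" map at ~80% scale

print(round(ls_scale(weighted, fcalc), 3))
```

On real maps you would feed in the voxel values of the two maps instead; the ~0.8 I quoted is what this kind of fit typically returns.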
Reading this as someone who immediately translates this into a computer code (in my mind), a few things that aren't clear to me:
- Where exactly inside the blob of density do you place these dummy atoms?
Where? At the peaks. What I usually do is pick peaks, put atoms at the highest ones, then either refine for a bit or simply subtract a "divot" of density around each added atom and then look for new peaks. Once the height of the after-refinement positive difference features drops to an insignificant level, I stop.  NB that the "significance" of any sigma level depends on the number of voxels in the map, but it's usually between 3.5 and 4.5 sigmas.  One can also start with a 3D "grid" of dummy atoms.  Gaussians spaced within 0.8*fwhm of each other form a pretty flat density profile.  In such cases, refining just one overall B factor and individual occupancies with no xyz motion is pretty effective all by itself.
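To make the loop explicit, here is a toy 1D sketch of the pick-the-peak / subtract-a-divot cycle (invented Gaussian blobs and cutoff, not a real map, and the real "divot" comes from the refinement program recalculating the difference map):

```python
import math

def gauss(x, center, height, width=0.5):
    # Toy 1D "atom": the Gaussian divot subtracted around a placed atom.
    return height * math.exp(-((x - center) / width) ** 2)

# Made-up "difference density": two blobs of unmodelled stuff.
xs = [i * 0.1 for i in range(100)]
density = [gauss(x, 3.0, 5.0) + gauss(x, 6.0, 2.0) for x in xs]

def add_atoms_until_flat(rho, xs, cutoff=0.5):
    """Put a dummy atom at the tallest peak, subtract its divot,
    and repeat until the largest remaining peak is insignificant."""
    rho = list(rho)
    atoms = []
    while max(rho) > cutoff:
        height = max(rho)
        i = rho.index(height)
        atoms.append((xs[i], height))       # position and peak height
        rho = [r - gauss(x, xs[i], height) for r, x in zip(rho, xs)]
    return atoms, rho

atoms, residual = add_atoms_until_flat(density, xs)
print(len(atoms), round(max(residual), 3))
```

The cutoff here plays the role of the significance level above: once nothing sticks up past it, you stop adding atoms.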
- How many do you place?
Enough for the largest remaining peak to become insignificant.  In the case of occupancy refinement one might also use the final total recovered occupancy as a convergence criterion.  I find it best to use an iterative process, but this is also slower. 10-sigma peaks are pretty obviously not noise, but once you get down into the ~4 sigma level the likelihood that any given one of them is a "noise peak" gets higher. Eventually, it is hard to avoid having at least a few "noise peaks". This is why I wrote my "watershed" program, which is run after the initial adding/refining procedure has converged.  The watershed procedure is: take each occupancy-refined atom in turn, delete it, and re-refine everything else to see if losing that one atom made any difference.  Using multiple CPUs, you can do all the "minus one" refinements in parallel. In some cases Rwork will actually improve after deleting an atom, in which case you should definitely throw it out.  Once the least useful dummy atom has been eliminated, you can repeat the process with the new, improved constellation of atoms.  In the end, you have established that every single atom in your dummy constellation is "needed".  By "needed" I mean that without it the Rwork goes up. You may want to use other criteria, such as the total occupancy dropping, or a significant difference peak appearing.  In general, this "watershed" procedure finds at least a few waters in most structures that don't need to be there.  I have found that just looking at Rwork is a pretty good way to eliminate waters that were built into "noise peaks".  So, if I know I'm going to "watershed" anyway, I can be pretty aggressive about how low I go in peak height in the water-adding protocol.
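The minus-one logic is easy to sketch. Below is a toy 1D version: the "refinement" is a stand-in coordinate-descent occupancy fit, and the profiles are invented Gaussians (the real procedure of course launches a full refinement program for each minus-one job):

```python
import math

def profile(xs, c, w=0.5):
    # Toy 1D atom shape at center c.
    return [math.exp(-((x - c) / w) ** 2) for x in xs]

def fit_occs(xs, target, centers, ncyc=40):
    """Occupancy-only 'refinement' by coordinate descent: fit each
    atom's occupancy to the residual left by the others."""
    profs = [profile(xs, c) for c in centers]
    occs = [0.0] * len(centers)
    for _ in range(ncyc):
        for j, p in enumerate(profs):
            model = [sum(o * q[i] for o, q in zip(occs, profs))
                     for i in range(len(xs))]
            resid = [target[i] - model[i] + occs[j] * p[i]
                     for i in range(len(xs))]
            occs[j] = max(0.0, sum(r * v for r, v in zip(resid, p))
                               / sum(v * v for v in p))
    return occs

def rfactor(xs, target, centers, occs):
    # Crude R-factor-like misfit between target density and model.
    profs = [profile(xs, c) for c in centers]
    model = [sum(o * p[i] for o, p in zip(occs, profs))
             for i in range(len(xs))]
    return sum(abs(t - m) for t, m in zip(target, model)) / sum(target)

xs = [i * 0.1 for i in range(110)]
# "Observed" density: two real blobs; nothing at x = 9.5.
target = [a + b for a, b in zip(profile(xs, 3.0), profile(xs, 6.0))]

centers = [3.0, 6.0, 9.5]      # the atom at 9.5 was built into "noise"
base = rfactor(xs, target, centers, fit_occs(xs, target, centers))

# "Watershed": delete each atom in turn, re-refine the rest, and see
# whether losing it made any difference.
for j, c in enumerate(centers):
    rest = centers[:j] + centers[j + 1:]
    r = rfactor(xs, target, rest, fit_occs(xs, target, rest))
    print(f"without atom at x={c}: delta R = {r - base:+.4f}")
```

Deleting either real atom makes R jump, while deleting the atom over nothing leaves R essentially unchanged: that is the "needed" test, and the minus-one jobs are independent, so they parallelize trivially.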
- Is there a risk of placing them too close to the boundary of the blob (in which case the question remains: what is the boundary?)?
Placing atoms only at the tallest peaks is one way to avoid this.  A grid layout is not.  However, one hopes that atoms placed in vacuum and refined in occupancy will drop to occ=0.  Either that or B=500.  In practice, with a "grid of atoms" I tend to refine occupancy first, with no B or xyz refinement. The atoms then act like a pinscreen, capturing the density shape rather well. Adding B factors then lets the slope of the edges of the blob match better.  Refining occ, B and xyz all at the same time on a grid usually blows up. And by "blow up", I mean that Rwork and Rfree rapidly rise into the 50%s. I can stabilize it by using heavy damping and lots of cycles, but this is very slow.  I think this blowup happens because the default step size in refinement programs is too large for this "crowded atom" situation.
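A quick numerical check of the "Gaussians within 0.8*fwhm sum to a pretty flat profile" claim from above (pure arithmetic, no crystallography; the grid extent and sampling are arbitrary):

```python
import math

def gauss_fwhm(x, center, fwhm):
    # Gaussian parameterized by its full width at half maximum.
    return math.exp(-4.0 * math.log(2.0) * ((x - center) / fwhm) ** 2)

fwhm = 1.0
spacing = 0.8 * fwhm
centers = [i * spacing for i in range(-20, 21)]

# Sample the summed profile over the central region, far from the
# edges of the grid, and measure its peak-to-trough ripple.
xs = [i * 0.01 for i in range(-200, 201)]
total = [sum(gauss_fwhm(x, c, fwhm) for c in centers) for x in xs]

ripple = (max(total) - min(total)) / (sum(total) / len(total))
print(f"peak-to-trough ripple: {ripple:.1%}")
```

The ripple comes out well under a few percent of the mean, which is why occupancy-only refinement of such a grid can act like a pinscreen and mimic an essentially arbitrary smooth blob.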
- I guess O or C as an atom type should do it, but what about B factor (would you refine B as well?)?
I find oxygen will do in peak-picking cases, but for grids of closely spaced atoms under very smooth density I have found the individual occupancies get rather small; the round-off of occupancy from 0.01 to 0.00 then creates a granularity of ~0.1 electron, and this can start creating artificial noise in the fit.  For cases like this I have gone to "liquid helium", i.e. modelling the dummy atoms as He. Yes, you might think that hydrogen would be better, but H atoms have so many special checks and whiz-bangs for how they are treated that I eventually gave up and went to He.  (One could also argue that at low enough temperature He atoms are allowed to "overlap" anyway. ;) )

Formally, it shouldn't matter (much) what the B factors refine to, since B does not affect the number of electrons in the atom. However, B factors most certainly do affect peak heights, and the "tails" can start to be lost when the B factor gets big.  This is because the program that generates the calculated map used to make Fcalc only plots each atom out to a certain radius.  I'm not sure what it is in phenix or refmac, but in CCP4's sfall it is 4.5 A. Sfall doesn't support B>99 because atoms with higher B factors start to lose significant density beyond that radius. For B=500 you need a radius of at least 10 A. You'd think that deleting B=500 atoms, or slowly lowering their occupancies, would be a good strategy, but oddly enough I have not found that to be the case in practice.  Sometimes a big, flat atom is what the density wants. I have had some limited success with breaking up high-B atoms into a 2x2x2 grid of new atoms with lower B factors, and that seems to work fairly well. Exactly what threshold to use is still a good question.  SHELX, I believe, has a mechanism for splitting highly anisotropic B factors into two atoms, so there is precedent.
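The radius-versus-B tradeoff is easy to quantify if you treat the B-smeared atom as a pure 3D Gaussian with sigma^2 = B/(8*pi^2) per coordinate. (This ignores the intrinsic width of the atomic form factor, so it slightly overestimates the loss, but it shows the trend.)

```python
import math

def fraction_inside(radius, bfac):
    """Fraction of a B-smeared point atom's electrons lying within
    `radius`, modelling the atom as an isotropic 3D Gaussian with
    per-coordinate variance sigma^2 = B / (8 pi^2)."""
    sigma = math.sqrt(bfac / (8.0 * math.pi ** 2))
    t = radius / (sigma * math.sqrt(2.0))
    # Radial CDF of a 3D isotropic Gaussian:
    return math.erf(t) - (2.0 / math.sqrt(math.pi)) * t * math.exp(-t * t)

for b in (20, 99, 500):
    for r in (4.5, 10.0):
        lost = 1.0 - fraction_inside(r, b)
        print(f"B={b:3d}  radius={r:4.1f} A  lost: {lost:.1%}")
```

In this approximation, a 4.5 A radius is safe up to B~99 (a fraction of a percent lost) but drops roughly a third of a B=500 atom, while a ~10 A radius captures B=500 essentially completely, consistent with the numbers above.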
- if you refine B, how do you deconvolute occupancy from refined B values (and eventually from effects of positional errors of your DAs)?
For the grid-of-atoms approach I've found that fixing the overall B factor and the xyz positions at first is a good place to start. I then add B refinement, then xyz.  For a small blob this can be stable, but if you try to do this for the whole bulk solvent region, for example, it blows up.
-....
- How all these choices are going to affect the result?
I always recommend doing multi-start refinement, trying a variety of strategies to see how much the result jumps around.  One might even want to compare the integrated density to the occupancy-refined electron count.  There is the old adage that "a person with two thermocouples never knows what temperature it is", but personally, I'd rather know the error bars.

And the overall accuracy? I doubt it will ever be less than Rfree, which for macromolecules is still in the 20%s.  For now....

Cheers!

-James Holton
MAD Scientist



On Mon, Aug 15, 2022 at 4:38 PM James Holton <jmhol...@lbl.gov> wrote:

    There are several programs for integrating electron density, but
    please let me assure you that it is almost always the wrong thing
    to do.

    A much better strategy is occupancy refinement.  Throw in dummy
    atoms, turn off non-bonded interactions to them, and refine their
    occupancy until it a) stops changing (may be more than one round),
    and b) there are no Fo-Fc differences left in the region of
    interest.  Then all you do is add up the occupancies, multiply by
    the relevant atomic number (usually 8), and voila! you get the
    best-fit number of electrons in your blob. You may want to try
    re-running with random starting points to get an idea of the error
    bars.

    What is wrong with integrating density?  Well, for one, it is hard
    to know where to set the boundaries. Integrated density can be
    VERY sensitive to the choice of radius, making your arbitrary
    decision of which radius to use a source of error. Too small and
    you miss stuff. Too big and you add unnecessary noise. Also,
    neighboring atoms have tails, and if you don't subtract them
    properly, that is another source of error. Also, because of the
    missing F000 term, there is an offset, which adds a term
    proportional to the integration volume.  For example, an integral
    resulting in zero "electrons" does NOT mean you have vacuum. It
    just means that the area you integrated has the same average
    density as the entire map. This may not be the number you want.
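Here is a minimal numeric illustration of that F000 offset, using a made-up 1D "cell" with a blob sitting on a flat solvent background (the numbers are invented; only the principle matters):

```python
import math

# Toy 1D "cell": 20 A long, 0.1 A voxels, solvent at 0.3 e/A
# plus one Gaussian blob.  Density here is electrons per Angstrom.
xs = [i * 0.1 for i in range(200)]
true = [0.3 + math.exp(-((x - 5.0) / 0.7) ** 2) for x in xs]

# A Fourier map missing F000 has the cell-average density subtracted:
mean = sum(true) / len(true)
fmap = [v - mean for v in true]

# Integrate a region of pure solvent (x in 12..16 A):
voxel = 0.1
region = range(120, 160)
true_int = sum(true[i] for i in region) * voxel   # real electrons
map_int = sum(fmap[i] for i in region) * voxel    # what the map says
print(round(true_int, 2), round(map_int, 2))
```

The map integral over pure solvent comes out negative even though the region really contains electrons: zero in the map means "equal to the cell average", not "vacuum".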

    The beauty of occupancy refinement is that it automatically
    handles all these problems. The "vacuum level" and F000 are known
    quantities in the calculated map. The B factors given to the dummy
    atoms also allow the borders of your integration region to be
    "soft": down-weighting the contribution of map noise far from your
    region of interest. And, finally, by including atoms in the green
    density, neighboring atoms won't be sucked into it.

    Think of it as fitting a smooth curve to noisy data and the number
    of electrons is just a parameter in that fit, rather than trying
    to integrate the noisy data itself.  This is not an analogy.
    Refinement programs are really just very sophisticated
    curve-fitting programs.  And if you have a forest of overlapping
    peaks and you are trying to de-convolute the area/volume of one
    peak, it is best to include that peak in the fit, rather than
    leave it out. Shoulder peaks especially tend to get "eaten" by
    large neighboring peaks.

    How do you turn off non-bonds? Well, there is documentation for
    refmac:
    http://www.ysbl.york.ac.uk/refmac/data/refmac_keywords.html
    and phenix:
    https://phenix-online.org/documentation/reference/refinement.html

    All that said, to answer the original question:
     One very easy thing to do within the CCP4 suite is to use
    "mapmask" to make a mask corresponding to your "sphere", or other
    region of interest.  Perhaps place a water at the center of your
    peak, and either use the "border" feature of mapmask, or use
    "sfall" to compute a calculated map and convert that to a mask
    using "threshold" in mapmask.  This mask should have values of 0
    or 1 at every voxel. (or, if you feel like being clever, something
    between 0 and 1 to reflect how much you want to weight a given
    voxel). You can check it in coot. If you then multiply this mask
    by your mFo-DFc map the result will have a non-zero average value.
    This will be printed out in the log file. Multiply this average
    value by the volume of the unit cell and you have your integrated
    number of electrons. Yes, it's that simple.
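The arithmetic behind that mask-multiply trick can be sanity-checked in a few lines. This is a toy 1D stand-in (real maps are 3D and come from mapmask/fft, but the identity is the same): the mean of mask-times-map, times the cell volume, equals the direct sum over masked voxels.

```python
import math

n = 400                          # voxels in the toy 1D "cell"
cell = 40.0                      # cell "volume" (a length here), A
voxel = cell / n
xs = [i * voxel for i in range(n)]
rho = [math.exp(-((x - 10.0) / 0.8) ** 2) for x in xs]   # e/A

# 0-or-1 mask around the peak of interest
mask = [1.0 if 7.0 <= x <= 13.0 else 0.0 for x in xs]

product = [m * r for m, r in zip(mask, rho)]
avg = sum(product) / n           # the "mean map value" in the log file
electrons = avg * cell           # mean value times cell volume
direct = sum(r for m, r in zip(mask, rho) if m > 0) * voxel
print(round(electrons, 3), round(direct, 3))
```

The two numbers agree exactly, because mean x cell volume is just the voxel sum in disguise; that is all the mask-multiplied map's average is telling you.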
    One issue you may have is map parameter compatibility (grid
    spacing, axis order, xyz limits, etc.). You get around these by
    using the same grid in all your fft or sfall runs, and then use
    mapmask to make the axis and limits match before you multiply the
    map and mask.  The only other issue here might be the average
    value being a very small number and rounded off by the default
    print precision. You can fix this by multiplying the map by a
    large constant (again, using mapmask), then the printed value will
    have lots of digits.

    This may seem complicated, but the use of masks can be a very
    valuable skill to develop.  In fact, one way to simplify,
    stabilize and accelerate the occupancy refinement described above
    is to use a mask to isolate the region of interest. That is, take
    the mFo-DFc map, zero out everything far away from your peak, and
    convert the result to structure factors. You can then call these
    structure factors "Fobs" (alongside the original sigma(Fobs)) in a
    new refinement. The Rwork/Rfree then becomes a local statistic,
    indicative of the % error in your refined total occupancy. One
    caveat is that if every atom in the new refinement is having its
    occupancy refined you will lose the absolute scale. To fix this,
    you need to add back at least one well-ordered atom into "Fobs",
    and also include it in the model.  For example, take a
    well-ordered helix, extract those atoms, calculate a map using
    "sfall", and add it to the masked-off difference map before doing
    the "new Fobs" structure factor calculation. Include these same
    atoms in the new refinement. They won't move, but they will keep
    the scale stable.  Oh, and don't forget to turn off the bulk
    solvent correction! The bulk solvent has already been subtracted
    in the mFo-DFc map.

    Hope this all makes sense and feel free to ask follow-up questions,

    -James Holton
    MAD Scientist


    On 8/10/2022 9:59 AM, Neno Vuksanovic wrote:
    Dear All,

    I would like to quantify electron density inside of positive
    Fo-Fc blobs in active sites of multiple protomers in the map and
    compare them. I am aware that I can interpolate maps and obtain
    density values at coordinate points using either MapMan, Chimera
    or Coot, but I would like to know if there is a tool that would
    let me designate a sphere of a certain volume and calculate total
    electron density inside it?

    Best Regards,
    Neno

    ------------------------------------------------------------------------

    To unsubscribe from the CCP4BB list, click the following link:
    https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1




