Thank you Pavel, responses inline below. There are indeed a lot of
possible approaches, and my experience is far from complete, but I will
share my impressions.
On 8/15/2022 8:15 PM, Pavel Afonine wrote:
Hi James,
I like your approach with dummy atoms and occupancy refinement.
Dealing with actual maps sounds like hell to me indeed (especially
given that we deal with weighted Fourier maps!).
I agree that dealing with maps directly can be tricky, but I have found
that CCP4 support for this is extensive, if you know the right keywords!
Yes, the standard maps are weighted, but I've found the weighted maps
are closer to the "true density" than any other kind of map. At least,
in terms of correlation coefficients. I have tried scaling the 2mFo-DFc
map to an Fcalc map (with solvent) to try to recover the true, absolute
scale. Usually the scale factor is no worse than ~0.8 overall. This is
probably because the average value of "m" is close to that? One could
argue that you should scale up all maps by the FOM, but then again isn't
the maximum-likelihood number of electrons reflected in the weighted
map? Such re-scaling is not possible with the mFo-DFc map, of course.
So, yes, another good argument for doing occupancy refinement instead of
integrating density!
Reading this as someone who immediately translates this into a
computer code (in my mind), a few things that aren't clear to me:
- Where exactly inside the blob of density do you place these dummy atoms?
Where? At the peaks. What I usually do is pick peaks, put atoms at the
highest ones, then either refine for a bit or simply subtract a "divot"
of density around each added atom and then look for new peaks. Once the
height of after-refinement positive difference features drops to an
insignificant level I stop. NB that the "significance" of any given sigma
level depends on the number of voxels in the map, but it's usually between
3.5 and 4.5 sigmas.
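For what it's worth, the dependence of "significance" on map size can be sketched with the extreme-value rule of thumb for Gaussian noise: among N independent samples, the tallest pure-noise peak is expected near sqrt(2*ln(N)) sigmas. A minimal illustration (real map voxels are correlated by oversampling, so the effective N is smaller than the raw voxel count, pulling the threshold down):

```python
import math

def noise_peak_threshold(n_voxels):
    """Approximate sigma level of the tallest expected pure-noise peak
    among n_voxels independent Gaussian samples: sqrt(2*ln(N))."""
    return math.sqrt(2.0 * math.log(n_voxels))

# a 100x100x100 map, treating every voxel as independent (upper bound):
print(round(noise_peak_threshold(100**3), 2))  # -> 5.26
```

With a more realistic effective sample count the number lands in the 3.5-4.5 sigma range quoted above.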
One can also start with a 3D "grid" of dummy atoms. Gaussians spaced
within 0.8*fwhm of each other form a pretty flat density profile. In
such cases refining just one overall B factor and individual occupancies
with no xyz motion is pretty effective all by itself.
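A minimal sketch of laying out such a grid of dummy atoms; the center, extent, and fwhm values here are made-up illustration parameters, not recommendations:

```python
import numpy as np

def dummy_atom_grid(center, half_width, fwhm, spacing_frac=0.8):
    """Cubic grid of dummy-atom positions (in angstroms) around
    `center`, spaced at spacing_frac * fwhm so the summed Gaussians
    form a nearly flat density profile."""
    step = spacing_frac * fwhm
    offsets = np.arange(-half_width, half_width + 1e-6, step)
    xx, yy, zz = np.meshgrid(offsets, offsets, offsets, indexing="ij")
    return np.stack([xx, yy, zz], axis=-1).reshape(-1, 3) + np.asarray(center)

grid = dummy_atom_grid(center=(10.0, 5.0, 2.0), half_width=2.0, fwhm=1.5)
print(grid.shape)  # -> (64, 3): 4 x 4 x 4 atoms at 1.2 A spacing
```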
- How many do you place?
Enough for the largest remaining peak to become insignificant. In the
case of occupancy refinement one might also use the final total recovered
occupancy as a convergence criterion. I find it best to use an
iterative process, but this is also slower. 10-sigma peaks are pretty
obviously not noise, but once you get down into the ~4 sigma level the
likelihood that one of them is a "noise peak" gets higher. Eventually,
it is hard to avoid having at least a few "noise peaks".
This is why I wrote my "watershed" program, which is run after the
initial adding/refining procedure has converged. The watershed procedure
is: take each occupancy-refined atom, one at a time, delete
it, and re-refine everything else to see if losing that one atom made
any difference. Using multiple CPUs, you can do all the "minus one"
refinements in parallel. In some cases Rwork will actually improve after
deleting an atom, so in that case you should definitely throw it out.
Once the least-useful dummy atom has been eliminated, you can repeat the
process with the new, improved constellation of atoms. In the end, you
have established that every single atom in your dummy constellation is
"needed". By "needed" I mean without it the Rwork goes up. You may want
to use other criteria, such as the total occupancy drops, or you get a
significant difference peak. In general, this "watershed" procedure
finds at least a few waters in most structures that don't need to be
there. I have found that just looking at Rwork is a pretty good way to
eliminate waters that were built into "noise peaks". So, if I know I'm
going to "watershed" anyway, I can be pretty aggressive with how low I
go in peak height in the water adding protocol.
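The watershed loop above can be sketched generically. Here `rwork_of` is a stand-in for a full refinement run returning Rwork, and `toy_rwork` is a made-up scoring function purely to make the example runnable; a real implementation would farm the "minus one" refinements out to multiple CPUs:

```python
def watershed(atoms, rwork_of):
    """Greedy leave-one-out pruning.  Re-refine with each atom deleted,
    permanently drop the atom whose deletion helps Rwork the most, and
    stop when every remaining atom is "needed" (deleting any of them
    raises Rwork)."""
    atoms = list(atoms)
    while len(atoms) > 1:
        baseline = rwork_of(atoms)
        trials = [atoms[:i] + atoms[i + 1:] for i in range(len(atoms))]
        scores = [rwork_of(t) for t in trials]  # parallelize in real use
        best = min(range(len(atoms)), key=scores.__getitem__)
        if scores[best] > baseline:   # every deletion hurts: done
            break
        atoms = trials[best]
    return atoms

# toy stand-in for refinement: atoms 1, 2, 3 sit in real density,
# atom 99 was built into a noise peak
def toy_rwork(atom_list):
    signal = {1, 2, 3}
    present = set(atom_list)
    missing = len(signal - present)   # deleting real atoms hurts a lot
    noise = len(present - signal)     # keeping noise atoms hurts a little
    return 0.20 + 0.05 * missing + 0.01 * noise

print(watershed([1, 2, 3, 99], toy_rwork))  # -> [1, 2, 3]
```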
- Is there a risk of placing them too close to the boundary of the
blob (in which case the question remains: what is the boundary?)?
Placing atoms only at the tallest peaks is one way to avoid this; a grid
layout offers no such protection. However, one hopes that atoms placed in
vacuum and refined in occupancy will drop to occ=0, or else drift to
B=500. In practice with
"grid of atoms" I tend to refine occupancy first, no B nor xyz. The
atoms then act like a pinscreen, capturing the density shape rather
well. Adding B factors then lets the slope of the edges of the blob
match better. Doing occ, B and xyz all at the same time on a grid
usually blows up.
And by "blow up", I mean that Rwork and Rfree rapidly rise into the
50%s. I can stabilize it by using heavy damping and lots of cycles, but
this is very slow. I think this blowup happens because the default step
size in refinement programs is too large for this "crowded atom" situation.
- I guess O or C as an atom type should do it, but what about B factor
(would you refine B as well?)?
I find oxygen will do in peak-picking cases. For grid atoms under
very smooth density, however, with closely-spaced atoms the individual
occupancies get rather small; the round-off of an occupancy from 0.01
to 0.00 then creates a granularity of ~0.1 electron, and this can start
creating artificial noise in the fit. For cases like this I have gone
to "liquid helium", or modelling dummy atoms as He. Yes, you might think
that hydrogen would be better, but H atoms have so many special checks
and whiz-bangs for how they are treated I eventually gave up and went to
He. (One could also argue that at low enough temperature He atoms are
allowed to "overlap" anyway. ;) )
Formally, it shouldn't matter (much) what the B factors refine to, since
B does not affect the number of electrons in the atom. However, B
factors most certainly do affect peak heights, and the "tails" can start
to be lost when the B factor gets big. This is because the program that
generates the calculated map used to make Fcalc only plots each atom out
to a certain radius. I'm not sure what it is in phenix nor refmac, but
in CCP4's sfall it is 4.5 A. Sfall doesn't support B>99 because atoms
with higher B factors start to lose significant density beyond that
radius. For B=500 you need a radius of at least 10 A. You'd think that
the practice of deleting B=500 atoms or slowly lowering their
occupancies would be a good strategy, but oddly enough I have not found
that to be the case in practice. Sometimes a big, flat atom is what the
density wants. I have had some limited success with breaking up high-B
atoms into a 2x2x2 grid of new atoms with lower B factors, and that
seems to work fairly well. Exactly what threshold to use is still a good
question. SHELX, I believe, has a mechanism for splitting highly
anisotropic B factors into two atoms, so there is precedent.
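A toy version of the 2x2x2 split might look like this; the B=99 cutoff, the halved B factor, and the 0.5 A offsets are illustrative guesses, not a derived prescription:

```python
def split_high_B(atom, b_cut=99.0, offset=0.5):
    """If an atom's B exceeds b_cut, replace it with a 2x2x2 cluster:
    each child gets 1/8 of the occupancy, a reduced B factor, and a
    +/-offset displacement along each axis.  Atom = (x, y, z, occ, B)."""
    x, y, z, occ, b = atom
    if b <= b_cut:
        return [atom]
    return [(x + sx * offset, y + sy * offset, z + sz * offset,
             occ / 8.0, b / 2.0)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

children = split_high_B((0.0, 0.0, 0.0, 0.8, 500.0))
print(len(children), children[0][3])  # -> 8 0.1
```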
- if you refine B, how do you deconvolute occupancy from refined B
values (and eventually from effects of positional errors of your DAs)?
For the grid-of-atoms approach I've found a fixed overall B factor and
xyz positions at first is a good place to start. I then add B
refinement, then xyz. For a small blob this can be stable, but if you
try to do this for the whole bulk solvent region, for example, it blows up.
-....
- How all these choices are going to affect the result?
I always recommend doing multi-start refinement, trying a variety of
strategies to see how much the result jumps around. One might even want
to compare integrated density to occupancy-refined recovered electron
count? There is the old adage that "a person with two thermocouples
never knows what temperature it is", but personally, I'd rather know the
error bars.
And the overall accuracy? I doubt it will ever be less than Rfree, which
for macromolecules is still in the 20%s. For now....
Cheers!
-James Holton
MAD Scientist
On Mon, Aug 15, 2022 at 4:38 PM James Holton <jmhol...@lbl.gov> wrote:
There are several programs for integrating electron density, but
please let me assure you that it is almost always the wrong thing
to do.
A much better strategy is occupancy refinement. Throw in dummy
atoms, turn off non-bonded interactions to them, and refine their
occupancy until a) it stops changing (may take more than one round),
and b) there are no Fo-Fc differences left in the region of
interest. Then all you do is add up the occupancies, multiply by
the relevant atomic number (usually 8), and voila! you get the
best-fit number of electrons in your blob. You may want to try
re-running with random starting points to get an idea of the error
bars.
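The bookkeeping at the end is just arithmetic; with hypothetical refined occupancies:

```python
# hypothetical refined dummy-atom occupancies from one blob:
occupancies = [1.00, 0.85, 0.40, 0.15]
Z = 8  # oxygen dummy atoms
electrons = Z * sum(occupancies)
print(round(electrons, 1))  # best-fit electron count for the blob: 19.2
```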
What is wrong with integrating density? Well, for one, it is hard
to know where to set the boundaries. Integrated density can be
VERY sensitive to the choice of radius, making your arbitrary
decision of which radius to use a source of error. Too small and
you miss stuff. Too big and you add unnecessary noise. Also,
neighboring atoms have tails, and if you don't subtract them
properly, that is another source of error. Also, because of the
missing F000 term, there is an offset, which adds a term
proportional to the integration volume. For example, an integral
resulting in zero "electrons" does NOT mean you have vacuum. It
just means that the area you integrated has the same average
density as the entire map. This may not be the number you want.
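The F000 offset is easy to demonstrate numerically: subtract the mean from a toy map (as a Fourier synthesis lacking F000 effectively does) and a region of genuine vacuum no longer integrates to zero:

```python
import numpy as np

true_density = np.zeros((32, 32, 32))
true_density[12:20, 12:20, 12:20] = 1.0   # a blob: 512 voxels of density

# a Fourier synthesis without the F000 term effectively has zero mean:
observed = true_density - true_density.mean()

# integrating an 8x8x8 region of genuine vacuum no longer gives zero;
# the blob's density reappears as a negative offset over the whole cell
vacuum_sum = observed[:8, :8, :8].sum()
print(round(float(vacuum_sum), 3))  # -> -8.0
```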
The beauty of occupancy refinement is that it automatically
handles all these problems. The "vacuum level" and F000 are known
quantities in the calculated map. The B factors given to the dummy
atoms also allow the borders of your integration region to be
"soft": down-weighting the contribution of map noise far from your
region of interest. And, finally, by including atoms in the green
density, neighboring atoms won't be sucked into it.
Think of it as fitting a smooth curve to noisy data and the number
of electrons is just a parameter in that fit, rather than trying
to integrate the noisy data itself. This is not an analogy.
Refinement programs are really just very sophisticated
curve-fitting programs. And if you have a forest of overlapping
peaks and you are trying to de-convolute the area/volume of one
peak, it is best to include that peak in the fit, rather than
leave it out. Shoulder peaks especially tend to get "eaten" by
large neighboring peaks.
How do you turn off non-bonds? Well, there is documentation for
refmac:
http://www.ysbl.york.ac.uk/refmac/data/refmac_keywords.html
and phenix:
https://phenix-online.org/documentation/reference/refinement.html
All that said, to answer the original question:
One very easy thing to do within the CCP4 suite is to use
"mapmask" to make a mask corresponding to your "sphere", or other
region of interest. Perhaps place a water at the center of your
peak, and either use the "border" feature of mapmask, or use
"sfall" to compute a calculated map and convert that to a mask
using "threshold" in mapmask. This mask should have values of 0
or 1 at every voxel. (or, if you feel like being clever, something
between 0 and 1 to reflect how much you want to weight a given
voxel). You can check it in coot. If you then multiply this mask
by your mFo-DFc map the result will have a non-zero average value.
This will be printed out in the log file. Multiply this average
value by the volume of the unit cell and you have your integrated
number of electrons. Yes, it's that simple.
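The same recipe is easy to prototype in numpy before committing to a mapmask run; the grid, density, and cell volume below are made up:

```python
import numpy as np

def integrate_electrons(diff_map, mask, cell_volume):
    """Numpy analogue of the mapmask recipe: multiply a difference map
    (in electrons/A^3) by a 0/1 mask, take the average over the WHOLE
    grid, and multiply by the unit-cell volume."""
    masked = diff_map * mask
    return masked.mean() * cell_volume

# toy example: an 8x8x8 grid over a 1000 A^3 cell
grid = np.zeros((8, 8, 8))
grid[3:5, 3:5, 3:5] = 0.512        # constant density inside the "blob"
mask = (grid > 0).astype(float)    # 1 inside the region, 0 outside
print(round(float(integrate_electrons(grid, mask, 1000.0)), 3))  # -> 8.0
```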
One issue you may have is map parameter compatibility (grid
spacing, axis order, xyz limits, etc.). You get around these by
using the same grid in all your fft or sfall runs, and then use
mapmask to make the axis and limits match before you multiply the
map and mask. The only other issue here might be the average
value being a very small number and rounded off by the default
print precision. You can fix this by multiplying the map by a
large constant (again, using mapmask), then the printed value will
have lots of digits.
This may seem complicated, but the use of masks can be a very
valuable skill to develop. In fact, one way to simplify,
stabilize and accelerate the occupancy refinement described above
is to use a mask to isolate the region of interest. That is, take
the mFo-DFc map, zero out everything far away from your peak, and
convert the result to structure factors. You can then call these
structure factors "Fobs" (alongside the original sigma(Fobs)) in a
new refinement. The Rwork/Rfree then becomes a local statistic,
indicative of the % error in your refined total occupancy. One
caveat is that if every atom in the new refinement is having its
occupancy refined you will lose the absolute scale. To fix this,
you need to add back at least one well-ordered atom into "Fobs",
and also include it in the model. For example, take a
well-ordered helix, extract those atoms, calculate a map using
"sfall", and add it to the masked-off difference map before doing
the "new Fobs" structure factor calculation. Include these same
atoms in the new refinement. They won't move, but they will keep
the scale stable. Oh, and don't forget to turn off the bulk
solvent correction! The bulk solvent has already been subtracted
in the mFo-DFc map.
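A numpy caricature of the masked-map-to-"Fobs" step (a real run would go through mapmask and sfall as described above, and carry the original sigma(Fobs) along):

```python
import numpy as np

def local_fobs(diff_map, mask):
    """Zero the difference map outside the region of interest, Fourier
    transform, and take amplitudes to use as a new 'Fobs' set."""
    masked = diff_map * mask
    return np.abs(np.fft.fftn(masked))

demo = np.zeros((8, 8, 8))
demo[2:4, 2:4, 2:4] = 1.0
fobs = local_fobs(demo, (demo > 0).astype(float))
# the F000 term of the new set equals the total masked density:
print(float(fobs[0, 0, 0]))  # -> 8.0
```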
Hope this all makes sense and feel free to ask follow-up questions,
-James Holton
MAD Scientist
On 8/10/2022 9:59 AM, Neno Vuksanovic wrote:
Dear All,
I would like to quantify electron density inside of positive
Fo-Fc blobs in active sites of multiple protomers in the map and
compare them. I am aware that I can interpolate maps and obtain
density values at coordinate points using either MapMan, Chimera
or Coot, but I would like to know if there is a tool that would
let me designate a sphere of a certain volume and calculate total
electron density inside it?
Best Regards,
Neno
------------------------------------------------------------------------
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list
hosted by www.jiscmail.ac.uk, terms & conditions are available at
https://www.jiscmail.ac.uk/policyandsecurity/