Sounds like what you are trying to do is similar to an old jiffy script
of mine:
http://bl831.als.lbl.gov/~jamesh/pickup/local_corr.com
My purpose was to compute the correlation coefficient (CC) for a bunch
of different rotamers of a side chain, but I think the script will work
for you "as is". What I learned is that you want to make a set of PDB
files that contain only the "variable" atoms in the structure (in your
case, just the ligands, no protein). Otherwise, the "signal" you are
trying to measure is swamped by all the other atoms in the structure.
Then you want to "select" map grid points that are near ANY of the atoms
in your set of PDB files and score all your PDBs against that SAME map.
If you don't do this, you will always find that bigger stuff matches
better because bigger stuff simply intersects more density.
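
To make the masking idea concrete, here is a minimal Python/NumPy sketch. It is not taken from local_corr.com; the function names (union_mask, masked_cc), the 1.5 A radius and the orthogonal, non-periodic grid are all assumptions made for illustration. The point is only that the mask is built from the union of all candidate models, and every model is then scored over that one mask.

```python
import numpy as np

def union_mask(models, shape, spacing, radius):
    """Boolean mask of grid points within `radius` of ANY atom in ANY model.

    models  : list of (N_i, 3) arrays of Cartesian coordinates (Angstrom)
    shape   : (nx, ny, nz) grid dimensions
    spacing : grid spacing in Angstrom (assumes an orthogonal grid with its
              origin at 0,0,0 and no periodic wrapping -- an illustration only)
    radius  : selection radius in Angstrom
    """
    axes = [np.arange(n) * spacing for n in shape]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    mask = np.zeros(shape, dtype=bool)
    for x, y, z in np.vstack(models):
        d2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
        mask |= d2 <= radius ** 2
    return mask

def masked_cc(map_obs, map_calc, mask):
    """Pearson correlation coefficient over the selected grid points only."""
    return float(np.corrcoef(map_obs[mask], map_calc[mask])[0, 1])

# Two hypothetical candidates: a small one and a bigger one.
small = np.array([[2.0, 2.0, 2.0]])
big = np.array([[2.0, 2.0, 2.0], [5.0, 5.0, 5.0]])
mask = union_mask([small, big], shape=(8, 8, 8), spacing=1.0, radius=1.5)
```

Each candidate's calculated map would then be scored with masked_cc(map_obs, map_calc_i, mask) using this same mask, so a bigger model gets no credit merely for covering more grid points.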
I also found it is much better to use the CC of the Laplacian of the
electron density maps, rather than the CC of the raw electron density
itself. By "better" I mean this: with the raw density, a "wrong" rotamer
that happens to stick itself into the middle of a nearby helix or heavy
metal will "correlate" very well, and at poor resolution it will actually
correlate better than the "right" rotamer. This is because the CC (and the
R factor) of the raw density essentially "score" the overall density
overlap, whereas the Laplacian seems to "score" how connected the density
is. The Laplacian filter
does have the unfortunate effect of amplifying the noise of high-angle
spots, so applying a B factor after the Laplacian can make things behave
better. Exactly which smoothing B factor is optimal is something I have
yet to figure out.
I think the reason comparing Laplacian-ized maps works better is
because when we mortals look at maps, what we are looking at is an edge
detection (we contour the map) next to another kind of edge detection
(bonds between atoms). I'm told that comparing Laplacians instead of
direct pixels is a fairly standard methodology in machine vision,
but I don't have a reference for that.
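
For anyone who wants to experiment with the Laplacian-plus-smoothing idea outside of the script, here is a rough Python/NumPy sketch. It assumes the map is sampled over exactly one unit cell of an orthogonal cell, so that a plain FFT can be used. The function name laplacian_filtered and the parameter b_smooth are made up for this example, and no particular value of the smoothing B is being recommended. In reciprocal space the Laplacian multiplies each Fourier coefficient by -4*pi^2*s^2 (with s = 1/d), and a B factor multiplies it by exp(-B*s^2/4).

```python
import numpy as np

def laplacian_filtered(rho, cell, b_smooth=0.0):
    """Laplacian of a periodic map, followed by an optional B-factor blur.

    rho      : 3D array covering exactly one unit cell
    cell     : (a, b, c) in Angstrom; an orthogonal cell is assumed
    b_smooth : smoothing B factor in Angstrom^2 (hypothetical parameter)
    """
    # Reciprocal-space coordinates in cycles per Angstrom along each axis.
    freqs = [np.fft.fftfreq(n, d=length / n)
             for n, length in zip(rho.shape, cell)]
    sx, sy, sz = np.meshgrid(*freqs, indexing="ij")
    s2 = sx ** 2 + sy ** 2 + sz ** 2           # |s|^2 = 1/d^2

    coeffs = np.fft.fftn(rho)
    coeffs = coeffs * (-4.0 * np.pi ** 2 * s2)        # Laplacian
    coeffs = coeffs * np.exp(-b_smooth * s2 / 4.0)    # exp(-B s^2 / 4)
    return np.fft.ifftn(coeffs).real

def cc(a, b):
    """Pearson correlation coefficient over all grid points."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
```

The comparison would then be cc(laplacian_filtered(map_obs, cell, B), laplacian_filtered(map_calc, cell, B)), ideally restricted to the same selected grid points for every candidate model.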
As for the real-space R factor, I have always found this to be highly
sensitive to the scale and offset of the maps, whereas the correlation
coefficient is completely insensitive to scale factors. Since I can't
think of anything that the real-space R would tell me that the CC
wouldn't, I have always used the latter.
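
A quick numerical illustration of the scale/offset point, as a Python/NumPy sketch. The real-space R used here is the usual sum|obs - calc| / sum|obs + calc| form, the density values are invented, and the function names are my own for this example.

```python
import numpy as np

def corr_coef(obs, calc):
    """Pearson correlation coefficient between two sets of density values."""
    obs = np.asarray(obs, dtype=float).ravel()
    calc = np.asarray(calc, dtype=float).ravel()
    d_obs = obs - obs.mean()
    d_calc = calc - calc.mean()
    return float((d_obs * d_calc).sum()
                 / np.sqrt((d_obs ** 2).sum() * (d_calc ** 2).sum()))

def real_space_r(obs, calc):
    """Real-space R factor: sum|obs - calc| / sum|obs + calc|."""
    obs = np.asarray(obs, dtype=float).ravel()
    calc = np.asarray(calc, dtype=float).ravel()
    return float(np.abs(obs - calc).sum() / np.abs(obs + calc).sum())

# Invented density values; "calc" is the same map on a different
# scale and with a constant offset.
obs = np.array([0.1, 0.5, 1.2, 0.8, 0.3])
calc = 3.0 * obs + 5.0

cc_value = corr_coef(obs, calc)        # unaffected by scale and offset
r_same = real_space_r(obs, obs)        # identical maps
r_scaled = real_space_r(obs, calc)     # same shape, different scale/offset
```

Here the two maps have exactly the same shape, so the CC is unchanged by the rescaling, while the real-space R moves well away from the value it has for identical maps.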
Oh, and if you are getting zero or negative CC for perfectly good
models, you might want to check and be sure that SFALL is doing the map
calculation properly. A while ago I noticed that if I were missing the
CRYST1 line in the PDB file, then SFALL would happily give me a random
map, even if I gave it the cell and SG in the input cards! This was
probably fixed in the latest release, but I have not checked...
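
If you want to rule this out before running anything, a few lines of Python will tell you whether a PDB file carries a CRYST1 record and what it says. This is only a pre-flight check of the file; it says nothing about what SFALL itself does. The function name read_cryst1 is made up, and the column ranges are the fixed-column positions of the PDB format's CRYST1 record.

```python
def read_cryst1(pdb_text):
    """Return ((a, b, c, alpha, beta, gamma), space_group) from the first
    CRYST1 record in `pdb_text`, or None if there is no such record."""
    # 0-based slices for a, b, c (cols 7-33) and alpha, beta, gamma (cols 34-54)
    fields = [(6, 15), (15, 24), (24, 33), (33, 40), (40, 47), (47, 54)]
    for line in pdb_text.splitlines():
        if line.startswith("CRYST1"):
            cell = tuple(float(line[i:j]) for i, j in fields)
            space_group = line[55:66].strip()   # cols 56-66
            return cell, space_group
    return None

def check_pdb(path):
    """Print a warning if the PDB file at `path` has no CRYST1 record."""
    with open(path) as handle:
        result = read_cryst1(handle.read())
    if result is None:
        print("WARNING: no CRYST1 record in", path)
    return result
```

Running check_pdb on each ligand-only PDB file before the map calculation would catch a missing CRYST1 line, which is easy to lose when the "variable" atoms are cut out of a full model.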
-James Holton
MAD Scientist
On 10/4/2011 1:55 PM, Brigitte Ziervogel wrote:
Hi,
I am using the program Overlapmap to calculate real-space R-factors and
correlation coefficients in order to find ligand conformations that fit best
within the density.
I'm confused by the Overlapmap output, which includes "Fobs" and "Fcalc" values
that are used to calculate the R-factors and corr coeff. However, I'm not sure what these F values
are as they should not be structure factors since the program seems to only deal with maps.
Additionally, in many cases the Fobs and Fcalc values are either 0 or negative values, even for
protein residues that are well-defined in the density.
Has anyone used this program before or have an idea of what could be going on
here?
I have been supplying the program with a refmac mtz file with ligand unmodeled
as map 1 and a pdb file with both protein and ligand coordinates to calculate
the map 2.
Any suggestions or ideas of better ways to score ligand fits are appreciated,
thanks.
Brigitte