Re: [ccp4bb] Fo simulators - summary

James Holton Sat, 07 Sep 2013 04:55:33 -0700

I feel like I should point out that there is about a 20% difference between "Fcalc" and something I would call a "simulated Fobs". Fcalc is something that refinement programs compute many times every second as they apply 100 years' worth of brilliant ideas to make your model (Fcalc) match your data (Fobs) as best we know how. Despite all this, one of the great mysteries of macromolecular structure determination is just how awful the "final" match is: R/Rfree in the 20%s, or the high teens at best. Small molecule structures don't have this problem. In fact, they only recently started depositing "Fobs" into the CSD, because for most small molecule structures "Fcalc" is more accurate than "Fobs" anyway.
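
For readers outside crystallography, the R-factor quoted above is conventionally R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), and Rfree is the same quantity computed over a held-out test set of reflections not used in refinement. A minimal Python sketch of that formula, assuming both amplitude lists are already on a common scale (real refinement programs also fit scale factors, which this hypothetical `r_factor` helper skips):

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor:
    sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).
    Assumes both lists hold amplitudes on the same scale."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den

# Toy amplitudes (made-up numbers, for illustration only)
f_obs = [100.0, 50.0, 25.0, 10.0]
f_calc = [90.0, 60.0, 20.0, 10.0]
print(round(r_factor(f_obs, f_calc), 3))  # prints 0.135
```

A "perfect" model gives R = 0; the macromolecular values in the 0.15 to 0.25 range mean the summed amplitude mismatch is 15 to 25% of the summed observed amplitudes.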

This has been hashed over on this BB a number of times, so I refer the interested reader to the archives. But there are two major considerations in turning a "pdb file" into a "simulated Fobs":

1) The solvent

SFALL (part of the CCP4 suite) is a convenient tool for turning coordinates into maps or structure factors, but it doesn't "do" bulk solvent unless you trick it. I wrote a jiffy for doing this here:

http://bl831.als.lbl.gov/~jamesh/mlfsom/ano_sfall.com

Download the script, make it executable, and run it with no arguments to see instructions for how to use it. What is fascinating about this very crude bulk solvent implementation is that refinement programs with much more sophisticated bulk solvent implementations have a heck of a time trying to "match" it. If you want exactly the bulk solvent you would get from phenix, use phenix.fmodel, but this will not be exactly the same as the bulk solvent you get from REFMAC. Which one is right? Probably none of them.
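
To make "bulk solvent" concrete, here is a minimal Python sketch of the generic flat bulk-solvent model that many refinement programs build on: F_model = k_total * (F_calc + k_sol * exp(-B_sol * s^2 / 4) * F_mask), where F_mask is the structure factor of a mask filling the solvent region and s = 1/d. This is an assumption about the general technique, not a transcription of ano_sfall.com, phenix.fmodel, or REFMAC. The `f_model` helper is hypothetical, the k_sol = 0.35 and B_sol = 46 Å² defaults are typical-looking values chosen for illustration, and the structure factors are made up:

```python
import cmath
import math

def f_model(f_calc, f_mask, d_spacing, k_sol=0.35, b_sol=46.0, k_total=1.0):
    """Generic flat bulk-solvent model (not any one program's code):
    F_model = k_total * (F_calc + k_sol * exp(-B_sol * s^2 / 4) * F_mask),
    with s = 1/d. Structure factors are complex; d is in Angstrom."""
    out = []
    for fc, fm, d in zip(f_calc, f_mask, d_spacing):
        s2 = 1.0 / (d * d)                            # s^2 = 1/d^2
        k_mask = k_sol * math.exp(-b_sol * s2 / 4.0)  # solvent scale at this resolution
        out.append(k_total * (fc + k_mask * fm))
    return out

# Made-up complex structure factors for one low- and one high-resolution reflection.
f_calc = [cmath.rect(100.0, 0.3), cmath.rect(40.0, 1.2)]
f_mask = [cmath.rect(80.0, 3.0), cmath.rect(80.0, 3.0)]
d = [20.0, 2.0]  # Angstrom
for fc, fmod, dd in zip(f_calc, f_model(f_calc, f_mask, d), d):
    print(f"d={dd:5.1f} A  |Fcalc|={abs(fc):6.1f}  |Fmodel|={abs(fmod):6.1f}")
```

The exp(-B_sol * s^2 / 4) factor makes the solvent term fall off quickly with resolution, so in this model the solvent mostly changes the low-resolution reflections. Programs that differ in how they build the mask or fit k_sol and B_sol will therefore disagree mainly there.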


2) The R-factor Gap

One can try to simulate the R-factor gap (between Rmeas and Rfree) by adding random numbers to "Fcalc" so that the simulated data end up about 20% different from Fcalc, as real Fobs are, but this is hardly a physically reasonable source of error. If you do this enough times for the same PDB file and then "average over different crystals", the random errors cancel out and you'll still end up with a dataset that refines to R/Rfree ~ 0/0.
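
The averaging argument can be checked numerically. A minimal Python sketch, assuming independent Gaussian noise with a standard deviation of 20% of each amplitude (an arbitrary choice, as are the made-up Fcalc values, the crystal count, and the hypothetical helper names), using the conventional R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|):

```python
import random

def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return num / sum(abs(fo) for fo in f_obs)

def noisy_copy(f_calc, rel_sigma, rng):
    """One simulated 'crystal': each amplitude gets independent Gaussian noise
    with standard deviation rel_sigma * Fcalc (purely random, not systematic)."""
    return [fc + rng.gauss(0.0, rel_sigma * fc) for fc in f_calc]

rng = random.Random(1234)  # fixed seed so the run is repeatable
f_calc = [rng.uniform(10.0, 200.0) for _ in range(2000)]  # made-up amplitudes

one_crystal = noisy_copy(f_calc, 0.2, rng)
n_crystals = 400
copies = [noisy_copy(f_calc, 0.2, rng) for _ in range(n_crystals)]
averaged = [sum(vals) / n_crystals for vals in zip(*copies)]  # per-reflection mean

print(f"R, single noisy copy vs Fcalc:      {r_factor(one_crystal, f_calc):.3f}")
print(f"R, average of {n_crystals} copies vs Fcalc: {r_factor(averaged, f_calc):.3f}")
```

The per-reflection noise in the average shrinks roughly as 1/sqrt(n_crystals), so the "multi-crystal" dataset collapses back onto Fcalc and the original model fits it almost perfectly. Purely random error simply averages away, which is why it is a poor stand-in for whatever causes the real R-factor gap.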

This is the fundamental problem with making "simulated Fobs": we actually have no good way of "modelling" whatever is causing this R-factor Gap, and therefore no good way of simulating it. If we could simulate it, then some refinement program would quickly implement a way to model the effect, and give you R/Rfree of 0% again. There are about as many ideas for the cause of the R-factor Gap as there are crystallographers out there, but to this day nobody has come up with a "systematic error" that, when accounted for in refinement, gives you a small-molecule-style R/Rfree for pretty much anything in the PDB. Not even lysozyme.


-James Holton
MAD Scientist


On 9/5/2013 9:35 AM, Alastair Fyfe wrote:

Below are some links to tools for simulating Fobs data:

phenix.fake_f_obs: http://cci.lbl.gov/cctbx_sources/mmtbx/command_line/fake_f_obs.py
phenix.fmodel: http://cci.lbl.gov/cctbx_sources/mmtbx/command_line/fmodel.py

sftools (calc keyword):  http://www.ccp4.ac.uk/html/sftools.html

diffraction image simulators from James Holton
mlfsom: http://bl831.als.lbl.gov/~jamesh/mlfsom/
nearBragg: http://bl831.als.lbl.gov/~jamesh/nearBragg/
fastBragg: http://bl831.als.lbl.gov/~jamesh/fastBragg/

Many thanks for the replies.
Alastair
