Re: [ccp4bb] Fo simulators - summary

Jrh Mon, 09 Sep 2013 23:10:25 -0700

I prepared this on Sunday, here it is now:-

Well, I was 'shaken but not stirred' to see a program 'fake_Fobs'. However, 
James's posting on the Rfactor gap in MX is a more respectable, Sunday morning, 
topic. I tried to find the previous threads on this via Google and couldn't. So 
apologies to all for the danger of a rehash.
James and I did talk about this in Madrid. 
So my two pennies' worth:-
We have errors in the Fo part and in the Fc part. Given the current typical 
size of the gap, the errors in the Fo part tend not to show up; for that, the 
gap would need to be, say, <10%. In general we are not there yet. There are 
also cases of very weak data sets, though, where Fo errors are much larger 
than the norm. But in the majority of cases we have to focus on the errors in 
Fc, i.e. the inadequacy of the model. Solvent is a key suspect:- bulk, ordered 
and semi-ordered. But now the sense of mystery develops, since the Bragg spots 
are from the ordered portion, and the ordered and semi-ordered solvent is 
quite a small fraction of the ordered portion. (For neutron work, though, the 
fraction of scattering is larger due to strong deuterium scattering effects.) 
So what else? 
Another key suspect is the general absence of high resolution reflections, 
typically, i.e. vs small molecule work, which means the ordered atoms are 
simply not as well placed as they could be if they were in a hard 
(solvent-free) crystal. So, given our two key suspects, what happens when we 
have low solvent content and when we have atomic resolution data? In both 
cases the Rfactor gap goes down. In an Occam's razor sense, this is 
encouraging evidence that we do know what is going on.
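
As a rough illustration of why Fo errors tend not to show up while the gap is 
large, here is a minimal sketch with made-up numbers. It assumes independent 
Gaussian fractional errors on Fo and Fc and uniformly distributed "true" 
amplitudes; none of these values come from a real data set, and the names are 
purely illustrative.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """Conventional R = sum|Fo - Fc| / sum(Fo)."""
    return np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)

rng = np.random.default_rng(0)
n_refl = 200_000
f_true = rng.uniform(50.0, 500.0, n_refl)  # made-up "true" amplitudes

# Assumed (hypothetical) fractional Gaussian error levels.
model_sigma = 0.25  # model inadequacy; gives R of roughly 20% on its own
data_sigma = 0.05   # measurement error of a reasonably good data set

f_calc = f_true * (1.0 + model_sigma * rng.standard_normal(n_refl))
f_obs_noisy = f_true * (1.0 + data_sigma * rng.standard_normal(n_refl))

r_model_only = r_factor(f_true, f_calc)            # perfect data, poor model
r_data_only = r_factor(f_obs_noisy, f_true)        # noisy data, perfect model
r_model_plus_data = r_factor(f_obs_noisy, f_calc)  # both error sources

print(f"R, model error only : {r_model_only:.3f}")
print(f"R, data error only  : {r_data_only:.3f}")
print(f"R, both             : {r_model_plus_data:.3f}")
```

Under these assumptions the two error sources combine roughly in quadrature, 
so adding the data error to the model error should change R by well under one 
percentage point. The Fo errors only become visible once the model error is 
comparable to them.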


Greetings,
John

Prof John R Helliwell DSc



On 7 Sep 2013, at 12:54, James Holton <jmhol...@lbl.gov> wrote:

> I feel like I should point out that there is about a 20% difference between 
> "Fcalc" and something I would call a "simulated Fobs".  Fcalc is something 
> that refinement programs compute many times every second as they apply 100 
> years worth of brilliant ideas to make your model (Fcalc) match your data 
> (Fobs) as best we know how.  Despite all this, one of the great mysteries of 
> macromolecular structure determination is just how awful the "final" match 
> is: R/Rfree in the 20%s or high teens at best. Small molecule structures 
> don't have this problem.  In fact, they only recently started depositing 
> "Fobs" into the CSD because for most small molecule structures "Fcalc" 
> is more accurate than "Fobs" anyway.
> 
> This has been hashed over on this BB a number of times, so I refer the 
> interested reader to the archives.  But there are two major considerations in 
> turning a "pdb file" into a "simulated Fobs":
> 1) the solvent
>   SFALL (part of the CCP4 suite) is a convenient tool for turning coordinates 
> into maps, or structure factors, but it doesn't "do" bulk solvent unless you 
> trick it.  I wrote a jiffy for doing this here:
> http://bl831.als.lbl.gov/~jamesh/mlfsom/ano_sfall.com
> download the script, make it executable, and run it with no arguments to see 
> instructions for how to use it.  What is fascinating about this very crude 
> bulk solvent implementation I did is that refinement programs with much more 
> sophisticated bulk solvent implementations have a heck of a time trying to 
> "match" it.  If you want exactly the bulk solvent you would get from phenix, 
> use phenix.fmodel, but this will not be exactly the same as the bulk solvent 
> you get from REFMAC.  Which one is right? Probably none of them.
> 
> 2) The R-factor Gap
>  One can try to simulate the R-factor gap (between Rmeas and Rfree) by adding 
> random numbers to "Fcalc" so that it becomes 20% different from Fobs, but 
> this is hardly a physically reasonable source of error.  If you do this 
> enough times for the same PDB file and then "average over different crystals" 
> you'll still end up with a dataset that will refine to R/Rfree ~ 0/0.
> 
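
The averaging argument above can be sketched numerically. This is only a toy 
under stated assumptions: the amplitudes are made up, the "error" is 
independent Gaussian fractional noise on Fcalc, and comparing the averaged 
data directly with Fcalc stands in for an actual refinement. It is not how any 
of the programs mentioned in this thread work.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """Conventional R = sum|Fo - Fc| / sum(Fo)."""
    return np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)

rng = np.random.default_rng(1)
n_refl = 20_000
f_calc = rng.uniform(50.0, 500.0, n_refl)  # made-up model amplitudes
noise_sigma = 0.25  # assumed fractional noise; R of roughly 20% per crystal

def fake_f_obs():
    """One hypothetical 'crystal': Fcalc plus independent random noise."""
    return f_calc * (1.0 + noise_sigma * rng.standard_normal(n_refl))

# A single simulated data set sits about 20% away from Fcalc ...
r_single = r_factor(fake_f_obs(), f_calc)

# ... but averaging many of them washes the random noise out again.
n_crystals = 100
f_avg = np.mean([fake_f_obs() for _ in range(n_crystals)], axis=0)
r_averaged = r_factor(f_avg, f_calc)

print(f"R for one noisy data set      : {r_single:.3f}")
print(f"R after averaging {n_crystals} crystals : {r_averaged:.3f}")
```

Because the noise is random, the averaged R falls roughly as 1/sqrt(N) and 
heads towards zero, which is why this kind of error cannot stand in for 
whatever systematic effect causes the real gap.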
> This is the fundamental problem with making "simulated Fobs": we actually 
> have no good way of "modelling" whatever is causing this R-factor Gap, and 
> therefore no good way of simulating it.  If we could simulate it, then some 
> refinement program would quickly implement a way to model the effect, and 
> give you R/Rfree of 0% again.  There are about as many ideas for the cause of 
> the R-factor Gap as there are crystallographers out there, but to this day 
> nobody has come up with a "systematic error" that, when accounted for in 
> refinement, gives you a small-molecule-style R/Rfree for pretty much anything 
> in the PDB.  Not even lysozyme.
> 
> -James Holton
> MAD Scientist
> 
> 
> On 9/5/2013 9:35 AM, Alastair Fyfe wrote:
>> Below are some links to tools for simulating Fobs data:
>> 
>> phenix.fake_f_obs: 
>> http://cci.lbl.gov/cctbx_sources/mmtbx/command_line/fake_f_obs.py
>> phenix.fmodel: http://cci.lbl.gov/cctbx_sources/mmtbx/command_line/fmodel.py
>> sftools (calc keyword):  http://www.ccp4.ac.uk/html/sftools.html
>> 
>> diffraction image simulators from James Holton
>> mlfsom: http://bl831.als.lbl.gov/~jamesh/mlfsom/
>> nearBragg: http://bl831.als.lbl.gov/~jamesh/nearBragg/
>> fastBragg: http://bl831.als.lbl.gov/~jamesh/fastBragg/
>> 
>> many thanks for the replies.
>> Alastair
