On Thu, Jan 28, 2010 at 1:13 PM, Paul Emsley <paul.ems...@bioch.ox.ac.uk> wrote:
> WinCoot:
>
> http://www.ysbl.york.ac.uk/~emsley/software/binaries/stable/
>
> (actually, now that I think about it, I am not sure that this feature is in
> the WinCoot version :-)
>
> Extensions -> Refine... -> Autoweight refinement
>
> Paul.

You're right, it doesn't seem to function in WinCoot; I'm downloading the CentOS 4 version right now!

I found the auto-weight function in 'scheme/coot-utils.scm'. I see it depends on 'chi-squares', but searching for this finds only:

xserver11:~/coot-0.6.1-1189> find * -type f | xargs grep chi-squares
greg-tests/01-pdb+mtz.scm: (chi-squares (map (lambda (x) (list-ref x 2)) nnb-list))
greg-tests/01-pdb+mtz.scm: (n (length chi-squares))
greg-tests/01-pdb+mtz.scm: (sum (apply + chi-squares)))
scheme/coot-utils.scm: (chi-squares (map (lambda (x) (list-ref x 2)) nnb-list))
scheme/coot-utils.scm: (n (length chi-squares))
scheme/coot-utils.scm: (sum (apply + chi-squares)))

i.e. not the source for the 'chi-squares' function: is it hidden somewhere, or am I not searching for it correctly?

I'm puzzled how it can calculate chi-squared at all, since this will depend on (1) the SD of the Engh & Huber library values, (2) the SD of the calculated values (which can only be obtained by doing a full-matrix refinement in SHELX), and most importantly (3) the correlation coefficient between these. The library and calculated values will be highly correlated except at ultra-high resolution (in particular, at resolutions lower than ~2.3 Å the calculated values will be determined almost completely by the library values, since data at 2.3 Å or lower tell you almost nothing about individual bond lengths & angles). Any estimate of chi-squared which ignores the correlation is therefore likely to be in error by at least a factor of 4, i.e. the correct target value for an improperly calculated chi-squared is likely to be ~0.25, not 1.0.
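
For what it's worth, the three grep hits read like local bindings (a list of values, its length, and its sum) rather than calls to a separately defined function, though the enclosing form is not shown, so that reading is an assumption. Below is a rough Python sketch of what those lines appear to compute. The names `mean_chi_squared`, `rmsz` and the example entries are hypothetical; only the use of the third element of each entry (`list-ref x 2`), the count and the sum come from the grep output, and dividing the sum by the count is assumed.

```python
import math


def mean_chi_squared(nnb_list):
    """Average the chi-squared values stored at index 2 of each entry.

    Index 2 mirrors (list-ref x 2) in the grepped Scheme lines; what the
    other elements of an entry hold is not known from the grep output.
    """
    chi_squares = [entry[2] for entry in nnb_list]
    n = len(chi_squares)
    if n == 0:
        raise ValueError("need at least one entry")
    total = sum(chi_squares)
    # Assumption: the mean is total / n; the division itself is not
    # visible in the grep hits.
    return total / n


def rmsz(nnb_list):
    """Square root of the mean chi-squared (the RMS Z-score)."""
    return math.sqrt(mean_chi_squared(nnb_list))


# Made-up entries purely for illustration, not Coot output.
example = [("bond-1", 0, 0.125), ("bond-2", 0, 0.25), ("bond-3", 0, 0.375)]

print(mean_chi_squared(example))
print(rmsz(example))
```

With these made-up numbers the mean chi-squared is 0.25, so the square root is 0.5: a chi-squared target of ~0.25 corresponds to an RMSZ target of ~0.5 rather than 1.0.
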
A value of ~0.25 is what is obtained consistently as the average over all refinements in the PDB (even including the incorrectly weighted ones!). Unfortunately there's no way of estimating the correlation coefficient (at least no way that I can think of!), so AFAICS the only workable method is to use data-mining of the PDB to come up with an average chi-squared.

Robbie Joosten & I have come up with a more accurate estimate of chi-squared (or, to be more precise, its sqrt, aka the RMSZ), based just on his recent PDB-REDO refinements, which are weighted correctly by maximising the free log-likelihood. The results exhibit significant resolution-dependence (as you would expect). This is the same result I submitted to the VTF a while back, but of course most COOTBB subscribers will not have seen it. Even a blanket resolution-independent value of 0.25 would be a huge improvement on 1.0!

Cheers

-- Ian