Dear Ethan, List,

> Surely someone must have done this!  But I can't recall ever reading
> an analysis of such a refinement protocol.  
> Does anyone know of relevant reports in the literature?

Total statistical cross validation is indeed what we should be doing, but 
for large structures the computational cost may be significant. In the 
absence of total statistical cross validation the reported Rfree may be an 
'outlier' (with respect to the distribution of the Rfree values that would 
have been obtained from all disjoint sets). To tackle this, we usually 
resort to the following ad hoc procedure:

 At an early stage of the positional refinement, we use a shell script 
which (a) uses Phil's PDBSET with the NOISE keyword to randomly shift 
atomic positions, (b) refines the resulting models to completion, once 
with each of the different free sets, (c) calculates the mean of the 
resulting free R values, and (d) selects (once and for all) the free set 
whose Rfree is closest to the mean of the values obtained above.
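
Steps (c) and (d) amount to very little code. Here is a minimal sketch in 
Python, assuming the final Rfree of each refinement run has already been 
collected (for example from the REFMAC logs). The function name and the 
numbers are made up for illustration and are not taken from our actual 
script:

```python
from statistics import mean


def pick_representative_free_set(rfree_by_set):
    """Return (label, mean_rfree) for the free set closest to the mean Rfree.

    rfree_by_set maps a free-set label to the final Rfree obtained when
    refining against that set. Hypothetical helper, for illustration only.
    """
    if not rfree_by_set:
        raise ValueError("need at least one Rfree value")
    avg = mean(rfree_by_set.values())
    # Step (d): the set whose Rfree deviates least from the mean.
    label = min(rfree_by_set, key=lambda k: abs(rfree_by_set[k] - avg))
    return label, avg


# Made-up Rfree values for five free sets (NOT real data).
example = {0: 0.262, 1: 0.281, 2: 0.247, 3: 0.270, 4: 0.255}
label, avg = pick_representative_free_set(example)
print(f"mean Rfree = {avg:.3f}, selected free set = {label}")
```

If two sets are equally close to the mean, this sketch simply keeps the 
first one it encounters.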

For structures with a small number of reflections, the statistical noise 
in the 5% sets can be very significant indeed. We have seen Rfree values 
obtained from different sets differ by as much as 4%. 

Ideally, instead of PDBSET+REFMAC, we should have been using simulated 
annealing (without positional refinement), but moving back and forth 
between CNS-XPLOR and CCP4 was too much for my laziness.

All the best,
Nicholas


-- 


          Dr Nicholas M. Glykos, Department of Molecular Biology
     and Genetics, Democritus University of Thrace, University Campus,
  Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
    Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
