Dear Derek,

I suggest you not use cross-validation at all. With small data sets, 
refinement with cross-validation is very unstable and depends on the choice of 
the TEST set. We explained why and suggested an alternative target function 
that can use all data in refinement.

Acta Cryst. (2014). D70, 3124-3134 [doi:10.1107/S1399004714021336]
<http://dx.doi.org/10.1107/S1399004714021336>
Free kick instead of cross-validation in maximum-likelihood refinement of 
macromolecular crystal structures

J. Praznikar and D. Turk
Synopsis: The maximum-likelihood free-kick target, which calculates model error 
estimates from the work set and a randomly displaced model, proved superior in 
the accuracy and consistency of refinement of crystal structures compared with 
the maximum-likelihood cross-validation target, which calculates error 
estimates from the test set and the unperturbed model.

Online 22 November 2014    
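
On the k-fold question in the original post: if you still want to see how much 
Rfree depends on the TEST set, the bookkeeping can be sketched as below. This 
is only an illustration and is not taken from the paper. The function name 
k_fold_test_sets, the choice k = 20 and the plain integer reflection indices 
are assumptions made for the example; in practice the flags would be assigned 
with the tools of your refinement package.

```python
import random


def k_fold_test_sets(n_reflections, k, seed=0):
    """Split reflection indices 0..n_reflections-1 into k disjoint test sets.

    Hypothetical helper for illustration only: real test-set flags would be
    written into the reflection file by the refinement package's own tools.
    """
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    return [sorted(indices[i::k]) for i in range(k)]


# 2300 reflections in total, as in the original post; k = 20 is an assumed
# choice that makes each test set 5% of the data.
folds = k_fold_test_sets(2300, 20)
for fold_number, test_set in enumerate(folds, start=1):
    work_size = 2300 - len(test_set)
    print(fold_number, len(test_set), work_size)
```

Each reflection lands in exactly one test set, so k separate refinements from 
the same starting model, one per fold, give k Rfree values whose spread shows 
how strongly the result depends on the TEST set.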

best regards,
dusan


> On Dec 20, 2014, at 1:05 AM, CCP4BB automatic digest system 
> <lists...@jiscmail.ac.uk> wrote:
> 
> Date:    Fri, 19 Dec 2014 11:18:37 +0000
> From:    Derek Logan <derek.lo...@biochemistry.lu.se>
> Subject: Cross-validation when test set is miniscule
> 
> Hi everyone,
> 
> Right now we have one of those very difficult Rfree situations where it's 
> impossible to generate a single meaningful Rfree set. Since we're in a bit of 
> a hurry with this structure it would be good if someone could point me in the 
> right direction. We have crystals with 1542 non-H atoms in the asymmetric 
> unit that diffract to only 3.6 Å in P65, which gives us a whopping 2300 
> reflections in total. 5% of this is only about 100 reflections. Luckily the 
> protein is only a single point mutation of a wild type that has been solved 
> to much better resolution, so we know what it should look like and I simply 
> want to investigate the effect of different levels of conservatism in the 
> refinement, e.g. NCS in xyz and B, group B-factors, reference model, 
> Ramachandran restraints etc. However since the quality criterion for this is 
> Rfree I'm not able to do this.
> 
> I believe the correct approach is k-fold statistical cross-validation, but 
> can someone remind me of the correct way to do this? I've done a bit of 
> Googling without finding anything very helpful.
> 
> Thanks
> Derek
> ________________________________________________________________________
> Derek Logan                                         tel: +46 46 222 1443
> Associate Professor                                 mob: +46 76 8585 707
> Dept. of Biochemistry and Structural Biology              www.cmps.lu.se
> Centre for Molecular Protein Science            www.maxlab.lu.se/crystal
> Lund University, Box 124, 221 00 Lund, Sweden           www.saromics.com

Dr. Dusan Turk, Prof.
Head of Structural Biology Group http://bio.ijs.si/sbl/
Head of Centre for Protein and Structure Production
Centre of excellence for Integrated Approaches in Chemistry and Biology of 
Proteins, Scientific Director
http://www.cipkebip.org/
Professor of Structural Biology at IPS "Jozef Stefan"
e-mail: dusan.t...@ijs.si    
phone: +386 1 477 3857       Dept. of Biochem.& Mol.& Struct. Biol.
fax:   +386 1 477 3984       Jozef Stefan Institute
                            Jamova 39, 1 000 Ljubljana,Slovenia
Skype: dusan.turk (voice over internet: www.skype.com)









