Re: [ccp4bb] Cross-validation when test set is miniscule
Dear Derek,

I suggest you try 10% for the test set. You should still be able to judge the effect of various restraints (or constraints) as long as you keep the same test set. If you switch test sets and re-refine, Rfree might change by as much as 2% for a test set consisting of 200 reflections; see Fig. 6 in A. T. Brunger, Free R value: Cross-validation in crystallography, Methods in Enzym. 277, 366-396 (1997). However, using the same test set may allow you to judge the best restraints protocol or weights.

Axel

PS: The Methods in Enzym. review also briefly discusses complete cross-validation.

PPS: For refinement at very low resolution, see also: A. T. Brunger, P. D. Adams, P. Fromme, R. Fromme, M. Levitt, G. F. Schroder. Improving the accuracy of macromolecular structure refinement at 7 Å resolution. Structure 20, 957-966 (2012).

On Dec 20, 2014, at 1:05 AM, CCP4BB automatic digest system lists...@jiscmail.ac.uk wrote:

Date: Fri, 19 Dec 2014 11:18:37 +
From: Derek Logan derek.lo...@biochemistry.lu.se
Subject: Cross-validation when test set is miniscule

Hi everyone,

Right now we have one of those very difficult Rfree situations where it's impossible to generate a single meaningful Rfree set. Since we're in a bit of a hurry with this structure it would be good if someone could point me in the right direction.

We have crystals with 1542 non-H atoms in the asymmetric unit that diffract to only 3.6 Å in P65, which gives us a whopping 2300 reflections in total. 5% of this is only about 100 reflections. Luckily the protein is only a single point mutation of a wild type that has been solved to much better resolution, so we know what it should look like and I simply want to investigate the effect of different levels of conservatism in the refinement, e.g. NCS in xyz and B, group B-factors, reference model, Ramachandran restraints etc. However, since the quality criterion for this is Rfree I'm not able to do this.

I believe the correct approach is k-fold statistical cross-validation, but can someone remind me of the correct way to do this? I've done a bit of Googling without finding anything very helpful.

Thanks
Derek

Derek Logan
Associate Professor
Dept. of Biochemistry and Structural Biology
Centre for Molecular Protein Science
Lund University, Box 124, 221 00 Lund, Sweden
tel: +46 46 222 1443
mob: +46 76 8585 707
www.cmps.lu.se
www.maxlab.lu.se/crystal
www.saromics.com

Axel T. Brunger
Investigator, Howard Hughes Medical Institute
Professor and Chair, Dept. of Molecular and Cellular Physiology
Stanford University
Web: http://atbweb.stanford.edu
Email: brun...@stanford.edu
Phone: +1 650-736-1031
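The advice above about keeping the same test set can be read as a paired comparison: two refinement protocols are judged on identical test reflections, so test-set-to-test-set scatter cancels out of the difference. The sketch below is only an illustration of that idea in Python. The function name `paired_rfree_difference` is hypothetical, and the Rfree values in the usage lines are made-up placeholders, not results from any refinement.

```python
from statistics import mean, stdev


def paired_rfree_difference(rfree_a, rfree_b):
    """Compare two protocols evaluated on the SAME test sets.

    rfree_a[i] and rfree_b[i] must come from refinements that used the
    identical i-th test set. Returns (mean difference, sample standard
    deviation of the differences); the spread is 0.0 for a single pair.
    """
    if len(rfree_a) != len(rfree_b):
        raise ValueError("both protocols need one Rfree per shared test set")
    if not rfree_a:
        raise ValueError("at least one Rfree pair is required")
    diffs = [a - b for a, b in zip(rfree_a, rfree_b)]
    spread = stdev(diffs) if len(diffs) > 1 else 0.0
    return mean(diffs), spread


# Placeholder numbers for illustration only (assumption, not real data).
protocol_a = [0.30, 0.32, 0.31]
protocol_b = [0.29, 0.30, 0.30]
mean_diff, spread = paired_rfree_difference(protocol_a, protocol_b)
print(f"mean Rfree(A) - Rfree(B) = {mean_diff:.4f}, spread = {spread:.4f}")
```

A positive mean difference would favour protocol B under these assumptions. Whether the difference is meaningful still depends on the number of test reflections behind each Rfree value.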
[ccp4bb] Cross-validation when test set is miniscule
Hi everyone,

Right now we have one of those very difficult Rfree situations where it's impossible to generate a single meaningful Rfree set. Since we're in a bit of a hurry with this structure it would be good if someone could point me in the right direction.

We have crystals with 1542 non-H atoms in the asymmetric unit that diffract to only 3.6 Å in P65, which gives us a whopping 2300 reflections in total. 5% of this is only about 100 reflections. Luckily the protein is only a single point mutation of a wild type that has been solved to much better resolution, so we know what it should look like and I simply want to investigate the effect of different levels of conservatism in the refinement, e.g. NCS in xyz and B, group B-factors, reference model, Ramachandran restraints etc. However, since the quality criterion for this is Rfree I'm not able to do this.

I believe the correct approach is k-fold statistical cross-validation, but can someone remind me of the correct way to do this? I've done a bit of Googling without finding anything very helpful.

Thanks
Derek

Derek Logan
Associate Professor
Dept. of Biochemistry and Structural Biology
Centre for Molecular Protein Science
Lund University, Box 124, 221 00 Lund, Sweden
tel: +46 46 222 1443
mob: +46 76 8585 707
www.cmps.lu.se
www.maxlab.lu.se/crystal
www.saromics.com
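For readers unfamiliar with the k-fold scheme the question refers to, the partitioning step can be sketched as follows. This is a minimal Python illustration, not the procedure of any particular refinement program. The function names are hypothetical, the choice of k = 10 is an assumption, and reflections are treated as plain indices 0..N-1 rather than read from a real reflection file.

```python
import random


def assign_kfold_flags(n_reflections, k, seed=0):
    """Return one fold index (0..k-1) per reflection.

    Every reflection belongs to exactly one fold, and fold sizes differ
    by at most one. The seed makes the assignment reproducible.
    """
    if not 1 < k <= n_reflections:
        raise ValueError("k must be between 2 and the number of reflections")
    flags = [i % k for i in range(n_reflections)]
    random.Random(seed).shuffle(flags)
    return flags


def work_and_test_indices(flags, fold):
    """Split reflection indices into work and test sets for one fold."""
    work = [i for i, f in enumerate(flags) if f != fold]
    test = [i for i, f in enumerate(flags) if f == fold]
    return work, test


# About 2300 reflections, as in the post; k = 10 is an assumed choice.
flags = assign_kfold_flags(2300, k=10)
for fold in range(10):
    work, test = work_and_test_indices(flags, fold)
    # Hypothetical next step (not shown): run a separate refinement against
    # `work` for this fold, then compute Rfree on `test`.
    print(f"fold {fold}: {len(work)} work, {len(test)} test reflections")
```

Because each reflection serves as a test reflection in exactly one fold, the k per-fold Rfree values together cover the whole data set, at the cost of k separate refinement runs.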
[ccp4bb] Cross-validation when test set is miniscule
Dear Derek,

I suggest you not use cross-validation at all. With small data sets, refinement with cross-validation is very unstable and depends on the choice of the test set. We explained why and suggested using an alternative target function, which can use all data in refinement:

J. Praznikar and D. Turk. Free kick instead of cross-validation in maximum-likelihood refinement of macromolecular crystal structures. Acta Cryst. (2014). D70, 3124-3134. doi:10.1107/S1399004714021336 http://dx.doi.org/10.1107/S1399004714021336 (online 22 November 2014)

Synopsis: The maximum-likelihood free-kick target, which calculates model error estimates from the work set and a randomly displaced model, proved superior in the accuracy and consistency of refinement of crystal structures compared with the maximum-likelihood cross-validation target, which calculates error estimates from the test set and the unperturbed model.

best regards,
dusan

Dr. Dusan Turk, Prof.
Head of Structural Biology Group http://bio.ijs.si/sbl/
Head of Centre for Protein and Structure Production
Centre of excellence for Integrated Approaches in Chemistry and Biology of Proteins, Scientific Director http://www.cipkebip.org/
Professor of Structural Biology at IPS Jozef Stefan
Dept. of Biochem. Mol. Struct. Biol.
Jozef Stefan Institute
Jamova 39, 1 000 Ljubljana, Slovenia
e-mail: dusan.t...@ijs.si
phone: +386 1 477 3857
fax: +386 1 477 3984
Skype: dusan.turk (voice over internet: www.skype.com)