Re: [ccp4bb] Cross-validation when test set is miniscule

2014-12-20 Thread Axel Brunger
Dear Derek,

I suggest you try 10% for the test set. You should still be able to judge the
effect of various restraints (or constraints) as long as you keep the same test
set. If you switch test sets and re-refine, Rfree might change by as much as 2%
for a test set consisting of 200 reflections; see Fig. 6 in A. T. Brunger,
Free R value: Cross-validation in crystallography, Methods in Enzym. 277,
366-396 (1997). However, using the same test set may allow you to judge the
best restraints protocol or weights.
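
As a rough illustration of why a small test set makes Rfree noisy, the sketch below uses a commonly quoted rule of thumb: the standard deviation of Rfree is roughly Rfree / sqrt(n) for n test reflections. The rule of thumb, the function name `rfree_sigma` and the example Rfree of 0.30 are assumptions for illustration, not values taken from this thread.

```python
import math


def rfree_sigma(rfree: float, n_test: int) -> float:
    """Approximate standard deviation of Rfree for n_test test reflections.

    Assumes the rule of thumb sigma ~ Rfree / sqrt(n_test).
    """
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    return rfree / math.sqrt(n_test)


# Hypothetical Rfree of 0.30, evaluated for several test-set sizes.
for n in (100, 200, 230):
    print(f"n_test={n:4d}  sigma(Rfree) ~ {rfree_sigma(0.30, n):.3f}")
```

Under these assumptions, 200 test reflections give a spread of about 0.02, in line with the roughly 2% quoted above. The estimate describes variation between different test sets, which is why comparisons made with one fixed test set can still be informative.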

Axel

PS: The Methods in Enzym. review also briefly discusses complete 
cross-validation.

PPS: For refinement at very low resolution, see also:

A.T. Brunger, P.D. Adams, P. Fromme, R. Fromme, M. Levitt, G.F. Schroder.
Improving the accuracy of macromolecular structure refinement at 7 Å
resolution. Structure 20, 957-966 (2012).


 On Dec 20, 2014, at 1:05 AM, CCP4BB automatic digest system 
 lists...@jiscmail.ac.uk wrote:
 
 Date:Fri, 19 Dec 2014 11:18:37 +
 From:Derek Logan derek.lo...@biochemistry.lu.se
 Subject: Cross-validation when test set is miniscule
 
 Hi everyone,
 
 Right now we have one of those very difficult Rfree situations where it's 
 impossible to generate a single meaningful Rfree set. Since we're in a bit of 
 a hurry with this structure it would be good if someone could point me in the 
 right direction. We have crystals with 1542 non-H atoms in the asymmetric 
 unit that diffract to only 3.6 Å in P65, which gives us a whopping 2300 
 reflections in total. 5% of this is only about 100 reflections. Luckily the 
 protein is only a single point mutation of a wild type that has been solved 
 to much better resolution, so we know what it should look like and I simply 
 want to investigate the effect of different levels of conservatism in the 
 refinement, e.g. NCS in xyz and B, group B-factors, reference model, 
 Ramachandran restraints etc. However since the quality criterion for this is 
 Rfree I'm not able to do this.
 
 I believe the correct approach is k-fold statistical cross-validation, but 
 can someone remind me of the correct way to do this? I've done a bit of 
 Googling without finding anything very helpful.
 
 Thanks
 Derek
 
 Derek Logan                                     tel: +46 46 222 1443
 Associate Professor                             mob: +46 76 8585 707
 Dept. of Biochemistry and Structural Biology    www.cmps.lu.se
 Centre for Molecular Protein Science            www.maxlab.lu.se/crystal
 Lund University, Box 124, 221 00 Lund, Sweden   www.saromics.com

Axel T. Brunger
Investigator,  Howard Hughes Medical Institute
Professor and Chair, Dept. of Molecular and Cellular Physiology
Stanford University

Web:http://atbweb.stanford.edu
Email:  brun...@stanford.edu  
Phone:  +1 650-736-1031


[ccp4bb] Cross-validation when test set is miniscule

2014-12-19 Thread Derek Logan
Hi everyone,

Right now we have one of those very difficult Rfree situations where it's 
impossible to generate a single meaningful Rfree set. Since we're in a bit of a 
hurry with this structure it would be good if someone could point me in the 
right direction. We have crystals with 1542 non-H atoms in the asymmetric unit 
that diffract to only 3.6 Å in P65, which gives us a whopping 2300 reflections 
in total. 5% of this is only about 100 reflections. Luckily the protein is only 
a single point mutation of a wild type that has been solved to much better 
resolution, so we know what it should look like and I simply want to 
investigate the effect of different levels of conservatism in the refinement, 
e.g. NCS in xyz and B, group B-factors, reference model, Ramachandran 
restraints etc. However since the quality criterion for this is Rfree I'm not 
able to do this.
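
The numbers in the paragraph above can be checked with a few lines of arithmetic. This is a minimal sketch: the helper name `holdout_size` is hypothetical, and the count of four parameters per atom (x, y, z and one isotropic B) is an assumption for unrestrained individual refinement, not something stated in the post.

```python
N_REFLECTIONS = 2300   # total reflections quoted in the post
N_ATOMS = 1542         # non-H atoms in the asymmetric unit


def holdout_size(n_reflections: int, percent: int) -> int:
    """Number of reflections in a test set holding `percent` % of the data."""
    return n_reflections * percent // 100


for percent in (5, 10):
    size = holdout_size(N_REFLECTIONS, percent)
    print(f"{percent:2d}% test set: {size} reflections")

# Assumption (not from the post): four parameters per atom (x, y, z, B_iso).
n_params = 4 * N_ATOMS
ratio = N_REFLECTIONS / n_params
print(f"parameters: {n_params}, reflections per parameter: {ratio:.2f}")
```

With fewer than one reflection per parameter under this assumption, the refinement relies heavily on restraints, which is why the choice of restraint scheme matters so much here.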

I believe the correct approach is k-fold statistical cross-validation, but can 
someone remind me of the correct way to do this? I've done a bit of Googling 
without finding anything very helpful.
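
One way to picture k-fold cross-validation for reflections is sketched below: the data are split into k disjoint test sets, the model is refined k times with a different set held out each time, and the held-out residuals are pooled into a single Rfree. This is only an illustration under stated assumptions. The names `kfold_test_sets`, `cross_validated_rfree` and `refine_and_predict` are hypothetical, the refinement step is a placeholder callback rather than a call into any real program, and the returned |Fcalc| values are assumed to be already scaled to |Fobs|.

```python
import random
from typing import Callable, List, Sequence


def kfold_test_sets(n_reflections: int, k: int, seed: int = 0) -> List[List[int]]:
    """Split reflection indices 0..n-1 into k disjoint, near-equal test sets."""
    if not 2 <= k <= n_reflections:
        raise ValueError("k must be between 2 and n_reflections")
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    return [sorted(indices[i::k]) for i in range(k)]


def cross_validated_rfree(
    f_obs: Sequence[float],
    refine_and_predict: Callable[[List[int], List[int]], Sequence[float]],
    k: int,
    seed: int = 0,
) -> float:
    """Pooled Rfree: every reflection is predicted by a model refined without it.

    refine_and_predict(work, test) is a hypothetical callback that refines
    against the work indices and returns scaled |Fcalc| for the test indices.
    """
    numerator = 0.0
    denominator = 0.0
    for test in kfold_test_sets(len(f_obs), k, seed):
        held_out = set(test)
        work = [i for i in range(len(f_obs)) if i not in held_out]
        f_calc = refine_and_predict(work, test)
        numerator += sum(abs(f_obs[i] - fc) for i, fc in zip(test, f_calc))
        denominator += sum(f_obs[i] for i in test)
    return numerator / denominator


# Example split for the numbers in this post: ten test sets of 230 reflections.
folds = kfold_test_sets(2300, 10)
print([len(fold) for fold in folds])
```

Each of the k refinements would need to start from a model that has not already been refined against its test set, otherwise the held-out reflections are no longer independent. The same fixed split can then be reused when comparing restraint protocols.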

Thanks
Derek

Derek Logan                                     tel: +46 46 222 1443
Associate Professor                             mob: +46 76 8585 707
Dept. of Biochemistry and Structural Biology    www.cmps.lu.se
Centre for Molecular Protein Science            www.maxlab.lu.se/crystal
Lund University, Box 124, 221 00 Lund, Sweden   www.saromics.com









[ccp4bb] Cross-validation when test set is miniscule

2014-12-19 Thread dusan turk
Dear Derek,

I suggest you do not use cross-validation at all. With small data sets,
refinement with cross-validation is very unstable and depends on the choice of
the TEST set. We explained why and suggested an alternative target function
that can use all data in refinement.

J. Praznikar and D. Turk. Free kick instead of cross-validation in
maximum-likelihood refinement of macromolecular crystal structures.
Acta Cryst. (2014). D70, 3124-3134. doi:10.1107/S1399004714021336
http://dx.doi.org/10.1107/S1399004714021336

Synopsis: The maximum-likelihood free-kick target, which calculates model error 
estimates from the work set and a randomly displaced model, proved superior in 
the accuracy and consistency of refinement of crystal structures compared with 
the maximum-likelihood cross-validation target, which calculates error 
estimates from the test set and the unperturbed model.

Online 22 November 2014

best regards,
dusan



Dr. Dusan Turk, Prof.
Head of Structural Biology Group http://bio.ijs.si/sbl/
Head of Centre for Protein  and Structure Production
Centre of excellence for Integrated Approaches in Chemistry and Biology of 
Proteins, Scientific Director
http://www.cipkebip.org/
Professor of Structural Biology at IPS Jozef Stefan
e-mail: dusan.t...@ijs.si
phone: +386 1 477 3857   Dept. of Biochem. Mol. Struct. Biol.
fax:   +386 1 477 3984   Jozef Stefan Institute
Jamova 39, 1 000 Ljubljana,Slovenia
Skype: dusan.turk (voice over internet: www.skype.com)