Hi Ed,
 

> This is a follow-up (or a digression) to James comparing the test set to
> missing reflections. I also heard this issue mentioned before but was
> always too lazy to actually pursue it.
> 
> So.
> 
> The role of the test set is to prevent overfitting. Let's say I have
> the final model and I monitored the Rfree every step of the way and can
> conclude that there is no overfitting. Should I do the final refinement
> against the complete dataset?
> 
> IMCO, I absolutely should. The test set reflections contain
> information, and the "final" model is actually biased towards the
> working set. Refining using all the data can only improve the accuracy
> of the model, if only slightly.
Hmm, if your R-free set is small, the added value will also be small. If it is 
relatively big, then your previously established optimal weights may no longer 
be optimal. A more elegant thing to do would be to refine the model with, say, 
20 different 5% R-free sets, deposit the ensemble, and report the average 
R(-free) plus a standard deviation. AFAIK, this is what the R-free set numbers 
that CCP4's FREERFLAG generates are for. Of course, in that case you should do 
enough refinement (and perhaps rebuilding) to make sure each R-free set is 
free. 
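
For what it's worth, the bookkeeping behind the "20 different 5% sets" idea 
could look roughly like the sketch below. It is only an illustration, not 
FREERFLAG's actual algorithm: the function names (assign_free_flags, 
working_and_test, summarize_rfree) are made up, the flag numbering 0..19 is an 
assumption, and the R-free values would have to come from your own 20 
refinement runs.

```python
import random
import statistics


def assign_free_flags(n_reflections, n_sets=20, seed=0):
    """Give every reflection a flag in 0..n_sets-1, in near-equal numbers."""
    rng = random.Random(seed)
    flags = [i % n_sets for i in range(n_reflections)]
    rng.shuffle(flags)  # randomise which reflection ends up in which set
    return flags


def working_and_test(flags, test_flag):
    """Split reflection indices into working and test sets for one flag."""
    work = [i for i, f in enumerate(flags) if f != test_flag]
    test = [i for i, f in enumerate(flags) if f == test_flag]
    return work, test


def summarize_rfree(rfree_values):
    """Mean and sample standard deviation over the independent refinements."""
    return statistics.mean(rfree_values), statistics.stdev(rfree_values)
```

With 20 flags each test set holds about 5% of the reflections, so every 
reflection is "free" in exactly one of the 20 refinements, and 
summarize_rfree() gives the average R-free plus standard deviation to report.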

> The second question is practical. Let's say I want to deposit the
> results of the refinement against the full dataset as my final model.
> Should I not report the Rfree and instead insert a remark explaining the
> situation? If I report the Rfree prior to the test set removal, it is
> certain that every validation tool will report a mismatch. It does not
> seem that the PDB has a mechanism to deal with this.
The deposited R-free sets in the PDB are quite frequently 'unfree' or the wrong 
set was deposited (checking this is one of the recommendations in the VTF 
report in Structure). So at the moment you would probably get away with 
depositing an unfree R-free set ;)
 
Cheers,
Robbie
 
 
> 
> Cheers,
> 
> Ed.
> 
> 
> 
> -- 
> Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
> Julian, King of Lemurs
                                          
