[ccp4bb] should the final model be refined against full datset

Ed Pozharski Fri, 14 Oct 2011 12:53:06 -0700

This is a follow up (or a digression) to James comparing test set to
missing reflections.  I also heard this issue mentioned before but was
always too lazy to actually pursue it.


So.

The role of the test set is to prevent overfitting.  Let's say I have
the final model and I monitored the Rfree every step of the way and can
conclude that there is no overfitting.  Should I do the final refinement
against complete dataset?

IMCO, I absolutely should.  The test set reflections contain
information, and the "final" model is actually biased towards the
working set.  Refining using all the data can only improve the accuracy
of the model, if only slightly.

The second question is practical.  Let's say I want to deposit the
results of the refinement against the full dataset as my final model.
Should I not report the Rfree and instead insert a remark explaining the
situation?  If I report the Rfree prior to the test set removal, it is
certain that every validation tool will report a mismatch.  It does not
seem that the PDB has a mechanism to deal with this.

Cheers,

Ed.



-- 
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
                                                Julian, King of Lemurs

[ccp4bb] should the final model be refined against full datset

Reply via email to