For those who have strong opinions on what data should be deposited...

The IUCr is just starting a serious discussion of this subject. Two
committees, the "Data Deposition Working Group", led by John Helliwell,
and the Commission on Biological Macromolecules (chaired by Xiao-Dong Su)
are working on this.

Two key issues are (1) the feasibility and importance of depositing raw
images and (2) the deposition of sufficient information to fully reproduce
the crystallographic analysis.

I am on both committees and would be happy to hear your ideas (off-list). 
I am sure the other members of the committees would welcome your thoughts
as well.

-Tom T

Tom Terwilliger
terwilli...@lanl.gov


>> This is a follow-up (or a digression) to James's comparison of the test
>> set to missing reflections.  I have heard this issue mentioned before but
>> was always too lazy to actually pursue it.
>>
>> So.
>>
>> The role of the test set is to prevent overfitting.  Let's say I have
>> the final model and I monitored the Rfree every step of the way and can
>> conclude that there is no overfitting.  Should I do the final refinement
>> against the complete dataset?
>>
>> IMCO, I absolutely should.  The test set reflections contain
>> information, and the "final" model is actually biased towards the
>> working set.  Refining using all the data can only improve the accuracy
>> of the model, if only slightly.
>>
>> The second question is practical.  Let's say I want to deposit the
>> results of the refinement against the full dataset as my final model.
>> Should I not report the Rfree and instead insert a remark explaining the
>> situation?  If I report the Rfree prior to the test set removal, it is
>> certain that every validation tool will report a mismatch.  It does not
>> seem that the PDB has a mechanism to deal with this.
>>
>> Cheers,
>>
>> Ed.
>>
>>
>>
>> --
>> Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
>>                                                 Julian, King of Lemurs
>>
