> Selecting a test set that minimizes Rfree is so wrong on so many levels.
> Unless, of course, the only thing I know about Rfree is that it is the
> magic number that I need to make small by all means necessary.
By using a simple genetic algorithm, I managed to get Rfree for a well-refined model as low as 14.6% and as high as 19.1%. The dataset is not too small (~40,000 reflections in all, with a standard-sized 5% test set), so you can get a spread as wide as 4.5% even with a not-so-small dataset. Only ~1/3 of the test reflections need to be exchanged to achieve this.

What's curious is that, contrary to my expectations, the test set remains well distributed across resolution shells after this awful "optimization", and <F/sigF> for the working set and the test set remain close. I am not sure how to judge which model is actually better, but it is noteworthy that the FOM gets worse for *both* upward and downward "optimization" of the test set.

--
After much deep and profound brain things inside my head, I have decided to thank you for bringing peace to our home.

Julian, King of Lemurs
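
P.S. For anyone curious what "a simple genetic algorithm" means here, the search is roughly the following. This is a toy sketch of my own, not production code: the function names are mine, and `score` is a stand-in for the real objective (which would be Rfree after re-refining against the complementary working set, something this snippet obviously does not do).

```python
import random

def evolve_test_set(n_refl, test_frac, score, generations=200, pop=20, seed=0):
    """Toy GA: evolve a test-set selection that minimizes score(test_set).

    score() is a stand-in for Rfree; a real run would re-refine the model
    against the working set for every candidate, which is the expensive part.
    To maximize Rfree instead, pass the negated score.
    """
    rng = random.Random(seed)
    n_test = max(1, int(n_refl * test_frac))
    all_idx = list(range(n_refl))

    def random_set():
        return frozenset(rng.sample(all_idx, n_test))

    def mutate(s):
        # Swap one test reflection for one working reflection.
        s = set(s)
        s.remove(rng.choice(sorted(s)))
        pool = [i for i in all_idx if i not in s]
        s.add(rng.choice(pool))
        return frozenset(s)

    population = [random_set() for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=score)
        survivors = population[:pop // 2]   # keep the better half
        population = survivors + [mutate(rng.choice(survivors))
                                  for _ in range(pop - len(survivors))]
    return min(population, key=score)
```

With a synthetic per-reflection residual as the score, the evolved set's mean residual drops well below that of a random 5% selection, which is exactly the pathology being complained about: the "optimized" Rfree says more about which reflections were cherry-picked than about model quality.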