I would mostly agree with what Dale said, and point out that it applies as well to the SigmaA estimation that is a necessary part of ML refinement. When we were developing the ML targets that went into CNS, we did a number of tests to see how many cross-validation reflections were needed. The fewest we could get away with, for relatively poor starting models, was about 500-1000. I would only recommend as few as 500 if you have a small cell or low resolution (where you might only have 5000 reflections in total) and can't afford to give up more. If I had a large cell or high resolution, I would probably prefer to take up to 2000 reflections for cross-validation, because the precision of the SigmaA estimates would be improved at little cost. But there's certainly no need to take a fixed proportion of the data regardless of the total number of reflections.
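To make that concrete, here is a minimal Python sketch of such a rule of thumb. The 500-2000 band is the one discussed above; the 10% starting fraction is purely an illustrative placeholder, not part of the recommendation.

    def test_set_size(n_total, lo=500, hi=2000, frac=0.10):
        # Clamp a nominal fraction of the data to the 500-2000 reflection
        # band discussed above; frac=0.10 is an arbitrary illustrative
        # choice, not a rule from the tests.
        return int(min(hi, max(lo, round(frac * n_total))))

    print(test_set_size(5000))     # small cell / low resolution  -> 500
    print(test_set_size(100000))   # large cell / high resolution -> 2000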

The precision of the likelihood-based SigmaA estimates depends not only on the number of reflections but also on the quality of the model: as the model gets better and the true SigmaA values increase, the estimates become more precise. So one could probably afford to reduce the size of the cross-validation set towards the end of refinement.
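Both effects are easy to see in a toy calculation. This is just an illustrative numpy/scipy sketch, assuming the standard acentric error model (E_obs = sigmaA*E_calc plus complex noise of variance 1 - sigmaA^2) and not the actual CNS machinery: draw normalized amplitudes, re-estimate SigmaA by maximum likelihood from samples of different sizes, and look at the spread of the estimates.

    import numpy as np
    from scipy.special import i0e
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)

    def simulate(n, sigma_a):
        # Wilson statistics for acentric reflections: E ~ complex normal, <|E|^2> = 1
        e_calc = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)
        noise = (rng.normal(size=n) + 1j * rng.normal(size=n)) * np.sqrt((1 - sigma_a**2) / 2)
        e_obs = sigma_a * e_calc + noise
        return np.abs(e_obs), np.abs(e_calc)

    def nll(s, eo, ec):
        # Negative log of the acentric Rice distribution P(Eo; Ec, sigmaA);
        # i0e(x) = exp(-x)*I0(x) keeps the Bessel term numerically stable.
        v = 1.0 - s * s
        x = 2.0 * s * eo * ec / v
        return -np.sum(np.log(2.0 * eo / v) - (eo**2 + (s * ec)**2) / v + np.log(i0e(x)) + x)

    def estimate(eo, ec):
        res = minimize_scalar(nll, args=(eo, ec), bounds=(1e-4, 0.9999), method="bounded")
        return res.x

    for true_sa in (0.5, 0.9):        # poorer vs better model
        for n in (500, 2000):         # size of the cross-validation set
            ests = [estimate(*simulate(n, true_sa)) for _ in range(200)]
            print(f"true sigmaA={true_sa}, n={n}: sd of estimate = {np.std(ests):.4f}")

The spread of the estimates shrinks both as n grows and as the true SigmaA grows, which is the point made above.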

Which brings me to one of the things Dale said, namely that his tests showed no correlation between the precision of Rfree and the true Rfree. He qualified this by saying that his models' Rfree values only ranged from 35% to 55%; if he had looked at a wider range, I think he would have found a strong correlation. I believe Ian Tickle showed this in a paper he published a few years ago.
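The expected trend is easy to reproduce in a toy simulation, using the same acentric error model as in the sketch above and skipping Fcalc scaling for simplicity: as the true R grows, so does the scatter of R over random 1000-reflection test sets.

    import numpy as np

    rng = np.random.default_rng(1)

    def r_factor(fo, fc):
        return np.sum(np.abs(fo - fc)) / np.sum(fo)

    n_pool, n_test, n_trials = 100_000, 1000, 400
    for s in (0.95, 0.8, 0.5):   # good, fair, poor model agreement
        ec = (rng.normal(size=n_pool) + 1j * rng.normal(size=n_pool)) / np.sqrt(2)
        noise = (rng.normal(size=n_pool) + 1j * rng.normal(size=n_pool)) * np.sqrt((1 - s * s) / 2)
        fo, fc = np.abs(s * ec + noise), np.abs(ec)
        rs = []
        for _ in range(n_trials):
            # draw a random test set of fixed size and compute its R factor
            idx = rng.choice(n_pool, n_test, replace=False)
            rs.append(r_factor(fo[idx], fc[idx]))
        print(f"model quality sigmaA={s}: mean R={np.mean(rs):.3f}, sd of R={np.std(rs):.4f}")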

Randy Read
