Hi Pavel,

 
>       I agree that you bias R-free after the real-space refinement
> 
> 
> well, ok, isn't it enough to realize that this is bad and should be avoided?
> (I guess we all know we should never bias R-free!)
> 
> 
>       My point was that we normally do not calculate R-free after
> real-space refinement,
> 
> 
> It's not about whether you compute something or not. It's about whether
> you expose free-R reflections to refinement and to the current model, given
> that they should never see any refinement, model, or optimization under
> any circumstances. Otherwise they immediately become non-free.
Yet we do optimise weights against R-free, both in phenix.refine (from the 
manual: "phenix.refine uses automatic procedure to determine the weights 
between X-ray target and stereochemistry or ADP restraints. To optimize these 
weights (that is to find those resulting in lowest Rfree factors)") and in 
PDB_REDO (well, against LL-free, but that is the same test set). And we do not 
go so far as to use something like an R-sleep set to keep a set of reflections 
truly independent. Here we are introducing bias as well, but we seem to have 
accepted that this is not a big problem. Or we choose to ignore it.
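
To make "optimising against R-free" concrete, the procedure boils down to a 
grid search like the sketch below (plain Python; refine() and r_factor() are 
hypothetical stand-ins, not the actual phenix.refine or PDB_REDO code):

    def optimise_weight(model, data, work_set, free_set,
                        trial_weights=(0.5, 1.0, 2.0, 5.0, 10.0)):
        # Hypothetical sketch: try several X-ray/geometry weights, refine
        # against the working set only, and keep the weight that gives the
        # lowest R-free.
        best = None
        for w in trial_weights:
            trial = refine(model.copy(), data, work_set, xray_weight=w)
            r_free = r_factor(trial, data, free_set)
            if best is None or r_free < best[0]:
                best = (r_free, w, trial)
        # The chosen weight (and hence the final model) now depends on the
        # free reflections -- exactly the mild bias discussed above.
        return best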

 
>       but after reciprocal space refinement. Here the bias is removed
> again.
> 
> 
> Hm... How do you know this? I guess it requires a great deal of effort to
> prove this!
We use this whenever we are forced to use a new test set, e.g. when doing 
molecular replacement into an isomorphous cell without access to the original 
test set (this has been discussed on the BB a few times) or when doing k-fold 
cross validation. Whenever a new test set is chosen, enough model tuning (i.e. 
refinement) is needed to make the model independent of the test set (you may 
need to perturb the model somewhat by resetting the B-factors or by giving the 
coordinates a nudge). At refinement convergence the maximum amount of tuning is 
attained, and at true convergence the starting point doesn't matter (although 
local minima complicate matters).

With k-fold cross validation we can actually show that there is no obvious 
bias: say we build and refine a model with test set 0 left out; then any 
further refinement with each of the k test sets left out in turn should start 
out with a substantially biased R-free for any set i != 0, that is, 
R-free(i) < R-free(0). This is indeed what we see.
Now if the refinements converge and there is still bias, then we would still 
get R-free(i) < R-free(0) for all sets i != 0. This is not what we see (at 
least not for most PDB_REDO test cases). Of course, these are examples of the 
worst sort of test set bias. The bias introduced in real-space refinement is 
more subtle. At least that is what I assume, but this is something you said 
you could quantify. A proper estimate of the problem (rather than a principle 
we don't quite adhere to all the time anyway) would really help.
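
The k-fold check itself is just a comparison of the k R-free values; a 
minimal sketch (refine_to_convergence() and f_calc_from() are hypothetical 
stand-ins, the R-factor formula is the standard one):

    import numpy as np

    def r_factor(f_obs, f_calc, sel):
        # R = sum( | |Fobs| - |Fcalc| | ) / sum( |Fobs| ) over selection sel
        return (np.abs(f_obs[sel] - np.abs(f_calc[sel])).sum()
                / f_obs[sel].sum())

    def kfold_bias_check(model0, f_obs, set_id, k=10):
        # model0 was built and refined with test set 0 left out.
        r_free = {}
        for i in range(k):
            work = set_id != i  # leave set i out this time
            m = refine_to_convergence(model0.copy(), f_obs, work)
            r_free[i] = r_factor(f_obs, f_calc_from(m), set_id == i)
        # Residual bias would show up as r_free[i] < r_free[0] for every
        # i != 0; for most PDB_REDO test cases it does not.
        return r_free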

Cheers,
Robbie  
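
P.S. To make the three map options quoted below concrete: they amount to 
different choices for the free (and missing) Fourier coefficients. A purely 
illustrative numpy sketch, not code from any actual package:

    import numpy as np

    def map_amplitudes(f_obs, f_calc, free_mask, strategy):
        # f_obs: observed amplitudes (NaN where unmeasured);
        # f_calc: complex structure factors from the current model.
        f_map = np.where(np.isnan(f_obs), 0.0, f_obs)
        if strategy == "leave_out":        # option 1: free terms absent
            f_map[free_mask] = 0.0
        elif strategy == "free_to_fcalc":  # option 2: free terms = |Fcalc|
            f_map[free_mask] = np.abs(f_calc[free_mask])
        elif strategy == "use_all":        # option 3: keep observed free terms
            pass
        # Pavel's suggestion: treat the free set as in option 1, but fill
        # the genuinely unmeasured terms (the NaNs above) from the model or
        # elsewhere -- without ever exposing the free reflections.
        return f_map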

> 
> 
>       In practical terms we have to choose the best option:
>       1) Refine against maps with missing reflections and the possibility of
> artefacts that lead to suboptimal results. This keeps R-free unbiased, even
> directly after real-space refinement.
>       2) Refine against maps with test reflections set to Fcalc. This biases
> the map towards the current model, but should have fewer artefacts. This too
> keeps R-free unbiased directly after real-space refinement.
>       3) Refine against maps with all reflections. This should give the best
> fitting results, but does introduce bias to the test set. However, this bias
> 
> 
> These are all valid points. However, they do not advocate for biasing R-free.
> You can always fill in missing reflections with something else (different from
> your genuine free-R set of reflections) and thus address two issues at once:
> eliminate missing terms and not use free-R reflections for this!
> 
> Pavel
