Hi Ian
(Yes, your technical point about semantics is correct, I meant
over-fitting.)
To pin down your points, though, you're saying:
1) Don't use Rfree, instead look at LLfree or Hamilton Rfree.
2) Compare only the final values at convergence when choosing
between different parametrizations (=models).
Point #1 - fair point; the reason Rfree is popular, though, is that
it is a /relative/ metric, i.e. by now we have a sense of what "good"
is. So I predict an uphill fight for LLfree.
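For concreteness, a minimal sketch of the two free-set metrics (function names are mine; the Gaussian form of LLfree is a deliberate simplification, since real ML refinement targets use Rice/Wilson-type distributions):

```python
import math

def r_free(f_obs, f_calc):
    # Conventional Rfree over the test set: every reflection counts
    # equally, regardless of how well it was measured.
    return sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)

def ll_free(f_obs, f_calc, sigmas):
    # Sketch of a free log-likelihood: each reflection's contribution
    # is weighted by its measurement error sigma.  A plain Gaussian is
    # assumed here purely for illustration.
    return -0.5 * sum(((fo - fc) / s) ** 2 + math.log(2 * math.pi * s * s)
                      for fo, fc, s in zip(f_obs, f_calc, sigmas))
```

Note that r_free is dimensionless and so directly comparable across structures (the "relative" property above), whereas the absolute value of ll_free depends on the data set itself.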
Point #2 would hold if we routinely let our refinements run to
convergence; it seems common, though, to run "10 cycles" or "50 cycles"
instead and draw conclusions from the behaviour of the metrics. Are
those conclusions really much different from the comparison-at-convergence
you advocate, which in practice is often less convenient?
Cheers
phx
> Isn't this the purpose of cross-validation, to use an independent
> measure to judge when the refinement is /not/ producing the "best" model?
If the value of your chosen X-validation metric at convergence
indicates a problem with the model, parameterisation, weighting etc.
then clearly the target function is not in fact the final word: the
solution is to fix whatever is wrong and redo the refinement,
until you get a satisfactory value for your metric.
> This may be true; but as it is independent of refinement, is it
> not nevertheless the only measure I should trust?
No, there are several possible functions of the test set (e.g. Hamilton
Rfree, LLfree) that you could use, all potentially equally valid
X-validation metrics. I would have more faith in a function such as
LLfree in which the contributions of the reflections are at least
weighted according to their reliability. It just seems bizarre that
important decisions are being based on measurements that may have
gross errors without taking those errors into account.
> Or maybe what you intended to say: only trust refinements for
> which Rfree decreases monotonically, because only then do you have
> a valid choice of parameters.
No, as I indicated above, what Rfree does before convergence is
attained is totally meaningless; only the value obtained _at_
convergence is meaningful as an X-validation statistic. We wouldn't be
having this discussion if the refinement program omitted the
meaningless intermediate values and only printed out the final Rfree
or LLfree. I'm saying that Rfree is not the best X-validation metric
because poorly measured data are not properly weighted: this is what
the Acta paper I referenced is saying.
Cheers
-- Ian