> This is exactly the same as your March 19 post to which I replied on
> March 22. Did you see my reply? In it I tried to clarify the ROC concept
> in order to rectify what appeared to be your lack of understanding.
You are right. My newsreader decided to post two old messages again; I
don't know what happened...

> Please read my reply in Google Groups and respond if you disagree or
> don't understand.

Where we disagree is that you say an ROC is created by changing a single
parameter, while I don't think that is necessarily true. What happens if
that one parameter is coded? Suppose x is an integer and y is a fraction.
If I write z = x + y and create an ROC by varying z, I am actually doing
the same as when I change both x and y. Another example is when the
parameter is an index into a list of permutations, used to select the
subset of features the algorithm works with. (I have put a small sketch
of what I mean further down.)

Of course, to see the influence of a parameter on the performance of the
algorithm, you would change just one parameter (and preferably your
algorithm is such that the results do not change 'chaotically'). However,
what I am interested in (and this is how ROCs are often used in the
medical literature) is how good my algorithm is in terms of sensitivity
and specificity. The thing is, I don't know the prevalence or the costs
of false negatives and false positives, and therefore I don't know the
best sensitivity/specificity trade-off.

>> I am having conceptual problems with cross-validating an ROC.

> Me too. I have never heard of such a concept.

You surely did, between March 19 and 22 :-) (Sorry, I couldn't resist.)

>> The thing is
>> that for me the only reason to draw an ROC is to show the individual
>> fpr/tpr pairs, so one can choose the optimal setting for a specific
>> application (depending on prevalence, cost of FP/FN, etc). So, in fact,
>> the ROC just shows various algorithms, and you choose one that suits
>> you best.

> No. [...]
> The result of changing a single parameter setting should *not* be
> referred to as "another algorithm".

If you restrict the parameter to be a cut-off, you have a point. If the
parameter can have a broader meaning (such as the index mentioned above),
I don't agree.

>> The thing with validation is that it is supposed to be done on the
>> final algorithm, not on some intermediate result.

> According to commonly accepted neural network terminology (see the
> comp.ai.neural-net FAQ), validation is performed on multiple
> intermediate designs in order to pick the "best" one. Testing is then
> performed on the chosen algorithm to quantify its generalization
> performance on independent test data that was not used for design or
> validation.

I'm sorry; according to the FAQ, I am apparently not the only one who
confuses validation and test sets.
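To come back to the coded-parameter point for a moment, here is a rough
sketch of what I mean, in Python. The data, the feature subsets and the
cut-offs are all made up, so please read it as an illustration rather
than as anything from my actual application: varying one index changes
several underlying settings at once, yet the sweep still yields a set of
(fpr, tpr) points, i.e. an ROC.

  # One "parameter" that is really an index into a list of configurations.
  # Each configuration bundles a feature subset and a cut-off, so varying
  # the single index changes several underlying settings at once.

  # Toy data: (feature values, true label).
  samples = [
      ({"a": 0.9, "b": 0.2}, 1),
      ({"a": 0.8, "b": 0.7}, 1),
      ({"a": 0.7, "b": 0.6}, 1),
      ({"a": 0.4, "b": 0.9}, 0),
      ({"a": 0.3, "b": 0.1}, 0),
      ({"a": 0.2, "b": 0.8}, 0),
  ]

  # The "coded" parameter: index -> (feature subset, cut-off).
  configurations = [
      (("a",),     0.5),
      (("a", "b"), 0.6),
      (("b",),     0.4),
      (("a", "b"), 0.8),
  ]

  def classify(features, subset, cutoff):
      # Trivial scorer: mean of the selected features against a cut-off.
      score = sum(features[f] for f in subset) / len(subset)
      return 1 if score >= cutoff else 0

  roc_points = []
  for index, (subset, cutoff) in enumerate(configurations):
      tp = fp = tn = fn = 0
      for features, label in samples:
          prediction = classify(features, subset, cutoff)
          if prediction == 1 and label == 1: tp += 1
          elif prediction == 1 and label == 0: fp += 1
          elif prediction == 0 and label == 0: tn += 1
          else: fn += 1
      roc_points.append((index, fp / (fp + tn), tp / (tp + fn)))

  for index, fpr, tpr in sorted(roc_points, key=lambda p: p[1]):
      print("index", index, "fpr", round(fpr, 2), "tpr", round(tpr, 2))

Whether such a sweep is a sensible thing to do is a separate question; the
point is only that a single parameter need not correspond to a single
underlying setting.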
>> More detailed:
>> Consider algorithm A. It tests a number of algorithms (1..N) and
>> chooses the best one (say number i). Even if algorithm A uses
>> cross-validation to train and test all N algorithms, we cannot say that
>> the error rate of algorithm A is the same as the estimated error rate
>> of algorithm i.

> What you have done is to introduce a novel concept: Estimating the error
> of an algorithm (A) that designs and validates other algorithms (1..N).
> I don't know if you meant to do this or you have misinterpreted what is
> done conventionally. If you meant to do this then you are in uncharted
> land. If A chooses i based on the validation set performance (say 25%)
> and testing shows that i also has the best test set performance (say
> 30%), then the error rate of A is 0.
> This does not appear to be a very useful concept, and is probably not
> what you want.

Yes and no. The concept is indeed estimating the error of algorithm A,
but what you describe sounds more like the 'additional' error introduced
by A. The algorithm I was trying to describe is a wrapper in the sense
that it contains all N algorithms. So when I talk about the performance
of algorithm A, I am talking about the performance of the algorithm i it
chooses. Now, A may not be stable: it might not select the same i each
time you feed it the same dataset. That is why it makes sense to
determine the performance of A instead of the performance of i. (See the
P.S. at the end for a small sketch of what I mean.)

Before going deeper into this: Would you say that the error indicated by
the ROC is the generalization error you would actually expect when you
select the corresponding cut-off value? I would say no: selecting a
cut-off value is, in my eyes, the same as tuning any other parameter of a
classification algorithm. It is therefore just as prone to overfitting,
and the generalization error should be estimated on a separate test set.

>> So, we
>> cross-validate algorithm A: We use a data set to train it (and thus to
>> select i)

> This training data set consists of two subsets: a design data set and an
> independent validation data set.
> However, it is not clear how you are distinguishing algorithm i from
> algorithm j.

What do you mean by algorithm j?

>> Now, if we compare this to the ROC, the ROC is like the outcome of all
>> N algorithms.

> No.
> A ROC summarizes the performance of *one* algorithm w.r.t. varying a
> single parameter.
> You are comparing algorithm i with a ROC that comes from where?

I am not comparing ROCs, or comparing an algorithm with an ROC. However,
that does help to make my point: if algorithm 1 has the higher true
positive rate at small false positive rates and algorithm 2 has the
higher true positive rate at larger false positive rates, which one is
better? It depends on the prevalence and the costs; neither algorithm is
better by itself. By creating a wrapper algorithm that chooses algorithm
1 at small false positive rates and algorithm 2 at larger ones, a
combined ROC arises. Somehow, you seem to disagree with that possibility.

>> Based on the application, one would choose the best algorithm.

> Are you now referring to using "the" ROC to choose the best algorithm or
> are you referring to choosing the algorithm that yields the best ROC?

The first: using 'the' ROC to choose the best setting of the parameter,
resulting in your final algorithm. (Again, I consider setting the cut-off
value to be the same as tuning the other parameters of the algorithm.)

>> Cross-validation is therefore not possible before this selection has
>> been made.

> I'm lost.

I'll try again: Estimating the generalization error is therefore not
possible before this selection has been made.

> I will comment on the alternative you propose in the next to last
> paragraph in a separate post.

Regardless of whether we can sort this out: thanks for your comments!

Koen
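P.S. Since it is easier to show than to describe: below is a rough
sketch, in Python, of what I mean by estimating the performance of the
wrapper A rather than of the chosen algorithm i. Everything in it is made
up for the occasion (two toy 'candidate algorithms', synthetic data,
arbitrary split sizes), so treat it as an outline of the idea and nothing
more.

  import random

  random.seed(0)

  # Synthetic data: (feature_a, feature_b, label).  Feature a is mildly
  # informative, feature b is noise.
  def make_sample():
      a, b = random.random(), random.random()
      label = 1 if a + 0.3 * random.random() > 0.7 else 0
      return (a, b, label)

  data = [make_sample() for _ in range(200)]

  # Two toy stand-ins for the N candidate algorithms A chooses between.
  def algo_1(x):
      return 1 if x[0] >= 0.5 else 0   # thresholds feature a

  def algo_2(x):
      return 1 if x[1] >= 0.5 else 0   # thresholds feature b

  candidates = [algo_1, algo_2]

  def error_rate(algo, dataset):
      return sum(algo(x) != x[2] for x in dataset) / len(dataset)

  # Algorithm A, the wrapper: split its training data into a design part
  # and a validation part, and return whichever candidate validates best.
  # (These toy candidates need no fitting on the design part; real ones
  # would be trained on it.)
  def wrapper_A(train):
      random.shuffle(train)
      n_val = max(1, len(train) // 4)
      validation = train[:n_val]
      return min(candidates, key=lambda algo: error_rate(algo, validation))

  # Cross-validating A itself: every fold reruns the whole selection.
  # Different folds may pick different candidates, which is exactly why
  # the resulting estimate describes A and not any single chosen i.
  def cross_validate_A(data, k=5):
      folds = [data[i::k] for i in range(k)]
      test_errors = []
      for i in range(k):
          test = folds[i]
          train = [x for j, fold in enumerate(folds) if j != i for x in fold]
          chosen = wrapper_A(train)
          test_errors.append(error_rate(chosen, test))
      return sum(test_errors) / k

  print("estimated error of A:", round(cross_validate_A(data), 3))

The point of the outer loop is only that the selection step is repeated
inside every fold, so the averaged test error is a property of the
selection procedure A, not of whichever i happens to win most often.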
