Sorry about that. Someone else tried to call out on the phone line I was
logged in on. What I was trying to say was:
Two different strategies occur to me, both of which might, I suppose, be
implemented separately:
1. I suppose your image is a rectangular array of pixels, so that each
pixel may be thought of as lying at the intersection of a row and a
column of pixels. There are then the individual pixels (all RxC of them)
and three different aggregations: by row, by column, and by the whole
image. It seems to me this would permit an ANOVA-like analysis, using
for dependent variable some suitable error function between the known
label for each pixel and the classifier's label, with sources of
variation representing rows and columns (in neither of which you would
have much interest, I imagine), classifiers (whose main effect is
equivalent to the t-test you mention below), and the interactions of
classifiers with rows and of classifiers with columns (these latter two
representing different levels of aggregation than the whole image).
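A minimal sketch of idea #1, assuming per-pixel 0/1 errors laid out as a
classifier x row x column array; here the columns within each
(classifier, row) cell serve as the ANOVA replicates. All names and the
synthetic data are hypothetical illustrations, not Mark's actual setup:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
R, C = 20, 30                       # rows and columns of pixels

# err[k] is an R x C array of 0/1 errors for classifier k (k = 0, 1),
# simulated here; in practice: err = (predicted_labels != true_labels)
err = np.stack([rng.random((R, C)) < 0.20,
                rng.random((R, C)) < 0.25]).astype(float)

# Two-factor ANOVA: classifier x row, columns as replicates.
grand = err.mean()
ss_total = ((err - grand) ** 2).sum()
ss_clf = R * C * ((err.mean(axis=(1, 2)) - grand) ** 2).sum()
ss_row = 2 * C * ((err.mean(axis=(0, 2)) - grand) ** 2).sum()
cell = err.mean(axis=2)             # (classifier, row) cell means
ss_inter = C * ((cell - err.mean(axis=(1, 2))[:, None]
                      - err.mean(axis=(0, 2))[None, :] + grand) ** 2).sum()
ss_error = ss_total - ss_clf - ss_row - ss_inter

# F test for the classifier main effect (equivalent in spirit to the
# paired t-test on overall accuracy)
df_clf, df_error = 1, 2 * R * (C - 1)
F = (ss_clf / df_clf) / (ss_error / df_error)
p = stats.f.sf(F, df_clf, df_error)
print(F, p)
```

The classifier-by-row interaction sum of squares (ss_inter) is what
captures the "different levels of aggregation" mentioned above; swap the
roles of rows and columns for the other interaction.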
2. Instead of the structural components (rows & columns), you might
use the various labels themselves for a one-way ANOVA partitioning. You
would then get (using the same sort of dependent variable) measures of
classification accuracy for each separate (semantic?) piece of the
image, separately for each classifier; and the interaction between those
pieces and the classifiers, as a measure of systematic differences
between classifiers, again decomposable by the separate pieces.
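A minimal sketch of idea #2, partitioning by the true labels (sky/road/
tree, say) rather than by rows and columns. The label names, error
rates, and data are again hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
labels = np.array(["sky", "road", "tree"])
true = rng.choice(3, size=5000)           # true label per pixel
# simulated per-pixel 0/1 errors for two classifiers, with error
# rates that depend on the true label
rates = np.array([[0.10, 0.30, 0.20],     # classifier 0
                  [0.15, 0.20, 0.25]])    # classifier 1
err = (rng.random((2, true.size)) < rates[:, true]).astype(float)

# per-label accuracy, separately for each classifier
for k in range(2):
    for j, name in enumerate(labels):
        acc = 1.0 - err[k, true == j].mean()
        print(f"classifier {k}, {name}: {acc:.3f}")

# one-way ANOVA: does classifier 0's error rate differ across the
# semantic pieces of the image?
groups = [err[0, true == j] for j in range(3)]
F, p = stats.f_oneway(*groups)
print(F, p)
```

Running the same one-way analysis for each classifier, and comparing
the per-label accuracies between classifiers, gives the interaction
decomposition described above.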
3. If there are other systematic distinctions in the image (color,
perhaps?) additional analyses similar to #2 above can be imagined.
Any of these may be more or less difficult to implement across a
collection of different images.
Hope these ideas help suggest something useful to you ...
> On Sat, 19 Aug 2000, Mark Everingham wrote in part:
> I have two classifier systems which take as input an image and produce
> as output a label for each pixel in the image, for example the input
> might be of an outdoor scene, and the labels sky/road/tree etc.
>
> I have a set of images with the correct labels, so I can test how
> accurately a classifier performs by calculating for example the mean
> number of pixels correctly classified per image or the mean number of
> sky pixels correctly classified etc.
>
> The problem is this: Given *two* different classifiers, I want to test
> if the accuracy achieved by each classifier differs *significantly*. One
> way I can think of doing this is:
>
> for classifier 1,2:
>     for each image:
>         get % pixels correct
>     calculate mean and sd across images
> apply t-test
>
> Because the images used for each classifier are the same, I assume I can
> use a paired t-test. Assuming the distribution of % correct across
> images is approximately normal, this should work fine.
>
> However, I have two nagging objections to this:
>
> i) the accumulation of statistics across *images* rather than any other
> unit is fairly arbitrary
>
> ii) because the *pixels* in each image are identical as well as the
> images, it seems to me that there may be a stronger statistic I can
> use, rather than just lumping all the pixels of an image together and
> taking the sum of correct pixels. The analogy I am thinking of is
> comparing performance on a pair of exams and looking at individual
> questions rather than just taking the overall number of correct
> responses.
Sensible analogy. Leads to ANOVA-like analyses.
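For comparison, the paired t-test in Mark's pseudocode can be sketched
as follows (the per-image percent-correct values are simulated; in
practice they come from the per-image accuracy loop above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_images = 30
acc1 = rng.normal(0.80, 0.05, n_images)          # classifier 1
acc2 = acc1 + rng.normal(0.03, 0.02, n_images)   # classifier 2, slightly better

# paired test: the same images are scored by both classifiers
t, p = stats.ttest_rel(acc1, acc2)
print(t, p)
```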
-- DFB.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128
=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=================================================================