Sorry about that. Someone else tried to call out on the phone line I was
logged in on. What I was trying to say was:
Two different strategies occur to me, both of which might, I suppose, be
implemented separately:
1. I suppose your image is a rectangular array of pixels, so that each
pixel may be thought of as lying at the intersection of a row and a
column of pixels. There are then the individual pixels (all RxC of them)
and three different aggregations: by row, by column, and by the whole
image. It seems to me this would permit an ANOVA-like analysis, using
for dependent variable some suitable error function between the known
label for each pixel and the classifier's label, with sources of
variation representing rows and columns (in neither of which you would
have much interest, I imagine), classifiers (whose main effect is
equivalent to the t-test you mention below), and the interactions of
classifiers with rows and of classifiers with columns (these latter two
representing different levels of aggregation than the whole image).
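A minimal sketch of idea #1, assuming per-pixel 0/1 errors laid out as a
classifier x row x column array; here the columns within each
(classifier, row) cell serve as the ANOVA replicates. All names and the
synthetic data are hypothetical illustrations, not Mark's actual setup:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
R, C = 20, 30                       # rows and columns of pixels

# err[k] is an R x C array of 0/1 errors for classifier k (k = 0, 1),
# simulated here; in practice: err = (predicted_labels != true_labels)
err = np.stack([rng.random((R, C)) < 0.20,
                rng.random((R, C)) < 0.25]).astype(float)

# Two-factor ANOVA: classifier x row, columns as replicates.
grand = err.mean()
ss_total = ((err - grand) ** 2).sum()
ss_clf = R * C * ((err.mean(axis=(1, 2)) - grand) ** 2).sum()
ss_row = 2 * C * ((err.mean(axis=(0, 2)) - grand) ** 2).sum()
cell = err.mean(axis=2)             # (classifier, row) cell means
ss_inter = C * ((cell - err.mean(axis=(1, 2))[:, None]
                      - err.mean(axis=(0, 2))[None, :] + grand) ** 2).sum()
ss_error = ss_total - ss_clf - ss_row - ss_inter

# F test for the classifier main effect (equivalent in spirit to the
# paired t-test on overall accuracy)
df_clf, df_error = 1, 2 * R * (C - 1)
F = (ss_clf / df_clf) / (ss_error / df_error)
p = stats.f.sf(F, df_clf, df_error)
print(F, p)
```

The classifier-by-row interaction sum of squares (ss_inter) is what
captures the "different levels of aggregation" mentioned above; swap the
roles of rows and columns for the other interaction.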
2. Instead of the structural components (rows & columns), you might
use the various labels themselves for a one-way ANOVA partitioning. You
would then get (using the same sort of dependent variable) measures of
classification accuracy for each separate (semantic?) piece of the
image, separately for each classifier; and the interaction between those
pieces and the classifiers, as a measure of systematic differences
between classifiers, again decomposable by the separate pieces.
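A minimal sketch of idea #2, partitioning by the true labels (sky/road/
tree, say) rather than by rows and columns. The label names, error
rates, and data are again hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
labels = np.array(["sky", "road", "tree"])
true = rng.choice(3, size=5000)           # true label per pixel
# simulated per-pixel 0/1 errors for two classifiers, with error
# rates that depend on the true label
rates = np.array([[0.10, 0.30, 0.20],     # classifier 0
                  [0.15, 0.20, 0.25]])    # classifier 1
err = (rng.random((2, true.size)) < rates[:, true]).astype(float)

# per-label accuracy, separately for each classifier
for k in range(2):
    for j, name in enumerate(labels):
        acc = 1.0 - err[k, true == j].mean()
        print(f"classifier {k}, {name}: {acc:.3f}")

# one-way ANOVA: does classifier 0's error rate differ across the
# semantic pieces of the image?
groups = [err[0, true == j] for j in range(3)]
F, p = stats.f_oneway(*groups)
print(F, p)
```

Running the same one-way analysis for each classifier, and comparing
the per-label accuracies between classifiers, gives the interaction
decomposition described above.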
3. If there are other systematic distinctions in the image (color,
perhaps?) additional analyses similar to #2 above can be imagined.
Any of these may be more or less difficult to implement across a
collection of different images.
Hope these ideas help suggest something useful to you ...
> On Sat, 19 Aug 2000, Mark Everingham wrote in part:
> I have two classifier systems which take as input an image and produce
> as output a label for each pixel in the image, for example the input
> might be of an outdoor scene, and the labels sky/road/tree etc.
>
> I have a set of images with the correct labels, so I can test how
> accurately a classifier performs by calculating for example the mean
> number of pixels correctly classified per image or the mean number of
> sky pixels correctly classified etc.
>
> The problem is this: Given *two* different classifiers, I want to test
> if the accuracy achieved by each classifier differs *significantly*. One
> way I can think of doing this is:
>
> for classifier 1,2:
>     for each image:
>         get % pixels correct
>     calculate mean and sd across images
> apply t-test
>
> Because the images used for each classifier are the same, I assume I can
> use a paired t-test. Assuming the distribution of % correct across
> images is approximately normal, this should work fine.
>
> However, I have two nagging objections to this:
>
> i) the accumulation of statistics across *images* rather than any other
> unit is fairly arbitrary
>
> ii) because the *pixels* in each image are identical as well as the
> images, it seems to me that there may be a stronger statistic I can
> use, rather than just lumping all the pixels of an image together and
> taking the sum of correct pixels. The analogy I am thinking of is
> comparing performance on a pair of exams and looking at individual
> questions rather than just taking the overall number of correct
> responses.
Sensible analogy. Leads to ANOVA-like analyses.
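For comparison, the paired t-test in Mark's pseudocode can be sketched
as follows (the per-image percent-correct values are simulated; in
practice they come from the per-image accuracy loop above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_images = 30
acc1 = rng.normal(0.80, 0.05, n_images)          # classifier 1
acc2 = acc1 + rng.normal(0.03, 0.02, n_images)   # classifier 2, slightly better

# paired test: the same images are scored by both classifiers
t, p = stats.ttest_rel(acc1, acc2)
print(t, p)
```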
-- DFB.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128
=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=================================================================