"Silvert, Henry" wrote:
> 
> I would like to add that with this kind of data [three-level ordinal] 
> we use the median instead of the average.

   Might I suggest that *neither* is appropriate for most purposes?  In
many ways, three-level ordinal data is like dichotomous data - though
there are a couple of critical differences.

   Nobody would use the median (which essentially coincides with the
mode) for dichotomous data unless they had a very specific reason for
wanting that specific bit of information (and I use the word "bit" in
its technical sense.)  By contrast, the mean (=proportion) is a lossless
summary of the data up to permutation (and hence a sufficient statistic
for any inference that assumes an IID model) - about as good as you can
get.  
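
   A minimal Python sketch of that contrast follows. The two samples and
the helper name reconstruct_sorted are invented for illustration. Both
samples share the same median, but only the mean (together with n) lets
you rebuild the data up to ordering.

```python
from statistics import mean, median_low

def reconstruct_sorted(n, proportion):
    """Rebuild a 0/1 sample (in sorted order) from its size and its mean."""
    ones = round(n * proportion)
    return [0] * (n - ones) + [1] * ones

# Hypothetical dichotomous samples: 7 ones of 10, and 6 ones of 10.
sample_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
sample_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

for name, sample in (("a", sample_a), ("b", sample_b)):
    m = mean(sample)
    rebuilt = reconstruct_sorted(len(sample), m)
    print(name, "median:", median_low(sample), "mean:", m,
          "rebuilt from mean:", rebuilt == sorted(sample))
```

The median cannot tell the two samples apart; the mean can, and it
recovers each sample exactly apart from the order of the observations.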

  With three levels, the mean is of course hopelessly uninterpretable
without some way to establish the relative distances between the levels.
However, the median is still almost information-free (total calorie
content per 100-gram serving <= log_2(3) < 2 bits).  I would suggest
that unless there is an extremely good reason to summarize the data as
ONE number, three-level ordinal data should be presented as a frequency
table. Technically one row could be omitted (the rows must sum to the
total), but there is no strong reason to do so. 
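
  To illustrate how little the median says, here is a small Python
sketch. The level labels and the counts are hypothetical, chosen only to
show that quite different frequency tables can share one median.

```python
from collections import Counter

# Hypothetical level labels, ordered from lowest to highest.
LEVELS = ["Fail", "Meet", "Exceed"]

def expand(counts):
    """Turn (n_fail, n_meet, n_exceed) into a list of responses."""
    return [level for level, n in zip(LEVELS, counts) for _ in range(n)]

def frequency_table(responses):
    """Counts per level, reported in level order."""
    tally = Counter(responses)
    return {level: tally.get(level, 0) for level in LEVELS}

def ordinal_median(responses):
    """Lower median of the responses, using only the ordering of LEVELS."""
    ranks = sorted(LEVELS.index(r) for r in responses)
    return LEVELS[ranks[(len(ranks) - 1) // 2]]

# Four made-up samples of 100 responses each.
samples = {
    "30-40-30": expand((30, 40, 30)),
    "20-60-20": expand((20, 60, 20)),
    "49-51-0": expand((49, 51, 0)),
    "0-51-49": expand((0, 51, 49)),
}

for name, responses in samples.items():
    print(name, ordinal_median(responses), frequency_table(responses))
```

All four samples have median "Meet", although the last two describe
nearly opposite situations. The frequency table keeps them apart.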

        "What about inference?"  Well, one could create various nice
modifications on a confidence interval; most informative might be a
confidence (or likelihood) region within a homogeneous triangle plot,
but a double confidence interval for the two cutoff points would be
easier. As for testing - first decide what your question is. If it *is*
really "are the employees in state X better than those in state Y?" you
must then decide what you mean by "better". *Do* you give any weight to
the number of "exceeded expectations" responses?  Do you find 30-40-30
to be better than 20-60-20, equal, or worse? What about 20-50-30?  If
you can answer all questions of this type, by the way, you may be ready
to establish a scale to convert your data to ratio. If you can't, you
will have to forgo your hopes of One Big Hypothesis Test.  
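
        One way the "double confidence interval" might be sketched in
Python is given below. Several things here are my assumptions rather
than anything prescribed above: the two cutoff points are taken to be
the cumulative proportions P(Fail) and P(Fail or Meet), each interval is
a Wilson score interval, and a Bonferroni split of alpha gives rough
joint coverage. The counts are invented.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, n, alpha=0.05):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def cutoff_intervals(n_fail, n_meet, n_exceed, alpha=0.05):
    """Intervals for the two cumulative proportions of a 3-level table.

    Assumption: alpha is split evenly between the two intervals
    (Bonferroni), which is conservative.
    """
    n = n_fail + n_meet + n_exceed
    per_interval = alpha / 2
    return {
        "P(Fail)": wilson_interval(n_fail, n, per_interval),
        "P(Fail or Meet)": wilson_interval(n_fail + n_meet, n, per_interval),
    }

# Hypothetical counts: 25 Fail, 45 Meet, 30 Exceed.
intervals = cutoff_intervals(25, 45, 30)
for label, (lo, hi) in intervals.items():
    print(f"{label}: ({lo:.3f}, {hi:.3f})")
```

A joint region in the triangle plot would carry more information; this
version is only the easier alternative.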

        I do realize that we have a cultural belief in total ordering and
single parameters, and we tend to take things like stock-market and
cost-of-living indices, championships and MVP awards, and quality-of-living
indices more seriously than we should. We tend to prefer events
not to end in draws; sports that can end in a draw tend to have
(sometimes rather silly) tiebreaking mechanisms added to them. Even in
sports (chess, boxing) in which the outcomes of (one-on-one) events are
known to be sometimes intransitive, we insist on "finding a champion". 
But perhaps the statistical community ought to take the lead in opposing
this bad habit!

        To say that "75% of all respondents ranked Ohio employees as having
'Met Expectations' or 'Exceeded Expectations.'", as a single measure,
is not a great deal better than taking the mean in terms of information
content *or* arbitrariness. Pooling two levels and taking the
proportion is just taking the mean with a 0-1-1 coding.  It says, in
effect, that we will consider 

        (Exceed - Meet)/(Meet - Fail) = 0 

while taking the mean with a 0-1-2 coding says that we will consider 

        (Exceed - Meet)/(Meet - Fail) = 1.

One is no less arbitrary than the other. (An amusing analogy can be
drawn with regression, when users of OLS regression, implicitly assuming
all the variation to be in the dependent variable, sometimes criticise
the users of neutral regression for being "arbitrary" in assuming the
variance to be equally divided.)
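
        Returning to the two codings: the Python sketch below shows how
much the comparison depends on the assumed ratio. It scores the levels
as (0, 1, 1 + r), where r stands for (Exceed - Meet)/(Meet - Fail), and
compares the 30-40-30 and 20-60-20 profiles mentioned earlier. The
function names and the state labels are hypothetical, and I am assuming
the counts are listed in the order Fail, Meet, Exceed.

```python
def scores_for_ratio(r):
    """Scores for (Fail, Meet, Exceed) with (Exceed - Meet)/(Meet - Fail) = r."""
    return (0.0, 1.0, 1.0 + r)

def coded_mean(counts, scores):
    """Mean score of a frequency table under a given coding."""
    return sum(c * s for c, s in zip(counts, scores)) / sum(counts)

# Assumed order of counts: Fail, Meet, Exceed.
state_x = (30, 40, 30)
state_y = (20, 60, 20)

# r = 0 is the pooled 0-1-1 coding; r = 1 is the 0-1-2 coding.
for r in (0.0, 0.5, 1.0, 2.0):
    scores = scores_for_ratio(r)
    mx = coded_mean(state_x, scores)
    my = coded_mean(state_y, scores)
    verdict = "X ahead" if mx > my else "Y ahead" if my > mx else "tie"
    print(f"r = {r}: X = {mx:.2f}, Y = {my:.2f} -> {verdict}")
```

With r = 0 the second profile comes out ahead (0.80 against 0.70), with
r = 1 the two tie at 1.00, and with r = 2 the first is ahead (1.30
against 1.20). The "result" is set entirely by the choice of r.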

        -Robert Dawson

