Re: Weighted Kappa

2000-03-08 Thread John Uebersax

From your post, it is not clear that kappa is the right statistic.
 
Usually one uses kappa when each rater/clinician rates a sample of
patients or cases.  But you describe only a questionnaire that
each clinician completes.  Assuming each clinician completes the
questionnaire just once (as opposed to, say, once for each of a
sample of patients), I don't see that kappa is appropriate.
Instead, one would use simpler statistics--such as the standard
deviation, across clinicians, for each item.
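
For illustration, a minimal sketch of that per-item calculation
(Python and the data layout are my assumptions; the numbers are
invented):

import numpy as np

# Invented data: rows = 16 clinicians, columns = questionnaire items,
# each answer on the ordinal 0-3 scale used in the original question.
rng = np.random.default_rng(0)
responses = rng.integers(0, 4, size=(16, 10))

# Sample standard deviation across clinicians, computed item by item.
item_sd = responses.std(axis=0, ddof=1)
for item, sd in enumerate(item_sd, start=1):
    print(f"Item {item:2d}: SD across clinicians = {sd:.2f}")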
 
You also raise the issue of many possible rater pairs--note, though,
that there are only (16 * 15) / 2 = 120 unique pairs that involve
different raters.  Rather than calculate 120 different kappa
coefficients, a simpler alternative might be to calculate the general
kappa that measures agreement between any two raters--considering all
raters simultaneously.  That is done with Fleiss' kappa (as opposed to
Cohen's kappa, which applies only to pairwise comparisons).  For a
discussion of the difference between these two types of kappa, see
Joseph Fleiss, Statistical Methods for Rates and Proportions, 1981.
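
If it helps, here is a minimal sketch of the Fleiss-type calculation,
assuming the data are arranged as a subjects-by-raters matrix of
category codes (the example data are invented, and Python/numpy is my
choice, not something from the original posts):

import numpy as np

def fleiss_kappa(ratings, n_categories):
    """Fleiss' kappa for a subjects-by-raters matrix of category codes 0..k-1."""
    n_subjects, n_raters = ratings.shape
    # counts[i, j] = number of raters who placed subject i in category j
    counts = np.stack([(ratings == j).sum(axis=1) for j in range(n_categories)], axis=1)
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)   # overall category proportions
    P_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

# Invented example: 10 questionnaire items treated as "subjects", 16 raters,
# answers on the 0-3 scale.
rng = np.random.default_rng(1)
ratings = rng.integers(0, 4, size=(10, 16))
print(f"Fleiss' kappa = {fleiss_kappa(ratings, 4):.3f}")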
 
--
John Uebersax
[EMAIL PROTECTED]





Re: Weighted Kappa

2000-03-08 Thread Alex Yu


As I recall, kappa is a measure of agreement. It is best used for 
dichotomous outcomes, such as raters' judgments of 
"mastery/non-mastery" or "pass/fail". I am not sure it is appropriate for your 
data. If the data are continuous-scaled and more than two raters are 
involved, a repeated-measures approach can be used to check the reliability:

Horst, P. (1949). A Generalized expression for the reliability of 
measures. Psychometrika, 14, 21-31.
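
As one possible illustration of that repeated-measures idea, here is a
sketch of an intraclass correlation for consistency across raters (a
generic ICC(3,1), not Horst's exact expression; the data layout and
numbers are my assumptions):

import numpy as np

def icc_consistency(x):
    """ICC(3,1): consistency of single ratings for a subjects-by-raters matrix,
    from two-way mean squares (subjects and raters, no interaction term)."""
    n, k = x.shape
    grand = x.mean()
    ms_rows = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()             # between raters
    ss_err = ((x - grand) ** 2).sum() - (n - 1) * ms_rows - ss_cols
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Invented example: 10 items (rows) each scored 0-3 by 16 clinicians (columns).
rng = np.random.default_rng(2)
scores = rng.integers(0, 4, size=(10, 16)).astype(float)
print(f"ICC(3,1) = {icc_consistency(scores):.3f}")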


Chong-ho (Alex) Yu, Ph.D., MCSE, CNE
Instruction and Research Support
Information Technology
Arizona State University
Tempe AZ 85287-0101
Voice: (602)965-7402
Fax: (602)965-6317
Email: [EMAIL PROTECTED]
URL: http://seamonkey.ed.asu.edu/~alex/







Re: Weighted Kappa

2000-03-05 Thread Rich Ulrich

On 3 Mar 2000 11:36:25 -0800, [EMAIL PROTECTED] (Marie Elaine Rump)
wrote:
  ...
 
 We are in the middle of a study that compares 16 clinicians' 
 answers to a questionnaire (answers selected from 0, 1, 2, 3) and 
 would like to use weighted kappa to analyse our intra- and 
 inter-rater results.  For inter-rater analysis the 16 raters 
 produce 256 pairings.  We are looking for some advice, or a 
 program, that might be able to help us.
  SNIP 
Are these clinicians each rating the same set of patients, and if so,
how many?  The "reliability" you compute will be an assessment over
*this sample*, as is always the case.

If the clinicians are not rating patients, then I don't yet see what
your design is about; you will have to give more detail, and kappa
might be quite irrelevant.

Kappa is best for 2x2 tables, anyway.  You probably should be
interested in the Pearson correlation, plus the paired t-test.  See my
stats-FAQ for a bit more discussion.
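
For any single pair of raters, a minimal sketch of those two checks
(the scores are invented, and the use of scipy here is my own
assumption):

import numpy as np
from scipy import stats

# Invented scores from two raters on the same ten items (0-3 scale).
rater_a = np.array([0, 1, 2, 3, 2, 1, 0, 3, 2, 1])
rater_b = np.array([1, 1, 2, 3, 3, 1, 0, 2, 2, 0])

r, p_r = stats.pearsonr(rater_a, rater_b)    # do the two raters co-vary?
t, p_t = stats.ttest_rel(rater_a, rater_b)   # do they differ in overall level?
print(f"Pearson r = {r:.2f} (p = {p_r:.3f}); paired t = {t:.2f} (p = {p_t:.3f})")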

What are your hypotheses?  Are  you looking for accuracy, or are you
looking for styles of rating?  If you are looking for styles, then you
might want to do some sort of factor analysis across some key scores.

For accuracy:  you will vastly reduce the complexity if you designate
a modal score, or a "gold standard" correct score.  Then you will
have 16 comparisons instead of 120 (or does order matter?).
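
A hedged sketch of that modal-score approach, scoring each rater
against the mode with a quadratic-weighted kappa (scikit-learn's
cohen_kappa_score and the invented data are my assumptions, not
something the original posters mentioned):

import numpy as np
from sklearn.metrics import cohen_kappa_score

# Invented ratings: rows = questionnaire items, columns = 16 raters, 0-3 scale.
rng = np.random.default_rng(3)
ratings = rng.integers(0, 4, size=(20, 16))

# Modal ("gold standard") score per item; ties resolve toward the lower category.
modal = np.array([np.bincount(row, minlength=4).argmax() for row in ratings])

# One weighted kappa per rater against the modal score: 16 comparisons, not 120.
for r in range(ratings.shape[1]):
    k = cohen_kappa_score(ratings[:, r], modal, weights="quadratic")
    print(f"Rater {r + 1:2d} vs modal score: weighted kappa = {k:.2f}")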
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Weighted Kappa

2000-03-03 Thread Marie Elaine Rump

We are a group of undergraduate physio students and were 
 wondering if you could help us.

We are in the middle of a study that compares 16 clinicians' 
answers to a questionnaire (answers selected from 0, 1, 2, 3) and 
would like to use weighted kappa to analyse our intra- and 
inter-rater results.  For inter-rater analysis the 16 raters 
produce 256 pairings.  We are looking for some advice, or a 
program, that might be able to help us.

Thanks for your time.

Claudia and Annie.
3rd year physiotherapy students.


===
This list is open to everyone.  Occasionally, less thoughtful
people send inappropriate messages.  Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.

For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===