Re: Weighted Kappa
From your post, it isn't clear that kappa is the right statistic. Kappa is usually used when each rater/clinician rates a sample of patients or cases, but you describe only a questionnaire that each clinician completes. Assuming each clinician completes the questionnaire just once (as opposed to, say, once for each of a sample of patients), I don't see that kappa is appropriate. Instead, one would use simpler statistics, such as the standard deviation, across clinicians, for each item.

You also raise the issue of many possible rater pairs. Note, though, that there are only (16 * 15) / 2 = 120 unique pairs of different raters. Rather than calculate 120 separate kappa coefficients, a simpler alternative is to calculate a single general kappa that measures agreement between any two raters, considering all raters simultaneously. That is done with Fleiss' kappa (as opposed to Cohen's kappa, which applies only to pairwise comparisons). For a discussion of the difference between these two types of kappa, see Joseph Fleiss, Statistical Methods for Rates and Proportions (2nd ed., Wiley, 1981).

-- John Uebersax [EMAIL PROTECTED]
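The single overall Fleiss' kappa John describes can be computed directly from a table of rating counts. Below is a minimal Python sketch (numpy assumed available); the counts matrix is hypothetical illustration data, not from the posted study:

import numpy as np

def fleiss_kappa(counts):
    # counts[i, j] = number of raters who assigned item i to category j;
    # assumes every item was rated by the same number of raters.
    counts = np.asarray(counts, dtype=float)
    n_items, _ = counts.shape
    n_raters = counts[0].sum()
    # Observed agreement per item, averaged over items.
    p_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the marginal category proportions.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = np.square(p_j).sum()
    return (p_bar - p_e) / (1 - p_e)

# Dummy example: 5 questionnaire items, 16 raters, categories 0-3.
counts = [[10, 4, 1, 1],
          [2, 12, 1, 1],
          [0, 3, 10, 3],
          [1, 1, 2, 12],
          [8, 6, 1, 1]]
print(fleiss_kappa(counts))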
Re: Weighted Kappa
As I recall, kappa is a measure of agreement. It is best suited to dichotomous outcomes, such as raters' judgments of "mastery/non-mastery" or "pass/fail", so I am not sure it is proper for your data. If the data are continuous and more than two raters are involved, a repeated-measures approach can be used to check the reliability:

Horst, P. (1949). A generalized expression for the reliability of measures. Psychometrika, 14, 21-31.

Chong-ho (Alex) Yu, Ph.D., MCSE, CNE
Instruction and Research Support, Information Technology
Arizona State University, Tempe AZ 85287-0101
Voice: (602) 965-7402  Fax: (602) 965-6317
Email: [EMAIL PROTECTED]
URL: http://seamonkey.ed.asu.edu/~alex/
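For the repeated-measures route Alex mentions, one closely related and easily computed coefficient is Cronbach's alpha, treating the 16 raters as parallel measures over the questionnaire items. This is a hedged sketch, not the Horst (1949) formula itself, and the ratings matrix here is dummy data:

import numpy as np

def cronbach_alpha(ratings):
    # ratings[i, r] = score (0-3) given by rater r on questionnaire item i.
    ratings = np.asarray(ratings, dtype=float)
    n_items, n_raters = ratings.shape
    rater_vars = ratings.var(axis=0, ddof=1)     # each rater's variance across items
    total_var = ratings.sum(axis=1).var(ddof=1)  # variance of the per-item totals
    return (n_raters / (n_raters - 1)) * (1 - rater_vars.sum() / total_var)

# Dummy data: 20 items scored 0-3 by 16 raters.
rng = np.random.default_rng(0)
print(cronbach_alpha(rng.integers(0, 4, size=(20, 16))))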
Re: Weighted Kappa
On 3 Mar 2000 11:36:25 -0800, [EMAIL PROTECTED] (Marie Elaine Rump) wrote:

... We are in the middle of a study that compares 16 clinicians' answers to a questionnaire (answers selected from 0, 1, 2, 3) and would like to use weighted kappa to analyse our intra- and inter-rater results. For inter-rater analysis the 16 raters produce 256 pairings. We are looking for some advice or a program that might be able to help us. SNIP

Are these clinicians each rating the same set of patients, and if so, how many? The "reliability" that you compute will be an assessment over *this sample*, as is always the case. If the clinicians are not rating patients, then I don't yet know what your design is about; you will have to give more detail, and kappa might be quite irrelevant. Kappa is best for 2x2 tables, anyway. You should probably be interested in the Pearson correlation, plus the paired t-test. See my stats-FAQ for a bit more discussion.

What are your hypotheses? Are you looking for accuracy, or are you looking for styles of rating? If you are looking for styles, you might want to do some sort of factor analysis across some key scores. For accuracy: you will vastly reduce your complexity if you designate a modal score, or a "gold standard" correct score. Then you will have 16 comparisons instead of 120 (or, does order matter?).

-- Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
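If a gold-standard rater is designated as Rich suggests, each of the 16 comparisons is an ordinary Cohen's weighted kappa on the 0-3 scale. A minimal Python sketch with quadratic disagreement weights follows; the two rating vectors are hypothetical:

import numpy as np

def weighted_kappa(a, b, n_cat=4):
    # Cross-tabulate the two raters' scores into an observed-proportion table.
    obs = np.zeros((n_cat, n_cat))
    for i, j in zip(a, b):
        obs[i, j] += 1
    obs /= obs.sum()
    # Expected-by-chance table from the marginal proportions.
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))
    # Quadratic disagreement weights: 0 on the diagonal, 1 at maximum disagreement.
    idx = np.arange(n_cat)
    w = ((idx[:, None] - idx[None, :]) / (n_cat - 1)) ** 2
    return 1 - (w * obs).sum() / (w * exp).sum()

# Hypothetical scores from a "gold standard" rater and one clinician.
gold     = [0, 1, 2, 3, 2, 1, 0, 3]
rater_01 = [0, 1, 1, 3, 2, 2, 0, 2]
print(weighted_kappa(gold, rater_01))

Linear weights (dropping the squaring) are the other common choice; quadratic weights penalize large disagreements more heavily.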
Weighted Kappa
We are a group of undergraduate physiotherapy students and were wondering if you could help us. We are in the middle of a study that compares 16 clinicians' answers to a questionnaire (answers selected from 0, 1, 2, 3) and would like to use weighted kappa to analyse our intra- and inter-rater results. For inter-rater analysis the 16 raters produce 256 pairings. We are looking for some advice or a program that might be able to help us. Thanks for your time.

Claudia and Annie, 3rd year physiotherapy students.