First, let's consider the two-observation case. I have 2 assessments of a behavior rating taken 20 minutes apart; I wish to know how reliable the assessments are. There are two potential sources of error: the relative error over time, in which the ordering of scores for subject a and subject b on the two assessments may be the same or different, and the absolute error, in which all subjects may score lower on the second assessment.

If I do a Pearson correlation between the two, I find a correlation of .78097 (n = 313, p < .0001). I do an analysis of variance with repeated measures on time (the equivalent of the paired t-test) and find a significant difference between the means (time 1: mean = 3.377, sd = 1.10; time 2: mean = 3.291, sd = 1.16; F(1, 312) = 4.16; p = .0422). Now, I do a generalizability analysis. I find the following variance components:
Subjects           .99269
Time               .00300
Subjects by Time   .27842

The generalizability coefficient (or ICC) considering only the relative error (the interaction) is .99269 / (.99269 + .27842) = .99269 / 1.27111 = .78096, which is the Pearson correlation within rounding. I then figure the coefficient taking the mean difference into account as well: .99269 / (.99269 + .00300 + .27842) = .99269 / 1.27411 = .779. The mean difference has had a minimal effect on the reliability, as should be obvious from the variance component for time, which is very small relative to the other variance components. Thus, even though the difference between times 1 and 2 is significant (due in part to the large sample and the strong correlation between two observations taken 20 minutes apart), the effect on the reliability is small. Of course, I could observe that in the means as well, since they're very close, but when you see two means, many people want to know if they are statistically different.

Add to this the fact that, because in reality I have 5 assessments of the observed variable over an hour's time, the generalizability result is much easier to deal with than 10 unique Pearson correlations and an ANOVA (hopefully not 10 paired t-tests), and it becomes clear that the generalizability analysis is cleaner than breaking the analysis into two parts.

Paul R. Swank, Ph.D.
Professor, Developmental Pediatrics
Medical School
UT Health Science Center at Houston

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Richard Ulrich
Sent: Wednesday, May 12, 2004 2:52 PM
To: [EMAIL PROTECTED]
Subject: Re: [edstat] paired t-test for test-retest reliability reference?

On 12 May 2004 06:37:30 -0700, [EMAIL PROTECTED] (Paul R Swank) wrote:

> And doing a Pearson correlation and a t-test doesn't tell you the
> overall impact of the error.
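[Editor's note: Swank's two coefficients can be checked directly from the variance components he reports. This is a minimal sketch; the component values are copied from his post, and the variable names are mine, not from any particular G-theory package.]

```python
# Variance components reported in the post
var_subjects = 0.99269      # subjects (true-score variance)
var_time = 0.00300          # time main effect (absolute error: mean shift)
var_interaction = 0.27842   # subjects-by-time (relative error)

# Relative (consistency) coefficient: only the interaction counts as error
g_relative = var_subjects / (var_subjects + var_interaction)

# Absolute (agreement) coefficient: the time main effect counts as error too
g_absolute = var_subjects / (var_subjects + var_time + var_interaction)

print(round(g_relative, 5))  # 0.78096, matching the Pearson r within rounding
print(round(g_absolute, 3))  # 0.779
```

Because var_time is tiny relative to the other components, the two coefficients are nearly identical, which is exactly Swank's point about the significant but practically negligible mean difference.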
If the t-test is *not* relevant, which may be true for test-retest, the Pearson can be a more proper measure of impact than the ICC, which slightly decreases the reported score -- and there are extra issues for your study if the variances should not be pooled, for any choice of coefficient. If the t-test *is* relevant, it can be a warning of a grievous impact all by itself; and that warning is generally masked by reporting an ICC, which may be only slightly less than the Pearson r. Those are two reasons why the two tests together are better for *examining* your data than looking at ICCs.

Yes, it is the overall impact, and that can be useful for the *final* statement, especially when a very precise statement of overall impact is warranted -- because, for instance, power analyses are being based on the exact value of the exact form of ICC that is needed: same versus different raters; single versus multiple scorers. And I think it is an over-generalization to prefer an ICC when the issue is the cruder one of apparent adequacy. The ICC is less informative (about means) and less transparent (multiple versions available to select from, all of them burying the means).

[snip, rest]

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================
