Re: paired t-test for test-retest reliability reference?

Richard Hoenes Wed, 12 May 2004 05:27:43 -0700

On Tue, 11 May 2004 16:06:16 -0400, Richard Ulrich
<[EMAIL PROTECTED]> wrote:


>posted to sci.stat.edu  and sci.stat.consult
>
>RH  has made two separate posts to different groups, but 
>with slightly varied descriptions of the problem.
>
>
>On Mon, 10 May 2004 14:52:21 -0500, Richard Hoenes <[EMAIL PROTECTED]>
>wrote:
>
>> I submitted a paper that used a Pearson correlation and a paired
>> t-test to estimate test-retest reliability.  The journal has a new
>> statistician who first had me remove that Pearson correlation as
>> unnecessary and who now wants a reference for using a paired t-test as
>> a measure of test-retest reliability.  Of course, I can find other
>
>Perhaps you will want to borrow this reply, to quote to the editor.
>It seems that the statistician's opinions don't survive the 'eyeball'
>test, if he doesn't want to consider either similarities or 
>differences in a question of reliability.
>
>Of course, if the measurement is so obvious that any demonstration
>about it should be considered redundant, then he should have
>said so, bluntly.  The between-time t-test could be large without
>necessarily destroying the evidence of consistency across time;
>in that case, the Pearson should be preferred for its evidence over
>any usual ICC, since the ICC is computed with the pooled mean.
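To make the point about a shifted retest concrete, here is a minimal sketch on made-up scores (hypothetical data, not from the paper) in which every retest score is about 5 points higher. It assumes the "ICC computed with the pooled mean" is Fisher's double-entry intraclass correlation; the function names are illustrative only.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pearson_r(x, y):
    """Ordinary Pearson r: each occasion is centred on its own mean."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_mean_r(x, y):
    """Correlation about the pooled mean of both occasions
    (assumed here: Fisher's double-entry intraclass correlation)."""
    m = mean(list(x) + list(y))
    sxy = sum((a - m) * (b - m) for a, b in zip(x, y))
    ss = sum((a - m) ** 2 + (b - m) ** 2 for a, b in zip(x, y))
    return 2 * sxy / ss

def paired_t(x, y):
    """Paired t statistic for the mean (retest - test) difference."""
    d = [b - a for a, b in zip(x, y)]
    n = len(d)
    md = mean(d)
    sd = math.sqrt(sum((v - md) ** 2 for v in d) / (n - 1))
    return md / (sd / math.sqrt(n))

# Hypothetical scores: the retest is shifted up by about 5 points.
test = [10, 12, 14, 16, 18, 20]
retest = [15, 17, 19, 21, 24, 25]

r = pearson_r(test, retest)
icc = pooled_mean_r(test, retest)
t = paired_t(test, retest)
print(f"Pearson r = {r:.3f}, pooled-mean r = {icc:.3f}, paired t = {t:.1f}")
```

With these numbers the Pearson r is about 0.995 and the paired t is about 31, while the pooled-mean correlation falls to roughly 0.29: the shift shows up in the t-test and drags down the pooled-mean correlation, but leaves the rank-and-spacing consistency (Pearson) intact.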
>
>
>> papers that use a paired t-test for a test-retest measure but I cannot
>> find an actual reference that says it is valid.  Any help would be
>> appreciated.
>
> - There are articles about analyzing longitudinal data which
>make the point that one should use all the evidence that is on
>hand, including the cross-period correlations.
>
>======= originally posted to sci.stat.consult
>On Tue, 11 May 2004 11:06:45 -0500, Richard Hoenes <[EMAIL PROTECTED]>
>wrote:
>
>> In a paper I used Pearson's r and a paired t-test to test test-retest
>> reliability.  The statistical reviewer has rejected it stating that
>> he wants me to include a reference that says using Pearson's r and a
>> paired t-test is a valid way to measure test-retest reliability.  Does
>> anyone know of such a reference?
>
>I don't know if you want to go this route -- but I have said
>it a number of times, over the last several years, in the sci.stat.*
>groups, and you could dig up those references.  
>
>The t-test part is important for between-rater differences; that
>is the equivalent of testing them by Oneway Anova.  Its relevance
>is more ambiguous for test-retest, across two weeks.
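As a check on that equivalence, here is a minimal sketch (hypothetical ratings and function names) showing that with two raters or occasions the squared paired t equals the F for the rater/occasion effect in a subjects x raters ANOVA with one score per cell. I am reading "Oneway Anova" as this within-subjects one-way design; an ordinary between-groups one-way ANOVA would instead match the independent-samples t.

```python
import math

def paired_t(x, y):
    """Paired t statistic for the mean (y - x) difference."""
    d = [b - a for a, b in zip(x, y)]
    n = len(d)
    md = sum(d) / n
    sd = math.sqrt(sum((v - md) ** 2 for v in d) / (n - 1))
    return md / (sd / math.sqrt(n))

def repeated_measures_f(x, y):
    """F for the rater effect in a subjects x raters ANOVA
    (one score per cell, so the residual is the subject-by-rater term)."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    rater_means = [sum(x) / n, sum(y) / n]
    subj_means = [(a + b) / k for a, b in zip(x, y)]
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_rater = n * sum((m - grand) ** 2 for m in rater_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_rater - ss_subj
    df_rater, df_error = k - 1, (n - 1) * (k - 1)
    return (ss_rater / df_rater) / (ss_error / df_error)

# Hypothetical ratings of the same six subjects by two raters.
rater_a = [10, 12, 14, 16, 18, 20]
rater_b = [15, 17, 19, 21, 24, 25]

t = paired_t(rater_a, rater_b)
f = repeated_measures_f(rater_a, rater_b)
print(f"t^2 = {t ** 2:.4f}, F = {f:.4f}")
```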
>
>I'm willing to concede to the demurrals that have been posted,
>to the effect that some editors insist on some ICC.  Unfortunately
>(as I see it), the most convenient testing for examining your 
>data does not correspond to what editors or some theoretical
>statisticians have been taught.  

If it was the ICC he was pushing I wouldn't mind so much, but he has
insisted we include Bland & Altman's limits of agreement (which are
simply the mean difference +/- [1.96 * SD of the differences], with no
significance test), and he is now systematically having us remove every
other statistical test we've included in the paper.  The only other test
left in the paper is the paired t-test and now he wants a reference to
show it is valid to use.  I'm hoping to find a reference that will
allow us to keep the paired t-test and bring back the Pearson's r.
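For comparison, a minimal sketch of the two quantities in question, on hypothetical scores: Bland & Altman's limits of agreement (mean difference +/- 1.96 x SD of the differences) and the paired t, which uses the same mean and SD of the differences but divides the SD by sqrt(n) to test whether the mean difference is zero. The function names are illustrative only.

```python
import math

def diff_summary(x, y):
    """Mean and sample SD of the (retest - test) differences."""
    d = [b - a for a, b in zip(x, y)]
    n = len(d)
    md = sum(d) / n
    sd = math.sqrt(sum((v - md) ** 2 for v in d) / (n - 1))
    return md, sd, n

def limits_of_agreement(x, y, z=1.96):
    """Bland-Altman 95% limits: mean difference +/- z * SD of differences."""
    md, sd, _ = diff_summary(x, y)
    return md - z * sd, md + z * sd

def paired_t(x, y):
    """Paired t: the same mean difference, divided by its standard error."""
    md, sd, n = diff_summary(x, y)
    return md / (sd / math.sqrt(n))

# Hypothetical test and retest scores for six subjects.
test = [10, 12, 14, 16, 18, 20]
retest = [15, 17, 19, 21, 24, 25]

lo, hi = limits_of_agreement(test, retest)
t = paired_t(test, retest)
print(f"limits of agreement: {lo:.2f} to {hi:.2f}; paired t = {t:.1f}")
```

The limits describe where most individual differences fall, while the t-test addresses only whether the average difference departs from zero, so the two answer different questions.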

The question regarding Pearson's r and ICC below just popped into my
head while I was working on all this for this paper.
>
>> 
>> Also, two identical tests were given approximately two weeks apart
>> with no expected difference in means/std dev (i.e., what was being
>> tested is stable over such a period of time).  It is my understanding
>> that under such conditions Pearson's r equals ICC.  Is this correct?
>> Is there a reference for this as well?
>
>That is nearly correct, but not quite.  For one thing, there are
>a dozen varieties of ICC, and at least a couple of them could
>be used in any situation -- The clarity of communication is 
>one reason that I prefer the Pearson's r.  You get an ICC, precisely,
>when you *compute* the correlation with the pooled mean.
>That is common knowledge that should be mentioned in 
>passing, in texts.  If the means are hardly different, then the
>computations will be practically identical, though not to the
>nth decimal.
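A minimal sketch of that point, on hypothetical scores whose two occasions have equal means but slightly different spreads. It computes the ordinary Pearson r and the correlation about the pooled mean and pooled variance (assumed here: Fisher's double-entry intraclass correlation, one of the many ICC variants), and checks that the latter equals a plain Pearson r on "double-entered" data, where each pair appears once as (test, retest) and once as (retest, test). Function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Ordinary Pearson r: each variable centred on its own mean."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_mean_r(x, y):
    """Intraclass r: centred on the pooled mean, scaled by the pooled variance."""
    m = (sum(x) + sum(y)) / (2 * len(x))
    sxy = sum((a - m) * (b - m) for a, b in zip(x, y))
    ss = sum((a - m) ** 2 + (b - m) ** 2 for a, b in zip(x, y))
    return 2 * sxy / ss

test = [10, 12, 14, 16, 18, 20]
retest = [11, 12, 13, 17, 18, 19]  # same mean (15), slightly smaller spread

r = pearson_r(test, retest)
icc = pooled_mean_r(test, retest)
# Double entry: stack (test, retest) on top of (retest, test).
icc_double = pearson_r(test + retest, retest + test)
print(f"Pearson r = {r:.4f}, pooled-mean r = {icc:.4f}, double-entry r = {icc_double:.4f}")
```

Here the Pearson r is about 0.973 and the pooled-mean correlation is 0.96875: practically the same, though not identical, and the gap would widen as the means drift apart.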

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
