posted to sci.stat.edu and sci.stat.consult

RH has made two separate posts to different groups, but
with slightly varied descriptions of the problem.


On Mon, 10 May 2004 14:52:21 -0500, Richard Hoenes <[EMAIL PROTECTED]>
wrote:

> I submitted a paper that used a Pearson correlation and a paired
> t-test to estimate test-retest reliability.  The journal has a new
> statistician who first had me remove that Pearson correlation as
> unnecessary and who now wants a reference for using a paired t-test as
> a measure of test-retest reliability.  Of course, I can find other

Perhaps you will want to borrow this reply, to quote to the editor.
It seems that the statistician's opinions don't survive the 'eyeball'
test if he is unwilling to consider either similarities or
differences in a question of reliability.

Of course, if the measure's reliability is so obvious that any
demonstration of it should be considered redundant, then he should
have said so, bluntly.  The between-time t-test could be large
without necessarily destroying the evidence of consistency across
time; in that case, the Pearson should be preferred as evidence over
any usual ICC, since the ICC is computed with the pooled mean.
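
For concreteness, here is a minimal sketch in Python of the two
analyses side by side -- the data and the variable names are made
up for illustration, not taken from the paper:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
time1 = rng.normal(50, 10, size=30)        # hypothetical first testing
time2 = time1 + rng.normal(0, 3, size=30)  # retest: correlated, no mean shift

r, p_r = stats.pearsonr(time1, time2)      # consistency across occasions
t, p_t = stats.ttest_rel(time1, time2)     # shift in means between occasions

print("Pearson r = %.3f (p = %.3g)" % (r, p_r))
print("paired t  = %.3f (p = %.3g)" % (t, p_t))

A large r together with a non-significant paired t is exactly the
pattern one hopes to report for test-retest reliability.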


> papers that use a paired t-test for a test-retest measure but I cannot
> find an actual reference that says it is valid.  Any help would be
> appreciated.

 - There are articles about analyzing longitudinal data which make
the point that one should use all the evidence on hand, including
the cross-period correlations.

======= originally posted to sci.stat.consult
On Tue, 11 May 2004 11:06:45 -0500, Richard Hoenes <[EMAIL PROTECTED]>
wrote:

> In a paper I used Pearson's r and a paired t-test to test test-retest
> reliability.  The statistical reviewer has rejected it stating that
> he wants me to include a reference that says using Pearson's r and a
> paired t-test is a valid way to measure test-retest reliability.  Does
> anyone know of such a reference?

I don't know if you want to go this route -- but I have said
it a number of times, over the last several years, in the sci.stat.*
groups, and you could dig up those references.

The t-test part is important for between-rater differences; that
is the equivalent of testing them by one-way ANOVA.  Its relevance
is more ambiguous for test-retest across two weeks.
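
For two raters, that equivalence is exact: the squared two-sample
t statistic equals the one-way ANOVA F.  A quick demonstration in
Python, again with made-up data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
rater_a = rng.normal(50, 10, size=25)   # hypothetical scores, rater A
rater_b = rng.normal(52, 10, size=25)   # hypothetical scores, rater B

t, _ = stats.ttest_ind(rater_a, rater_b)    # equal-variance t-test
F, _ = stats.f_oneway(rater_a, rater_b)     # one-way ANOVA, two groups
print("t^2 = %.4f   F = %.4f" % (t**2, F))  # identical, up to rounding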

I'm willing to concede to the demurrals that have been posted,
to the effect that some editors insist on some ICC.  Unfortunately
(as I see it), the most convenient tests for examining your
data do not correspond to what editors or some theoretical
statisticians have been taught.

> 
> Also, two identical tests were given approximately two weeks apart
> with no expected difference in means/std dev (ie, what was being
> tested is stable over such a period of time).  It is my understanding
> that under such conditions Pearson's r equals ICC.  Is this correct?
> Is there a reference for this as well?

That is nearly correct, but not quite.  For one thing, there are
a dozen varieties of ICC, and at least a couple of them could
be used in any given situation -- that clarity of communication is
one reason I prefer Pearson's r.  You get an ICC, precisely,
when you *compute* the correlation with the pooled mean.
That is common knowledge which should be mentioned in
passing in texts.  If the means are hardly different, then the
computations will be practically identical, though not to the
nth decimal.
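
To make that concrete, here is a small Python sketch of the classic
pairwise ICC (Fisher's), computed with the pooled mean and pooled
variance, next to the ordinary Pearson r -- the data are invented:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(50, 10, size=40)        # hypothetical first testing
y = x + rng.normal(0, 4, size=40)      # retest with nearly equal mean

both = np.concatenate([x, y])
m = both.mean()                        # pooled (grand) mean
s2 = ((both - m) ** 2).mean()          # pooled variance
icc = ((x - m) * (y - m)).mean() / s2  # pairwise ICC, via the pooled mean

r, _ = stats.pearsonr(x, y)
print("ICC (pooled mean) = %.4f   Pearson r = %.4f" % (icc, r))

When the two means are nearly equal, the two numbers agree to two
or three decimals; as the means drift apart, the ICC falls below r.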


-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html