This topic certainly seems to have drawn quite a bit of attention
lately, so I wanted to address a recent post.

First, I strongly disagree with the notion that "face validity" somehow
provides evidence that a test measures what it is supposed to measure.
Perhaps you were referring to content validity, which is based on
subject matter experts' judgments of the items, rather than face
validity, which refers to test takers' perceptions that the test is
valid.  I'll assume that you meant content validity.

Second, just because a test functioned properly in the past does not
mean it can be assumed to function properly in a new study.  This line
of reasoning is what has led researchers to continue using outdated
scales to assess attitudes and personality characteristics.  There has
been quite a bit of research demonstrating that the psychometric
properties of a number of personality tests have changed over the past
ten years (particularly early measures of the Big Five model of
personality), so to assume that something is okay because it was okay
in the past seems questionable.  If you are simply going to ignore the
alpha and assume that the scale is fine to use, then why compute it in
the first place?
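
For what it's worth, alpha is cheap to compute from the raw item
scores, so there is little excuse for not checking it.  A minimal
sketch in Python (the response matrix is made up for illustration):

import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Made-up Likert responses: 6 respondents x 4 items
scores = np.array([[4, 5, 4, 5],
                   [2, 1, 2, 1],
                   [3, 3, 4, 3],
                   [5, 4, 5, 4],
                   [1, 2, 1, 2],
                   [3, 4, 3, 4]])
print(round(cronbach_alpha(scores), 2))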

>     Consider this: if I have a math test made up of 20 questions, and everybody
> agrees that adding up the number right to get a total score is a good way to
> decide on a student's grade, wouldn't you also add up the number right and use
> it if only 5 of the questions happened to be available? How reliably the total
> score measures the concept needs to be separated from whether the total score is
> a respectable measure of the concept. The VALIDITY of the shorter test should be
> distinguished from its (obviously lower) RELIABILITY.
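
As an aside on that example: just how much lower the shorter test's
reliability gets can be estimated with the Spearman-Brown formula,
assuming the dropped items are comparable to the ones kept.  A quick
sketch in Python (the .90 reliability for the full 20-item test is
hypothetical):

def spearman_brown(rel_full, length_ratio):
    """Predicted reliability after changing test length by length_ratio."""
    return (length_ratio * rel_full) / (1 + (length_ratio - 1) * rel_full)

# Hypothetical: a 20-item test with reliability .90, cut down to 5 items
print(round(spearman_brown(0.90, 5 / 20), 2))  # ~0.69

So even a very good 20-item test, cut to a quarter of its length, lands
right at the conventional .70 floor.  The two scores are not
interchangeable, whatever their validity.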

Third, I would argue (along with a sizable number of psychometricians;
see Nunnally, 1994) that reliability is a prerequisite for validity,
assuming that we are trying to measure a unitary construct.  I can't
see how an unreliable measure can be a valid one, especially since this
particular measure is trying to assess a stable trait.
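
The usual way to make this precise is the correction for attenuation: a
test's correlation with any criterion cannot exceed the square root of
the product of the two reliabilities, so the test's own reliability
caps its validity even against a perfectly measured criterion.  A quick
illustration with hypothetical reliability values:

import math

# Validity ceiling: r(test, criterion) <= sqrt(rel_test * rel_criterion).
# With a perfectly reliable criterion the ceiling is sqrt(rel_test).
for rel in (0.30, 0.50, 0.70, 0.90):
    print(f"reliability {rel:.2f} -> validity ceiling {math.sqrt(rel):.2f}")

An alpha in the .30s caps validity near .55 before a single substantive
question about the construct has been asked.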

If the four-item measure is supposed to be measuring a single construct,
and the responses to the items bear little relation to one another,
shouldn't that be a sign that something is wrong?  It would point to the
possibility that what you are trying to measure is not a unitary
construct.
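
Concretely, the first thing I would look at is the inter-item
correlation matrix: for a unitary construct, the off-diagonal entries
should all be moderately positive.  A sketch, again with made-up
responses:

import numpy as np

# Made-up responses to the four items (rows = respondents)
items = np.array([[4, 2, 5, 1],
                  [3, 4, 2, 5],
                  [5, 1, 4, 2],
                  [2, 5, 1, 4],
                  [4, 3, 3, 3]])
r = np.corrcoef(items, rowvar=False)  # 4 x 4 inter-item correlations
print(np.round(r, 2))  # near-zero or negative off-diagonals signal trouble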

> Have you
> considered it possible that their disappointment with your study might be due to
> the fact that "locus of control" is an outdated concept that has been overused
> by graduate students for 4 decades without contributing much of anything to our
> understanding of social behavior?

I see that we have now moved on from the realm of statistical analyses
to a discussion of what is and what is not "outdated".  Aside from the
sweeping generalizations (is it only grad students who use this
construct?), I have a problem with this critique of locus of control.
Given that social psychologists have demonstrated that locus of control
is related to a number of group processes (for reviews see Hackman,
1990, 1992; Levine and Moreland, 1991), I think that you may be
overstating how "outdated" the concept is.

Back to the main issue: I find it hard to defend using four items that
bear little statistical relation to one another as a single scale score
representing a unitary construct.  I'm not sure why people are arguing
that a low reliability is acceptable simply because there are only four
items.  I have personally used, and read of, a number of 3- or 4-item
scales that demonstrated reliabilities of at least .70.
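
A .70 from four items is not an unreasonable demand, either.  Solving
the standardized-alpha formula for the average inter-item correlation
required:

def mean_r_needed(alpha, k):
    """Average inter-item correlation needed for standardized alpha with k items."""
    return alpha / (k - alpha * (k - 1))

print(round(mean_r_needed(0.70, 4), 2))  # ~0.37

An average inter-item correlation around .37 is moderate, not heroic;
four items that genuinely tap one construct should manage it.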

Just my 2 cents.  (Actually probably more like my 50 cents)

John
