There is presumably a reason (you haven't told us otherwise, anyway)
for using 4 items instead of the entire original scale of 20-some.
Might be useful to state that reason somewhere.
Consult your favourite measurement textbook: there's a formula
(Spearman-Brown) for estimating the reliability of a measure as a
function of the number of items, given an empirical estimate of the
reliability of a measure made up of a given number of items. Starting
with the known reliability of the full scale (and you _should_ know
exactly both the number of items on that scale and its published
reliability -- if in presenting your results it became evident that you
DIDN'T have that information cold, _I_ would have been a bit frosty
with you had I been either on your committee or invited to be present
and comment!), estimate the reliability to be expected of a measure
consisting of only 4 items. My guess is that the estimated reliability
would not exceed the 0.3 you report empirically.
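The Spearman-Brown calculation described above is easy to sketch. The numbers below are illustrative assumptions, not the poster's actual figures: a hypothetical full-scale reliability of 0.85 for a 20-item scale, shortened to 4 items.

```python
def spearman_brown(rho, ratio):
    """Spearman-Brown prophecy formula: predicted reliability when a
    scale's length is multiplied by `ratio` (ratio < 1 shortens it)."""
    return ratio * rho / (1 + (ratio - 1) * rho)

# Assumed numbers (the poster did not report the published reliability):
full_rho = 0.85      # hypothetical reliability of the full 20-item scale
ratio = 4 / 20       # keeping only 4 of the 20 items
print(round(spearman_brown(full_rho, ratio), 3))
```

Under these assumed inputs the predicted reliability of the 4-item version is about 0.53 -- the point being that one can compute, rather than guess, what alpha a shortened scale "should" show.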
On Tue, 7 Dec 1999, Magill, Brett wrote:
> Just wanted people's thought on the following:
>
> I am a graduate student in sociology studying individual's perceptions
> of control (locus of control) using existing data.
By "existing data" do you mean that you're conducting a secondary
analysis of data originally collected for some other purpose? If that be
the case, the decision to include only four items must have predated your
association with the data; I fail to see why you would be criticized for
summing them, since the choice of "four" was not yours. And if as I
suspect the reliability you report is about what one would expect, it's
not at all clear why "your peers" would bother to criticize (unless
they're addicted to nit-picking, in which case there's hardly any reason
to pay them any attention).
You also do not mention whether the summing of these four items
produced a variable that showed some encouraging utility. I'm inclined
to suspect you wouldn't be raising these questions if it didn't, though.
The fact that a measure (however apparently "unreliable" it may be)
shows a useful validity is itself evidence that the measure is useful.
[One should bear in mind also that reliability is not, strictly
speaking, a characteristic merely of a measure -- it's a characteristic
of the measure IN THE CONTEXT IN WHICH IT WAS USED. That is, it
reflects BOTH some feature(s) of the measure AND some feature(s) of the
sample of respondents on which the measure was used. Use the same
measure on a different population and you'll get a different value of
alpha. (And different validity as well, but that's another story.)]
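Since alpha is recomputed from each sample's data, it is worth being explicit about what is being computed. A minimal sketch of Cronbach's alpha from a respondents-by-items matrix (the data here are made up for illustration):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha: items is a 2-D array, rows = respondents,
    columns = items (already reverse-scored where necessary)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)         # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical responses from 6 people on 4 Likert-type items:
data = np.array([[4, 3, 4, 5],
                 [2, 2, 1, 2],
                 [5, 4, 5, 4],
                 [3, 3, 2, 3],
                 [1, 2, 2, 1],
                 [4, 5, 4, 4]])
print(round(cronbach_alpha(data), 2))
```

Run the same function on item responses from a different population and the item variances and sum-score variance change, so alpha changes with them -- which is the point of the bracketed remark above.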
> The data set includes four items to measure this construct which were
> taken from a larger scale of more than twenty, the larger scale
> reaching an acceptable level of reliability (I do not know the exact
> level, but it is a widely researched and used instrument) ...
(As remarked above, you SHOULD know. Exactly. And you should
know something about the population on a sample of which the published
reliability was based. YOUR population might or might not be comparable
-- and if not, the unlikeness might naturally tend toward low alpha
values.)
> ... in previous research. The four items that were included were
> selected as the best measures of the construct based on empirical
> evidence (item-total correlations, factor analysis).
> In my own research, I used these items and decided to sum responses
> across these four likert-type items. However, the Alpha reliability is
> very low 0.30 (items were reverse scored as necessary and coding was
> double-checked).
This may not be unreasonably low...
> I defended the decision to sum the items, despite the low Alpha, based
> on the fact that they were selected from a larger set of items which
> are internally consistent. In presenting my findings, I was heavily
> criticized for this decision.
	Who were the critics?  Below you refer to "my peers":  fellow 
graduate students?  (They tend to be the worst critics.  Not enough 
experience to have a sense of proportion about things, and nothing to lose 
by being hypercritical;  they may even perceive themselves to be "making 
points" in some sense.)  Faculty members?  (What were the reactions of your 
own committee members, both to your presentation and to the criticisms 
that arose?)  What alternatives, if any, were offered, and how were they 
justified? (If none, I'd be inclined to suspect the critics of expressing
unhappiness that the research reported wasn't the research they wish had
been carried out. Those criticisms one can ignore. It is not proper to
criticize an orange for not being an apple.)
> Now, I could use individual items and a procedure such as logistic
> regression (I was using GLM before with this scale as the dependent and
> a sample of better than 5000) without changing my conclusions (I ran
> logistic models anticipating the criticism), however I was not
> convinced that this is necessary.
Doubtless there are all sorts of things you _could_ do. But to
what end? Is the question you want to ask of the universe of discourse
answered, at least in part, by the analyses you chose to report? Would
that question (or those questions) be any better answered by another
procedure? Would interpretationsof the results of another procedure be
more readily understood by your audience? [One has suspicions...]
> My question is, is summing these items defensible or at least as
> defensible as summing any set of likert-type items to produce a single
> score?
Seems to me the problem hinges on the small (tiny, really) number
of items. If that small number is enough to produce interesting results,
that fact is itself interesting. (Given your alpha of 0.3, it might be
interesting to estimate the number of like items required to generate an
alpha of, say, 0.8 or 0.9. Or whatever value is considered "respectable"
by the fashion-setters of your colleagues. That could be presented as
the first step in the design of a follow-up study to confirm and extend
the present results (assuming, of course, that they're worth confirming
and extending).)
If, contrary to what I thought I understood, your research was
devised and carried out by you (instead of making use of existing data),
it may be fair game to fault you for using so few items. As remarked
above, you could have predicted that the measured reliability would be
low, and should have addressed that potential problem in the proposal,
either by using more items than 4, OR by addressing the intrinsic nature
of "reliability" and its ramifications in this context.
> Where could I find support for what I am doing if it is (clearly
> my peers won't just take my word for it)?
Perhaps some of the previous remarks will suggest avenues you
might pursue ...
-- DFB.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128