Will,
I gotta reply to this one!  I've done this type of thing a number of times.

Will Hopkins wrote:

> I have an important (for me) question, but first a preamble and 
> hopefully some useful info for people using Likert scales.
> 
> A week or so ago I initiated a discussion about how non-normal the 
> residuals have to be before you stop trusting analyses based on 
> normality.  Someone quite rightly pointed out that it depends on the 
> sample size, because the sampling distribution of almost every 
> statistic derived from a variable with almost any distribution is near 
> enough to normal for a large enough sample, thanks to the central 
> limit theorem.  Therefore you get believable confidence limits from t 
> statistics.

The distribution of the average of 12 observations, taken from a 'saw 
tooth' population, is about one significant line width away from a normal 
curve when you plot it.  n, the sample size, doesn't have to be very big.
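That claim is easy to check by simulation.  A minimal sketch (my own, not from Will's simulations): draw from a right-triangular "sawtooth" density f(x) = 2x on [0, 1], average 12 draws at a time, and look at how tightly and symmetrically the means cluster around the population mean of 2/3.

```python
import random
import statistics

def sawtooth_draw():
    # One draw from a right-triangular ("sawtooth") density f(x) = 2x on [0, 1],
    # via inverse-CDF sampling: F(x) = x**2, so X = sqrt(U).
    return random.random() ** 0.5

def mean_of_sample(n=12):
    return statistics.fmean(sawtooth_draw() for _ in range(n))

random.seed(1)
means = [mean_of_sample(12) for _ in range(20000)]

# Population mean is 2/3, SD is sqrt(1/18); by the CLT the averages of
# samples of only 12 should already sit symmetrically around 2/3 with
# SD near sqrt(1/18 / 12) = sqrt(1/216).
print(round(statistics.fmean(means), 3))
print(round(statistics.stdev(means), 3))
```

A histogram of `means` is visually indistinguishable from a normal curve at ordinary plotting resolution, which is the point.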

> 
> But how non-normal, and how big a sample? I have been doing 
> simulations to find out.  I've limited the simulations to t tests for 
> Likert scales with only a few levels, because these crop up often in 
> research, and Likert-scale variables with responses stacked up at one 
> end are not what you call normally distributed.   Yes, I know you can 
> and maybe should analyze these with logistic regression, but it's hard 
> work for statistically challenged research students, and the outcomes 
> (odds ratios) are hard for all but statisticians to understand.  
> Scoring the levels with integers and working out averages is so much 
> easier to do and interpret.
> 
> My simulations have produced some seemingly amazingly good results.  
> For example, with a 3-point scale with values of 1, 2 and 3, samples 
> of as few as 10 in each of two groups give accurate confidence 
> intervals for the difference in the means of the groups when both 
> means are ~2.0 (i.e. in the middle) and SDs are ~0.7 (i.e. the 
> majority of observations on 2, with a reasonable number on 1 and 3).  
> They are still accurate even when one of the groups is stacked up at 
> one end with a mean of 2.6 (and SD ~0.5).  If both means are stacked 
> up at one or either end, sample sizes of 20 or more are needed, 
> depending on how extreme the stacking is.  Likert scales with more 
> than 3 levels work perfectly for anything except responses stacked up 
> in the same extreme way at either end.

Aren't these getting over toward some kind of binary distribution?
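For readers who want to reproduce the kind of coverage check Will describes, here is a hedged sketch (my construction, not Will's actual code): two groups of 10 drawn from a 3-point scale with mean 2.0 and SD ~0.71, a pooled-variance t interval for the difference in means, and the fraction of intervals that cover the true difference of zero.

```python
import random
import statistics

LEVELS = [1, 2, 3]
PROBS = [0.25, 0.50, 0.25]   # mean 2.0, SD ~0.71: middle-heavy 3-point scale

def draw_group(n=10):
    return random.choices(LEVELS, weights=PROBS, k=n)

def ci_covers_truth(n=10, t_crit=2.101):
    # t_crit is the 97.5th percentile of t with 2n - 2 = 18 df.
    a, b = draw_group(n), draw_group(n)
    diff = statistics.fmean(a) - statistics.fmean(b)
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    se = (2 * pooled_var / n) ** 0.5
    return diff - t_crit * se <= 0.0 <= diff + t_crit * se  # true diff is 0

random.seed(2)
trials = 20000
coverage = sum(ci_covers_truth() for _ in range(trials)) / trials
print(round(coverage, 3))   # nominal coverage is 0.95
```

Changing `PROBS` to stack responses at one end (e.g. weights favoring 3) is how you would probe the "stacked" cases Will mentions.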

> Now, my question. Suppose in real life I have a sample of 10 
> observations of, say, a 5-point scale scored as 1 to 5.  Suppose I get 
> 1 response on 3, 5 responses on 4 and 4 responses on 5.  

You have assumed that a response must be an integer - i.e., an ordinal 
scale.  The best 'resolution' of your scale is roughly 20% - one unit in 
5.  If I knew enough math, I might be able to show the least difference 
in two means that you could use to demonstrate a difference in those 
means, for a given sample size.
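The standard t-test algebra gives at least a rough answer to that "least difference" question.  A sketch, under the assumption of two equal groups and a pooled SD (the SD value of 0.7 and n of 10 are just the figures from Will's example, not anything Jay computed):

```python
import math

def least_detectable_diff(sd, n, t_crit=2.101):
    # Smallest difference in means that a two-sample t test (two groups of
    # size n, pooled SD = sd, alpha = .05 two-sided) would just call
    # significant.  t_crit is the 97.5th percentile of t with 2n - 2 = 18 df.
    return t_crit * sd * math.sqrt(2.0 / n)

# Two groups of 10 with SD ~0.7 on a 5-point scale:
print(round(least_detectable_diff(sd=0.7, n=10), 2))
```

Note this is the bare least-significant-difference; a proper power calculation for reliably *detecting* a difference would demand a larger n.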

> The mean is therefore 4.3.  Suppose the other group is no problem 
> (e.g., 10 or more responses spread around the middle somewhere).  Now, 
> according to my simulations, it's OK for me to do a t test to get the 
> confidence limits for the difference, isn't it?  Now suppose the first 
> group was stacked more extremely, with 2 on 4 and 8 on 5.  The mean 
> for this group is now 4.8.  According to my simulations, that's too 
> extreme to apply the t test, with a sample of 10, anyway.  

Suppose I have 5 coins, weighted so p(heads) = .96.  Count a head as 1, 
a tail as 0.  Toss all 5 and add up the coins.  Do this multiple times.  
The average: 4.8.  Could I use the binomial calculations to determine 
the sample size required before the Student's t and normal distribution 
could apply?  You bet!
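The coin analogy is easy to simulate.  A minimal sketch (mine, as an illustration of Jay's point): the sum of 5 such coins is Binomial(5, .96), so its mean and SD are known exactly and the sample size needed for a near-normal sampling distribution follows from binomial formulas.

```python
import random
import statistics

def toss_total(p_heads=0.96, n_coins=5):
    # Toss 5 coins, each weighted so P(heads) = .96; heads = 1, tails = 0.
    return sum(1 for _ in range(n_coins) if random.random() < p_heads)

random.seed(3)
totals = [toss_total() for _ in range(50000)]

# Binomial(5, .96): mean = 5 * .96 = 4.8, SD = sqrt(5 * .96 * .04) ~ 0.44.
print(round(statistics.fmean(totals), 2))
print(round(statistics.stdev(totals), 2))
```

The distribution of `totals` is piled up at 5, just like Will's stacked Likert group with mean 4.8, which is exactly why small-sample t intervals struggle there.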

> Is this the correct way to apply the results of my simulations?  I can 
> see how it could fall over:  you could in principle get a sample of 
> 1x3, 5x4 and 4x5 when the true distribution has a mean of 4.8, but the 
> chance of that happening is small.
> 
> To put the question in a more general context of simulation:  if the 
> observed sample has a value of the outcome statistic that simulations 
> show has an accurate confidence interval for the given sample size 
> when that value is the population value of the statistic, is the 
> resulting confidence interval accurate?
> 
> Will

I'm not clear why you give away information by making your Likert 
scale into an ordinal value, instead of accepting fractional units, such 
as 0.5 (2.5, 4.5, etc.).  Whenever a survey respondent puts the 'x' mark 
part way between the boxes for 'neutral' and 'somewhat agree,' they are 
trying to tell you that they use a continuous scale.  The researcher 
throws this additional information away by shifting the 'x' to 'neutral' 
or, worse, throwing it out altogether.

If you say that this additional information is not 'real,' because the 
respondent cannot be that 'fine' in the accuracy of their response, then 
I would urge that additional effort be placed on getting better precision 
from the respondent.  Elsewhere, I've described ways that I and my techs 
have done this.

The other thing I'm not clear on is why you would not use a logit 
transform to achieve a distribution closer to normal in the shape of 
the tails.  Odds ratios are not fun for introductory students (and 
others!), granted.  But I would use a spreadsheet - Excel seems to be 
acceptable to this discussion group for spreadsheet work :) - to make 
my transform, then do the analysis, and then back-transform to get 
predicted intervals I could plot and understand.  The precision of the 
scale I use (with half points, 10 marks on the 'ruler' over its whole 
length) is not so hot anyway.  It's like a 12-inch ruler with no 
fractions of inches, or a meter stick with only decimeters marked.  We 
haven't discussed whether the increments are equal, and even so, it 
probably is not a ratio scale with a true zero.  In sum, precise 
statements of prediction and conclusion simply aren't warranted.

If I am careful to set up 'standards' for the ends and center of the 
scale, then I can be confident of the 0.5 increment.  A prediction to 
less than 0.25 point would be a waste of time.  Assessments/measurements 
with less concrete anchors must result in a less precise prediction, or 
must rely on large-sample averages.

If I take the logit transform, do my work and CIs in that scale, then 
back-transform to the 'real' scale for discussion purposes, I can use 
the standard t-distribution calculations, with which I and my students 
are presumably familiar.  I could use that approach to estimate sample 
size requirements, CIs, and significance levels.  OK, it's not exactly 
normal.  But I will get a predicted result, which I can test through a 
confirmation trial (if that is permitted).  Where would this approach go 
wrong?  I really need to know, and to know what an alternative might be.
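To make the proposed route concrete, here is a sketch of one way it might work on Will's stacked sample (my construction: the 0.5 offset used to keep endpoint scores off 0 and 1 is an arbitrary assumption, and the t critical value 2.262 is for 9 df):

```python
import math
import statistics

EPS = 0.5   # assumed offset keeping scores 1 and 5 strictly inside (0, 1)

def to_unit(x, lo=1, hi=5):
    # Map a 1..5 Likert score into the open interval (0, 1).
    return (x - lo + EPS) / (hi - lo + 2 * EPS)

def from_unit(p, lo=1, hi=5):
    # Undo to_unit: back from (0, 1) to the 1..5 scale.
    return p * (hi - lo + 2 * EPS) + lo - EPS

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

# Will's example sample: 1 response on 3, 5 on 4, 4 on 5 (mean 4.3).
scores = [3] + [4] * 5 + [5] * 4
z = [logit(to_unit(x)) for x in scores]
m, s = statistics.fmean(z), statistics.stdev(z)
half = 2.262 * s / math.sqrt(len(z))     # t_{.975} with 9 df
ci = (from_unit(inv_logit(m - half)), from_unit(inv_logit(m + half)))
print(round(ci[0], 2), round(ci[1], 2))
```

The back-transformed interval is asymmetric and stays inside the 1-to-5 scale, which a raw-scale t interval on a stacked sample need not do; whether its coverage is actually accurate is exactly the question Jay is asking.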

Jay

-- 
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
USA

Ph:     (262) 634-9100
FAX:    (262) 681-1133
email:  [EMAIL PROTECTED]
web:    http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?



=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
                  http://jse.stat.ncsu.edu/
=================================================================