Will, I gotta reply to this one!  I've done this type of thing a number of times.

Will Hopkins wrote:

> I have an important (for me) question, but first a preamble and
> hopefully some useful info for people using Likert scales.
>
> A week or so ago I initiated a discussion about how non-normal the
> residuals have to be before you stop trusting analyses based on
> normality.  Someone quite rightly pointed out that it depends on the
> sample size, because the sampling distribution of almost every
> statistic derived from a variable with almost any distribution is near
> enough to normal for a large enough sample, thanks to the central
> limit theorem.  Therefore you get believable confidence limits from t
> statistics.

The distribution of the average of 12 observations, taken from a 'saw
tooth' population, is about one significant line width away from a normal
population.  n, the sample size, doesn't have to be very big.

> But how non-normal, and how big a sample?  I have been doing
> simulations to find out.  I've limited the simulations to t tests for
> Likert scales with only a few levels, because these crop up often in
> research, and Likert-scale variables with responses stacked up at one
> end are not what you call normally distributed.  Yes, I know you can
> and maybe should analyze these with logistic regression, but it's hard
> work for statistically challenged research students, and the outcomes
> (odds ratios) are hard for all but statisticians to understand.
> Scoring the levels with integers and working out averages is so much
> easier to do and interpret.
>
> My simulations have produced some seemingly amazingly good results.
> For example, with a 3-point scale with values of 1, 2 and 3, samples
> of as few as 10 in each of two groups give accurate confidence
> intervals for the difference in the means of the groups when both
> means are ~2.0 (i.e. in the middle) and SDs are ~0.7 (i.e. the
> majority of observations on 2, with a reasonable number on 1 and 3).
> They are still accurate even when one of the groups is stacked up at
> one end with a mean of 2.6 (and SD ~0.5).  If both means are stacked
> up at one or either end, sample sizes of 20 or more are needed,
> depending on how extreme the stacking is.  Likert scales with more
> than 3 levels work perfectly for anything except responses stacked up
> in the same extreme way at either end.

Aren't these getting over toward some kind of binary distribution?

> Now, my question.  Suppose in real life I have a sample of 10
> observations of, say, a 5-point scale scored as 1 to 5.  Suppose I get
> 1 response on 3, 5 responses on 4 and 4 responses on 5.

You have assumed that a response must be an integer - i.e., an ordinal
scale.  The best 'resolution' of your scale is, roughly, 20% - one unit
in 5.  If I knew enough math, I might be able to show the least
difference between two means that you could use to demonstrate a
difference in those means, for a given sample size.

> The mean is therefore 4.3.  Suppose the other group is no problem
> (e.g., 10 or more responses spread around the middle somewhere).  Now,
> according to my simulations, it's OK for me to do a t test to get the
> confidence limits for the difference, isn't it?  Now suppose the first
> group was stacked more extremely, with 2 on 4 and 8 on 5.  The mean
> for this group is now 4.8.  According to my simulations, that's too
> extreme to apply the t test, with a sample of 10, anyway.

Suppose I have 5 coins, weighted so p(heads) = 0.96.  Count a head as 1,
a tail as 0.  Toss the 5 coins and add them up.  Do this multiple times.
Average: 4.8.  Could I use the binomial calculations to determine the
sample size required before the Student 't' and the normal distribution
could apply?  You bet!

> Is this the correct way to apply the results of my simulations?
> I can see how it could fall over: you could in principle get a sample
> of 1x3, 5x4 and 4x5 when the true distribution has a mean of 4.8, but
> the chance of that happening is small.
>
> To put the question in a more general context of simulation: if the
> observed sample has a value of the outcome statistic that simulations
> show has an accurate confidence interval for the given sample size
> when that value is the population value of the statistic, is the
> resulting confidence interval accurate?
>
> Will

I'm not clear why you 'give away' information by making your Likert
scale into an ordinal value, instead of accepting fractional units, such
as 0.5 (2.5, 4.5, etc.).  Whenever a survey respondent puts the 'x' mark
partway between the box for 'neutral' and 'somewhat agree,' they are
trying to tell you that they use a continuous scale.  The researcher
throws this additional information away when they shift the 'x' to
'neutral' or, worse, throw the response out altogether.  If you say that
this additional information is not 'real,' because the respondent cannot
be that 'fine' in their accuracy of response, then I would urge that
additional effort be placed on getting better precision from the
respondent.  Elsewhere, I've described ways that I and my techs have
done this.

The other thing I'm not clear on is why you would not use a logit
transform to achieve a distribution closer to a normal in the shape of
the tails.  Odds ratios are not fun for introductory students (and
others!), granted.  But I would use a spreadsheet - Excel seems to be
acceptable to this discussion group for spreadsheet work :) - to make my
transform, then do the analysis, and then back transform to get
predicted intervals I could plot and understand.

The precision of the scale I use (with half points, 10 marks on the
'ruler' over its whole length) is not so hot anyway.  Like a 12 inch
ruler with no fractions of inches, or a meter stick with only decimeters
marked.
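The transform-analyze-back-transform recipe I'm describing can be
sketched in a few lines (Python here rather than a spreadsheet, purely
for compactness).  A caveat: the mapping of the 1-5 scores onto (0, 1),
including the half-step offset that keeps the endpoint scores away from
p = 0 and p = 1 where the logit blows up, is my own choice of
convention, not something from Will's simulations:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def likert_to_logit(scores, lo=1, hi=5):
    # Map each score onto (0, 1).  The half-step offset keeps the
    # endpoint scores (1 and 5) away from p = 0 and p = 1, where
    # the logit is undefined.  This offset is my convention.
    span = (hi - lo) + 1
    return [logit((s - lo + 0.5) / span) for s in scores]

def t_interval(xs, t_crit):
    # Ordinary Student-t confidence interval on the transformed scale.
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    half = t_crit * math.sqrt(var / n)
    return mean - half, mean + half

def back_transform(interval, lo=1, hi=5):
    # Undo the mapping: back to the 'real' 1..5 scale for discussion.
    span = (hi - lo) + 1
    return tuple(inv_logit(x) * span + lo - 0.5 for x in interval)

# Will's example group: 1 response on 3, 5 on 4, 4 on 5 (mean 4.3).
sample = [3] + [4] * 5 + [5] * 4
transformed = likert_to_logit(sample)
ci = back_transform(t_interval(transformed, t_crit=2.262))  # t(.975, df=9)
print(ci)
```

The back-transformed interval is guaranteed to stay inside the 1-5
scale, which a plain t interval on the raw scores is not - that is the
whole attraction of working in the logit scale.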
We haven't discussed whether the increments are equal, and even so, it
probably is not a ratio scale with a true zero.  In sum, precise
statements of prediction and conclusion simply aren't warranted.  If I
am careful to set up 'standards' for the ends and center of the scale,
then I can be confident of the 0.5 increment.  A prediction to less
than 0.25 point would be a waste of time.  Assessments/measurements
with less concrete anchors must result in less precise predictions, or
address large-sample averages.

If I take the logit transform, do my work and CIs in that scale, then
back transform to the 'real' scale for discussion purposes, I can use
the standard 't' distribution calculations, with which I and my
students are presumably familiar.  I can use that approach to estimate
sample size requirements, CIs and significance levels.  OK, it's not
exactly normal.  But I will get a predicted result, which I can test
through a confirmation trial (if that is permitted).

Where would this approach go wrong?  I really need to know - and what
an alternative might be.

Jay

--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
USA

Ph:    (262) 634-9100
FAX:   (262) 681-1133
email: [EMAIL PROTECTED]
web:   http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?

=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=================================================================