(Cross-posted to the edstat list)

I don't think it is useful to tie the idea of a "ceiling effect" to a feature like <% correct> by itself. Much depends on the distribution of the effective difficulty levels of the items. Your <90%> amounts to <two items wrong>. If (for example) those 2 wrong items are much more difficult than the rest of the items, 90% could well represent a strong ceiling effect; even if only one item is very difficult, for that matter. (This argument is less persuasive if the 2 wrong items vary a lot from person to person among those scoring 18/20.)
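To see how easily this can happen, here is a small simulation sketch (in Python; every number in it is invented for illustration, not estimated from your data). Suppose 18 items that nearly everyone answers correctly and 2 four-alternative items so hard that they are answered essentially at chance:

    import random
    from collections import Counter

    random.seed(1)

    N_SUBJECTS = 15             # one group, roughly as in your design
    N_EASY, P_EASY = 18, 0.97   # hypothetical: items nearly everyone gets right
    N_HARD, P_HARD = 2, 0.25    # hypothetical: very hard 4-choice items,
                                #   answered essentially at chance

    def simulate_score():
        easy = sum(random.random() < P_EASY for _ in range(N_EASY))
        hard = sum(random.random() < P_HARD for _ in range(N_HARD))
        return easy + hard

    scores = [simulate_score() for _ in range(N_SUBJECTS)]
    print("mean:", sum(scores) / len(scores))      # typically near 18 of 20
    print("freq:", sorted(Counter(scores).items(), reverse=True))

The group mean comes out near 18/20 (90%), yet a perfect score is nearly out of reach: a genuine ceiling effect hiding behind an apparently comfortable <90%>.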
And you need to consider the general noise level of the measure, and the probability of getting difficult items right "by chance" -- a probability that is presumably higher for multiple-choice items than for constructed-response items. The same considerations apply to your <82%>, which represents <3 or 4 items wrong, on average>. (Presumably the two tests have different items? So the extra 1 or 2 items wrong may not represent a distinct decrement in performance/knowledge/...? In which case the random noise level around an individual score is so large that you can't distinguish anything anyway.)

Another characteristic one expects to find, if a "ceiling effect" is really present, is a distinctly skewed distribution of individual scores, trailing off toward the low scores. But distributional shapes are notoriously difficult to assess with small numbers of cases, and you apparently have at most 15 subjects per group. Suppose your 18/20 average arose from a distribution like case A or case B (use a monospaced font to see the graphical effect intended):

    score     case A           case B
     20        1  *             0
     19        3  ***           3  ***
     18        7  *******      10  **********
     17        3  ***           1  *
     16        1  *             1  *
    < 16       0                0

Equivalently, without the graphical display:

    items correct:        20   19   18   17   16   15 or less
    frequency, case A:     1    3    7    3    1    0
    frequency, case B:     0    3   10    1    1    0

Then there would clearly be no evidence to support the idea of a ceiling effect as one usually thinks of such an effect: both distributions are centered at 18 and look essentially symmetric (case A exactly so), if of perhaps unexpectedly low variance. (A quick numerical check on these two toy distributions is in the P.S. below.)

But one ought to take stock of how "one usually thinks". Sometimes <all Ss clustered closely about a high score> may itself represent a kind of ceiling effect, and looking for a long-tailed distribution is then chasing a red herring -- especially if there is interesting agreement among Ss about which items are answered correctly or incorrectly.

Similar considerations would apply to floor effects, though probably not symmetrically so in practice: a test with a few very difficult items is, I think, more likely to be encountered than one with a few very easy items.

HTH.
                                                        -- Don.

On Thu, 11 Mar 2004, Paul Ginns wrote:

> Dear colleagues,
>
> I'm interested in your experiences as to what level of test
> performance would constitute a "ceiling effect".
>
> I'm just doing some data entry and appear to be getting ceiling
> effects in both my experimental groups (total n = 30), on both tests.
> On the first test, the average for both groups is 18/20 (90% correct),
> and for the second test, 16.4/20 (82%) (i.e., the manipulation failed
> to produce any difference between groups). I'd be happy to accept that
> 90% correct represents a ceiling effect, but how about 82%? Any
> thoughts on how low a range of scores would represent floor effects,
> while we're discussing such things?

 ------------------------------------------------------------
 Donald F. Burrill                              [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110        (603) 626-0816
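P.S. Since case A and case B above differ by only a few subjects, they also illustrate how jumpy shape statistics are with n = 15. A quick sketch (Python; the skewness formula is just the usual adjusted moment-based sample estimate, one choice among several):

    import statistics

    # Frequency tables for the two toy distributions in the message body.
    cases = {
        "A": {20: 1, 19: 3, 18: 7, 17: 3, 16: 1},
        "B": {20: 0, 19: 3, 18: 10, 17: 1, 16: 1},
    }

    for name, freq in cases.items():
        x = [score for score, f in freq.items() for _ in range(f)]
        n, m, s = len(x), statistics.mean(x), statistics.stdev(x)
        # adjusted moment-based sample skewness (negative = trailing low)
        g = (n / ((n - 1) * (n - 2))) * sum(((xi - m) / s) ** 3 for xi in x)
        print("case %s: mean = %.2f  sd = %.2f  skewness = %+.2f" % (name, m, s, g))

Case A comes out exactly symmetric (skewness 0.00), while case B, visually much the same clump of scores, comes out with skewness near -1. Which is exactly why I would not lean very hard on distributional shape with samples this small.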
