Re: regression to the mean

dennis roberts Wed, 17 Jan 2001 07:17:38 -0800

here is an example to ponder ...
let's say that you are an instructor in a course and have decided to 
administer a 100 point final exam ... the very first day of class ... and 
then some alternate form of that 100 item test the very last day of class 
... in general, to see what people "gain"

now, scores are pretty low on the first day ... and since kids learn a lot 
... the scores go up a lot by the end of the course .... have a look at 
these data
=========


          -                                         *
  post    -                                         * *
          -                                                     *
          -                                             2 *
        80+                                 * 2 *
          -                             2           *   * *
          -                             *   *         *
          -                                 *     *
          -                           *     *
        60+
          -       *                                 *
          -             *           *   *
          -
          -                         *
        40+                                       *
          -
          -
            ----+---------+---------+---------+---------+---------+--pre
             10.0      15.0      20.0      25.0      30.0      35.0

positive r between pre and post ... makes sense

MTB > desc c16 c17

Descriptive Statistics: pre, post


Variable             N       Mean     Median     TrMean      StDev    SE Mean
pre                 30     25.200     25.500     25.615      5.006      0.914
post                30      71.53      76.50      72.23      14.30       2.61

Variable       Minimum    Maximum         Q1         Q3
pre             11.000     34.000     22.000     29.000
post             39.00      95.00      61.50      81.50

MTB > corr c16 c17

Correlations: pre, post


Pearson correlation of pre and post = 0.604
P-Value = 0.000

now, what if you look at the gain ... from pre to post ... then plot the 
pre scores against the gain

MTB > plot c30 c16

Plot


  gain    -
          -                                         *
          -                                         *
        60+                                           *
          -                             2   * 2 *       * *
          -                             *               *       *
          -                                 *       *   * *
          -       *                   *     *
        40+             *                   *     *   *
          -
          -                         *   *
          -                                         *
          -                         *
        20+
          -
          -                                       *
          -
            ----+---------+---------+---------+---------+---------+--pre
             10.0      15.0      20.0      25.0      30.0      35.0

MTB > corr c16 c30

Correlations: pre, gain


Pearson correlation of pre and gain = 0.303
P-Value = 0.104

the correlation between pre and gain is POSITIVE .3 ... not high of course, 
but it is POSITIVE
this means that the ones who scored highest on the pre tended to GAIN THE MOST
the ones who scored lowest on the pre ... tended to GAIN THE LEAST
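
as a side check ... the pre/gain correlation can be recovered from the summary numbers already printed above, because gain = post - pre implies cov(pre, gain) = cov(pre, post) - var(pre). here is a minimal python sketch of that arithmetic (the variable names are mine, not minitab's, and the inputs are the rounded values from the output, so the result only agrees to about two decimals):

```python
import math

# summary values copied (rounded) from the minitab output above
r_pre_post = 0.604   # Pearson correlation of pre and post
sd_pre = 5.006       # StDev of pre
sd_post = 14.30      # StDev of post

# gain = post - pre, so:
#   cov(pre, gain) = cov(pre, post) - var(pre)
#   var(gain)      = var(post) + var(pre) - 2 * cov(pre, post)
cov_pre_post = r_pre_post * sd_pre * sd_post
cov_pre_gain = cov_pre_post - sd_pre ** 2
var_gain = sd_post ** 2 + sd_pre ** 2 - 2 * cov_pre_post

r_pre_gain = cov_pre_gain / (sd_pre * math.sqrt(var_gain))

# the sign of r_pre_gain is the sign of (r_pre_post * sd_post - sd_pre)
sign_term = r_pre_post * sd_post - sd_pre

print(f"r(pre, gain) from summary stats: {r_pre_gain:.2f}")
print(f"r * sd_post - sd_pre: {sign_term:.2f}")
```

the second printed value is the piece that decides the sign ... it compares r times the post spread with the pre spread.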



MTB > sort c16(c30), c31(c32);
SUBC> desc c16.
MTB > prin c31 c32

if i sort the pre from high to low and then list the gain ... we can see 
easily that the high pres gain more ... in fact, the top 6 gain about 51 points 
on average ... while the low 6 gain only about 35 points on average

  Row  sortpre  samegain

    1       34        53
    2       31        54
    3       31        46
    4       30        53
    5       30        47
    6       30        55
    7       29        41
    8       29        61
    9       28        67
   10       28        49
   11       28        29
   12       28        62
   13       27        12
   14       27        41
   15       26        54
   16       25        54
   17       25        54
   18       24        57
   19       24        46
   20       24        44
   21       24        41
   22       22        54
   23       22        50
   24       22        55
   25       22        31
   26       21        42
   27       20        32
   28       20        24
   29       14        39
   30       11        43
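
for anyone without minitab handy ... the figures quoted in this post can be re-derived from the 30 rows listed above. the following is a rough python sketch, not the minitab procedure itself; the helper name pearson and the list names are my own, and post is rebuilt as pre + gain:

```python
from statistics import mean

# (sortpre, samegain) columns copied from the listing above, rows 1 to 30
pre = [34, 31, 31, 30, 30, 30, 29, 29, 28, 28, 28, 28, 27, 27, 26,
       25, 25, 24, 24, 24, 24, 22, 22, 22, 22, 21, 20, 20, 14, 11]
gain = [53, 54, 46, 53, 47, 55, 41, 61, 67, 49, 29, 62, 12, 41, 54,
        54, 54, 57, 46, 44, 41, 54, 50, 55, 31, 42, 32, 24, 39, 43]

def pearson(x, y):
    """Plain Pearson correlation (hypothetical helper, not a minitab call)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# rebuild post from pre and gain, since gain = post - pre
post = [p + g for p, g in zip(pre, gain)]

top6_gain = mean(gain[:6])    # six highest pre scores (rows 1 to 6)
low6_gain = mean(gain[-6:])   # six lowest pre scores (rows 25 to 30)
r_pre_gain = pearson(pre, gain)
r_pre_post = pearson(pre, post)

print(f"mean gain, top 6 on pre: {top6_gain:.1f}")
print(f"mean gain, low 6 on pre: {low6_gain:.1f}")
print(f"r(pre, gain) = {r_pre_gain:.3f}")
print(f"r(pre, post) = {r_pre_post:.3f}")
```

the lists are already sorted by pre from high to low, so simple slices pick out the top and bottom six.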

if you are thinking about regression to the mean in the typical way ... how 
come this "regression reversal" seems to have occurred?



