On Fri, 30 Jun 2000, Bob Hayden wrote:

> Tom Moore asked...
> 
> Does anyone know of a good example of cubic regression that you'd be 
> willing to share?

and Bob replied with an example.  Here's another;  Bob, would you forward 
it to Tom, as I don't have his address?

As I vaguely recall, I found this years ago in Snedecor and Cochran.
Data are dry weights (in grams) of chick embryos, of ages 6 to 16 days. 
There is one observation per day, which may or may not (the source was 
not explicit) be the average of several individuals.  For this range 
of ages, the data rather nicely fit an exponential growth curve, which 
is of course superior theoretically to a polynomial function, but 'twas 
useful for sorting out students who understood some things from those who 
didn't.  The data set below contains, besides the two original variables, 
coefficients for orthogonal polynomials from linear through sextic.  
I used to ask students, at the end of the intermediate course, to carry 
out two regression analyses, one using the polynomials and one regressing 
log(weight) on age, then choose which they preferred and explain why.
        Using linear through cubic predictors almost works, but has the 
interesting defect of predicting a negative weight for day 6;  since the 
cubic term is far too large to drop (t = 18.34 in the output below), one 
clearly doesn't want to stop at quadratic, and the optimal polynomial is 
quartic (quintic and sextic contributions being negligible).  Some 
students would prefer the polynomial function because the R-sq is larger 
than for the exponential (although the R-sq values are not strictly 
comparable, being based on different dependent variables);  I gave full 
credit for this if it was accompanied by a demonstration that the R-sq 
was significantly larger.
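
The negative prediction for day 6 can be checked by hand from the fitted 
coefficients in the Minitab output further down and the day-6 row of the 
orthogonal polynomial table.  A small Python sketch of that arithmetic 
(the variable and function names are mine, purely for illustration, and 
the coefficients are the rounded values as printed):

```python
# Coefficients copied from the Minitab output (identical in the full and
# reduced models, because the predictors are mutually orthogonal).
coef = {
    "constant": 0.701273,
    "linear":   0.235073,
    "quadratc": 0.0463497,
    "cubic":    0.0074296,
    "quartic":  0.004598,
}

# Orthogonal polynomial values for day 6 (row 1 of the data listing).
day6 = {"constant": 1, "linear": -5, "quadratc": 15, "cubic": -30, "quartic": 6}


def predict(terms):
    """Fitted dry weight at day 6 using only the named terms."""
    return sum(coef[t] * day6[t] for t in terms)


cubic_fit = predict(["constant", "linear", "quadratc", "cubic"])
quartic_fit = predict(["constant", "linear", "quadratc", "cubic", "quartic"])

print(f"cubic model,   day 6: {cubic_fit:.4f} g")
print(f"quartic model, day 6: {quartic_fit:.4f} g")
```

With these rounded coefficients the cubic fit comes out at about -0.0017 g 
and the quartic fit at about 0.0259 g, against an observed 0.029 g.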
        To Dennis' concern about "R-sq = 100.0%", I point out that the 
values in question are actually 99.966% (using all 6 orthogonal 
polynomials), 99.957% (using linear through quartic), and 99.832% (using 
log(weight) as the dependent variable).  
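
These three figures can be recovered from the regression and total sums 
of squares in the ANOVA tables below.  A minimal Python sketch (the 
dictionary labels are my own;  the sums of squares are the four-decimal 
values as printed, so the last digit of each R-sq is only approximate):

```python
# (regression SS, total SS) copied from the three ANOVA tables.
models = {
    "all six polynomials":    (8.1653, 8.1681),
    "linear through quartic": (8.1646, 8.1681),
    "log(weight) on day":     (4.2206, 4.2277),
}

# R-sq is the regression SS as a percentage of the total SS.
r_sq = {name: 100 * ss_reg / ss_tot for name, (ss_reg, ss_tot) in models.items()}

for name, value in r_sq.items():
    print(f"{name:24s} R-sq = {value:.3f}%")
```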
 **********************************************************************
MTB > print c1-c8

 ROW   day  dry.wt  linear  quadratc  cubic  quartic  quintic  sextic

   1     6   0.029      -5        15    -30        6       -3      15
   2     7   0.052      -4         6      6       -6        6     -48
   3     8   0.079      -3        -1     22       -6        1      29
   4     9   0.125      -2        -6     23       -1       -4      36
   5    10   0.181      -1        -9     14        4       -4     -12
   6    11   0.261       0       -10      0        6        0     -40
   7    12   0.425       1        -9    -14        4        4     -12
   8    13   0.738       2        -6    -23       -1        4      36
   9    14   1.130       3        -1    -22       -6       -1      29
  10    15   1.882       4         6     -6       -6       -6     -48
  11    16   2.812       5        15     30        6        3      15

MTB > let c9 = logten(c2)
MTB > name c9 'log.wt'

Plots of the raw data (dry.wt vs. day) and of log.wt vs. day are shown 
at the end.  Although they're the low-resolution character plots that 
Dennis dislikes, they do show the substantial curvature in the raw data 
and the nearly linear relationship of log(weight) with age.

MTB > regress c2 6 c3-c8;
SUBC> residuals c10.

The regression equation is
dry.wt = 0.701 + 0.235 linear + 0.0463 quadratc + 0.00743 cubic
           + 0.00460 quartic - 0.00163 quintic -0.000160 sextic

Predictor       Coef       Stdev    t-ratio        p
Constant    0.701273    0.008001      87.65    0.000
linear      0.235073    0.002530      92.91    0.000
quadratc   0.0463497   0.0009059      51.17    0.000
cubic      0.0074296   0.0004051      18.34    0.000
quartic     0.004598    0.001569       2.93    0.043
quintic    -0.001628    0.002124      -0.77    0.486
sextic    -0.0001604   0.0002505      -0.64    0.557

s = 0.02653     R-sq = 100.0%    R-sq(adj) = 99.9%

Since all predictors are mutually orthogonal, we see at once that the 
quintic and sextic terms are of no interest, and proceed (below) to a 
reduced model that only goes up to the quartic term.  The values of the 
coefficients do not change, of course;  their standard errors ("Stdev"), 
t values, and p values change only because the error mean square is 
smaller in the reduced model and has 6 rather than 4 degrees of freedom.
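
To see why the coefficients cannot change, note that with mutually 
orthogonal predictors (each also summing to zero) every coefficient is 
just sum(x*y)/sum(x*x) for its own column, whatever other columns are in 
the model, and its sequential SS is sum(x*y)**2/sum(x*x).  The sketch 
below checks this in plain Python against the data listing above;  the 
helper name dot() and the dictionary layout are my own choices:

```python
dry_wt = [0.029, 0.052, 0.079, 0.125, 0.181, 0.261,
          0.425, 0.738, 1.130, 1.882, 2.812]
poly = {
    "linear":   [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
    "quadratc": [15, 6, -1, -6, -9, -10, -9, -6, -1, 6, 15],
    "cubic":    [-30, 6, 22, 23, 14, 0, -14, -23, -22, -6, 30],
    "quartic":  [6, -6, -6, -1, 4, 6, 4, -1, -6, -6, 6],
    "quintic":  [-3, 6, 1, -4, -4, 0, 4, 4, -1, -6, 3],
    "sextic":   [15, -48, 29, 36, -12, -40, -12, 36, 29, -48, 15],
}


def dot(u, v):
    """Sum of elementwise products of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


names = list(poly)

# Orthogonality: every column sums to zero (orthogonal to the constant)
# and every pair of distinct columns has a zero dot product.
column_sums = [sum(poly[n]) for n in names]
cross_products = [dot(poly[a], poly[b])
                  for i, a in enumerate(names) for b in names[i + 1:]]
print("column sums:   ", column_sums)
print("cross products:", cross_products)

# Each coefficient depends only on its own column and the response.
constant = sum(dry_wt) / len(dry_wt)
coef = {n: dot(poly[n], dry_wt) / dot(poly[n], poly[n]) for n in names}
seq_ss = {n: dot(poly[n], dry_wt) ** 2 / dot(poly[n], poly[n]) for n in names}

print(f"constant  {constant:.6f}")
for n in names:
    print(f"{n:9s} coef = {coef[n]: .7f}   seq SS = {seq_ss[n]:.4f}")
```

The printed coefficients should agree with the "Coef" column in both 
regressions, to the precision Minitab shows.
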

Analysis of Variance

SOURCE       DF          SS          MS         F        p
Regression    6      8.1653      1.3609   1932.82    0.000
Error         4      0.0028      0.0007
Total        10      8.1681

SOURCE       DF      SEQ SS
linear        1      6.0785
quadratc      1      1.8432
cubic         1      0.2368
quartic       1      0.0060
quintic       1      0.0004
sextic        1      0.0003

MTB > regress c2 4 c3-c6;
SUBC> residuals c11.

The regression equation is
dry.wt = 0.701 + 0.235 linear + 0.0463 quadratc + 0.00743 cubic
           + 0.00460 quartic

Predictor       Coef       Stdev    t-ratio        p
Constant    0.701273    0.007302      96.04    0.000
linear      0.235073    0.002309     101.81    0.000
quadratc   0.0463497   0.0008267      56.06    0.000
cubic      0.0074296   0.0003697      20.09    0.000
quartic     0.004598    0.001432       3.21    0.018

s = 0.02422     R-sq = 100.0%    R-sq(adj) = 99.9%

Analysis of Variance

SOURCE       DF          SS          MS         F        p
Regression    4      8.1646      2.0411   3480.52    0.000
Error         6      0.0035      0.0006
Total        10      8.1681

SOURCE       DF      SEQ SS
linear        1      6.0785
quadratc      1      1.8432
cubic         1      0.2368
quartic       1      0.0060

MTB > regress c9 1 c1;
SUBC> residuals c12.

The regression equation is
log.wt = - 2.69 + 0.196 day

Predictor       Coef       Stdev    t-ratio        p
Constant    -2.68920     0.03055     -88.02    0.000
day         0.195881    0.002669      73.38    0.000

s = 0.02800     R-sq = 99.8%     R-sq(adj) = 99.8%

Analysis of Variance

SOURCE       DF          SS          MS         F        p
Regression    1      4.2206      4.2206   5384.94    0.000
Error         9      0.0071      0.0008
Total        10      4.2277
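
Because log.wt is a base-10 logarithm (LOGTEN), the fitted line can be 
turned back into an exponential growth curve for weight.  A rough Python 
sketch, using the rounded coefficients printed above (the names 
daily_factor, doubling_days and fitted_weight are mine):

```python
import math

intercept = -2.68920   # "Constant" from the Minitab output
slope = 0.195881       # coefficient of day, in log10(grams) per day

# Multiplicative growth per day, and the time needed to double in weight.
daily_factor = 10 ** slope
doubling_days = math.log10(2) / slope


def fitted_weight(day):
    """Back-transformed fitted dry weight (grams) at the given age in days."""
    return 10 ** (intercept + slope * day)


print(f"growth factor per day: {daily_factor:.3f}")
print(f"doubling time:         {doubling_days:.2f} days")
for day in (6, 11, 16):
    print(f"day {day:2d}: fitted weight = {fitted_weight(day):.3f} g")
```

On this fit the weight grows by a factor of roughly 1.57 per day, i.e. it 
doubles about every 1.5 days, and the back-transformed prediction is 
positive at every age, day 6 included.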

MTB > plot c2 c1   Raw data,  dry weight  vs  age in days

      3.0+
         -                                                       *
 dry.wt  -
         -
         -
      2.0+
         -                                                  *
         -
         -
         -                                             *
      1.0+
         -                                        *
         -
         -                                   *
         -                    *    *    *
      0.0+     *    *    *
         -
           ----+---------+---------+---------+---------+---------+--day     
             6.0       8.0      10.0      12.0      14.0      16.0

MTB > plot c9 c1     log(weight) vs age in days

         -
 log.wt  -                                                       *
         -                                                  *
         -
     0.00+                                             *
         -                                        *
         -
         -                                   *
         -                              *
    -0.70+                         *
         -                    *
         -
         -               *
         -          *
    -1.40+
         -     *
         -
           ----+---------+---------+---------+---------+---------+--day     
             6.0       8.0      10.0      12.0      14.0      16.0


I leave it to the interested reader to repeat the analysis and produce 
residual plots like those Bob Hayden supplied with his data.
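
For anyone repeating the analysis outside Minitab, here is one way the 
residuals (c11 and c12 above) might be recomputed in plain Python.  It is 
only a sketch:  it leans on the orthogonality of the polynomial columns 
instead of a general least-squares routine, and all names are my own.

```python
import math

day = list(range(6, 17))
dry_wt = [0.029, 0.052, 0.079, 0.125, 0.181, 0.261,
          0.425, 0.738, 1.130, 1.882, 2.812]
poly = {
    "linear":   [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
    "quadratc": [15, 6, -1, -6, -9, -10, -9, -6, -1, 6, 15],
    "cubic":    [-30, 6, 22, 23, 14, 0, -14, -23, -22, -6, 30],
    "quartic":  [6, -6, -6, -1, 4, 6, 4, -1, -6, -6, 6],
}


def dot(u, v):
    """Sum of elementwise products of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Quartic fit: with orthogonal, zero-sum columns the intercept is the mean
# of y and each coefficient is sum(x*y)/sum(x*x) for its own column.
mean_wt = sum(dry_wt) / len(dry_wt)
coef = {n: dot(x, dry_wt) / dot(x, x) for n, x in poly.items()}
fit_poly = [mean_wt + sum(coef[n] * poly[n][i] for n in poly)
            for i in range(len(day))]
res_poly = [y - f for y, f in zip(dry_wt, fit_poly)]

# Simple linear regression of log10(weight) on day.
log_wt = [math.log10(y) for y in dry_wt]
mean_day = sum(day) / len(day)
dev_day = [d - mean_day for d in day]
slope = dot(dev_day, log_wt) / dot(dev_day, dev_day)
intercept = sum(log_wt) / len(log_wt) - slope * mean_day
res_log = [y - (intercept + slope * d) for y, d in zip(log_wt, day)]

print(" day  res(quartic)  res(log fit)")
for d, rp, rl in zip(day, res_poly, res_log):
    print(f"{d:4d}  {rp:12.4f}  {rl:12.4f}")
```

The two residual columns are in different units (grams and log10 grams), 
so they should be plotted separately against day and not compared 
directly.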
                                                                -- Don.
 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,          [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                                 603-535-2597
 184 Nashua Road, Bedford, NH 03110                          603-471-7128  

