On Fri, 30 Jun 2000, Bob Hayden wrote:
> Tom Moore asked...
>
> Does anyone know of a good example of cubic regression that you'd be
> willing to share?
and Bob replied with an example. Here's another; Bob, would you forward
it to Tom, as I don't have his address?
As I vaguely recall, I found this years ago in Snedecor and Cochran.
Data are dry weights (in grams) of chick embryos, of ages 6 to 16 days.
There is one observation per day, which may or may not (the source was
not explicit) be the average of several individuals. For this range
of ages, the data rather nicely fit an exponential growth curve, which
is of course superior theoretically to a polynomial function, but 'twas
useful for sorting out students who understood some things from those who
didn't. The data set below contains, besides the two original variables,
coefficients for orthogonal polynomials from linear through sextic.
I used to ask students, at the end of the intermediate course, to carry
out one regression analysis using the orthogonal polynomials and another
regressing log(weight) on age, to choose which model they preferred, and
to explain why.
Using the linear through cubic predictors almost works, but it has the
interesting defect of predicting a negative weight for day 6; since the
cubic term is far too significant for one to stop at the quadratic, the
optimal polynomial is the quartic (the quintic and sextic contributions
being negligible).  Some students would prefer the polynomial function
because its R-sq is larger than that for the exponential (although the two
R-sq values are not strictly comparable, being based on different dependent
variables); I gave full credit for this choice if it was accompanied by a
demonstration that the R-sq was significantly larger.
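(For anyone who wants to check the day-6 claim outside Minitab, here is a
minimal sketch, assuming Python with numpy; it is not part of the original
session.  An ordinary cubic in age spans the same column space as the
constant plus the linear, quadratic, and cubic orthogonal columns, so its
fitted value at day 6 is the same.)

  import numpy as np

  day = np.arange(6, 17)                    # ages 6..16 days
  dry_wt = np.array([0.029, 0.052, 0.079, 0.125, 0.181, 0.261,
                     0.425, 0.738, 1.130, 1.882, 2.812])   # grams, from the listing below

  cubic = np.polyfit(day, dry_wt, 3)        # ordinary cubic in age
  print(np.polyval(cubic, 6))               # fitted weight at day 6; negative, as noted above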
To Dennis' concern about "R-sq = 100.0%", I point out that the
values in question are actually 99.966% (using all 6 orthogonal
polynomials), 99.957% (using linear through quartic), and 99.832% (using
log(weight) as the dependent variable).
**********************************************************************
MTB > print c1-c8
 ROW  day  dry.wt  linear  quadratc  cubic  quartic  quintic  sextic
   1    6   0.029      -5        15    -30        6       -3      15
   2    7   0.052      -4         6      6       -6        6     -48
   3    8   0.079      -3        -1     22       -6        1      29
   4    9   0.125      -2        -6     23       -1       -4      36
   5   10   0.181      -1        -9     14        4       -4     -12
   6   11   0.261       0       -10      0        6        0     -40
   7   12   0.425       1        -9    -14        4        4     -12
   8   13   0.738       2        -6    -23       -1        4      36
   9   14   1.130       3        -1    -22       -6       -1      29
  10   15   1.882       4         6     -6       -6       -6     -48
  11   16   2.812       5        15     30        6        3      15
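(A side note, not part of the session: if you don't have a table of
orthogonal polynomials handy, the integer columns above can be reproduced,
up to a column-by-column scale factor, by Gram-Schmidt on the successive
powers of centered age.  A sketch assuming Python with numpy:)

  import numpy as np

  day = np.arange(6, 17)
  X = np.vander(day - day.mean(), 7, increasing=True)  # columns 1, x, x^2, ..., x^6 (x centered)
  Q, _ = np.linalg.qr(X)                               # Gram-Schmidt: orthonormal columns
  linear = Q[:, 1]                                     # proportional to the 'linear' column
  print(linear / linear[0] * -5)                       # rescaled: -5 -4 ... 4 5, as printed above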
MTB > let c9 = logten(c2)
MTB > name c9 'log.wt'
Plots of the raw data (dry.wt vs. day) and of log.wt vs. day are shown
at the end. Although they're the low-resolution character plots that
Dennis dislikes, they do show the substantial curvature in the raw data
and the nearly linear relationship of log(weight) with age.
MTB > regress c2 6 c3-c8;
SUBC> residuals c10.
The regression equation is
dry.wt = 0.701 + 0.235 linear + 0.0463 quadratc + 0.00743 cubic
         + 0.00460 quartic - 0.00163 quintic - 0.000160 sextic

Predictor        Coef       Stdev    t-ratio      p
Constant     0.701273    0.008001      87.65  0.000
linear       0.235073    0.002530      92.91  0.000
quadratc    0.0463497   0.0009059      51.17  0.000
cubic       0.0074296   0.0004051      18.34  0.000
quartic      0.004598    0.001569       2.93  0.043
quintic     -0.001628    0.002124      -0.77  0.486
sextic     -0.0001604   0.0002505      -0.64  0.557

s = 0.02653     R-sq = 100.0%     R-sq(adj) = 99.9%
Since all the predictors are mutually orthogonal, we see at once that the
quintic and sextic terms are of no interest, and proceed (below) to a
reduced model that goes only up to the quartic term.  The values of the
coefficients do not change, of course (a short sketch after the ANOVA
table below illustrates the invariance); their standard errors ("Stdev"),
t-ratios, and p values change only because the error mean square is
smaller in the reduced model.
Analysis of Variance

SOURCE       DF        SS        MS        F      p
Regression    6    8.1653    1.3609  1932.82  0.000
Error         4    0.0028    0.0007
Total        10    8.1681

SOURCE       DF    SEQ SS
linear        1    6.0785
quadratc      1    1.8432
cubic         1    0.2368
quartic       1    0.0060
quintic       1    0.0004
sextic        1    0.0003
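(To see the invariance claimed above outside Minitab: with orthonormal
columns each least-squares coefficient is computed from its own column
alone, so dropping the quintic and sextic terms leaves the other estimates
untouched.  A sketch assuming Python with numpy; the numbers differ from
Minitab's only because these columns are normalized rather than
integer-scaled.)

  import numpy as np

  day = np.arange(6, 17)
  dry_wt = np.array([0.029, 0.052, 0.079, 0.125, 0.181, 0.261,
                     0.425, 0.738, 1.130, 1.882, 2.812])

  Q, _ = np.linalg.qr(np.vander(day - day.mean(), 7, increasing=True))
  full    = np.linalg.lstsq(Q,        dry_wt, rcond=None)[0]   # constant .. sextic
  reduced = np.linalg.lstsq(Q[:, :5], dry_wt, rcond=None)[0]   # constant .. quartic
  print(np.allclose(full[:5], reduced))                        # True: estimates unchanged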
MTB > regress c2 4 c3-c6;
SUBC> residuals c11.
The regression equation is
dry.wt = 0.701 + 0.235 linear + 0.0463 quadratc + 0.00743 cubic
         + 0.00460 quartic

Predictor        Coef       Stdev    t-ratio      p
Constant     0.701273    0.007302      96.04  0.000
linear       0.235073    0.002309     101.81  0.000
quadratc    0.0463497   0.0008267      56.06  0.000
cubic       0.0074296   0.0003697      20.09  0.000
quartic      0.004598    0.001432       3.21  0.018

s = 0.02422     R-sq = 100.0%     R-sq(adj) = 99.9%
Analysis of Variance

SOURCE       DF        SS        MS        F      p
Regression    4    8.1646    2.0411  3480.52  0.000
Error         6    0.0035    0.0006
Total        10    8.1681

SOURCE       DF    SEQ SS
linear        1    6.0785
quadratc      1    1.8432
cubic         1    0.2368
quartic       1    0.0060
MTB > regress c9 1 c1;
SUBC> residuals c12.
The regression equation is
log.wt = - 2.69 + 0.196 day

Predictor        Coef       Stdev    t-ratio      p
Constant     -2.68920     0.03055     -88.02  0.000
day          0.195881    0.002669      73.38  0.000

s = 0.02800     R-sq = 99.8%     R-sq(adj) = 99.8%

Analysis of Variance

SOURCE       DF        SS        MS        F      p
Regression    1    4.2206    4.2206  5384.94  0.000
Error         9    0.0071    0.0008
Total        10    4.2277
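(Again outside the session: the log10 fit above corresponds to the
exponential growth curve dry.wt = 10**(-2.689 + 0.1959*day).  A quick
numpy check of the fitted curve and the implied daily growth factor:)

  import numpy as np

  b0, b1 = -2.68920, 0.195881          # coefficients from the output above
  day = np.arange(6, 17)
  fitted_wt = 10 ** (b0 + b1 * day)    # back-transformed fitted dry weights (grams)
  print(10 ** b1)                      # daily growth factor, roughly 1.57x per day
  print(fitted_wt.round(3))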
MTB > plot c2 c1
[Character plot: raw data, dry.wt (grams) vs. day (ages 6 to 16), showing
the substantial curvature noted above.]
MTB > plot c9 c1
[Character plot: log.wt vs. day (ages 6 to 16), nearly linear as noted
above.]
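(For anyone who prefers higher-resolution graphs to the character plots, a
sketch of how one might redraw them; not part of the session, and it
assumes Python with numpy and matplotlib.)

  import numpy as np
  import matplotlib.pyplot as plt

  day = np.arange(6, 17)
  dry_wt = np.array([0.029, 0.052, 0.079, 0.125, 0.181, 0.261,
                     0.425, 0.738, 1.130, 1.882, 2.812])

  fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
  ax1.scatter(day, dry_wt)
  ax1.set_xlabel('day'); ax1.set_ylabel('dry.wt (g)')
  ax1.set_title('Raw data')
  ax2.scatter(day, np.log10(dry_wt))
  ax2.set_xlabel('day'); ax2.set_ylabel('log10 dry.wt')
  ax2.set_title('log(weight): nearly linear')
  plt.tight_layout()
  plt.show()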
I leave it to the interested reader to repeat the analysis and produce
residual plots like those Bob Hayden supplied with his data.
-- Don.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128