Hi L.Y, Thank you for your advice.

Are you talking about Trevor Hastie's gam()? I did not see anything in
the output indicating that it runs an automatic cross-validation. I also
could not verify that gam() automatically finds the degrees of freedom
when I don't specify df and just use terms such as s(col1) + s(col2) ...

Does the step() function also include the CV and automatic tuning of df
for gam()?

I wonder whether I called step() correctly, because it ran for only a
very short time (about 1 second) and immediately returned two models,
which in fact have an even larger residual deviance than the model I
provided initially. (I deliberately included every possibility in the
initial model and relied on step() to drop some terms for me.)

Thanks a lot!

On 3/16/06, Dr L. Y Hin <[EMAIL PROTECTED]> wrote:
>
> The engine of gam() lies in a function called smooth.spline() that is
> found in the library splines. If you leave out the degrees of freedom
> in the formula, it will automatically specify them for you via
> cross-validation. The results of the model fit obtainable via
> summary(mygam) will show you the "degrees of freedom as chosen by the
> cross-validation method".
>
> On a more philosophical plane, Buja et al. (Ann Stat.
> 1989;17(2):453-510) pointed out that linear smoothers such as cubic
> splines and smoothing splines are linear because the smoothing matrix
> depends on x and not on y. By using cross-validation, you will
> invariably involve y, which makes the choice of degrees of freedom
> y-dependent, and hence the smoothing parameter \lambda y-dependent.
> In that case the smoother is, strictly speaking, non-linear, because
> the smoothing matrix is S = (I + \lambda K)^{-1} in the unweighted
> form with unique x-points.
>
> If you increase the degrees of freedom, \lambda decreases, to the
> point where you effectively have a straightforward interpolation of
> the points on the graph. Conversely, if \lambda is increased, the
> smooth reduces to a linear regression line through the points.
>
> In my opinion, AIC and the residual sum of squares (RSS) are competing
> tools looking for the best fit. The minimum of AIC and that of RSS may
> not coincide. If you believe in AIC, then I would assume you also
> believe that it is a better tool than RSS, in that the former uses an
> information-theoretic approach, which is not sensitive to offset in
> accuracy due to penalization of outliers. Following that, I would
> disregard RSS and go with what AIC tells me.
>
> I don't think you have used step.gam incorrectly, but I think you have
> been observant enough to realize that not all statistical tools agree
> all the time :)
>
> Lin
>
> ----- Original Message -----
> From: "Michael" <[EMAIL PROTECTED]>
> To: <R-help@stat.math.ethz.ch>
> Sent: Thursday, March 16, 2006 5:30 PM
> Subject: [R] Did I use "step" function correctly? (Is R's step()
> function reliable?)
>
> > Hi all,
> >
> > I put up an exhaustive model to use R's "step" function:
> >
> > ------------------------
> >
> > mygam=gam(col1 ~ 1
> >   + col2 + col3 + col4
> >   + col2 ^ 2 + col3 ^ 2 + col4 ^ 2
> >   + col2 ^ 3 + col3 ^ 3 + col4 ^ 3
> >   + s(col2, 1) + s(col3, 1) + s(col4, 1)
> >   + s(col2, 2) + s(col3, 2) + s(col4, 2)
> >   + s(col2, 3) + s(col3, 3) + s(col4, 3)
> >   + s(col2, 4) + s(col3, 4) + s(col4, 4)
> >   + s(col2, 5) + s(col3, 5) + s(col4, 5)
> >   + s(col2, 6) + s(col3, 6) + s(col4, 6)
> >   + s(col2, 7) + s(col3, 7) + s(col4, 7)
> >   + s(col2, 8) + s(col3, 8) + s(col4, 8)
> >   + s(col2, 9) + s(col3, 9) + s(col4, 9),
> >   data=X);
> >
> > mystep=step(mygam);
> >
> > ---------------------
> > After a long list, the following are the two lowest AIC values:
> >
> > Step: AIC= 152.1
> > col1 ~ col2 + col3 + col4 + s(col2, 3) + s(col3, 3) + s(col4, 3)
> >
> > Step: AIC= 153.45
> > col1 ~ col2 + col3 + col4 + s(col2, 3) + s(col3, 3)
> > -----------------------------------------------
> >
> > However, the lowest-AIC model, "col1 ~ col2 + col3 + col4 +
> > s(col2, 3) + s(col3, 3) + s(col4, 3)", does not give the best
> > residual deviance.
> >
> > Instead, the model "mygam3=gam(col1 ~ s(col2, 6) + s(col3, 6) +
> > s(col4, 6), data=X)" is the best. In fact, I found that as I
> > increase the degrees of freedom, it always gives a better residual
> > deviance, lower than that of the "best" model returned by the "step"
> > function. Please see below.
> >
> > I am wondering whether I need to increase the degrees of freedom all
> > the way up. Perhaps, to avoid overfitting, I should do a
> > cross-validation. Is there an automatic cross-validation inside
> > "step" or "gam"?
> >
> > Is the result of the "step" function reliable? Or perhaps I used it
> > incorrectly?
> >
> > Thanks a lot,
> >
> > Michael.
> >
> > --------------------------
> >
> > > mygam1=gam(col1 ~ col2 + col3 + col4 + s(col2, 3) + s(col3, 3) +
> >     s(col4, 3), data=X);
> > > mygam2=gam(col1 ~ col2 + col3 + col4, data=X);
> > > mygam3=gam(col1 ~ s(col2, 6) + s(col3, 6) + s(col4, 6), data=X);
> > > mygam1
> > Call:
> > gam(formula = col1 ~ col2 + col3 + col4 +
> >     s(col2, 3) + s(col3, 3) + s(col4, 3), data = X)
> >
> > Degrees of Freedom: 110 total; 100.9999 Residual
> > Residual Deviance: 20.98365
> > > mygam2
> > Call:
> > gam(formula = col1 ~ col2 + col3 + col4, data = X)
> >
> > Degrees of Freedom: 110 total; 107 Residual
> > Residual Deviance: 27.84808
> > > mygam3
> > Call:
> > gam(formula = col1 ~ s(col2, 6) + s(col3, 6) +
> >     s(col4, 6), data = X)
> >
> > Degrees of Freedom: 110 total; 91.99957 Residual
> > Residual Deviance: 18.45776
> > >
> > > anova(mygam1, mygam2, mygam3);
> > Analysis of Deviance Table
> >
> > Model 1: col1 ~ col2 + col3 + col4 + s(col2, 3) + s(col3, 3) +
> >     s(col4, 3)
> > Model 2: col1 ~ col2 + col3 + col4
> > Model 3: col1 ~ s(col2, 6) + s(col3, 6) + s(col4, 6)
> >   Resid. Df Resid. Dev       Df Deviance P(>|Chi|)
> > 1  100.9999    20.9836
> > 2  107.0000    27.8481  -6.0001  -6.8644 6.115e-06
> > 3   91.9996    18.4578  15.0004   9.3903 3.958e-05
> >
> > ______________________________________________
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
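
The two observations in this thread (residual deviance keeps falling as
the degrees of freedom go up, and cross-validation is one way to choose
the degrees of freedom) can be illustrated with base R's smooth.spline()
alone. The following is only a minimal sketch under stated assumptions:
the data are simulated, because the data frame X from the thread is not
available, and the names x, y, dfs, rss and fit_cv are made up for the
illustration. It says nothing about what gam() or step() do internally;
as far as I know, s() in the gam package falls back to a fixed default
df when none is given, so whether any cross-validation happens there
should be checked against that package's documentation.

```r
# Simulated data (assumption: stands in for the thread's unavailable X).
set.seed(1)
n <- 110
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Fit smoothing splines at several fixed degrees of freedom and record
# the residual sum of squares on the training data.
dfs <- c(3, 6, 9, 20)
rss <- sapply(dfs, function(d) {
  fit <- smooth.spline(x, y, df = d)
  sum((y - predict(fit, x)$y)^2)
})
print(data.frame(df = dfs, rss = rss))

# Let leave-one-out cross-validation choose the smoothing parameter
# instead of fixing df by hand.
fit_cv <- smooth.spline(x, y, cv = TRUE)
cat("df chosen by leave-one-out CV:", fit_cv$df, "\n")
```

The training RSS shrinks with every increase in df, so on its own it
will always favour the most flexible fit. Criteria such as AIC or
cross-validation penalize that flexibility, which is why their preferred
model need not be the one with the smallest residual deviance.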