Dear Peter,

Ah, I see your point, Professor. The point at x=23.5 is carrying the model. Allow me to clarify: I was making a similar point.

I was suggesting that the cubic term could be left in the model, as you did, but that rather than dropping the data point, a second model be fit with an additional parameter to control for the suspected outlier - essentially a point discontinuity. Note that the call to poly() is not necessary for the graphical representation, so I'll continue the example without it.

fm3 <- lm(y ~ x + I(x^2) + I(x^3), data = dat)
fm3.c <- coef(fm3)

# Alternative 1: drop the data point, subsetting with a logical index.
index <- dat$x != 23.5
x2 <- dat$x[index]
y2 <- dat$y[index]
fm4 <- lm(y2 ~ x2 + I(x2^2) + I(x2^3))
fm4.c <- coef(fm4)
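Incidentally, the same dropped-point fit can be had without constructing x2 and y2 by hand, using lm()'s subset argument (the same mechanism as your subset = -9). A sketch, with fm4b as a throwaway name:

# Equivalent to fm4: rows with x == 23.5 are excluded before fitting,
# so coef(fm4b) matches fm4.c apart from the variable names.
fm4b <- lm(y ~ x + I(x^2) + I(x^3), data = dat, subset = x != 23.5)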
# Alternative 2: control for the potential outlier in the model,
# without throwing out the data.
ctrl <- rep(0, nrow(dat))
ctrl[dat$x == 23.5] <- resid(fm3)[dat$x == 23.5]
fm5 <- lm(y ~ x + I(x^2) + I(x^3) + ctrl, data = dat)
fm5.c <- coef(fm5)

# Plotting with curve() avoids the predict() & lines() calls, but
# requires knowledge of the model.
plot(y ~ x, data = dat)
curve(fm3.c[1] + fm3.c[2]*x + fm3.c[3]*x^2 + fm3.c[4]*x^3,
      col = "green", add = TRUE)    # full-data fit
curve(fm4.c[1] + fm4.c[2]*x + fm4.c[3]*x^2 + fm4.c[4]*x^3,
      col = "red", add = TRUE)      # fit with the point dropped
curve(fm5.c[1] + fm5.c[2]*x + fm5.c[3]*x^2 + fm5.c[4]*x^3,
      col = "purple", add = TRUE)   # fit controlling for the point

anova(fm5)

This tells us how much difference one point made, sacrificing a df just to ensure we're not throwing out information haphazardly - judged against F > 1, or whatever cutoff you want to use. If it turns out that the point is important, then at least its existence & effect were documented.
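If a single omnibus test is wanted, comparing the nested fits directly gives the same 1-df F-test as the ctrl row of the sequential table; a sketch using the objects above:

# fm5 is fm3 plus the single ctrl column, so this F-test on 1 df is
# the documented cost of accommodating the suspect point.
anova(fm3, fm5)

# The t-statistic on the ctrl coefficient in summary(fm5) squares to
# the same F.
summary(fm5)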
There is some irony, I suppose, in the fact that the graphical representations of dropping the point and of adding a control variable are equivalent for this example. I have not decided how I feel about that yet, but I do have a splitting headache.

Sincerely,
KeithC.

-----Original Message-----
From: Peter Ehlers [mailto:ehl...@ucalgary.ca]
Sent: Sunday, February 14, 2010 10:04 PM
To: kMan
Cc: r-help@r-project.org
Subject: Re: [R] Plot different regression models on one graph

kMan wrote:
> Peter wrote:
>> You like to live dangerously.
>
> Clue me in, Professor.
>
> Sincerely,
> KeithC.

Okay, Keith, here goes:

dat <- structure(list(x = c(62.5, 68.5, 0, 52, 0, 52, 0, 52, 23.5, 86,
                            0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                      y = c(0.054, 0.055, 0.017, 0.021, 0.02, 0.028,
                            0.032, 0.073, 0.076, 0.087, 0.042, 0.042,
                            0.041, 0.045, 0.021, 0.018, 0.017, 0.018,
                            0.028, 0.022)),
                 .Names = c("x", "y"), row.names = c(NA, -20L),
                 class = "data.frame")

fm1 <- lm(y ~ poly(x,3), data = dat)
fm2 <- lm(y ~ poly(x,3), data = dat, subset = -9)
xx <- 0:86
yhat1 <- predict(fm1, list(x = xx))
yhat2 <- predict(fm2, list(x = xx))
plot(dat$x, dat$y)
lines(xx, yhat1, col="blue", lwd=2)
lines(xx, yhat2, col="red", lwd=2)

That's how much difference *one* point makes in a cubic fit! I'm not much of a gambler, so I wouldn't base any important decisions on the results of the fit.

Cheers,
-Peter Ehlers

> -----Original Message-----
> From: Peter Ehlers [mailto:ehl...@ucalgary.ca]
> Sent: Sunday, February 14, 2010 6:49 PM
> To: kMan
> Cc: 'David Winsemius'; 'Rhonda Reidy'; r-help@r-project.org
> Subject: Re: [R] Plot different regression models on one graph
>
> kMan wrote:
>> I would use all of the data. If you want to "drop" one, control for
>> it in the model & sacrifice a degree of freedom.
>
> You like to live dangerously.
>
> -Peter Ehlers
>
>> Why the call to poly() by the way?
>>
>> KeithC.
>>
>> -----Original Message-----
>> From: Peter Ehlers [mailto:ehl...@ucalgary.ca]
>> Sent: Saturday, February 13, 2010 1:35 PM
>> To: David Winsemius
>> Cc: Rhonda Reidy; r-help@r-project.org
>> Subject: Re: [R] Plot different regression models on one graph
>>
>> Rhonda:
>>
>> As David points out, a cubic fit is rather speculative. I think that
>> one needs to distinguish two situations: 1) theoretical justification
>> for a cubic model is available, or 2) you're dredging the data for the
>> "best" fit. Your case is the second. That puts you in the realm of EDA
>> (exploratory data analysis). You're free to fit any model you wish,
>> but you should assess the value of the model. One convenient way is to
>> use the influence.measures() function, which will show that points 9
>> and 10 in your data have a strong influence on your cubic fit (as, of
>> course, your eyes would tell you). A good thing to do at this point is
>> to fit 3 more cubic models: 1) omitting point 9, 2) omitting point 10,
>> 3) omitting both.
>>
>> Try it and plot the resulting fits. You may be surprised.
>>
>> Conclusion: Without more data, you can conclude nothing about a
>> non-straightline fit.
>>
>> (And this hasn't even addressed the relative abundance of x=0 data.)
>>
>> -Peter Ehlers
>>
>> David Winsemius wrote:
>>> On Feb 13, 2010, at 1:35 PM, Rhonda Reidy wrote:
>>>
>>>> The following variables have the following significant relationships
>>>> (x is the explanatory variable): linear, cubic, exponential,
>>>> logistic. The linear relationship plots without any trouble.
>>>>
>>>> Cubic is the 'best' model, but it is not plotting as a smooth curve
>>>> using the following code:
>>>>
>>>> cubic.lm <- lm(y ~ poly(x,3))
>>>
>>> Try:
>>>
>>> lines(0:80, predict(cubic.lm, data.frame(x=0:80)), lwd=2)
>>>
>>> But I really must say the superiority of that relationship over a
>>> linear one is far from convincing to my eyes. Seems to be caused by
>>> one data point. I hope the quotes around "best" mean that you have
>>> the same qualms.
>>>
>>>> lines(x, predict(cubic.lm), lwd=2)
>>>>
>>>> How do I plot the data and the estimated curves for all of these
>>>> regression models in the same plot?
>>>>
>>>> x <- c(62.5,68.5,0,52,0,52,0,52,23.5,86,0,0,0,0,0,0,0,0,0,0)
>>>>
>>>> y <- c(0.054,0.055,0.017,0.021,0.020,0.028,0.032,0.073,0.076,0.087,
>>>>        0.042,0.042,0.041,0.045,0.021,0.018,0.017,0.018,0.028,0.022)
>>>>
>>>> Thanks in advance.
>>>>
>>>> Rhonda Reidy
>>>>
>> --
>> Peter Ehlers
>> University of Calgary

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.