Mike, Isn't this just an example of the wrong model giving a spurious impression of precision? or more accurately, precision at the expense of accuracy?
Here's a linear model example of the same thing... set.seed(1) n <- 400 x <- runif(n)-.5 y <- 2+ x*.2+ x^2 + rnorm(n)*.5 m1 <- lm(y~1) m2 <- lm(y~x+I(x^2)) mean(predict(m1,se=TRUE)$se.fit) #[1] 0.02641367 mean(predict(m2,se=TRUE)$se.fit) #[1] 0.04363921 ... so the wrong model (m1, a constant) gives much lower se than the correct model (m2, a quadratic). best, Simon On 17/02/12 14:57, Dunbar, Michael J. wrote:
I've got a small problem. I have some observational data (environmental samples: abiotic explanatory variable and biological response) to which I've fitted both a multiple linear regression model and also a gam (mgcv) using smooths for each term. The gam clearly fits far better than the lm model based on AIC (difference in AIC ~ 8), in addition the adjusted R squared for the gam is clearly better. I then want to make some plots of predicted values from both models with confidence intervals. So I can get my predictions usng predict(my.model(....se.fit=TRUE)) My problem is that this results in the prediction se's (and consequent CI's for the mean prediction) being much wider for the gam than for the linear model. This seems rather counter-intuitive given that the gam appears to fit better, and hence I will find it hard to explain my choice of a gam model in a journal article, despite clear non-linearity. It's not so easy for me to post my own example. The following code gives a flavour, clearly in this instance the gam will fit MUCH better because it is the generating model. Even in this case, most gam se's line above the 1:1 line. In my example, with some observational data, the difference between linear and gam fit is not so pronounced but the gam still clearly fits better than linear, but all gam se's are WAY above their linear equivalents when used to predict for representative new data in order to present results as interaction plots. cheers Mike require(mgcv) require(MASS) dat<- gamSim(1,n=200,dist="normal",scale=2) summary(b<- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)) summary(a<- lm(y~x0 + x1 + x2 + x3,data=dat)) se.result<- data.frame(linear.se=predict(a, se.fit=TRUE)$se.fit, gam.se=predict(b, se.fit=TRUE)$se.fit) with(se.result, eqscplot(linear.se, gam.se)) abline(a=0, b=1)
-- Simon Wood, Mathematical Science, University of Bath BA2 7AY UK +44 (0)1225 386603 http://people.bath.ac.uk/sw283 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.