Mike,

Isn't this just an example of the wrong model giving a spurious
impression of precision? Or, more accurately, precision at the expense of
accuracy?

Here's a linear model example of the same thing...

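set.seed(1)
n <- 400
x <- runif(n) - 0.5
## true mean is quadratic in x
y <- 2 + x*0.2 + x^2 + rnorm(n)*0.5
m1 <- lm(y ~ 1)             ## wrong model: constant only
m2 <- lm(y ~ x + I(x^2))    ## correct model: quadratic
mean(predict(m1, se.fit=TRUE)$se.fit)
#[1] 0.02641367
mean(predict(m2, se.fit=TRUE)$se.fit)
#[1] 0.04363921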

... so the wrong model (m1, a constant) gives much lower se than the
correct model (m2, a quadratic).
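
To see the accuracy side of the trade-off, here is a minimal sketch that
compares each model's pointwise 95% confidence intervals with the known mean
function from the simulation. It regenerates the same data as above; the names
truth, coverage and rmse are introduced here purely for illustration.

```r
set.seed(1)
n <- 400
x <- runif(n) - 0.5
truth <- 2 + 0.2 * x + x^2        # known mean function (available only because we simulated)
y <- truth + rnorm(n) * 0.5

m1 <- lm(y ~ 1)                   # wrong model: constant only
m2 <- lm(y ~ x + I(x^2))          # correct model: quadratic

## Fraction of observations whose pointwise 95% CI for the mean contains the truth
coverage <- function(model) {
  ci <- predict(model, interval = "confidence", level = 0.95)
  mean(ci[, "lwr"] <= truth & truth <= ci[, "upr"])
}

## Root mean squared error of the fitted mean relative to the truth
rmse <- function(model) sqrt(mean((fitted(model) - truth)^2))

se1 <- mean(predict(m1, se.fit = TRUE)$se.fit)
se2 <- mean(predict(m2, se.fit = TRUE)$se.fit)

print(c(se.m1 = se1, se.m2 = se2))
print(c(coverage.m1 = coverage(m1), coverage.m2 = coverage(m2)))
print(c(rmse.m1 = rmse(m1), rmse.m2 = rmse(m2)))
```

The narrow intervals from m1 are centred on a biased estimate of the mean, so
a small se on its own says nothing about whether the interval is in the right
place.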

best,
Simon

On 17/02/12 14:57, Dunbar, Michael J. wrote:

I've got a small problem.

I have some observational data (environmental samples: abiotic
explanatory variable and biological response) to which I've fitted
both a multiple linear regression model and also a gam (mgcv) using
smooths for each term. The gam clearly fits far better than the lm
model based on AIC (difference in AIC ~ 8); in addition, the adjusted
R squared for the gam is clearly better.

I then want to make some plots of predicted values from both models
with confidence intervals. So I can get my predictions using
predict(my.model, ...., se.fit=TRUE)

My problem is that this results in the prediction se's (and
consequent CI's for the mean prediction) being much wider for the gam
than for the linear model. This seems rather counter-intuitive given
that the gam appears to fit better, and hence I will find it hard to
explain my choice of a gam model in a journal article, despite clear
non-linearity.

It's not so easy for me to post my own example. The following code
gives a flavour. Clearly in this instance the gam will fit MUCH
better because it is the generating model. Even in this case, most
gam se's lie above the 1:1 line. In my example, with some
observational data, the difference between linear and gam fit is not
so pronounced, but the gam still clearly fits better than the linear
model. Even so, all gam se's are WAY above their linear equivalents
when used to predict for representative new data in order to present
results as interaction plots.



cheers Mike

require(mgcv)
require(MASS)

dat <- gamSim(1, n=200, dist="normal", scale=2)
summary(b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data=dat))
summary(a <- lm(y ~ x0 + x1 + x2 + x3, data=dat))

se.result <- data.frame(linear.se=predict(a, se.fit=TRUE)$se.fit,
                        gam.se=predict(b, se.fit=TRUE)$se.fit)
with(se.result, eqscplot(linear.se, gam.se))
abline(a=0, b=1)


--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283

