Re: [R] predicted values in mgcv gam

2006-03-08 Thread Simon Wood
Hi Denis,

Your first plot is of f(x) against x where \sum_i f(x_i) = 0 (x_i are
observed x's).

Your second plot is of \exp(\alpha + f(x)) against x where
\sum_i f(x_i)=0, and \alpha is an intercept parameter.

So the zero line on the first plot, corresponds to the \exp(\alpha) line
on the second plot (which is not the same as the mean of the response
data).

If you replace
> abline(h=mean.y,lty=5,col=grey(0.35))
by
> abline(h=coef(gam2)[1],lty=5,col=grey(0.35))
then everything should work.

best,
Simon

On Sun, 5 Mar 2006, Denis Chabot wrote:

> Hi,
>
> In fitting GAMs to assess environmental preferences, I use the part
> of the fit where the lower confidence interval is above zero as my
> criterion for positive association between the environmental variable
> and species abundance. However I like to plot this on the original
> scale of species abundance. To do so I extract the fit and SE using
> predict.gam.
>
> Lately I compared more carefully the plots I obtain in this way and
> those obtained with plot.gam and noticed differences which I do not
> understand.
>
> To avoid sending a large dataset I took an example from gam Help to
> illustrate this.
>
> Was I wrong to believe that the fit and its confidence band should
> behave the same way on both scales?
>
> Thanks in advance,
>
> Denis Chabot
> ###
> library(mgcv)
> set.seed(0)
> n<-400
> sig<-2
> x0 <- runif(n, 0, 1)
> x1 <- runif(n, 0, 1)
> x2 <- runif(n, 0, 1)
> x3 <- runif(n, 0, 1)
> f0 <- function(x) 2 * sin(pi * x)
> f1 <- function(x) exp(2 * x)
> f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
> f <- f0(x0) + f1(x1) + f2(x2)
> g<-exp(f/4)
> y<-rpois(rep(1,n),g)
> mean.y <- mean(y)
>
> gam2 <- gam(y~ s(x2), poisson)
>
> # to plot on the response scale
> val.for.pred <- data.frame(x2=seq(min(x2), max(x2), length.out=100))
> pred.2.resp <- predict.gam(gam2, val.for.pred ,type="response",
> se.fit=TRUE)
> lower.band <- pred.2.resp$fit - 2*pred.2.resp$se.fit
> upper.band <- pred.2.resp$fit + 2*pred.2.resp$se.fit
> pred.2.resp <- data.frame(val.for.pred, pred.2.resp, lower.band,
> upper.band)
>
> # same thing on term scale
> pred.2.term <- predict.gam(gam2, val.for.pred ,type="terms",
> se.fit=TRUE)
> lower.band <- pred.2.term$fit - 2*pred.2.term$se.fit
> upper.band <- pred.2.term$fit + 2*pred.2.term$se.fit
> pred.2.term <- data.frame(val.for.pred, pred.2.term, lower.band,
> upper.band)
>
> # it is easier to compare two plots instead of looking at these two
> data.frames
>
> plot(gam2, residuals=T, pch=1, cex=0.7)
> abline(h=0)
>
> plot(y~x2, col=grey(0.5))
> lines(fit~x2, col="blue", data=pred.2.resp)
> lines(lower.band~x2, col="red", lty=2, data=pred.2.resp)
> lines(upper.band~x2, col="red", lty=2, data=pred.2.resp)
> abline(h=mean.y,lty=5,col=grey(0.35))
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] predicted values in mgcv gam

2006-03-05 Thread Denis Chabot
Hi,

In fitting GAMs to assess environmental preferences, I use the part  
of the fit where the lower confidence interval is above zero as my  
criterion for positive association between the environmental variable  
and species abundance. However I like to plot this on the original  
scale of species abundance. To do so I extract the fit and SE using  
predict.gam.

Lately I compared more carefully the plots I obtain in this way and  
those obtained with plot.gam and noticed differences which I do not  
understand.

To avoid sending a large dataset I took an example from gam Help to  
illustrate this.

Was I wrong to believe that the fit and its confidence band should  
behave the same way on both scales?

Thanks in advance,

Denis Chabot
###
library(mgcv)
set.seed(0)
n<-400
sig<-2
x0 <- runif(n, 0, 1)
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)
f0 <- function(x) 2 * sin(pi * x)
f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
f <- f0(x0) + f1(x1) + f2(x2)
g<-exp(f/4)
y<-rpois(rep(1,n),g)
mean.y <- mean(y)

gam2 <- gam(y~ s(x2), poisson)

# to plot on the response scale
val.for.pred <- data.frame(x2=seq(min(x2), max(x2), length.out=100))
pred.2.resp <- predict.gam(gam2, val.for.pred ,type="response",  
se.fit=TRUE)
lower.band <- pred.2.resp$fit - 2*pred.2.resp$se.fit
upper.band <- pred.2.resp$fit + 2*pred.2.resp$se.fit
pred.2.resp <- data.frame(val.for.pred, pred.2.resp, lower.band,  
upper.band)

# same thing on term scale
pred.2.term <- predict.gam(gam2, val.for.pred ,type="terms",  
se.fit=TRUE)
lower.band <- pred.2.term$fit - 2*pred.2.term$se.fit
upper.band <- pred.2.term$fit + 2*pred.2.term$se.fit
pred.2.term <- data.frame(val.for.pred, pred.2.term, lower.band,  
upper.band)

# it is easier to compare two plots instead of looking at these two  
data.frames

plot(gam2, residuals=T, pch=1, cex=0.7)
abline(h=0)

plot(y~x2, col=grey(0.5))
lines(fit~x2, col="blue", data=pred.2.resp)
lines(lower.band~x2, col="red", lty=2, data=pred.2.resp)
lines(upper.band~x2, col="red", lty=2, data=pred.2.resp)
abline(h=mean.y,lty=5,col=grey(0.35))

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html