Dear Bert, Thanks for helping.
Your questions 'answers' why I get the expected behavior if 'group' is a factor. My question was why I don't get the expected behavior if 'group' is not a factor. >From a theoretical (non-programming) point of view, there is no difference in a factor with two levels and a 0-1 (bool/integer) variable (in my case the 1-2 variable 'group'). gam() interprets these inputs differently, thus distinguishes these cases (and I was wondering why; In my opinion, this is a purely R/mgcv related question and belongs here). As it turned out, the problem was merely the following: By using factors and thus specifying a GAM, the intercept was 'hidden' in the estimated coefficients. When using integers as group variables, this is a glm and there one needs the intercept. The examples below provide the details. With best wishes, Marius require(mgcv) n <- 10 yrs <- 2000+seq_len(n) loss <- c(seq_len(n)+runif(n), 5+seq_len(n)+runif(n)) ## Version 1: gam() with 'group' as factor ##################################### set.seed(271) dat <- data.frame(year = rep(yrs, 2), group = as.factor(rep(1:2, each=n)), # could also be "A", "B" resp = loss) fit1 <- glm(resp ~ year + group - 1, data=dat) plot(yrs, fit1$fitted.values[seq_len(n)], type="l", ylim=range(dat$resp), xlab="Year", ylab="Response") # fit group A; mean over all responses in this group lines (yrs, fit1$fitted.values[n+seq_len(n)], col="blue") # fit group B; mean over all responses in this group points(yrs, dat$resp[seq_len(n)]) # actual response group A points(yrs, dat$resp[n+seq_len(n)], col="blue") # actual response group B ## Version 2: gam() with 'group' as numeric (=> glm) ########################### set.seed(271) dat <- data.frame(year = rep(yrs, 2), group = rep(1:2, each=n), # could also be 0:1 resp = loss) fit2 <- glm(resp ~ year + group - 1, data=dat) # (*) plot(yrs, fit2$fitted.values[seq_len(n)], type="l", ylim=range(dat$resp), xlab="Year", ylab="Response") # fit group A; mean over all responses in this group lines (yrs, fit2$fitted.values[n+seq_len(n)], col="blue") # fit group B; mean over all responses in this group points(yrs, dat$resp[seq_len(n)]) # actual response group A points(yrs, dat$resp[n+seq_len(n)], col="blue") # actual response group B ## Note: without '-1' (intercept) in (*), an unexpected behavior results ## Explanation: ## S. Wiki GAM (without beta_0): ## g(E(Y)) = f_1(x_1) + f_2(x_2) ## where f_i(x_i) may be functions with a specified parametric form (for example a ## polynomial, or a coefficient depending on the levels of a factor variable) ## => for f_i's being coefficients (numbers) beta_i, this is a GLM: ## g(E(Y)) = beta_1 x_1 + beta_2 x_2 (x_1 = year, x_2 = group) ## Problem: (*) does not specify an intercept and thus the lines are not picked up correctly fit2$coefficients ## Version 3: Version 2 with intercept ######################################### set.seed(271) dat <- data.frame(year = rep(yrs, 2), group = rep(1:2, each=n), # could also be 0:1 resp = loss) fit3 <- glm(resp ~ year + group, data=dat) # now with intercept plot(yrs, fit3$fitted.values[seq_len(n)], type="l", ylim=range(dat$resp), xlab="Year", ylab="Response") # fit group A; mean over all responses in this group lines (yrs, fit3$fitted.values[n+seq_len(n)], col="blue") # fit group B; mean over all responses in this group points(yrs, dat$resp[seq_len(n)]) # actual response group A points(yrs, dat$resp[n+seq_len(n)], col="blue") # actual response group B ## => correct/as expected fit3$coefficients ## Note: in Version 1, the intercept is already included in the group coefficients: fit1$coefficients On Tue, Oct 29, 2013 at 9:31 PM, Bert Gunter <gunter.ber...@gene.com> wrote: > Think about it. How can one define a smooth term with a factor??? > > Further discussion is probably offtopic. Post on > stats.stackexchange.com if it still isn't obvious. > > Cheers, > Bert ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.