Dear Øystein,

In your code 'id' is set up to be numeric. In your first gamm call it gets coerced to a factor (since nothing else would make sense). In  your second gamm call it is simply treated as numeric, since that could makes sense. To make the two models equivalent you just need to substitute:

id <- as.factor(rep(1:ng, n/ng))

into your loop.

best,
Simon

On 01/10/18 08:15, Øystein Sørensen wrote:
I have longitudinal data where a response y_{ij} has been measured
repeatedly for i=1,...,n individuals, j=1,...,m times for each individual.
Let x_{ij} be the age of individual i at her/his jth measurement. I wish to
see how the response varies with age, and have reason to believe that a
nonlinear fit is needed. I hence wish to model the relationship using an
additive mixed model of the form y_{ij} = f(x_{ij}) + b_{i} +
\epsilon_{ij}, where f() is some smooth function to be estimated, b_{i}
\sim N(0, \sigma_{b}^{2}) is the random effect for individual i,
\epsilon_{ij} \sim N(0, \sigma^{2}) is the residual.

Reading the documentation to the mgcv package, I see that such models can
be set up by calling the mgcv::gamm two different ways. One way shown to
set up such a model is with the call
b1 <- gamm(y ~ s(x), random = list(id =~ 1)),
where id is an indicator of the specific individual. In this way, the
random effect is specified in a list. The other way is to set up the random
effect with a smooth of the re:
b2 <- gamm(y ~ s(x) + s(id, bs = "re")).

As far as I can understand, these two setups should create the same
underlying model matrices. However, when running them on my data, the
p-values for the smooth terms, as well as their estimated degrees of
freedom, are different.

Below is a reproducible example on simulated data, which show that these
two types of specifying the models give different p-values and estimated
degrees of freedom. On my real data, these differences are sometimes even
more exaggerated.

My question is: Are not these two calls to gamm suppose to estimate the
same model, or have I misunderstood?

Here is a reproducible example:

library(mgcv)
set.seed(100)
n_sim <- 100
df <- data.frame(
   model1_pval = numeric(n_sim),
   model1_edf = numeric(n_sim),
   model2_pval = numeric(n_sim),
   model2_edf = numeric(n_sim)
)

# Number of observations
n <- 500
# Number of groups
ng <- 100

for(i in 1:n_sim){
   # Draw the fixed effect covariate
   x <- rnorm(n)
   # Set up the group
   id <- rep(1:ng, n/ng)
   # Draw the random effect
   z <- rnorm(ng)
   # Draw the response
   y <- 0.1 * x + 0.2 * x^3 + z[id] + rnorm(n)

   # Fit the two different models
   b1 <- gamm(y ~ s(x, k = 5), random = list(id =~ 1))
   b2 <- gamm(y ~ s(x, k = 5) + s(id, bs = "re"))

   df[i, "model1_pval"] <- anova(b1$gam)$s.pv
   df[i, "model1_edf"] <- anova(b1$gam)$edf
   df[i, "model2_pval"] <- anova(b2$gam)$s.pv[[1]]
   df[i, "model2_edf"] <- anova(b2$gam)$edf[[1]]
}

# Differences between model 1 and model 2 p values
mean(df$model1_pval)
#> [1] 6.790546e-21
mean(df$model2_pval)
#> [1] 3.090694e-14
max(abs(df$model1_pval - df$model2_pval))
#> [1] 2.770545e-12

# Differences between model1 and model 2 estimated degrees of freedom
mean(df$model1_edf)
#> [1] 3.829992
mean(df$model2_edf)
#> [1] 3.731438
max(abs(df$model1_edf - df$model2_edf))
#> [1] 0.6320281


Thanks in advance,
Øystein Sørensen

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to