Dear R users, 

I recently stumbled upon problems of reproducibility while running GAM analyses 
in different R and gam package versions. In the example below, a small dataset 
is created in which the y and x1 variables are 100% correlated. The intents of 
this example were primarily for regression testing and, secondarily, to 
evaluate how the gam algorithm behaves under extreme/limit conditions. 

I ran this little snippet in 5 different environments and got 100% consistent 
results until I switched to R 3.3.2. 
* Comparing results from environments 1, 2, and 3 shows that changing the 
version of the gam package did not change the results under R 3.3.0. 
* Comparing results from environments 3 and 4 shows that changing the version 
of R altered the values of the AIC and the output of the step.gam call (changed 
to a NULL object) 
* Comparing results from environments 4 and 5 shows that reverting to an older 
version of the gam package in R 3.3.2 still produced altered AIC values and the 
NULL output from step.gam call 

Further investigations into these differences seem to show that the lm.wfit 
call in the gam.fit function (called from within gam and step.gam) may result 
in different values in R 3.3.0 and 3.3.2. 

So my questions are 2-fold: 
1- Would you have any information about why lm.wfit would produce different 
outcomes in R 3.3.0 and 3.3.2? The source code does not appear significantly 
different.
2- Is it expected that the step.gam function returns a NULL object in R 3.3.2 
while a valid model has been identified? Looking at the source of step.gam, the 
line 157 (if(is.null(form.list)) break) seems to be the reason the function 
breaks out and returns a NULL value. 

I thank you in advance for your time. 

Sebastien Bihorel 






library(gam) 

dat <- data.frame( 
y = 
c(57,57,98,83,122,69,108,86,80,87,75,76,97,101,121,111,105,84,65,54,61,71,125,60,50,112,102,110,77,45,93,62,120,115,70,113,117,85,46,123,89,95,116,55,110,92,109,100,72,88,105,119,94,45,67,58,60,45,107,73,100,79,47,99,51,53,68,125,90,48,82,85,65,52,70,59,125,49,118,103,91,124,78,81,63,63),
 
x1 = 
c(52,52,93,78,117,64,103,81,75,82,70,71,92,96,116,106,100,79,60,49,56,66,120,55,45,107,97,105,72,40,88,57,115,110,65,108,112,80,41,118,84,90,111,50,105,87,104,95,67,83,100,114,89,40,62,53,55,40,102,68,95,74,42,94,46,48,63,120,85,43,77,80,60,47,65,54,120,44,113,98,86,119,73,76,58,58),
 
x2 = 
c(0.0001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0001,0)
 
) 

summary(dat) 

scope <- list( 
x1=c('1','x1','ns(x1, df=2)'), 
x2=c('1','x2') 
) 

gam.object <- gam(y~1, data=dat) 
step.object <- step.gam(gam.object, scope=scope, trace=2) 
is.null(step.object) 

# Run this if you want to evaluate one lm.wfit example call made from within 
gam functions
x <- cbind(rep(1,nrow(dat)), dat$x1)
names(x) <- c('Intercept', 'x1')
z <- dat$y
w <- rep(1,nrow(dat))
str(eval(expression(lm.wfit(x, z, w, method = "qr", singular.ok = TRUE))))





###################################################################### 
#1 R 3.1.2 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.09.1 
###################################################################### 
> step.object <- step.gam(gam.object, scope=scope, trace=2) 
Start: y ~ 1; AIC= 797.0034 
Trial: y ~ x1 + 1 ; AIC= -5121.796 
Trial: y ~ 1 + x2 ; AIC= 796.915 
Step:1 y ~ x1 ; AIC= -5121.796 
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5123.816 
Trial: y ~ x1 + x2 ; AIC= -5174.408 
Step:2 y ~ x1 + x2 ; AIC= -5174.408 
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5164.829 
> is.null(step.object) 
[1] FALSE 

###################################################################### 
#2: R 3.3.0 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.12 
###################################################################### 
> step.object <- step.gam(gam.object, scope=scope, trace=2) 
Start: y ~ 1; AIC= 797.0034 
Trial: y ~ x1 + 1 ; AIC= -5121.796 
Trial: y ~ 1 + x2 ; AIC= 796.915 
Step:1 y ~ x1 ; AIC= -5121.796 
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5123.816 
Trial: y ~ x1 + x2 ; AIC= -5174.408 
Step:2 y ~ x1 + x2 ; AIC= -5174.408 
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5164.829 
> is.null(step.object) 
[1] FALSE 

###################################################################### 
#3: R 3.3.0 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.14-4 
###################################################################### 
> step.object <- step.gam(gam.object, scope=scope, trace=2) 
Start: y ~ 1; AIC= 797.0034 
Trial: y ~ x1 + 1 ; AIC= -5121.796 
Trial: y ~ 1 + x2 ; AIC= 796.915 
Step:1 y ~ x1 ; AIC= -5121.796 
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5123.816 
Trial: y ~ x1 + x2 ; AIC= -5174.408 
Step:2 y ~ x1 + x2 ; AIC= -5174.408 
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5164.829 
> is.null(step.object) 
[1] FALSE 

###################################################################### 
#4: R 3.3.2 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.14-4 
###################################################################### 
> step.object <- step.gam(gam.object, scope=scope, trace=2) 
Start: y ~ 1; AIC= 797.0034 
Trial: y ~ x1 + 1 ; AIC= -5122.886 
Trial: y ~ 1 + x2 ; AIC= 796.915 
Step:1 y ~ x1 ; AIC= -5122.886 
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5177.531 
Trial: y ~ x1 + x2 ; AIC= -5177.531 
Step:2 y ~ ns(x1, df = 2) ; AIC= -5177.531 
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5222.703 
Step:3 y ~ ns(x1, df = 2) + x2 ; AIC= -5222.703 
> is.null(step.object) 
[1] TRUE 

###################################################################### 
#5: R 3.3.2 (x86_64-redhat-linux-gnu (64-bit)) / gam 1.09.1 
###################################################################### 
> step.object <- step.gam(gam.object, scope=scope, trace=2) 
Start: y ~ 1; AIC= 797.0034 
Trial: y ~ x1 + 1 ; AIC= -5122.886 
Trial: y ~ 1 + x2 ; AIC= 796.915 
Step:1 y ~ x1 ; AIC= -5122.886 
Trial: y ~ ns(x1, df=2) + 1 ; AIC= -5177.531 
Trial: y ~ x1 + x2 ; AIC= -5177.531 
Step:2 y ~ ns(x1, df = 2) ; AIC= -5177.531 
Trial: y ~ ns(x1, df=2) + x2 ; AIC= -5222.703 
Step:3 y ~ ns(x1, df = 2) + x2 ; AIC= -5222.703 
> is.null(step.object) 
[1] TRUE

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to