I am analyzing some data that came from demographic health surveys. The data contain information for individuals within households, that are located within clusters, that are located within survey years, that are located within countries. We are trying to find the best model from a subset of predictors, and all models must contain the random variable of household within cluster within year within country.
We are running models on a server with 64GB memory and 6 CPU cores Data: Please email me if you are willing to look at this for us and I will send you the data via file transfer. It's a big file and I don't want to post it online just now. First, we tested a simple linear mixed model using the lmer package: ########## Install Packages ########## library(lme4) library(glmulti) ########## Clear all memory/objects ########## rm(list=ls()) ########## Read in Data ########## mydata = read.csv("kr.and.GIS.cleaned.Residents.only.csv") ################################################################### ########## Try lmer model NO Interactions ########## ################################################################### ptm <- proc.time() lmer.model = lmer(stunt.dhs ~ dis_ed_des+ tc_pa+ avg_clu_tc+ dist_road+ pden_lscan+ URBAN_RURA+ time.water+ wealth.index + (1|country.code.short/year/cluster/household), mydata, REML = F) print(lmer.model) proc.time() - ptm This works fine, runs in 12 minutes, and gives the following output: Linear mixed model fit by maximum likelihood ['lmerMod'] Formula: stunt.dhs ~ dis_ed_des + tc_pa + avg_clu_tc + dist_road + pden_lscan + URBAN_RURA + time.water + wealth.index + (1 | country.code.short/year/cluster/household) Data: mydata AIC BIC logLik deviance 218267.5 218420.4 -109116.7 218233.5 Random effects: Groups Name Std.Dev. household:(cluster:(year:country.code.short)) (Intercept) 0.5584 cluster:(year:country.code.short) (Intercept) 0.3727 year:country.code.short (Intercept) 0.0000 country.code.short (Intercept) 0.3028 Residual 1.3819 Number of obs: 59405, groups: household:(cluster:(year:country.code.short)), 40703; cluster:(year:country.code.short), 7436; year:country.code.short, 27; country.code.short, 21 Fixed Effects: (Intercept) dis_ed_des tc_pa avg_clu_tc -1.324e+00 1.348e-07 -7.806e-04 -5.180e-04 dist_road pden_lscan URBAN_RURAU time.water -1.435e-06 9.003e-06 6.637e-02 6.097e-05 wealth.indexpoorer wealth.indexpoorest wealth.indexricher wealth.indexrichest -7.814e-02 -1.794e-01 1.469e-01 4.390e-01 > > proc.time() - ptm user system elapsed 727.842 13.781 741.053 Next, we tried to do model selection using the genetic algorithm of glmulti. We eventually want to include more predictors than in this current model AND we want to include interactions. This could lead to over one billion candidate models. For this reason, we want to eventually use the genetic algorithm and NOT all possible subsets. However, as a first step, we will use this smaller list of predictors and level = 1 so that no interactions are considered. First, we create the lmer wrapper and then just run the model to see how many candidate models we would have with this situation ################################################################### ########## Create lmer wrapper to paste random variable to all models ###### ################################################################### lmer.glmulti <- function (formula, data, random = "", ...) { lmer(paste(deparse(formula), random), data = data, REML=F, ...) } ################################################################## ########## Find canditate number of models NO interactions ########## ################################################################### candidate <- glmulti(stunt.dhs ~ dis_ed_des* tc_pa* avg_clu_tc* dist_road* pden_lscan* URBAN_RURA* time.water* wealth.index, data=mydata, level = 1, fitfunc = lmer.glmulti, random = "+(1|country.code.short/year/cluster/household)", method = "d") With this model, the same that we used for the lmer function, we get the following error. We also get this error if we try to run the same code but with method = "g", and confsetsize=5) Initialization... Error in factor(household:(cluster:(year:country.code.short))) : negative length vectors are not allowed If, however, we delete the household part of the random factor, the function works: candidate <- glmulti(stunt.dhs ~ dis_ed_des* tc_pa* avg_clu_tc* dist_road* pden_lscan* URBAN_RURA* time.water* wealth.index, data=mydata, level = 1, fitfunc = lmer.glmulti, random = "+(1|country.code.short/year/cluster)", method = "d") Gives: Initialization... TASK: Diagnostic of candidate set. Sample size: 59405 2 factor(s). 6 covariate(s). 0 f exclusion(s). 0 c exclusion(s). 0 f:f exclusion(s). 0 c:c exclusion(s). 0 f:c exclusion(s). Size constraints: min = 0 max = -1 Complexity constraints: min = 0 max = -1 Your candidate set contains 256 models. And running the following code seems to be working (slowly): ################################################################### ########## Run models with NO Interactions using genetic algorithm ##### ################################################################### ptm <- proc.time() model <- glmulti(stunt.dhs ~ dis_ed_des* tc_pa* avg_clu_tc* dist_road* pden_lscan* URBAN_RURA* time.water* wealth.index, data=mydata, level = 1, fitfunc = lmer.glmulti, random = "+(1|country.code.short/year/cluster)", crit = bic, imm = 0.5, confsetsize = 5) CAN ANYONE EXPLAIN WHY WE GET THE NEGATIVE LENGTH VECTORS ERROR IF WE INCLUDE HOUSEHOLD NESTED WITHIN THE OTHER RANDOM FACTORS FOR GLMULTI BUT NOT FOR LMER? ANY SUGGESTIONS FOR WHAT WE CAN DO TO DIAGNOSE OR FIX THE PROBLEM? Thanks! -- Alicia Ellis Postdoc Gund Institute for Ecological Economics University of Vermont 617 Main Street Burlington, VT 05405 (802) 656-1046 http://www.wcs-heal.org http://www.uvm.edu/~aellis5 <http://entomology.ucdavis.edu/faculty/scott/aellis/> -- Alicia Ellis Postdoc Gund Institute for Ecological Economics University of Vermont 617 Main Street Burlington, VT 05405 (802) 656-1046 http://www.wcs-heal.org http://www.uvm.edu/~aellis5/ [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.