Hi all

I am getting a warning when I try to perform a bootstrap selection procedure on 
variables (using the boot.stepAIC function in the bootStepAIC package). I had 
previously established which variables were collinear and, within each 
collinear group, kept the one with the lowest AIC from a univariate regression 
on each predictor. At the end of this procedure I obtain a candidate list of 
variables that are not correlated. I then use bootstrapping to revisit the 
variables that were excluded at each step. I have referred to other questions 
(such as 
http://stackoverflow.com/questions/8596160/why-am-i-getting-algorithm-did-not-converge-and-fitted-prob-numerically-0-or)
and I see that this is a common problem with logit models, but in my case 
convergence fails only in a bootstrapping context. 

For the first set of previously excluded variables, I added each variable 
individually to the candidate list and then performed bootstrapping, and then 
added more than one variable at a time to see if the algorithm would converge. 
Sometimes it did, other times not (as indicated by ‘dc’). I suppose the 
presence of multicollinearity affects this process?
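
One way to quantify multicollinearity before fitting is to look at variance 
inflation factors and the condition number of the design matrix. The sketch 
below uses base R only and simulated predictors (x1, x2, x3 are hypothetical). 
It measures collinearity among predictors; it does not by itself show that 
collinearity is what stops the fit from converging.

```r
# Simulated predictors; x2 is almost a copy of x1 (hypothetical data).
set.seed(2)
n <- 200
x1 <- rnorm(n)
X <- data.frame(x1 = x1,
                x2 = x1 + rnorm(n, sd = 0.05),
                x3 = rnorm(n))

# VIFs of the predictors are the diagonal of the inverse correlation matrix.
vif <- diag(solve(cor(X)))
round(vif, 1)

# Condition number of the centred and scaled design matrix;
# large values indicate near-linear dependence among columns.
cond <- kappa(scale(as.matrix(X)), exact = TRUE)
cond
```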

From the next group of excluded variables on, I only considered adding 
variables one at a time and then checked whether there was an improvement in 
AIC. If the variable kept from that group is in the candidate list, then I take 
that variable out, add the previously excluded one, and see if there is an 
improvement in AIC. 

Following this reasoning, I end up adding two new variables to the list. They 
are not correlated with any of the variables in the candidate list, nor are 
they correlated with each other.  

My question is: is this a valid way to come up with my best set of predictors? 
Is there a way I can monitor more closely what is going on, i.e. whether 
multicollinearity is mathematically causing the algorithm not to converge 
for some variables? 
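
One possible way to monitor this is a hand-written bootstrap loop that records, 
for every resample, whether glm converged and which warnings it raised. The 
sketch below is an assumption about how one might do this in base R on 
simulated data; it is not what boot.stepAIC does internally, and d, x1, x2 and 
boot_log are hypothetical names.

```r
# Simulated stand-in data with two nearly collinear predictors.
set.seed(4)
n <- 150
d <- data.frame(Examind = sample(20:50, n, replace = TRUE), x1 = rnorm(n))
d$x2 <- d$x1 + rnorm(n, sd = 0.1)
d$Pos <- rbinom(n, d$Examind, plogis(-1 + 0.5 * d$x1))

B <- 50   # kept small for illustration
boot_log <- data.frame(b = seq_len(B), converged = NA,
                       warning = NA_character_, stringsAsFactors = FALSE)

for (b in seq_len(B)) {
  # Resample rows with replacement.
  boot_d <- d[sample(nrow(d), replace = TRUE), ]
  msgs <- character()
  # Collect warnings instead of letting them scroll past.
  fit <- withCallingHandlers(
    glm(Pos/Examind ~ x1 + x2, weights = Examind,
        data = boot_d, family = binomial),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    })
  boot_log$converged[b] <- fit$converged
  boot_log$warning[b] <- paste(msgs, collapse = "; ")
}

table(boot_log$converged)               # how many resamples converged
boot_log[nzchar(boot_log$warning), ]    # resamples that raised warnings
```

Resamples flagged here can then be refitted individually to inspect their 
coefficients and standard errors.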

Here is my workflow using the boot.stepAIC function in the forward stepwise 
direction (the forward direction seems to be more robust w.r.t. convergence):  
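
A point worth checking about the forward direction: in base R's step() (and, as 
far as I know, MASS::stepAIC), a forward search only adds terms listed in a 
scope argument. The sketch below uses step() on simulated data with 
hypothetical variables v1, v2 and v3. Whether and how boot.stepAIC passes a 
scope on to stepAIC is an assumption to verify in its documentation.

```r
# Simulated stand-in data; names are hypothetical.
set.seed(5)
n <- 300
d <- data.frame(Examind = sample(20:50, n, replace = TRUE),
                v1 = rnorm(n), v2 = rnorm(n), v3 = rnorm(n))
d$Pos <- rbinom(n, d$Examind, plogis(-1 + 0.4 * d$v1 + 0.3 * d$v3))

start <- glm(Pos/Examind ~ v1, weights = Examind,
             data = d, family = binomial)

# Forward search without a scope: there are no terms available to add.
no_scope <- step(start, direction = "forward", trace = 0)

# Forward search with an explicit upper model to draw terms from.
with_scope <- step(start, direction = "forward", trace = 0,
                   scope = list(lower = ~ v1, upper = ~ v1 + v2 + v3))

formula(no_scope)
formula(with_scope)
```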

(if reproducible code is required I can happily provide it - via dropbox for 
the data)

# kept altitude (15454.23) (but not in candidate list) and excluded:
# meanTemp (14422.72), minTemp (14435.72), bio1 (14767.88),
# bio6 (didn't converge (dc)), bio8 (15050.46), bio10 (14285.46),
# bio11 (14655.82), bio18 (15445.24), bio10+bio11 (dc),
# bio10+bio11+bio18 (dc), bio10+bio18 (dc),
# meanTemp+bio10+bio11 (14160.33), minTemp+bio10+bio11 (14204.41),
# meanTemp+minTemp+bio10+bio11 (14135.49),
# meanTemp+minTemp+bio10+bio11+bio1 (dc), bio10+bio11+bio1 (dc)

fit.1 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI 
+ bio5 + bio9 + bio10, weights = Examind, data = spatialVars, family="binomial")

bootGLM.1 <- boot.stepAIC(fit.1, spatialVars, direction = "backward", alpha = 
0.05, B = 1000)  #15445.24

# add bio10 (lowest AIC - 14285.46)
# (backward direction dc)


# kept bio2 and excluded: bio7 (dc), -bio2 + bio7 (14710.04),
# -bio2 - bio10 + bio7 (15676.62)
fit.2 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI 
+ bio5 + bio9 + bio10, weights = Examind, data = spatialVars, 
family="binomial") #15302.59

bootGLM.2 <- boot.stepAIC(fit.2, spatialVars, direction = "forward", alpha = 
0.05, B = 1000)  

# keep bio2 in candidate list (+bio10)

# kept bio5 and excluded: altitude (dc), -bio5 + altitude (15659.26),
# -bio5 + maxTemp (15637.91)
fit.3 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI 
+ bio5 + bio9 + bio10, weights = Examind, data = spatialVars, 
family="binomial") 

bootGLM.3 <- boot.stepAIC(fit.3, spatialVars, direction = "forward", alpha = 
0.05, B = 1000)  

# keep bio5 in candidate list (+bio10)

# kept bio17 (not in candidate list) (bio17 (14178.88)) and excluded:
# bio12 (dc), bio14 (14168.77), bio16 (14287.42), bio19 (14248.65),
# rain (14287.45), bio17+bio12 (14162.3)
fit.4 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI 
+ bio5 + bio9 + bio10 + rain, weights = Examind, data = spatialVars, 
family="binomial") 

bootGLM.4 <- boot.stepAIC(fit.4, spatialVars, direction = "forward", alpha = 
0.05, B = 1000) 

# add bio14 to candidate list (+bio10)

# kept bio15 (14168.77) (not included in candidate list) and excluded:
# bio17 (14161.03)
fit.5 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI 
+ bio5 + bio9 + bio10 + bio14, weights = Examind, data = spatialVars, 
family="binomial") 

bootGLM.5 <- boot.stepAIC(fit.5, spatialVars, direction = "forward", alpha = 
0.05, B = 1000) 

Thanks very much (for any help, advice or thoughts)
Justin 



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
