Re: [R] Non linear regression - Von Bertalanffy Growth Function - "singular gradient matrix at initial parameter estimates"
Thank you for the tip. Indeed, nlxb in nlmrt works and results are not crazy. I would like however to assess goodness-of-fit (gof) and ultimately to compare it with gof from linear regression (fitted with same variables). Before I used AICc to compare the nls() and lm() fit, however I get now an error message concerning the method loglike and its non compatibility with nlmrt class object. I guess it is because we use now Marquardt method to minimise sum-of square instead of Gauss-Newton? I am right? Or this is just an incompatibility coming between AICc function and nlmrt objects? Is there an R function to do that? Best, Xochitl C. <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en écologie marine et science halieutique PhD student in marine ecology and fishery science <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< Le 19/08/2015 15:11, ProfJCNash a écrit : Packages nlmrt or minpack.lm use a Marquardt method. minpack.lm won't proceed if the Jacobian singularity is at the starting point as far as I'm aware, but nlxb in nlmrt can sometimes get going. It has a policy that is aggressive in trying to improve the sum of squares, so will use more effort than nls when both work. JN On 15-08-18 12:08 PM, Xochitl CORMON wrote: Dear all, I am trying to estimate VBGF parameters K and Linf using non linear regression and nls(). First I used a classic approach where I estimate both parameters together as below with "alkdyr" being a subset per year of my age-length-key database and running in a loop. vbgf.par <- nls(Lgtcm ~ Linf *(1 - exp(-K * (Age - tzero))), start = c(K= 0.07, Linf = 177.1), data=alkdyr) I obtain an estimation of both parameters that are strongly correlated. Indeed after plotting Linf ~ K and fitting a linear regression I obtain a function (Linf = a + b*K) with R2= 0.8 and a = 215, b = -763. In this context, to take into account explicitly correlation between parameters, I decided to fit a new non linear regression derivate from VBGF but where Linf is expressed depending on K (I am most interested in K). To do so, I tried this model: vbgf.par <- nls(Lgtcm ~ (a + (b*k)) *(1 - exp(-k * (Age - tzero))), start = c(k= 0.07, a= 215, b=-763), data=alkdyr) Unfortunately at this point I cannot go further as I get the error message "singular gradient matrix at initial parameter estimates". I tried to use alg= plinear (which I am not sure I understand properly yet). If I give a starting value for a and b only, I have an error message stating "step factor below minFactor" (even when minFactor is set to 1000). Any help will be more than welcome as this is quite urgent Best, Xochitl C. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Non linear regression - Von Bertalanffy Growth Function - "singular gradient matrix at initial parameter estimates"
Dear all, I am trying to estimate VBGF parameters K and Linf using non linear regression and nls(). First I used a classic approach where I estimate both parameters together as below with "alkdyr" being a subset per year of my age-length-key database and running in a loop. vbgf.par <- nls(Lgtcm ~ Linf *(1 - exp(-K * (Age - tzero))), start = c(K= 0.07, Linf = 177.1), data=alkdyr) I obtain an estimation of both parameters that are strongly correlated. Indeed after plotting Linf ~ K and fitting a linear regression I obtain a function (Linf = a + b*K) with R2= 0.8 and a = 215, b = -763. In this context, to take into account explicitly correlation between parameters, I decided to fit a new non linear regression derivate from VBGF but where Linf is expressed depending on K (I am most interested in K). To do so, I tried this model: vbgf.par <- nls(Lgtcm ~ (a + (b*k)) *(1 - exp(-k * (Age - tzero))), start = c(k= 0.07, a= 215, b=-763), data=alkdyr) Unfortunately at this point I cannot go further as I get the error message "singular gradient matrix at initial parameter estimates". I tried to use alg= plinear (which I am not sure I understand properly yet). If I give a starting value for a and b only, I have an error message stating "step factor below minFactor" (even when minFactor is set to 1000). Any help will be more than welcome as this is quite urgent Best, Xochitl C. -- <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en écologie marine et science halieutique PhD student in marine ecology and fishery science <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem running stepAIC within a function.
I know it is pretty old but it seems that nobody answer this question. I ran into similar problem and the work around I found use the assign function. Indeed it seems than stepAIC look for the element it needs within the global environment. So if you define an object inside a function, you need afterwards to assign this object to the global environment. See ?assign Maybe that can help some people. -- <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en écologie marine et science halieutique PhD student in marine ecology and fishery science <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Prediction of response after glm on whitened data
Thanks Thierry for your quick answer. Indeed this simplifies a lot my method so I decided to apply it. However I will be curious to check in which extend the coefficients obtained with the gls function are similar to the ones obtained using glm and whitening. It seems to me thant the method are indeed pretty similar. So if someone knows a function which allows me to predict my response and its associated variance using R after whitening and glm (see original question), I am still eager to know it. Best, Xo <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en écologie marine et science halieutique PhD student in marine ecology and fishery science <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< Le 28/01/2015 16:19, ONKELINX, Thierry a écrit : Dear Xochitl, Have a look at gls() from the nlme package. It allows you to fit auto correlated errors. gls(k ~ NPw, correlation = corAR1(form = ~ Time)) Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: R-help [mailto:r-help-boun...@r-project.org] Namens Xochitl CORMON Verzonden: woensdag 28 januari 2015 15:09 Aan: Rlist; Rlist Onderwerp: [R] Prediction of response after glm on whitened data Hi all, Here is a description of my case. I am sorry if my question is also statistic related but it is difficult to disentangle. I will however try to make it only R applied. My response is a growth constant "k" and my descriptor is prey biomass "NP" and time series is of 21 years. I applied a gaussiam GLM (or LM) to this question. After the regression I tested the residuals for autocorrelation using acf(). Because autocorrelation was significant I decided to whiten my data using {car}dwt() in order to obtain rho (an estimation of my correlation) and then applying the following to my data in order to remove autocorrelation: kw_i = k_i - rho * k_i-1 NPw_i = NPw_i - rho * NPw_i-1 (method from Jonathan Taylor, http://statweb.stanford.edu/~jtaylo/courses/stats191/correlated_errors.html). After that I fitted a model on this whitened data (kw_i ~ NPw_i), realised an F-test and obtained classical results such as deviance explained, pvalues and of course the intercept and coefficient of the last regression. However doing that and coming to prediction using predict() I can only obtained predictions of deltaK (kw_i) in function of deltaNP (NPw_i) but I am actually interested in being able to predict k in function of NP... Is there a solution to predict directly k and its associated variance using R without having to detail in the script all the mathematical process necessary to come back to something like k_i = mu + rho * k_i-1 + beta(NPw_i - rho * NPw_i-1) + epsilon with mu being the intercept, beta the regression coefficient and epsilon the error, ? Thank you for your help, Best, Xochitl C. -- <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en écologie marine et science halieutique PhD student in marine ecology and fishery science <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Disclaimer<https://www.inbo.be/nl/disclaimer-mailberichten-van-het-inbo> __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Prediction of response after glm on whitened data
Hi all, Here is a description of my case. I am sorry if my question is also statistic related but it is difficult to disentangle. I will however try to make it only R applied. My response is a growth constant "k" and my descriptor is prey biomass "NP" and time series is of 21 years. I applied a gaussiam GLM (or LM) to this question. After the regression I tested the residuals for autocorrelation using acf(). Because autocorrelation was significant I decided to whiten my data using {car}dwt() in order to obtain rho (an estimation of my correlation) and then applying the following to my data in order to remove autocorrelation: kw_i = k_i - rho * k_i-1 NPw_i = NPw_i - rho * NPw_i-1 (method from Jonathan Taylor, http://statweb.stanford.edu/~jtaylo/courses/stats191/correlated_errors.html). After that I fitted a model on this whitened data (kw_i ~ NPw_i), realised an F-test and obtained classical results such as deviance explained, pvalues and of course the intercept and coefficient of the last regression. However doing that and coming to prediction using predict() I can only obtained predictions of deltaK (kw_i) in function of deltaNP (NPw_i) but I am actually interested in being able to predict k in function of NP... Is there a solution to predict directly k and its associated variance using R without having to detail in the script all the mathematical process necessary to come back to something like k_i = mu + rho * k_i-1 + beta(NPw_i - rho * NPw_i-1) + epsilon with mu being the intercept, beta the regression coefficient and epsilon the error, ? Thank you for your help, Best, Xochitl C. -- <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en écologie marine et science halieutique PhD student in marine ecology and fishery science <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm's for a logistic regression - no warnings?
Le 01/10/2013 17:41, Dimitri Liakhovitski a écrit : Ah, thank you very much - I did not understand first brglm was the name of a package! Dimitri My bad! If there is separation you should see it in the way the coefficient diverges from one (it's pretty exponential). You can increase the number of step if you see nothing but in my opinion 30 steps are enough. There is several packages to handle separated data, brglm and logistf are the two I recall. Good luck! Xochitl C. On Tue, Oct 1, 2013 at 11:34 AM, Xochitl CORMON mailto:xochitl.cor...@ifremer.fr>> wrote: <>< <>< <>< <>< Xochitl CORMON Le 01/10/2013 17:29, Dimitri Liakhovitski a écrit : Thank you very much, Bert - it's very helpful. This post says that R issues a warning: Warning message: *glm.fit: fitted probabilities numerically 0 or 1 occurred * Actually the warning message should be something like: glm.fit: algorithm did not converge The fist warning is not fatal contrary to the second one.. (https://stat.ethz.ch/__pipermail/r-help/2012-March/__307352.html <https://stat.ethz.ch/pipermail/r-help/2012-March/307352.html>) However, in my case there is no warning. How could I detect complete separation in my data? I need to be able to flag it in my function. As said use the separation dectection function: separation.detection{brglm} Thank you very much! Dimitri On Tue, Oct 1, 2013 at 10:52 AM, Xochitl CORMON mailto:xochitl.cor...@ifremer.fr> <mailto:Xochitl.Cormon@__ifremer.fr <mailto:xochitl.cor...@ifremer.fr>>> wrote: Hi, I did have warning messages about convergence issues using binomial GLM with logit link with my data in the past Do you detect separation using the function separation.detection{brglm}? Regards, Xochitl C. <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en sciences halieutiques PhD student in fishery sciences <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< Le 01/10/2013 16:41, Dimitri Liakhovitski a écrit : I have this weird data set with 2 predictors and one dependent variable - attached. predictor1 has all zeros except for one 1. I am runnning a simple logistic regression: temp<-read.csv("x data for reg224.csv") myreg<- glm(dv~predictor1+predictor2,data=temp, family=binomial("logit")) myreg$coef2 Everything runs fine and I get the coefficients - and the fact that there is only one 1 on one of the predictors doesn't seem to cause any problems. However, when I run the same regression in SAS, I get warnings: Model Convergence Status Quasi-complete separation of data points detected. Warning: The maximum likelihood estimate may not exist. Warning: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable. And the coefficients SAS produces are quite different from mine. I know I'll probably get screamed at because it's not a pure R question - but any idea why R is not giving me any warnings in such a situation? Does it have no problems with ML estimation in this case? Thanks a lot! __ R-help@r-project.org <mailto:R-help@r-project.org> <mailto:R-help@r-project.org <mailto:R-help@r-project.org>> mailing list https://stat.ethz.ch/mailman/listinfo/r-help <https://stat.ethz.ch/mailman/__listinfo/r-help> <https://stat.ethz.ch/mailman/__listinfo/r-help <https://stat.ethz.ch/mailman/listinfo/r-help>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.h
Re: [R] glm's for a logistic regression - no warnings?
<>< <>< <>< <>< Xochitl CORMON Le 01/10/2013 17:29, Dimitri Liakhovitski a écrit : Thank you very much, Bert - it's very helpful. This post says that R issues a warning: Warning message: *glm.fit: fitted probabilities numerically 0 or 1 occurred * Actually the warning message should be something like: glm.fit: algorithm did not converge The fist warning is not fatal contrary to the second one.. (https://stat.ethz.ch/pipermail/r-help/2012-March/307352.html) However, in my case there is no warning. How could I detect complete separation in my data? I need to be able to flag it in my function. As said use the separation dectection function: separation.detection{brglm} Thank you very much! Dimitri On Tue, Oct 1, 2013 at 10:52 AM, Xochitl CORMON mailto:xochitl.cor...@ifremer.fr>> wrote: Hi, I did have warning messages about convergence issues using binomial GLM with logit link with my data in the past Do you detect separation using the function separation.detection{brglm}? Regards, Xochitl C. <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en sciences halieutiques PhD student in fishery sciences <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< Le 01/10/2013 16:41, Dimitri Liakhovitski a écrit : I have this weird data set with 2 predictors and one dependent variable - attached. predictor1 has all zeros except for one 1. I am runnning a simple logistic regression: temp<-read.csv("x data for reg224.csv") myreg<- glm(dv~predictor1+predictor2,__data=temp, family=binomial("logit")) myreg$coef2 Everything runs fine and I get the coefficients - and the fact that there is only one 1 on one of the predictors doesn't seem to cause any problems. However, when I run the same regression in SAS, I get warnings: Model Convergence Status Quasi-complete separation of data points detected. Warning: The maximum likelihood estimate may not exist. Warning: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable. And the coefficients SAS produces are quite different from mine. I know I'll probably get screamed at because it's not a pure R question - but any idea why R is not giving me any warnings in such a situation? Does it have no problems with ML estimation in this case? Thanks a lot! R-help@r-project.org <mailto:R-help@r-project.org> mailing list https://stat.ethz.ch/mailman/__listinfo/r-help <https://stat.ethz.ch/mailman/listinfo/r-help> PLEASE do read the posting guide http://www.R-project.org/__posting-guide.html <http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code. R-help@r-project.org <mailto:R-help@r-project.org> mailing list https://stat.ethz.ch/mailman/__listinfo/r-help <https://stat.ethz.ch/mailman/listinfo/r-help> PLEASE do read the posting guide http://www.R-project.org/__posting-guide.html <http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code. -- Dimitri Liakhovitski __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm's for a logistic regression - no warnings?
Hi, I did have warning messages about convergence issues using binomial GLM with logit link with my data in the past Do you detect separation using the function separation.detection{brglm}? Regards, Xochitl C. <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en sciences halieutiques PhD student in fishery sciences <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< Le 01/10/2013 16:41, Dimitri Liakhovitski a écrit : I have this weird data set with 2 predictors and one dependent variable - attached. predictor1 has all zeros except for one 1. I am runnning a simple logistic regression: temp<-read.csv("x data for reg224.csv") myreg<- glm(dv~predictor1+predictor2,data=temp, family=binomial("logit")) myreg$coef2 Everything runs fine and I get the coefficients - and the fact that there is only one 1 on one of the predictors doesn't seem to cause any problems. However, when I run the same regression in SAS, I get warnings: Model Convergence Status Quasi-complete separation of data points detected. Warning: The maximum likelihood estimate may not exist. Warning: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable. And the coefficients SAS produces are quite different from mine. I know I'll probably get screamed at because it's not a pure R question - but any idea why R is not giving me any warnings in such a situation? Does it have no problems with ML estimation in this case? Thanks a lot! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Stepwise selection with qAIC and qBIC
Here is a solution I applied using qAIC and package bbmle so I share it for next ones. It is not really automatized as I need to read every results of the drop() test an enter manually the less significant variable but I guess a function can be created in this goal. nullQ <- update (null, family = quasibinomial) fullQ <- update (full, family = quasibinomial) # backward manual selection #Zuur, 2010 p.227 ; Faraway, 2006 p.149 drop1(fullQ, test= "F") modelQ1 <- update(fullQ, .~. - #Highest p_value (not significant)#) drop1(modelQ1, test = "F") modelQ2 <- update(modelQ1, .~. -#2nd Highest p_value (not significant)#) drop1(modelQ2, test = "F") modelQ3 <- update(modelQ2, .~. - # 3rd #) drop1(modelQ3, test = "F") modelQ4 <- update(modelQ3, .~. - # 4th #) drop1(modelQ4, test = "F") modelQ5 <- update(modelQ4, .~. - # 5th #) drop1(modelQ5, test = "F") modelQ6 <- update(modelQ5, .~. - # 6th #) drop1(modelQ6, test = "F") library(bbmle) # overdispersion parameter calculated from the most complex model as the sum of squares Pearson residuals divided by the number of degrees of freedom # Burnham & Anderson, 2002, p.67 Qi2 <- sum(residuals(fullQ, type= "pearson")^2) dfr <- summary(fullQ)$df.residual disp <- Qi2/dfr full <- update(fullQ, family = binomial) # we need to retrieve the loglikelihood so we use the "binomial model" model1 <- update(modelQ1, family = binomial) model2 <- update(modelQ2, family = binomial) model3 <- update(modelQ3, family = binomial) model4 <- update(modelQ4, family = binomial) model5 <- update(modelQ5, family = binomial) model6 <- update(modelQ6, family = binomial) qAICtab <- ICtab (full, model1, model2, model3, model4, model5, model6, dispersion = disp, type = "qAIC", base = TRUE) # we use the global dispersion parameter as recommended in Burnham & Anderson, 2002 <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en sciences halieutiques PhD student in fishery sciences <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< Le 28/08/2013 17:34, Xochitl CORMON a écrit : Dear list, I am currently working with presence/absence GLM. Therefore I am using binomial family and selection my models this way : null <- glm(respvarPAT ~ 1 , family = binomial, data = datafit) full <- glm(respvarPAT ~ CSpp + FSpp + Gpp + Mpp + Ppp + Lpp + TempPoly2 + DepthPoly2 + DepthPoly3 , family = binomial, data = datafit) model1 <- stepAIC(full, scope = list(lower = null, upper = full), direction = "backward") #AIC backward model2 <- stepAIC(full, scope = list(lower = null, upper = full), direction = "backward", k=log(nobs(full))) #BIC backward model3 <- stepAIC(null, scope = list(lower = null, upper = full), direction = "forward") #AIC forward model4 <- stepAIC(null, scope = list(lower = null, upper = full), direction = "forward", k=log(nobs(full))) #BIC forward model5 <- stepAIC(null, scope = list(lower = null, upper = full), direction = "both") #AIC both model6 <- stepAIC(null, scope = list(lower = null, upper = full), direction = "both", k=log(nobs(full))) #BIC both Every model generated are actually identical being : glm(formula = respvarPAT ~ DepthPoly2 + DepthPoly3 + Gpp + Mpp + TempPoly2 + Lpp, family = binomial, data = datafit) This worked pretty good for the last set of explanatory variables I used however for these ones I have a bit of overdispersion problem. Dispersion parameter for more complex model(full) being : 7.415653. I looked into literature and found out that I should use qAIC and qBIC for selection. Thus my former stepwise selection is biased as using AIC and BIC (binomial family). My question is to know if there is way to change the k parameter in stepAIC in order to get quasi criterion. If not is there a way to automatize the selection using this criterion and having the dispersion parameter, customizing stepAIC function for example? Unfortunately I am reaching here my abilities in statistics and programming and cannot figure out if what I want to do is doable or not. Thank you for your help, Regards, Xochitl C. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Stepwise selection with qAIC and qBIC
Dear list, I am currently working with presence/absence GLM. Therefore I am using binomial family and selection my models this way : null <- glm(respvarPAT ~ 1 , family = binomial, data = datafit) full <- glm(respvarPAT ~ CSpp + FSpp + Gpp + Mpp + Ppp + Lpp + TempPoly2 + DepthPoly2 + DepthPoly3 , family = binomial, data = datafit) model1 <- stepAIC(full, scope = list(lower = null, upper = full), direction = "backward") #AIC backward model2 <- stepAIC(full, scope = list(lower = null, upper = full), direction = "backward", k=log(nobs(full))) #BIC backward model3 <- stepAIC(null, scope = list(lower = null, upper = full), direction = "forward") #AIC forward model4 <- stepAIC(null, scope = list(lower = null, upper = full), direction = "forward", k=log(nobs(full))) #BIC forward model5 <- stepAIC(null, scope = list(lower = null, upper = full), direction = "both") #AIC both model6 <- stepAIC(null, scope = list(lower = null, upper = full), direction = "both", k=log(nobs(full))) #BIC both Every model generated are actually identical being : glm(formula = respvarPAT ~ DepthPoly2 + DepthPoly3 + Gpp + Mpp + TempPoly2 + Lpp, family = binomial, data = datafit) This worked pretty good for the last set of explanatory variables I used however for these ones I have a bit of overdispersion problem. Dispersion parameter for more complex model(full) being : 7.415653. I looked into literature and found out that I should use qAIC and qBIC for selection. Thus my former stepwise selection is biased as using AIC and BIC (binomial family). My question is to know if there is way to change the k parameter in stepAIC in order to get quasi criterion. If not is there a way to automatize the selection using this criterion and having the dispersion parameter, customizing stepAIC function for example? Unfortunately I am reaching here my abilities in statistics and programming and cannot figure out if what I want to do is doable or not. Thank you for your help, Regards, Xochitl C. -- <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en sciences halieutiques PhD student in fishery sciences <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Naming columns
Assuming you attribute the name "dataset" to your data. A way to name a column using colnames and which. Code to change V1 column name: colnames(dataset)[which(colnames(dataset) == "V1")] <- "Toto" You are asking to R in the column's names of dataset which one is "V1" (TRUE) and attributing to this column the new name "Toto". You can also use the the number of column but even if it seems simpler it is not recommended as you can get problem if the column order is changed. Code to change 1st column name: colnames(dataset)[1] <- "Toto" Xochitl C. <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en sciences halieutiques PhD student in fishery sciences <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< Le 27/08/2013 02:20, Jeff Newmiller a écrit : Your question prompts more questions than answers. Not sure what "imported from notepad" means (clipboard? delimited how?). Don't know what you obtained in R (what does the str function tell you? What about dput(head(yourdata)))? You really need to tell us what R code you executed and data you gave it for us to understand what happened. Please read the Posting Guide mentioned at the bottom of this message. You would probably also benefit from reading about how to make a reproducible example to illustrate your question. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example --- Jeff NewmillerThe . . Go Live... DCN: Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. Docbanks84 wrote: Hi , I just imported a large data set from notepad. I want to label the columns in R. I used 'import Dataset' to bring in my data set Now, I would like to label V1,V2,V3 etc?? Thanks -- View this message in context: http://r.789695.n4.nabble.com/Naming-columns-tp4674595.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] transform variables
You have what is called a "wide" table and you want a "tall" table. In this case the function melt from reshape2 package is suitable. Be careful specifying properly the variables (columns) you want to use for the reshape using correct syntax. ?melt {reshape2} You can also use as already suggested reshape function. Note the syntax is slightly different. ?reshape Good luck, Xochitl C. <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en sciences halieutiques PhD student in fishery sciences <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< Le 26/08/2013 13:55, Erich Neuwirth a écrit : Have a look at the packages reshape and reshape2 They were written with this type of problems in mind. On Aug 26, 2013, at 1:04 PM, catalin roibu wrote: Dear all! I have a data frame composed by 13 columns (year, and 12 months). I want to transform this data base in another like this year month values 1901 1 1901 2 1901 3 . 1901 12 1902 1 1902 2 1902 12 Is there a possibility to succeed that in R? Thank you! best regards! CR -- --- Catalin-Constantin ROIBU Lecturer PhD, Forestry engineer Forestry Faculty of Suceava Str. Universitatii no. 13, Suceava, 720229, Romania office phone +4 0230 52 29 78, ext. 531 mobile phone +4 0745 53 18 01 +4 0766 71 76 58 FAX:+4 0230 52 16 64 silvic.usv.ro [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sample {base}
Thank you Sarah for this code, it's exactly what I wanted to reach. Le 05/06/2013 16:49, Sarah Goslee a écrit : What about using instead size = min(5, length(Gpool$SampleNb[Gpool$LngtClas == LngtClas[[4]]& Gpool$SpCode == SpCode[[2]]]) that would make sure your sample is either the size of the data or 5. Sarah On Wed, Jun 5, 2013 at 10:27 AM, Xochitl CORMON wrote: Hi all, I'm trying to randomly select sample numbers for length class groups (5 per length class). For this I'm using a loop FOR and the function sample () and specified a size for the sampling of 5. Unfortunately, one of the length class group does not contain 5 individuals. For me is not a big deal as I have others groups to complete. However it bothers me that the sampling function does not select any individuals in this group generating the error below : Error in sample(Gpool2$SampleNb[Gpool2$LngtClas == LngtClas[[i]]& Gpool2$SpCode == : can not take a sample larger than the population when 'replace = FALSE'. I understand why this error message appears but I was wondering if there is a way to select all the items present in the group even if it's not 5 (something like size =< 5). LngtClas<- list( "40_49", "50_59", "60_69", "70_") SpCode<- list ("POLLVIR ", "MERLMER") a<- as.character(sample(Gpool$SampleNb[Gpool$LngtClas == LngtClas[[4]]& Gpool$SpCode == SpCode[[2]]], size = 5, replace = FALSE)) You can find enclosed my dataset, Thank you for the help, Xochitl C. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sample {base}
Hi all, I'm trying to randomly select sample numbers for length class groups (5 per length class). For this I'm using a loop FOR and the function sample () and specified a size for the sampling of 5. Unfortunately, one of the length class group does not contain 5 individuals. For me is not a big deal as I have others groups to complete. However it bothers me that the sampling function does not select any individuals in this group generating the error below : Error in sample(Gpool2$SampleNb[Gpool2$LngtClas == LngtClas[[i]] & Gpool2$SpCode == : can not take a sample larger than the population when 'replace = FALSE'. I understand why this error message appears but I was wondering if there is a way to select all the items present in the group even if it's not 5 (something like size =< 5). LngtClas <- list( "40_49", "50_59", "60_69", "70_") SpCode <- list ("POLLVIR ", "MERLMER") a <- as.character(sample(Gpool$SampleNb[Gpool$LngtClas == LngtClas[[4]] & Gpool$SpCode == SpCode[[2]]], size = 5, replace = FALSE)) You can find enclosed my dataset, Thank you for the help, Xochitl C. <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en sciences halieutiques PhD student in fishery sciences <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< SampleNb,SpCode,LngtClas 3,MERLMER,40_49 5,POLLVIR ,50_59 6,POLLVIR ,50_59 13,POLLVIR ,60_69 19,MERLMER,40_49 20,MERLMER,50_59 21,POLLVIR ,60_69 22,POLLVIR ,50_59 23,MERLMER,60_69 24,POLLVIR ,40_49 25,POLLVIR ,40_49 30,POLLVIR ,50_59 31,POLLVIR ,40_49 32,MERLMER,60_69 33,POLLVIR ,40_49 34,MERLMER,40_49 35,POLLVIR ,60_69 36,POLLVIR ,60_69 37,POLLVIR ,40_49 38,MERLMER,40_49 39,POLLVIR ,40_49 40,MERLMER,60_69 44,MERLMER,70_ 45,POLLVIR ,70_ 62,POLLVIR ,70_ 63,MERLMER,70_ 74,MERLMER,40_49 75,POLLVIR ,50_59 76,MERLMER,50_59 77,MERLMER,50_59 78,POLLVIR ,60_69 79,POLLVIR ,60_69 96,MERLMER,70_ 97,POLLVIR ,70_ 104,MERLMER,50_59 105,POLLVIR ,50_59 112,MERLMER,50_59 113,POLLVIR ,60_69 114,POLLVIR ,60_69 115,MERLMER,50_59 116,MERLMER,60_69 117,POLLVIR ,60_69 128,MERLMER,60_69 129,POLLVIR ,70_ 130,MERLMER,60_69 131,POLLVIR ,60_69 136,MERLMER,60_69 137,POLLVIR ,60_69 142,POLLVIR ,50_59 143,MERLMER,40_49 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loop FOR with histogram() from lattice
Hi Jim, Thank you a lot. Is it a FAQ concerning lattice or FOR loop in general? Regards, Xochitl C. Le 05/06/2013 10:55, Jim Holtman a écrit : This is an FAQ. you have to explicitly 'print' the histogram: print(histogram(~ Mpool[[2]]$LngtClas | Mpool[[2]]$SpCode, type = "count", col = "lightgrey", xlab= "LngtClas", main = paste("Length distribution per species for Mpool", "2", sep = "_"))) Sent from my iPad On Jun 5, 2013, at 4:37, Xochitl CORMON wrote: Hi all, I'm encountering a problem I do not understand on my data: library (lattice) Mpool1<- Table[Table$Subarea %in% c("52E9", "51E9"),] Mpool2<- Table[Table$Subarea %in% c("53F0", "52F0"),] Mpool3<- Table[Table$Subarea %in% c("51F0", "50F0"),] Mpool4<- Table[Table$Subarea %in% c("51F1", "52F1"),] Mpool<- list(Mpool1, Mpool2, Mpool3, Mpool4) histogram(~ Mpool[[2]]$LngtClas | Mpool[[2]]$SpCode, type = "count", col = "lightgrey", xlab= "LngtClas", main = paste("Length distribution per species for Mpool", "2", sep = "_")) This part works perfectly and I obtain the graph reprensenting Mpool2 length class count per species. Now when I want to automatize this with a "for" loop nothing is plotted. for (i in c(2)){ windows() histogram(~ Mpool[[i]]$LngtClas | Mpool[[i]]$SpCode, type = "count", col = "lightgrey", xlab= "LngtClas", main = paste("Length distribution per species for Mpool", i, sep = "_")) print (i) } ### Running this loop I obtained windows filled grey (no plot drawn at all) but the print (i) print a "2" as expected. I really dont understand what's wrong with the loop. There is no error message and no notification in R. You can find enclosed my data in txt file. Thank you very much for any help, Xochitl C. <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en sciences halieutiques PhD student in fishery sciences <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Loop FOR with histogram() from lattice
Hi all, I'm encountering a problem I do not understand on my data: library (lattice) Mpool1 <- Table[Table$Subarea %in% c("52E9", "51E9"),] Mpool2 <- Table[Table$Subarea %in% c("53F0", "52F0"),] Mpool3 <- Table[Table$Subarea %in% c("51F0", "50F0"),] Mpool4 <- Table[Table$Subarea %in% c("51F1", "52F1"),] Mpool <- list(Mpool1, Mpool2, Mpool3, Mpool4) histogram(~ Mpool[[2]]$LngtClas | Mpool[[2]]$SpCode, type = "count", col = "lightgrey", xlab= "LngtClas", main = paste("Length distribution per species for Mpool", "2", sep = "_")) This part works perfectly and I obtain the graph reprensenting Mpool2 length class count per species. Now when I want to automatize this with a "for" loop nothing is plotted. for (i in c(2)){ windows() histogram(~ Mpool[[i]]$LngtClas | Mpool[[i]]$SpCode, type = "count", col = "lightgrey", xlab= "LngtClas", main = paste("Length distribution per species for Mpool", i, sep = "_")) print (i) } ### Running this loop I obtained windows filled grey (no plot drawn at all) but the print (i) print a "2" as expected. I really dont understand what's wrong with the loop. There is no error message and no notification in R. You can find enclosed my data in txt file. Thank you very much for any help, Xochitl C. <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorante en sciences halieutiques PhD student in fishery sciences <>< <>< <>< <>< IFREMER Centre Manche Mer du Nord 150 quai Gambetta 62200 Boulogne-sur-Mer <>< <>< <>< <>< SpCode,LngtClas,Subarea POLLVIR ,70_,52F0 POLLVIR ,70_,52F0 MERLMER,40_49,51F0 POLLVIR ,60_69,52F0 POLLVIR ,50_59,51F1 POLLVIR ,50_59,51F1 POLLVIR ,40_49,52F0 POLLVIR ,50_59,52F0 MERLMER,40_49,52F0 MERLMER,60_69,52F0 POLLVIR ,60_69,51F0 MERLMER,40_49,50E7 POLLVIR ,70_,50E7 MERLMER,40_49,51F0 MERLMER,50_59,51F0 POLLVIR ,60_69,51F0 POLLVIR ,50_59,51F0 MERLMER,60_69,51F0 POLLVIR ,40_49,50F0 POLLVIR ,40_49,50F0 POLLVIR ,50_59,52E9 MERLMER,50_59,52E9 MERLMER,50_59,52F0 MERLMER,40_49,52F0 POLLVIR ,50_59,51F1 POLLVIR ,40_49,51F1 MERLMER,60_69,51E9 POLLVIR ,40_49,51E9 MERLMER,40_49,51E9 POLLVIR ,60_69,51E9 POLLVIR ,60_69,50F0 POLLVIR ,40_49,51F0 MERLMER,40_49,51F0 POLLVIR ,40_49,51F0 MERLMER,60_69,51F0 POLLVIR ,70_,52F0 POLLVIR ,50_59,52F0 MERLMER,50_59,52F0 MERLMER,70_,51E9 POLLVIR ,70_,51E9 POLLVIR ,70_,52F0 POLLVIR ,60_69,52F0 POLLVIR ,70_,52F0 POLLVIR ,40_49,52F0 POLLVIR ,70_,52F0 MERLMER,70_,52F0 MERLMER,70_,52F1 POLLVIR ,70_,52F1 POLLVIR ,70_,51F0 MERLMER,70_,51F0 POLLVIR ,60_69,52F0 MERLMER,60_69,52F0 MERLMER,40_49,52F0 POLLVIR ,50_59,52F0 POLLVIR ,50_59,53F0 MERLMER,50_59,53F0 MERLMER,70_,53F0 POLLVIR ,70_,53F0 MERLMER,60_69,52F0 POLLVIR ,70_,52F0 MERLMER,40_49,51F0 POLLVIR ,50_59,51F0 MERLMER,50_59,51F0 MERLMER,50_59,51F0 POLLVIR ,60_69,51F0 POLLVIR ,60_69,51F0 POLLVIR ,70_,52F1 MERLMER,70_,52F1 MERLMER,50_59,52F0 MERLMER,40_49,52F0 MERLMER,60_69,53F0 POLLVIR ,60_69,53F0 POLLVIR ,50_59,52E9 MERLMER,50_59,52E9 MERLMER,40_49,52F0 POLLVIR ,40_49,52F0 MERLMER,40_49,53F0 POLLVIR ,40_49,53F0 MERLMER,50_59,52F1 MERLMER,60_69,52F1 POLLVIR ,70_,52F0 MERLMER,70_,52F0 MERLMER,70_,51F0 POLLVIR ,70_,51F0 MERLMER,60_69,52F0 POLLVIR ,60_69,52F0 POLLVIR ,70_,46E3 MERLMER,70_,46E3 MERLMER,50_59,52E9 POLLVIR ,50_59,52E9 MERLMER,50_59,51F0 POLLVIR ,50_59,51F0 MERLMER,60_69,49E7 POLLVIR ,60_69,49E7 MERLMER,70_,53F0 MERLMER,60_69,53F0 MERLMER,60_69,52F0 POLLVIR ,60_69,52F0 MERLMER,50_59,51F0 POLLVIR ,60_69,51F0 POLLVIR ,60_69,50F0 MERLMER,50_59,50F0 MERLMER,60_69,51F1 POLLVIR ,60_69,51F1 POLLVIR ,60_69,49E7 MERLMER,70_,49E7 MERLMER,70_,52F0 POLLVIR ,70_,52F0 POLLVIR ,60_69,52F0 MERLMER,50_59,52F0 POLLVIR ,50_59,53F0 POLLVIR ,50_59,53F0 POLLVIR ,50_59,53F0 POLLVIR ,40_49,53F0 MERLMER,60_69,51E9 POLLVIR ,70_,51E9 MERLMER,60_69,51F1 POLLVIR ,60_69,51F1 POLLVIR ,50_59,52F0 MERLMER,50_59,52F0 MERLMER,40_49,52F1 POLLVIR ,40_49,52F1 MERLMER,60_69,50F0 POLLVIR ,60_69,50F0 MERLMER,60_69,47E2 POLLVIR ,70_,47E2 POLLVIR ,50_59,52E9 MERLMER,40_49,52E9 POLLVIR ,50_59,51F0 MERLMER,40_49,51F0 MERLMER,60_69,47E3 POLLVIR ,60_69,47E3 POLLVIR ,60_69,52F0 POLLVIR ,60_69,52F0 POLLVIR ,50_59,52E9 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Separation issue in binary response models - glm, brglm, logistf
Le 28/02/2013 17:22, Ben Bolker a écrit : Thank you for your help ! Xochitl CORMON ifremer.fr> writes: Dear all, I am encountering some issues with my data and need some help. I am trying to run glm analysis with a presence/absence variable as response variable and several explanatory variable (time, location, presence/absence data, abundance data). First I tried to use the glm() function, however I was having 2 warnings concerning glm.fit () : # 1: glm.fit: algorithm did not converge # 2: glm.fit: fitted probabilities numerically 0 or 1 occurred After some investigation I found out that the problem was most probably quasi complete separation and therefor decide to use brglm and/or logistf. * logistf : analysis does not run When running logistf() I get a error message saying : # error in chol.default(x) : # leading minor 39 is not positive definite I looked into logistf package manual, on Internet, in the theoretical and technical paper of Heinze and Ploner and cannot find where this function is used and if the error can be fixed by some settings. chol.default is a function for Cholesky decomposition, which is going to be embedded fairly deeply in the code ... If I understand good I should just not use this package as this error is not easily fixable ? * brglm : analysis run However I get a warning message saying : # In fit.proc(x = X, y = Y, weights = weights, start = start, etastart # = etastart, : # Iteration limit reached Like before i cannot find where and why this function is used while running the package and if it can be fixed by adjusting some settings. In a more general way, I was wondering what are the fundamental differences of these packages. You might also take a crack with bayesglm() in the arm package, which should (?) be able to overcome the separation problem by specifying a not-completely-uninformative prior. Thank you for the tip I will have a look into this package and its doc tomorrow. Do you have any idea of what is this fit.proc function ? I hope this make sense enough and I am sorry if this is kind of statistical evidence that I'm not aware of. --- Here an extract of my table and the different formula I run : head (CPUE_table) Year Quarter Subarea Latitude Longitude Presence.S CPUE.S Presence.H CPUE.H Presence.NP CPUE.NP Presence.BW CPUE.BW Presence.C CPUE.C Presence.P CPUE.P Presence.W CPUE.W 1 2000 1 31F1 51.25 1.5 0 0 0 0 0 0 0 0 1 76.002 0 0 1 3358.667 [snip] logistf_binomPres<- logistf (Presence.S ~ (Presence.BW + Presence.W + Presence.C + Presence.NP +Presence.P + Presence.H +CPUE.BW + CPUE.H + CPUE.P + CPUE.NP + CPUE.W + CPUE.C + Year + Quarter + Latitude + Longitude)^2, data = CPUE_table) Brglm_binomPres<- brglm (Presence.S ~ (Presence.BW + Presence.W + Presence.C + Presence.NP +Presence.P + Presence.H +CPUE.BW + CPUE.H + CPUE.P + CPUE.NP + CPUE.W + CPUE.C + Year + Quarter + Latitude + Longitude)^2, family = binomial, data = CPUE_table) It's not much to go on, but: Yeah sorry my table header appeared really bad on the email :s * are you overfitting your data? That is, do you have at least 20 times as many 1's or 0's (whichever is rarer) as the number of parameters you are trying to estimated? I have 16 explanatory variable and with interactions we go to 136 parameters. > length (which((CPUE_table)[,]== 0)) [1] 33466 > length (which((CPUE_table)[,]== 1)) [1] 17552 I assume the over fitting is good, isn't it? * have you examined your data graphically and looked for any strong outliers that might be throwing off the fit? I did check my data graphically in a lot and different ways. However if you have any particular suggestions, please let me know. Concerning strong outliers, I do not really understand what you mean. I have outliers here and there but how can I know that they are strong enough to throw off the fit? Most of the time they are really high abundance coming from the fact that I'm using survey data and probably related to the fact that the boat fished over a fish school. * do you have some strongly correlated/multicollinear predictors? It's survey data so they indeed are correlated in time and space. However I checked the scatterplot matrix and I didn't notice any linear relation between variable. * for what it's worth it looks like a variety of your variables might be dummy variables, which you can often express more compactly by using a factor variable and letting R construct the design matrix (i.e. generating the dummy variables on the fly), although that shouldn't change your results I will check about dummy variable concept as to be honest I don't really understand what it means... Thank you again for your time and help __ R-help@r-project.org mailing list https://stat.ethz.ch/
[R] Separation issue in binary response models - glm, brglm, logistf
Dear all, I am encountering some issues with my data and need some help. I am trying to run glm analysis with a presence/absence variable as response variable and several explanatory variable (time, location, presence/absence data, abundance data). First I tried to use the glm() function, however I was having 2 warnings concerning glm.fit () : # 1: glm.fit: algorithm did not converge # 2: glm.fit: fitted probabilities numerically 0 or 1 occurred After some investigation I found out that the problem was most probably quasi complete separation and therefor decide to use brglm and/or logistf. * logistf : analysis does not run When running logistf() I get a error message saying : # error in chol.default(x) : # leading minor 39 is not positive definite I looked into logistf package manual, on Internet, in the theoretical and technical paper of Heinze and Ploner and cannot find where this function is used and if the error can be fixed by some settings. * brglm : analysis run However I get a warning message saying : # In fit.proc(x = X, y = Y, weights = weights, start = start, etastart # = etastart, : # Iteration limit reached Like before i cannot find where and why this function is used while running the package and if it can be fixed by adjusting some settings. In a more general way, I was wondering what are the fundamental differences of these packages. I hope this make sense enough and I am sorry if this is kind of statistical evidence that I'm not aware of. It is my first time asking a question so I apologize if it's not like it should be and kindly ask you to not hesitate to let me know about it. Thank you for your help Xochitl C. --- Here an extract of my table and the different formula I run : > head (CPUE_table) Year Quarter Subarea Latitude Longitude Presence.S CPUE.S Presence.H CPUE.H Presence.NP CPUE.NP Presence.BW CPUE.BW Presence.C CPUE.C Presence.P CPUE.P Presence.W CPUE.W 1 2000 131F151.25 1.5 0 0 0 0 0 0 0 0 1 76.002 0 0 1 3358.667 2 2000 131F251.25 2.5 0 0 0 0 0 0 0 0 1 12.500 0 0 1 3028.500 3 2000 132F151.75 1.5 0 0 0 0 0 0 0 0 1 5.500 0 0 1 2256.750 4 2000 132F251.75 2.5 0 0 0 0 0 0 0 0 1 10.000 0 0 1 808.000 5 2000 132F351.75 3.5 0 0 0 0 0 0 0 0 1 19.000 0 0 1 277.000 6 2000 133F152.25 1.5 0 0 0 0 0 0 0 0 0 0.000 0 0 12.000 > tail (CPUE_table) Year Quarter Subarea Latitude Longitude Presence.S CPUE.S Presence.H CPUE.H Presence.NP CPUE.NP Presence.BW CPUE.BW Presence.C CPUE.C Presence.P CPUE.P Presence.W CPUE.W 4435 2012 350F360.75 3.5 1 103.000 1 110.000 1 2379.00 1 20.000 1 6.000 0 0 1 22.000 4436 2012 351E861.25 -1.5 1 1311.600 1 12.000 1 4194.78 00.000 1 18.000 0 0 0 0.000 4437 2012 351E961.25 -0.5 1 34.336 1 46.671 1 11031.56 12.668 1 3.335 0 0 1 3.333 4438 2012 351F061.25 0.5 1 430.500 1 148.000 1 1212.22 1 3279.200 1 2.000 0 0 1 2.000 4439 2012 351F161.25 1.5 1 115.000 1 85.000 1 2089.50 11.000 1 22.000 1 2 1 40.000 4440 2012 351F261.25 2.5 1 72.500 1 35.500 1 270.48 1 516.300 1 11.500 1 1 1 16.000 logistf_binomPres <- logistf (Presence.S ~ (Presence.BW + Presence.W + Presence.C + Presence.NP +Presence.P + Presence.H +CPUE.BW + CPUE.H + CPUE.P + CPUE.NP + CPUE.W + CPUE.C + Year + Quarter + Latitude + Longitude)^2, data = CPUE_table) Brglm_binomPres <- brglm (Presence.S ~ (Presence.BW + Presence.W + Presence.C + Presence.NP +Presence.P + Presence.H +CPUE.BW + CPUE.H + CPUE.P + CPUE.NP + CPUE.W + CPUE.C + Year + Quarter + Latitude + Longitude)^2, family = binomial, data = CPUE_table) --- -- <>< <>< <>< <>< Xochitl CORMON +33 (0)3 21 99 56 84 Doctorant