Re: [R] Non linear regression - Von Bertalanffy Growth Function - "singular gradient matrix at initial parameter estimates"

2015-09-08 Thread Xochitl CORMON
Thank you for the tip. Indeed, nlxb in nlmrt works and results are not 
crazy.


I would like however to assess goodness-of-fit (gof) and ultimately to 
compare it with gof from linear regression (fitted with same variables).


Before I used AICc to compare the nls() and lm() fit, however I get now 
an error message concerning the method loglike and its non compatibility 
with nlmrt class object. I guess it is because we use now Marquardt 
method to minimise sum-of square instead of Gauss-Newton? I am right? Or 
this is just an incompatibility coming between AICc function and nlmrt 
objects? Is there an R function to do that?


Best,

Xochitl C.


<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en écologie marine et science halieutique
PhD student in marine ecology and fishery science

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><



Le 19/08/2015 15:11, ProfJCNash a écrit :

Packages nlmrt or minpack.lm use a Marquardt method. minpack.lm won't
proceed if the Jacobian singularity is at the starting point as far as
I'm aware, but nlxb in nlmrt can sometimes get going. It has a policy
that is aggressive in trying to improve the sum of squares, so will use
more effort than nls when both work.

JN

On 15-08-18 12:08 PM, Xochitl CORMON wrote:

Dear all,

I am trying to estimate VBGF parameters K and Linf using non linear
regression and nls(). First I used a classic approach where I estimate
both parameters together as below with "alkdyr" being a subset per year
of my age-length-key database and running in a loop.

vbgf.par <- nls(Lgtcm ~  Linf *(1 - exp(-K * (Age - tzero))), start =
c(K= 0.07, Linf = 177.1), data=alkdyr)

I obtain an estimation of both parameters that are strongly correlated.
Indeed after plotting Linf ~ K and fitting a linear regression I obtain
a function (Linf = a + b*K) with R2= 0.8 and a = 215, b = -763.

In this context, to take into account explicitly correlation between
parameters, I decided to fit a new non linear regression derivate from
VBGF but where Linf is expressed depending on K (I am most interested in
K). To do so, I tried this model:
vbgf.par <- nls(Lgtcm ~  (a + (b*k)) *(1 - exp(-k * (Age - tzero))),
start = c(k= 0.07, a= 215, b=-763), data=alkdyr)

Unfortunately at this point I cannot go further as I get the error
message "singular gradient matrix at initial parameter estimates".

I tried to use alg= plinear (which I am not sure I understand properly
yet). If I give a starting value for a and b only, I have an error
message stating "step factor below minFactor" (even when minFactor is
set to 1000).

Any help will be more than welcome as this is quite urgent

Best,

Xochitl C.






__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Non linear regression - Von Bertalanffy Growth Function - "singular gradient matrix at initial parameter estimates"

2015-08-18 Thread Xochitl CORMON

Dear all,

I am trying to estimate VBGF parameters K and Linf using non linear 
regression and nls(). First I used a classic approach where I estimate 
both parameters together as below with "alkdyr" being a subset per year 
of my age-length-key database and running in a loop.


vbgf.par <- nls(Lgtcm ~  Linf *(1 - exp(-K * (Age - tzero))), start = 
c(K= 0.07, Linf = 177.1), data=alkdyr)


I obtain an estimation of both parameters that are strongly correlated. 
Indeed after plotting Linf ~ K and fitting a linear regression I obtain 
a function (Linf = a + b*K) with R2= 0.8 and a = 215, b = -763.


In this context, to take into account explicitly correlation between 
parameters, I decided to fit a new non linear regression derivate from 
VBGF but where Linf is expressed depending on K (I am most interested in 
K). To do so, I tried this model:
vbgf.par <- nls(Lgtcm ~  (a + (b*k)) *(1 - exp(-k * (Age - tzero))), 
start = c(k= 0.07, a= 215, b=-763), data=alkdyr)


Unfortunately at this point I cannot go further as I get the error 
message "singular gradient matrix at initial parameter estimates".


I tried to use alg= plinear (which I am not sure I understand properly 
yet). If I give a starting value for a and b only, I have an error 
message stating "step factor below minFactor" (even when minFactor is 
set to 1000).


Any help will be more than welcome as this is quite urgent

Best,

Xochitl C.




--

<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en écologie marine et science halieutique
PhD student in marine ecology and fishery science

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem running stepAIC within a function.

2015-02-03 Thread Xochitl CORMON
I know it is pretty old but it seems that nobody answer this question. I 
ran into similar problem and the work around I found use the assign 
function. Indeed it seems than stepAIC look for the element it needs 
within the global environment. So if you define an object inside a 
function, you need afterwards to assign this object to the global 
environment.


See ?assign

Maybe that can help some people.

--

<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en écologie marine et science halieutique
PhD student in marine ecology and fishery science

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Prediction of response after glm on whitened data

2015-01-29 Thread Xochitl CORMON
Thanks Thierry for your quick answer. Indeed this simplifies a lot my 
method so I decided to apply it.


However I will be curious to check in which extend the coefficients 
obtained with the gls function are similar to the ones obtained using 
glm and whitening. It seems to me thant the method are indeed pretty 
similar.


So if someone knows a function which allows me to predict my response 
and its associated variance using R after whitening and glm (see 
original question), I am still eager to know it.


Best,

Xo


<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en écologie marine et science halieutique
PhD student in marine ecology and fishery science

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><



Le 28/01/2015 16:19, ONKELINX, Thierry a écrit :

Dear Xochitl,

Have a look at gls() from the nlme package. It allows you to fit auto 
correlated errors.

gls(k ~ NPw, correlation = corAR1(form = ~ Time))

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey


-Oorspronkelijk bericht-
Van: R-help [mailto:r-help-boun...@r-project.org] Namens Xochitl CORMON
Verzonden: woensdag 28 januari 2015 15:09
Aan: Rlist; Rlist
Onderwerp: [R] Prediction of response after glm on whitened data

Hi all,

Here is a description of my case. I am sorry if my question is also statistic 
related but it is difficult to disentangle. I will however try to make it only 
R applied.

My response is a growth constant "k" and my descriptor is prey biomass "NP" and 
time series is of 21 years.

I applied a gaussiam GLM (or LM) to this question. After the regression I 
tested the residuals for autocorrelation using acf(). Because autocorrelation 
was significant I decided to whiten my data using
{car}dwt() in order to obtain rho (an estimation of my correlation) and then 
applying the following to my data in order to remove autocorrelation:
kw_i = k_i - rho * k_i-1
NPw_i = NPw_i - rho * NPw_i-1
(method from Jonathan Taylor,
http://statweb.stanford.edu/~jtaylo/courses/stats191/correlated_errors.html).

After that I fitted a model on this whitened data (kw_i ~ NPw_i), realised an 
F-test and obtained classical results such as deviance explained, pvalues and 
of course the intercept and coefficient of the last regression. However doing 
that and coming to prediction using
predict() I can only obtained predictions of deltaK (kw_i) in function of 
deltaNP (NPw_i) but I am actually interested in being able to predict k in 
function of NP...

Is there a solution to predict directly k and its associated variance using R 
without having to detail in the script all the mathematical process necessary 
to come back to something like k_i = mu + rho * k_i-1
+ beta(NPw_i - rho * NPw_i-1) + epsilon
with mu being the intercept, beta the regression coefficient and epsilon the 
error, ?

Thank you for your help,

Best,

Xochitl C.


--

<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en écologie marine et science halieutique PhD student in marine 
ecology and fishery science

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Disclaimer<https://www.inbo.be/nl/disclaimer-mailberichten-van-het-inbo>



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Prediction of response after glm on whitened data

2015-01-28 Thread Xochitl CORMON

Hi all,

Here is a description of my case. I am sorry if my question is also 
statistic related but it is difficult to disentangle. I will however try 
to make it only R applied.


My response is a growth constant "k" and my descriptor is prey biomass 
"NP" and time series is of 21 years.


I applied a gaussiam GLM (or LM) to this question. After the regression 
I tested the residuals for autocorrelation using acf(). Because 
autocorrelation was significant I decided to whiten my data using 
{car}dwt() in order to obtain rho (an estimation of my correlation) and 
then applying the following to my data in order to remove autocorrelation:

kw_i = k_i - rho * k_i-1
NPw_i = NPw_i - rho * NPw_i-1
(method from Jonathan Taylor, 
http://statweb.stanford.edu/~jtaylo/courses/stats191/correlated_errors.html).


After that I fitted a model on this whitened data (kw_i ~ NPw_i), 
realised an F-test and obtained classical results such as deviance 
explained, pvalues and of course the intercept and coefficient of the 
last regression. However doing that and coming to prediction using 
predict() I can only obtained predictions of deltaK (kw_i) in function 
of deltaNP (NPw_i) but I am actually interested in being able to predict 
k in function of NP...


Is there a solution to predict directly k and its associated variance 
using R without having to detail in the script all the mathematical 
process necessary to come back to something like k_i = mu + rho * k_i-1 
+ beta(NPw_i - rho * NPw_i-1) + epsilon
with mu being the intercept, beta the regression coefficient and epsilon 
the error,

?

Thank you for your help,

Best,

Xochitl C.


--

<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en écologie marine et science halieutique
PhD student in marine ecology and fishery science

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm's for a logistic regression - no warnings?

2013-10-01 Thread Xochitl CORMON


Le 01/10/2013 17:41, Dimitri Liakhovitski a écrit :

Ah, thank you very much - I did not understand first brglm was the name
of a package!
Dimitri


My bad!

If there is separation you should see it in the way the coefficient 
diverges from one (it's pretty exponential). You can increase the number 
of step if you see nothing but in my opinion 30 steps are enough.


There is several packages to handle separated data, brglm and logistf 
are the two I recall.


Good luck!

Xochitl C.



On Tue, Oct 1, 2013 at 11:34 AM, Xochitl CORMON
mailto:xochitl.cor...@ifremer.fr>> wrote:




<>< <>< <>< <><

Xochitl CORMON

Le 01/10/2013 17:29, Dimitri Liakhovitski a écrit :

Thank you very much, Bert - it's very helpful.
This post says that R issues a warning:

Warning message:
*glm.fit: fitted probabilities numerically 0 or 1 occurred
*


Actually the warning message should be something like:
glm.fit: algorithm did not converge

The fist warning is not fatal contrary to the second one..
(https://stat.ethz.ch/__pipermail/r-help/2012-March/__307352.html
<https://stat.ethz.ch/pipermail/r-help/2012-March/307352.html>)


However, in my case there is no warning. How could I detect complete
separation in my data? I need to be able to flag it in my function.


As said use the separation dectection function:
separation.detection{brglm}

Thank you very much!
    Dimitri



On Tue, Oct 1, 2013 at 10:52 AM, Xochitl CORMON
mailto:xochitl.cor...@ifremer.fr>
<mailto:Xochitl.Cormon@__ifremer.fr
<mailto:xochitl.cor...@ifremer.fr>>> wrote:

 Hi,

 I did have warning messages about convergence issues using
binomial
 GLM with logit link with my data in the past

 Do you detect separation using the function
separation.detection{brglm}?

     Regards,

     Xochitl C.


<>< <>< <>< <><

 Xochitl CORMON
+33 (0)3 21 99 56 84 



 Doctorante en sciences halieutiques
 PhD student in fishery sciences

<>< <>< <>< <><

 IFREMER
 Centre Manche Mer du Nord
 150 quai Gambetta
 62200 Boulogne-sur-Mer

<>< <>< <>< <><



 Le 01/10/2013 16:41, Dimitri Liakhovitski a écrit :

 I have this weird data set with 2 predictors and one
dependent
 variable -
 attached.

 predictor1 has all zeros except for one 1.
 I am runnning a simple logistic regression:

 temp<-read.csv("x data for reg224.csv")
 myreg<- glm(dv~predictor1+predictor2,data=temp,

family=binomial("logit"))
 myreg$coef2

 Everything runs fine and I get the coefficients - and
the fact
 that there
 is only one 1 on one of the predictors doesn't seem to
cause any
 problems.

 However, when I run the same regression in SAS, I get
warnings:
Model Convergence Status  Quasi-complete separation
of data
 points
 detected.

 Warning: The maximum likelihood estimate may not exist.
 Warning: The LOGISTIC procedure continues in spite of
the above
 warning.
 Results shown are based on the last maximum likelihood
 iteration. Validity
 of the model fit is questionable.

 And the coefficients SAS produces are quite different
from mine.

 I know I'll probably get screamed at because it's not a
pure R
 question -
 but any idea why R is not giving me any warnings in such a
 situation?
 Does it have no problems with ML estimation in this case?

 Thanks a lot!





 __
R-help@r-project.org <mailto:R-help@r-project.org>
<mailto:R-help@r-project.org <mailto:R-help@r-project.org>>
mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
<https://stat.ethz.ch/mailman/__listinfo/r-help>

<https://stat.ethz.ch/mailman/__listinfo/r-help
<https://stat.ethz.ch/mailman/listinfo/r-help>>
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.h

Re: [R] glm's for a logistic regression - no warnings?

2013-10-01 Thread Xochitl CORMON




<>< <>< <>< <><

Xochitl CORMON

Le 01/10/2013 17:29, Dimitri Liakhovitski a écrit :

Thank you very much, Bert - it's very helpful.
This post says that R issues a warning:

Warning message:
*glm.fit: fitted probabilities numerically 0 or 1 occurred
*


Actually the warning message should be something like:
glm.fit: algorithm did not converge

The fist warning is not fatal contrary to the second one.. 
(https://stat.ethz.ch/pipermail/r-help/2012-March/307352.html)



However, in my case there is no warning. How could I detect complete
separation in my data? I need to be able to flag it in my function.


As said use the separation dectection function: separation.detection{brglm}


Thank you very much!
Dimitri



On Tue, Oct 1, 2013 at 10:52 AM, Xochitl CORMON
mailto:xochitl.cor...@ifremer.fr>> wrote:

Hi,

I did have warning messages about convergence issues using binomial
GLM with logit link with my data in the past

Do you detect separation using the function separation.detection{brglm}?

Regards,

    Xochitl C.


<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84 

Doctorante en sciences halieutiques
PhD student in fishery sciences

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><



Le 01/10/2013 16:41, Dimitri Liakhovitski a écrit :

I have this weird data set with 2 predictors and one dependent
variable -
attached.

predictor1 has all zeros except for one 1.
I am runnning a simple logistic regression:

temp<-read.csv("x data for reg224.csv")
myreg<- glm(dv~predictor1+predictor2,__data=temp,
   family=binomial("logit"))
myreg$coef2

Everything runs fine and I get the coefficients - and the fact
that there
is only one 1 on one of the predictors doesn't seem to cause any
problems.

However, when I run the same regression in SAS, I get warnings:
   Model Convergence Status  Quasi-complete separation of data
points
detected.

Warning: The maximum likelihood estimate may not exist.
Warning: The LOGISTIC procedure continues in spite of the above
warning.
Results shown are based on the last maximum likelihood
iteration. Validity
of the model fit is questionable.

And the coefficients SAS produces are quite different from mine.

I know I'll probably get screamed at because it's not a pure R
question -
but any idea why R is not giving me any warnings in such a
situation?
Does it have no problems with ML estimation in this case?

Thanks a lot!






R-help@r-project.org <mailto:R-help@r-project.org> mailing list
https://stat.ethz.ch/mailman/__listinfo/r-help
<https://stat.ethz.ch/mailman/listinfo/r-help>
PLEASE do read the posting guide
http://www.R-project.org/__posting-guide.html
<http://www.R-project.org/posting-guide.html>
and provide commented, minimal, self-contained, reproducible code.



R-help@r-project.org <mailto:R-help@r-project.org> mailing list
https://stat.ethz.ch/mailman/__listinfo/r-help
<https://stat.ethz.ch/mailman/listinfo/r-help>
PLEASE do read the posting guide
http://www.R-project.org/__posting-guide.html
<http://www.R-project.org/posting-guide.html>
and provide commented, minimal, self-contained, reproducible code.




--
Dimitri Liakhovitski


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm's for a logistic regression - no warnings?

2013-10-01 Thread Xochitl CORMON

Hi,

I did have warning messages about convergence issues using binomial GLM 
with logit link with my data in the past


Do you detect separation using the function separation.detection{brglm}?

Regards,

Xochitl C.


<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en sciences halieutiques
PhD student in fishery sciences

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><



Le 01/10/2013 16:41, Dimitri Liakhovitski a écrit :

I have this weird data set with 2 predictors and one dependent variable -
attached.

predictor1 has all zeros except for one 1.
I am runnning a simple logistic regression:

temp<-read.csv("x data for reg224.csv")
myreg<- glm(dv~predictor1+predictor2,data=temp,
  family=binomial("logit"))
myreg$coef2

Everything runs fine and I get the coefficients - and the fact that there
is only one 1 on one of the predictors doesn't seem to cause any problems.

However, when I run the same regression in SAS, I get warnings:
  Model Convergence Status  Quasi-complete separation of data points
detected.

Warning: The maximum likelihood estimate may not exist.
Warning: The LOGISTIC procedure continues in spite of the above warning.
Results shown are based on the last maximum likelihood iteration. Validity
of the model fit is questionable.

And the coefficients SAS produces are quite different from mine.

I know I'll probably get screamed at because it's not a pure R question -
but any idea why R is not giving me any warnings in such a situation?
Does it have no problems with ML estimation in this case?

Thanks a lot!





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Stepwise selection with qAIC and qBIC

2013-09-04 Thread Xochitl CORMON
Here is a solution I applied using qAIC and package bbmle so I share it 
for next ones. It is not really automatized as I need to read every 
results of the drop() test an enter manually the less significant 
variable but I guess a function can be created in this goal.


nullQ <- update (null, family = quasibinomial)
fullQ <- update (full, family = quasibinomial)

# backward manual selection #Zuur, 2010 p.227 ; Faraway, 2006 p.149
drop1(fullQ, test= "F")
modelQ1 <- update(fullQ, .~. - #Highest p_value (not significant)#)
drop1(modelQ1, test = "F")
modelQ2 <- update(modelQ1, .~. -#2nd Highest p_value (not significant)#)
drop1(modelQ2, test = "F")
modelQ3 <- update(modelQ2, .~. - # 3rd #)
drop1(modelQ3, test = "F")
modelQ4 <- update(modelQ3, .~. - # 4th #)
drop1(modelQ4, test = "F")
modelQ5 <- update(modelQ4, .~. - # 5th #)
drop1(modelQ5, test = "F")
modelQ6 <- update(modelQ5, .~. - # 6th #)
drop1(modelQ6, test = "F")

library(bbmle)

# overdispersion parameter calculated from the most complex model as the 
sum of squares Pearson residuals divided by the number of degrees of 
freedom # Burnham & Anderson, 2002, p.67


Qi2 <- sum(residuals(fullQ, type= "pearson")^2)
dfr <-  summary(fullQ)$df.residual
disp <- Qi2/dfr

full <- update(fullQ, family = binomial) # we need to retrieve the 
loglikelihood so we use the "binomial model"

model1 <- update(modelQ1, family = binomial)
model2 <- update(modelQ2, family = binomial)
model3 <- update(modelQ3, family = binomial)
model4 <- update(modelQ4, family = binomial)
model5 <- update(modelQ5, family = binomial)
model6 <- update(modelQ6, family = binomial)

qAICtab <- ICtab (full, model1, model2, model3, model4, model5, model6, 
dispersion = disp, type = "qAIC", base = TRUE) # we use the global 
dispersion parameter as recommended in Burnham & Anderson, 2002



<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en sciences halieutiques
PhD student in fishery sciences

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><



Le 28/08/2013 17:34, Xochitl CORMON a écrit :

Dear list,

I am currently working with presence/absence GLM. Therefore I am using
binomial family and selection my models this way :

null <- glm(respvarPAT ~ 1 , family = binomial, data = datafit)
full <- glm(respvarPAT ~ CSpp + FSpp + Gpp + Mpp + Ppp + Lpp + TempPoly2
+ DepthPoly2 + DepthPoly3 , family = binomial, data = datafit)
model1 <- stepAIC(full, scope = list(lower = null, upper = full),
direction = "backward") #AIC backward
model2 <- stepAIC(full, scope = list(lower = null, upper = full),
direction = "backward", k=log(nobs(full))) #BIC backward
model3 <- stepAIC(null, scope = list(lower = null, upper = full),
direction = "forward") #AIC forward
model4 <- stepAIC(null, scope = list(lower = null, upper = full),
direction = "forward", k=log(nobs(full))) #BIC forward
model5 <- stepAIC(null, scope = list(lower = null, upper = full),
direction = "both") #AIC both
model6 <- stepAIC(null, scope = list(lower = null, upper = full),
direction = "both", k=log(nobs(full))) #BIC both

Every model generated are actually identical being :

glm(formula = respvarPAT ~ DepthPoly2 + DepthPoly3 + Gpp + Mpp +
TempPoly2 + Lpp, family = binomial, data = datafit)

This worked pretty good for the last set of explanatory variables I used
however for these ones I have a bit of overdispersion problem.
Dispersion parameter for more complex model(full) being : 7.415653.

I looked into literature and found out that I should use qAIC and qBIC
for selection. Thus my former stepwise selection is biased as using AIC
and BIC (binomial family).

My question is to know if there is way to change the k parameter in
stepAIC in order to get quasi criterion. If not is there a way to
automatize the selection using this criterion and having the dispersion
parameter, customizing stepAIC function for example? Unfortunately I am
reaching here my abilities in statistics and programming and cannot
figure out if what I want to do is doable or not.

Thank you for your help,

Regards,

Xochitl C.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Stepwise selection with qAIC and qBIC

2013-08-28 Thread Xochitl CORMON

Dear list,

I am currently working with presence/absence GLM. Therefore I am using 
binomial family and selection my models this way :


null <- glm(respvarPAT ~ 1  , family = binomial, data = datafit)
full <- glm(respvarPAT ~ CSpp + FSpp + Gpp + Mpp + Ppp + Lpp + TempPoly2 
+ DepthPoly2 + DepthPoly3  , family = binomial, data = datafit)
model1 <- stepAIC(full, scope = list(lower = null, upper = full), 
direction = "backward") #AIC backward
model2 <- stepAIC(full, scope = list(lower = null, upper = full), 
direction = "backward", k=log(nobs(full))) #BIC backward
model3 <- stepAIC(null, scope = list(lower = null, upper = full), 
direction = "forward") #AIC forward
model4 <- stepAIC(null, scope = list(lower = null, upper = full), 
direction = "forward", k=log(nobs(full))) #BIC forward
model5 <- stepAIC(null, scope = list(lower = null, upper = full), 
direction = "both") #AIC both
model6 <- stepAIC(null, scope = list(lower = null, upper = full), 
direction = "both", k=log(nobs(full))) #BIC both


Every model generated are actually identical being :

glm(formula = respvarPAT ~ DepthPoly2 + DepthPoly3 + Gpp + Mpp + 
TempPoly2 + Lpp, family = binomial, data = datafit)


This worked pretty good for the last set of explanatory variables I used 
however for these ones I have a bit of overdispersion problem. 
Dispersion parameter for more complex model(full) being : 7.415653.


I looked into literature and found out that I should use qAIC and qBIC 
for selection. Thus my former stepwise selection is biased as using AIC 
and BIC (binomial family).


My question is to know if there is way to change the k parameter in 
stepAIC in order to get quasi criterion. If not is there a way to 
automatize the selection using this criterion and having the dispersion 
parameter, customizing stepAIC function for example? Unfortunately I am 
reaching here my abilities in statistics and programming and cannot 
figure out if what I want to do is doable or not.


Thank you for your help,

Regards,

Xochitl C.

--

<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en sciences halieutiques
PhD student in fishery sciences

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Naming columns

2013-08-27 Thread Xochitl CORMON
Assuming you attribute the name "dataset" to your data. A way to name a 
column using colnames and which.


Code to change V1 column name:
colnames(dataset)[which(colnames(dataset) == "V1")] <- "Toto"

You are asking to R in the column's names of dataset which one is "V1" 
(TRUE) and attributing to this column the new name "Toto".


You can also use the the number of column but even if it seems simpler 
it is not recommended as you can get problem if the column order is changed.


Code to change 1st column name:
colnames(dataset)[1] <- "Toto"

Xochitl C.

<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en sciences halieutiques
PhD student in fishery sciences

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><



Le 27/08/2013 02:20, Jeff Newmiller a écrit :

Your question prompts more questions than answers. Not sure what "imported from 
notepad" means (clipboard? delimited how?). Don't know what you obtained in R (what 
does the str function tell you? What about dput(head(yourdata)))? You really need to tell 
us what R code you executed  and data you gave it for us to understand what happened.

Please read the Posting Guide mentioned at the bottom of this message. You 
would probably also benefit from reading about how to make a reproducible 
example to illustrate your question. 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

---
Jeff NewmillerThe .   .  Go Live...
DCN: Basics: ##.#.   ##.#.  Live Go...
   Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
---
Sent from my phone. Please excuse my brevity.

Docbanks84  wrote:

Hi ,

I just imported a large data set from notepad. I want to label the
columns
in R.

I used 'import Dataset' to bring in my data set
Now, I would like to label V1,V2,V3 etc??

Thanks



--
View this message in context:
http://r.789695.n4.nabble.com/Naming-columns-tp4674595.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] transform variables

2013-08-27 Thread Xochitl CORMON
You have what is called a "wide" table and you want a "tall" table. In 
this case the function melt from reshape2 package is suitable. Be 
careful specifying properly the variables (columns) you want to use for 
the reshape using correct syntax.

?melt {reshape2}

You can also use as already suggested reshape function. Note the syntax 
is slightly different.

?reshape

Good luck,

Xochitl C.

<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en sciences halieutiques
PhD student in fishery sciences

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><



Le 26/08/2013 13:55, Erich Neuwirth a écrit :

Have a look at the packages reshape and reshape2
They were written with this type of problems in mind.

On Aug 26, 2013, at 1:04 PM, catalin roibu  wrote:


Dear all!

I have a data frame composed by 13 columns (year, and 12 months). I want to
transform this data base in another like this
year month values
1901 1
1901 2
1901 3
.
1901 12
1902  1
1902  2

1902  12

Is there a possibility to succeed that in R?

Thank you!

best regards!
CR

--
---
Catalin-Constantin ROIBU
Lecturer PhD, Forestry engineer
Forestry Faculty of Suceava
Str. Universitatii no. 13, Suceava, 720229, Romania
office phone +4 0230 52 29 78, ext. 531
mobile phone   +4 0745 53 18 01
   +4 0766 71 76 58
FAX:+4 0230 52 16 64
silvic.usv.ro

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sample {base}

2013-06-05 Thread Xochitl CORMON



Thank you Sarah for this code, it's exactly what I wanted to reach.


Le 05/06/2013 16:49, Sarah Goslee a écrit :

What about using instead
size = min(5, length(Gpool$SampleNb[Gpool$LngtClas == LngtClas[[4]]&
Gpool$SpCode == SpCode[[2]]])

that would make sure your sample is either the size of the data or 5.

Sarah

On Wed, Jun 5, 2013 at 10:27 AM, Xochitl CORMON
  wrote:

Hi all,

I'm trying to randomly select sample numbers for length class groups (5 per
length class).

For this I'm using a loop FOR and the function sample () and specified a
size for the sampling of 5. Unfortunately, one of the length class group
does not contain 5 individuals. For me is not a big deal as I have others
groups to complete. However it bothers me that the sampling function does
not select any individuals in this group generating the error below :

Error in sample(Gpool2$SampleNb[Gpool2$LngtClas == LngtClas[[i]]&
Gpool2$SpCode ==  : can not take a sample larger than the population when
'replace = FALSE'.

I understand why this error message appears but I was wondering if there is
a way to select all the items present in the group even if it's not 5
(something like size =<  5).

LngtClas<- list( "40_49", "50_59", "60_69", "70_")
SpCode<- list ("POLLVIR ", "MERLMER")

a<- as.character(sample(Gpool$SampleNb[Gpool$LngtClas == LngtClas[[4]]&
Gpool$SpCode == SpCode[[2]]], size = 5, replace = FALSE))

You can find enclosed my dataset,

Thank you for the help,

Xochitl C.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sample {base}

2013-06-05 Thread Xochitl CORMON

Hi all,

I'm trying to randomly select sample numbers for length class groups (5 
per length class).


For this I'm using a loop FOR and the function sample () and specified a 
size for the sampling of 5. Unfortunately, one of the length class group 
does not contain 5 individuals. For me is not a big deal as I have 
others groups to complete. However it bothers me that the sampling 
function does not select any individuals in this group generating the 
error below :


Error in sample(Gpool2$SampleNb[Gpool2$LngtClas == LngtClas[[i]] & 
Gpool2$SpCode ==  : can not take a sample larger than the population 
when 'replace = FALSE'.


I understand why this error message appears but I was wondering if there 
is a way to select all the items present in the group even if it's not 5 
(something like size =< 5).


LngtClas <- list( "40_49", "50_59", "60_69", "70_")
SpCode <- list ("POLLVIR ", "MERLMER")

a <- as.character(sample(Gpool$SampleNb[Gpool$LngtClas == LngtClas[[4]] 
& Gpool$SpCode == SpCode[[2]]], size = 5, replace = FALSE))


You can find enclosed my dataset,

Thank you for the help,

Xochitl C.

<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en sciences halieutiques
PhD student in fishery sciences

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><

SampleNb,SpCode,LngtClas
3,MERLMER,40_49
5,POLLVIR ,50_59
6,POLLVIR ,50_59
13,POLLVIR ,60_69
19,MERLMER,40_49
20,MERLMER,50_59
21,POLLVIR ,60_69
22,POLLVIR ,50_59
23,MERLMER,60_69
24,POLLVIR ,40_49
25,POLLVIR ,40_49
30,POLLVIR ,50_59
31,POLLVIR ,40_49
32,MERLMER,60_69
33,POLLVIR ,40_49
34,MERLMER,40_49
35,POLLVIR ,60_69
36,POLLVIR ,60_69
37,POLLVIR ,40_49
38,MERLMER,40_49
39,POLLVIR ,40_49
40,MERLMER,60_69
44,MERLMER,70_
45,POLLVIR ,70_
62,POLLVIR ,70_
63,MERLMER,70_
74,MERLMER,40_49
75,POLLVIR ,50_59
76,MERLMER,50_59
77,MERLMER,50_59
78,POLLVIR ,60_69
79,POLLVIR ,60_69
96,MERLMER,70_
97,POLLVIR ,70_
104,MERLMER,50_59
105,POLLVIR ,50_59
112,MERLMER,50_59
113,POLLVIR ,60_69
114,POLLVIR ,60_69
115,MERLMER,50_59
116,MERLMER,60_69
117,POLLVIR ,60_69
128,MERLMER,60_69
129,POLLVIR ,70_
130,MERLMER,60_69
131,POLLVIR ,60_69
136,MERLMER,60_69
137,POLLVIR ,60_69
142,POLLVIR ,50_59
143,MERLMER,40_49
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Loop FOR with histogram() from lattice

2013-06-05 Thread Xochitl CORMON

Hi Jim,

Thank you a lot. Is it a FAQ concerning lattice or FOR loop in general?

Regards,

Xochitl C.


Le 05/06/2013 10:55, Jim Holtman a écrit :

This is an FAQ.  you have to explicitly 'print' the histogram:

print(histogram(~ Mpool[[2]]$LngtClas | Mpool[[2]]$SpCode, type = "count", col = "lightgrey", xlab= 
"LngtClas", main = paste("Length distribution per species for Mpool", "2", sep = "_")))

Sent from my iPad

On Jun 5, 2013, at 4:37, Xochitl CORMON  wrote:


Hi all,

I'm encountering a problem I do not understand on my data:

library (lattice)

Mpool1<- Table[Table$Subarea %in% c("52E9", "51E9"),]
Mpool2<- Table[Table$Subarea %in% c("53F0", "52F0"),]
Mpool3<- Table[Table$Subarea %in% c("51F0", "50F0"),]
Mpool4<- Table[Table$Subarea %in% c("51F1", "52F1"),]

Mpool<- list(Mpool1, Mpool2, Mpool3, Mpool4)


histogram(~ Mpool[[2]]$LngtClas | Mpool[[2]]$SpCode, type = "count", col = "lightgrey", xlab= "LngtClas", 
main = paste("Length distribution per species for Mpool", "2", sep = "_"))

 This part works perfectly and I obtain the graph reprensenting Mpool2 
length class count per species.
 Now when I want to automatize this with a "for" loop nothing is plotted.

for (i in c(2)){
windows()
histogram(~ Mpool[[i]]$LngtClas | Mpool[[i]]$SpCode, type = "count", col = "lightgrey", xlab= 
"LngtClas", main = paste("Length distribution per species for Mpool", i, sep = "_"))
print (i)
}

### Running this loop I obtained  windows filled grey (no plot drawn at all) but the 
print (i) print a "2" as expected. I really dont understand what's wrong with 
the loop. There is no error message and no notification in R. You can find enclosed my 
data in txt file.

Thank you very much for any help,

Xochitl C.

<><  <><  <><  <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en sciences halieutiques
PhD student in fishery sciences

<><  <><  <><  <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<><  <><  <><  <><


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Loop FOR with histogram() from lattice

2013-06-05 Thread Xochitl CORMON

Hi all,

I'm encountering a problem I do not understand on my data:

library (lattice)

Mpool1 <- Table[Table$Subarea %in% c("52E9", "51E9"),]
Mpool2 <- Table[Table$Subarea %in% c("53F0", "52F0"),]
Mpool3 <- Table[Table$Subarea %in% c("51F0", "50F0"),]
Mpool4 <- Table[Table$Subarea %in% c("51F1", "52F1"),]

Mpool <- list(Mpool1, Mpool2, Mpool3, Mpool4)


histogram(~ Mpool[[2]]$LngtClas | Mpool[[2]]$SpCode, type = "count", col 
= "lightgrey", xlab= "LngtClas", main = paste("Length distribution per 
species for Mpool", "2", sep = "_"))


 This part works perfectly and I obtain the graph reprensenting 
Mpool2 length class count per species.
 Now when I want to automatize this with a "for" loop nothing is 
plotted.


for (i in c(2)){
windows()
histogram(~ Mpool[[i]]$LngtClas | Mpool[[i]]$SpCode, type = "count", col 
= "lightgrey", xlab= "LngtClas", main = paste("Length distribution per 
species for Mpool", i, sep = "_"))

print (i)
}

### Running this loop I obtained  windows filled grey (no plot drawn at 
all) but the print (i) print a "2" as expected. I really dont understand 
what's wrong with the loop. There is no error message and no 
notification in R. You can find enclosed my data in txt file.


Thank you very much for any help,

Xochitl C.

<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorante en sciences halieutiques
PhD student in fishery sciences

<>< <>< <>< <><

IFREMER
Centre Manche Mer du Nord
150 quai Gambetta
62200 Boulogne-sur-Mer

<>< <>< <>< <><

SpCode,LngtClas,Subarea
POLLVIR ,70_,52F0
POLLVIR ,70_,52F0
MERLMER,40_49,51F0
POLLVIR ,60_69,52F0
POLLVIR ,50_59,51F1
POLLVIR ,50_59,51F1
POLLVIR ,40_49,52F0
POLLVIR ,50_59,52F0
MERLMER,40_49,52F0
MERLMER,60_69,52F0
POLLVIR ,60_69,51F0
MERLMER,40_49,50E7
POLLVIR ,70_,50E7
MERLMER,40_49,51F0
MERLMER,50_59,51F0
POLLVIR ,60_69,51F0
POLLVIR ,50_59,51F0
MERLMER,60_69,51F0
POLLVIR ,40_49,50F0
POLLVIR ,40_49,50F0
POLLVIR ,50_59,52E9
MERLMER,50_59,52E9
MERLMER,50_59,52F0
MERLMER,40_49,52F0
POLLVIR ,50_59,51F1
POLLVIR ,40_49,51F1
MERLMER,60_69,51E9
POLLVIR ,40_49,51E9
MERLMER,40_49,51E9
POLLVIR ,60_69,51E9
POLLVIR ,60_69,50F0
POLLVIR ,40_49,51F0
MERLMER,40_49,51F0
POLLVIR ,40_49,51F0
MERLMER,60_69,51F0
POLLVIR ,70_,52F0
POLLVIR ,50_59,52F0
MERLMER,50_59,52F0
MERLMER,70_,51E9
POLLVIR ,70_,51E9
POLLVIR ,70_,52F0
POLLVIR ,60_69,52F0
POLLVIR ,70_,52F0
POLLVIR ,40_49,52F0
POLLVIR ,70_,52F0
MERLMER,70_,52F0
MERLMER,70_,52F1
POLLVIR ,70_,52F1
POLLVIR ,70_,51F0
MERLMER,70_,51F0
POLLVIR ,60_69,52F0
MERLMER,60_69,52F0
MERLMER,40_49,52F0
POLLVIR ,50_59,52F0
POLLVIR ,50_59,53F0
MERLMER,50_59,53F0
MERLMER,70_,53F0
POLLVIR ,70_,53F0
MERLMER,60_69,52F0
POLLVIR ,70_,52F0
MERLMER,40_49,51F0
POLLVIR ,50_59,51F0
MERLMER,50_59,51F0
MERLMER,50_59,51F0
POLLVIR ,60_69,51F0
POLLVIR ,60_69,51F0
POLLVIR ,70_,52F1
MERLMER,70_,52F1
MERLMER,50_59,52F0
MERLMER,40_49,52F0
MERLMER,60_69,53F0
POLLVIR ,60_69,53F0
POLLVIR ,50_59,52E9
MERLMER,50_59,52E9
MERLMER,40_49,52F0
POLLVIR ,40_49,52F0
MERLMER,40_49,53F0
POLLVIR ,40_49,53F0
MERLMER,50_59,52F1
MERLMER,60_69,52F1
POLLVIR ,70_,52F0
MERLMER,70_,52F0
MERLMER,70_,51F0
POLLVIR ,70_,51F0
MERLMER,60_69,52F0
POLLVIR ,60_69,52F0
POLLVIR ,70_,46E3
MERLMER,70_,46E3
MERLMER,50_59,52E9
POLLVIR ,50_59,52E9
MERLMER,50_59,51F0
POLLVIR ,50_59,51F0
MERLMER,60_69,49E7
POLLVIR ,60_69,49E7
MERLMER,70_,53F0
MERLMER,60_69,53F0
MERLMER,60_69,52F0
POLLVIR ,60_69,52F0
MERLMER,50_59,51F0
POLLVIR ,60_69,51F0
POLLVIR ,60_69,50F0
MERLMER,50_59,50F0
MERLMER,60_69,51F1
POLLVIR ,60_69,51F1
POLLVIR ,60_69,49E7
MERLMER,70_,49E7
MERLMER,70_,52F0
POLLVIR ,70_,52F0
POLLVIR ,60_69,52F0
MERLMER,50_59,52F0
POLLVIR ,50_59,53F0
POLLVIR ,50_59,53F0
POLLVIR ,50_59,53F0
POLLVIR ,40_49,53F0
MERLMER,60_69,51E9
POLLVIR ,70_,51E9
MERLMER,60_69,51F1
POLLVIR ,60_69,51F1
POLLVIR ,50_59,52F0
MERLMER,50_59,52F0
MERLMER,40_49,52F1
POLLVIR ,40_49,52F1
MERLMER,60_69,50F0
POLLVIR ,60_69,50F0
MERLMER,60_69,47E2
POLLVIR ,70_,47E2
POLLVIR ,50_59,52E9
MERLMER,40_49,52E9
POLLVIR ,50_59,51F0
MERLMER,40_49,51F0
MERLMER,60_69,47E3
POLLVIR ,60_69,47E3
POLLVIR ,60_69,52F0
POLLVIR ,60_69,52F0
POLLVIR ,50_59,52E9
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Separation issue in binary response models - glm, brglm, logistf

2013-02-28 Thread Xochitl CORMON



Le 28/02/2013 17:22, Ben Bolker a écrit :

Thank you for your help !


Xochitl CORMON ifremer.fr> writes:


Dear all,

I am encountering some issues with my data and need some help. I am
trying to run glm analysis with a presence/absence variable as
response variable and several explanatory variable (time,
location, presence/absence data, abundance data).

First I tried to use the glm() function, however I was having 2
warnings concerning glm.fit () : # 1: glm.fit: algorithm did not
converge # 2: glm.fit: fitted probabilities numerically 0 or 1
occurred After some investigation I found out that the problem was
most probably quasi complete separation and therefor decide to use
brglm and/or

logistf.


* logistf : analysis does not run When running logistf() I get a
error message saying : # error in chol.default(x) : # leading minor
39 is not positive definite I looked into logistf package manual,
on Internet, in the theoretical and technical paper of Heinze and
Ploner and cannot find where this function is used and if the error
can be fixed by some settings.


chol.default is a function for Cholesky decomposition, which is going
to be embedded fairly deeply in the code ...


If I understand good I should just not use this package as this error is 
not easily fixable ?







* brglm : analysis run However I get a warning message saying : #
In fit.proc(x = X, y = Y, weights = weights, start = start,
etastart # = etastart, : # Iteration limit reached Like before i
cannot find where and why this function is used while running the
package and if it can be fixed by adjusting some settings.

In a more general way, I was wondering what are the fundamental
differences of these packages.


You might also take a crack with bayesglm() in the arm package, which
should (?) be able to overcome the separation problem by specifying a
not-completely-uninformative prior.


Thank you for the tip I will have a look into this package and its doc 
tomorrow. Do you have any idea of what is this fit.proc function ?







I hope this make sense enough and I am sorry if this is kind of
statistical evidence that I'm not aware of.

---




Here an extract of my table and the different formula I run :



head (CPUE_table)

Year Quarter Subarea Latitude Longitude Presence.S CPUE.S
Presence.H CPUE.H Presence.NP CPUE.NP Presence.BW CPUE.BW
Presence.C CPUE.C Presence.P CPUE.P Presence.W CPUE.W 1 2000 1 31F1
51.25 1.5 0 0 0 0 0 0 0 0 1 76.002 0 0 1 3358.667


[snip]


logistf_binomPres<- logistf (Presence.S ~ (Presence.BW + Presence.W
+ Presence.C + Presence.NP +Presence.P + Presence.H +CPUE.BW +
CPUE.H + CPUE.P + CPUE.NP + CPUE.W + CPUE.C + Year + Quarter +
Latitude + Longitude)^2, data = CPUE_table)

Brglm_binomPres<- brglm (Presence.S ~ (Presence.BW + Presence.W +
Presence.C + Presence.NP +Presence.P + Presence.H +CPUE.BW + CPUE.H
+ CPUE.P + CPUE.NP + CPUE.W + CPUE.C + Year + Quarter + Latitude +
Longitude)^2, family = binomial, data = CPUE_table)


It's not much to go on, but:


Yeah sorry my table header appeared really bad on the email :s






* are you overfitting your data? That is, do you have at least 20
times as many 1's or 0's (whichever is rarer) as the number of
parameters you are trying to estimated?


I have 16 explanatory variable and with interactions we go to 136 
parameters.


> length (which((CPUE_table)[,]== 0))
[1] 33466

> length (which((CPUE_table)[,]== 1))
[1] 17552

I assume the over fitting is good, isn't it?



* have you examined your data graphically and looked for any strong
outliers that might be throwing off the fit?


I did check my data graphically in a lot and different ways. However if 
you have any particular suggestions, please let me know. Concerning 
strong outliers, I do not really understand what you mean. I have 
outliers here and there but how can I know that they are strong enough 
to throw off the fit? Most of the time they are really high abundance 
coming from the fact that I'm using survey data and probably related to 
the fact that the boat fished over a fish school.



* do you have some strongly correlated/multicollinear predictors?


It's survey data so they indeed are correlated in time and space. 
However I checked the scatterplot matrix and I didn't notice any linear 
relation between variable.



* for what it's worth it looks like a variety of your variables
might be dummy variables, which you can often express more compactly
by using a factor variable and letting R construct the design matrix
(i.e. generating the dummy variables on the fly), although that
shouldn't change your results


I will check about dummy variable concept as to be honest I don't really 
understand what it means...


Thank you again for your time and help






__ R-help@r-project.org
mailing list https://stat.ethz.ch/

[R] Separation issue in binary response models - glm, brglm, logistf

2013-02-27 Thread Xochitl CORMON

Dear all,

I am encountering some issues with my data and need some help.
I am trying to run glm analysis with a presence/absence variable as 
response variable and several explanatory variable (time, location, 
presence/absence data, abundance data).


First I tried to use the glm() function, however I was having 2 warnings 
concerning glm.fit () :

# 1: glm.fit: algorithm did not converge
# 2: glm.fit: fitted probabilities numerically 0 or 1 occurred
After some investigation I found out that the problem was most probably 
quasi complete separation and therefor decide to use brglm and/or logistf.


* logistf : analysis does not run
When running logistf() I get a error message saying :
# error in chol.default(x) :
# leading minor 39 is not positive definite
I looked into logistf package manual, on Internet, in the theoretical 
and technical paper of Heinze and Ploner and cannot find where this 
function is used and if the error can be fixed by some settings.


* brglm : analysis run
However I get a warning message saying :
# In fit.proc(x = X, y = Y, weights = weights, start = start, etastart # 
= etastart,  :

# Iteration limit reached
Like before i cannot find where and why this function is used while 
running the package and if it can be fixed by adjusting some settings.


In a more general way, I was wondering what are the fundamental 
differences of these packages.


I hope this make sense enough and I am sorry if this is kind of 
statistical evidence that I'm not aware of.


It is my first time asking a question so I apologize if it's not like it 
should be and kindly ask you to not hesitate to let me know about it.


Thank you for your help

Xochitl C.

---

Here an extract of my table and the different formula I run :

> head (CPUE_table)
  Year Quarter Subarea Latitude Longitude Presence.S CPUE.S Presence.H 
CPUE.H Presence.NP CPUE.NP Presence.BW CPUE.BW Presence.C CPUE.C 
Presence.P CPUE.P Presence.W   CPUE.W
1 2000   131F151.25   1.5  0  0  0 
0   0   0   0   0  1 76.002 
  0  0  1 3358.667
2 2000   131F251.25   2.5  0  0  0 
0   0   0   0   0  1 12.500 
  0  0  1 3028.500
3 2000   132F151.75   1.5  0  0  0 
0   0   0   0   0  1  5.500 
  0  0  1 2256.750
4 2000   132F251.75   2.5  0  0  0 
0   0   0   0   0  1 10.000 
  0  0  1  808.000
5 2000   132F351.75   3.5  0  0  0 
0   0   0   0   0  1 19.000 
  0  0  1  277.000
6 2000   133F152.25   1.5  0  0  0 
0   0   0   0   0  0  0.000 
  0  0  12.000


> tail (CPUE_table)
 Year Quarter Subarea Latitude Longitude Presence.S   CPUE.S 
Presence.H  CPUE.H Presence.NP  CPUE.NP Presence.BW  CPUE.BW Presence.C 
CPUE.C Presence.P CPUE.P Presence.W CPUE.W
4435 2012   350F360.75   3.5  1  103.000 
  1 110.000   1  2379.00   1   20.000  1  6.000 
 0  0  1 22.000
4436 2012   351E861.25  -1.5  1 1311.600 
  1  12.000   1  4194.78   00.000  1 18.000 
 0  0  0  0.000
4437 2012   351E961.25  -0.5  1   34.336 
  1  46.671   1 11031.56   12.668  1  3.335 
 0  0  1  3.333
4438 2012   351F061.25   0.5  1  430.500 
  1 148.000   1  1212.22   1 3279.200  1  2.000 
 0  0  1  2.000
4439 2012   351F161.25   1.5  1  115.000 
  1  85.000   1  2089.50   11.000  1 22.000 
 1  2  1 40.000
4440 2012   351F261.25   2.5  1   72.500 
  1  35.500   1   270.48   1  516.300  1 11.500 
 1  1  1 16.000


logistf_binomPres <- logistf (Presence.S ~ (Presence.BW + Presence.W + 
Presence.C + Presence.NP +Presence.P + Presence.H +CPUE.BW + CPUE.H + 
CPUE.P + CPUE.NP + CPUE.W + CPUE.C + Year + Quarter + Latitude + 
Longitude)^2, data = CPUE_table)


Brglm_binomPres <- brglm (Presence.S ~ (Presence.BW + Presence.W + 
Presence.C + Presence.NP +Presence.P + Presence.H +CPUE.BW + CPUE.H + 
CPUE.P + CPUE.NP + CPUE.W + CPUE.C + Year + Quarter + Latitude + 
Longitude)^2, family = binomial, data = CPUE_table)


---



--

<>< <>< <>< <><

Xochitl CORMON
+33 (0)3 21 99 56 84

Doctorant