[R] Stepwise procedure with force.in command

2012-04-09 Thread Hien Nguyen

Dear R-helpers,

I am trying to run a stepwise procedure in which some variables are 
forced into the model. I have searched around, and it seems that only the 
leaps package allows variables to be forced in during a stepwise 
procedure. I call regsubsets(lm1, force.in = 1, data) to force one 
variable into the model. However, the force.in argument seems to let me 
force in only one variable. I want to force in several variables. How 
can I do that?


Thanks a lot,

Hien
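
Editorial sketch (not part of the original post): the leaps documentation describes force.in as an index to columns of the design matrix, so passing a vector such as force.in = c(1, 2) may be worth trying; that is not verified here. The example below swaps in a different technique that needs only base R: step() with a scope whose lower formula lists the terms that must stay in every model. The mtcars data and the choice of wt and hp as forced-in variables are assumptions made purely for illustration.

```r
# Stepwise selection with base R's step(), keeping wt and hp in every model.
# The "lower" scope is the smallest model step() may consider, so its terms
# can never be dropped; "upper" lists every candidate term.
start <- lm(mpg ~ wt + hp, data = mtcars)

fit <- step(start,
            scope = list(lower = ~ wt + hp,
                         upper = ~ wt + hp + disp + drat + qsec),
            direction = "both",
            trace = 0)

# Terms retained in the selected model (wt and hp are always among them)
print(attr(terms(fit), "term.labels"))
```

Because the forced variables are named in a formula rather than by column position, this form is less sensitive to how the design matrix happens to be ordered.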

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error when running coeftest in plm

2012-04-08 Thread Hien Nguyen

Dear R-helpers,

I am using the plm package to run panel data analyses for different models 
with different dependent variables. After fitting each model with plm(), I 
run coeftest(plm1, vcovBK) to get the panel-corrected standard errors 
(PCSEs). I get the PCSEs for some of the models, but some tests produce an 
error message. For one of the coeftest() calls, the error message is:


Error in solve.default(crossprod(demX)) :
  system is computationally singular: reciprocal condition number = 1.404e-017


What should I do to fix it and get the PCSEs?

Thanks a lot,

Hien Nguyen
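
Editorial sketch (not part of the original post): one common way for crossprod() of a demeaned regressor matrix to become singular is a regressor that does not vary within a group, or that is an exact linear combination of the others. Whether that is the cause here is an assumption; the post does not show the model. The toy panel below uses only base R, and the names panel, demean and demX are illustrative rather than plm internals.

```r
# Toy panel: 5 groups observed 4 times each.
set.seed(1)
panel <- data.frame(
  id = rep(1:5, each = 4),
  x1 = rnorm(20),
  x2 = rep(c(2, 4, 6, 8, 10), each = 4)  # constant within each id
)

# Within (group-mean) transformation of one variable
demean <- function(v, g) v - ave(v, g)

demX <- cbind(x1 = demean(panel$x1, panel$id),
              x2 = demean(panel$x2, panel$id))

# x2 is wiped out by demeaning, so the matrix loses a column of rank
rank_demX <- qr(demX)$rank
print(rank_demX)

# Inverting the cross-product then fails, as in the reported error
singular <- inherits(try(solve(crossprod(demX)), silent = TRUE), "try-error")
print(singular)
```

If a check like this flags a rank deficiency in the real data, dropping or recoding the offending regressor is the usual next step before asking for robust standard errors.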



Re: [R] Error when running Conditional Logit Model

2009-12-19 Thread Hien Nguyen
On 12/18/09 22:24, Charles C. Berry wrote:
> On Fri, 18 Dec 2009, Hien Nguyen wrote:
>
> > Thanks a lot for answering my questions.
> >
> > I have tried to run clogit for only 64 observations and 4
> > independent variables, and the results come back instantly. However,
> > when I run the same command (with only 4 independent variables) on the
> > full data, it has now been running for 50 minutes. :(
> >
> > Thomas, what do you mean by maximizing the unconditional likelihood
> > is fine when the stratum sizes are large? What I put in strata
> > (__) is actually the possible choices (1-64). Each choice will be
> > recorded more than 4000 times (which means I have more than 4000
> > values of 1, 4000 values of 2 and so on).
> > Does it sound right?
>
> So you have 64 cases and more than 25 controls.


No, I have 4096 cases and more than 25000 controls. Each case will
result in 63 controls (which I have to create from each case)

> Large strata will really slow down clogit. But I think that that isn't
> your problem.
>
> If the strata really matter - in the sense that the conditional
> distributions of covariates for controls vary a lot from stratum to
> stratum - then you really gain little by having more than a handful of
> controls for each case. If that is the situation you are in, sampling
> a couple of dozen controls from the stratum of each case will give you
> results that are very nearly as precise as those obtained from using
> all 4000 of them:
>
> plot( 1:100, (1 + 1/1:100), xlab='n of controls',
>       ylab='relative variance of coef' )
>
> will give you a rough idea of the impact of increasing the number of
> controls per case. The variance with 1 control per case is 2; at the
> asymptote it is 1.
>
> So you can probably speed things up a lot by using fewer controls with
> little loss in accuracy.

I think I might need to use this.
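
Editorial sketch (not part of the original exchange): the subsampling idea above can be tried on simulated choice sets. Everything here is an assumption made for illustration: the column names set, alt and case, the 10 simulated choice sets of 64 alternatives, and k = 20 controls kept per case. The last lines evaluate the 1 + 1/m relative variance from the plot at a few values of m.

```r
set.seed(42)

# Simulated long-format data: 10 choice sets, 64 alternatives each,
# exactly one chosen alternative (case = 1) per set.
n_sets <- 10
d <- data.frame(set = rep(seq_len(n_sets), each = 64),
                alt = rep(1:64, times = n_sets))
chosen <- sample(64, n_sets, replace = TRUE)
d$case <- as.integer(d$alt == chosen[d$set])

# Keep every case plus k randomly sampled controls from the same set.
k <- 20
keep <- unlist(lapply(split(seq_len(nrow(d)), d$set), function(idx) {
  cases <- idx[d$case[idx] == 1]
  ctrls <- idx[d$case[idx] == 0]
  # sample.int() on the count avoids sample()'s surprise with length-1 input
  c(cases, ctrls[sample.int(length(ctrls), min(k, length(ctrls)))])
}), use.names = FALSE)
sub <- d[sort(keep), ]

# Relative variance 1 + 1/m for m = 1, 5, 20 and 63 controls per case
rel_var <- 1 + 1 / c(1, 5, 20, 63)
print(round(rel_var, 3))
```

The reduced data frame sub can then be passed to clogit() in place of the full data, with the same strata() term.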


> With only 64 cases you cannot fit terribly complicated models. This
> holds whether you approach things conditionally using clogit or
> unconditionally using glm. Fourteen degrees of freedom for regression
> is probably pushing matters.  ridge() is helpful in taming overlarge
> regressor sets in clogit, but you'll need to use
> survival:::summary.coxph.penal() on the result (or tinker with the
> class attribute).

I still let the program run. For the case of 4 df, it still does not
produce the result.

> BTW, when you say 'strata(___)', I hope you mean that you use
> something like 'strata( stratvar )' where stratvar is a factor that
> encodes the 64 levels.


Yes, that's what I mean. Thank you.

> HTH,
>
> Chuck
>
>
> > Thanks a lot
> >
> > Hien


Re: [R] Error when running Conditional Logit Model

2009-12-18 Thread Hien Nguyen

Dear Drs Winsemius and Berry,

Thanks a lot for your comments and suggestions on running my model. I am 
not just new to R but new to CLM as well. :( With your suggestions, I 
realized that I had huge misunderstandings about the model and the data 
arrangement.


After my finals, I reread the related materials on CLM and 
rearranged the data appropriately before running the model in R. This 
time, I have a dataset of more than 250,000 observations (created from more 
than 4000 responses) and a model of 15 predictors.


My question is how long the clogit command should take to run: it has 
been running for more than 10 hours on a quad-core computer and still 
shows no sign of being done or almost done. Is that OK, or does my 
command just not work?


Thanks a lot for your response

Hien


Charles C. Berry wrote:

On Fri, 4 Dec 2009, David Winsemius wrote:



On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote:


Dear Dr. Winsemius,

Thank you very much for your reply.

I have tried many possible combinations (even with the model of only 
2 predictors) but it produces the same message. With more than 4000 
observations, I think 14 predictors might not be too many.


It is what happens in the factor combinations that concern me. I am 
guessing that some of those predictors are factors. You really should 
not ask r-help questions without providing better descriptions of 
both the outcomes and the predictor variables.




Although my dependent variable (Pin) is not discrete  (it ranges 
from 0 to 1), I do not think it will create problems to the 
estimation but I'm not sure


I would think it _would_ cause problems. As I understand it, 
conditional methods create contingency tables. Why are you using an 
outcome type that is not consistent with the fundamental regression 
assumptions of the clogit function?


I do not get that particular error when I munge the infert dataset to 
have case be a random uniform value, but I do get an error.

 infert$case <- runif(nrow(infert))
 clogit(case~spontaneous+induced+strata(stratum),data=infert)

Error in Surv(rep(1, 248L), case) : Invalid status value



David, I think you were on the right track. I get this:

---
clogit(I(case*runif(length(case)))~spontaneous+induced+strata(ifelse(stratum>40,NA,stratum)),data=infert)

Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
  NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In Surv(rep(1, 248L), I(case * runif(length(case :
  Invalid status value, converted to NA
2: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Ran out of iterations and did not converge





which looks pretty much the same as Hien's error msg

So Hien needs to create a logical status value.
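
Editorial sketch (not part of the original exchange): the point about the status value can be checked on the unmodified infert data, where case is already coded 0/1. This assumes the survival package is installed; the model formula follows the one used earlier in the thread.

```r
library(survival)

# clogit() builds a Surv() object internally, so the response must be a
# valid status: 0/1 or logical, not a proportion between 0 and 1.
stopifnot(all(infert$case %in% c(0, 1)))

fit <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)
print(coef(fit))
```

For choice data, the analogous step would be to recode the response as an indicator that is 1 for the chosen alternative and 0 for every other alternative in the same stratum.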

Chuck

p.s.


sessionInfo()

R version 2.10.0 (2009-10-26)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] survival_2.35-7

loaded via a namespace (and not attached):
[1] tools_2.10.0





So I certainly would not have proceeded to submit a full analysis to 
clogit if I could not get a test case to run under the situation you 
propose.


--
David



I have checked the collinearity among predictors and they are all 
< 0.5 (which I think is OK). Do you know what else could cause these 
errors?


Thanks a lot

Hien Nguyen

David Winsemius wrote:

On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:

Dear R-helpers,

I am very new to R and trying to run the conditional logit model using
clogit  command.
I have more than 4000 observations in my dataset and try to predict the
dependent variable from 14 independent variables. My command is as
follows

clmtest1 <-
clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)

However, it produces the following errors:

Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
  NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
2: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Ran out of iterations and did not converge

I search the error message from R forums but it does not say anything
for Conditional Logit Model.

With that many predictors in a small dataset, you may have created
matrix singularities. Perhaps you created a stratum where all of the
subjects experience the event and others where none did so. The
coefficients might be driven to infinities. Try simplifying the model.

Please check for me what it says and what should I do to solve it.
  


David Winsemius, MD
Heritage Laboratories

Re: [R] Error when running Conditional Logit Model

2009-12-18 Thread Hien Nguyen

Thanks a lot for answering my questions.

I have tried to run clogit for only 64 observations and 4 
independent variables, and the results come back instantly. However, 
when I run the same command (with only 4 independent variables) on the 
full data, it has now been running for 50 minutes. :(


Thomas, what do you mean by maximizing the unconditional likelihood is 
fine when the stratum sizes are large? What I put in strata (__) is 
actually the possible choices (1-64). Each choice will be recorded more 
than 4000 times (which means I have more than 4000 values of 1, 4000 
values of 2 and so on).

Does it sound right?

Thanks a lot

Hien

tlum...@u.washington.edu wrote:

On Fri, 18 Dec 2009, Hien Nguyen wrote:


Dear Drs Winsemius and Berry,

Thanks a lot for your comments and suggestions on running my model. I 
am not just new to R but new to CLM as well. :( With your 
suggestions, I realized that I had huge misunderstandings about the 
model and the data arrangement.


After my finals, I reread the related materials on CLM and 
rearranged the data appropriately before running the model in R. This 
time, I have a dataset of more than 250,000 observations (created from 
more than 4000 responses) and a model of 15 predictors.


My question is how long the clogit command should take to run: it 
has been running for more than 10 hours on a quad-core computer and 
still shows no sign of being done or almost done. Is that OK, or does 
my command just not work?


If you have a lot of records with case=1 in a stratum, conditional 
logistic regression will be extremely slow.   And unnecessary: 
maximizing the unconditional likelihood is fine when the stratum sizes 
are large.


Note that a quad-core computer won't help. Only one core will be used 
in the computations.


 -thomas
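
Editorial sketch (not part of the original exchange): "maximizing the unconditional likelihood" can be read as fitting an ordinary logistic regression with one intercept per stratum. The simulation below is an assumption-laden illustration of that reading using base R only: 5 strata of 400 records each, a single covariate x, and a true slope of 0.8 are all invented for the example.

```r
set.seed(7)

# Simulated data: a few large strata, each with its own baseline log-odds.
n <- 2000
stratum <- factor(rep(1:5, each = 400))
x <- rnorm(n)
eta <- c(-1, -0.5, 0, 0.5, 1)[stratum] + 0.8 * x
y <- rbinom(n, 1, plogis(eta))

# Unconditional fit: stratum enters as a factor (one intercept per stratum)
fit <- glm(y ~ x + stratum, family = binomial)
print(coef(fit)["x"])
```

With strata this large the stratum intercepts are well estimated, which is the setting in which the unconditional fit is described above as fine; with many small matched sets the conditional likelihood remains the safer choice.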





Thanks a lot for your response

Hien



[R] Error when running Conditional Logit Model

2009-12-04 Thread Hien Nguyen
Dear R-helpers,

I am very new to R and trying to run the conditional logit model using
clogit  command.
I have more than 4000 observations in my dataset and try to predict the
dependent variable from 14 independent variables. My command is as follows

clmtest1 <-
clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)


However, it produces the following errors:

Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
 NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
2: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
 Ran out of iterations and did not converge

I searched the R forums for this error message but found nothing about it
for the conditional logit model.

Please tell me what it means and what I should do to solve it.

Thanks a lot for your help

Hien Nguyen



Re: [R] Error when running Conditional Logit Model

2009-12-04 Thread Hien Nguyen

Dear Dr. Winsemius,

Thank you very much for your reply.

I have tried many possible combinations (even with the model of only 2 
predictors) but it produces the same message. With more than 4000 
observations, I think 14 predictors might not be too many.


Although my dependent variable (Pin) is not discrete  (it ranges from 0 
to 1), I do not think it will create problems to the estimation but I'm 
not sure


I have checked the collinearity among predictors and they are all < 0.5 
(which I think is OK). Do you know what else could cause these errors?


Thanks a lot

Hien Nguyen

David Winsemius wrote:


On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:


Dear R-helpers,

I am very new to R and trying to run the conditional logit model using
clogit  command.
I have more than 4000 observations in my dataset and try to predict the
dependent variable from 14 independent variables. My command is as 
follows


clmtest1 <-
clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata) 




However, it produces the following errors:

Error in fitter(X, Y, strats, offset, init, control, weights = 
weights,  :

NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted 
to NA

2: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
Ran out of iterations and did not converge

I search the error message from R forums but it does not say anything
for Conditional Logit Model.


With that many predictors in a small dataset, you may have created 
matrix singularities. Perhaps you created a stratum where all of the 
subjects experience the event and others where none did so. The 
coefficients might be driven to infinities. Try simplifying the model.





Please check for me what it says and what should I do to solve it.





[R] Error when running Conditional Logit Model

2009-12-01 Thread Hien Nguyen

Dear R-helpers,

I am very new to R and trying to run the conditional logit model using 
clogit  command.
I have more than 4000 observations in my dataset and try to predict the 
dependent variable from 14 independent variables. My command is as follows


clmtest1 <-
clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)


However, it produces the following errors:

Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
 NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
2: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
 Ran out of iterations and did not converge

I searched the R forums for this error message but found nothing about it 
for the conditional logit model.


Please tell me what it means and what I should do to solve it.

Thanks a lot for your help

Hien Nguyen
