[R] Stepwise procedure with force.in command
Dear R-helpers, I am trying to run a stepwise procedure in which I want to force some variables into the model. I have searched around, and it seems that only the leaps package allows variables to be forced into a stepwise procedure. I use the leaps package and call regsubsets(lm1, force.in = 1, data) to force one variable into the model. However, the force.in argument only lets me force one variable in, and I want to force several variables in. How can I do that? Thanks a lot, Hien __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
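One way to keep several variables in every candidate model, without leaps, is the scope argument of step() in base R: terms listed in the lower formula are never dropped. The sketch below is only an illustration on the built-in mtcars data; the choice of wt and hp as the forced variables and the list of candidate terms are arbitrary assumptions, not part of the original question. (As far as I read the leaps help page, force.in is an index into the columns of the design matrix, so passing a vector of indices may also be worth trying there.)

```r
# Forced variables go in the lower scope; step() may add or drop
# anything between lower and upper, but never the lower terms.
forced <- ~ wt + hp
full   <- ~ wt + hp + disp + drat + qsec + cyl

# Start from the model that contains only the forced variables.
start <- lm(mpg ~ wt + hp, data = mtcars)

fit <- step(start,
            scope = list(lower = forced, upper = full),
            direction = "both",
            trace = 0)

# Terms in the selected model; wt and hp are always among them.
selected <- attr(terms(fit), "term.labels")
print(selected)
```

Which extra terms end up in the model depends on the AIC comparison; the only guarantee is that the forced terms stay in.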
[R] Error when running coeftest in plm
Dear R-helpers, I am using the plm package to run panel data analyses for different models with different dependent variables. After fitting each plm model, I run coeftest(plm1, vcovBK) to get the panel-corrected standard errors (PCSEs). I get the PCSEs for some of the models, but some tests produce an error message. For one of the coeftest calls, the error message is: Error in solve.default(crossprod(demX)) : system is computationally singular: reciprocal condition number = 1.404e-017 What should I do to fix it and get the PCSEs? Thanks a lot, Hien Nguyen
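The error is raised by solve() on the cross product of the demeaned regressors (demX), which fails when those columns are linearly dependent. One possible cause, offered here only as a guess, is a regressor that is constant within each unit or an exact combination of other regressors. The base R sketch below uses a made-up toy panel (the names panel, id, x1, x2, x3 and region are hypothetical) to show how the rank of the demeaned matrix can be checked.

```r
# Hypothetical toy panel: 4 units observed over 5 periods.
set.seed(1)
panel <- data.frame(
  id = rep(1:4, each = 5),
  x1 = rnorm(20),
  x2 = rnorm(20)
)
panel$x3     <- 2 * panel$x1 - panel$x2          # exact linear combination
panel$region <- rep(c(0, 1, 0, 1), each = 5)     # constant within each unit

# Within (fixed-effects) transformation: subtract unit means.
demean <- function(v, g) v - ave(v, g)
demX <- sapply(panel[c("x1", "x2", "x3", "region")], demean, g = panel$id)

# A rank below the number of columns means crossprod(demX) is singular.
rank_demX <- qr(demX)$rank
cat("columns:", ncol(demX), " rank:", rank_demX, "\n")

# Columns that are (numerically) all zero after demeaning.
zero_cols <- colnames(demX)[colSums(abs(demX)) < 1e-12]
print(zero_cols)
```

If a check like this flags columns in the real data, dropping or recoding them before calling coeftest() would be the first thing to try.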
Re: [R] Error when running Conditional Logit Model
On 12/18/09 22:24, Charles C. Berry wrote: On Fri, 18 Dec 2009, Hien Nguyen wrote: Thanks a lot for answering my questions. I have tried to run the clogit for only 64 observations and 4 independent variables and the results are solved instantly. However, when I run the same command (with only 4 independent variables) for the full data, it has been running for 50 minutes now. :( Thomas, what do you mean by maximizing the unconditional likelihood is fine when the stratum sizes are large? What I put in strata (__) is actually the possible choices (1-64). Each choice will be recorded more than 4000 times (which means I have more than 4000 values of 1, 4000 values of 2 and so on). Does it sound right? So you have 64 cases and more than 25 controls. No, I have 4096 cases and more than 25000 controls. Each case will result in 63 controls (which I have to create from each case) Large strata will really slow down clogit. But I think that that isn't your problem. If the strata really matter - in the sense that the conditional distributions of covariates for controls vary a lot from stratum to stratum - then you really gain little by having more than a handful of controls for each case. If that is the situation you are in, sampling a couple of dozen controls from the stratum of each case will give you results that are very nearly as precise as those obtained from using all 4000 of them: plot( 1:100, (1 + 1/1:100), xlab='n of controls', ylab='relative variance of coef' ) will give you a rough idea of the impact of increasing the number of controls per case. The variance with 1 control per case is 2; at the asymptote it is 1. So you can probably speed things up a lot by using fewer controls with little loss in accuracy. I think I might need to use this. With only 64 cases you cannot fit terribly complicated models. This holds whether you approach things conditionally using clogit or unconditionally using glm. Fourteen degrees of freedom for regression is probably pushing matters. 
ridge() is helpful in taming overlarge regressor sets in clogit, but you'll need to use survival:::summary.coxph.penal() on the result (or tinker with the class attribute). I still let the program run. For the case of 4 df, it still does not produce the result. BTW, when you say 'strata(___)', I hope you mean that you use something like 'strata( stratvar )' where stravar is a factor that encodes the 64 levels. Yes, that's what I mean. Thank you. HTH, Chuck Thanks a lot Hien tlum...@u.washington.edu wrote: On Fri, 18 Dec 2009, Hien Nguyen wrote: Dear Drs Winsemius and Berry, Thanks a lot for your comment and suggestions on running my model. I am not just new to R but new to CLM as well. :( With your suggestions, I figure out that I have huge misunderstandings on the model and data arrangement. After my finals, I have read again related materials on CLM and rearranged in an appropriate way before running the model in R. This time, I have a data of more than 250,000 observations (created from more than 4000 response) and a model of 15 predictors. My question is that how long should it takes for the clogit command to run because it has been running for more 10 hours on a quad-core computer and still doesn't show any sign of done or almost done. Is it OK or my command just does not work. If you have a lot of records with case=1 in a stratum, conditional logistic regression will be extremely slow. And unnecessary: maximizing the unconditional likelihood is fine when the stratum sizes are large. Note that a quad-core computer won't help. Only one core will be used in the computations. -thomas Thanks a lot for your response Hien Charles C. Berry wrote: On Fri, 4 Dec 2009, David Winsemius wrote: On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote: Dear Dr. Winsemius, Thank you very much for your reply. I have tried many possible combinations (even with the model of only 2 predictors) but it produces the same message. 
With more than 4000 observations, I think 14 predictors might not be too many. It is what happens in the factor combinations that concern me. I am guessing that some of those predictors are factors. You really should not ask r-help questions without providing better descriptions of both the outcomes and the predictor variables. Although my dependent variable (Pin) is not discrete (it ranges from 0 to 1), I do not think it will create problems to the estimation but I'm not sure I would think it _would_ cause problems. As I understand it, conditional methods create contingency tables. Why are you using an outcome type that is not consistent with the fundamental regression assumptions of the clogit function? I do not get that particular error when
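Chuck's suggestion earlier in this message, keeping each case and sampling only a couple of dozen of its controls, can be sketched in base R as follows. This is a rough illustration under stated assumptions: the data frame dat and its columns case_id, chosen and x are hypothetical, and each stratum is assumed to be one case together with the 63 controls constructed from it, as Hien describes. The first lines simply tabulate the 1 + 1/m rule of thumb from the plot() call.

```r
# Relative variance of a coefficient with m controls per case,
# using the rough 1 + 1/m rule quoted in the message.
m <- c(1, 2, 5, 10, 20, 63)
rel_var <- 1 + 1 / m
print(data.frame(controls = m, relative_variance = round(rel_var, 3)))

# Hypothetical long-format data: 50 cases, each with 63 controls.
set.seed(42)
n_case <- 50
dat <- data.frame(
  case_id = rep(seq_len(n_case), each = 64),
  chosen  = rep(c(1, rep(0, 63)), times = n_case),
  x       = rnorm(n_case * 64)
)

# Keep every case row plus a random sample of k controls per stratum.
k <- 20
keep <- unlist(lapply(split(seq_len(nrow(dat)), dat$case_id), function(rows) {
  case_rows    <- rows[dat$chosen[rows] == 1]
  control_rows <- rows[dat$chosen[rows] == 0]
  c(case_rows, sample(control_rows, k))
}))
sub <- dat[keep, ]

# Every stratum should now hold k + 1 rows.
print(table(table(sub$case_id)))
```

The reduced data frame sub could then be passed to clogit() with strata(case_id) in place of the full data.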
Re: [R] Error when running Conditional Logit Model
Dear Drs Winsemius and Berry, Thanks a lot for your comments and suggestions on running my model. I am not just new to R but new to CLM as well. :( With your suggestions, I figured out that I had huge misunderstandings about the model and the data arrangement. After my finals, I reread the related materials on CLM and rearranged the data in an appropriate way before running the model in R. This time, I have a dataset of more than 250,000 observations (created from more than 4000 responses) and a model with 15 predictors. My question is how long the clogit command should take to run, because it has been running for more than 10 hours on a quad-core computer and still shows no sign of being done or almost done. Is this OK, or does my command just not work? Thanks a lot for your response Hien Charles C. Berry wrote: On Fri, 4 Dec 2009, David Winsemius wrote: On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote: Dear Dr. Winsemius, Thank you very much for your reply. I have tried many possible combinations (even with the model of only 2 predictors) but it produces the same message. With more than 4000 observations, I think 14 predictors might not be too many. It is what happens in the factor combinations that concern me. I am guessing that some of those predictors are factors. You really should not ask r-help questions without providing better descriptions of both the outcomes and the predictor variables. Although my dependent variable (Pin) is not discrete (it ranges from 0 to 1), I do not think it will create problems to the estimation but I'm not sure I would think it _would_ cause problems. As I understand it, conditional methods create contingency tables. Why are you using an outcome type that is not consistent with the fundamental regression assumptions of the clogit function? I do not get that particular error when I munge the infert dataset to have case be a random uniform value, but I do get an error. 
infert$case <- runif(nrow(infert)) clogit(case~spontaneous+induced+strata(stratum),data=infert) Error in Surv(rep(1, 248L), case) : Invalid status value David, I think you were on the right track. I get this: --- clogit(I(case*runif(length(case)))~spontaneous+induced+strata(ifelse(stratum40,NA,stratum)),data=infert) Error in fitter(X, Y, strats, offset, init, control, weights = weights, : NA/NaN/Inf in foreign function call (arg 6) In addition: Warning messages: 1: In Surv(rep(1, 248L), I(case * runif(length(case : Invalid status value, converted to NA 2: In fitter(X, Y, strats, offset, init, control, weights = weights, : Ran out of iterations and did not converge which looks pretty much the same as Hien's error msg So Hien needs to create a logical status value. Chuck p.s. sessionInfo() R version 2.10.0 (2009-10-26) i386-pc-mingw32 locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] survival_2.35-7 loaded via a namespace (and not attached): [1] tools_2.10.0 So I certainly would not have proceeded to submit a full analysis to clogit if I could not get a test case to run under the situation you propose. -- David I have checked the collinearity among predictors and they are all 0.5 (which I think is OK). Do you know what else could make this errors? Thanks a lot Hien Nguyen David Winsemius wrote: On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote: Dear R-helpers, I am very new to R and trying to run the conditional logit model using clogit command. I have more than 4000 observations in my dataset and try to predict the dependent variable from 14 independent variables. 
My command is as follows clmtest1 <- clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata) However, it produces the following errors: Error in fitter(X, Y, strats, offset, init, control, weights = weights, : NA/NaN/Inf in foreign function call (arg 6) In addition: Warning messages: 1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA 2: In fitter(X, Y, strats, offset, init, control, weights = weights, : Ran out of iterations and did not converge I search the error message from R forums but it does not say anything for Conditional Logit Model. With that many predictors in a small dataset, you may have created matrix singularities. Perhaps you created a stratum where all of the subjects experience the event and others where none did so. The coefficients might be driven to infinities. Try simplifying the model. Please check for me what it says and what should I do to solve it. David Winsemius, MD Heritage Laboratories
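The "Invalid status value" warning discussed in this thread comes from the response being passed to Surv(), which expects a 0/1 or logical event indicator. A small pre-check along the following lines may catch the problem before a long model run. The helper name is_valid_status is made up for this sketch; the clogit() call is the standard infert example and is only attempted if the survival package is installed.

```r
# clogit() builds a Surv() object from the response, so the response
# has to be a 0/1 (or logical) event indicator, not a proportion.
is_valid_status <- function(y) {
  is.logical(y) || all(y[!is.na(y)] %in% c(0, 1))
}

set.seed(1)
print(is_valid_status(infert$case))          # infert$case is a 0/1 indicator
print(is_valid_status(runif(nrow(infert))))  # continuous values between 0 and 1

# With a valid indicator the infert example from the thread can be fitted.
if (requireNamespace("survival", quietly = TRUE)) {
  library(survival)
  fit <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)
  print(coef(fit))
}
```

For a response such as Pin that ranges from 0 to 1, a check like this would fail, which matches Chuck's conclusion that a logical status value is needed.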
Re: [R] Error when running Conditional Logit Model
Thanks a lot for answering my questions. I have tried to run the clogit for only 64 observations and 4 independent variables and the results are solved instantly. However, when I run the same command (with only 4 independent variables) for the full data, it has been running for 50 minutes now. :( Thomas, what do you mean by maximizing the unconditional likelihood is fine when the stratum sizes are large? What I put in strata (__) is actually the possible choices (1-64). Each choice will be recorded more than 4000 times (which means I have more than 4000 values of 1, 4000 values of 2 and so on). Does it sound right? Thanks a lot Hien tlum...@u.washington.edu wrote: On Fri, 18 Dec 2009, Hien Nguyen wrote: Dear Drs Winsemius and Berry, Thanks a lot for your comment and suggestions on running my model. I am not just new to R but new to CLM as well. :( With your suggestions, I figure out that I have huge misunderstandings on the model and data arrangement. After my finals, I have read again related materials on CLM and rearranged in an appropriate way before running the model in R. This time, I have a data of more than 250,000 observations (created from more than 4000 response) and a model of 15 predictors. My question is that how long should it takes for the clogit command to run because it has been running for more 10 hours on a quad-core computer and still doesn't show any sign of done or almost done. Is it OK or my command just does not work. If you have a lot of records with case=1 in a stratum, conditional logistic regression will be extremely slow. And unnecessary: maximizing the unconditional likelihood is fine when the stratum sizes are large. Note that a quad-core computer won't help. Only one core will be used in the computations. -thomas Thanks a lot for your response Hien Charles C. Berry wrote: On Fri, 4 Dec 2009, David Winsemius wrote: On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote: Dear Dr. Winsemius, Thank you very much for your reply. 
I have tried many possible combinations (even with the model of only 2 predictors) but it produces the same message. With more than 4000 observations, I think 14 predictors might not be too many. It is what happens in the factor combinations that concern me. I am guessing that some of those predictors are factors. You really should not ask r-help questions without providing better descriptions of both the outcomes and the predictor variables. Although my dependent variable (Pin) is not discrete (it ranges from 0 to 1), I do not think it will create problems to the estimation but I'm not sure I would think it _would_ cause problems. As I understand it, conditional methods create contingency tables. Why are you using an outcome type that is not consistent with the fundamental regression assumptions of the clogit function? I do not get that particular error when I munge the infert dataset to have case be a random uniform value, but I do get an error. infert$case <- runif(nrow(infert)) clogit(case~spontaneous+induced+strata(stratum),data=infert) Error in Surv(rep(1, 248L), case) : Invalid status value David, I think you were on the right track. I get this: --- clogit(I(case*runif(length(case)))~spontaneous+induced+strata(ifelse(stratum40,NA,stratum)),data=infert) Error in fitter(X, Y, strats, offset, init, control, weights = weights, : NA/NaN/Inf in foreign function call (arg 6) In addition: Warning messages: 1: In Surv(rep(1, 248L), I(case * runif(length(case : Invalid status value, converted to NA 2: In fitter(X, Y, strats, offset, init, control, weights = weights, : Ran out of iterations and did not converge which looks pretty much the same as Hien's error msg So Hien needs to create a logical status value. Chuck p.s. 
sessionInfo() R version 2.10.0 (2009-10-26) i386-pc-mingw32 locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] survival_2.35-7 loaded via a namespace (and not attached): [1] tools_2.10.0 So I certainly would not have proceeded to submit a full analysis to clogit if I could not get a test case to run under the situation you propose. -- David I have checked the collinearity among predictors and they are all 0.5 (which I think is OK). Do you know what else could make this errors? Thanks a lot Hien Nguyen David Winsemius wrote: On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote: Dear R-helpers, I am very new to R and trying to run the conditional logit model using clogit command. I have more than 4000 observations in my dataset and try to predict the dependent variable from 14 independent variables. My command
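Thomas's remark in this thread, that maximizing the unconditional likelihood is fine when the stratum sizes are large, can be illustrated with an ordinary glm() fit that includes the stratum as a factor. The data below are simulated purely for illustration (4 strata of 500 records, a true slope of 1); none of the names or numbers come from Hien's dataset, and whether this applies to the real data depends on how the strata are actually defined.

```r
# Simulated data (hypothetical): 4 large strata, 500 records each,
# stratum-specific intercepts and a common slope of 1 for x.
set.seed(123)
n_per   <- 500
stratum <- factor(rep(1:4, each = n_per))
x       <- rnorm(4 * n_per)
alpha   <- c(-1, -0.5, 0, 0.5)[as.integer(stratum)]
y       <- rbinom(length(x), size = 1, prob = plogis(alpha + 1 * x))

# Unconditional fit: one intercept per stratum via the factor term.
fit_glm <- glm(y ~ x + stratum, family = binomial)

# Estimate and standard error for the common slope.
print(coef(summary(fit_glm))["x", ])
```

With only a few large strata the extra intercepts are cheap to estimate, and the fit returns almost immediately, in contrast to the long clogit run described above.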
[R] Error when running Conditional Logit Model
Dear R-helpers, I am very new to R and am trying to run a conditional logit model using the clogit command. I have more than 4000 observations in my dataset and am trying to predict the dependent variable from 14 independent variables. My command is as follows: clmtest1 <- clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata) However, it produces the following errors: Error in fitter(X, Y, strats, offset, init, control, weights = weights, : NA/NaN/Inf in foreign function call (arg 6) In addition: Warning messages: 1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA 2: In fitter(X, Y, strats, offset, init, control, weights = weights, : Ran out of iterations and did not converge I searched the R forums for the error message, but found nothing about it for the conditional logit model. Please tell me what it means and what I should do to solve it. Thanks a lot for your help Hien Nguyen
Re: [R] Error when running Conditional Logit Model
Dear Dr. Winsemius, Thank you very much for your reply. I have tried many possible combinations (even a model with only 2 predictors), but it produces the same message. With more than 4000 observations, I think 14 predictors might not be too many. Although my dependent variable (Pin) is not discrete (it ranges from 0 to 1), I do not think it will create problems for the estimation, but I'm not sure. I have checked the collinearity among predictors and they are all 0.5 (which I think is OK). Do you know what else could cause this error? Thanks a lot Hien Nguyen David Winsemius wrote: On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote: Dear R-helpers, I am very new to R and trying to run the conditional logit model using clogit command. I have more than 4000 observations in my dataset and try to predict the dependent variable from 14 independent variables. My command is as follows clmtest1 <- clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata) However, it produces the following errors: Error in fitter(X, Y, strats, offset, init, control, weights = weights, : NA/NaN/Inf in foreign function call (arg 6) In addition: Warning messages: 1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA 2: In fitter(X, Y, strats, offset, init, control, weights = weights, : Ran out of iterations and did not converge I search the error message from R forums but it does not say anything for Conditional Logit Model. With that many predictors in a small dataset, you may have created matrix singularities. Perhaps you created a stratum where all of the subjects experience the event and others where none did so. The coefficients might be driven to infinities. Try simplifying the model. Please check for me what it says and what should I do to solve it. 
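David's suggestion in the quoted reply, that some strata may contain only events or no events at all, can be checked with a quick tabulation before fitting. The sketch below uses a tiny made-up data frame (dat, stratum and event are hypothetical names); such strata add nothing to the conditional likelihood, so finding many of them is a sign that the data layout deserves a second look.

```r
# Hypothetical data: stratum 1 has one event, stratum 2 has none,
# stratum 3 has only events.
dat <- data.frame(
  stratum = rep(1:3, each = 4),
  event   = c(1, 0, 0, 0,   0, 0, 0, 0,   1, 1, 1, 1)
)

# Number of events and number of records in each stratum.
events <- tapply(dat$event, dat$stratum, sum)
size   <- tapply(dat$event, dat$stratum, length)

# Strata where nobody or everybody has the event carry no
# information for the conditional likelihood.
uninformative <- names(events)[events == 0 | events == size]
print(uninformative)
```

The same two tapply() calls can be run on the real stratum and status columns to see how the events are distributed.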