On Fri, 18 Dec 2009, Hien Nguyen wrote:

Thanks a lot for answering my questions.

I have tried running clogit on only 64 observations and 4 independent variables, and the model is fitted instantly. However, when I run the same command (with only 4 independent variables) on the full data, it has now been running for 50 minutes. :(

Thomas, what do you mean by "maximizing the unconditional likelihood is fine when the stratum sizes are large"? What I put in "strata (__)" is actually the possible choices (1-64). Each choice is recorded more than 4000 times (which means I have more than 4000 values of 1, more than 4000 values of 2, and so on).
Does that sound right?

So you have 64 cases and more than 250000 controls.

Large strata will really slow down clogit. But I think that that isn't your problem.

If the strata really matter - in the sense that the conditional distributions of covariates for controls vary a lot from stratum to stratum - then you really gain little by having more than a handful of controls for each case. If that is the situation you are in, sampling a couple of dozen controls from the stratum of each case will give you results that are very nearly as precise as those obtained from using all 4000 of them:

        plot( 1:100, (1 + 1/1:100), xlab='n of controls',
                ylab='relative variance of coef' )


will give you a rough idea of the impact of increasing the number of controls per case. The variance with 1 control per case is 2; at the asymptote it is 1.

So you can probably speed things up a lot by using fewer controls with little loss in accuracy.
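
A minimal sketch of that subsampling idea, not code from this thread. The data are simulated, and the names sim_stratum, sample_controls, stratvar, x and case are hypothetical. It keeps every case and draws 24 controls at random from that case's stratum before calling clogit(). By the rough 1 + 1/n rule above, 24 controls per case costs only about 4% in variance relative to the asymptote.

```r
library(survival)

set.seed(1)

# Hypothetical simulated data: one case per stratum, chosen with
# probability proportional to exp(beta * x), plus n - 1 controls.
sim_stratum <- function(id, n, beta) {
  x <- rnorm(n)
  case_idx <- sample.int(n, 1, prob = exp(beta * x))
  data.frame(stratvar = id, x = x,
             case = as.integer(seq_len(n) == case_idx))
}
dat <- do.call(rbind, lapply(1:64, sim_stratum, n = 501, beta = 0.5))
dat$stratvar <- factor(dat$stratvar)

# Keep all cases and at most m randomly chosen controls per stratum.
sample_controls <- function(d, m) {
  rows <- split(seq_len(nrow(d)), d$stratvar)
  keep <- unlist(lapply(rows, function(idx) {
    cases <- idx[d$case[idx] == 1]
    ctrls <- idx[d$case[idx] == 0]
    c(cases, ctrls[sample.int(length(ctrls), min(m, length(ctrls)))])
  }), use.names = FALSE)
  d[sort(keep), ]
}

sub <- sample_controls(dat, m = 24)
fit_sub <- clogit(case ~ x + strata(stratvar), data = sub)
print(summary(fit_sub)$coefficients)
```

Changing m and refitting gives a direct feel for how little the standard error moves once there are a couple of dozen controls per case.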

With only 64 cases you cannot fit terribly complicated models. This holds whether you approach things conditionally using clogit or unconditionally using glm. Fourteen degrees of freedom for regression is probably pushing matters. ridge() is helpful in taming overlarge regressor sets in clogit, but you'll need to use survival:::summary.coxph.penal() on the result (or tinker with the class attribute).
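
For the unconditional route, a minimal sketch under stated assumptions: the data are simulated with a few large strata, the variable names (stratvar, x, case) and the true coefficient of 0.5 are hypothetical, and the stratum effect enters as an ordinary factor in glm() rather than through strata().

```r
set.seed(2)

# Hypothetical data: 8 large strata of 300 records, each stratum
# with its own baseline log-odds (alpha).
n_strata <- 8
n_per    <- 300
stratvar <- factor(rep(seq_len(n_strata), each = n_per))
x        <- rnorm(n_strata * n_per)
alpha    <- rnorm(n_strata, mean = -1.5, sd = 0.5)
p        <- plogis(alpha[as.integer(stratvar)] + 0.5 * x)
case     <- rbinom(length(p), 1, p)
dat      <- data.frame(case, x, stratvar)

# Unconditional fit: one intercept per stratum, estimated explicitly.
fit_glm  <- glm(case ~ x + stratvar, family = binomial, data = dat)

# The stratum intercepts are nuisance parameters; only the
# coefficient of x is of interest here.
beta_hat <- coef(fit_glm)["x"]
print(beta_hat)
```

This is only reasonable when each stratum is large. With small matched sets the stratum intercepts bias the unconditional estimate, which is the situation clogit is designed for.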

BTW, when you say 'strata(___)', I hope you mean that you use something like 'strata( stratvar )' where stratvar is a factor that encodes the 64 levels.
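
A minimal sketch of that setup, using the infert data set that ships with R and is used elsewhere in this thread. The column name stratvar and the pre-flight checks are illustrative assumptions, not part of anyone's actual code. The point is that the outcome should be a 0/1 (or logical) status and the stratum a factor before clogit() is called. A fractional outcome such as a proportion between 0 and 1 is what triggers the "Invalid status value" message quoted further down.

```r
library(survival)

d <- infert

# Encode the matched-set identifier as a factor.
d$stratvar <- factor(d$stratum)

# Check the inputs before fitting: clogit() expects a 0/1 or
# logical status, not a proportion.
stopifnot(is.factor(d$stratvar))
stopifnot(all(d$case %in% c(0, 1)))

fit <- clogit(case ~ spontaneous + induced + strata(stratvar), data = d)
print(coef(fit))
```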

HTH,

Chuck


Thanks a lot

Hien

tlum...@u.washington.edu wrote:
 On Fri, 18 Dec 2009, Hien Nguyen wrote:

>  Dear Drs Winsemius and Berry,
>
> Thanks a lot for your comments and suggestions on running my model. I am
> not just new to R but new to CLM as well. :( With your suggestions, I
> figured out that I had huge misunderstandings about the model and data
> arrangement.
>
> After my finals, I read the related materials on CLM again and
> rearranged the data in an appropriate way before running the model in R.
> This time, I have a data set of more than 250,000 observations (created
> from more than 4000 responses) and a model of 15 predictors.
>
> My question is how long it should take for the clogit command to
> run, because it has been running for more than 10 hours on a quad-core
> computer and still doesn't show any sign of being done or almost done.
> Is that OK, or does my command just not work?

 If you have a lot of records with case=1 in a stratum, conditional
 logistic regression will be extremely slow.   And unnecessary: maximizing
 the unconditional likelihood is fine when the stratum sizes are large.

 Note that a quad-core computer won't help. Only one core will be used in
 the computations.

      -thomas




>  Thanks a lot for your response
>
> Hien
>
>
> Charles C. Berry wrote:
> >  On Fri, 4 Dec 2009, David Winsemius wrote:
> >
> > >
> > > On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote:
> > >
> > > > Dear Dr. Winsemius,
> > > >
> > > > Thank you very much for your reply.
> > > >
> > > > I have tried many possible combinations (even with the model of
> > > > only 2 predictors) but it produces the same message. With more
> > > > than 4000 observations, I think 14 predictors might not be too
> > > > many.
> > >
> > > It is what happens in the factor combinations that concern me. I am
> > > guessing that some of those predictors are factors. You really
> > > should not ask r-help questions without providing better
> > > descriptions of both the outcomes and the predictor variables.
> > >
> > > >
> > > > Although my dependent variable (Pin) is not discrete (it ranges
> > > > from 0 to 1), I do not think it will create problems to the
> > > > estimation but I'm not sure
> > >
> > > I would think it _would_ cause problems. As I understand it,
> > > conditional methods create contingency tables. Why are you using an
> > > outcome type that is not consistent with the fundamental regression
> > > assumptions of the clogit function?
> > >
> > > I do not get that particular error when I munge the infert dataset
> > > to have case be a random uniform value, but I do get an error.
> > > >   infert$case <- runif(nrow(infert))
> > > >   clogit(case~spontaneous+induced+strata(stratum),data=infert)
> > >  Error in Surv(rep(1, 248L), case) : Invalid status value
> > >
> >
> > David, I think you were on the right track. I get this:
> >
> > -----------
> > > clogit(I(case*runif(length(case)))~spontaneous+induced+strata(ifelse(stratum>40,NA,stratum)),data=infert)
> >
> > Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
> >    NA/NaN/Inf in foreign function call (arg 6)
> >  In addition: Warning messages:
> >  1: In Surv(rep(1, 248L), I(case * runif(length(case)))) :
> >    Invalid status value, converted to NA
> >  2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
> >    Ran out of iterations and did not converge
> >
> > ------------
> >
> > which looks pretty much the same as Hien's error msg
> >
> > So Hien needs to create a logical status value.
> >
> > Chuck
> >
> > p.s.
> >
> > > sessionInfo()
> >  R version 2.10.0 (2009-10-26)
> >  i386-pc-mingw32
> >
> > locale:
> >  [1] LC_COLLATE=English_United States.1252
> >  [2] LC_CTYPE=English_United States.1252
> >  [3] LC_MONETARY=English_United States.1252
> >  [4] LC_NUMERIC=C
> >  [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> >  [1] splines   stats     graphics  grDevices utils     datasets  methods
> >  [8] base
> >
> > other attached packages:
> >  [1] survival_2.35-7
> >
> > loaded via a namespace (and not attached):
> >  [1] tools_2.10.0
> >
> >
> > >
> > > So I certainly would not have proceeded to submit a full analysis to
> > > clogit if I could not get a test case to run under the situation you
> > > propose.
> > >
> > > --
> > > David
> > >
> > > >
> > > > I have checked the collinearity among predictors and they are all
> > > > < 0.5 (which I think is OK). Do you know what else could make this
> > > > errors?
> > > >
> > > > Thanks a lot
> > > >
> > > > Hien Nguyen
> > > >
> > > > David Winsemius wrote:
> > > > > On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:
> > > > >
> > > > > > Dear R-helpers,
> > > > > >
> > > > > > I am very new to R and trying to run the conditional logit model using
> > > > > > "clogit " command.
> > > > > > I have more than 4000 observations in my dataset and try to predict the
> > > > > > dependent variable from 14 independent variables. My command is as follows
> > > > > >
> > > > > > clmtest1 <-
> > > > > > clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)
> > > > > >
> > > > > > However, it produces the following errors:
> > > > > >
> > > > > > Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
> > > > > >  NA/NaN/Inf in foreign function call (arg 6)
> > > > > >  In addition: Warning messages:
> > > > > > 1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
> > > > > > 2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
> > > > > >  Ran out of iterations and did not converge
> > > > > >
> > > > > > I search the error message from R forums but it does not say anything
> > > > > > for Conditional Logit Model.
> > > > >
> > > > > With that many predictors in a small dataset, you may have created matrix
> > > > > singularities. Perhaps you created a stratum where all of the subjects
> > > > > experience the event and others where none did so. The coefficients might
> > > > > be driven to infinities. Try simplifying the model.
> > > > >
> > > > > > Please check for me what it says and what should I do to solve it.
> > >
> > >
> > > David Winsemius, MD
> > >  Heritage Laboratories
> > >  West Hartford, CT
> > > > > > ______________________________________________
> > >  R-help@r-project.org mailing list
> > >  https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > >  and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> > Charles C. Berry                            (858) 534-2098
> >                                             Dept of Family/Preventive Medicine
> > E mailto:cbe...@tajo.ucsd.edu                UC San Diego
> > http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
> >
> >
> ______________________________________________
>  R-help@r-project.org mailing list
>  https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>  and provide commented, minimal, self-contained, reproducible code.
>
 Thomas Lumley            Assoc. Professor, Biostatistics
 tlum...@u.washington.edu    University of Washington, Seattle





Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

