On Fri, 18 Dec 2009, Hien Nguyen wrote:
> Dear Drs Winsemius and Berry,
>
> Thanks a lot for your comments and suggestions on running my model. I am
> not just new to R but new to CLM as well. :( With your suggestions, I
> figured out that I had serious misunderstandings about the model and the
> data arrangement.
>
> After my finals, I reread the related materials on CLM and rearranged
> the data appropriately before running the model in R. This time, I have
> a dataset of more than 250,000 observations (created from more than 4000
> responses) and a model with 15 predictors.
>
> My question is how long the clogit command should take to run. It has
> been running for more than 10 hours on a quad-core computer and still
> shows no sign of being done or almost done. Is this OK, or does my
> command just not work?
If you have a lot of records with case=1 in a stratum, conditional
logistic regression will be extremely slow. And unnecessary: maximizing
the unconditional likelihood is fine when the stratum sizes are large.
Note that a quad-core computer won't help. Only one core will be used in
the computations.
-thomas
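
A minimal sketch of the unconditional alternative mentioned above, assuming a 0/1 response and a handful of large strata. The data are simulated, and the names (`sim`, `x`, `stratum`, `alpha`) are made up for illustration; they are not Hien's variables. The stratum enters `glm` as an ordinary factor, so each stratum gets its own intercept:

```r
# Simulated example: 5 large strata, one continuous predictor.
# All names here are hypothetical, not taken from the original data.
set.seed(1)
n_per_stratum <- 2000
stratum <- rep(1:5, each = n_per_stratum)
x <- rnorm(length(stratum))

# Stratum-specific intercepts and a true slope of 1 on the logit scale.
alpha <- c(-2, -1, 0, 0.5, 1)
p <- plogis(alpha[stratum] + 1 * x)
case <- rbinom(length(p), size = 1, prob = p)
sim <- data.frame(case, x, stratum)

# Unconditional logistic regression with one intercept per stratum.
fit <- glm(case ~ x + factor(stratum), family = binomial, data = sim)
slope <- unname(coef(fit)["x"])
print(slope)
```

With many small strata (matched sets of only a few records each), the stratum intercepts are poorly estimated and this approach is biased; that is the setting where conditional logistic regression remains the right tool.
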
> Thanks a lot for your response
>
> Hien
>
>
> Charles C. Berry wrote:
> > On Fri, 4 Dec 2009, David Winsemius wrote:
> >
> > >
> > > On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote:
> > >
> > > > Dear Dr. Winsemius,
> > > >
> > > > Thank you very much for your reply.
> > > >
> > > > I have tried many possible combinations (even with the model of
> > > > only 2 predictors) but it produces the same message. With more
> > > > than 4000 observations, I think 14 predictors might not be too
> > > > many.
> > >
> > > > It is what happens in the factor combinations that concerns me. I am
> > > guessing that some of those predictors are factors. You really
> > > should not ask r-help questions without providing better
> > > descriptions of both the outcomes and the predictor variables.
> > >
> > > >
> > > > > Although my dependent variable (Pin) is not discrete (it ranges
> > > > > from 0 to 1), I do not think it will cause problems for the
> > > > > estimation, but I'm not sure.
> > >
> > > I would think it _would_ cause problems. As I understand it,
> > > conditional methods create contingency tables. Why are you using an
> > > outcome type that is not consistent with the fundamental regression
> > > assumptions of the clogit function?
> > >
> > > I do not get that particular error when I munge the infert dataset
> > > to have case be a random uniform value, but I do get an error.
> > > > infert$case <- runif(nrow(infert))
> > > > clogit(case~spontaneous+induced+strata(stratum),data=infert)
> > > Error in Surv(rep(1, 248L), case) : Invalid status value
> > >
> >
> > David, I think you were on the right track. I get this:
> >
> > -----------
> > > clogit(I(case*runif(length(case)))~spontaneous+induced+strata(ifelse(stratum>40,NA,stratum)),data=infert)
> >
> > Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
> >   NA/NaN/Inf in foreign function call (arg 6)
> > In addition: Warning messages:
> > 1: In Surv(rep(1, 248L), I(case * runif(length(case)))) :
> >   Invalid status value, converted to NA
> > 2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
> >   Ran out of iterations and did not converge
> > >
> > ------------
> >
> > which looks pretty much the same as Hien's error msg
> >
> > So Hien needs to create a logical status value.
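
A small sketch of that check, assuming the `survival` package is installed. `is_valid_status` is a hypothetical helper, not part of `survival`; it only confirms that the response is coded 0/1 (or FALSE/TRUE) with no missing values before the model is fitted:

```r
library(survival)

# Hypothetical helper: TRUE only if every value is 0/1 (or FALSE/TRUE).
is_valid_status <- function(y) {
  all(!is.na(y) & y %in% c(0, 1))
}

data(infert)
is_valid_status(infert$case)           # original 0/1 case indicator
is_valid_status(runif(nrow(infert)))   # fractional values, as with Pin

# With a valid status, the fit from the clogit help page runs normally.
if (is_valid_status(infert$case)) {
  fit <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)
  print(coef(fit))
}
```

How a fractional outcome such as Pin should be turned into a 0/1 status depends on what it measures, so that step is left out here.
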
> >
> > Chuck
> >
> > p.s.
> >
> > > sessionInfo()
> > R version 2.10.0 (2009-10-26)
> > i386-pc-mingw32
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252
> > [2] LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] splines   stats     graphics  grDevices utils     datasets  methods
> > [8] base
> >
> > other attached packages:
> > [1] survival_2.35-7
> >
> > loaded via a namespace (and not attached):
> > [1] tools_2.10.0
> > >
> >
> >
> > > So I certainly would not have proceeded to submit a full analysis to
> > > clogit if I could not get a test case to run under the situation you
> > > propose.
> > >
> > > --
> > > David
> > >
> > > >
> > > > I have checked the collinearity among predictors and they are all
> > > > < 0.5 (which I think is OK). Do you know what else could make this
> > > > errors?
> > > >
> > > > Thanks a lot
> > > >
> > > > Hien Nguyen
> > > >
> > > > David Winsemius wrote:
> > > > > On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:
> > > > > > Dear R-helpers,
> > > > > >
> > > > > > I am very new to R and trying to run the conditional logit
> > > > > > model using "clogit " command.
> > > > > > I have more than 4000 observations in my dataset and try to
> > > > > > predict the dependent variable from 14 independent variables.
> > > > > > My command is as follows
> > > > > >
> > > > > > clmtest1 <-
> > > > > > clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)
> > > > > >
> > > > > > However, it produces the following errors:
> > > > > >
> > > > > > Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
> > > > > >   NA/NaN/Inf in foreign function call (arg 6)
> > > > > > In addition: Warning messages:
> > > > > > 1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
> > > > > > 2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
> > > > > >   Ran out of iterations and did not converge
> > > > > >
> > > > > > I search the error message from R forums but it does not say
> > > > > > anything for Conditional Logit Model.
> > > > >
> > > > > With that many predictors in a small dataset, you may have
> > > > > created matrix singularities. Perhaps you created a stratum
> > > > > where all of the subjects experience the event and others where
> > > > > none did so. The coefficients might be driven to infinities. Try
> > > > > simplifying the model.
> > > > >
> > > > > > Please check for me what it says and what should I do to
> > > > > > solve it.
> > > > >
> > >
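
The stratum check suggested above can be sketched in base R. The toy data frame and the names `toy`, `events`, `sizes`, `degenerate`, and `bad` are invented for illustration; with real data the same two `tapply` calls would use the actual status and stratum columns:

```r
# Toy data: stratum 1 is informative, stratum 2 has only events,
# stratum 3 has no events.
toy <- data.frame(
  stratum = rep(1:3, each = 3),
  case    = c(1, 0, 0,  1, 1, 1,  0, 0, 0)
)

events <- tapply(toy$case, toy$stratum, sum)     # events per stratum
sizes  <- tapply(toy$case, toy$stratum, length)  # records per stratum

# Strata where nobody or everybody has the event carry no information
# for the conditional likelihood.
degenerate <- events == 0 | events == sizes
bad <- names(degenerate)[as.vector(degenerate)]
print(bad)
```
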
> > > David Winsemius, MD
> > > Heritage Laboratories
> > > West Hartford, CT
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> > Charles C. Berry                            (858) 534-2098
> >                                             Dept of Family/Preventive Medicine
> > E mailto:cbe...@tajo.ucsd.edu               UC San Diego
> > http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
> >
> >
>
Thomas Lumley Assoc. Professor, Biostatistics
tlum...@u.washington.edu University of Washington, Seattle