On 12/18/09 22:24, Charles C. Berry wrote:
> On Fri, 18 Dec 2009, Hien Nguyen wrote:
>
>> Thanks a lot for answering my questions.
>>
>> I have tried to run clogit on only 64 observations and 4
>> independent variables, and it is solved instantly. However,
>> when I run the same command (with only 4 independent variables) on
>> the full data, it has been running for 50 minutes now. :(
>>
>> Thomas, what do you mean by "maximizing the unconditional likelihood
>> is fine when the stratum sizes are large"? What I put in "strata
>> (__)" is actually the possible choices (1-64). Each choice is
>> recorded more than 4000 times (which means I have more than 4000
>> values of 1, 4000 values of 2, and so on).
>> Does it sound right?
>
> So you have 64 cases and more than 250000 controls.

No, I have 4096 cases and more than 250000 controls. Each case results in 63 controls (which I have to create from each case).

> Large strata will really slow down clogit. But I think that that isn't
> your problem.
>
> If the strata really matter - in the sense that the conditional
> distributions of covariates for controls vary a lot from stratum to
> stratum - then you really gain little by having more than a handful of
> controls for each case. If that is the situation you are in, sampling
> a couple of dozen controls from the stratum of each case will give you
> results that are very nearly as precise as those obtained from using
> all 4000 of them:
>
> plot( 1:100, (1 + 1/1:100), xlab='n of controls',
>       ylab='relative variance of coef' )
>
> will give you a rough idea of the impact of increasing the number of
> controls per case. The relative variance with 1 control per case is 2;
> at the asymptote it is 1.
>
> So you can probably speed things up a lot by using fewer controls with
> little loss in accuracy.

I think I might need to use this.

> With only 64 cases you cannot fit terribly complicated models. This
> holds whether you approach things conditionally using clogit or
> unconditionally using glm. Fourteen degrees of freedom for regression
> is probably pushing matters. ridge() is helpful in taming overlarge
> regressor sets in clogit, but you'll need to use
> survival:::summary.coxph.penal() on the result (or tinker with the
> class attribute).

I am still letting the program run. Even with 4 df, it has not produced a result yet.

> BTW, when you say 'strata(___)', I hope you mean that you use
> something like 'strata( stratvar )' where stratvar is a factor that
> encodes the 64 levels.

Yes, that's what I mean. Thank you.
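Chuck's suggestion of sampling a limited number of controls per case can be sketched roughly as follows. This is only an illustration on simulated data: the column names `stratum`, `status`, and `x`, the helper `sample_controls()`, and the choice of 20 controls per case are all assumptions made for the example, not details of Hien's data set. `rel_var()` simply evaluates the 1 + 1/m rule of thumb from the plot() call.

```r
# Simulated long-format data (hypothetical): 50 strata, each with
# 1 case (status = 1) and 63 controls (status = 0).
set.seed(1)
n_strata <- 50
n_per    <- 64
dat <- data.frame(
  stratum = rep(seq_len(n_strata), each = n_per),
  status  = rep(c(1L, rep(0L, n_per - 1L)), times = n_strata),
  x       = rnorm(n_strata * n_per)
)

# Keep every case and at most m randomly chosen controls per stratum.
# (Hypothetical helper, written for this sketch only.)
sample_controls <- function(d, m) {
  rows_by_stratum <- split(seq_len(nrow(d)), d$stratum)
  keep <- unlist(lapply(rows_by_stratum, function(idx) {
    cases    <- idx[d$status[idx] == 1L]
    controls <- idx[d$status[idx] == 0L]
    k <- min(m, length(controls))
    # sample.int() avoids the sample() pitfall with length-1 vectors
    c(cases, controls[sample.int(length(controls), k)])
  }), use.names = FALSE)
  d[sort(keep), , drop = FALSE]
}

small <- sample_controls(dat, m = 20)

# Rough relative variance of a coefficient with m controls per case.
rel_var <- function(m) 1 + 1 / m
print(rel_var(c(1, 5, 20, 63)))
```

The reduced data frame `small` could then be passed to clogit() in place of the full data. With 20 controls per case the rule of thumb gives a relative variance of 1.05, so most of the precision is kept at roughly a third of the rows.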
> HTH,
>
> Chuck
>
>>
>> Thanks a lot
>>
>> Hien
>>
>> tlum...@u.washington.edu wrote:
>>> On Fri, 18 Dec 2009, Hien Nguyen wrote:
>>>
>>> > Dear Drs Winsemius and Berry,
>>> >
>>> > Thanks a lot for your comments and suggestions on running my
>>> > model. I am not just new to R but new to CLM as well. :( With
>>> > your suggestions, I figured out that I had huge misunderstandings
>>> > of the model and the data arrangement.
>>> >
>>> > After my finals, I read the related materials on CLM again and
>>> > rearranged the data appropriately before running the model in R.
>>> > This time, I have a data set of more than 250,000 observations
>>> > (created from more than 4000 responses) and a model of 15 predictors.
>>> >
>>> > My question is how long the clogit command should take to run,
>>> > because it has been running for more than 10 hours on a quad-core
>>> > computer and still shows no sign of being done or almost done.
>>> > Is that OK, or does my command just not work?
>>>
>>> If you have a lot of records with case=1 in a stratum, conditional
>>> logistic regression will be extremely slow. And unnecessary: maximizing
>>> the unconditional likelihood is fine when the stratum sizes are large.
>>>
>>> Note that a quad-core computer won't help. Only one core will be used in
>>> the computations.
>>>
>>> -thomas
>>>
>>> > Thanks a lot for your response
>>> >
>>> > Hien
>>> >
>>> > Charles C. Berry wrote:
>>> > > On Fri, 4 Dec 2009, David Winsemius wrote:
>>> > >
>>> > > > On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote:
>>> > > >
>>> > > > > Dear Dr. Winsemius,
>>> > > > >
>>> > > > > Thank you very much for your reply.
>>> > > > >
>>> > > > > I have tried many possible combinations (even a model with
>>> > > > > only 2 predictors) but it produces the same message. With more
>>> > > > > than 4000 observations, I think 14 predictors might not be too
>>> > > > > many.
>>> > > >
>>> > > > It is what happens in the factor combinations that concerns me.
>>> > > > I am guessing that some of those predictors are factors. You really
>>> > > > should not ask r-help questions without providing better
>>> > > > descriptions of both the outcomes and the predictor variables.
>>> > > >
>>> > > > > Although my dependent variable (Pin) is not discrete (it ranges
>>> > > > > from 0 to 1), I do not think it will create problems for the
>>> > > > > estimation, but I'm not sure.
>>> > > >
>>> > > > I would think it _would_ cause problems. As I understand it,
>>> > > > conditional methods create contingency tables. Why are you using an
>>> > > > outcome type that is not consistent with the fundamental regression
>>> > > > assumptions of the clogit function?
>>> > > >
>>> > > > I do not get that particular error when I munge the infert dataset
>>> > > > to have case be a random uniform value, but I do get an error.
>>> > > >
>>> > > > > infert$case <- runif(nrow(infert))
>>> > > > > clogit(case~spontaneous+induced+strata(stratum),data=infert)
>>> > > > Error in Surv(rep(1, 248L), case) : Invalid status value
>>> > >
>>> > > David, I think you were on the right track. I get this:
>>> > >
>>> > > -----------
>>> > > > clogit(I(case*runif(length(case)))~spontaneous+induced+strata(ifelse(stratum>40,NA,stratum)),data=infert)
>>> > > Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
>>> > >   NA/NaN/Inf in foreign function call (arg 6)
>>> > > In addition: Warning messages:
>>> > > 1: In Surv(rep(1, 248L), I(case * runif(length(case)))) :
>>> > >   Invalid status value, converted to NA
>>> > > 2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
>>> > >   Ran out of iterations and did not converge
>>> > > ------------
>>> > >
>>> > > which looks pretty much the same as Hien's error msg.
>>> > >
>>> > > So Hien needs to create a logical status value.
>>> > >
>>> > > Chuck
>>> > >
>>> > > p.s.
>>> > > > sessionInfo()
>>> > > R version 2.10.0 (2009-10-26)
>>> > > i386-pc-mingw32
>>> > >
>>> > > locale:
>>> > > [1] LC_COLLATE=English_United States.1252
>>> > > [2] LC_CTYPE=English_United States.1252
>>> > > [3] LC_MONETARY=English_United States.1252
>>> > > [4] LC_NUMERIC=C
>>> > > [5] LC_TIME=English_United States.1252
>>> > >
>>> > > attached base packages:
>>> > > [1] splines stats graphics grDevices utils datasets methods
>>> > > [8] base
>>> > >
>>> > > other attached packages:
>>> > > [1] survival_2.35-7
>>> > >
>>> > > loaded via a namespace (and not attached):
>>> > > [1] tools_2.10.0
>>> > >
>>> > > > So I certainly would not have proceeded to submit a full analysis to
>>> > > > clogit if I could not get a test case to run under the situation you
>>> > > > propose.
>>> > > >
>>> > > > --
>>> > > > David
>>> > > >
>>> > > > > I have checked the collinearity among predictors and they are all
>>> > > > > < 0.5 (which I think is OK). Do you know what else could cause
>>> > > > > this error?
>>> > > > >
>>> > > > > Thanks a lot
>>> > > > >
>>> > > > > Hien Nguyen
>>> > > > >
>>> > > > > David Winsemius wrote:
>>> > > > > > On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:
>>> > > > > >
>>> > > > > > > Dear R-helpers,
>>> > > > > > >
>>> > > > > > > I am very new to R and trying to run the conditional logit
>>> > > > > > > model using the "clogit" command.
>>> > > > > > > I have more than 4000 observations in my dataset and try to
>>> > > > > > > predict the dependent variable from 14 independent variables.
>>> > > > > > > My command is as follows:
>>> > > > > > >
>>> > > > > > > clmtest1 <-
>>> > > > > > > clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)
>>> > > > > > >
>>> > > > > > > However, it produces the following errors:
>>> > > > > > >
>>> > > > > > > Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
>>> > > > > > >   NA/NaN/Inf in foreign function call (arg 6)
>>> > > > > > > In addition: Warning messages:
>>> > > > > > > 1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
>>> > > > > > > 2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
>>> > > > > > >   Ran out of iterations and did not converge
>>> > > > > > >
>>> > > > > > > I searched for the error message in R forums but found nothing
>>> > > > > > > about it for the Conditional Logit Model.
>>> > > > > >
>>> > > > > > With that many predictors in a small dataset, you may have
>>> > > > > > created matrix singularities. Perhaps you created a stratum
>>> > > > > > where all of the subjects experience the event and others where
>>> > > > > > none did so. The coefficients might be driven to infinities. Try
>>> > > > > > simplifying the model.
>>> > > > > >
>>> > > > > > > Please check for me what it says and what I should do
>>> > > > > > > to solve it.
>>> > > >
>>> > > > David Winsemius, MD
>>> > > > Heritage Laboratories
>>> > > > West Hartford, CT
>>> > > >
>>> > > > ______________________________________________
>>> > > > R-help@r-project.org mailing list
>>> > > > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> > > > and provide commented, minimal, self-contained, reproducible code.
>>> > >
>>> > > Charles C. Berry                            (858) 534-2098
>>> > >                                             Dept of Family/Preventive Medicine
>>> > > E mailto:cbe...@tajo.ucsd.edu               UC San Diego
>>> > > http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>>
>>> Thomas Lumley                  Assoc. Professor, Biostatistics
>>> tlum...@u.washington.edu       University of Washington, Seattle
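The point made earlier in the thread, that clogit needs a 0/1 (or logical) status value rather than a continuous response, can be checked before fitting. The sketch below reuses the infert data and model formula from the thread; the helper `is_binary_status()` is a hypothetical name introduced only for this illustration and is not part of the survival package.

```r
library(survival)  # provides clogit() and strata()

# Hypothetical helper: TRUE only if y has no missing values and
# every value is 0 or 1 (logical TRUE/FALSE also passes).
is_binary_status <- function(y) {
  !anyNA(y) && all(y %in% c(0, 1))
}

data(infert)  # example data from the datasets package

# The original case indicator is binary, so the model can be fitted.
stopifnot(is_binary_status(infert$case))
fit <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)
print(coef(fit))

# A continuous response in (0, 1), like the one described in the thread,
# fails the check and should be recoded before calling clogit().
bad <- runif(nrow(infert))
print(is_binary_status(bad))
```

Running a check like this on a small subset first is in line with David's advice to get a test case working before submitting the full analysis.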