Thanks for your response. You are right: I am new to R and its terminology. I will follow up on your suggestions.

On Fri, Jul 26, 2013 at 11:22 AM, Bert Gunter <gunter.ber...@gene.com> wrote:

> Soumitro:
>
> Have you read "An Introduction to R"? If not, do so, as some of your
> confusion appears related to basic concepts (e.g. of factors)
> explained there.
>
> 1. Presumably your categorical variables are factors, not character.
> If so, when you cbind() them, you cbind their integer codes, yielding
> numerical variables. This produces an incorrect design matrix in
> fitting -- 1 df per categorical variable instead of 1 less than the
> number of levels. Also see ?cbind.
>
> 2. Produces the correct design matrix, but you are overfitting,
> presumably because of many different levels for your categorical
> variables. I suggest you consult with a local statistician to decide
> how best to handle this, as you seem to be out of your depth with
> regard to model fitting.
>
> ... unless I have misunderstood, of course.
>
> Cheers,
> Bert
>
> On Fri, Jul 26, 2013 at 7:55 AM, Soumitro Dey <soumitrod...@gmail.com> wrote:
> > Hi list,
> >
> > While the "X matrix deemed to be singular" question has been answered
> > on the list quite a few times, I have a twist to it.
> >
> > I am using the coxph model for survival analysis on a dataset
> > containing over 160,000 instances and 46 independent variables, and I
> > have 2 scenarios:
> >
> > 1. If I use cbind on the 46 independent variables (many of which are
> > categorical), coxph runs without any frills. The problem, however, is
> > that it won't report which levels of the categorical variables (e.g.
> > VERY HIGH, HIGH, NEUTRAL, LOW or VERY LOW) are actually
> > meaningful/significant (e.g. XHIGH ***, XLOW ., etc). Is there any way
> > to check this?
> >
> > 2. If I don't use cbind, assuming it'll give me the details I am
> > looking for in the previous step, it throws "X matrix deemed to be
> > singular", more precisely: "X matrix deemed to be singular; variable
> > 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146
> > 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163
> > 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
> > 181"
> >
> > Could anyone please elaborate on how to get around problem #1 or #2?
> >
> > Thanks!
> > SD
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
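
Point 1 in the quoted reply (cbind() on factors keeps only their integer codes) can be seen on a toy example. What follows is a minimal sketch in base R, not the poster's actual code: the data frame `d` and its columns `rating` and `age` are made up for illustration, with level names borrowed from the question.

```r
# Hypothetical toy data; only the level names come from the question.
lv <- c("VERY LOW", "LOW", "NEUTRAL", "HIGH", "VERY HIGH")
d <- data.frame(
  rating = factor(c("LOW", "HIGH", "NEUTRAL", "VERY HIGH", "VERY LOW", "HIGH"),
                  levels = lv),
  age    = c(34, 51, 47, 62, 29, 55)
)

# cbind() discards the factor class and keeps the internal integer codes,
# so the 5-level factor collapses into a single numeric column (1 df).
X_cbind <- cbind(rating = d$rating, age = d$age)
is.numeric(X_cbind)   # the factor has become plain numbers
colnames(X_cbind)

# A model formula expands the factor into (number of levels - 1)
# indicator columns, one per non-reference level.
X_formula <- model.matrix(~ rating + age, data = d)[, -1]  # drop the intercept
colnames(X_formula)
```

With the formula interface, each non-reference level gets its own column and therefore its own coefficient. Assuming the same holds for the real data, a call of the form `coxph(Surv(time, status) ~ rating + age, data = d)` from the survival package (where `time` and `status` are hypothetical column names) would report the levels separately, which is what scenario 1 was missing. It does not address the singularity in scenario 2, which the reply attributes to overfitting.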