Thanks for your response. You are right: I am new to R and its
terminology. I will follow up on your suggestions.




On Fri, Jul 26, 2013 at 11:22 AM, Bert Gunter <gunter.ber...@gene.com> wrote:

> Soumitro:
>
> Have you read "An Introduction to R"? If not, do so, as some of your
> confusion appears related to basic concepts (e.g. factors)
> explained there.
>
> 1. Presumably your categorical variables are factors, not character.
> If so, when you cbind() them, you cbind their integer codes, yielding
> numerical variables. This produces an incorrect design matrix in
> fitting -- 1 df per categorical variable instead of 1 less than the
> number of levels. Also see ?cbind.
>
> 2. Omitting cbind() produces the correct design matrix, but you are overfitting,
> presumably because of many different levels for your categorical
> variables. I suggest you consult with a local statistician to decide
> how best to handle this, as you seem to be out of your depth with
> regard to model fitting.
>
> ... unless I have misunderstood, of course.
>
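A minimal base-R sketch of the two points above, using made-up toy data rather than the poster's dataset (the names `x`, `x2`, `z` and the values are assumptions for illustration only). It shows that cbind() reduces a factor to its integer codes, that a formula expands the same factor into one dummy column per non-reference level, and how a redundant predictor makes the design matrix rank deficient, which is the kind of condition a "singular" warning points to.

```r
# Hypothetical toy data: one five-level categorical predictor and one numeric one
lev <- c("VERY LOW", "LOW", "NEUTRAL", "HIGH", "VERY HIGH")
x <- factor(c("HIGH", "LOW", "NEUTRAL", "VERY HIGH", "VERY LOW", "HIGH"),
            levels = lev)
z <- c(1.2, 0.7, 3.1, 2.4, 0.9, 1.8)

# Point 1: cbind() discards the factor class and keeps only the integer codes,
# so x becomes a single numeric column (1 df) with an implied linear ordering.
X1 <- cbind(x, z)
print(X1)

# With a formula, the factor is expanded into indicator (dummy) columns,
# one per level other than the reference level: nlevels(x) - 1 columns.
X2 <- model.matrix(~ x + z)
print(colnames(X2))
n_dummies <- sum(attr(X2, "assign") == 1)   # columns that belong to the term x

# Point 2: a predictor that carries the same information as another one
# (here x2 is just a relabelled copy of x) adds linearly dependent columns.
x2 <- factor(as.integer(x))
X3 <- model.matrix(~ x + x2)
rank_X3 <- qr(X3)$rank
cat("columns:", ncol(X3), " rank:", rank_X3, "\n")
```

Comparing `qr(X)$rank` with `ncol(X)` on the real model matrix, and tabulating each factor with table() to look for empty or very sparse levels, are inexpensive checks to run before refitting. This is only a diagnostic sketch, not a substitute for the statistical advice given above.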
> Cheers,
> Bert
>
> On Fri, Jul 26, 2013 at 7:55 AM, Soumitro Dey <soumitrod...@gmail.com> wrote:
> > Hi list,
> >
> > While the "X matrix deemed to be singular" question has been answered on
> > the list quite a few times, I have a twist to it.
> >
> > I am using the coxph model for survival analysis on a dataset containing
> > over 160,000 instances and 46 independent variables and I have 2 scenarios:
> >
> > 1. If I use cbind on the 46 independent variables (many of which are
> > categorical), coxph runs without any problems. The problem, however, is that
> > it won't report which levels of the categorical variables (e.g. VERY HIGH,
> > HIGH, NEUTRAL, LOW or VERY LOW) are actually meaningful/significant (e.g.
> > XHIGH ***, XLOW ., etc). Is there any way to check this?
> >
> > 2. If I don't use cbind, assuming it'll give me the details I am looking
> > for in the previous step, it throws me the "X matrix deemed to be
> > singular", more precisely: "X matrix deemed to be singular; variable 130
> > 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
> > 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166
> > 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181"
> >
> > Could anyone please elaborate on how to get around problem #1 or #2?
> >
> > Thanks!
> > SD
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
>
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
