On Sep 13, 2013, at 9:33 AM, E Joffe wrote:

Thank you so much for your answer!
As far as I understand, glmnet doesn't accept categorical variables, only binary
factors, so I had to create dummy variables for all categorical variables.
It worked perfectly.

It's not exactly clear what worked perfectly. Since glmnet will only accept a matrix as its `x` data input, did you use model.matrix to construct the "dummies" and cbind your numeric predictors to that result? If you just assigned a factor attribute, it's more likely that you didn't actually use "dummies" but rather regressed on the integer values of the factor.
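
As a minimal sketch of the model.matrix approach asked about above, assuming a hypothetical data frame `dat` with a four-level factor `trt` and a numeric predictor `age` (these names and the simulated values are illustrative, not from the original post), a numeric `x` matrix could be built like this:

```r
# Hypothetical example data; `dat`, `trt` and `age` are illustrative names.
set.seed(1)
dat <- data.frame(
  trt = factor(sample(c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
                      100, replace = TRUE)),
  age = rnorm(100, mean = 60, sd = 10)
)

# model.matrix() expands the factor into treatment-contrast dummies
# (one column per non-reference level); drop the intercept column.
dummies <- model.matrix(~ trt, data = dat)[, -1, drop = FALSE]

# Bind the numeric predictor to the dummy columns to get one numeric matrix.
x <- cbind(dummies, age = dat$age)

dim(x)       # one row per observation; dummy columns plus the numeric column
colnames(x)
```

The resulting `x` is a plain numeric matrix, which is the form glmnet expects for its `x` argument.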

--
David.

--
Erel


Erel Joffe MD MSc
School of Biomedical Informatics
University of Texas - Health Science Center in Houston
832.287.0829 (c)

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Friday, September 13, 2013 3:05 PM
To: E Joffe
Cc: r-help@r-project.org
Subject: Re: [R] Creating dummy vars with contrasts - why does the returned
identity matrix contain all levels (and not n-1 levels) ?


On Sep 13, 2013, at 4:15 AM, E Joffe wrote:

Hello,



I have a problem with creating an identity matrix for glmnet by using
the contrasts function.

Why do you want to do this?

I have a factor with 4 levels.

When I create dummy variables I think there should be n-1 variables
(in this case 3) - so that the contrasts would be against the baseline
level.

This is also what is written in the help file for 'contrasts'.

The problem is that the function creates a matrix with n variables
(i.e. the same as the number of levels) and not n-1 (where I would
have a baseline level for comparison).

Only if you specify contrasts=FALSE does it do so, and this is documented in
that help file.



My questions are:

1.       How can I create a matrix with n-1 dummy vars ?

See below.

Was I supposed to define explicitly that I want contr.treatment (contrasts)?

No need to do so.


2.       If it is not possible, how should I interpret the hazard ratios in
the Cox regression I am generating (I use glmnet for variable selection and
then generate a Cox regression)? That is, if I get an HR of 3 for the
variable 300mg, what does it mean? The hazard is 3 times higher than what?


Relative hazards are generally referenced to the "baseline hazard",
i.e. the hazard for a group with the omitted level for treatment
contrasts and the mean value for any numeric predictors.
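
To make the "omitted level" concrete: with the default alphabetical ordering of the four levels in this thread, the omitted level is "1200 MG", so a hazard ratio reported for "300 MG" would be relative to "1200 MG", not to placebo. A small sketch, using base R's relevel() as one possible way (an assumption about what is wanted, not the only option) to make PLACEBO the baseline:

```r
# Small illustrative factor with the same four levels as in this thread.
trt <- factor(c("PLACEBO", "300 MG", "600 MG", "1200 MG"))
levels(trt)[1]            # reference level under the default ordering

# relevel() moves the chosen level to the front, making it the baseline.
trt_p <- relevel(trt, ref = "PLACEBO")
levels(trt_p)

# Treatment contrasts now omit PLACEBO, so each remaining column (and hence
# each coefficient or hazard ratio) is a comparison against placebo.
contrasts(trt_p)
```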

Here is some code to reproduce the issue:

# Create a 4 level example factor

trt <- factor( sample( c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
                     100, replace=TRUE ) )

# If your intent is to use contrasts different from the defaults used by
# regression functions, these factor contrasts need to be assigned, either
# within the construction of the factor or after the fact.

contrasts(trt)
        300 MG 600 MG PLACEBO
1200 MG      0      0       0
300 MG       1      0       0
600 MG       0      1       0
PLACEBO      0      0       1

# The default value for the contrasts parameter is TRUE and the
# default type is treatment.

# That did not cause any change to the 'trt'-object:
trt

#To make a change you need to use the `contrasts<-` function:

contrasts (trt) <- contrasts(trt)
trt


# Use contrasts to get the identity matrix of dummy variables to be used in
# glmnet

trt2 <- contrasts (trt,contrasts=FALSE)

Results (as you can see all levels are represented in the identity
matrix):

levels (trt)
[1] "1200 MG" "300 MG"  "600 MG"  "PLACEBO"


print (trt2)
        1200 MG 300 MG 600 MG PLACEBO
1200 MG       1      0      0       0
300 MG        0      1      0       0
600 MG        0      0      1       0
PLACEBO       0      0      0       1
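
Note that both forms of the contrasts matrix describe the coding of the levels only (one row per level), not the data. A hedged sketch of the difference in dimensions, and of one way to expand either coding to one row per observation by indexing rows with the factor (the names `x_full` and `x_trt` are illustrative):

```r
set.seed(1)
trt <- factor(sample(c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
                     100, replace = TRUE))

dim(contrasts(trt))                     # levels x (n - 1) treatment columns
dim(contrasts(trt, contrasts = FALSE))  # levels x n identity coding

# Indexing the rows by the factor's values gives one row per observation.
x_full <- contrasts(trt, contrasts = FALSE)[as.character(trt), ]
x_trt  <- contrasts(trt)[as.character(trt), ]
dim(x_full)
dim(x_trt)
```

In the full (identity) coding every row sums to 1, so those columns are collinear with an intercept; the n-1 coding avoids that by leaving out the reference level.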



        [[alternative HTML version deleted]]

Rhelp is a plain text mailing list.

--
David Winsemius, MD
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
