On Sep 13, 2013, at 9:33 AM, E Joffe wrote:
Thank you so much for your answer !
As far as I understand, glmnet doesn't accept categorical variables, only
binary factors, so I had to create dummy variables for all categorical
variables.
It worked perfectly.
It's not exactly clear what worked perfectly. Since glmnet will only
accept a matrix as its `x` data input, did you use model.matrix to
construct the "dummies" and cbind your numeric predictors to that
result? If you just assigned a factor attribute, it's more likely that
you didn't actually use "dummies" but rather regressed on the integer
values of the factor.
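For illustration, here is a minimal sketch of the model.matrix approach asked about above. The data frame `d` and the numeric column `age` are made up for this example and are not from the original post; only base R is used, and the glmnet call itself is left out.

```r
# Hypothetical example data: one 4-level factor and one numeric predictor
set.seed(1)
d <- data.frame(
  trt = factor(sample(c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
                      100, replace = TRUE)),
  age = rnorm(100, mean = 60, sd = 10)
)

# model.matrix() expands the factor into 0/1 columns using its contrasts
# (treatment contrasts by default, so n-1 = 3 dummy columns) and passes
# numeric predictors through unchanged. The intercept column is dropped
# because the model supplies its own (a Cox model has none).
x <- model.matrix(~ trt + age, data = d)[, -1]
dim(x)
colnames(x)

# By contrast, coercing the factor to numbers gives a single column of
# integer codes 1..4, which is not a set of dummy variables:
x_codes <- data.matrix(d)
head(x_codes)
```

The matrix `x` is the kind of object that could be passed as the `x` argument; `x_codes` shows what regressing on the integer values of the factor would amount to.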
--
David.
--
Erel
Erel Joffe MD MSc
School of Biomedical Informatics
University of Texas - Health Science Center in Houston
832.287.0829 (c)
-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Friday, September 13, 2013 3:05 PM
To: E Joffe
Cc: r-help@r-project.org
Subject: Re: [R] Creating dummy vars with contrasts - why does the returned identity matrix contain all levels (and not n-1 levels) ?
On Sep 13, 2013, at 4:15 AM, E Joffe wrote:
Hello,
I have a problem with creating an identity matrix for glmnet by using
the contrasts function.
Why do you want to do this?
I have a factor with 4 levels.
When I create dummy variables I think there should be n-1 variables
(in this case 3), so that the contrasts would be against the baseline
level.
This is also what is written in the help file for 'contrasts'.
The problem is that the function creates a matrix with n variables
(i.e. the same as the number of levels) and not n-1 (where I would
have a baseline level for comparison).
Only if you specify contrasts=FALSE does it do so, and this is documented
in that help file.
My questions are:
1. How can I create a matrix with n-1 dummy vars ?
See below.
Was I supposed to define explicitly that I want contr.treatment
(contrasts)?
No need to do so.
2. If it is not possible, how should I interpret the hazard ratios in
the Cox regression I am generating (I use glmnet for variable selection
and then generate a Cox regression)? That is, if I get an HR of 3 for
the variable 300mg, what does it mean? The hazard is 3 times higher than
what?
Relative hazards are generally referenced to the "baseline hazard",
i.e. the hazard for a group with the omitted level for treatment
contrasts and the mean value for any numeric predictors.
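To make the omitted level concrete, here is a small sketch that rebuilds the `trt` factor from this thread. `relevel()` is a base R function; the name `trt_ref` is made up for illustration. It assumes the default treatment contrasts are in effect.

```r
set.seed(1)
trt <- factor(sample(c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
                     100, replace = TRUE))

# Levels are sorted as character strings, so "1200 MG" comes first and is
# the omitted (reference) level under the default treatment contrasts.
levels(trt)[1]

# relevel() moves a chosen level to the front, making it the reference.
trt_ref <- relevel(trt, ref = "PLACEBO")
levels(trt_ref)
contrasts(trt_ref)  # PLACEBO row is all zeros; one column per active dose
```

With `trt_ref` in the model, a hazard ratio reported for "300 MG" would be read as relative to PLACEBO rather than to "1200 MG".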
Here is some code to reproduce the issue:
# Create a 4 level example factor
trt <- factor( sample( c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
100, replace=TRUE ) )
# If your intent is to use contrasts different from the defaults used by
# regression functions, these factor contrasts need to be assigned, either
# within the construction of the factor or after the fact.
contrasts(trt)
        300 MG 600 MG PLACEBO
1200 MG      0      0       0
300 MG       1      0       0
600 MG       0      1       0
PLACEBO      0      0       1
# The default value for the contrasts parameter is TRUE and the
# default type is treatment.
# That did not cause any change to the 'trt'-object:
trt
# To make a change you need to use the `contrasts<-` function:
contrasts (trt) <- contrasts(trt)
trt
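As a sketch of an assignment that actually changes something, the following attaches treatment contrasts with a different baseline. The choice of PLACEBO as the baseline is an assumption made for illustration; `contr.treatment()` and its `base` argument are part of base R's stats package.

```r
trt <- factor(c("PLACEBO", "300 MG", "600 MG", "1200 MG"))

# Build treatment contrasts whose baseline is the 4th level (PLACEBO)
# instead of the 1st (1200 MG), then attach them to the factor.
contrasts(trt) <- contr.treatment(levels(trt), base = 4)

contrasts(trt)          # PLACEBO row is now all zeros
attr(trt, "contrasts")  # the assignment is stored as an attribute of trt
```

Functions that build a design matrix from the factor, such as model.matrix, should then pick up the attached contrasts instead of the session default.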
# Use contrasts to get the identity matrix of dummy variables to be
# used in glmnet
trt2 <- contrasts(trt, contrasts=FALSE)
Results (as you can see, all levels are represented in the identity matrix):
levels (trt)
[1] "1200 MG" "300 MG" "600 MG" "PLACEBO"
print (trt2)
        1200 MG 300 MG 600 MG PLACEBO
1200 MG       1      0      0       0
300 MG        0      1      0       0
600 MG        0      0      1       0
PLACEBO       0      0      0       1
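For comparison, a short sketch of what the two forms of `contrasts()` give once they are expanded to one row per observation. The object names `full`, `reduced`, `x_full` and `x_reduced` are made up for this example.

```r
set.seed(1)
trt <- factor(sample(c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
                     100, replace = TRUE))

full    <- contrasts(trt, contrasts = FALSE)  # 4 x 4 identity matrix
reduced <- contrasts(trt)                     # 4 x 3, first level omitted

# Index the rows by each observation's level to get one row per observation
x_full    <- full[as.character(trt), ]     # 100 x 4, every level has a column
x_reduced <- reduced[as.character(trt), ]  # 100 x 3, "1200 MG" is the baseline

dim(x_full)
dim(x_reduced)
rowSums(x_reduced)[trt == "1200 MG"]  # all zero: baseline rows carry no 1
```

The n-1 version is the default; the full identity matrix only appears when contrasts=FALSE is requested explicitly.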
[[alternative HTML version deleted]]
Rhelp is a plain text mailing list.
--
David Winsemius, MD
Alameda, CA, USA
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.