On Sep 13, 2013, at 11:21 PM, E Joffe wrote:

Hi David,

First I ordered the levels of each factor in a descending order based on
frequency.
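Reordering levels by frequency can be sketched in base R as follows. This is a minimal illustration on a hypothetical toy factor (`dose` is not from the thread). Under the default treatment contrasts, the first level (here the most frequent one) becomes the omitted baseline.

```r
# Hypothetical toy factor: PLACEBO x3, 300 MG x2, 600 MG x1
dose <- factor(c("300 MG", "PLACEBO", "PLACEBO", "600 MG", "PLACEBO", "300 MG"))

# Order the levels by descending frequency
freq_order <- names(sort(table(dose), decreasing = TRUE))
dose <- factor(dose, levels = freq_order)

levels(dose)                    # "PLACEBO" "300 MG" "600 MG"
colnames(model.matrix(~ dose))  # PLACEBO is now the omitted reference level
```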
Then I used the following code to generate a matrix of dummy variables from
the data frame and subsequently ran glmnet (coxnet):

library(glmnet)

## transform the categorical variables in trainSet into binary dummy variables
## (drop = FALSE keeps a data frame even when only one column is a factor)
predict_matrix <- model.matrix(~ ., data = trainSet,
                               contrasts.arg = lapply(trainSet[, sapply(trainSet, is.factor), drop = FALSE],
                                                      contrasts))

## remove the status/time variables from the predictor matrix (x) for glmnet
predict_matrix <- subset(predict_matrix, select = c(-time, -status))

## create a glmnet Cox object using lasso regularization and cross-validation
## (surv_obj is assumed to be defined earlier, e.g. Surv(trainSet$time, trainSet$status))
glmnet.cv <- cv.glmnet(predict_matrix, surv_obj, family = "cox")
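The snippet uses `surv_obj` without showing how it was built. A minimal sketch of the data preparation, assuming a hypothetical `trainSet` with numeric `time` and `status` columns, might look like this. It drops the outcome columns in the formula rather than with `subset()` afterwards, and builds the response with `survival::Surv()`. A right-censored `Surv` object is itself a two-column matrix with columns `time` and `status`, which, as far as I know, matches what glmnet expects for `family = "cox"`. The `cv.glmnet()` call is left as a comment because it needs the glmnet package.

```r
library(survival)  # Surv() comes from the recommended 'survival' package

set.seed(1)
# Hypothetical stand-in for trainSet: one factor, one numeric predictor, time/status
trainSet <- data.frame(
  dose   = factor(sample(c("PLACEBO", "300 MG", "600 MG"), 50, replace = TRUE),
                  levels = c("300 MG", "600 MG", "PLACEBO")),
  age    = rnorm(50, 60, 10),
  time   = rexp(50, rate = 0.1),
  status = rbinom(50, 1, 0.7)
)

# Response: a two-column matrix with columns "time" and "status"
surv_obj <- Surv(trainSet$time, trainSet$status)

# Predictors: expand factors, exclude the outcome columns, and drop the
# (Intercept) column because the Cox model has no intercept
predict_matrix <- model.matrix(~ . - time - status, data = trainSet)[, -1]

dim(predict_matrix)  # 50 rows, 3 columns: dose600 MG, dosePLACEBO, age
colnames(surv_obj)
# glmnet::cv.glmnet(predict_matrix, surv_obj, family = "cox") would follow
```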


I hope I did not do anything wrong...

Can't thank you enough for your advice and interest.

Thank you for outlining the process that you used. It looks "from the outside" as though it respects the input requirements that cv.glmnet imposes on its first two arguments. I didn't realize that subset could accept a `-` sign as an operator inside a c() expression, but if you are getting success then I guess it must.
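For reference, `subset()` does have a matrix method: inside `select`, each column name is bound to its column position, so `c(-time, -status)` evaluates to negative indices. A small sketch on a hypothetical matrix `m` compares it with ordinary indexing by name, which avoids the non-standard evaluation:

```r
# Hypothetical matrix with outcome columns mixed in with predictors
m <- cbind(time = c(5, 8), status = c(1, 0), x1 = c(0.2, 0.4), x2 = c(1, 3))

# subset.matrix(): column names in `select` stand for their positions
a <- subset(m, select = c(-time, -status))

# Equivalent plain indexing by name
b <- m[, setdiff(colnames(m), c("time", "status")), drop = FALSE]

identical(a, b)  # TRUE
```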

--
David.



Erel



-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Friday, September 13, 2013 8:51 PM
To: E Joffe
Cc: r-help@r-project.org
Subject: Re: [R] Creating dummy vars with contrasts - why does the returned
identity matrix contain all levels (and not n-1 levels) ?


On Sep 13, 2013, at 9:33 AM, E Joffe wrote:

Thank you so much for your answer!
As far as I understand, glmnet doesn't accept categorical variables,
only binary ones, so I had to create dummy variables for all
categorical variables.

I was rather puzzled by your question. The conventions used by glmnet should prevent contrasts from being pre-specified. Only matrices are accepted as data objects, and one cannot assign contrast attributes to matrix columns.
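As a hedged illustration of how factors end up in a plain numeric matrix, the sketch below uses a hypothetical data frame `dat`. By default `model.matrix()` applies treatment contrasts (n-1 columns per factor); supplying `contrasts.arg = lapply(..., contrasts, contrasts = FALSE)` is a commonly used idiom to get one indicator column for every level.

```r
set.seed(42)
# Hypothetical data: a 4-level factor, a 3-level factor, and a numeric column
dat <- data.frame(
  trt  = factor(sample(c("PLACEBO", "300 MG", "600 MG", "1200 MG"), 20, replace = TRUE),
                levels = c("PLACEBO", "300 MG", "600 MG", "1200 MG")),
  site = factor(sample(c("A", "B", "C"), 20, replace = TRUE), levels = c("A", "B", "C")),
  age  = rnorm(20, 60, 10)
)

# Default treatment contrasts: nlevels - 1 columns per factor
x_default <- model.matrix(~ ., data = dat)

# Identity "contrasts" for every factor: one column per level
factor_cols <- dat[, sapply(dat, is.factor), drop = FALSE]
x_full <- model.matrix(~ ., data = dat,
                       contrasts.arg = lapply(factor_cols, contrasts, contrasts = FALSE))

ncol(x_default)  # 7 = intercept + 3 + 2 + 1
ncol(x_full)     # 9 = intercept + 4 + 3 + 1
```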

It worked perfectly.
Erel


Erel Joffe MD MSc
School of Biomedical Informatics
University of Texas - Health Science Center in Houston
832.287.0829 (c)

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Friday, September 13, 2013 3:05 PM
To: E Joffe
Cc: r-help@r-project.org
Subject: Re: [R] Creating dummy vars with contrasts - why does the
returned identity matrix contain all levels (and not n-1 levels) ?


On Sep 13, 2013, at 4:15 AM, E Joffe wrote:

Hello,



I have a problem with creating an identity matrix for glmnet by using
the contrasts function.

Why do you want to do this?

I have a factor with 4 levels.

When I create dummy variables I think there should be n-1 variables
(in this case 3) - so that the contrasts would be against the
baseline level.

This is also what is written in the help file for 'contrasts'.

The problem is that the function creates a matrix with n variables
(i.e. the same as the number of levels) and not n-1 (where I would
have a baseline level for comparison).

Only if you specify contrasts=FALSE does it do so, and this is
documented in that help file.
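The difference is easy to check directly in the console. A minimal sketch with a hypothetical four-level factor:

```r
trt <- factor(c("PLACEBO", "300 MG", "600 MG", "1200 MG"))

# Default (contrasts = TRUE): treatment coding with nlevels - 1 columns;
# the first level ("1200 MG", first in sort order) is the omitted baseline
ct <- contrasts(trt)
dim(ct)  # 4 3

# contrasts = FALSE: one indicator column per level, i.e. the identity matrix
cf <- contrasts(trt, contrasts = FALSE)
dim(cf)  # 4 4
```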



My questions are:

1. How can I create a matrix with n-1 dummy vars?

See below.

Was I supposed to define explicitly that I want contr.treatment
(contrasts)?

No need to do so.
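One way to get the n-1 dummy columns for each observation, assuming the default treatment contrasts, is to let `model.matrix()` build them and drop the intercept column. A sketch on simulated data (the factor and sample size are illustrative only):

```r
set.seed(1)
trt <- factor(sample(c("PLACEBO", "300 MG", "600 MG", "1200 MG"), 10, replace = TRUE),
              levels = c("1200 MG", "300 MG", "600 MG", "PLACEBO"))

# Default treatment contrasts: intercept plus nlevels - 1 = 3 dummy columns
X <- model.matrix(~ trt)

# Keep only the 3 dummies (baseline "1200 MG" rows are all zero)
dummies <- X[, -1, drop = FALSE]
colnames(dummies)  # "trt300 MG" "trt600 MG" "trtPLACEBO"
```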


2. If it is not possible, how should I interpret the hazard ratios in
the Cox regression I am generating (I use glmnet for variable selection
and then generate a Cox regression)? That is, if I get an HR of 3 for the
variable 300mg, what does it mean? The hazard is 3 times higher than what?


Relative hazards are generally referenced to the "baseline hazard",
i.e. the hazard for a group with the omitted level for treatment
contrasts and the mean value for any numeric predictors.
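To make the reference group explicit, one can choose the omitted level with `relevel()` before fitting. The sketch below simulates hypothetical survival data (not the poster's) and fits `survival::coxph()`. Each `exp(coef)` is then the hazard ratio of that dose group relative to PLACEBO, holding any other covariates fixed, so an HR of 3 for 300 MG would mean a threefold hazard compared with placebo.

```r
library(survival)

set.seed(123)
n <- 200
# Hypothetical simulated trial
trt    <- factor(sample(c("PLACEBO", "300 MG", "600 MG", "1200 MG"), n, replace = TRUE))
time   <- rexp(n, rate = 0.1)
status <- rbinom(n, 1, 0.8)

# Make PLACEBO the omitted (reference) level
trt <- relevel(trt, ref = "PLACEBO")
fit <- coxph(Surv(time, status) ~ trt)

# Hazard ratios, each relative to PLACEBO
hr <- exp(coef(fit))
names(hr)  # "trt1200 MG" "trt300 MG" "trt600 MG"
```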

Here is some code to reproduce the issue:

# Create a 4 level example factor

trt <- factor( sample( c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
                       100, replace=TRUE ) )

# If your intent is to use contrasts different than the defaults used by
# regression functions, these factor contrasts need to be assigned, either
# within the construction of the factor or after the fact.

contrasts(trt)
        300 MG 600 MG PLACEBO
1200 MG      0      0       0
300 MG       1      0       0
600 MG       0      1       0
PLACEBO      0      0       1

# the default value for the contrasts parameter is TRUE and the
# default type is treatment

# That did not cause any change to the 'trt'-object:
trt

#To make a change you need to use the `contrasts<-` function:

contrasts (trt) <- contrasts(trt)
trt
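Besides reassigning the default, `contrasts<-` accepts any contrast matrix. As a sketch, assuming PLACEBO should be the baseline, `contr.treatment()` with a `base` argument attaches treatment coding relative to a chosen level, and `model.matrix()` then uses it:

```r
trt <- factor(c("PLACEBO", "300 MG", "600 MG", "1200 MG", "PLACEBO"))

# Treatment contrasts with PLACEBO (4th level) as the base instead of level 1
contrasts(trt) <- contr.treatment(levels(trt), base = which(levels(trt) == "PLACEBO"))

contrasts(trt)                 # PLACEBO row is all zeros
colnames(model.matrix(~ trt))  # "(Intercept)" "trt1200 MG" "trt300 MG" "trt600 MG"
```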


# Use contrasts to get the identity matrix of dummy variables to be used in
# glmnet

trt2 <- contrasts (trt,contrasts=FALSE)

Results (as you can see, all levels are represented in the identity matrix):

levels (trt)
[1] "1200 MG" "300 MG"  "600 MG"  "PLACEBO"


print (trt2)
        1200 MG 300 MG 600 MG PLACEBO
1200 MG       1      0      0       0
300 MG        0      1      0       0
600 MG        0      0      1       0
PLACEBO       0      0      0       1
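Note that `contrasts(trt, contrasts = FALSE)` only returns the coding table (one row per level), not a dummy matrix with one row per observation. A sketch of getting per-observation indicators for all levels, using a simulated factor like the one in the example: `model.matrix(~ trt - 1)` drops the intercept, so every level gets its own column. These columns always sum to 1, which makes them collinear with an intercept or a Cox baseline hazard; the all-levels coding is therefore typically reserved for penalized fits such as glmnet.

```r
set.seed(7)
trt <- factor(sample(c("PLACEBO", "300 MG", "600 MG", "1200 MG"), 100, replace = TRUE),
              levels = c("1200 MG", "300 MG", "600 MG", "PLACEBO"))

# Coding table only: one row per level
coding <- contrasts(trt, contrasts = FALSE)
dim(coding)  # 4 4

# Per-observation indicators: one row per subject, one column per level
all_levels <- model.matrix(~ trt - 1)
dim(all_levels)  # 100 4

# Each subject falls in exactly one level
all(rowSums(all_levels) == 1)  # TRUE
```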



        [[alternative HTML version deleted]]

Rhelp is a plain text mailing list.

--
David Winsemius, MD
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Alameda, CA, USA



David Winsemius, MD
Alameda, CA, USA
