Re: [R] grplasso

2012-01-13 Thread Scott Raynaud
So does anyone use this package?

 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grplasso

2012-01-10 Thread Scott Raynaud
I want to use the grplasso package on a data set where I want to fit a linear 
model.  My interest is in identifying significant beta coefficients.  The 
documentation is a bit cryptic so I'd appreciate some help.
 
I know this is a strategy for large numbers of variables, but consider a simple 
case for pedagogical purposes.  Say I have two 3-category predictors (2 dummies 
each), a binary predictor and a continuous predictor, with a continuous outcome:
 
y  x1  x2  x3  x4 x5 x6
rows of data here
..
..
 
Naturally, I want to select x1 and x2 as a group and x3 and x4 as another 
group.  The documentation has a couple of examples, but it's not clear how 
they translate to the current problem.  How do I specify my groups and run 
the lasso regression?
 
Looks like this is the grouping part:
 
index <- c(NA, )
 
but I'm not sure how to specify the df for the variables past the NA for the 
intercept.
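
A minimal sketch of one way to fill in the index, assuming the design matrix
has an intercept column first, then the two dummies for each 3-category
predictor, then the binary and the continuous predictor.  The data and the
names f1, f2, b and z are made up for illustration.  Per the grplasso
documentation, columns sharing a number form a group and NA marks an
unpenalized column:

```r
# Hypothetical data: two 3-level factors, one binary and one continuous predictor
set.seed(1)
n <- 100
dat <- data.frame(
  f1 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  f2 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  b  = rbinom(n, 1, 0.5),
  z  = rnorm(n)
)

# model.matrix() expands each factor into 2 dummies and adds an intercept column
X <- model.matrix(~ f1 + f2 + b + z, data = dat)

# The "assign" attribute records which model term each column came from
# (0 = intercept), which is exactly the grouping needed here
index <- attr(X, "assign")
index[index == 0] <- NA   # NA = do not penalize the intercept

print(rbind(colnames(X), index))
```

Writing index <- c(NA, 1, 1, 2, 2, 3, 4) by hand gives the same grouping; the
single-column groups 3 and 4 are then penalized like ordinary lasso terms.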
 
Once that's defined the penalty can be specified:
 
lambda <- lambdamax(x, y = y, index = index, penscale = sqrt,
                    model = LogReg()) * 0.5^(0:5)

In my case I'd use LinReg() for the model.
 
Then the model:
 
fit <- grplasso(x, y = y, index = index, lambda = lambda, model = LogReg(),
                penscale = sqrt,
                control = grpl.control(update.hess = "lambda", trace = 0))
 
again using LinReg for the model.
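
Putting the pieces together for the linear case, here is a hedged sketch on
simulated data (the data-generating model and the names f1, f2, b and z are
assumptions made only for illustration).  It swaps LogReg() for LinReg() and
then lists which coefficients are nonzero at each value of lambda:

```r
library(grplasso)

# Simulated data: only the first factor and the continuous predictor matter
set.seed(1)
n  <- 200
f1 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
f2 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
b  <- rbinom(n, 1, 0.5)
z  <- rnorm(n)
x  <- model.matrix(~ f1 + f2 + b + z)       # intercept + 6 columns
y  <- 1 + 2 * (f1 == "b") - 1.5 * (f1 == "c") + 0.8 * z + rnorm(n)

# NA = unpenalized intercept; equal numbers = same group
index <- c(NA, 1, 1, 2, 2, 3, 4)

# Grid of penalties, starting at the smallest lambda that zeroes all groups
lambda <- lambdamax(x, y = y, index = index, penscale = sqrt,
                    model = LinReg()) * 0.5^(0:5)

fit <- grplasso(x, y = y, index = index, lambda = lambda, model = LinReg(),
                penscale = sqrt,
                control = grpl.control(update.hess = "lambda", trace = 0))

# One column per lambda; TRUE marks coefficients still in the model
nonzero <- fit$coefficients != 0
rownames(nonzero) <- colnames(x)
print(nonzero)
```

Because the penalty acts on whole groups, the two dummies of a factor should
enter or leave the model together as lambda changes.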

This can be plotted against lambda, but when I do lasso regression 
in other software I end up with a plot of the coefficients against the 
tuning parameter with a cutpoint or a table and graph that tells me 
what to include in the model based on some selected criterion.  
It's not clear from the example if there's a cross-validation or some 
other procedure to determine what variables to include.  plot(fit) 
produces a graph of coefficients against lambda but nothing to indicate 
what to include.  What is used in the package, if anything, to make that 
determination?
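
As far as I can tell, grplasso does not ship a cross-validation routine, so
one option is to choose lambda by hand-rolled K-fold cross-validation.  The
sketch below is only an illustration under that assumption: the simulated
data, the 5 folds and the mean squared prediction error criterion are my
choices, not something the package prescribes.

```r
library(grplasso)

# Simulated data with the same layout as in the question (hypothetical)
set.seed(42)
n  <- 200
f1 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
f2 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
b  <- rbinom(n, 1, 0.5)
z  <- rnorm(n)
x  <- model.matrix(~ f1 + f2 + b + z)
y  <- 1 + 2 * (f1 == "b") - 1.5 * (f1 == "c") + 0.8 * z + rnorm(n)
index <- c(NA, 1, 1, 2, 2, 3, 4)

lambda <- lambdamax(x, y = y, index = index, penscale = sqrt,
                    model = LinReg()) * 0.5^(0:5)

# Randomly assign each observation to one of K folds
K    <- 5
fold <- sample(rep(seq_len(K), length.out = n))
cv.err <- matrix(NA_real_, nrow = K, ncol = length(lambda))

for (k in seq_len(K)) {
  test  <- fold == k
  fit.k <- grplasso(x[!test, ], y = y[!test], index = index,
                    lambda = lambda, model = LinReg(), penscale = sqrt,
                    control = grpl.control(trace = 0))
  # Coefficients are one column per lambda, so this predicts for all lambdas
  pred <- x[test, ] %*% fit.k$coefficients
  cv.err[k, ] <- colMeans((y[test] - pred)^2)
}

# Lambda with the smallest average held-out error
mean.err    <- colMeans(cv.err)
best.lambda <- lambda[which.min(mean.err)]

# Refit on all data at the chosen lambda and list the selected columns
fit.best <- grplasso(x, y = y, index = index, lambda = best.lambda,
                     model = LinReg(), penscale = sqrt,
                     control = grpl.control(trace = 0))
selected <- colnames(x)[fit.best$coefficients[, 1] != 0]
print(selected)
```

A finer lambda grid than the six values used here would normally be
preferable.  Note also that this selects variables by predictive performance;
it does not give significance tests for the coefficients.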
