My friends and I are continuing work on the algorithm. You are right that 
there are two elements to Friedman's glmnet algorithm. One is the use of 
coordinate descent for minimizing penalized regression with an absolute-value 
penalty, and the other is managing the regularization parameters. Friedman's 
algorithm does return the entire regularization path. We have had to get 
fairly deep into the mechanics of linear algebra. The tricky part has been 
arranging the matrix and vector multiplications to minimize compute times 
(e.g., there are big time differences between multiplying by a submatrix 
versus multiplying by the individual columns of the submatrix, etc.).
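For anyone following along, here is a minimal NumPy sketch of the coordinate-descent inner loop for the lasso (soft-thresholding update), assuming standardized columns; the function names are my own, and this dense single-machine version deliberately ignores the submatrix-vs-column multiplication tradeoffs mentioned above:

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iters=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.
    Assumes columns of X are standardized (mean 0, unit variance)."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y - X @ beta          # maintain the residual instead of recomputing X @ beta
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iters):
        for j in range(p):
            # correlation of feature j with the partial residual (beta_j added back in)
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)   # cheap rank-1 residual update
            beta[j] = new_bj
    return beta
```

The residual update is the key efficiency trick: each coordinate step touches only one column of X, which is where the column-versus-submatrix multiplication cost differences show up at scale.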

All of the versions we've produced generate a multitude of solutions (default = 
100) for a range of different values of the regularization parameter. The 
solutions always cover the most heavily penalized end of the curve. The number 
of solutions generated depends on how fine the steps are and how close the 
solutions get to the fully saturated (un-penalized) solution. Default values 
for these work about 80% of the time.
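To make the parameter-management side concrete, here is a sketch of how a glmnet-style grid of penalty values can be built: start at the smallest lambda for which every coefficient is zero, then step down on a log scale. The function name and the eps ratio are illustrative choices, not our exact defaults:

```python
import numpy as np

def lambda_path(X, y, n_lambdas=100, eps=1e-3):
    """Log-spaced penalty grid from lambda_max (all coefficients zero)
    down to eps * lambda_max. Assumes standardized columns of X."""
    n = X.shape[0]
    # smallest lambda at which the all-zero solution is optimal
    lam_max = np.max(np.abs(X.T @ y)) / n
    return np.logspace(np.log10(lam_max), np.log10(eps * lam_max), n_lambdas)
```

Solving at each grid value in order, warm-started from the previous solution, is what makes generating the whole path cheap relative to solving each penalty from scratch.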

Personally, I've always found it useful to have the entire regularization path. 
One way or another, that's always required to get a final solution. It's just a 
question of whether the points on the path are generated by hunting and pecking 
or all in one shot, systematically.
mike






-----Original Message-----
From: Patrick [mailto:petz2...@gmail.com]
Sent: Tuesday, August 4, 2015 12:50 AM
To: d...@spark.apache.org
Subject: Re: Have Friedman's glmnet algo running in Spark

I have a follow-up on this: I see on JIRA that the idea of having a GLMNET 
implementation was more or less abandoned, since an OWLQN implementation was 
chosen to construct a model using L1/L2 regularization. However, GLMNET has 
the property of "returning a multitude of models (corresponding to different 
values of penalty parameters [for the regularization])". I think this is not 
the case in the OWLQN implementation. This would be really helpful for 
comparing the accuracy of models with different regParam values. As far as I 
understood, it would avoid a costly cross-validation step over a possibly 
large set of regParam values.

Joseph Bradley wrote:
> Some of this discussion seems valuable enough to preserve on the JIRA; can
> we move it there (and copy any relevant discussions from previous emails
> as needed)?
>
> On Wed, Feb 25, 2015 at 10:35 AM, <mike@...> wrote:

--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Have-Friedman-s-glmnet-algo-running-in-Spark-tp10692p13587.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
