Hi Mike,

glmnet has definitely been very successful, and it would be great to see
how we can improve optimization in MLlib!  There is some related work
ongoing; here are the JIRAs:

GLMNET implementation in Spark
<https://issues.apache.org/jira/browse/SPARK-1673>

LinearRegression with L1/L2 (elastic net) using OWLQN in new ML package
<https://issues.apache.org/jira/browse/SPARK-5253>

The GLMNET JIRA has actually been closed in favor of the latter JIRA.
However, if you're getting good results in your experiments, could you
please post them on the GLMNET JIRA and link them from the other JIRA?  If
it's faster and more scalable, that would be great to find out.

As for where the code should go and what the APIs should look like, that can
be discussed on the JIRA.

I hope this helps, and I'll keep an eye out for updates on the JIRAs!

Joseph


On Thu, Feb 19, 2015 at 10:59 AM, <m...@mbowles.com> wrote:

> Dev List,
> A couple of colleagues and I have gotten several versions of the glmnet
> algorithm coded and running on Spark RDDs. The glmnet algorithm (
> http://www.jstatsoft.org/v33/i01/paper) is a very fast algorithm for
> generating coefficient paths solving penalized regression with elastic net
> penalties. The algorithm runs fast by taking an approach that generates
> solutions for a wide range of penalty parameters. We're able to integrate
> it into the MLlib class structure in a couple of different ways. The
> algorithm may fit better into the new pipeline structure, since it
> naturally returns a multitude of models (corresponding to different values
> of the penalty parameters). That appears to fit pipelines better than
> MLlib linear regression does (for example).
>
> We've got regression running with the speed optimizations that Friedman
> recommends. We'll start working on the logistic regression version next.
>
> We're eager to make the code available as open source and would like to
> get some feedback about how best to do that. Any thoughts?
> Mike Bowles.
>
>
>
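For context, the approach Mike describes (and that the Friedman et al. paper
linked above details) builds the whole coefficient path by cyclic coordinate
descent, warm-starting each solve from the previous penalty value. A rough
single-machine sketch, assuming standardized predictors and a centered
response; all names here (`enet_path`, `soft_threshold`) are illustrative,
not MLlib API or the code under discussion:

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) from the glmnet paper."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def enet_path(X, y, lambdas, alpha=0.5, n_sweeps=100):
    """One coefficient vector per lambda; lambdas given in decreasing order.

    Assumes columns of X are standardized (mean 0, variance 1) and y is
    centered, so the coordinate update has a closed form.
    """
    n, p = X.shape
    beta = np.zeros(p)              # warm start carried across lambdas
    path = []
    for lam in lambdas:
        for _ in range(n_sweeps):   # cyclic coordinate descent
            for j in range(p):
                # partial residual with coordinate j removed
                r_j = y - X @ beta + X[:, j] * beta[j]
                z = X[:, j] @ r_j / n
                beta[j] = soft_threshold(z, lam * alpha) / (1.0 + lam * (1 - alpha))
        path.append(beta.copy())
    return path
```

The warm start is what makes the full path cheap: each successive lambda
starts close to its solution, so the inner loop converges in a few sweeps.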
