Re: Difference between Lasso regression in MLlib package and ML package

2015-06-23 Thread DB Tsai
Please see the current version of the code for better documentation: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala Sincerely, DB Tsai -- Blog: https://www.dbtsai.com PGP

Re: Difference between Lasso regression in MLlib package and ML package

2015-06-23 Thread Wei Zhou
Hi DB Tsai, Thanks for your reply. I went through the source code of LinearRegression.scala. The algorithm minimizes the squared error L = 1/(2n) ||A weights - y||^2. I cannot match this with the elastic net loss function found at http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html, which is the
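
To make the comparison concrete, here is a minimal sketch of how the squared-error term quoted above can sit inside an elastic net objective of the kind documented for glmnet: the data term plus lambda * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||_2^2). This is an illustration only, not Spark's code. The function names, the `reg_param`/`alpha` parameter names, and the toy data are assumptions made for this example, and the intercept is left out for brevity.

```python
def squared_loss(A, y, w):
    """Data term only: 1/(2n) * ||A w - y||^2 (no intercept in this sketch)."""
    n = len(y)
    residuals = [
        sum(a_ij * w_j for a_ij, w_j in zip(row, w)) - y_i
        for row, y_i in zip(A, y)
    ]
    return sum(r * r for r in residuals) / (2.0 * n)


def elastic_net_penalty(w, reg_param, alpha):
    """reg_param * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)."""
    l1 = sum(abs(w_j) for w_j in w)
    l2 = sum(w_j * w_j for w_j in w)
    return reg_param * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def elastic_net_objective(A, y, w, reg_param, alpha):
    """Full objective: data term plus penalty. alpha=1 is lasso, alpha=0 is ridge."""
    return squared_loss(A, y, w) + elastic_net_penalty(w, reg_param, alpha)


# Toy data, chosen only so the arithmetic is easy to follow by hand.
A = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
w = [1.0, 1.0]

data_term = squared_loss(A, y, w)
lasso_objective = elastic_net_objective(A, y, w, reg_param=0.5, alpha=1.0)
ridge_objective = elastic_net_objective(A, y, w, reg_param=0.5, alpha=0.0)
```

With these toy values the data term is 0.25, the lasso objective is 1.25, and the ridge objective is 0.75, so a function that only computes the squared error accounts for just the first part of the full objective.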

Re: Difference between Lasso regression in MLlib package and ML package

2015-06-23 Thread Wei Zhou
Thanks DB Tsai, it is very helpful. Cheers, Wei 2015-06-23 16:00 GMT-07:00 DB Tsai dbt...@dbtsai.com: Please see the current version of code for better documentation. https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala

Re: Difference between Lasso regression in MLlib package and ML package

2015-06-23 Thread DB Tsai
The regularization is handled after the data part of the objective function is computed. See line 348 of https://github.com/apache/spark/blob/6a827d5d1ec520f129e42c3818fe7d0d870dcbef/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala for L2. L1 is handled by OWLQN, so you
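
The ordering described above can be sketched as follows: compute the loss and gradient of the data term first, add the L2 part afterwards, and keep the L1 weight out of the smooth function so that an OWLQN-style optimizer can apply it separately. This is a hedged illustration, not the code at the linked line. All names here (`data_loss_and_gradient`, `smooth_objective`, `reg_param`, `alpha`) are hypothetical, and the split of `reg_param` into L1 and L2 weights by `alpha` is an assumption following the usual elastic net convention.

```python
def data_loss_and_gradient(A, y, w):
    """Return 1/(2n) * ||A w - y||^2 and its gradient (1/n) * A^T (A w - y)."""
    n = len(y)
    loss = 0.0
    grad = [0.0] * len(w)
    for row, y_i in zip(A, y):
        residual = sum(a_ij * w_j for a_ij, w_j in zip(row, w)) - y_i
        loss += residual * residual
        for j, a_ij in enumerate(row):
            grad[j] += residual * a_ij
    return loss / (2.0 * n), [g / n for g in grad]


def smooth_objective(A, y, w, reg_param, alpha):
    """Data term first, then the L2 term; the L1 term is deliberately excluded."""
    loss, grad = data_loss_and_gradient(A, y, w)

    # Assumed split: alpha goes to L1, (1 - alpha) goes to L2.
    l2_weight = reg_param * (1.0 - alpha)
    l1_weight = reg_param * alpha

    # L2 is smooth, so it is simply added to the loss and gradient afterwards.
    loss += 0.5 * l2_weight * sum(w_j * w_j for w_j in w)
    grad = [g + l2_weight * w_j for g, w_j in zip(grad, w)]

    # L1 is not differentiable at zero; its weight is returned separately so
    # that the optimizer (OWLQN in the thread) can handle it.
    return loss, grad, l1_weight


# Toy data, chosen only so the arithmetic is easy to follow by hand.
A = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
w = [1.0, 1.0]

loss, grad, l1_weight = smooth_objective(A, y, w, reg_param=0.5, alpha=0.5)
```

Here the data term contributes a loss of 0.25 and a gradient of [0.0, -0.5]; adding the L2 part gives a loss of 0.5 and a gradient of [0.25, -0.25], while the L1 weight of 0.25 never enters the function the optimizer differentiates.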

Re: Difference between Lasso regression in MLlib package and ML package

2015-06-19 Thread DB Tsai
Hi Wei, I don't think ML is meant for single-node computation; the algorithms in ML are designed for the pipeline framework. In short, the lasso regression in ML is a new algorithm implemented from scratch. It is faster, and it converges to the same solution as R's glmnet while remaining scalable.

Difference between Lasso regression in MLlib package and ML package

2015-06-19 Thread Wei Zhou
Hi Spark experts, I see lasso regression / elastic net implementations under both MLlib and ML. Does anyone know what the difference between the two implementations is? At Spark Summit, one of the keynote speakers mentioned that ML is meant for single-node computation. Could anyone elaborate on this?