[jira] [Created] (SPARK-8651) Lasso with SGD not Converging properly

2015-06-25 Thread Albert Azout (JIRA)
Albert Azout created SPARK-8651:
---

 Summary: Lasso with SGD not Converging properly
 Key: SPARK-8651
 URL: https://issues.apache.org/jira/browse/SPARK-8651
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.4.0
Reporter: Albert Azout


We are having trouble getting Lasso with SGD to converge. The output weights 
are extremely large. We have tried several mini-batch fractions and still see 
the same issue.
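One possible mechanism, offered as a hedged guess rather than a diagnosis of this report, is a step size that is too large for the scale of the features. The NumPy sketch below imitates Lasso with SGD using full-batch gradient steps of size stepSize/sqrt(t), each followed by L1 soft-thresholding. This is the update MLlib's L1Updater is understood to perform, but the code is not taken from MLlib. `lasso_sgd`, its parameters, and the synthetic data are all hypothetical. On unscaled features the weights blow up. After each column is divided by its standard deviation, the same loop recovers the true weights.

```python
import numpy as np


def lasso_sgd(X, y, reg_param=0.01, step_size=1.0, num_iterations=100):
    """Full-batch proximal gradient descent for Lasso, no intercept.

    Hypothetical sketch: the step size step_size / sqrt(t) and the
    soft-thresholding by reg_param * step are modeled on MLlib's
    L1Updater, not copied from it.
    """
    w = np.zeros(X.shape[1])
    # Silence overflow warnings so divergence shows up as inf/nan weights.
    with np.errstate(over="ignore", invalid="ignore"):
        for t in range(1, num_iterations + 1):
            eta = step_size / np.sqrt(t)
            grad = X.T @ (X @ w - y) / len(y)  # mean least-squares gradient
            w = w - eta * grad                 # plain gradient step
            # L1 proximal step: shrink each weight toward zero by reg_param * eta.
            w = np.sign(w) * np.maximum(np.abs(w) - reg_param * eta, 0.0)
    return w


# Hypothetical synthetic data: two features on a 0..1000 scale.
rng = np.random.default_rng(0)
X_raw = rng.uniform(0.0, 1000.0, size=(200, 2))
true_w = np.array([1.5, -0.5])
y = X_raw @ true_w

w_raw = lasso_sgd(X_raw, y)  # diverges: entries end up non-finite

# Divide each column by its standard deviation (no centering, because the
# model has no intercept), fit, then map the weights back.
scale = X_raw.std(axis=0)
w_scaled = lasso_sgd(X_raw / scale, y) / scale  # close to true_w

print(w_raw)
print(w_scaled)
```

Soft-thresholding subtracts at most reg_param * eta from each weight per step. An oversized gradient step, however, multiplies the error, so the L1 penalty cannot hold the weights down. The sketch uses full batches and does not model the mini-batch fraction.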



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1859) Linear, Ridge and Lasso Regressions with SGD yield unexpected results

2015-06-25 Thread Albert Azout (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602388#comment-14602388 ]

Albert Azout commented on SPARK-1859:
-

Hi, this is still an open issue for us, FYI. Has there been any progress toward a resolution?

 Linear, Ridge and Lasso Regressions with SGD yield unexpected results
 -

 Key: SPARK-1859
 URL: https://issues.apache.org/jira/browse/SPARK-1859
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 0.9.1
 Environment: OS: Ubuntu Server 12.04 x64
 PySpark
Reporter: Vlad Frolov
  Labels: algorithm, machine_learning, regression

 Issue:
 Linear Regression with SGD doesn't work as expected on any data except lpsa.dat 
 (the example dataset).
 Ridge Regression with SGD *sometimes* works ok.
 Lasso Regression with SGD *sometimes* works ok.
 Code example (PySpark) based on 
 http://spark.apache.org/docs/0.9.0/mllib-guide.html#linear-regression-2 :
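 {code:title=regression_example.py}
 from numpy import array
 from pyspark.mllib.regression import LinearRegressionWithSGD

 # `sc` is the SparkContext provided by the PySpark shell.
 # Each record is [label, feature]; every label equals 1.6 * feature.
 parsedData = sc.parallelize([
     array([2400., 1500.]),
     array([240., 150.]),
     array([24., 15.]),
     array([2.4, 1.5]),
     array([0.24, 0.15])
 ])
 # Build the model
 model = LinearRegressionWithSGD.train(parsedData)
 print(model._coeffs)
 {code}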
 So the data lie on a line here: {{f(X) = 1.6 * X}}. Fortunately, {{f(X) = X}} works! :)
 The resulting model has NaN coefficients: {{array([ nan])}}.
 Furthermore, if you comment out the records one at a time, you get:
 * [-1.55897475e+296] coeff (first record commented out),
 * [-8.62115396e+104] coeff (first two records commented out),
 * etc.
 It looks like the implemented regression algorithms diverge somehow.
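The blow-up can be worked out by hand for this one-feature, no-intercept data set. The derivation assumes full-batch gradient steps whose size decays as eta/sqrt(t), starting from a weight of 0. This is the form MLlib's SimpleUpdater is understood to use. The default step eta = 1 and the 100 iterations are also assumptions. Because every label is 1.6 times its feature, the error e_t = w_t - 1.6 follows a simple recursion:

```latex
\begin{aligned}
w_t &= w_{t-1} - \eta_t \, \frac{1}{n} \sum_{i=1}^{n} \left( w_{t-1} x_i - y_i \right) x_i,
  \qquad \eta_t = \frac{\eta}{\sqrt{t}}, \quad y_i = 1.6\, x_i \\
e_t &= w_t - 1.6 = \left( 1 - \eta_t \, \overline{x^2} \right) e_{t-1},
  \qquad \overline{x^2} = \frac{1}{n} \sum_{i=1}^{n} x_i^2, \quad e_0 = -1.6
\end{aligned}
```

The error shrinks only when |1 - eta_t * mean(x^2)| < 1, that is, when eta_t < 2 / mean(x^2). For all five records mean(x^2) is about 4.5e5, so even at t = 100 (eta_t = 0.1) each step multiplies the error by roughly 4.5e4. Under these assumptions the weight overflows to infinity and then becomes NaN within the first 100 steps.

Dropping the first record gives mean(x^2) of about 5.7e3, and dropping the first two gives about 76. Both cases still diverge, only more slowly, which fits the finite but enormous coefficients reported. The factor is negative at every step, so the error flips sign each iteration. After an even number of steps starting from e_0 = -1.6, the weight is therefore negative, as in both reported values.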
 I get almost the same results with Ridge and Lasso.
 I've also tested these inputs in scikit-learn, and it works as expected there.
 However, I'm still not sure whether this is a bug or an SGD 'feature'. Should I preprocess my datasets somehow?
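The divergence can be checked numerically on the reported records without Spark. In the NumPy sketch below, `sgd_fit` is a hypothetical helper that assumes full-batch steps of size stepSize/sqrt(t), a starting weight of 0, no intercept, and 100 iterations. These settings are modeled on MLlib's defaults, not taken from its code. The sketch reproduces the reported pattern: a non-finite weight for all five records, and huge negative weights when the first one or two records are dropped. It then shows that rescaling the feature, one common kind of preprocessing, lets the same loop recover 1.6.

```python
import numpy as np


def sgd_fit(x, y, step_size=1.0, num_iterations=100):
    """Full-batch gradient descent for y ~ w * x (one feature, no intercept).

    Hypothetical helper. Assumed update, modeled on MLlib's SimpleUpdater:
        w <- w - (step_size / sqrt(t)) * mean((w * x - y) * x)
    Overflow warnings are silenced so divergence shows up as a huge or
    non-finite weight instead of a RuntimeWarning.
    """
    w = 0.0
    with np.errstate(over="ignore", invalid="ignore"):
        for t in range(1, num_iterations + 1):
            grad = np.mean((w * x - y) * x)
            w = w - (step_size / np.sqrt(t)) * grad
    return w


# Records from the report: feature x, label y = 1.6 * x.
x = np.array([1500.0, 150.0, 15.0, 1.5, 0.15])
y = np.array([2400.0, 240.0, 24.0, 2.4, 0.24])

w_all = sgd_fit(x, y)            # non-finite
w_drop1 = sgd_fit(x[1:], y[1:])  # finite, but enormous and negative
w_drop2 = sgd_fit(x[2:], y[2:])  # finite, but enormous and negative

# Rescale the feature so that mean(x_s ** 2) == 1, fit, then undo the scaling.
rms = np.sqrt(np.mean(x ** 2))
w_scaled = sgd_fit(x / rms, y) / rms  # close to 1.6

print(w_all, w_drop1, w_drop2, w_scaled)
```

The exact magnitudes need not match Spark's output, because MLlib's implementation details may differ from this sketch. Within Spark, newer PySpark releases provide `StandardScaler` in `pyspark.mllib.feature` for this kind of rescaling. Passing a smaller `step` to `train` is another option, at the cost of slower convergence.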


