[ https://issues.apache.org/jira/browse/SPARK-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602388#comment-14602388 ]
Albert Azout commented on SPARK-1859:
-------------------------------------

Hi, this is still an open issue for us, FYI. Are there any new resolutions on this?

> Linear, Ridge and Lasso Regressions with SGD yield unexpected results
> ---------------------------------------------------------------------
>
>                 Key: SPARK-1859
>                 URL: https://issues.apache.org/jira/browse/SPARK-1859
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 0.9.1
>         Environment: OS: Ubuntu Server 12.04 x64
>                      PySpark
>            Reporter: Vlad Frolov
>              Labels: algorithm, machine_learning, regression
>
> Issue:
> Linear Regression with SGD doesn't work as expected on any data except lpsa.dat (the example dataset).
> Ridge Regression with SGD *sometimes* works ok.
> Lasso Regression with SGD *sometimes* works ok.
>
> Code example (PySpark) based on http://spark.apache.org/docs/0.9.0/mllib-guide.html#linear-regression-2 :
> {code:title=regression_example.py}
> parsedData = sc.parallelize([
>     array([2400., 1500.]),
>     array([240., 150.]),
>     array([24., 15.]),
>     array([2.4, 1.5]),
>     array([0.24, 0.15])
> ])
> # Build the model
> model = LinearRegressionWithSGD.train(parsedData)
> print model._coeffs
> {code}
>
> So we have a line ({{f(X) = 1.6 * X}}) here. Fortunately, {{f(X) = X}} works! :)
> The resulting model has nan coeffs: {{array([ nan])}}.
> Furthermore, if you comment out the records one by one you will get:
> * [-1.55897475e+296] coeff (the first record is commented out),
> * [-8.62115396e+104] coeff (the first two records are commented out),
> * etc.
> It looks like the implemented regression algorithms diverge somehow.
> I get almost the same results on Ridge and Lasso.
> I've also tested these inputs in scikit-learn and it works as expected there.
> However, I'm still not sure whether it's a bug or an SGD 'feature'. Should I preprocess my datasets somehow?
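
Regarding the preprocessing question at the end of the report: the following is a minimal NumPy sketch, not MLlib's actual code, of why gradient descent with a step size of 1.0 can blow up on unscaled features like these. It assumes the first value in each record is the label and the second is the single feature (consistent with {{f(X) = 1.6 * X}}), full-batch updates, no intercept, and a step that decays as step / sqrt(t). The function name `gradient_descent`, the iteration count, and the max-abs scaling are illustrative choices only.

```python
import numpy as np

def gradient_descent(x, y, step=1.0, iterations=1000):
    """Full-batch gradient descent for y ~ w * x (one feature, no intercept).

    Assumption of this sketch: the step shrinks as step / sqrt(t).
    """
    w = 0.0
    for t in range(1, iterations + 1):
        # Gradient of the mean squared error (up to a constant factor).
        grad = np.mean((w * x - y) * x)
        w -= step / np.sqrt(t) * grad
    return w

# Data from the report: label first, feature second.
y = np.array([2400., 240., 24., 2.4, 0.24])
x = np.array([1500., 150., 15., 1.5, 0.15])

# Raw features: the updates overshoot more on every step and overflow.
with np.errstate(over="ignore", invalid="ignore"):
    w_raw = gradient_descent(x, y)

# Scale the feature to [0, 1] by its largest magnitude, fit, then map the
# weight back to the original units.
scale = np.abs(x).max()
w_scaled = gradient_descent(x / scale, y) / scale

print("raw:   ", w_raw)
print("scaled:", w_scaled)
```

With the raw values, mean(x^2) is roughly 4.5e5, so each update multiplies the error by a factor far larger than 1 in magnitude and the weight stops being finite. After scaling, the same loop settles near 1.6. Lowering the step size is another way to get the same effect.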