[jira] [Updated] (SPARK-11920) ML LinearRegression should use correct dataset in examples and user guide doc

Yanbo Liang (JIRA) Mon, 23 Nov 2015 01:11:47 -0800

     [ https://issues.apache.org/jira/browse/SPARK-11920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Yanbo Liang updated SPARK-11920:
--------------------------------
    Description: 
ML LinearRegression uses data/mllib/sample_libsvm_data.txt as the dataset in 
its examples and user guide doc, but this is actually a classification dataset 
rather than a regression dataset. We should use 
data/mllib/sample_linear_regression_data.txt instead.
The deeper cause is that LinearRegression with the "normal" solver cannot solve 
this dataset correctly, possibly because the data is ill-conditioned and the 
labels are unreasonable for regression. This issue has been reported at 
SPARK-11918.
So we should make this change in the examples and user guide, so that they 
clearly illustrate the usage of the LinearRegression algorithm.

  was:
ML LinearRegression use data/mllib/sample_libsvm_data.txt as dataset in 
examples and user guide doc, but it's actually classification dataset rather 
than regression dataset. We should use 
data/mllib/sample_linear_regression_data.txt instead.
The deeper level reason is that LinearRegression with "normal" solver can not 
solve this dataset correctly, may be due to the ill condition and unreasonable 
label. This issue has been reported at SPARK-11918.
So we should make this change in examples and user guides, that can clearly 
illustrate the usage of LinearRegression algorithm.


> ML LinearRegression should use correct dataset in examples and user guide doc
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-11920
>                 URL: https://issues.apache.org/jira/browse/SPARK-11920
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, ML
>            Reporter: Yanbo Liang
>            Priority: Minor
>
> ML LinearRegression uses data/mllib/sample_libsvm_data.txt as the dataset in 
> its examples and user guide doc, but this is actually a classification 
> dataset rather than a regression dataset. We should use 
> data/mllib/sample_linear_regression_data.txt instead.
> The deeper cause is that LinearRegression with the "normal" solver cannot 
> solve this dataset correctly, possibly because the data is ill-conditioned 
> and the labels are unreasonable for regression. This issue has been reported 
> at SPARK-11918.
> So we should make this change in the examples and user guide, so that they 
> clearly illustrate the usage of the LinearRegression algorithm.
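
As a rough illustration of the distinction drawn above, the sketch below parses 
LIBSVM-format lines ("<label> <index>:<value> ...") and flags a dataset as 
classification-like when it has very few distinct labels. The function names, 
the threshold, and the sample lines are all hypothetical and made up for this 
example; they are not taken from the Spark data files or from any Spark API.

```python
def parse_libsvm_line(line):
    """Parse one LIBSVM-format line into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def looks_like_classification(lines, max_classes=2):
    """Heuristic (an assumption, not a Spark check): a dataset whose labels
    take at most `max_classes` distinct values is probably meant for
    classification rather than regression."""
    labels = {parse_libsvm_line(line)[0] for line in lines if line.strip()}
    return len(labels) <= max_classes


# Made-up sample lines, for illustration only.
classification_like = [
    "0 128:51 129:159",
    "1 159:124 160:253",
    "1 125:145 126:255",
]
regression_like = [
    "-9.49 1:0.45 2:0.36",
    "0.25 1:-0.83 2:0.18",
    "4.71 1:0.12 2:-0.95",
]

print(looks_like_classification(classification_like))
print(looks_like_classification(regression_like))
```

A check of this kind only hints at the intended task. It says nothing about 
conditioning, which is the separate concern tracked in SPARK-11918.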



