[ 
https://issues.apache.org/jira/browse/SPARK-22433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16237884#comment-16237884
 ] 

Teng Peng commented on SPARK-22433:
-----------------------------------

Thanks for the quick response, Sean. I am glad this issue is being discussed 
in the Spark community.

I understand how important coherence is, and that it is the users' decision to 
do what they believe is appropriate. 

I just want to propose a one-line change: change eval.setMetricName("r2") to 
eval.setMetricName("mse") in test("cross validation with linear regression"). 
That way we would not leave the impression of "Wait, what? Spark officially 
cross-validates on R2?" 
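
For illustration, the proposed change might look like the sketch below. It 
assumes Spark ML 2.2 on the classpath (for example, pasted into spark-shell). 
The names trainer and paramMaps and the grid values are hypothetical stand-ins, 
not the suite's actual code; only the setMetricName("mse") call is the change 
being proposed.

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Hypothetical stand-ins for the objects built inside the test; the grid
// values are illustrative and not copied from the suite.
val trainer = new LinearRegression()
val paramMaps = new ParamGridBuilder()
  .addGrid(trainer.regParam, Array(0.001, 1000.0))
  .build()

val eval = new RegressionEvaluator()
// Proposed: select models by held-out mean squared error instead of "r2".
eval.setMetricName("mse")

val cv = new CrossValidator()
  .setEstimator(trainer)
  .setEstimatorParamMaps(paramMaps)
  .setEvaluator(eval)
  .setNumFolds(3)

// CrossValidator consults isLargerBetter, so it minimizes "mse"
// and would maximize "r2".
println(s"${eval.getMetricName}, larger is better: ${eval.isLargerBetter}")
```

Nothing else in the cross-validation setup has to change, because the 
evaluator itself reports whether a larger metric value is better.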

> Linear regression R^2 train/test terminology related 
> -----------------------------------------------------
>
>                 Key: SPARK-22433
>                 URL: https://issues.apache.org/jira/browse/SPARK-22433
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Teng Peng
>            Priority: Minor
>
> Traditional statistics is traditional statistics. Its goals, framework, and 
> terminology are not the same as those of ML. However, in the linear regression 
> related components, this distinction is not clear, as reflected in:
> 1. regressionMetric + regressionEvaluator : 
> * R2 shouldn't be there. 
> * A better name "regressionPredictionMetric".
> 2. LinearRegressionSuite: 
> * Shouldn't test R2 and residuals on test data. 
> * There is no train set and test set in this setting.
> 3. Terminology: there is no "linear regression with L1 regularization". 
> Linear regression is linear. Once a penalty term is added, it is no longer 
> linear. Just call it "LASSO" or "ElasticNet".
> There are more. I am working on correcting them.
> They are not breaking anything, but it does not feel good to see the basic 
> distinction blurred.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

