[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

Yanbo Liang (JIRA) Wed, 24 Aug 2016 00:54:37 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434412#comment-15434412
 ]


Yanbo Liang edited comment on SPARK-17163 at 8/24/16 7:54 AM:
--------------------------------------------------------------

I think it's hard to unify binary and multinomial logistic regression if we do 
not make any breaking change.
* Like [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flatten the matrix into a vector is 
still compromise, the best representation should be matrix for {{coefficients}} 
and vector for {{intercept}} even it's a binary classification problem. This 
will more or less consistent with other ML models such as {{NaiveBayesModel}} 
which is also support multi-class classification. But this will introduce big 
breaking change.
* MLOR and LOR return different result for binary classification when 
regularization is used.
* Current LOR code base provide both {{setThreshold}} and {{setThresholds}} for 
binary logistic regression and they have some interactions. If we make MLOR and 
LOR share the old LOR code base, it will also introduce breaking change for 
these APIs. FYI: SPARK-11834 and SPARK-11543.
* Model store/load compatibility.

I'm more prefer to keep LOR and MLOR for different APIs, but not very strongly 
hold my opinion if you have better proposal. Thanks!


was (Author: yanboliang):
I think it's hard to unify binary and multinomial logistic regression if we do 
not make any breaking change.
* Like [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flatten the matrix into a vector is 
still compromise, the best representation should be matrix for {{coefficients}} 
and vector for {{intercept}} even it's a binary classification problem. This 
will more or less consistent with other ML models such as {{NaiveBayesModel}} 
which is also support multi-class classification. But this will introduce big 
breaking change.
* MLOR and LOR return different result for binary classification when 
regularization is used.
* Current LOR code base provide both {{setThreshold}} and {{setThresholds}} for 
binary logistic regression and they have some interactions. If we make MLOR and 
LOR share the old LOR code base, it will also introduce breaking change for 
these APIs.
* Model store/load compatibility.

I'm more prefer to keep LOR and MLOR for different APIs, but not very strongly 
hold my opinion if you have better proposal. Thanks!

> Decide on unified multinomial and binary logistic regression interfaces
> -----------------------------------------------------------------------
>
>                 Key: SPARK-17163
>                 URL: https://issues.apache.org/jira/browse/SPARK-17163
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, MLlib
>            Reporter: Seth Hendrickson
>
> Before the 2.1 release, we should finalize the API for logistic regression. 
> After SPARK-7159, we have both LogisticRegression and 
> MultinomialLogisticRegression models. This may be confusing to users and, is 
> a bit superfluous since MLOR can do basically all of what BLOR does. We 
> should decide if it needs to be changed and implement those changes before 2.1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

Reply via email to