[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

Yanbo Liang (JIRA) Wed, 24 Aug 2016 00:52:43 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434412#comment-15434412
 ]


Yanbo Liang edited comment on SPARK-17163 at 8/24/16 7:52 AM:
--------------------------------------------------------------

I think it is hard to unify binary and multinomial logistic regression without 
making any breaking changes.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This would be more or less consistent with other ML 
models such as {{NaiveBayesModel}}, which also supports multi-class 
classification. But it would introduce a big breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and {{setThresholds}} 
for binary logistic regression, and the two interact. If we make MLOR and LOR 
share the old LOR code base, that will also introduce breaking changes for 
these APIs.
* Model store/load compatibility is another concern.
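
To make the first two points concrete, here is a minimal sketch in plain Python (not Spark code). It assumes the usual sigmoid and softmax parameterizations and an L2 penalty summed over all coefficients; the comment above does not state the cause of the LOR/MLOR difference, so the penalty comparison is only one plausible explanation. All names and numbers are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(margins):
    # Subtract the max margin for numerical stability.
    m = max(margins)
    exps = [math.exp(v - m) for v in margins]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Binary form: one coefficient vector and a scalar intercept (made-up values).
w = [0.8, -1.2, 0.5]
b = 0.3

# Multinomial form of the same classifier: a 2 x numFeatures coefficient
# matrix and a length-2 intercept vector, chosen so that row1 - row0 == w.
coef_matrix = [[-0.5 * c for c in w], [0.5 * c for c in w]]
intercepts = [-0.5 * b, 0.5 * b]

# Flattening the matrix (row-major here) gives a vector of length 2 * numFeatures.
flattened = [c for row in coef_matrix for c in row]

x = [1.0, 2.0, -1.0]
p_binary = sigmoid(dot(w, x) + b)
margins = [dot(row, x) + b_k for row, b_k in zip(coef_matrix, intercepts)]
p_multinomial = softmax(margins)[1]

# Both forms give the same P(y = 1 | x), but the L2 penalties differ:
# the multinomial form pays only half of what the binary form pays.
l2_binary = dot(w, w)
l2_multinomial = sum(dot(row, row) for row in coef_matrix)
```

Under these assumptions the two parameterizations agree on predictions but are penalized differently, so the same regularization parameter would not lead to the same fitted model.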

I would prefer to keep LOR and MLOR as different APIs, but I do not hold this 
opinion strongly if you have a better proposal. Thanks!
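
On the {{setThreshold}} / {{setThresholds}} interaction mentioned above, the following plain-Python sketch illustrates the relationship as I understand it from the Spark ML documentation: a binary threshold {{t}} corresponds to thresholds {{[1 - t, t]}}, and a length-2 thresholds array maps back to {{1 / (1 + thresholds(0) / thresholds(1))}}. The helper names are hypothetical and are not Spark APIs; the prediction rule (largest {{p / t}}) is an assumption for illustration.

```python
def thresholds_from_threshold(t):
    # Binary threshold t in (0, 1) expressed as a per-class thresholds array.
    return [1.0 - t, t]

def threshold_from_thresholds(ts):
    # Equivalent binary threshold for a length-2 thresholds array.
    if len(ts) != 2:
        raise ValueError("only defined for binary classification")
    return 1.0 / (1.0 + ts[0] / ts[1])

def predict_with_threshold(p1, t):
    # Predict class 1 when P(y = 1) exceeds the threshold.
    return 1 if p1 > t else 0

def predict_with_thresholds(probs, ts):
    # Predict the class with the largest probability-to-threshold ratio.
    scaled = [p / t for p, t in zip(probs, ts)]
    return max(range(len(scaled)), key=scaled.__getitem__)

# Example: both rules agree when the thresholds are derived from t.
t = 0.3
p1 = 0.45
same = predict_with_threshold(p1, t) == predict_with_thresholds(
    [1.0 - p1, p1], thresholds_from_threshold(t))
```

Because the two parameters encode the same decision rule in the binary case, any unified API has to decide which one takes precedence when both are set, which is where the compatibility risk comes from.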


> Decide on unified multinomial and binary logistic regression interfaces
> -----------------------------------------------------------------------
>
>                 Key: SPARK-17163
>                 URL: https://issues.apache.org/jira/browse/SPARK-17163
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, MLlib
>            Reporter: Seth Hendrickson
>
> Before the 2.1 release, we should finalize the API for logistic regression. 
> After SPARK-7159, we have both LogisticRegression and 
> MultinomialLogisticRegression models. This may be confusing to users and is 
> a bit superfluous, since MLOR can do basically everything that BLOR does. We 
> should decide whether it needs to be changed and implement those changes before 2.1.



