[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436191#comment-15436191
 ] 

Yanbo Liang edited comment on SPARK-17163 at 8/25/16 3:22 AM:
--

Exposing a {{family}} or similar parameter sounds good to me.
One question:
{quote}
When the family is set to "binomial" we produce normal logistic regression with 
pivoting and when it is set to "multinomial" (default) it produces logistic 
regression with pivoting. 
{quote}
Should it be {{when it is set to "multinomial" (default) it produces logistic 
regression {color:red}without{color} pivoting}}? Thanks!
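
For readers unfamiliar with the term, the pivoting question above can be illustrated with a small sketch. It assumes "pivoting" means fixing one class's coefficients at zero as the reference class, so that a two-class model needs only one coefficient vector. The names W, b, x, sigmoid, and softmax are illustrative and are not Spark APIs. Under those assumptions, an unpivoted two-class softmax model and a pivoted sigmoid model give the same probability when the pivoted coefficients are the difference of the two unpivoted rows:

```python
import math


def sigmoid(z):
    """Logistic function used by the pivoted (binomial) form."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(margins):
    """Numerically stable softmax used by the unpivoted (multinomial) form."""
    m = max(margins)
    exps = [math.exp(v - m) for v in margins]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical unpivoted two-class model: one coefficient row and one
# intercept per class (values chosen only for illustration).
W = [[0.5, -1.0], [1.5, 0.25]]
b = [0.1, -0.3]
x = [2.0, 1.0]

# Unpivoted: probability of class 1 from the softmax over both margins.
p_softmax = softmax([dot(W[k], x) + b[k] for k in range(2)])[1]

# Pivoted: class 0 is the reference class (coefficients fixed at zero), so a
# single coefficient vector, the difference of the two rows, is enough.
w_pivot = [w1 - w0 for w0, w1 in zip(W[0], W[1])]
b_pivot = b[1] - b[0]
p_sigmoid = sigmoid(dot(w_pivot, x) + b_pivot)

print(p_softmax, p_sigmoid)
```

The two printed probabilities agree up to floating-point rounding, which is why the choice between the two forms matters for the shape of the stored coefficients rather than for unregularized predictions.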



was (Author: yanboliang):
Exposing a {{family}} or similar parameter sounds good to me.

> Decide on unified multinomial and binary logistic regression interfaces
> ---
>
> Key: SPARK-17163
> URL: https://issues.apache.org/jira/browse/SPARK-17163
> Project: Spark
> Issue Type: Sub-task
> Components: ML, MLlib
> Reporter: Seth Hendrickson
>
> Before the 2.1 release, we should finalize the API for logistic regression. 
> After SPARK-7159, we have both LogisticRegression and 
> MultinomialLogisticRegression models. This may be confusing to users and is 
> a bit superfluous, since MLOR can do basically all of what BLOR does. We 
> should decide if it needs to be changed and implement those changes before 2.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436191#comment-15436191
 ] 

Yanbo Liang edited comment on SPARK-17163 at 8/25/16 3:22 AM:
--

Exposing a {{family}} or similar parameter sounds good to me.
One more question:
{quote}
When the family is set to "binomial" we produce normal logistic regression with 
pivoting and when it is set to "multinomial" (default) it produces logistic 
regression with pivoting. 
{quote}
Should it be {{when it is set to "multinomial" (default) it produces logistic 
regression {color:red}without{color} pivoting}}? Thanks!



was (Author: yanboliang):
Exposing a {{family}} or similar parameter sounds good to me.
One question:
{quote}
When the family is set to "binomial" we produce normal logistic regression with 
pivoting and when it is set to "multinomial" (default) it produces logistic 
regression with pivoting. 
{quote}
Should it be {{when it is set to "multinomial" (default) it produces logistic 
regression {color:red}without{color} pivoting}}? Thanks!





[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436191#comment-15436191
 ] 

Yanbo Liang edited comment on SPARK-17163 at 8/25/16 3:14 AM:
--

Exposing a {{family}} or similar parameter sounds good to me.


was (Author: yanboliang):
Exposing a {{family}} or similar parameter to control pivoting sounds good to 
me.




[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434412#comment-15434412
 ] 

Yanbo Liang edited comment on SPARK-17163 at 8/25/16 3:12 AM:
--

I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be consistent with other ML models such as 
{{NaiveBayesModel}}, which also supports multi-class classification. But this 
will introduce a breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs. FYI: SPARK-11834 and SPARK-11543.
* Model store/load compatibility.

Here we have two choices: consolidate them, which will introduce a breaking 
change, or keep them separate.
-I would prefer to keep LOR and MLOR as different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!-
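
The interaction between {{setThreshold}} and {{setThresholds}} mentioned in the list above can be sketched in plain Python. This is not Spark code and the function names are hypothetical. The rules encoded here are an assumption based on the Spark ML documentation and should be checked against the version in use: with a single threshold, class 1 is predicted when its probability exceeds the threshold; with per-class thresholds, the class with the largest p/t is predicted; and a threshold p corresponds to thresholds [1-p, p].

```python
def predict_with_threshold(p1, threshold):
    """Binary rule: predict class 1 when its probability exceeds the threshold."""
    return 1 if p1 > threshold else 0


def predict_with_thresholds(probs, thresholds):
    """Multiclass rule: predict the class with the largest probability/threshold ratio."""
    scaled = [p / t for p, t in zip(probs, thresholds)]
    return max(range(len(scaled)), key=lambda k: scaled[k])


def equivalent_threshold(thresholds):
    """Single threshold equivalent to a length-2 thresholds array."""
    t0, t1 = thresholds
    return 1.0 / (1.0 + t0 / t1)


# A single threshold p and the pair [1 - p, p] should lead to the same
# predictions (ties aside), which is why the two setters cannot be treated
# as independent parameters.
threshold = 0.7
thresholds = [1 - threshold, threshold]
for p1 in [0.1, 0.3, 0.69, 0.71, 0.9]:
    by_single = predict_with_threshold(p1, threshold)
    by_pair = predict_with_thresholds([1 - p1, p1], thresholds)
    print(p1, by_single, by_pair)
```

Because the two parameters describe the same decision rule in the binary case, any consolidated class has to decide which one is authoritative when both are set, which is where the breaking-change concern comes from.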


was (Author: yanboliang):
I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be consistent with other ML models such as 
{{NaiveBayesModel}}, which also supports multi-class classification. But this 
will introduce a big breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs. FYI: SPARK-11834 and SPARK-11543.
* Model store/load compatibility.

Here we have two choices: consolidate them, which will introduce a breaking 
change, or keep them separate.
-I would prefer to keep LOR and MLOR as different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!-




[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434798#comment-15434798
 ] 

Yanbo Liang edited comment on SPARK-17163 at 8/24/16 12:12 PM:
---

Having thought more about this problem, I have changed my mind and now support 
consolidating MLOR and LOR into one, since I see there is a lot of duplicated 
code between them. I think it's worth making the breaking change; otherwise, 
maintaining both will require extra effort. Thanks!


was (Author: yanboliang):
Having thought more about this problem, I have changed my mind and now support 
consolidating MLOR and LOR into one, since I see there is a lot of duplicated 
code between them. I think it's worth making the breaking change; otherwise, 
maintaining both will require effort. Thanks!




[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434412#comment-15434412
 ] 

Yanbo Liang edited comment on SPARK-17163 at 8/24/16 12:11 PM:
---

I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be consistent with other ML models such as 
{{NaiveBayesModel}}, which also supports multi-class classification. But this 
will introduce a big breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs. FYI: SPARK-11834 and SPARK-11543.
* Model store/load compatibility.

Here we have two choices: consolidate them, which will introduce a breaking 
change, or keep them separate.
-I would prefer to keep LOR and MLOR as different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!-
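
One possible reason for the regularization difference noted in the list above, offered as an illustration rather than a description of Spark's implementation: a pivoted binary model and an unpivoted two-class model can encode the same decision function while paying different L2 penalties, so a regularized optimizer would not be expected to land on equivalent solutions. The coefficient values and the helper name l2_penalty below are hypothetical.

```python
def l2_penalty(rows):
    """Sum of squared coefficients over all rows (intercepts left out)."""
    return sum(v * v for row in rows for v in row)


# Hypothetical pivoted binary model: a single coefficient vector d.
d = [1.0, -1.25]
pivoted_penalty = l2_penalty([d])

# An unpivoted two-class model with the same decision function: any pair of
# rows whose difference is d works. The symmetric split (-d/2, d/2) is the
# pair with the smallest L2 penalty.
W_sym = [[-v / 2 for v in d], [v / 2 for v in d]]
unpivoted_penalty = l2_penalty(W_sym)

# Same row difference, hence the same unregularized predictions ...
row_difference = [w1 - w0 for w0, w1 in zip(W_sym[0], W_sym[1])]

# ... but a different penalty term in the regularized objective.
print(row_difference, pivoted_penalty, unpivoted_penalty)
```

In this sketch the unpivoted penalty is half of the pivoted one for the same predictions, so the same regularization strength shrinks the two parameterizations differently.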


was (Author: yanboliang):
I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be more or less consistent with other ML 
models such as {{NaiveBayesModel}}, which also supports multi-class 
classification. But this will introduce a big breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs. FYI: SPARK-11834 and SPARK-11543.
* Model store/load compatibility.

Here we have two choices: consolidate them, which will introduce a breaking 
change, or keep them separate.
-I would prefer to keep LOR and MLOR as different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!-




[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434412#comment-15434412
 ] 

Yanbo Liang edited comment on SPARK-17163 at 8/24/16 12:10 PM:
---

I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be more or less consistent with other ML 
models such as {{NaiveBayesModel}}, which also supports multi-class 
classification. But this will introduce a big breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs. FYI: SPARK-11834 and SPARK-11543.
* Model store/load compatibility.

Here we have two choices: consolidate them, which will introduce a breaking 
change, or keep them separate.
-I would prefer to keep LOR and MLOR as different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!-


was (Author: yanboliang):
I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be more or less consistent with other ML 
models such as {{NaiveBayesModel}}, which also supports multi-class 
classification. But this will introduce a big breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs. FYI: SPARK-11834 and SPARK-11543.
* Model store/load compatibility.

I would prefer to keep LOR and MLOR as different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!




[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434412#comment-15434412
 ] 

Yanbo Liang edited comment on SPARK-17163 at 8/24/16 7:54 AM:
--

I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be more or less consistent with other ML 
models such as {{NaiveBayesModel}}, which also supports multi-class 
classification. But this will introduce a big breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs. FYI: SPARK-11834 and SPARK-11543.
* Model store/load compatibility.

I would prefer to keep LOR and MLOR as different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!


was (Author: yanboliang):
I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be more or less consistent with other ML 
models such as {{NaiveBayesModel}}, which also supports multi-class 
classification. But this will introduce a big breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs.
* Model store/load compatibility.

I would prefer to keep LOR and MLOR as different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!




[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434412#comment-15434412
 ] 

Yanbo Liang edited comment on SPARK-17163 at 8/24/16 7:52 AM:
--

I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be more or less consistent with other ML 
models such as {{NaiveBayesModel}}, which also supports multi-class 
classification. But this will introduce a big breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs.
* Model store/load compatibility.

I would prefer to keep LOR and MLOR as different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!


was (Author: yanboliang):
I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be more or less consistent with other ML 
models such as {{NaiveBayesModel}}, which also supports multi-class 
classification. But this will introduce a big breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs.
* Model store/load compatibility.

I would prefer to keep LOR and MLOR in different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!




[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434412#comment-15434412
 ] 

Yanbo Liang edited comment on SPARK-17163 at 8/24/16 7:50 AM:
--

I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be more or less consistent with other ML 
models such as {{NaiveBayesModel}}, which also supports multi-class 
classification. But this will introduce a big breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs.
* Model store/load compatibility.

I would prefer to keep LOR and MLOR in different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!


was (Author: yanboliang):
I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be consistent with other ML models such as 
{{NaiveBayesModel}}, which also supports multi-class classification. But this 
will introduce a big breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs.
* Model store/load compatibility.

I would prefer to keep LOR and MLOR in different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!




[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434412#comment-15434412
 ] 

Yanbo Liang edited comment on SPARK-17163 at 8/24/16 7:49 AM:
--

I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be consistent with other ML models such as 
{{NaiveBayesModel}}, which also supports multi-class classification. But this 
will introduce a big breaking change.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs.
* Model store/load compatibility.

I would prefer to keep LOR and MLOR in different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!


was (Author: yanboliang):
I think it's hard to unify binary and multinomial logistic regression without 
making a breaking change.
* As [~sethah] said, we need to find a way to unify the representation of 
{{coefficients}} and {{intercept}}. I think flattening the matrix into a vector 
is still a compromise; the best representation would be a matrix for 
{{coefficients}} and a vector for {{intercept}}, even for a binary 
classification problem. This will be consistent with other ML models such as 
{{NaiveBayesModel}}, which also supports multi-class classification.
* MLOR and LOR return different results for binary classification when 
regularization is used.
* The current LOR code base provides both {{setThreshold}} and 
{{setThresholds}} for binary logistic regression, and they interact with each 
other. If we make MLOR and LOR share the old LOR code base, it will also 
introduce a breaking change for these APIs.
* Model store/load compatibility.

I would prefer to keep LOR and MLOR in different APIs, but I don't hold this 
opinion strongly if you have a better proposal. Thanks!
