[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-14 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17910#discussion_r116408136
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
 ---
@@ -2318,8 +2319,8 @@ class LogisticRegressionSuite
   assert(m1.interceptVector ~== m2.interceptVector absTol 0.05)
 }
 val testParams = Seq(
-  ("binomial", smallBinaryDataset, 2),
-  ("multinomial", smallMultinomialDataset, 3)
+  ("Binomial", smallBinaryDataset, 2),
--- End diff --

The changes you made don't address this comment at all, and there are not 
tests for the suggestion from Yanbo either.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-10 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/17910#discussion_r115896918
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
 ---
@@ -2318,8 +2319,8 @@ class LogisticRegressionSuite
   assert(m1.interceptVector ~== m2.interceptVector absTol 0.05)
 }
 val testParams = Seq(
-  ("binomial", smallBinaryDataset, 2),
-  ("multinomial", smallMultinomialDataset, 3)
+  ("Binomial", smallBinaryDataset, 2),
--- End diff --

@sethah OK, I will add dedicated tests for this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-10 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17910#discussion_r115813053
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
 ---
@@ -2318,8 +2319,8 @@ class LogisticRegressionSuite
   assert(m1.interceptVector ~== m2.interceptVector absTol 0.05)
 }
 val testParams = Seq(
-  ("binomial", smallBinaryDataset, 2),
-  ("multinomial", smallMultinomialDataset, 3)
+  ("Binomial", smallBinaryDataset, 2),
--- End diff --

What about "binomial" and "BiNoMiaL"? Also, what about doing:

scala
lr.setFamily("BiNomial")
assert(lr.getFamily === "binomial")


Also, I'm not a big fan of sprinkling these very subtle "tests" around the 
test suite. We should have dedicated tests of this functionality. Otherwise, 
how should future developers know that the capital "B" here is supposed to be 
there and is actually testing some desired functionality?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-10 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/17910#discussion_r115669376
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -526,7 +526,7 @@ class LogisticRegression @Since("1.2.0") (
   case None => histogram.length
 }
 
-val isMultinomial = $(family) match {
+val isMultinomial = $(family).toLowerCase(Locale.ROOT) match {
--- End diff --

good point.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-09 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17910#discussion_r115453085
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -890,7 +890,7 @@ object LogisticRegression extends 
DefaultParamsReadable[LogisticRegression] {
   override def load(path: String): LogisticRegression = super.load(path)
 
   private[classification] val supportedFamilyNames =
-Array("auto", "binomial", 
"multinomial").map(_.toLowerCase(Locale.ROOT))
+Array("auto", "binomial", "multinomial")
--- End diff --

+1 @hhbyyh Let's keep it to handle some special locale.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-09 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17910#discussion_r115451397
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -526,7 +526,7 @@ class LogisticRegression @Since("1.2.0") (
   case None => histogram.length
 }
 
-val isMultinomial = $(family) match {
+val isMultinomial = $(family).toLowerCase(Locale.ROOT) match {
--- End diff --

+1 @zhengruifeng. I'd like to put the original value in param, since users 
may compare the param value with the original input as following:
```
val family = "Binomial"
val lr = new LogisticRegression().setFamily(family)
val model = lr.fit(dataset)
...
if (family == lr.getFamily) 
  println("A")
else
  println("B")
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-08 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/17910#discussion_r115418289
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -526,7 +526,7 @@ class LogisticRegression @Since("1.2.0") (
   case None => histogram.length
 }
 
-val isMultinomial = $(family) match {
+val isMultinomial = $(family).toLowerCase(Locale.ROOT) match {
--- End diff --

I follow the style in `GeneralizedLinearRegression`.
Lower the param in setter can simplify the codes, but it also change the 
output of coresponding getter. What is your opinion? @yanboliang 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-08 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/17910#discussion_r115418315
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -890,7 +890,7 @@ object LogisticRegression extends 
DefaultParamsReadable[LogisticRegression] {
   override def load(path: String): LogisticRegression = super.load(path)
 
   private[classification] val supportedFamilyNames =
-Array("auto", "binomial", 
"multinomial").map(_.toLowerCase(Locale.ROOT))
+Array("auto", "binomial", "multinomial")
--- End diff --

I am not sure about this. If we should keep `toLowerCase` here, we may also 
do this in `GeneralizedLinearRegression` and others


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-08 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/17910#discussion_r115416085
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -526,7 +526,7 @@ class LogisticRegression @Since("1.2.0") (
   case None => histogram.length
 }
 
-val isMultinomial = $(family) match {
+val isMultinomial = $(family).toLowerCase(Locale.ROOT) match {
--- End diff --

As a general practice, I would recommend moving the 
`.toLowerCase(Locale.ROOT)` into the setter. Then we don't need to invoke the 
`.toLowerCase(Locale.ROOT)` multiple times in the code. (here it happens to be 
once). And we can always assume the $(family) has predictable values in the 
code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-08 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/17910#discussion_r115416204
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -890,7 +890,7 @@ object LogisticRegression extends 
DefaultParamsReadable[LogisticRegression] {
   override def load(path: String): LogisticRegression = super.load(path)
 
   private[classification] val supportedFamilyNames =
-Array("auto", "binomial", 
"multinomial").map(_.toLowerCase(Locale.ROOT))
+Array("auto", "binomial", "multinomial")
--- End diff --

We may need to be careful to remove the map. Since Locale.Root can be some 
special case. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-08 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/17910

[SPARK-20669][ML] LogisticRegression family should be case insensitive 

## What changes were proposed in this pull request?
make param `family` case insensitive 

## How was this patch tested?
updated tests


@yanboliang 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark lr_family_lowercase

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17910.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17910


commit 33c0f9e52c239a6067a535be9c0ce19772d32aef
Author: Zheng RuiFeng 
Date:   2017-05-09T05:43:13Z

create pr




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org