[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12467


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-13 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-218967787
  
Merged to master and branch-2.0. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-12 Thread BenFradet
Github user BenFradet commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-218854678
  
pinging @MLnick 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-218829934
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-218829938
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58503/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-218829747
  
**[Test build #58503 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58503/consoleFull)**
 for PR 12467 at commit 
[`3786ef9`](https://github.com/apache/spark/commit/3786ef9b382c8f1942bb1d81812586b7c06cfb5c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-218819415
  
**[Test build #58503 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58503/consoleFull)**
 for PR 12467 at commit 
[`3786ef9`](https://github.com/apache/spark/commit/3786ef9b382c8f1942bb1d81812586b7c06cfb5c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-218817929
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58502/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-218817889
  
**[Test build #58502 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58502/consoleFull)**
 for PR 12467 at commit 
[`23f80ee`](https://github.com/apache/spark/commit/23f80eeb1d64e8701daf30c499aae0fbc2739d40).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-218817926
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-218815288
  
**[Test build #58502 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58502/consoleFull)**
 for PR 12467 at commit 
[`23f80ee`](https://github.com/apache/spark/commit/23f80eeb1d64e8701daf30c499aae0fbc2739d40).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-12 Thread BenFradet
Github user BenFradet commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-218744614
  
@yanboliang thanks a lot!
Will rebase soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-12 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-218743714
  
Sorry for late response. This LGTM after the conflicts be resolved. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-12 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r63011688
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -290,4 +290,23 @@ class RFormulaSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
 val newModel = testDefaultReadWrite(model)
 checkModelData(model, newModel)
   }
+
+  test("should support all NumericType labels") {
+val formula = new RFormula().setFormula("label ~ features")
+  .setLabelCol("x")
+  .setFeaturesCol("y")
+val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext)
+val expected = formula.fit(dfs(DoubleType))
+val actuals = dfs.keys.filter(_ != DoubleType).map(t => 
formula.fit(dfs(t)))
+actuals.foreach { actual =>
+  assert(expected.pipelineModel.stages.length === 
actual.pipelineModel.stages.length)
+  
expected.pipelineModel.stages.zip(actual.pipelineModel.stages).foreach {
+case (exTransformer, acTransformer) =>
+  assert(exTransformer.params === acTransformer.params)
+  }
+  assert(expected.resolvedFormula.label === 
actual.resolvedFormula.label)
+  assert(expected.resolvedFormula.terms === 
actual.resolvedFormula.terms)
+  assert(expected.resolvedFormula.hasIntercept === 
actual.resolvedFormula.hasIntercept)
+}
--- End diff --

Yes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-05-12 Thread BenFradet
Github user BenFradet commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-218730580
  
Ping @yanboliang


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-29 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r61649874
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -290,4 +290,23 @@ class RFormulaSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
 val newModel = testDefaultReadWrite(model)
 checkModelData(model, newModel)
   }
+
+  test("should support all NumericType labels") {
+val formula = new RFormula().setFormula("label ~ features")
+  .setLabelCol("x")
+  .setFeaturesCol("y")
+val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext)
+val expected = formula.fit(dfs(DoubleType))
+val actuals = dfs.keys.filter(_ != DoubleType).map(t => 
formula.fit(dfs(t)))
+actuals.foreach { actual =>
+  assert(expected.pipelineModel.stages.length === 
actual.pipelineModel.stages.length)
+  
expected.pipelineModel.stages.zip(actual.pipelineModel.stages).foreach {
+case (exTransformer, acTransformer) =>
+  assert(exTransformer.params === acTransformer.params)
+  }
+  assert(expected.resolvedFormula.label === 
actual.resolvedFormula.label)
+  assert(expected.resolvedFormula.terms === 
actual.resolvedFormula.terms)
+  assert(expected.resolvedFormula.hasIntercept === 
actual.resolvedFormula.hasIntercept)
+}
--- End diff --

@yanboliang is this what you had in mind?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-215886842
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57359/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-215886840
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-215886734
  
**[Test build #57359 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57359/consoleFull)**
 for PR 12467 at commit 
[`79b0f9d`](https://github.com/apache/spark/commit/79b0f9d201f560a203b69630399dd348f62aaae6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-215878828
  
**[Test build #57359 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57359/consoleFull)**
 for PR 12467 at commit 
[`79b0f9d`](https://github.com/apache/spark/commit/79b0f9d201f560a203b69630399dd348f62aaae6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-28 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r61384769
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -290,4 +291,18 @@ class RFormulaSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
 val newModel = testDefaultReadWrite(model)
 checkModelData(model, newModel)
   }
+
+  test("should support all NumericType labels") {
--- End diff --

Sure, thanks for your input.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-27 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-215308986
  
Look good overall, I have my last inline comment. After that, it should be 
ready to go.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r61372578
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -290,4 +291,18 @@ class RFormulaSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
 val newModel = testDefaultReadWrite(model)
 checkModelData(model, newModel)
   }
+
+  test("should support all NumericType labels") {
--- End diff --

@BenFradet Sorry for late response. 
FYI: 
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/r/RWrappers.scala#L44


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-27 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-215190221
  
This LGTM. @yanboliang anything further?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r60735190
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala 
---
@@ -254,8 +254,8 @@ class RFormulaModel private[feature](
 val columnNames = schema.map(_.name)
 require(!columnNames.contains($(featuresCol)), "Features column 
already exists.")
 require(
-  !columnNames.contains($(labelCol)) || schema($(labelCol)).dataType 
== DoubleType,
-  "Label column already exists and is not of type DoubleType.")
+  !columnNames.contains($(labelCol)) || 
schema($(labelCol)).dataType.isInstanceOf[NumericType],
--- End diff --

@MLnick The code in master is right.
For the first example:
```
scala> val original = sqlContext.createDataFrame(Seq((0, 1), (2, 
2))).toDF("x", "y")
original: org.apache.spark.sql.DataFrame = [x: int, y: int]

scala> formula.fit(original).transform(original).show
+---+---++-+
|  x|  y|features|label|
+---+---++-+
|  0|  1|   [0.0]|  1.0|
|  2|  2|   [2.0]|  2.0|
+---+---++-+
```
It confirms with the design of ```RFormula``` which transform dataset with 
column names ```a, b``` to two columns: ```label``` and ```features```.
For the second one:
It make sense that L244 catch this exception. In this example, we can not 
transform ```y``` whose type is ```Seq``` to ```label```, so throw exception 
inside of function ```transformLabel``` is reasonable.
In ```checkCanTransform``` we only have two kinds of guarantee:
#1 Label column not exists.
#2 If label column exists, it must be type of ```Double/Numeric```. If we 
are in this situation, we will fall into branch ```if 
(hasLabelCol(dataset.schema))``` of ```transformLabel```.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-22 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r60730417
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala 
---
@@ -254,8 +254,8 @@ class RFormulaModel private[feature](
 val columnNames = schema.map(_.name)
 require(!columnNames.contains($(featuresCol)), "Features column 
already exists.")
 require(
-  !columnNames.contains($(labelCol)) || schema($(labelCol)).dataType 
== DoubleType,
-  "Label column already exists and is not of type DoubleType.")
+  !columnNames.contains($(labelCol)) || 
schema($(labelCol)).dataType.isInstanceOf[NumericType],
--- End diff --

ah I see now, never mind.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-22 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r60729785
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala 
---
@@ -254,8 +254,8 @@ class RFormulaModel private[feature](
 val columnNames = schema.map(_.name)
 require(!columnNames.contains($(featuresCol)), "Features column 
already exists.")
 require(
-  !columnNames.contains($(labelCol)) || schema($(labelCol)).dataType 
== DoubleType,
-  "Label column already exists and is not of type DoubleType.")
+  !columnNames.contains($(labelCol)) || 
schema($(labelCol)).dataType.isInstanceOf[NumericType],
--- End diff --

e.g. before this PR this works (and I don't believe it's supposed to?).
```
scala> val original = sqlContext.createDataFrame(Seq((0, 1), (2, 
2))).toDF("x", "y")
original: org.apache.spark.sql.DataFrame = [x: int, y: int]

scala> formula.fit(original).transform(original).show
+---+---++-+
|  x|  y|features|label|
+---+---++-+
|  0|  1|   [0.0]|  1.0|
|  2|  2|   [2.0]|  2.0|
+---+---++-+
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r60729720
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala 
---
@@ -254,8 +254,8 @@ class RFormulaModel private[feature](
 val columnNames = schema.map(_.name)
 require(!columnNames.contains($(featuresCol)), "Features column 
already exists.")
 require(
-  !columnNames.contains($(labelCol)) || schema($(labelCol)).dataType 
== DoubleType,
-  "Label column already exists and is not of type DoubleType.")
+  !columnNames.contains($(labelCol)) || 
schema($(labelCol)).dataType.isInstanceOf[NumericType],
--- End diff --

+1 @BenFradet It should be ```||```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-22 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r60729110
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala 
---
@@ -254,8 +254,8 @@ class RFormulaModel private[feature](
 val columnNames = schema.map(_.name)
 require(!columnNames.contains($(featuresCol)), "Features column 
already exists.")
 require(
-  !columnNames.contains($(labelCol)) || schema($(labelCol)).dataType 
== DoubleType,
-  "Label column already exists and is not of type DoubleType.")
+  !columnNames.contains($(labelCol)) || 
schema($(labelCol)).dataType.isInstanceOf[NumericType],
--- End diff --

I don't think so no. What do you think @yanboliang ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-22 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r60727908
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala 
---
@@ -254,8 +254,8 @@ class RFormulaModel private[feature](
 val columnNames = schema.map(_.name)
 require(!columnNames.contains($(featuresCol)), "Features column 
already exists.")
 require(
-  !columnNames.contains($(labelCol)) || schema($(labelCol)).dataType 
== DoubleType,
-  "Label column already exists and is not of type DoubleType.")
+  !columnNames.contains($(labelCol)) || 
schema($(labelCol)).dataType.isInstanceOf[NumericType],
--- End diff --

Should the `||` not be `&&`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-212122934
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-212122941
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56258/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-212122615
  
**[Test build #56258 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56258/consoleFull)**
 for PR 12467 at commit 
[`4cb27cf`](https://github.com/apache/spark/commit/4cb27cfe331eb6cc985aaef5c8187d0419dd5474).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-212108645
  
**[Test build #56258 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56258/consoleFull)**
 for PR 12467 at commit 
[`4cb27cf`](https://github.com/apache/spark/commit/4cb27cfe331eb6cc985aaef5c8187d0419dd5474).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-19 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r60300681
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -290,4 +291,18 @@ class RFormulaSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
 val newModel = testDefaultReadWrite(model)
 checkModelData(model, newModel)
   }
+
+  test("should support all NumericType labels") {
--- End diff --

After reviewing the suite, I don't think the same tests apply since 
RFormula also accepts string labels.

Consequently, I think it's best as is.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-19 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r60255560
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -290,4 +291,18 @@ class RFormulaSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
 val newModel = testDefaultReadWrite(model)
 checkModelData(model, newModel)
   }
+
+  test("should support all NumericType labels") {
--- End diff --

Or simply just:

```scala
val schema = dataset.schema
SchemaUtils.checkNumericType(schema, $(labelCol))
```

at the beginning of RFormula's `fit` method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-19 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r60250070
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -290,4 +291,18 @@ class RFormulaSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
 val newModel = testDefaultReadWrite(model)
 checkModelData(model, newModel)
   }
+
+  test("should support all NumericType labels") {
--- End diff --

It'd work expect for the expected exception when dealing with a dataframe 
containing string labels because the label column gets indexed by RFormula's 
`fit`.
Consequently, an exception is thrown by `StringIndexer`.

What I could do is add a `validateSchema` to RFormula (called at the 
beginiing of the `fit` method) checking that the label column is of numeric 
type, then I could use `MLTestingUtils.checkNumericTypes`.

What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-19 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12467#discussion_r60245818
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -290,4 +291,18 @@ class RFormulaSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
 val newModel = testDefaultReadWrite(model)
 checkModelData(model, newModel)
   }
+
+  test("should support all NumericType labels") {
--- End diff --

Can we use ```MLTestingUtils.checkNumericTypes``` to test this? It will 
eliminate some redundant code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-211898854
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-211898857
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56220/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-211898483
  
**[Test build #56220 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56220/consoleFull)**
 for PR 12467 at commit 
[`c9edad0`](https://github.com/apache/spark/commit/c9edad04113f2c75980b2cfc2f74128f7f5259ad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-211876908
  
**[Test build #56220 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56220/consoleFull)**
 for PR 12467 at commit 
[`c9edad0`](https://github.com/apache/spark/commit/c9edad04113f2c75980b2cfc2f74128f7f5259ad).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-211516223
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-211516230
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56068/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-211516011
  
**[Test build #56068 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56068/consoleFull)**
 for PR 12467 at commit 
[`f05c217`](https://github.com/apache/spark/commit/f05c217e0e87a9f9758a1d380b80d1f39751fffe).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-211487432
  
**[Test build #56068 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56068/consoleFull)**
 for PR 12467 at commit 
[`f05c217`](https://github.com/apache/spark/commit/f05c217e0e87a9f9758a1d380b80d1f39751fffe).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-18 Thread BenFradet
Github user BenFradet commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-211467763
  
@yanboliang yup, no problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-18 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-211466549
  
@BenFradet I'm sorry that I did not notice the #10355 has been merged. 
Please ignore my last comments because it's not valid after #10355. Would you 
mind to add RFormula support back? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-18 Thread BenFradet
Github user BenFradet commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-211457885
  
@yanboliang thanks for your input, I reverted the affected commits


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-18 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/12467#issuecomment-211451163
  
@BenFradet Thanks for this PR. I think we can not make ```RFormula``` 
supporting other numeric types for label as your proposal. If the label column 
already exists, it must be type of DoubleType, otherwise it will cause the 
downstream model can not recognize the label column when 
```validateAndTransformSchema```.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13961][ML] spark.ml ChiSqSelector and R...

2016-04-18 Thread BenFradet
GitHub user BenFradet opened a pull request:

https://github.com/apache/spark/pull/12467

[SPARK-13961][ML] spark.ml ChiSqSelector and RFormula should support other 
numeric types for label

## What changes were proposed in this pull request?

Made ChiSqSelector and RFormula accept all numeric types for label

## How was this patch tested?

Unit tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BenFradet/spark SPARK-13961

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12467.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12467


commit ac7db9ec38e72655b8312dfbd8e39b9c7cf84520
Author: BenFradet 
Date:   2016-04-06T16:53:08Z

ChiSqSelector now accepts numeric labels

commit cd62e84fb54471edbf88a369bde8e51398894450
Author: BenFradet 
Date:   2016-04-06T18:03:46Z

spec for ChiSqSelector accepting numeric types

commit 0bc5057c14552739dcad3be0c5bd1b58e4849804
Author: BenFradet 
Date:   2016-04-18T07:04:21Z

RFormula now accepts all numeric type

commit c9edad04113f2c75980b2cfc2f74128f7f5259ad
Author: BenFradet 
Date:   2016-04-18T07:04:51Z

spec for RFormula accepting all numeric types




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org