[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...

2017-03-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17034


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...

2017-03-01 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/17034#discussion_r103864147
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala
 ---
@@ -27,6 +27,9 @@ import org.apache.spark.ml.util.TestingUtils._
 import org.apache.spark.mllib.random.{ExponentialGenerator, WeibullGenerator}
 import org.apache.spark.mllib.util.MLlibTestSparkContext
 import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.functions.{col, lit}
+import org.apache.spark.sql.types.{ByteType, DecimalType, FloatType, IntegerType, LongType,
+  ShortType}
--- End diff --

The style rule is generally to use `_` when you're importing 5 or more
things. You can revert this change, thanks!





[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...

2017-03-01 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/17034#discussion_r103858261
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala
 ---
@@ -27,6 +27,8 @@ import org.apache.spark.ml.util.TestingUtils._
 import org.apache.spark.mllib.random.{ExponentialGenerator, WeibullGenerator}
 import org.apache.spark.mllib.util.MLlibTestSparkContext
 import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
--- End diff --

Yes, I will update this. Thanks for reviewing!





[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...

2017-03-01 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/17034#discussion_r103858210
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala
 ---
@@ -361,6 +363,36 @@ class AFTSurvivalRegressionSuite
   }
   }
 
+  test("should support all NumericType censors, and not support other types") {
+    val df = spark.createDataFrame(Seq(
+      (0, Vectors.dense(0)),
+      (1, Vectors.dense(1)),
+      (2, Vectors.dense(2)),
+      (3, Vectors.dense(3)),
+      (4, Vectors.dense(4))
+    )).toDF("label", "features")
+      .withColumn("censor", lit(0.0))
+    val aft = new AFTSurvivalRegression().setMaxIter(1)
+    val expected = aft.fit(df)
+
+    val types = Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DecimalType(10, 0))
+    types.foreach { t =>
+      val actual = aft.fit(df.select(col("label"), col("features"),
+        col("censor").cast(t)))
+      assert(expected.intercept === actual.intercept)
+      assert(expected.coefficients === actual.coefficients)
+    }
+
+    val dfWithStringCensors = spark.createDataFrame(Seq(
+      (0, Vectors.dense(0, 2, 3), "0")
+    )).toDF("label", "features", "censor")
+    val thrown = intercept[IllegalArgumentException] {
--- End diff --

This follows the implementation in `MLTestingUtils.checkNumericTypes`,
so I would prefer not to change it.





[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...

2017-03-01 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17034#discussion_r103853263
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala
 ---
@@ -27,6 +27,8 @@ import org.apache.spark.ml.util.TestingUtils._
 import org.apache.spark.mllib.random.{ExponentialGenerator, WeibullGenerator}
 import org.apache.spark.mllib.util.MLlibTestSparkContext
 import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
--- End diff --

I think using `_` is discouraged for readability reasons; consider
specifying the list of types here.





[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...

2017-03-01 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17034#discussion_r103853075
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala
 ---
@@ -361,6 +363,36 @@ class AFTSurvivalRegressionSuite
   }
   }
 
+  test("should support all NumericType censors, and not support other types") {
+    val df = spark.createDataFrame(Seq(
+      (0, Vectors.dense(0)),
+      (1, Vectors.dense(1)),
+      (2, Vectors.dense(2)),
+      (3, Vectors.dense(3)),
+      (4, Vectors.dense(4))
+    )).toDF("label", "features")
+      .withColumn("censor", lit(0.0))
+    val aft = new AFTSurvivalRegression().setMaxIter(1)
+    val expected = aft.fit(df)
+
+    val types = Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DecimalType(10, 0))
+    types.foreach { t =>
+      val actual = aft.fit(df.select(col("label"), col("features"),
+        col("censor").cast(t)))
+      assert(expected.intercept === actual.intercept)
+      assert(expected.coefficients === actual.coefficients)
+    }
+
+    val dfWithStringCensors = spark.createDataFrame(Seq(
+      (0, Vectors.dense(0, 2, 3), "0")
+    )).toDF("label", "features", "censor")
+    val thrown = intercept[IllegalArgumentException] {
--- End diff --

Can you wrap this in a `withClue`?

withClue("Column censor must be of type NumericType but was actually of type StringType") {
  ...
}
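
For readers unfamiliar with the pattern, the `withClue` wrapping suggested above can be sketched outside Spark as follows. This is only an illustration, assuming ScalaTest is on the classpath: `requireNumeric` is a hypothetical stand-in for the schema check, not the code in this pull request, and the type names are plain strings rather than Spark `DataType`s.

```scala
import org.scalatest.Assertions._

object WithClueSketch {
  // Hypothetical stand-in for the check that rejects a non-numeric censor column.
  def requireNumeric(colName: String, typeName: String): Unit = {
    val numericTypes = Set("ByteType", "ShortType", "IntegerType", "LongType",
      "FloatType", "DoubleType", "DecimalType")
    // require throws IllegalArgumentException when the condition is false.
    require(numericTypes.contains(typeName),
      s"Column $colName must be of type NumericType but was actually of type $typeName")
  }

  def main(args: Array[String]): Unit = {
    val clue = "Column censor must be of type NumericType but was actually of type StringType"
    // If the inner block does not throw IllegalArgumentException, intercept fails
    // and withClue prepends the clue to that failure message.
    val thrown = withClue(clue) {
      intercept[IllegalArgumentException] {
        requireNumeric("censor", "StringType")
      }
    }
    // The intercepted exception is returned, so its message can be checked as well.
    assert(thrown.getMessage.contains(clue))
  }
}
```

The clue only matters when the test fails: it tells the reader which error was expected without having to open the test source.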





[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...

2017-02-23 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/17034#discussion_r102727229
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala
 ---
@@ -361,6 +363,36 @@ class AFTSurvivalRegressionSuite
   }
   }
 
+  test("should support all NumericType censors, and not support other types") {
+    val df = spark.createDataFrame(Seq(
+      (0, Vectors.dense(0)),
+      (1, Vectors.dense(1)),
+      (2, Vectors.dense(2)),
+      (3, Vectors.dense(3)),
+      (4, Vectors.dense(4))
+    )).toDF("label", "features")
+      .withColumn("censor", lit(0.0))
+    val aft = new AFTSurvivalRegression().setMaxIter(1)
+    val expected = aft.fit(df)
+
+    val types = Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DecimalType(10, 0))
+    types.foreach { t =>
+      val actual = aft.fit(df.select(col("label"), col("features"),
+        col("censor").cast(t)))
+      assert(expected.intercept === actual.intercept)
+      assert(expected.coefficients === actual.coefficients)
+    }
+
+    val dfWithStringCensors = spark.createDataFrame(Seq(
--- End diff --

Technically I guess this could be part of `checkNumericTypes`, similar to
the checks for the weight and label cols, but since it is specific to AFT,
this is OK.
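
As a rough standalone illustration of the cast-and-compare check discussed here, the sketch below fits the model once with a double `censor` column and again after casting it to other numeric types. It assumes a Spark build that includes this change and a local session; the object name and the toy data are made up for the example and are not taken from the test suite.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.AFTSurvivalRegression
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, IntegerType, LongType}

object NumericCensorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("numeric-censor-sketch")
      .getOrCreate()
    try {
      // Toy data (illustrative only): positive labels, censor is 1.0 or 0.0.
      val df = spark.createDataFrame(Seq(
        (1.218, 1.0, Vectors.dense(1.560, -0.605)),
        (2.949, 0.0, Vectors.dense(0.346, 2.158)),
        (3.627, 0.0, Vectors.dense(1.380, 0.231)),
        (0.273, 1.0, Vectors.dense(0.520, 1.151)),
        (4.199, 0.0, Vectors.dense(0.795, -0.226))
      )).toDF("label", "censor", "features")

      val aft = new AFTSurvivalRegression().setMaxIter(10)
      val expected = aft.fit(df)

      // Refit with the censor column cast to other numeric types and
      // check that the fitted model is unchanged.
      Seq[DataType](IntegerType, LongType).foreach { t =>
        val actual = aft.fit(df.withColumn("censor", col("censor").cast(t)))
        assert(expected.intercept == actual.intercept)
        assert(expected.coefficients == actual.coefficients)
      }
    } finally {
      spark.stop()
    }
  }
}
```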





[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...

2017-02-22 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/17034

[SPARK-19704][ML] AFTSurvivalRegression should support numeric censorCol

## What changes were proposed in this pull request?
Make `AFTSurvivalRegression` support a numeric `censorCol`.
## How was this patch tested?
Existing tests and newly added tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark aft_numeric_censor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17034.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17034





