[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17034 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17034#discussion_r103864147

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
    @@ -27,6 +27,9 @@ import org.apache.spark.ml.util.TestingUtils._
     import org.apache.spark.mllib.random.{ExponentialGenerator, WeibullGenerator}
     import org.apache.spark.mllib.util.MLlibTestSparkContext
     import org.apache.spark.sql.{DataFrame, Row}
    +import org.apache.spark.sql.functions.{col, lit}
    +import org.apache.spark.sql.types.{ByteType, DecimalType, FloatType, IntegerType, LongType,
    +  ShortType}
    --- End diff --

    The style rule is generally to use `_` when you're importing >= 5 things. You can revert it back, thanks!
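The trade-off behind the import discussion can be illustrated with a small sketch that uses only the Scala standard library. The object names `ImportStyleSketch`, `Explicit` and `Wildcard` are hypothetical and not part of the PR; the point is only that a wildcard import can change what an unqualified name refers to, which is the usual readability argument for explicit selectors.

```scala
object ImportStyleSketch {

  object Explicit {
    // Explicit selector: every imported name is visible at the import site.
    import scala.collection.mutable.{ArrayBuffer, ListBuffer}

    // Map still refers to the default immutable Map provided by Predef.
    val lookup = Map("censor" -> 0.0)
    val rows = ArrayBuffer(1, 2, 3)
    val names = ListBuffer("label", "features")
  }

  object Wildcard {
    // Wildcard import: shorter, but it also brings in mutable.Map,
    // which shadows the immutable Map from Predef in this scope.
    import scala.collection.mutable._

    val lookup = Map("censor" -> 0.0)
  }

  def main(args: Array[String]): Unit = {
    println(Explicit.lookup.isInstanceOf[scala.collection.immutable.Map[_, _]])
    println(Wildcard.lookup.isInstanceOf[scala.collection.mutable.Map[_, _]])
  }
}
```

Which form to prefer for a long list of names is a project style decision; the sketch only shows why reviewers may disagree about it.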
[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17034#discussion_r103858261

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
    @@ -27,6 +27,8 @@ import org.apache.spark.ml.util.TestingUtils._
     import org.apache.spark.mllib.random.{ExponentialGenerator, WeibullGenerator}
     import org.apache.spark.mllib.util.MLlibTestSparkContext
     import org.apache.spark.sql.{DataFrame, Row}
    +import org.apache.spark.sql.functions._
    +import org.apache.spark.sql.types._
    --- End diff --

    Yes, I will update this. Thanks for reviewing!
[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17034#discussion_r103858210

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
    @@ -361,6 +363,36 @@ class AFTSurvivalRegressionSuite
         }
       }
     
    +  test("should support all NumericType censors, and not support other types") {
    +    val df = spark.createDataFrame(Seq(
    +      (0, Vectors.dense(0)),
    +      (1, Vectors.dense(1)),
    +      (2, Vectors.dense(2)),
    +      (3, Vectors.dense(3)),
    +      (4, Vectors.dense(4))
    +    )).toDF("label", "features")
    +      .withColumn("censor", lit(0.0))
    +    val aft = new AFTSurvivalRegression().setMaxIter(1)
    +    val expected = aft.fit(df)
    +
    +    val types = Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DecimalType(10, 0))
    +    types.foreach { t =>
    +      val actual = aft.fit(df.select(col("label"), col("features"),
    +        col("censor").cast(t)))
    +      assert(expected.intercept === actual.intercept)
    +      assert(expected.coefficients === actual.coefficients)
    +    }
    +
    +    val dfWithStringCensors = spark.createDataFrame(Seq(
    +      (0, Vectors.dense(0, 2, 3), "0")
    +    )).toDF("label", "features", "censor")
    +    val thrown = intercept[IllegalArgumentException] {
    --- End diff --

    This follows the implementation in `MLTestingUtils.checkNumericTypes`, so I would prefer not to change it.
[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17034#discussion_r103853263

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
    @@ -27,6 +27,8 @@ import org.apache.spark.ml.util.TestingUtils._
     import org.apache.spark.mllib.random.{ExponentialGenerator, WeibullGenerator}
     import org.apache.spark.mllib.util.MLlibTestSparkContext
     import org.apache.spark.sql.{DataFrame, Row}
    +import org.apache.spark.sql.functions._
    +import org.apache.spark.sql.types._
    --- End diff --

    I think using `_` is discouraged for readability reasons; consider specifying the list of types here.
[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17034#discussion_r103853075

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
    @@ -361,6 +363,36 @@ class AFTSurvivalRegressionSuite
         }
       }
     
    +  test("should support all NumericType censors, and not support other types") {
    +    val df = spark.createDataFrame(Seq(
    +      (0, Vectors.dense(0)),
    +      (1, Vectors.dense(1)),
    +      (2, Vectors.dense(2)),
    +      (3, Vectors.dense(3)),
    +      (4, Vectors.dense(4))
    +    )).toDF("label", "features")
    +      .withColumn("censor", lit(0.0))
    +    val aft = new AFTSurvivalRegression().setMaxIter(1)
    +    val expected = aft.fit(df)
    +
    +    val types = Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DecimalType(10, 0))
    +    types.foreach { t =>
    +      val actual = aft.fit(df.select(col("label"), col("features"),
    +        col("censor").cast(t)))
    +      assert(expected.intercept === actual.intercept)
    +      assert(expected.coefficients === actual.coefficients)
    +    }
    +
    +    val dfWithStringCensors = spark.createDataFrame(Seq(
    +      (0, Vectors.dense(0, 2, 3), "0")
    +    )).toDF("label", "features", "censor")
    +    val thrown = intercept[IllegalArgumentException] {
    --- End diff --

    Can you wrap this in a `withClue("Column censor must be of type NumericType but was actually of type StringType") { ... }`?
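A minimal sketch of what the `withClue` suggestion could look like, assuming ScalaTest is on the classpath. `WithClueSketch` and `checkCensorType` are hypothetical stand-ins for the real schema validation inside `AFTSurvivalRegression`, which is not shown in this thread. Note that `withClue` only prepends the clue to the message of a test failure raised inside its block; checking the text of the thrown exception still needs a separate assertion.

```scala
import org.scalatest.Assertions._

object WithClueSketch {

  // Hypothetical stand-in for the estimator's schema check (an assumption,
  // not the actual Spark code): only the listed type names are accepted.
  private val numericTypeNames =
    Set("ByteType", "ShortType", "IntegerType", "LongType", "FloatType", "DoubleType")

  def checkCensorType(typeName: String): Unit =
    require(
      numericTypeNames.contains(typeName),
      s"Column censor must be of type NumericType but was actually of type $typeName.")

  def run(): IllegalArgumentException = {
    // If no IllegalArgumentException is thrown, intercept fails the test and
    // withClue prepends the clue text to that failure message.
    val thrown = withClue("a StringType censor column should be rejected:") {
      intercept[IllegalArgumentException] {
        checkCensorType("StringType")
      }
    }
    // The exception text itself is verified separately.
    assert(thrown.getMessage.contains("Column censor must be of type NumericType"))
    thrown
  }

  def main(args: Array[String]): Unit = {
    println(run().getMessage)
  }
}
```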
[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17034#discussion_r102727229

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
    @@ -361,6 +363,36 @@ class AFTSurvivalRegressionSuite
         }
       }
     
    +  test("should support all NumericType censors, and not support other types") {
    +    val df = spark.createDataFrame(Seq(
    +      (0, Vectors.dense(0)),
    +      (1, Vectors.dense(1)),
    +      (2, Vectors.dense(2)),
    +      (3, Vectors.dense(3)),
    +      (4, Vectors.dense(4))
    +    )).toDF("label", "features")
    +      .withColumn("censor", lit(0.0))
    +    val aft = new AFTSurvivalRegression().setMaxIter(1)
    +    val expected = aft.fit(df)
    +
    +    val types = Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DecimalType(10, 0))
    +    types.foreach { t =>
    +      val actual = aft.fit(df.select(col("label"), col("features"),
    +        col("censor").cast(t)))
    +      assert(expected.intercept === actual.intercept)
    +      assert(expected.coefficients === actual.coefficients)
    +    }
    +
    +    val dfWithStringCensors = spark.createDataFrame(Seq(
    --- End diff --

    Technically I guess this could be part of `checkNumericTypes` similar to checking weight and label cols, but since it is specific to AFT this is ok.
[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...
GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/17034

    [SPARK-19704][ML] AFTSurvivalRegression should support numeric censorCol

    ## What changes were proposed in this pull request?
    Make `AFTSurvivalRegression` support a numeric `censorCol`.

    ## How was this patch tested?
    Existing tests and added tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark aft_numeric_censor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17034.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17034
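The patch itself is not included in this thread, so the following is only a sketch of the general idea under stated assumptions: accept any `NumericType` censor column, reject everything else, and cast to `DoubleType` before fitting. The object `NumericCensorSketch` and the helper `censorAsDouble` are hypothetical names, and the error text is modelled on the message quoted in the review comments rather than taken from the Spark source.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, NumericType}

object NumericCensorSketch {

  // Hypothetical helper (not the actual patch): validate that the censor
  // column is numeric, then cast it to DoubleType.
  def censorAsDouble(df: DataFrame, censorCol: String): DataFrame = {
    val actualType = df.schema(censorCol).dataType
    require(
      actualType.isInstanceOf[NumericType],
      s"Column $censorCol must be of type NumericType but was actually of type $actualType.")
    df.withColumn(censorCol, col(censorCol).cast(DoubleType))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("numeric-censor-sketch")
      .getOrCreate()

    // An integer censor column is accepted and converted to double.
    val intCensors = spark.createDataFrame(Seq((1.0, 0), (2.0, 1))).toDF("label", "censor")
    censorAsDouble(intCensors, "censor").printSchema()

    // A string censor column is rejected with an IllegalArgumentException.
    val stringCensors = spark.createDataFrame(Seq((1.0, "0"))).toDF("label", "censor")
    println(scala.util.Try(censorAsDouble(stringCensors, "censor")).isFailure)

    spark.stop()
  }
}
```

This mirrors what the added test checks from the outside: fitting on a censor column cast to each numeric type should give the same model as fitting on the double column, while a string column should raise an `IllegalArgumentException`.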