[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-204676362 No problem, thanks for reviewing. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10355 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-204623657 LGTM I'll go ahead and merge this with master before a conflict happens Thanks for the PR! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-204478494 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-204478499 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54707/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-204478315 **[Test build #54707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54707/consoleFull)** for PR 10355 at commit [`718774b`](https://github.com/apache/spark/commit/718774b7401b86a99c2c8ae02014470b0d3a77a3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-204465968 **[Test build #54707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54707/consoleFull)** for PR 10355 at commit [`718774b`](https://github.com/apache/spark/commit/718774b7401b86a99c2c8ae02014470b0d3a77a3). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-204279426 Will fix that, thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-204275787 One very minor comment on imports, and think there are now merge conflicts with latest master that need fixing, subject to those LGTM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r58168089 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala --- @@ -26,7 +26,7 @@ import org.apache.spark.mllib.regression.LabeledPoint import org.apache.spark.rdd.RDD import org.apache.spark.sql.{DataFrame, Row} import org.apache.spark.sql.functions._ -import org.apache.spark.sql.types.{DataType, DoubleType, StructType} +import org.apache.spark.sql.types.{DataType, DoubleType, NumericType, StructType} --- End diff -- minor, but I believe the `NumericType` import is no longer necessary --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203660671 LGTM @MLnick ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203641373 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54549/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203641371 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203641203 **[Test build #54549 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54549/consoleFull)** for PR 10355 at commit [`58290af`](https://github.com/apache/spark/commit/58290af3c888f8d6a600ca4676488d4b542e04e7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203627025 **[Test build #54549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54549/consoleFull)** for PR 10355 at commit [`58290af`](https://github.com/apache/spark/commit/58290af3c888f8d6a600ca4676488d4b542e04e7). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203617375 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203617385 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54547/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203617194 **[Test build #54547 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54547/consoleFull)** for PR 10355 at commit [`788eb58`](https://github.com/apache/spark/commit/788eb585a69c0997428f5da1d70630fae475ff28). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203606312 **[Test build #54547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54547/consoleFull)** for PR 10355 at commit [`788eb58`](https://github.com/apache/spark/commit/788eb585a69c0997428f5da1d70630fae475ff28). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203522117 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54535/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203522113 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203522095 **[Test build #54535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54535/consoleFull)** for PR 10355 at commit [`21e4afb`](https://github.com/apache/spark/commit/21e4afb9d734f81e47c73b16e1f29f76c46df6f5). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203521461 **[Test build #54535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54535/consoleFull)** for PR 10355 at commit [`21e4afb`](https://github.com/apache/spark/commit/21e4afb9d734f81e47c73b16e1f29f76c46df6f5). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203103276 **[Test build #54458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54458/consoleFull)** for PR 10355 at commit [`cf005af`](https://github.com/apache/spark/commit/cf005af19eb52037bc32905c5d30595681f68152). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203103291 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54458/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203103287 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-203102318 **[Test build #54458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54458/consoleFull)** for PR 10355 at commit [`cf005af`](https://github.com/apache/spark/commit/cf005af19eb52037bc32905c5d30595681f68152). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r57796113 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala --- @@ -17,14 +17,116 @@ package org.apache.spark.ml.util -import org.apache.spark.ml.Model +import org.apache.spark.SparkFunSuite +import org.apache.spark.ml.{Estimator, Model, PredictionModel, Predictor} import org.apache.spark.ml.param.ParamMap +import org.apache.spark.ml.regression.Regressor +import org.apache.spark.ml.tree.impl.TreeTests +import org.apache.spark.mllib.linalg.Vectors +import org.apache.spark.sql.{DataFrame, SQLContext} +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.types._ -object MLTestingUtils { +object MLTestingUtils extends SparkFunSuite { def checkCopy(model: Model[_]): Unit = { val copied = model.copy(ParamMap.empty) .asInstanceOf[Model[_]] assert(copied.parent.uid == model.parent.uid) assert(copied.parent == model.parent) } + + def checkPredictorAcceptAllNumericTypes[M <: PredictionModel[_, M], T <: Predictor[_, _, M]]( --- End diff -- good point, will update --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r57768696 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala --- @@ -17,14 +17,116 @@ package org.apache.spark.ml.util -import org.apache.spark.ml.Model +import org.apache.spark.SparkFunSuite +import org.apache.spark.ml.{Estimator, Model, PredictionModel, Predictor} import org.apache.spark.ml.param.ParamMap +import org.apache.spark.ml.regression.Regressor +import org.apache.spark.ml.tree.impl.TreeTests +import org.apache.spark.mllib.linalg.Vectors +import org.apache.spark.sql.{DataFrame, SQLContext} +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.types._ -object MLTestingUtils { +object MLTestingUtils extends SparkFunSuite { def checkCopy(model: Model[_]): Unit = { val copied = model.copy(ParamMap.empty) .asInstanceOf[Model[_]] assert(copied.parent.uid == model.parent.uid) assert(copied.parent == model.parent) } + + def checkPredictorAcceptAllNumericTypes[M <: PredictionModel[_, M], T <: Predictor[_, _, M]]( --- End diff -- Is there a reason to have two separate methods for predictor and estimator? I also tend to prefer _not_ separating the "accept" and "reject" checks, and instead have them in a single function call. This will reduce the complexity of having to decide which one to use (and why) and also reduce the number of function calls in future tests. I was able to get the tests working having a single method for accept/reject and estimator/predictor, like this: ```scala def checkEstimatorNumericTypes[M <: Model[M], T <: Estimator[M]]( predictor: T, isClassification: Boolean, sqlContext: SQLContext)(check: (M, M) => Unit): Unit = { val dfs = if (isClassification) { genClassifDFWithNumericLabelCol(sqlContext) } else { genRegressionDFWithNumericLabelCol(sqlContext) } val expected = predictor.fit(dfs(DoubleType)) val actuals = dfs.keys.filter(_ != DoubleType).map(t => predictor.fit(dfs(t))) actuals.foreach(actual => check(expected, actual)) val dfWithStringLabels = generateDFWithStringLabelCol(sqlContext) val thrown = intercept[IllegalArgumentException] { predictor.fit(dfWithStringLabels) } assert(thrown.getMessage contains "Column label must be of type NumericType but was actually of type StringType") } ``` I had to change the fit method of OneVsRest to include a call to `transformSchema(dataset.schema)` so that it would fail properly on non-numeric label column. I think it's better to change the fit method so that all ML estimators provide consistent errors. Also note that the `isClassification` flag is needed because MLPClassifier does not inherit from Classifier, and so it is not easy to infer that it is a classification estimator. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-202744058 Sure, thanks for your review. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-202603261 This looks to be in very good shape. My main comment is about unit test time: the tests take a significant amount of time. Wherever it is easy to do, could you set the algorithm parameters to make for faster tests? E.g., use 1 iteration for LogisticRegression, use maxDepth 0 for trees, etc. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r57643358 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala --- @@ -17,14 +17,116 @@ package org.apache.spark.ml.util -import org.apache.spark.ml.Model +import org.apache.spark.SparkFunSuite +import org.apache.spark.ml.{Estimator, Model, PredictionModel, Predictor} import org.apache.spark.ml.param.ParamMap +import org.apache.spark.ml.regression.Regressor +import org.apache.spark.ml.tree.impl.TreeTests +import org.apache.spark.mllib.linalg.Vectors +import org.apache.spark.sql.{DataFrame, SQLContext} +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.types._ -object MLTestingUtils { +object MLTestingUtils extends SparkFunSuite { def checkCopy(model: Model[_]): Unit = { val copied = model.copy(ParamMap.empty) .asInstanceOf[Model[_]] assert(copied.parent.uid == model.parent.uid) assert(copied.parent == model.parent) } + + def checkPredictorAcceptAllNumericTypes[M <: PredictionModel[_, M], T <: Predictor[_, _, M]]( + predictor: T, sqlContext: SQLContext)(check: (M, M) => Unit): Unit = { +val dfs = if (predictor.isInstanceOf[Regressor[_, _, _]]) { + genRegressionDFWithNumericLabelCol(sqlContext) +} else { + genClassifDFWithNumericLabelCol(sqlContext) +} +val expected = predictor.fit(dfs(DoubleType)) +val actuals = dfs.keys.filter(_ != DoubleType).map(t => predictor.fit(dfs(t))) +actuals.foreach(actual => check(expected, actual)) + } + + def checkPredictorRejectNotNumericTypes( + predictor: Predictor[_, _, _], sqlContext: SQLContext): Unit = { +val dfWithStringLabels = generateDFWithStringLabelCol(sqlContext) +val thrown = intercept[IllegalArgumentException] { + predictor.fit(dfWithStringLabels) +} +assert(thrown.getMessage contains + "Column label must be of type NumericType but was actually of type StringType") + } + + def checkEstimatorAcceptAllNumericTypes[M <: Model[M], T <: Estimator[M]]( + estimator: T, sqlContext: SQLContext)(check: (M, M) => Unit): Unit = { +val dfs = genRegressionDFWithNumericLabelCol(sqlContext) +val expected = estimator.fit(dfs(DoubleType)) +val actuals = dfs.keys.filter(_ != DoubleType).map(t => estimator.fit(dfs(t))) +actuals.foreach(actual => check(expected, actual)) + } + + def checkEstimatorRejectNotNumericTypes( + predictor: Estimator[_], sqlContext: SQLContext): Unit = { +val dfWithStringLabels = generateDFWithStringLabelCol(sqlContext) +val thrown = intercept[IllegalArgumentException] { + predictor.fit(dfWithStringLabels) +} +assert(thrown.getMessage contains + "Column label must be of type NumericType but was actually of type StringType") + } + + def genClassifDFWithNumericLabelCol( + sqlContext: SQLContext, + labelColName: String = "label", + featuresColName: String = "features"): Map[NumericType, DataFrame] = { +val df = sqlContext.createDataFrame(Seq( + (0, Vectors.dense(0, 2, 3)), + (1, Vectors.dense(0, 3, 1)), + (0, Vectors.dense(0, 2, 2)), + (1, Vectors.dense(0, 3, 9)), + (0, Vectors.dense(0, 2, 6)) +)).toDF(labelColName, featuresColName) + +val types = + Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DoubleType, DecimalType(10, 0)) +types.map(t => t -> df.select(col(labelColName).cast(t), col(featuresColName))) + .map { case (t, d) => t -> TreeTests.setMetadata(d, 2, labelColName) } + .toMap + } + + def genRegressionDFWithNumericLabelCol( + sqlContext: SQLContext, + labelColName: String = "label", + featuresColName: String = "features", + censorColName: String = "censor"): Map[NumericType, DataFrame] = { +val df = sqlContext.createDataFrame(Seq( + (0, Vectors.dense(0)), + (1, Vectors.dense(1)), + (2, Vectors.dense(2)), + (3, Vectors.dense(3)), + (4, Vectors.dense(4)) +)).toDF(labelColName, featuresColName) + +val types = + Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DoubleType, DecimalType(10, 0)) +types + .map(t => t -> df.select(col(labelColName).cast(t), col(featuresColName))) + .map { case (t, d) => +t -> TreeTests.setMetadata(d, 2, labelColName).withColumn(censorColName, lit(0.0)) --- End diff -- setMetadata: classes should be 0, not 2 --- If your project is set up for it, you can reply to this email and have your rep
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r57643353 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/SchemaUtils.scala --- @@ -61,6 +61,20 @@ private[spark] object SchemaUtils { } /** +* Check whether the given schema contains a column of the numeric data type. --- End diff -- indent 1 space less (can update IntelliJ code style settings: Editor -> Code Style -> Scala -> ScalaDoc tab -> uncheck Use scaladoc indent for leading asterisk --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-202579805 I'll take a look --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-201750481 @jkbradley @sethah @MLnick I think it's in a pretty good state now, what do you guys think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-201582826 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54229/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-201582825 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-201582376 **[Test build #54229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54229/consoleFull)** for PR 10355 at commit [`f8b7823`](https://github.com/apache/spark/commit/f8b7823e246c1ea9ee7729971fdc1d3a64ac8208). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-201560326 **[Test build #54229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54229/consoleFull)** for PR 10355 at commit [`f8b7823`](https://github.com/apache/spark/commit/f8b7823e246c1ea9ee7729971fdc1d3a64ac8208). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-201037149 **[Test build #54088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54088/consoleFull)** for PR 10355 at commit [`42d5e0f`](https://github.com/apache/spark/commit/42d5e0f28ac09c15cddb2af11c01df4b04503d4e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-201037265 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-201037272 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54088/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-201021316 **[Test build #54088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54088/consoleFull)** for PR 10355 at commit [`42d5e0f`](https://github.com/apache/spark/commit/42d5e0f28ac09c15cddb2af11c01df4b04503d4e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-201018209 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54085/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-201018203 **[Test build #54085 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54085/consoleFull)** for PR 10355 at commit [`18b3308`](https://github.com/apache/spark/commit/18b3308d4042c4eb64dedf77f59b659fe2a041e3). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-201018208 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-201017850 **[Test build #54085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54085/consoleFull)** for PR 10355 at commit [`18b3308`](https://github.com/apache/spark/commit/18b3308d4042c4eb64dedf77f59b659fe2a041e3). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-200726627 @jkbradley yes, I'm currently trying to incorporate @sethah's comments and keep code duplication to a strict minimum. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-200564243 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53974/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-200564242 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-200564217 **[Test build #53974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53974/consoleFull)** for PR 10355 at commit [`a88c4a3`](https://github.com/apache/spark/commit/a88c4a33f361c0967fa7e4f0ebf086f62a6460cb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-200561614 @BenFradet Are you actively working on this? if so, I'll wait to review it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r57245082 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala --- @@ -183,12 +183,12 @@ class AFTSurvivalRegression @Since("1.6.0") (@Since("1.6.0") override val uid: S * Extract [[featuresCol]], [[labelCol]] and [[censorCol]] from input dataset, * and put it in an RDD with strong types. */ - protected[ml] def extractAFTPoints(dataset: DataFrame): RDD[AFTPoint] = { -dataset.select($(featuresCol), $(labelCol), $(censorCol)).rdd.map { - case Row(features: Vector, label: Double, censor: Double) => -AFTPoint(features, label, censor) -} - } + protected[ml] def extractAFTPoints(dataset: DataFrame): RDD[AFTPoint] = --- End diff -- I prefer to keep braces around multiline methods like this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-200559906 **[Test build #53974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53974/consoleFull)** for PR 10355 at commit [`a88c4a3`](https://github.com/apache/spark/commit/a88c4a33f361c0967fa7e4f0ebf086f62a6460cb). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-200556843 Sorry for the long delay! I'll take a look now --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r57241509 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala --- @@ -178,6 +179,34 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte assert(importances.toArray.forall(_ >= 0.0)) } + test("should support all NumericType labels") { +val dfs = MLTestingUtils.genClassifDFWithNumericLabelCol(sqlContext, "label", "features") + +val rf = new RandomForestClassifier().setFeaturesCol("features") + +val expected = rf.setLabelCol("label") + .fit(TreeTests.setMetadata(dfs(DoubleType), 2, "label")) +dfs.keys.filter(_ != DoubleType).foreach { t => + TreeTests.checkEqual(expected, +rf.setLabelCol("label").fit(TreeTests.setMetadata(dfs(t), 2, "label"))) +} + } + + test("shouldn't support non NumericType labels") { --- End diff -- Here's what I managed to make work: ```scala def checkNumericTypes[M <: RegressionModel[_, M], T <: Regressor[_, _, M]]( regressor: T, sqlContext: SQLContext)(check: (M, M) => Unit): Unit = { val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext, "label", "features") val expected = regressor.fit(dfs(DoubleType)) val actuals = dfs.keys.filter(_ != DoubleType).map(t => regressor.fit(dfs(t))) actuals.foreach(actual => check(expected, actual)) } ``` which could be used like: ```scala MLTestingUtils.checkNumericTypes[LinearRegressionModel, LinearRegression]( new LinearRegression(), sqlContext) { (expected, actual) => assert(expected.intercept === actual.intercept) assert(expected.coefficients === actual.coefficients) } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r57219723 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala --- @@ -178,6 +179,34 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte assert(importances.toArray.forall(_ >= 0.0)) } + test("should support all NumericType labels") { +val dfs = MLTestingUtils.genClassifDFWithNumericLabelCol(sqlContext, "label", "features") + +val rf = new RandomForestClassifier().setFeaturesCol("features") + +val expected = rf.setLabelCol("label") + .fit(TreeTests.setMetadata(dfs(DoubleType), 2, "label")) +dfs.keys.filter(_ != DoubleType).foreach { t => + TreeTests.checkEqual(expected, +rf.setLabelCol("label").fit(TreeTests.setMetadata(dfs(t), 2, "label"))) +} + } + + test("shouldn't support non NumericType labels") { --- End diff -- I wanted to be able to do something like: ```scala def checkNumericTypes[T <: Regressor, M <: Model]( regressor: T, sqlContext: SQLContext)(asserts: Seq[(M, M) => Unit]): Unit = { val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext, "label", "features") val expected = regressor.fit(dfs(DoubleType)) dfs.keys.filter(_ != DoubleType).foreach { t => val actual = regressor.fit(dfs(t)) asserts.foreach(f => f(expected, actual)) } } ``` But unfortunately that doesn't work since I'm not able to capture which model is being created by the regressor's `fit` method. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r57057748 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala --- @@ -178,6 +179,34 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte assert(importances.toArray.forall(_ >= 0.0)) } + test("should support all NumericType labels") { +val dfs = MLTestingUtils.genClassifDFWithNumericLabelCol(sqlContext, "label", "features") + +val rf = new RandomForestClassifier().setFeaturesCol("features") + +val expected = rf.setLabelCol("label") + .fit(TreeTests.setMetadata(dfs(DoubleType), 2, "label")) +dfs.keys.filter(_ != DoubleType).foreach { t => + TreeTests.checkEqual(expected, +rf.setLabelCol("label").fit(TreeTests.setMetadata(dfs(t), 2, "label"))) +} + } + + test("shouldn't support non NumericType labels") { --- End diff -- Yup, good idea, I'll investigate. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r57057621 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala --- @@ -27,4 +31,56 @@ object MLTestingUtils { assert(copied.parent.uid == model.parent.uid) assert(copied.parent == model.parent) } + + def genClassifDFWithNumericLabelCol( +sqlContext: SQLContext, +labelColName: String, +featuresColName: String + ): Map[NumericType, DataFrame] = { --- End diff -- I'll modify the whole file then. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-199967566 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-199967576 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53797/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-199967218 **[Test build #53797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53797/consoleFull)** for PR 10355 at commit [`b2f4f51`](https://github.com/apache/spark/commit/b2f4f518e41e463d679dd7f8266b150674a2e138). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r57047541 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala --- @@ -27,4 +31,56 @@ object MLTestingUtils { assert(copied.parent.uid == model.parent.uid) assert(copied.parent == model.parent) } + + def genClassifDFWithNumericLabelCol( +sqlContext: SQLContext, +labelColName: String, +featuresColName: String + ): Map[NumericType, DataFrame] = { --- End diff -- nit: Spark style seems to be to put the return type on the line above, if it will fit. So: ```scala def genClassifDFWithNumericLabelCol( sqlContext: SQLContext, labelColName: String, featuresColName: String): Map[NumericType, DataFrame] = { ``` [Example](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala#L168) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r57045359 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala --- @@ -178,6 +179,34 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte assert(importances.toArray.forall(_ >= 0.0)) } + test("should support all NumericType labels") { +val dfs = MLTestingUtils.genClassifDFWithNumericLabelCol(sqlContext, "label", "features") + +val rf = new RandomForestClassifier().setFeaturesCol("features") + +val expected = rf.setLabelCol("label") + .fit(TreeTests.setMetadata(dfs(DoubleType), 2, "label")) +dfs.keys.filter(_ != DoubleType).foreach { t => + TreeTests.checkEqual(expected, +rf.setLabelCol("label").fit(TreeTests.setMetadata(dfs(t), 2, "label"))) +} + } + + test("shouldn't support non NumericType labels") { --- End diff -- I still think the code duplication is a big concern. It might not be that hard to add some utility function that can clean this up. For example I was able to get this working: In _MLTestingUtils.scala_ ```scala def checkRejectsNonNumericType(est: Predictor[_, _, _], sqlContext: SQLContext) = { val dfWithStringLabels = MLTestingUtils.generateDFWithStringLabelCol(sqlContext, est.getLabelCol, est.getFeaturesCol) val thrown = intercept[IllegalArgumentException] { est.fit(dfWithStringLabels) } assert(thrown.getMessage contains "Column label must be of type NumericType but was actually of type StringType") } ``` Then in the actual test suites: ```scala test("shouldn't support non NumericType labels") { MLTestingUtils.checkRejectsNonNumericType(new RandomForestClassifier, sqlContext) } ``` There will be some complication for some of the estimators like OneVsRest and IsotonicRegression (why doesn't it extend Predictor?) but I think it will be worth it to get this right since future estimators will have to implement this as well. It would be really nice to have something like we do for params where there is just a one-liner `ParamsSuite.checkParams(new RandomForestClassifier)`. For testing that all numeric types are supported, we could have a utility method that produces all actual and expected results, then check for equality inside the individual test suites, like: ```scala val models = MLTestingUtils.fitAllNumericTypes(new NaiveBayes, sqlContext) models.foreach { case (expected, actual) => assert(expected.pi === actual.pi) assert(expected.theta === actual.theta) } ``` I'm open to feedback on this, realizing it could be hard to generalize this for every case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-199948244 **[Test build #53797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53797/consoleFull)** for PR 10355 at commit [`b2f4f51`](https://github.com/apache/spark/commit/b2f4f518e41e463d679dd7f8266b150674a2e138). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-199709409 Yup, I'll rebase tonight and start working on the related jiras. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-199708780 Looks fine to me - latest test result indicates there may be merge conflicts? I'd like @jkbradley to take a quick pass too. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-199709002 @sethah could you also take a final pass? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-198035588 **[Test build #53454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53454/consoleFull)** for PR 10355 at commit [`6b8b0cf`](https://github.com/apache/spark/commit/6b8b0cfa44e0c3050c17a4bcb7be9faeaae53748). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-198039870 **[Test build #53458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53458/consoleFull)** for PR 10355 at commit [`e8a56d5`](https://github.com/apache/spark/commit/e8a56d5927b4652ac0b89fb3783a495a239d0eae). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-198057662 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-198113442 Build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197537016 **[Test build #53343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53343/consoleFull)** for PR 10355 at commit [`9feca44`](https://github.com/apache/spark/commit/9feca44e2796baaa9ccf5dfb03e2b9a0eabb731b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-198057482 **[Test build #53458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53458/consoleFull)** for PR 10355 at commit [`e8a56d5`](https://github.com/apache/spark/commit/e8a56d5927b4652ac0b89fb3783a495a239d0eae). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197586274 **[Test build #53368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53368/consoleFull)** for PR 10355 at commit [`3b424dc`](https://github.com/apache/spark/commit/3b424dc88211c83615ef8f16402879cc9eb45c2e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197623813 Build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197623814 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53366/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197744650 @BenFradet I linked 2 additional sub-tasks to the [umbrella JIRA](https://issues.apache.org/jira/browse/SPARK-11107) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197521550 **[Test build #53343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53343/consoleFull)** for PR 10355 at commit [`9feca44`](https://github.com/apache/spark/commit/9feca44e2796baaa9ccf5dfb03e2b9a0eabb731b). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r56465970 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala --- @@ -347,6 +348,34 @@ class AFTSurvivalRegressionSuite } } + test("should support all NumericType labels") { +val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext, "label", "features") --- End diff -- I think you'll need to adjust the generated data here as it requires an additional 'censor' col. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-198113445 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53454/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197596347 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53368/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r56465357 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala --- @@ -49,10 +50,7 @@ private[ml] trait PredictorParams extends Params validateParams() // TODO: Support casting Array[Double] and Array[Float] to Vector when FeaturesType = Vector SchemaUtils.checkColumnType(schema, $(featuresCol), featuresDataType) -if (fitting) { - // TODO: Allow other numeric types - SchemaUtils.checkColumnType(schema, $(labelCol), DoubleType) -} +if (fitting) SchemaUtils.checkNumericType(schema, $(labelCol)) --- End diff -- Spark style prefers to keep the braces format for if statements. Can you change it back please? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197755584 @MLnick yup thanks, I'll get to those once I'm done with this one. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197537090 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197596345 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-198113198 **[Test build #53454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53454/consoleFull)** for PR 10355 at commit [`6b8b0cf`](https://github.com/apache/spark/commit/6b8b0cfa44e0c3050c17a4bcb7be9faeaae53748). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-198057663 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53458/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197596308 **[Test build #53368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53368/consoleFull)** for PR 10355 at commit [`3b424dc`](https://github.com/apache/spark/commit/3b424dc88211c83615ef8f16402879cc9eb45c2e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r56468850 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala --- @@ -347,6 +348,34 @@ class AFTSurvivalRegressionSuite } } + test("should support all NumericType labels") { +val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext, "label", "features") --- End diff -- Thanks for the insight, I'll fix this later today. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197537092 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53343/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197581060 **[Test build #53366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53366/consoleFull)** for PR 10355 at commit [`2690512`](https://github.com/apache/spark/commit/269051247d92ae2a58fb2749a3ace062ad28f9cb). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197623583 **[Test build #53366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53366/consoleFull)** for PR 10355 at commit [`2690512`](https://github.com/apache/spark/commit/269051247d92ae2a58fb2749a3ace062ad28f9cb). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197039251 **[Test build #53227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53227/consoleFull)** for PR 10355 at commit [`7d9835a`](https://github.com/apache/spark/commit/7d9835abd0f3bfac2ef66db01a4de4fdc7dceb48). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197039263 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53227/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197039258 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197037079 Here's what's left to do: - IsotonicRegression - GeneralizedLinearRegression - AFTSurvival Otherwise, I'd prefer doing the evaluators and other estimators (like rformula and chiSqSelector) in other prs --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-197035874 **[Test build #53227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53227/consoleFull)** for PR 10355 at commit [`7d9835a`](https://github.com/apache/spark/commit/7d9835abd0f3bfac2ef66db01a4de4fdc7dceb48). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user BenFradet commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r56203460 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -334,6 +334,51 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte assert(importances.toArray.forall(_ >= 0.0)) } + test("should support all NumericType labels") { --- End diff -- OK, will do. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/10355#discussion_r56200539 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -334,6 +334,51 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte assert(importances.toArray.forall(_ >= 0.0)) } + test("should support all NumericType labels") { --- End diff -- I was thinking more like create a util method (or use an existing one if there is one) to generate the required test datasets. It's not a big deal though, just a "nice to have" --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org