[GitHub] spark pull request #20164: [SPARK-22971][ML] OneVsRestModel should use tempo...
Github user zhengruifeng closed the pull request at:

    https://github.com/apache/spark/pull/20164

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20164: [SPARK-22971][ML] OneVsRestModel should use tempo...
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20164#discussion_r161535696

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -170,21 +170,24 @@ final class OneVsRestModel private[ml] (
           newDataset.persist(StorageLevel.MEMORY_AND_DISK)
         }

    +    // temporary column to store intermediate raw prediction
    +    val tmpRawPredictionColName = "rawPrediction_" + UUID.randomUUID().toString
    +
         // update the accumulator column with the result of prediction of models
         val aggregatedDataset = models.zipWithIndex.foldLeft[DataFrame](newDataset) {
           case (df, (model, index)) =>
    -        val rawPredictionCol = model.getRawPredictionCol
    -        val columns = origCols ++ List(col(rawPredictionCol), col(accColName))
    +        val columns = origCols ++ List(col(tmpRawPredictionColName), col(accColName))
    --- End diff --

    This line doesn't need to be in the `foldLeft` block any longer?
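The point of the review comment above is that once `columns` no longer depends on the per-model `rawPredictionCol`, it is loop-invariant and can be built once before the fold. The following is a dependency-free sketch of that refactoring using plain Scala collections in place of Spark DataFrames. `ToyModel`, `aggregate`, the `acc_` prefix and the `score` output column are hypothetical names invented for illustration, and summing scores is a simplification, not what `OneVsRestModel` actually computes.

```scala
import java.util.UUID

object HoistOutOfFold {
  // A row is modelled as column name -> value; this stands in for a DataFrame row.
  type Row = Map[String, Double]

  // Hypothetical stand-in for a binary classifier: scores a row from its "x" column.
  final case class ToyModel(weight: Double) {
    def rawScore(row: Row): Double = weight * row("x")
  }

  def aggregate(rows: Seq[Row], models: Seq[ToyModel]): Seq[Row] = {
    // Temporary column names, made unique with a random UUID suffix.
    val accCol = "acc_" + UUID.randomUUID().toString
    val tmpRawCol = "rawPrediction_" + UUID.randomUUID().toString

    val origCols: Seq[String] = rows.headOption.map(_.keys.toSeq).getOrElse(Seq.empty)
    // Loop-invariant: depends only on names fixed before the fold,
    // so it is built once here instead of once per model.
    val columns: Seq[String] = origCols ++ Seq(tmpRawCol, accCol)

    val init = rows.map(_ + (accCol -> 0.0))
    val folded = models.foldLeft(init) { (current, model) =>
      current.map { row =>
        // Write this model's score into the temporary column.
        val withRaw = row + (tmpRawCol -> model.rawScore(row))
        // Fold the temporary value into the accumulator column.
        val updated = withRaw + (accCol -> (row(accCol) + withRaw(tmpRawCol)))
        // Project onto the precomputed column list.
        updated.filter { case (name, _) => columns.contains(name) }
      }
    }
    // Drop both temporary columns and expose the accumulated value as "score".
    folded.map(row => (row - tmpRawCol - accCol) + ("score" -> row(accCol)))
  }
}
```

Calling `HoistOutOfFold.aggregate(rows, models)` leaves the original columns untouched and adds only `score`, so neither temporary name leaks to the caller.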
[GitHub] spark pull request #20164: [SPARK-22971][ML] OneVsRestModel should use tempo...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20164#discussion_r160020496

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -170,21 +170,24 @@ final class OneVsRestModel private[ml] (
           newDataset.persist(StorageLevel.MEMORY_AND_DISK)
         }

    +    // temporary column to store intermediate raw prediction
    +    val tmpRawPredictionColName = "mbc$tmpraw" + UUID.randomUUID().toString
    --- End diff --

    in other ml cases we are slightly more descriptive with the prefix text
    https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L1050
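As a rough illustration of the naming suggestion above, a temporary column name can pair a readable prefix with a random UUID, so the name explains its purpose while staying very unlikely to collide with a user's columns. `tempColName` and `clashes` are hypothetical helpers written for this sketch; they are not part of the Spark API.

```scala
import java.util.UUID

object TempColumnNames {
  // Hypothetical helper: descriptive prefix + "_" + random UUID.
  def tempColName(prefix: String): String =
    prefix + "_" + UUID.randomUUID().toString

  // True when the candidate name is already used by an existing column.
  def clashes(name: String, existing: Set[String]): Boolean =
    existing.contains(name)
}
```

A prefix such as `rawPrediction`, the one used in the later revision of the diff, is easier to recognise in a query plan or debugger than an opaque one such as `mbc$tmpraw`.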
[GitHub] spark pull request #20164: [SPARK-22971][ML] OneVsRestModel should use tempo...
GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/20164

    [SPARK-22971][ML] OneVsRestModel should use temporary RawPredictionCol

    ## What changes were proposed in this pull request?

    Use a temporary raw prediction column in `OneVsRestModel#transform`
    instead of the column named by each sub-model's `getRawPredictionCol`.

    ## How was this patch tested?

    Existing tests and added tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark ovr_not_use_getRawPredictionCol

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20164.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20164

commit f155e1cc6b175ac06a5f2ab710d4c053b0776507
Author: Zheng RuiFeng
Date:   2018-01-05T09:29:25Z

    create pr

commit 9b0dcc69535b6731c9b6cdc0030c846c3352a5de
Author: Zheng RuiFeng
Date:   2018-01-05T10:19:59Z

    create pr

commit 6c567ffb02738346fc83e467752add0d00a42e07
Author: Zheng RuiFeng
Date:   2018-01-05T10:26:16Z

    add test