[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22991

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22991#discussion_r236230624

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -209,6 +215,9 @@ final class OneVsRestModel private[ml] (
       newDataset.unpersist()
     }

+    var outputColNames = Seq.empty[String]
--- End diff --

Maybe 'predictionColumns'? These aren't the only output columns. You could make this a mutable val too, but whatever.
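For readers unfamiliar with the two styles mentioned in the comment above, here is a minimal sketch in plain Scala with no Spark dependency. The object and method names (`ColumnNameAccumulation`, `withVar`, `withMutableVal`) are hypothetical and do not appear in the pull request; only the idea of accumulating column names comes from the diff.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration only; not code from the pull request.
object ColumnNameAccumulation {
  // Style used in the diff: an immutable Seq held in a reassignable var.
  def withVar(names: Seq[String]): Seq[String] = {
    var predictionColumns = Seq.empty[String]
    for (n <- names if n.nonEmpty) {
      predictionColumns = predictionColumns :+ n
    }
    predictionColumns
  }

  // Alternative from the review comment: a mutable collection held in a val.
  def withMutableVal(names: Seq[String]): Seq[String] = {
    val predictionColumns = ArrayBuffer.empty[String]
    for (n <- names if n.nonEmpty) {
      predictionColumns += n
    }
    predictionColumns.toSeq
  }
}
```

Both variants skip empty names and preserve order, so the choice between them is mostly one of style.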
[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22991#discussion_r236110139

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -219,14 +225,20 @@ final class OneVsRestModel private[ml] (
       Vectors.dense(predArray)
     }

-    // output the index of the classifier with highest confidence as prediction
-    val labelUDF = udf { (rawPredictions: Vector) => rawPredictions.argmax.toDouble }
-
-    // output confidence as raw prediction, label and label metadata as prediction
-    aggregatedDataset
-      .withColumn(getRawPredictionCol, rawPredictionUDF(col(accColName)))
-      .withColumn(getPredictionCol, labelUDF(col(getRawPredictionCol)), labelMetadata)
-      .drop(accColName)
+    if (getPredictionCol != "") {
--- End diff --

I had implemented this differently: ClassificationModel updates the output dataset, whereas I returned the output directly in each if clause. I have now updated this to follow ClassificationModel, updating the output columns in each clause. `withColumns` is then used to return the output columns.
[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22991#discussion_r235929179

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -219,14 +225,20 @@ final class OneVsRestModel private[ml] (
       Vectors.dense(predArray)
     }

-    // output the index of the classifier with highest confidence as prediction
-    val labelUDF = udf { (rawPredictions: Vector) => rawPredictions.argmax.toDouble }
-
-    // output confidence as raw prediction, label and label metadata as prediction
-    aggregatedDataset
-      .withColumn(getRawPredictionCol, rawPredictionUDF(col(accColName)))
-      .withColumn(getPredictionCol, labelUDF(col(getRawPredictionCol)), labelMetadata)
-      .drop(accColName)
+    if (getPredictionCol != "") {
--- End diff --

I guess I'm surprised these are both optional, in PredictionModel too. But yeah, consistency is good. However, shouldn't this if clause be outside the `getRawPredictionCol = ""` block? See ClassificationModel.
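The structure suggested in the comment above, with the two checks as siblings instead of nested, can be sketched in plain Scala without Spark. This is an illustration under stated assumptions, not the code in the pull request: `OutputCols`, `EmptyOutputColsSketch`, `argmax` and `outputs` are hypothetical names, and a `Vector[Double]` of per-class scores stands in for the raw prediction vector.

```scala
// Hypothetical stand-in for the model's output column params; not Spark's API.
final case class OutputCols(rawPredictionCol: String, predictionCol: String)

object EmptyOutputColsSketch {
  // Index of the largest score, analogous to `rawPredictions.argmax` in the diff.
  // Assumes `scores` is non-empty.
  def argmax(scores: Vector[Double]): Int = scores.indices.maxBy(scores)

  // Emit only the output columns whose names are non-empty.
  // The two checks are independent, so a prediction is still produced
  // when the raw prediction column is disabled.
  def outputs(scores: Vector[Double], cols: OutputCols): Seq[(String, Any)] = {
    var predictionColumns = Seq.empty[(String, Any)]
    if (cols.rawPredictionCol != "") {
      predictionColumns = predictionColumns :+ (cols.rawPredictionCol -> scores)
    }
    if (cols.predictionCol != "") {
      predictionColumns = predictionColumns :+ (cols.predictionCol -> argmax(scores).toDouble)
    }
    predictionColumns
  }
}
```

With this shape, setting either column name to the empty string drops only that column, and setting both yields no added columns at all.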
[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...
GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/22991

    [SPARK-25989][ML] OneVsRestModel handle empty outputCols incorrectly

    ## What changes were proposed in this pull request?

    ignore empty output columns

    ## How was this patch tested?

    added tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark ovrm_empty_outcol

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22991.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22991

----
commit 035362d9ab6d04ff04e3060edd941fdbd0c26222
Author: zhengruifeng
Date:   2018-11-09T07:47:30Z

    lint

----