Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21044#discussion_r181288721
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -195,15 +206,32 @@ final class OneVsRestModel private[ml] (
           newDataset.unpersist()
         }
     
    -    // output the index of the classifier with highest confidence as prediction
    -    val labelUDF = udf { (predictions: Map[Int, Double]) =>
    -      predictions.maxBy(_._2)._1.toDouble
    -    }
    +    // output the RawPrediction as vector
    +    if (getRawPredictionCol != "") {
    +      val rawPredictionUDF = udf { (predictions: Map[Int, Double]) =>
    +        val predArray = Array.fill[Double](numClasses)(0.0)
    --- End diff ---
    
    This causes a subtle closure-capture bug: `numClasses` refers to a field of the class OneVsRestModel, so when Spark's closure capture serializes this UDF to send to executors, it will end up serializing the entire OneVsRestModel object rather than just the value of numClasses. Make a local copy of numClasses within the transform() method so the closure captures only that value.
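
    For illustration, a minimal sketch of the suggested fix (the local variable name `numClassesLocal` and the completion of the UDF body with a `foreach` and `Vectors.dense` are assumptions, since the quoted diff is truncated at `predArray`):

    ```scala
    import org.apache.spark.ml.linalg.Vectors
    import org.apache.spark.sql.functions.udf

    // Inside transform(): copy the model field into a local val so the UDF
    // closure captures only this Int, not the enclosing OneVsRestModel.
    val numClassesLocal = numClasses
    val rawPredictionUDF = udf { (predictions: Map[Int, Double]) =>
      // Dense array of per-class confidences; classes with no prediction
      // keep the default 0.0.
      val predArray = Array.fill[Double](numClassesLocal)(0.0)
      predictions.foreach { case (idx, value) => predArray(idx) = value }
      Vectors.dense(predArray)
    }
    ```

    Because the lambda now references only the local value, serializing the UDF ships a single Int to executors instead of the whole model.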

