Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19588#discussion_r148444535

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
    @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml] (
       // TODO: Check more carefully about whether this whole class will be included in a closure.

       /** Per-vector transform function */
    -  private val transformFunc: Vector => Vector = {
    +  private lazy val transformFunc: Vector => Vector = {
         val sortedCatFeatureIndices = categoryMaps.keys.toArray.sorted
         val localVectorMap = categoryMaps
         val localNumFeatures = numFeatures
    +    val localHandleInvalid = getHandleInvalid
         val f: Vector => Vector = { (v: Vector) =>
           assert(v.size == localNumFeatures, "VectorIndexerModel expected vector of length" +
             s" $numFeatures but found length ${v.size}")
    +      val exceptMsg = "VectorIndexer encountered NULL value. To handle" +
    --- End diff --

    I would also suggest moving exceptMsg into the case VectorIndexer.ERROR_INVALID branch, where it can include concrete error information such as the featureIndex and the unexpected value. Otherwise it will be very hard for users to locate the root cause of the error.
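A minimal, Spark-free sketch of the suggestion, in case it helps to see the shape: the message is built inside the error branch, where the failing feature index and value are in scope. Everything here is hypothetical and for illustration only: the object `HandleInvalidSketch`, the helper `indexValue`, the three string constants standing in for the VectorIndexer options, the message wording, and the "keep" behaviour of mapping unseen values to an extra bucket are assumptions, not the code in this PR.

```scala
// Hypothetical sketch; none of these names are taken from the PR itself.
object HandleInvalidSketch {
  // Assumed stand-ins for the handleInvalid option values.
  val ErrorInvalid = "error"
  val SkipInvalid = "skip"
  val KeepInvalid = "keep"

  /** Maps one categorical value to its index, or reports exactly what failed. */
  def indexValue(
      featureIndex: Int,
      value: Double,
      categoryMap: Map[Double, Int],
      handleInvalid: String): Option[Double] = {
    categoryMap.get(value) match {
      case Some(idx) => Some(idx.toDouble)
      case None =>
        handleInvalid match {
          case ErrorInvalid =>
            // Build the message here, where the offending index and value are known.
            throw new IllegalArgumentException(
              s"VectorIndexer encountered invalid value $value on feature index " +
                s"$featureIndex. To handle or skip invalid values, try setting handleInvalid.")
          case KeepInvalid =>
            // Assumption: unseen values go to one extra bucket after the known categories.
            Some(categoryMap.size.toDouble)
          case SkipInvalid =>
            // Signal to the caller that the row should be dropped.
            None
          case other =>
            throw new IllegalArgumentException(s"Unknown handleInvalid option: $other")
        }
    }
  }
}
```

With this shape, a failure under "error" names the feature index and the unseen value directly, so the user does not have to search the input for the offending row. It also avoids constructing the message string for every vector when no error occurs.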