GitHub user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19588#discussion_r148444535
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
    @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml] (
       // TODO: Check more carefully about whether this whole class will be included in a closure.
     
       /** Per-vector transform function */
    -  private val transformFunc: Vector => Vector = {
    +  private lazy val transformFunc: Vector => Vector = {
         val sortedCatFeatureIndices = categoryMaps.keys.toArray.sorted
         val localVectorMap = categoryMaps
         val localNumFeatures = numFeatures
    +    val localHandleInvalid = getHandleInvalid
         val f: Vector => Vector = { (v: Vector) =>
           assert(v.size == localNumFeatures, "VectorIndexerModel expected vector of length" +
             s" $numFeatures but found length ${v.size}")
    +      val exceptMsg = "VectorIndexer encountered NULL value. To handle" +
    --- End diff --
    
    I would also suggest moving exceptMsg into the case
    VectorIndexer.ERROR_INVALID branch, where it can include concrete error
    information such as the feature index and the unexpected value.
    Otherwise it will be very hard for users to locate the root cause of the
    error.
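
To make the suggestion concrete, here is a minimal, self-contained sketch of the idea. It is not the code from the pull request: the object `HandleInvalidSketch`, the helper `indexValue`, the option strings, and the message wording are all hypothetical, chosen only to show the message being built inside the error branch, where the feature index and the offending value are in scope.

```scala
// Hypothetical sketch; names and option values are assumptions, not the PR's code.
object HandleInvalidSketch {
  // Assumed option values. Only ERROR_INVALID is named in the review comment.
  val ERROR_INVALID = "error"
  val KEEP_INVALID = "keep"

  /** Maps one categorical value to its index, handling unseen values per `handleInvalid`. */
  def indexValue(
      featureIndex: Int,
      value: Double,
      categoryMap: Map[Double, Int],
      handleInvalid: String): Double = {
    categoryMap.get(value) match {
      case Some(categoryIndex) => categoryIndex.toDouble
      case None =>
        handleInvalid match {
          case ERROR_INVALID =>
            // Build the message here, so it can name the feature and the value.
            throw new IllegalArgumentException(
              s"Unseen value $value in feature index $featureIndex. " +
                s"Known categories: ${categoryMap.keys.toSeq.sorted.mkString(", ")}")
          case KEEP_INVALID =>
            // Put unseen values into one extra bucket after the known categories.
            categoryMap.size.toDouble
          case other =>
            throw new IllegalArgumentException(s"Unsupported handleInvalid option: $other")
        }
    }
  }
}
```

With a message built this way, a failure would report something like "Unseen value 9.0 in feature index 3", which tells the user which column and which value to inspect.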

