[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

jkbradley Tue, 20 Mar 2018 13:43:00 -0700

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20829#discussion_r175908248
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
    @@ -49,32 +53,57 @@ class VectorAssembler @Since("1.4.0") (@Since("1.4.0") override val uid: String)
       @Since("1.4.0")
       def setOutputCol(value: String): this.type = set(outputCol, value)
     
    +  /** @group setParam */
    +  @Since("2.4.0")
    +  def setHandleInvalid(value: String): this.type = set(handleInvalid, value)
    +
    +  /**
    +   * Param for how to handle invalid data (NULL values). Options are 'skip' (filter out rows with
    --- End diff --
    
    It would be good to expand this doc to explain the behavior: how each type 
of invalid value is treated (null, NaN, incorrect Vector length) and how 
computationally expensive each option can be.
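
As a rough illustration of the behavior such a doc could spell out, here is a minimal sketch of how two `handleInvalid` settings treat a null input. It assumes the API as released in Spark 2.4, where the accepted values are 'skip', 'error', and 'keep'; only 'skip' is visible in the truncated diff above, so the 'keep' option is an assumption relative to this hunk. The object name and toy data are hypothetical, and the sketch does not cover NaN inputs or mismatched Vector lengths.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// Hypothetical driver object, for illustration only.
object HandleInvalidSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("HandleInvalidSketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data (assumption): column "b" is null in the second row.
    val df = Seq[(Double, Option[Double])](
      (1.0, Some(2.0)),
      (3.0, None)
    ).toDF("a", "b")

    val assembler = new VectorAssembler()
      .setInputCols(Array("a", "b"))
      .setOutputCol("features")

    // 'skip': rows containing a null input are filtered out of the result.
    assembler.setHandleInvalid("skip").transform(df).show()

    // 'keep' (Spark 2.4 API, not shown in the diff): the null is kept as NaN
    // in the assembled vector.
    assembler.setHandleInvalid("keep").transform(df).show()

    spark.stop()
  }
}
```

With 'error', the same input would be expected to fail once an action forces the transformation, which is one reason the doc should state the cost and failure mode of each option.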



