[GitHub] spark pull request #17864: [SPARK-20604][ML] Allow imputer to handle numeric...

actuaryzhang Thu, 25 May 2017 15:33:59 -0700

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17864#discussion_r118600408
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala ---
    @@ -94,12 +94,13 @@ private[feature] trait ImputerParams extends Params with HasInputCols {
      * :: Experimental ::
      * Imputation estimator for completing missing values, either using the mean or the median
      * of the columns in which the missing values are located. The input columns should be of
    - * DoubleType or FloatType. Currently Imputer does not support categorical features
    + * numeric type. Currently Imputer does not support categorical features
      * (SPARK-15041) and possibly creates incorrect values for a categorical feature.
      *
      * Note that the mean/median value is computed after filtering out missing values.
      * All Null values in the input columns are treated as missing, and so are also imputed. For
      * computing median, DataFrameStatFunctions.approxQuantile is used with a relative error of 0.001.
    + * The output column is always of Double type regardless of the input column type.
    --- End diff --
    
    @MLnick Here is the note on always returning Double type.
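
    The behaviour described in the doc comment above can be sketched in plain
    Scala, without Spark. This is only an illustration of the stated semantics
    (missing values are filtered out before the mean is computed, and the
    output is Double whatever the numeric input type); it is not the code in
    Imputer.scala. `ImputeSketch` and `imputeMean` are hypothetical names, and
    missing values are modelled as `None` rather than Null/NaN.

```scala
// Hypothetical sketch, not Spark's implementation: mean imputation over a
// column of optional numeric values, always producing Double.
object ImputeSketch {
  def imputeMean[T](column: Seq[Option[T]])(implicit num: Numeric[T]): Seq[Double] = {
    // Filter out missing entries before computing the statistic.
    val present = column.flatten.map(num.toDouble)
    require(present.nonEmpty, "column has no non-missing values")
    val mean = present.sum / present.size
    // Fill each gap with the mean; widen existing values to Double.
    column.map(_.map(num.toDouble).getOrElse(mean))
  }

  def main(args: Array[String]): Unit = {
    // Integer input, Double output.
    val ints: Seq[Option[Int]] = Seq(Some(1), None, Some(4))
    println(imputeMean(ints)) // List(1.0, 2.5, 4.0)
  }
}
```

    Returning Double for every input type keeps a fractional mean such as 2.5
    from being truncated when the input column is integral.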



