[ 
https://issues.apache.org/jira/browse/SPARK-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-20604:
------------------------------
    Priority: Minor  (was: Major)

> Allow Imputer to handle all numeric types
> -----------------------------------------
>
>                 Key: SPARK-20604
>                 URL: https://issues.apache.org/jira/browse/SPARK-20604
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Wayne Zhang
>            Assignee: Wayne Zhang
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> Imputer currently requires each input column to be of type Double or Float,
> but the imputation logic should work on any numeric data type. Many practical
> problems involve integer data, and manually casting those columns to Double
> before calling Imputer quickly becomes tedious. The transformer could be
> extended to handle all numeric types.
> The example below shows Imputer failing on integer data.
> {code}
>     import org.apache.spark.ml.feature.Imputer
>     import org.apache.spark.sql.functions.col
>     import org.apache.spark.sql.types.IntegerType
>
>     val df = spark.createDataFrame(Seq(
>       (0, 1.0, 1.0, 1.0),
>       (1, 11.0, 11.0, 11.0),
>       (2, 1.5, 1.5, 1.5),
>       (3, Double.NaN, 4.5, 1.5)
>     )).toDF("id", "value1", "expected_mean_value1", "expected_median_value1")
>     val imputer = new Imputer()
>       .setInputCols(Array("value1"))
>       .setOutputCols(Array("out1"))
>     // Casting the input column to IntegerType makes fit() fail:
>     imputer.fit(df.withColumn("value1", col("value1").cast(IntegerType)))
> java.lang.IllegalArgumentException: requirement failed: Column value1 must be 
> of type equal to one of the following types: [DoubleType, FloatType] but was 
> actually of type IntegerType.
> {code}
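
For readers on a version without this improvement, the manual cast that the description calls tedious can be sketched roughly as follows. This is only an illustration, not the change made for this issue: `castToDouble` is a hypothetical helper (not part of Spark), the local `SparkSession` and the sample data are assumptions, and the snippet is meant to be pasted into spark-shell or a Scala script with Spark on the classpath.

```scala
import org.apache.spark.ml.feature.Imputer
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, NumericType}

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("imputer-cast-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical helper (not part of Spark): cast the named numeric columns
// to Double so that Imputer's type check accepts them.
def castToDouble(df: DataFrame, cols: Seq[String]): DataFrame =
  cols.foldLeft(df) { (acc, name) =>
    acc.schema(name).dataType match {
      case DoubleType     => acc // already Double, nothing to do
      case _: NumericType => acc.withColumn(name, col(name).cast(DoubleType))
      case other =>
        throw new IllegalArgumentException(s"Column $name is not numeric: $other")
    }
  }

// Integer data with one missing value, represented as null.
val intDf = Seq[(Int, Option[Int])](
  (0, Some(1)),
  (1, Some(11)),
  (2, Some(2)),
  (3, None)
).toDF("id", "value1")

val prepared = castToDouble(intDf, Seq("value1"))

// Default strategy is "mean"; nulls are treated as missing and imputed.
val imputer = new Imputer()
  .setInputCols(Array("value1"))
  .setOutputCols(Array("out1"))
val result = imputer.fit(prepared).transform(prepared)
result.show()
```

With the default mean strategy, the row with the missing value should be filled with the mean of the three observed values. The cast has to be repeated for every integer column and every pipeline, which is the overhead this issue proposes to remove.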



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
