[ https://issues.apache.org/jira/browse/SPARK-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-20604.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 17864
[https://github.com/apache/spark/pull/17864]

> Allow Imputer to handle all numeric types
> -----------------------------------------
>
>                 Key: SPARK-20604
>                 URL: https://issues.apache.org/jira/browse/SPARK-20604
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Wayne Zhang
>            Assignee: Wayne Zhang
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Imputer currently requires the input column to be Double or Float, but the
> logic should work on any numeric data type. Many practical problems involve
> integer data, and manually casting such columns to Double before calling
> Imputer quickly becomes tedious. This transformer could be extended to handle
> all numeric types.
> The example below shows Imputer failing on integer data.
> {code}
> import org.apache.spark.ml.feature.Imputer
> import org.apache.spark.sql.functions.col
> import org.apache.spark.sql.types.IntegerType
>
> val df = spark.createDataFrame(Seq(
>   (0, 1.0, 1.0, 1.0),
>   (1, 11.0, 11.0, 11.0),
>   (2, 1.5, 1.5, 1.5),
>   (3, Double.NaN, 4.5, 1.5)
> )).toDF("id", "value1", "expected_mean_value1", "expected_median_value1")
> val imputer = new Imputer()
>   .setInputCols(Array("value1"))
>   .setOutputCols(Array("out1"))
> // Fitting on an IntegerType column fails schema validation:
> imputer.fit(df.withColumn("value1", col("value1").cast(IntegerType)))
> java.lang.IllegalArgumentException: requirement failed: Column value1 must be
> of type equal to one of the following types: [DoubleType, FloatType] but was
> actually of type IntegerType.
> {code}
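
For versions affected by this issue, the manual cast mentioned in the description can be sketched roughly as follows. This is only an illustration of the workaround, not the change made in pull request 17864. The object name `ImputerCastWorkaround`, the helper `imputeAfterCast`, and the sample data are hypothetical, and the sketch assumes a local Spark session with spark-mllib on the classpath.

```scala
import org.apache.spark.ml.feature.Imputer
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

// Hypothetical helper object, for illustration only.
object ImputerCastWorkaround {

  // Casts inputCol to DoubleType, then fits and applies a mean Imputer.
  def imputeAfterCast(df: DataFrame, inputCol: String, outputCol: String): DataFrame = {
    // The manual cast the issue describes as tedious: Imputer on affected
    // versions only accepts DoubleType or FloatType input columns.
    val casted = df.withColumn(inputCol, col(inputCol).cast(DoubleType))

    val imputer = new Imputer()
      .setInputCols(Array(inputCol))
      .setOutputCols(Array(outputCol))
      .setStrategy("mean")

    imputer.fit(casted).transform(casted)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ImputerCastWorkaround")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data: an integer column with one missing (null) entry.
    val df = Seq[(Int, Option[Int])](
      (0, Some(1)),
      (1, Some(11)),
      (2, Some(3)),
      (3, None)
    ).toDF("id", "value1")

    imputeAfterCast(df, "value1", "out1").show()
    spark.stop()
  }
}
```

With the cast in place the column passes Imputer's type check, and the null entry is filled with the mean of the remaining values. The cost is that every integer column has to be cast by hand, which is the inconvenience this issue set out to remove.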