[ https://issues.apache.org/jira/browse/SPARK-20574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yanbo Liang closed SPARK-20574.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0

> Allow Bucketizer to handle non-Double column
> --------------------------------------------
>
>                 Key: SPARK-20574
>                 URL: https://issues.apache.org/jira/browse/SPARK-20574
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Wayne Zhang
>            Assignee: Wayne Zhang
>             Fix For: 2.2.0
>
>
> Bucketizer currently requires the input column to be Double, but the logic
> should work on any numeric data type. Many practical problems have
> integer/float data, and it can get very tedious to manually cast them to
> Double before calling Bucketizer. This transformer could be extended to
> handle all numeric types.
> The example below shows Bucketizer failing on integer data.
> {code}
> import org.apache.spark.ml.feature.Bucketizer
> // toDF needs spark.implicits._ (already imported in spark-shell)
> val splits = Array(-3.0, 0.0, 3.0)
> val data: Array[Int] = Array(-2, -1, 0, 1, 2)
> val expectedBuckets = Array(0.0, 0.0, 1.0, 1.0, 1.0)
> val dataFrame = data.zip(expectedBuckets).toSeq.toDF("feature", "expected")
> val bucketizer = new Bucketizer()
>   .setInputCol("feature")
>   .setOutputCol("result")
>   .setSplits(splits)
> // On 2.1.0 this call fails because "feature" is an integer column:
> bucketizer.transform(dataFrame)
> // java.lang.IllegalArgumentException: requirement failed: Column feature must
> // be of type DoubleType but was actually IntegerType.
> {code}
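The claim that the bucketing logic "should work on any numeric data type" can be illustrated with a small standalone sketch. This is not Spark's implementation and does not use any Spark API; the object `NumericBuckets` and the method `bucketOf` are hypothetical names made up for this illustration. It assumes the splits are strictly increasing, that each bucket is [lower, upper) except the last one, which also includes its upper bound, and that values outside the splits are rejected.

```scala
import java.util.Arrays

// Hypothetical helper for illustration only; not part of Spark.
object NumericBuckets {
  // Returns the bucket index (as a Double) for any value that has a Numeric
  // instance: Int, Long, Float, Double, and so on.
  // Assumes `splits` is strictly increasing and has at least two entries.
  def bucketOf[T](value: T, splits: Array[Double])(implicit num: Numeric[T]): Double = {
    require(splits.length >= 2, "need at least two split points")
    // The only type-specific step: widen the input to Double.
    val x = num.toDouble(value)
    if (x == splits.last) {
      // The last bucket is closed on the right.
      (splits.length - 2).toDouble
    } else {
      val idx = Arrays.binarySearch(splits, x)
      if (idx >= 0) {
        // Exact match on a split point: the value starts bucket `idx`.
        idx.toDouble
      } else {
        // Not found: binarySearch returns -(insertionPoint) - 1.
        val insertPos = -idx - 1
        require(insertPos != 0 && insertPos != splits.length,
          s"value $x is outside the range of the splits")
        (insertPos - 1).toDouble
      }
    }
  }
}
```

With the splits from the report, `Array(-3.0, 0.0, 3.0)`, calling `NumericBuckets.bucketOf` on the integers -2, -1, 0, 1, 2 gives 0.0, 0.0, 1.0, 1.0, 1.0, which matches `expectedBuckets` in the example. The same call accepts a Float such as `1.5f` or a Long such as `3L`, since the only type-dependent step is the conversion to Double.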