[ https://issues.apache.org/jira/browse/SPARK-23377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-23377: ---------------------------------- Fix Version/s: (was: 2.3.1) 2.3.0 > Bucketizer with multiple columns persistence bug > ------------------------------------------------ > > Key: SPARK-23377 > URL: https://issues.apache.org/jira/browse/SPARK-23377 > Project: Spark > Issue Type: Bug > Components: ML > Affects Versions: 2.3.0 > Reporter: Bago Amirbekian > Assignee: Liang-Chi Hsieh > Priority: Blocker > Fix For: 2.3.0, 2.4.0 > > > A Bucketizer with multiple input/output columns get "inputCol" set to the > default value on write -> read which causes it to throw an error on > transform. Here's an example. > {code:java} > import org.apache.spark.ml.feature._ > val splits = Array(Double.NegativeInfinity, 0, 10, 100, > Double.PositiveInfinity) > val bucketizer = new Bucketizer() > .setSplitsArray(Array(splits, splits)) > .setInputCols(Array("foo1", "foo2")) > .setOutputCols(Array("bar1", "bar2")) > val data = Seq((1.0, 2.0), (10.0, 100.0), (101.0, -1.0)).toDF("foo1", "foo2") > bucketizer.transform(data) > val path = "/temp/bucketrizer-persist-test" > bucketizer.write.overwrite.save(path) > val bucketizerAfterRead = Bucketizer.read.load(path) > println(bucketizerAfterRead.isDefined(bucketizerAfterRead.outputCol)) > // This line throws an error because "outputCol" is set > bucketizerAfterRead.transform(data) > {code} > And the trace: > {code:java} > java.lang.IllegalArgumentException: Bucketizer bucketizer_6f0acc3341f7 has > the inputCols Param set for multi-column transform. The following Params are > not applicable and should not be set: outputCol. > at > org.apache.spark.ml.param.ParamValidators$.checkExclusiveParams$1(params.scala:300) > at > org.apache.spark.ml.param.ParamValidators$.checkSingleVsMultiColumnParams(params.scala:314) > at > org.apache.spark.ml.feature.Bucketizer.transformSchema(Bucketizer.scala:189) > at > org.apache.spark.ml.feature.Bucketizer.transform(Bucketizer.scala:141) > at > line251821108a8a433da484ee31f166c83725.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-6079631:17) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org