Barry Becker created SPARK-17086:
------------------------------------

Summary: QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data
Key: SPARK-17086
URL: https://issues.apache.org/jira/browse/SPARK-17086
Project: Spark
Issue Type: Bug
Components: ML
Affects Versions: 2.1.0
Reporter: Barry Becker

I discovered this bug when working with a build from the master branch (which I believe is 2.1.0). The same code worked fine on Spark 1.6.2.

I have a dataframe with an "intData" column that has values like
{code}
1
3
2
1
1
2
3
2
2
2
1
3
{code}

I have a stage in my pipeline that uses the QuantileDiscretizer to produce equal-weight splits like this
{code}
new QuantileDiscretizer()
  .setInputCol("intData")
  .setOutputCol("intData_bin")
  .setNumBuckets(10)
  .fit(df)
{code}

But when that runs, it (incorrectly) throws this error:
{code}
parameter splits given invalid value [-Infinity, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, Infinity]
{code}

I don't think duplicate splits should be generated, should they?
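
To illustrate how duplicates can arise, here is a minimal sketch in plain Scala with no Spark dependency. It rests on assumptions: `naiveSplits` and `strictlyIncreasing` are hypothetical helpers, not Spark's code, and the exact-index quantile lookup is only a stand-in for whatever approximation QuantileDiscretizer uses internally. The point is that asking for 10 buckets on a column with only 3 distinct values forces several quantiles onto the same value, and the resulting array is no longer strictly increasing, which is consistent with the splits parameter being rejected. Dropping the repeated values is one possible way to get usable boundaries.

```scala
object DuplicateSplitsSketch {
  // Hypothetical helper (NOT Spark's implementation): picks the values found at
  // probabilities 1/n, 2/n, ..., (n-1)/n of the sorted data and pads the result
  // with -Infinity and +Infinity.
  def naiveSplits(data: Seq[Double], numBuckets: Int): Array[Double] = {
    val sorted = data.sorted.toArray
    val inner = (1 until numBuckets).map { i =>
      val idx = math.min(sorted.length - 1, (i.toDouble / numBuckets * sorted.length).toInt)
      sorted(idx)
    }
    (Double.NegativeInfinity +: inner :+ Double.PositiveInfinity).toArray
  }

  // Assumption: bucket boundaries are only usable when strictly increasing.
  def strictlyIncreasing(splits: Array[Double]): Boolean =
    splits.sliding(2).forall {
      case Array(a, b) => a < b
      case _           => true // fewer than two elements: nothing to compare
    }

  def main(args: Array[String]): Unit = {
    // The "intData" values from the report.
    val intData = Seq(1, 3, 2, 1, 1, 2, 3, 2, 2, 2, 1, 3).map(_.toDouble)

    val raw = naiveSplits(intData, 10)
    println(raw.mkString("[", ", ", "]"))
    println(s"strictly increasing: ${strictlyIncreasing(raw)}")

    // One possible remedy: drop repeated values, accepting fewer buckets than requested.
    val deduped = raw.distinct
    println(deduped.mkString("[", ", ", "]"))
    println(s"strictly increasing: ${strictlyIncreasing(deduped)}")
  }
}
```

With de-duplication the column ends up with fewer buckets than the 10 requested, which seems unavoidable when the data has only 3 distinct values.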