[ https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366809#comment-16366809 ]
Nick Pentreath commented on SPARK-23265:
----------------------------------------

Thanks for the ping. Yes, it adds more detailed checking of the mutually exclusive params, so an error would now be thrown in some additional situations. Specifically: {{numBucketsArray}} set for a single-column transform; both {{numBuckets}} and {{numBucketsArray}} set for a multi-column transform; and a {{numBucketsArray}} whose length does not match the input/output columns for a multi-column transform.

I reviewed the PR and it LGTM, so as I said there, we can merge this now before RC4 gets cut.

> Update multi-column error handling logic in QuantileDiscretizer
> ---------------------------------------------------------------
>
>                 Key: SPARK-23265
>                 URL: https://issues.apache.org/jira/browse/SPARK-23265
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Nick Pentreath
>            Priority: Major
>
> SPARK-22397 added support for multiple columns to {{QuantileDiscretizer}}. If
> both single- and multi-column params are set (specifically {{inputCol}} /
> {{inputCols}}) an error is thrown.
> However, SPARK-22799 added more comprehensive error logic for {{Bucketizer}}.
> The logic for {{QuantileDiscretizer}} should be updated to match. *Note* that
> for this transformer, it is acceptable to set the single-column param for
> {{numBuckets}} when transforming multiple columns, since that is then
> applied to all columns.
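
For readers following the thread, the parameter rules discussed in this issue can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the PR or Spark's actual implementation: the function name {{check_discretizer_params}} and its keyword arguments are hypothetical, and {{None}} stands in for "param not explicitly set" (in Spark, params can carry defaults, so "set" means explicitly set by the user).

```python
from typing import Optional, Sequence


def check_discretizer_params(
    input_col: Optional[str] = None,
    output_col: Optional[str] = None,
    input_cols: Optional[Sequence[str]] = None,
    output_cols: Optional[Sequence[str]] = None,
    num_buckets: Optional[int] = None,
    num_buckets_array: Optional[Sequence[int]] = None,
) -> str:
    """Hypothetical helper: return "single" or "multi", or raise ValueError.

    None means the param was not explicitly set (an assumption of this sketch).
    """
    # inputCol and inputCols are mutually exclusive.
    if input_col is not None and input_cols is not None:
        raise ValueError("Only one of inputCol and inputCols may be set")
    if input_col is None and input_cols is None:
        raise ValueError("One of inputCol or inputCols must be set")

    if input_col is not None:
        # Single-column transform: the per-column array makes no sense here.
        if num_buckets_array is not None:
            raise ValueError(
                "numBucketsArray must not be set for a single-column transform"
            )
        return "single"

    # Multi-column transform from here on.
    if num_buckets is not None and num_buckets_array is not None:
        raise ValueError("Only one of numBuckets and numBucketsArray may be set")
    if output_cols is not None and len(output_cols) != len(input_cols):
        raise ValueError("inputCols and outputCols must have the same length")
    if num_buckets_array is not None and len(num_buckets_array) != len(input_cols):
        raise ValueError("numBucketsArray must have one entry per input column")

    # numBuckets alone is acceptable: it is applied to every column.
    return "multi"
```

Under these assumptions, {{check_discretizer_params(input_cols=["a", "b"], output_cols=["x", "y"], num_buckets=3)}} returns {{"multi"}}, which is the case the issue description calls acceptable, while passing both {{num_buckets}} and {{num_buckets_array}} with multiple columns raises {{ValueError}}.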