[ 
https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366809#comment-16366809
 ] 

Nick Pentreath commented on SPARK-23265:
----------------------------------------

Thanks for the ping. Yes, it adds more detailed checking of the exclusive 
params and causes an error to be thrown in some additional situations: 
{{numBucketsArray}} set for a single-column transform; both {{numBuckets}} 
and {{numBucketsArray}} set for a multi-column transform; and a 
{{numBucketsArray}} whose length does not match the input/output columns 
for a multi-column transform.
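
For illustration, the additional error cases listed above can be sketched roughly as follows. This is plain Python, not the Scala code in the PR: the helper {{check_discretizer_params}} and its signature are hypothetical, and {{None}} is assumed to stand for "param not explicitly set".

```python
from typing import Optional, Sequence


def check_discretizer_params(
    input_col: Optional[str] = None,
    output_col: Optional[str] = None,
    input_cols: Optional[Sequence[str]] = None,
    output_cols: Optional[Sequence[str]] = None,
    num_buckets: Optional[int] = None,
    num_buckets_array: Optional[Sequence[int]] = None,
) -> str:
    """Return "single" or "multi", or raise ValueError on a conflicting setup.

    Hypothetical helper for illustration only; it is not part of Spark's API.
    """
    # Pre-existing check: single- and multi-column input params are exclusive.
    if input_col is not None and input_cols is not None:
        raise ValueError("inputCol and inputCols cannot both be set")

    if input_cols is None:
        # Single-column transform: numBucketsArray makes no sense here.
        if num_buckets_array is not None:
            raise ValueError(
                "numBucketsArray cannot be set for a single-column transform"
            )
        return "single"

    # Multi-column transform: numBuckets alone is fine (it is applied to all
    # columns), but it must not be combined with numBucketsArray.
    if num_buckets is not None and num_buckets_array is not None:
        raise ValueError("numBuckets and numBucketsArray cannot both be set")

    # numBucketsArray, if given, needs one entry per input/output column.
    if num_buckets_array is not None:
        for name, cols in (("inputCols", input_cols), ("outputCols", output_cols)):
            if cols is not None and len(cols) != len(num_buckets_array):
                raise ValueError(
                    f"numBucketsArray length {len(num_buckets_array)} does not "
                    f"match {name} length {len(cols)}"
                )
    return "multi"
```

Under these assumptions, a call such as {{check_discretizer_params(input_cols=["a", "b"], output_cols=["x", "y"], num_buckets=3)}} passes, while adding {{num_buckets_array=[2, 3]}} to the same call raises.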

I reviewed the PR and it LGTM, so, as I said there, we can merge this now, 
before RC4 gets cut.

> Update multi-column error handling logic in QuantileDiscretizer
> ---------------------------------------------------------------
>
>                 Key: SPARK-23265
>                 URL: https://issues.apache.org/jira/browse/SPARK-23265
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Nick Pentreath
>            Priority: Major
>
> SPARK-22397 added support for multiple columns to {{QuantileDiscretizer}}. If 
> both single- and multi-column params are set (specifically {{inputCol}} / 
> {{inputCols}}) an error is thrown.
> However, SPARK-22799 added more comprehensive error logic for {{Bucketizer}}. 
> The logic for {{QuantileDiscretizer}} should be updated to match. *Note* that 
> for this transformer, it is acceptable to set the single-column param for 
> {{numBuckets}} when transforming multiple columns, since that is then 
> applied to all columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

