GitHub user MLnick commented on the issue:

https://github.com/apache/spark/pull/16011

As far as I recall, the idea is that `Bucketizer` can be used standalone. Because `QuantileDiscretizer` produces exactly what a `Bucketizer` is, `Bucketizer` was used as its model rather than adding a dedicated `QuantileDiscretizerModel`.

`Bucketizer` is already a separate transformer: it is a `Model` whose constructor is public by design, so it does not have to be produced by a `QuantileDiscretizer`. It can therefore be used in a pipeline on its own, and its `splits` param could be selected via cross-validation, for example.

What you propose here would make it impossible to use `QuantileDiscretizer` with a non-default `handleInvalid` param together with cross-validation. In addition, as you pointed out in your code example above, it would force a pretty clunky "workaround" to set the `handleInvalid` param in a pipeline.

Why do this? What is the actual problem with what exists currently? To me it seems better the way it is, and I don't see any major benefit to adding a new `QuantileDiscretizerModel`.
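The design described in this comment (an estimator whose fitted model is simply a bucketizer that can also be constructed by hand) can be sketched in a few lines. This is a toy, stdlib-only Python illustration and not Spark's API: the class names `ToyBucketizer` and `ToyQuantileDiscretizer`, the exact nearest-rank quantile rule, and the choice to raise on out-of-range values are all assumptions made for the example.

```python
from bisect import bisect_right
from typing import List, Sequence


class ToyBucketizer:
    """Hypothetical stand-in for a bucketizer: maps a value to a bucket index.

    Bucket i covers [splits[i], splits[i + 1]); the last bucket also
    includes its upper bound.
    """

    def __init__(self, splits: Sequence[float]):
        # Public constructor: the splits can be supplied by hand,
        # with no discretizer involved.
        if len(splits) < 3 or any(a >= b for a, b in zip(splits, splits[1:])):
            raise ValueError("splits must be strictly increasing, length >= 3")
        self.splits: List[float] = list(splits)

    def transform(self, values: Sequence[float]) -> List[int]:
        last_bucket = len(self.splits) - 2
        result = []
        for v in values:
            if not (self.splits[0] <= v <= self.splits[-1]):
                raise ValueError(f"value {v} is outside the splits")
            # bisect_right counts the splits <= v; clamp so the top
            # split falls into the last bucket.
            result.append(min(bisect_right(self.splits, v) - 1, last_bucket))
        return result


class ToyQuantileDiscretizer:
    """Hypothetical estimator whose fit() returns a plain ToyBucketizer."""

    def __init__(self, num_buckets: int):
        if num_buckets < 2:
            raise ValueError("num_buckets must be at least 2")
        self.num_buckets = num_buckets

    def fit(self, values: Sequence[float]) -> ToyBucketizer:
        if not values:
            raise ValueError("cannot fit on empty data")
        ordered = sorted(values)
        # Interior cut points at the k/num_buckets positions (exact,
        # nearest-rank); duplicates are dropped to keep splits increasing.
        cuts = sorted({
            ordered[(k * len(ordered)) // self.num_buckets]
            for k in range(1, self.num_buckets)
        })
        # No dedicated model class: the fitted model is just a bucketizer.
        return ToyBucketizer([float("-inf"), *cuts, float("inf")])


# Fitted path: the discretizer learns the splits from data.
fitted = ToyQuantileDiscretizer(4).fit([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(fitted.splits)                            # [-inf, 3.0, 5.0, 7.0, inf]
print(fitted.transform([1.0, 3.0, 6.0, 8.0]))   # [0, 1, 2, 3]

# Standalone path: the same class with hand-picked splits.
manual = ToyBucketizer([0.0, 10.0, 20.0])
print(manual.transform([0.0, 9.9, 10.0, 20.0]))  # [0, 0, 1, 1]
```

Both paths end in the same type, which is the point being made: anything that consumes a bucketizer (a pipeline stage, or a search over candidate `splits`) does not need to know whether the splits were fitted or supplied directly.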