[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

MLnick Sun, 27 Nov 2016 23:49:10 -0800

Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/16011
  
    As far as I recall, the idea is that `Bucketizer` can be used 
standalone, and because fitting a `QuantileDiscretizer` produces exactly what 
a `Bucketizer` is, `Bucketizer` was used as the model rather than adding a 
dedicated `QuantileDiscretizerModel`.
    
    `Bucketizer` is already a separate transformer (it is not required to be 
produced by a `QuantileDiscretizer`), since it's a `Model` and its constructor 
is public (by design). So it can be used by itself in a pipeline, and the 
`splits` param could be selected via cross-validation (for example).
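
    To illustrate the point about `splits`, here is a minimal sketch, meant for pasting into `spark-shell` with Spark ML 2.1 or later on the classpath. The column names and split points are made up for illustration; only `Bucketizer` and `ParamGridBuilder` are real Spark classes.

```scala
import org.apache.spark.ml.feature.Bucketizer
import org.apache.spark.ml.tuning.ParamGridBuilder

// A Bucketizer built directly, with no QuantileDiscretizer involved.
// "hour" and "hourBucket" are hypothetical column names.
val bucketizer = new Bucketizer()
  .setInputCol("hour")
  .setOutputCol("hourBucket")
  .setSplits(Array(Double.NegativeInfinity, 6.0, 12.0, 18.0, Double.PositiveInfinity))

// Two candidate sets of bucket boundaries (illustrative values).
val coarse = Array(Double.NegativeInfinity, 12.0, Double.PositiveInfinity)
val fine = Array(Double.NegativeInfinity, 6.0, 12.0, 18.0, Double.PositiveInfinity)

// The splits param goes into a grid like any other param,
// so a CrossValidator could choose between the candidates.
val splitsGrid = new ParamGridBuilder()
  .addGrid(bucketizer.splits, Seq(coarse, fine))
  .build()
```

    The resulting grid would be passed to `CrossValidator.setEstimatorParamMaps` together with a pipeline that contains `bucketizer` and a downstream estimator.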
    
    What you propose here makes using `QuantileDiscretizer` and a non-default 
`handleInvalid` param together with cross-validation impossible. In addition, 
as you've pointed out in your code example above, this would force a pretty 
clunky "workaround" to set the `handleInvalid` param in a pipeline.
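
    For comparison, this is roughly what the current design allows, again as a `spark-shell` sketch that assumes Spark ML 2.1 or later, where `QuantileDiscretizer` exposes `handleInvalid`. Column names and grid values are hypothetical.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.QuantileDiscretizer
import org.apache.spark.ml.tuning.ParamGridBuilder

// handleInvalid is set on the estimator itself, so it carries over to the
// Bucketizer that fit() returns. Column names are hypothetical.
val discretizer = new QuantileDiscretizer()
  .setInputCol("hour")
  .setOutputCol("hourBucket")
  .setNumBuckets(4)
  .setHandleInvalid("keep")

val pipeline = new Pipeline().setStages(Array(discretizer))

// Because handleInvalid is a param of the estimator, it can be tuned in the
// same grid as numBuckets (2 x 2 = 4 combinations here).
val discretizerGrid = new ParamGridBuilder()
  .addGrid(discretizer.handleInvalid, Seq("skip", "keep"))
  .addGrid(discretizer.numBuckets, Array(4, 8))
  .build()
```

    If `handleInvalid` lived only on the fitted `Bucketizer`, there would be no estimator param to put in this grid.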
    
    Why do this? What is the actual problem with what exists currently? To me 
it seems better the way it is. Also, I don't see any major benefit to adding a 
new `QuantileDiscretizerModel`.


