GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/19527

    [SPARK-13030][ML] Create OneHotEncoderEstimator for OneHotEncoder as 
Estimator

    ## What changes were proposed in this pull request?
    
    This patch adds a new class `OneHotEncoderEstimator` which extends 
`Estimator`. The `fit` method returns `OneHotEncoderModel`.
    
    Methods shared by the existing `OneHotEncoder` and the new 
`OneHotEncoderEstimator`, such as schema transformation, are extracted into 
`OneHotEncoderCommon`.
    
    ### Multi-column support
    
    `OneHotEncoderEstimator` adds simpler multi-column support because it is 
a new API and is not bound by backward-compatibility constraints.
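    
    The estimator/model split with multiple columns can be sketched in plain 
Python. This is an illustration only, not the Spark implementation: the class 
names `SketchOneHotEstimator` and `SketchOneHotModel`, the `_vec` output 
suffix, and the rule that a column's category count is its largest index plus 
one are all assumptions made for this example.

```python
from typing import Dict, List

Row = Dict[str, int]  # column name -> category index


class SketchOneHotModel:
    """Hypothetical stand-in for the fitted model; holds per-column sizes."""

    def __init__(self, category_sizes: Dict[str, int]) -> None:
        self.category_sizes = category_sizes

    def transform(self, rows: List[Row]) -> List[Dict[str, List[float]]]:
        encoded = []
        for row in rows:
            out = {}
            # Encode every configured column in a single pass over the row.
            for col, size in self.category_sizes.items():
                vec = [0.0] * size
                vec[row[col]] = 1.0
                out[col + "_vec"] = vec
            encoded.append(out)
        return encoded


class SketchOneHotEstimator:
    """Hypothetical stand-in for the estimator; fit() returns a model."""

    def __init__(self, input_cols: List[str]) -> None:
        self.input_cols = input_cols

    def fit(self, rows: List[Row]) -> SketchOneHotModel:
        # Assumption: category count = largest observed index + 1.
        sizes = {c: max(r[c] for r in rows) + 1 for c in self.input_cols}
        return SketchOneHotModel(sizes)


rows = [{"color": 0, "shape": 1}, {"color": 2, "shape": 0}]
model = SketchOneHotEstimator(["color", "shape"]).fit(rows)
encoded = model.transform(rows)
print(encoded)
```

    Fitting once over all input columns and storing one size per column is 
what lets a single model encode several columns together.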
    
    ### handleInvalid Param support
    
    `OneHotEncoderEstimator` supports the `handleInvalid` Param, which accepts 
`error` and `skip`. Note that `skip` cannot be combined with `dropLast` set to 
`true`, because the two conflict in the encoded vector.
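    
    The conflict can be illustrated with a small plain-Python sketch. It 
assumes that `skip` encodes an invalid (unseen) category as an all-zeros 
vector; the description above only states that the two options conflict, so 
that behavior is an assumption. The helper `one_hot` and its parameters are 
hypothetical and are not part of Spark.

```python
from typing import List


def one_hot(index: int, num_categories: int,
            drop_last: bool, skip_invalid: bool) -> List[float]:
    """Hypothetical encoder used only to illustrate the conflict."""
    # With drop_last, the last category gets no slot of its own.
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < 0 or index >= num_categories:
        if not skip_invalid:
            raise ValueError(f"invalid category index: {index}")
        # Assumption: 'skip' leaves an invalid value as all zeros.
        return vec
    if index < size:
        vec[index] = 1.0
    return vec


# Three known categories: 0, 1, 2. Index 5 was never seen during fitting.
last_dropped = one_hot(2, 3, drop_last=True, skip_invalid=True)
unseen_dropped = one_hot(5, 3, drop_last=True, skip_invalid=True)
last_kept = one_hot(2, 3, drop_last=False, skip_invalid=True)
unseen_kept = one_hot(5, 3, drop_last=False, skip_invalid=True)

print(last_dropped == unseen_dropped)  # same vector: ambiguous
print(last_kept == unseen_kept)        # different vectors: distinguishable
```

    Under this assumption, dropping the last category makes a valid last 
category and an invalid value encode to the same vector, which is why the two 
settings cannot be enabled together.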
    
    ## How was this patch tested?
    
    Added a new test suite, `OneHotEncoderEstimatorSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-13030

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19527.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19527
    
----
commit 8fd4677fd0e729d99d8777010e78bb5cfea3cf86
Author: Liang-Chi Hsieh <vii...@gmail.com>
Date:   2017-10-18T07:31:32Z

    Add OneHotEncoderEstimator and related tests.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
