GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/20132

    [SPARK-13030][ML] Follow-up cleanups for OneHotEncoderEstimator

    ## What changes were proposed in this pull request?
    
    Follow-up cleanups for the OneHotEncoderEstimator PR. See some discussion
    in the original PR: https://github.com/apache/spark/pull/19527 or read below
    for what this PR includes:
    * configedCategorySize: I reverted this to return an Array. I realized the
      original setup (which I had recommended in the original PR) caused the
      whole model to be serialized in the UDF.
    * encoder: I reorganized the logic to show what I meant in the comment in
      the previous PR. I think it's simpler but am open to suggestions.
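    
    The serialization concern in the configedCategorySize item can be
    illustrated with a minimal, standalone JVM sketch. It is not the Spark
    code: `Model`, `SerFn`, `capturesModel`, `capturesArrayOnly`, and
    `serializedSize` are hypothetical names, and it is written in plain Java
    rather than Scala so it runs without Spark. It assumes the closure is
    shipped with Java serialization, and shows that a function reading a
    member of the model captures the whole model, while one reading a local
    copy of the array captures only the array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.IntUnaryOperator;

public class ClosureCaptureSketch {
    /** A serializable function type, standing in for a UDF closure. */
    interface SerFn extends IntUnaryOperator, Serializable {}

    /** Hypothetical stand-in for a fitted model; not Spark's class. */
    static class Model implements Serializable {
        final int[] categorySizes = {3, 5, 2};
        final double[] otherState = new double[100_000]; // large unrelated state

        // Reads the field inside the lambda, so the lambda captures `this`
        // and the whole model is written out with it.
        SerFn capturesModel() {
            return i -> categorySizes[i];
        }

        // Copies the field to a local first, so only the array is captured.
        SerFn capturesArrayOnly() {
            int[] localSizes = categorySizes;
            return i -> localSizes[i];
        }
    }

    /** Number of bytes Java serialization writes for the given object. */
    static int serializedSize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Model model = new Model();
        System.out.println("captures model: "
            + serializedSize(model.capturesModel()) + " bytes");
        System.out.println("captures array: "
            + serializedSize(model.capturesArrayOnly()) + " bytes");
    }
}
```

    Both functions return the same values; only the serialized payload
    differs, which is the kind of overhead that returning a plain Array is
    meant to avoid.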
    
    I also made some small style cleanups based on IntelliJ warnings.
    
    ## How was this patch tested?
    
    Existing unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark viirya-SPARK-13030

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20132.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20132
    
----
commit 9bf045da1adeaa08deeb96eaa0289d8d4cb74bc1
Author: Joseph K. Bradley <joseph@...>
Date:   2017-12-31T23:25:45Z

    updates for final PR

----

