GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/20132
[SPARK-13030][ML] Follow-up cleanups for OneHotEncoderEstimator

## What changes were proposed in this pull request?

Follow-up cleanups for the OneHotEncoderEstimator PR. See some discussion in the original PR: https://github.com/apache/spark/pull/19527 or read below for what this PR includes:
* configedCategorySize: I reverted this to return an Array. I realized the original setup (which I had recommended in the original PR) caused the whole model to be serialized in the UDF.
* encoder: I reorganized the logic to show what I meant in the comment in the previous PR. I think it's simpler but am open to suggestions.

I also made some small style cleanups based on IntelliJ warnings.

## How was this patch tested?

Existing unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark viirya-SPARK-13030

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20132.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20132

----
commit 9bf045da1adeaa08deeb96eaa0289d8d4cb74bc1
Author: Joseph K. Bradley <joseph@...>
Date:   2017-12-31T23:25:45Z

    updates for final PR

----
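
For readers unfamiliar with the serialization issue mentioned under configedCategorySize, here is a minimal sketch of the general JVM closure-capture behavior, not the actual OneHotEncoderModel code. `ToyModel`, `largeState`, `capturingModel`, `capturingArray`, and `ClosureSizeDemo` are hypothetical names made up for illustration, and plain Java serialization stands in for whatever Spark does when shipping a UDF. The assumption is that a function literal which reads a member of the enclosing object holds a reference to that whole object, while one that reads a local copy holds only the copy.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for a fitted model: small metadata plus a large payload.
class ToyModel(val categorySizes: Array[Int]) extends Serializable {
  // Roughly 8 MB of state that a per-row function does not need.
  val largeState: Array[Double] = new Array[Double](1000000)

  // Reading the member inside the function literal captures `this`,
  // so serializing the function also serializes the whole model.
  def capturingModel: Int => Int = (i: Int) => categorySizes(i)

  // Copying the member to a local value first means the function
  // captures only the small array.
  def capturingArray: Int => Int = {
    val localSizes = categorySizes
    (i: Int) => localSizes(i)
  }
}

object ClosureSizeDemo {
  // Serialize an object with Java serialization and return the byte count.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val model = new ToyModel(Array(3, 4, 5))
    println(s"closure over model: ${serializedSize(model.capturingModel)} bytes")
    println(s"closure over array: ${serializedSize(model.capturingArray)} bytes")
  }
}
```

Under these assumptions, the first closure should serialize to several megabytes and the second to a small fraction of that. Returning a plain Array makes the second pattern straightforward: the caller assigns the array to a local value and references only that value inside the UDF.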