GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/11832

[SPARK-13963][ML] Adding binary toggle param to HashingTF

## What changes were proposed in this pull request?

Adding binary toggle parameter to ml.feature.HashingTF, as well as mllib.feature.HashingTF since the former wraps this functionality. This parameter, if true, will set non-zero valued term counts to 1 to transform term count features to binary values that are well suited for discrete probability models.

## How was this patch tested?

Added unit tests for ML and MLlib

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark binary-param-HashingTF-SPARK-13963

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11832.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11832

----
commit a5ff3309c0d07e57177374133130803eb98ebffb
Author: Bryan Cutler <cutl...@gmail.com>
Date:   2016-03-18T21:19:19Z

    [SPARK-13963] Adding binary toggle to HashingTF in ml/mllib

commit 31097231769860b86d1d3234ebf7d4e95f96e5cb
Author: Bryan Cutler <cutl...@gmail.com>
Date:   2016-03-18T21:19:48Z

    Added unit test for HashingTF binary toggle

commit ca1436166a1292f92d72408c10cf606623b31bbd
Author: Bryan Cutler <cutl...@gmail.com>
Date:   2016-03-18T21:26:34Z

    fixed param description text

----
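
For readers unfamiliar with the feature, the behavior of the binary toggle described in this pull request can be illustrated with a minimal, standalone sketch. This is not the Spark implementation: the function name `hashing_tf`, its parameters, and the use of `zlib.crc32` as the hash function are all assumptions made for illustration only. It shows how, with the toggle on, every non-zero term count is reported as 1.

```python
import zlib
from typing import Dict, Iterable


def hashing_tf(terms: Iterable[str],
               num_features: int = 1 << 20,
               binary: bool = False) -> Dict[int, float]:
    """Map terms to a sparse {bucket index: value} term-frequency vector.

    Hypothetical helper, not Spark's API. crc32 stands in for whatever
    hash function the real HashingTF uses.
    """
    vector: Dict[int, float] = {}
    for term in terms:
        # Hash the term into one of num_features buckets.
        index = zlib.crc32(term.encode("utf-8")) % num_features
        if binary:
            # Binary toggle on: record only presence, never a count above 1.
            vector[index] = 1.0
        else:
            # Binary toggle off: accumulate the raw term count.
            vector[index] = vector.get(index, 0.0) + 1.0
    return vector


if __name__ == "__main__":
    doc = ["a", "b", "a", "c", "a"]
    print(hashing_tf(doc))               # raw counts per hashed bucket
    print(hashing_tf(doc, binary=True))  # same buckets, every value 1.0
```

Both calls populate the same buckets; only the stored values differ. Presence/absence features of this kind are what the description refers to as well suited for discrete probability models.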