[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12079 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-210121149 LGTM. Merged to master.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-210117010 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-210116690
**[Test build #55839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55839/consoleFull)** for PR 12079 at commit [`551cc6e`](https://github.com/apache/spark/commit/551cc6ee1b4fdb5cff59d4ef998ca0a15777c7e7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-210117012 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55839/ Test PASSed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-210111381 **[Test build #55839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55839/consoleFull)** for PR 12079 at commit [`551cc6e`](https://github.com/apache/spark/commit/551cc6ee1b4fdb5cff59d4ef998ca0a15777c7e7).
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-210110142 jenkins retest this please
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-209049583 As per @jkbradley's comment at https://github.com/apache/spark/pull/12308#issuecomment-209039855, let's keep them as separate params.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208974760 @BryanCutler / @yongtang That sounds reasonable :)
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user BryanCutler commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208970035
> @holdenk @BryanCutler we could merge this and #12308, and then update the param to be shared (if we can do the different doc thing?).
I think that will be better and maybe then we can change the param to be shared on both the Scala and Python side.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208919727
**[Test build #55608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55608/consoleFull)** for PR 12079 at commit [`551cc6e`](https://github.com/apache/spark/commit/551cc6ee1b4fdb5cff59d4ef998ca0a15777c7e7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208919990 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208919995 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55608/ Test PASSed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user yongtang commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208917286 Thanks @MLnick I just updated the pull request to address several minor issues. With respect to `. Default False` vs `. (default: False)`, I changed it to `. Default False` for now. But if you want to see `(default: X)` I can change it (including the rest of the file) to it as well.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208914227 **[Test build #55608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55608/consoleFull)** for PR 12079 at commit [`551cc6e`](https://github.com/apache/spark/commit/551cc6ee1b4fdb5cff59d4ef998ca0a15777c7e7).
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208797154 A few minor comments, otherwise LGTM. @holdenk @BryanCutler we could merge this and #12308, and then update the param to be shared (if we can do the different doc thing?).
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r59341296
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,14 +512,19 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
+    binary = Param(Params._dummy(), "binary", "If true, all non zero counts are set to 1. " +
--- End diff --
`if true` -> `if True`
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r59341354
--- Diff: python/pyspark/mllib/feature.py ---
@@ -379,6 +379,17 @@ class HashingTF(object):
     """
     def __init__(self, numFeatures=1 << 20):
         self.numFeatures = numFeatures
+        self.binary = False
+
+    @since("2.0.0")
+    def setBinary(self, value):
+        """
+        If true, term frequency vector will be binary such that non-zero
--- End diff --
`if true` -> `if True`
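The setter pattern in the diff above can be sketched in isolation as follows. `TinyHashingTF` is a hypothetical stand-in, not `pyspark.mllib.feature.HashingTF`; returning `self` from `setBinary` is an assumption made here so that calls can be chained, and Python's built-in `hash` stands in for whatever hash the real class uses.

```python
class TinyHashingTF:
    """Hypothetical stand-in for illustration; not the PySpark class."""

    def __init__(self, numFeatures=1 << 20):
        self.numFeatures = numFeatures
        self.binary = False  # default: keep raw term counts

    def setBinary(self, value):
        # Assumption: return self so that setter calls can be chained.
        self.binary = bool(value)
        return self

    def transform(self, terms):
        # Map each term to a bucket and accumulate a value per bucket.
        freq = {}
        for term in terms:
            index = hash(term) % self.numFeatures
            # Binary mode pins the value at 1.0; otherwise count occurrences.
            freq[index] = 1.0 if self.binary else freq.get(index, 0.0) + 1.0
        return freq


tf = TinyHashingTF(numFeatures=16).setBinary(True)
print(tf.transform(["a", "a", "b"]))
```

The bucket indices printed depend on Python's per-process string hashing, so only the values (all 1.0 in binary mode) are stable across runs.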
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r59339176
--- Diff: python/pyspark/ml/tests.py ---
@@ -831,6 +831,25 @@ def test_logistic_regression_summary(self):
         self.assertAlmostEqual(sameSummary.areaUnderROC, s.areaUnderROC)
+class HashingTFTest(PySparkTestCase):
+
+    def test_apply_binary_term_freqs(self):
+        sqlContext = SQLContext(self.sc)
+
+        df = sqlContext.createDataFrame([(0, ["a", "a", "b", "c", "c", "c"])], ["id", "words"])
+        n = 100
+        hashingTF = HashingTF()
+        hashingTF.setInputCol("words").setOutputCol("features").setNumFeatures(n).setBinary(True)
+        output = hashingTF.transform(df)
+        features = output.select("features").first().features.toArray()
+        expected = Vectors.sparse(100, {(ord("a") % n): 1.0,
--- End diff --
`100` -> `n`
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r59338971
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,14 +512,19 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
+    binary = Param(Params._dummy(), "binary", "If true, all non zero counts are set to 1. " +
+                   "This is useful for discrete probabilistic models that model binary events " +
+                   "rather than integer counts. (default: False)",
--- End diff --
The style seems to be `. Default False` rather than `. (default: False)`. @BryanCutler @holdenk thoughts? Though I must say I'd prefer `(default: X).` across the board myself.
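To illustrate the param description in the diff above (non-zero counts are set to 1 when `binary` is true), here is a minimal pure-Python sketch. It is not Spark's implementation: the function name `hashed_term_freqs` and its `hash_fn` parameter are hypothetical, and using `ord` as the hash for single-character terms is borrowed from the `ord("a") % n` expression in this PR's test.

```python
from collections import Counter


def hashed_term_freqs(terms, num_features, binary=False, hash_fn=hash):
    """Return {feature_index: value} for a list of terms (illustrative only)."""
    # Bucket each term by hash modulo the feature dimension, then count hits.
    counts = Counter(hash_fn(term) % num_features for term in terms)
    if binary:
        # Binary mode: any non-zero count collapses to 1.0.
        return {index: 1.0 for index in counts}
    # Default mode: keep the raw term counts.
    return {index: float(count) for index, count in counts.items()}


# Same input document as the test in this PR, with ord() as a stand-in hash.
words = ["a", "a", "b", "c", "c", "c"]
print(hashed_term_freqs(words, 100, hash_fn=ord))
print(hashed_term_freqs(words, 100, binary=True, hash_fn=ord))
```

With these inputs the first call yields counts 2.0, 1.0 and 3.0 at indices 97, 98 and 99, while the binary call yields 1.0 at each of those indices, which is the distinction the param doc describes.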
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r59338620
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,14 +512,19 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
+    binary = Param(Params._dummy(), "binary", "If true, all non zero counts are set to 1. " +
--- End diff --
See https://github.com/apache/spark/pull/12308#issuecomment-208773214
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r59334159
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,14 +512,19 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
+    binary = Param(Params._dummy(), "binary", "If true, all non zero counts are set to 1. " +
--- End diff --
Great! Looking at the incoming PRs it seems there is a second PR also adding a binary feature to another model - it might make sense to move this to a shared param instead of having it be per-model (although it will require coordination with the other PR timing wise).
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208700359 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55587/ Test PASSed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208700356 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208700160
**[Test build #55587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55587/consoleFull)** for PR 12079 at commit [`9c2b4ab`](https://github.com/apache/spark/commit/9c2b4ab64fbd0aa267ff1ee0b9b353a99346d05c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user yongtang commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208697614 @holdenk The Scala implementation has been completed in SPARK-13963. I updated the description of this pull request to show the linkage between this issue (SPARK-14238) and SPARK-13963.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208696634 **[Test build #55587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55587/consoleFull)** for PR 12079 at commit [`9c2b4ab`](https://github.com/apache/spark/commit/9c2b4ab64fbd0aa267ff1ee0b9b353a99346d05c).
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user yongtang commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r59319063
--- Diff: python/pyspark/mllib/feature.py ---
@@ -379,6 +379,17 @@ class HashingTF(object):
     """
     def __init__(self, numFeatures=1 << 20):
         self.numFeatures = numFeatures
+        self.binary = False
+
+    @since("2.0.0")
+    def setBinary(self, value):
+        """
+        If true, term frequency vector will be binary such that non-zero
+        term counts will be set to 1
+        (default: false)
--- End diff --
Thanks @BryanCutler this issue has been corrected.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user yongtang commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r59318934
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,14 +512,19 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
+    binary = Param(Params._dummy(), "binary", "If true, all non zero counts are set to 1. " +
--- End diff --
Thanks @holdenk this issue has been addressed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r59299261
--- Diff: python/pyspark/mllib/feature.py ---
@@ -379,6 +379,17 @@ class HashingTF(object):
     """
     def __init__(self, numFeatures=1 << 20):
         self.numFeatures = numFeatures
+        self.binary = False
+
+    @since("2.0.0")
+    def setBinary(self, value):
+        """
+        If true, term frequency vector will be binary such that non-zero
+        term counts will be set to 1
+        (default: false)
--- End diff --
minor: false -> False
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r59273122
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,14 +512,19 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
+    binary = Param(Params._dummy(), "binary", "If true, all non zero counts are set to 1. " +
--- End diff --
We probably want to mention the default value here (namely false).
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208538911 One minor note: often we want to go with Scala first and then Python, but in either direction, if we are only doing one at a time, it can be good practice to create either a follow-up JIRA or a subtask on the existing JIRA to also expose the implementation in the other language.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208384386 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55524/ Test PASSed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208384380 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208384272
**[Test build #55524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55524/consoleFull)** for PR 12079 at commit [`829c87e`](https://github.com/apache/spark/commit/829c87e453b7082edd3d5b9d042b9fd849f060b8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208380065 **[Test build #55524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55524/consoleFull)** for PR 12079 at commit [`829c87e`](https://github.com/apache/spark/commit/829c87e453b7082edd3d5b9d042b9fd849f060b8).
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user yongtang commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-208378950 Rebased to fix conflicts.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-204013573 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-204013576 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54642/ Test PASSed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-204013346
**[Test build #54642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54642/consoleFull)** for PR 12079 at commit [`a71f59b`](https://github.com/apache/spark/commit/a71f59b567c998c19df362123e587fef444e6db7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user yongtang commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r58083774
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,6 +512,16 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
+    """
+    Binary toggle to control term frequency counts.
+    If true, all non-zero counts are set to 1. This is useful for discrete probabilistic
+    models that model binary events rather than integer counts.
+    (default = False)
+    """
+    binary = Param(Params._dummy(), "binary",
+                   "Binary toggle to control term frequency counts",
+                   typeConverter=TypeConverters.toBoolean)
+
     @keyword_only
     def __init__(self, numFeatures=1 << 18, inputCol=None, outputCol=None):
--- End diff --
Thanks @yanboliang, I just updated the pull request with the issues addressed. Let me know if there are any other issues.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-204007123 **[Test build #54642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54642/consoleFull)** for PR 12079 at commit [`a71f59b`](https://github.com/apache/spark/commit/a71f59b567c998c19df362123e587fef444e6db7).
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r58073256
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,6 +512,16 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
+    """
+    Binary toggle to control term frequency counts.
+    If true, all non-zero counts are set to 1. This is useful for discrete probabilistic
+    models that model binary events rather than integer counts.
+    (default = False)
+    """
+    binary = Param(Params._dummy(), "binary",
+                   "Binary toggle to control term frequency counts",
+                   typeConverter=TypeConverters.toBoolean)
+
     @keyword_only
     def __init__(self, numFeatures=1 << 18, inputCol=None, outputCol=None):
--- End diff --
`binary` should be an argument of `__init__`.
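The review comment above asks for `binary` to be accepted by the constructor. A minimal sketch of that pattern follows, assuming a simplified stand-in class: `MiniHashingTF`, its dict-based `_setDefault`, `setParams`, and `getOrDefault` are hypothetical illustrations written for this note, not the PySpark implementation. It also registers both defaults in a single `_setDefault` call.

```python
class MiniHashingTF:
    """Hypothetical stand-in for illustration only; not the PySpark HashingTF."""

    def __init__(self, numFeatures=1 << 18, binary=False, inputCol=None, outputCol=None):
        self._defaults = {}  # defaults registered by the class
        self._values = {}    # values explicitly set by the caller
        # Register every default in one call.
        self._setDefault(numFeatures=1 << 18, binary=False)
        # Accept `binary` as a constructor keyword alongside the other params.
        self.setParams(numFeatures=numFeatures, binary=binary,
                       inputCol=inputCol, outputCol=outputCol)

    def _setDefault(self, **kwargs):
        # Simplified: store defaults in a plain dict.
        self._defaults.update(kwargs)

    def setParams(self, **kwargs):
        # Store only values that were actually provided (non-None).
        for name, value in kwargs.items():
            if value is not None:
                self._values[name] = value
        return self

    def getOrDefault(self, name):
        # An explicit value wins; otherwise fall back to the registered default.
        if name in self._values:
            return self._values[name]
        return self._defaults[name]


default_tf = MiniHashingTF()
binary_tf = MiniHashingTF(binary=True)
```

With this shape, a caller can write `MiniHashingTF(binary=True)` instead of constructing the object and setting the param in a second step.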
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r58073185
--- Diff: python/pyspark/ml/feature.py ---
@@ -520,6 +530,7 @@ def __init__(self, numFeatures=1 << 18, inputCol=None, outputCol=None):
         super(HashingTF, self).__init__()
         self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.HashingTF", self.uid)
         self._setDefault(numFeatures=1 << 18)
+        self._setDefault(binary=False)
--- End diff --
`self._setDefault(numFeatures=1 << 18, binary=False)`
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r58072895
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,6 +512,16 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
+    """
+    Binary toggle to control term frequency counts.
--- End diff --
The doc comment for `binary` is unnecessary because it will not appear in the generated Python API docs.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/12079#discussion_r58072522
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,6 +512,16 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
+    """
+    Binary toggle to control term frequency counts.
+    If true, all non-zero counts are set to 1. This is useful for discrete probabilistic
+    models that model binary events rather than integer counts.
+    (default = False)
+    """
+    binary = Param(Params._dummy(), "binary",
+                   "Binary toggle to control term frequency counts",
--- End diff --
We should keep the doc of the Param consistent with Scala.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-203973486 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-203973488 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54631/ Test PASSed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-203973259
**[Test build #54631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54631/consoleFull)** for PR 12079 at commit [`1e24a68`](https://github.com/apache/spark/commit/1e24a68806e616996b3e465f38bbbd3525a18b11).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-203968139 **[Test build #54631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54631/consoleFull)** for PR 12079 at commit [`1e24a68`](https://github.com/apache/spark/commit/1e24a68806e616996b3e465f38bbbd3525a18b11).
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-203944274
**[Test build #54623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54623/consoleFull)** for PR 12079 at commit [`e58d1a2`](https://github.com/apache/spark/commit/e58d1a279aaded9045c9e7a7a161500163b81fd6).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-203944299 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-203944303 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54623/ Test FAILed.
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-203943491 **[Test build #54623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54623/consoleFull)** for PR 12079 at commit [`e58d1a2`](https://github.com/apache/spark/commit/e58d1a279aaded9045c9e7a7a161500163b81fd6).
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-203941636 ok to test
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12079#issuecomment-203741023 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...
GitHub user yongtang opened a pull request: https://github.com/apache/spark/pull/12079

[SPARK-14238][ML][MLLIB][PYSPARK] Add binary toggle Param to PySpark HashingTF in ML & MLlib

## What changes were proposed in this pull request?

This fix adds a binary toggle Param to PySpark HashingTF in ML & MLlib. If this toggle is set, all non-zero counts are set to 1.

## How was this patch tested?

This fix adds two tests to cover the code changes: one for HashingTF in PySpark's ML and one for HashingTF in PySpark's MLlib.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yongtang/spark SPARK-14238

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12079.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12079

commit e58d1a279aaded9045c9e7a7a161500163b81fd6
Author: Yong Tang
Date: 2016-03-31T03:49:33Z

    [SPARK-14238][ML][MLLIB][PYSPARK] Add binary toggle Param to PySpark HashingTF in ML & MLlib

    This fix tries to add binary toggle Param to PySpark HashingTF in ML & MLlib. If this toggle is set, then all non-zero counts will be set to 1.

    This fix adds two tests to cover the code changes. One for HashingTF in PySpark's ML and one for HashingTF in PySpark's MLLib.
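To make the binary toggle described in the pull request concrete, here is a minimal pure-Python sketch of hashed term frequencies with and without the toggle. It does not use Spark: the function name `hashing_tf`, the dict-based sparse output, and the use of `zlib.crc32` as the hash are assumptions chosen for a self-contained, deterministic illustration, and Spark's own hashing differs.

```python
import zlib
from collections import Counter


def hashing_tf(terms, num_features=1 << 18, binary=False):
    """Map terms to a sparse {index: value} term-frequency dict.

    zlib.crc32 is a stand-in hash picked for determinism; it is an
    assumption of this sketch, not the hash function Spark uses.
    """
    # Hash each term into a bucket and count how often each bucket is hit.
    counts = Counter(zlib.crc32(t.encode("utf-8")) % num_features for t in terms)
    if binary:
        # Binary toggle: every non-zero count collapses to 1.0.
        return {index: 1.0 for index in counts}
    return {index: float(count) for index, count in counts.items()}


doc = ["a", "b", "a", "c", "a"]
tf_counts = hashing_tf(doc)               # integer-valued frequencies
tf_binary = hashing_tf(doc, binary=True)  # presence/absence only
```

Both results share the same set of indices; only the values differ. This is why the binary form suits models of binary events rather than integer counts.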