[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12079


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-14 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-210121149
  
LGTM. Merged to master.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-210117010
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-210116690
  
**[Test build #55839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55839/consoleFull)** for PR 12079 at commit [`551cc6e`](https://github.com/apache/spark/commit/551cc6ee1b4fdb5cff59d4ef998ca0a15777c7e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-210117012
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55839/
Test PASSed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-210111381
  
**[Test build #55839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55839/consoleFull)** for PR 12079 at commit [`551cc6e`](https://github.com/apache/spark/commit/551cc6ee1b4fdb5cff59d4ef998ca0a15777c7e7).





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-14 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-210110142
  
jenkins retest this please





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-209049583
  
As per @jkbradley's comment 
(https://github.com/apache/spark/pull/12308#issuecomment-209039855), let's keep 
them as separate params.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208974760
  
@BryanCutler / @yongtang That sounds reasonable :)





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread BryanCutler
Github user BryanCutler commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208970035
  
> @holdenk @BryanCutler we could merge this and #12308, and then update the 
param to be shared (if we can do the different doc thing?).

I think that will be better and maybe then we can change the param to be 
shared on both the Scala and Python side.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208919727
  
**[Test build #55608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55608/consoleFull)** for PR 12079 at commit [`551cc6e`](https://github.com/apache/spark/commit/551cc6ee1b4fdb5cff59d4ef998ca0a15777c7e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208919990
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208919995
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55608/
Test PASSed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread yongtang
Github user yongtang commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208917286
  
Thanks @MLnick, I just updated the pull request to address several minor 
issues. With respect to `. Default False` vs `. (default: False)`, I changed it 
to `. Default False` for now. If you would prefer `(default: X)`, I can 
change it (including the rest of the file) as well.






[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208914227
  
**[Test build #55608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55608/consoleFull)** for PR 12079 at commit [`551cc6e`](https://github.com/apache/spark/commit/551cc6ee1b4fdb5cff59d4ef998ca0a15777c7e7).





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208797154
  
A few minor comments, otherwise LGTM. 

@holdenk @BryanCutler we could merge this and #12308, and then update the 
param to be shared (if we can do the different doc thing?).





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r59341296
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,14 +512,19 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
 
+    binary = Param(Params._dummy(), "binary", "If true, all non zero counts are set to 1. " +
--- End diff --

`if true` -> `if True`





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r59341354
  
--- Diff: python/pyspark/mllib/feature.py ---
@@ -379,6 +379,17 @@ class HashingTF(object):
     """
     def __init__(self, numFeatures=1 << 20):
         self.numFeatures = numFeatures
+        self.binary = False
+
+    @since("2.0.0")
+    def setBinary(self, value):
+        """
+        If true, term frequency vector will be binary such that non-zero
--- End diff --

`if true` -> `if True`





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r59339176
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -831,6 +831,25 @@ def test_logistic_regression_summary(self):
         self.assertAlmostEqual(sameSummary.areaUnderROC, s.areaUnderROC)
 
 
+class HashingTFTest(PySparkTestCase):
+
+    def test_apply_binary_term_freqs(self):
+        sqlContext = SQLContext(self.sc)
+
+        df = sqlContext.createDataFrame([(0, ["a", "a", "b", "c", "c", "c"])], ["id", "words"])
+        n = 100
+        hashingTF = HashingTF()
+        hashingTF.setInputCol("words").setOutputCol("features").setNumFeatures(n).setBinary(True)
+        output = hashingTF.transform(df)
+        features = output.select("features").first().features.toArray()
+        expected = Vectors.sparse(100, {(ord("a") % n): 1.0,

`100` -> `n`





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r59338971
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,14 +512,19 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
 
+    binary = Param(Params._dummy(), "binary", "If true, all non zero counts are set to 1. " +
+                   "This is useful for discrete probabilistic models that model binary events " +
+                   "rather than integer counts. (default: False)",
--- End diff --

The style seems to be `. Default False` rather than `. (default: False)`. 
@BryanCutler @holdenk thoughts?

Though I must say I'd prefer `(default: X).` across the board myself.
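
To make the param description above concrete, here is a minimal plain-Python sketch of what a `binary` flag changes in a hashing term-frequency transform. It is not Spark's implementation: `ToyHashingTF`, `set_binary`, and `index_of` are hypothetical names, and CRC32 is an assumed stand-in hash chosen only so the result is deterministic.

```python
import zlib
from collections import Counter


class ToyHashingTF:
    """Plain-Python stand-in for a hashing term-frequency transformer."""

    def __init__(self, num_features=1 << 20):
        self.num_features = num_features
        self.binary = False  # default: keep integer counts

    def set_binary(self, value):
        # Return self so that setter calls can be chained.
        self.binary = bool(value)
        return self

    def index_of(self, term):
        # Assumption: CRC32 is used here only to keep the sketch deterministic;
        # the real transformer uses its own hash function.
        return zlib.crc32(term.encode("utf-8")) % self.num_features

    def transform(self, terms):
        # Count how many terms land in each hashed bucket.
        counts = Counter(self.index_of(t) for t in terms)
        if self.binary:
            # Every non-zero count collapses to 1.0 (presence/absence).
            return {i: 1.0 for i in counts}
        return {i: float(c) for i, c in counts.items()}


words = ["a", "a", "b", "c", "c", "c"]
counts = ToyHashingTF(100).transform(words)
flags = ToyHashingTF(100).set_binary(True).transform(words)
```

Both results cover the same bucket indices; only the values differ, which is why a model of binary events can consume `flags` directly while a count-based model would use `counts`.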





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r59338620
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,14 +512,19 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
 
+    binary = Param(Params._dummy(), "binary", "If true, all non zero counts are set to 1. " +
--- End diff --

See https://github.com/apache/spark/pull/12308#issuecomment-208773214





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-12 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r59334159
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,14 +512,19 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
 
+    binary = Param(Params._dummy(), "binary", "If true, all non zero counts are set to 1. " +
--- End diff --

Great! Looking at the incoming PRs, it seems there is a second PR also 
adding a binary feature to another model. It might make sense to move this to 
a shared param instead of having it be per-model (although it will require 
coordinating timing with the other PR).
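
For readers unfamiliar with the idea, a shared param can be pictured as a mixin that defines the flag once and is inherited by every model that needs it. The sketch below is plain Python for illustration only and does not use the PySpark `Param` machinery; `HasBinary`, `ToyModelA`, `ToyModelB`, and `binary_doc` are hypothetical names.

```python
class HasBinary:
    """Hypothetical shared mixin: one definition of the `binary` flag."""

    # Class-level doc text; a subclass may override it with its own wording.
    binary_doc = "If True, all non-zero counts are set to 1. Default False"

    def __init__(self):
        super().__init__()
        self._binary = False

    def setBinary(self, value):
        self._binary = bool(value)
        return self  # allow chained setter calls

    def getBinary(self):
        return self._binary


class ToyModelA(HasBinary):
    """First model: inherits the flag and the default doc text."""


class ToyModelB(HasBinary):
    """Second model: same flag, but model-specific doc text."""

    binary_doc = "If True, non-zero counts are treated as 1 for this model. Default False"


a = ToyModelA().setBinary(True)
b = ToyModelB()
```

The trade-off is the one raised in the thread: sharing removes duplicated setter and getter code, but each model may want to document the flag differently, which here is handled by overriding `binary_doc`.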





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208700359
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55587/
Test PASSed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208700356
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208700160
  
**[Test build #55587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55587/consoleFull)** for PR 12079 at commit [`9c2b4ab`](https://github.com/apache/spark/commit/9c2b4ab64fbd0aa267ff1ee0b9b353a99346d05c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread yongtang
Github user yongtang commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208697614
  
@holdenk The Scala implementation has been completed in SPARK-13963. I 
updated the description of this pull request to show the linkage between this 
issue (SPARK-14238) and SPARK-13963.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208696634
  
**[Test build #55587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55587/consoleFull)** for PR 12079 at commit [`9c2b4ab`](https://github.com/apache/spark/commit/9c2b4ab64fbd0aa267ff1ee0b9b353a99346d05c).





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread yongtang
Github user yongtang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r59319063
  
--- Diff: python/pyspark/mllib/feature.py ---
@@ -379,6 +379,17 @@ class HashingTF(object):
     """
     def __init__(self, numFeatures=1 << 20):
         self.numFeatures = numFeatures
+        self.binary = False
+
+    @since("2.0.0")
+    def setBinary(self, value):
+        """
+        If true, term frequency vector will be binary such that non-zero
+        term counts will be set to 1
+        (default: false)
--- End diff --

Thanks @BryanCutler this issue has been corrected.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread yongtang
Github user yongtang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r59318934
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,14 +512,19 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
 
+    binary = Param(Params._dummy(), "binary", "If true, all non zero counts are set to 1. " +
--- End diff --

Thanks @holdenk this issue has been addressed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r59299261
  
--- Diff: python/pyspark/mllib/feature.py ---
@@ -379,6 +379,17 @@ class HashingTF(object):
     """
     def __init__(self, numFeatures=1 << 20):
         self.numFeatures = numFeatures
+        self.binary = False
+
+    @since("2.0.0")
+    def setBinary(self, value):
+        """
+        If true, term frequency vector will be binary such that non-zero
+        term counts will be set to 1
+        (default: false)
--- End diff --

minor: false -> False





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r59273122
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,14 +512,19 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
 
+    binary = Param(Params._dummy(), "binary", "If true, all non zero counts are set to 1. " +
--- End diff --

We probably want to mention the default value here (namely false).





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208538911
  
One minor note: often we want to go with Scala first, then Python, but in 
either direction, if we are only doing one at a time, it can be good practice to 
create either a follow-up JIRA or a subtask on the existing JIRA to also expose 
the implementation in the other language.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208384386
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55524/
Test PASSed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208384380
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208384272
  
**[Test build #55524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55524/consoleFull)** for PR 12079 at commit [`829c87e`](https://github.com/apache/spark/commit/829c87e453b7082edd3d5b9d042b9fd849f060b8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208380065
  
**[Test build #55524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55524/consoleFull)** for PR 12079 at commit [`829c87e`](https://github.com/apache/spark/commit/829c87e453b7082edd3d5b9d042b9fd849f060b8).





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-04-11 Thread yongtang
Github user yongtang commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-208378950
  
Rebased to fix conflicts.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-204013573
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-204013576
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54642/
Test PASSed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-204013346
  
**[Test build #54642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54642/consoleFull)** for PR 12079 at commit [`a71f59b`](https://github.com/apache/spark/commit/a71f59b567c998c19df362123e587fef444e6db7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread yongtang
Github user yongtang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r58083774
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,6 +512,16 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
 
+    """
+    Binary toggle to control term frequency counts.
+    If true, all non-zero counts are set to 1.  This is useful for discrete probabilistic
+    models that model binary events rather than integer counts.
+    (default = False)
+    """
+    binary = Param(Params._dummy(), "binary",
+                   "Binary toggle to control term frequency counts",
+                   typeConverter=TypeConverters.toBoolean)
+
     @keyword_only
     def __init__(self, numFeatures=1 << 18, inputCol=None, outputCol=None):
--- End diff --

Thanks @yanboliang, I just updated the pull request to address the issues. Let
me know if there are any other issues.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-204007123
  
**[Test build #54642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54642/consoleFull)** for PR 12079 at commit [`a71f59b`](https://github.com/apache/spark/commit/a71f59b567c998c19df362123e587fef444e6db7).





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r58073256
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,6 +512,16 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
 
+    """
+    Binary toggle to control term frequency counts.
+    If true, all non-zero counts are set to 1.  This is useful for discrete probabilistic
+    models that model binary events rather than integer counts.
+    (default = False)
+    """
+    binary = Param(Params._dummy(), "binary",
+                   "Binary toggle to control term frequency counts",
+                   typeConverter=TypeConverters.toBoolean)
+
     @keyword_only
     def __init__(self, numFeatures=1 << 18, inputCol=None, outputCol=None):
--- End diff --

```binary``` should be an argument of ```__init__```.
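
A sketch of what this suggestion amounts to, using a hypothetical stand-in class rather than the real PySpark base classes. `ToyHashingTF`, `_set_default`, and `get` are invented for illustration; only the `numFeatures`, `inputCol`, `outputCol`, and `binary` names come from the diff, and the position of `binary` in the signature is an assumption.

```python
class ToyHashingTF:
    """Hypothetical stand-in; not the PySpark HashingTF implementation."""

    def __init__(self, numFeatures=1 << 18, binary=False,
                 inputCol=None, outputCol=None):
        self._defaults = {}
        self._values = {}
        # Register both defaults in a single call.
        self._set_default(numFeatures=1 << 18, binary=False)
        # Record what the caller passed to the constructor.
        self._values.update(numFeatures=numFeatures, binary=binary,
                            inputCol=inputCol, outputCol=outputCol)

    def _set_default(self, **kwargs):
        # Toy helper: remember a default value per parameter name.
        self._defaults.update(kwargs)

    def get(self, name):
        # Fall back to the registered default when no value was given.
        value = self._values.get(name)
        return self._defaults.get(name) if value is None else value


tf = ToyHashingTF(binary=True, inputCol="words", outputCol="features")
```

Exposing `binary` as a keyword argument lets callers set it at construction time instead of through a separate setter call.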





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r58073185
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -520,6 +530,7 @@ def __init__(self, numFeatures=1 << 18, inputCol=None, outputCol=None):
         super(HashingTF, self).__init__()
         self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.HashingTF", self.uid)
         self._setDefault(numFeatures=1 << 18)
+        self._setDefault(binary=False)
--- End diff --

```self._setDefault(numFeatures=1 << 18, binary=False)```





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r58072895
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,6 +512,16 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
 
+    """
+    Binary toggle to control term frequency counts.
--- End diff --

The doc comment for ```binary``` is unnecessary because it will not appear
in the generated Python API doc.
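
One way to see the runtime side of this point: in plain Python, a string literal placed before an assignment in a class body is just an expression statement and is not attached to the attribute that follows. The `Example` class and `flag` attribute below are made up for the demonstration; how a given documentation generator treats such strings is a separate question.

```python
class Example:
    """Class docstring."""

    """Stray string literal intended to describe `flag`."""
    flag = False


# Only the first string in the class body becomes the docstring.
class_doc = Example.__doc__

# The second string is evaluated and discarded, so nothing in the
# class namespace carries its text.
stray_kept = any("Stray" in str(value) for value in vars(Example).values())
```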





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/12079#discussion_r58072522
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -512,6 +512,16 @@ class HashingTF(JavaTransformer, HasInputCol, HasOutputCol, HasNumFeatures, Java
     .. versionadded:: 1.3.0
     """
 
+    """
+    Binary toggle to control term frequency counts.
+    If true, all non-zero counts are set to 1.  This is useful for discrete probabilistic
+    models that model binary events rather than integer counts.
+    (default = False)
+    """
+    binary = Param(Params._dummy(), "binary",
+                   "Binary toggle to control term frequency counts",
--- End diff --

We should keep the doc of the Param consistent with Scala.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-203973486
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-203973488
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54631/
Test PASSed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-203973259
  
**[Test build #54631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54631/consoleFull)** for PR 12079 at commit [`1e24a68`](https://github.com/apache/spark/commit/1e24a68806e616996b3e465f38bbbd3525a18b11).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-203968139
  
**[Test build #54631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54631/consoleFull)** for PR 12079 at commit [`1e24a68`](https://github.com/apache/spark/commit/1e24a68806e616996b3e465f38bbbd3525a18b11).





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-203944274
  
**[Test build #54623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54623/consoleFull)** for PR 12079 at commit [`e58d1a2`](https://github.com/apache/spark/commit/e58d1a279aaded9045c9e7a7a161500163b81fd6).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-203944299
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-203944303
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54623/
Test FAILed.





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-203943491
  
**[Test build #54623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54623/consoleFull)** for PR 12079 at commit [`e58d1a2`](https://github.com/apache/spark/commit/e58d1a279aaded9045c9e7a7a161500163b81fd6).





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-31 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-203941636
  
ok to test





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12079#issuecomment-203741023
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary t...

2016-03-30 Thread yongtang
GitHub user yongtang opened a pull request:

https://github.com/apache/spark/pull/12079

[SPARK-14238][ML][MLLIB][PYSPARK] Add binary toggle Param to PySpark 
HashingTF in ML & MLlib

## What changes were proposed in this pull request?

This fix adds a binary toggle Param to PySpark HashingTF in ML &
MLlib. If this toggle is set, all non-zero counts are set to 1.
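
As a rough illustration of the toggle described above, here is a minimal pure-Python sketch of hashed term frequencies with a binary switch. It does not use Spark: `hashing_tf`, its parameters, and the CRC32-based bucket function are assumptions made for this example and are not HashingTF's actual hashing scheme.

```python
import zlib
from collections import Counter


def hashing_tf(terms, num_features=1 << 18, binary=False):
    """Map terms to a sparse {bucket_index: value} dict via the hashing trick.

    Hypothetical helper for illustration only; CRC32 stands in for whatever
    hash function the real implementation uses.
    """
    counts = Counter()
    for term in terms:
        index = zlib.crc32(term.encode("utf-8")) % num_features
        counts[index] += 1
    if binary:
        # Every bucket stored in `counts` is non-zero, so clamp each to 1.
        return {index: 1 for index in counts}
    return dict(counts)


doc = ["a", "b", "a", "c", "a"]
tf_counts = hashing_tf(doc)
tf_binary = hashing_tf(doc, binary=True)
```

With `binary=True` the same buckets are occupied, but each holds 1 instead of a count, which is the behaviour the description attributes to the toggle.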

## How was this patch tested?

This fix adds two tests to cover the code changes: one for HashingTF in
PySpark's ML and one for HashingTF in PySpark's MLlib.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yongtang/spark SPARK-14238

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12079.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12079


commit e58d1a279aaded9045c9e7a7a161500163b81fd6
Author: Yong Tang 
Date:   2016-03-31T03:49:33Z

[SPARK-14238][ML][MLLIB][PYSPARK] Add binary toggle Param to PySpark 
HashingTF in ML & MLlib

This fix tries to add binary toggle Param to PySpark HashingTF in ML & 
MLlib.
If this toggle is set, then all non-zero counts will be set to 1.

This fix adds two tests to cover the code changes. One for HashingTF in 
PySpark's ML
and one for HashingTF in PySpark's MLLib.



