[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3693


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-68315410
  
LGTM (including the change to `norm`). Merged into master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-68298901
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24865/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-68298891
  
  [Test build #24865 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24865/consoleFull)
 for   PR 3693 at commit 
[`ad534f2`](https://github.com/apache/spark/commit/ad534f26c44a7bdc8ee91f73d80a93bd13aa6805).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-68285678
  
  [Test build #24865 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24865/consoleFull)
 for   PR 3693 at commit 
[`ad534f2`](https://github.com/apache/spark/commit/ad534f26c44a7bdc8ee91f73d80a93bd13aa6805).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-68285349
  
ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-68279848
  
@ganonp Could you update the branch and remove the last commit?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-24 Thread ganonp
Github user ganonp commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-68062042
  
Sorry I didn't mean to commit that norm method for this pull request. That 
said, I think it makes sense for norm to be public or at least a d=2 version of 
norm.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-23 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-68004178
  
Bumping this, since it looks like it might be good to go.  Looks like 
there's a new change here related to making `norm` public, which might want 
additional review (I don't know anything about MLlib; just trying to help 
identify stuff that looks good-to-go to help drain the PR queue before the 
holidays).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-18 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-67533957
  
Sounds good!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3693#discussion_r22061443
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -121,9 +121,21 @@ class Word2Vec extends Serializable with Logging {
 
   /** context words from [-window, window] */
   private val window = 5
+  
+  /** 
+   * The minimum number of times a token must occur in the training corpus 
to be 
+   * included in the word2vec model (default: 5). 
+   */
+  private var minCount = 5
--- End diff --

Sorry, I just noticed that the other options are grouped at the top of the 
Word2Vec class.  Would you mind moving this there?  Thanks a lot!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-18 Thread ganonp
Github user ganonp commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-67502378
  
O wow, I just didn't see that the function and everything inside was lining 
up... Hurts to look at. 

Thanks for those links and your patience. Spark now makes up about 70% of 
my job, so I'll definitely be contributing more.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-17 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-67426600
  
Oh, and yes, I meant to use 4 spaces inside the function.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-17 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-67425285
  
@ganonp  No problem (but we are pretty strict about style).  When in doubt, 
check out other code in the project for example.  Also, I'd recommend checking 
out these resources:
* [Contributing to 
Spark](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark)
* [Spark style 
guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide)
* [Scala style guide](http://docs.scala-lang.org/style/)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-17 Thread ganonp
Github user ganonp commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-67424689
  
Sorry (first time contributing), do you mean I should use 4 spaces instead 
of tab per the convention? When I do this, my additions appear out of alignment 
with the rest of the code... 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3693#discussion_r21999842
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -121,9 +121,19 @@ class Word2Vec extends Serializable with Logging {
 
   /** context words from [-window, window] */
   private val window = 5
-
-  /** minimum frequency to consider a vocabulary word */
-  private val minCount = 5
+  
+  /** The minimum number of times a token must occur in the training 
corpus to be 
--- End diff --

This should be formatted as:
```
  /**
   * The minimum number of times a token must occur in the training corpus 
to be
   * included in the word2vec model (default: 5).
   */
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3693#discussion_r21999847
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -121,9 +121,19 @@ class Word2Vec extends Serializable with Logging {
 
   /** context words from [-window, window] */
   private val window = 5
-
-  /** minimum frequency to consider a vocabulary word */
-  private val minCount = 5
+  
+  /** The minimum number of times a token must occur in the training 
corpus to be 
+* included in the word2vec model (default: 5). 
+*/
+  private var minCount = 5
+
+  /** Sets minCount, the minimum number of times a token must appear to be 
included in the word2vec model's 
+* vocabulary (default: 5).
+*/
+  def setMinCount(minCount: Int): this.type = {
+  this.minCount = minCount
--- End diff --

4 space indent


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3693#discussion_r21999844
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -121,9 +121,19 @@ class Word2Vec extends Serializable with Logging {
 
   /** context words from [-window, window] */
   private val window = 5
-
-  /** minimum frequency to consider a vocabulary word */
-  private val minCount = 5
+  
+  /** The minimum number of times a token must occur in the training 
corpus to be 
+* included in the word2vec model (default: 5). 
+*/
+  private var minCount = 5
+
+  /** Sets minCount, the minimum number of times a token must appear to be 
included in the word2vec model's 
--- End diff --

line should be < 100 chars.
Also, format as in the comment above


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-15 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3693#discussion_r21866649
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -122,8 +122,13 @@ class Word2Vec extends Serializable with Logging {
   /** context words from [-window, window] */
   private val window = 5
 
-  /** minimum frequency to consider a vocabulary word */
-  private val minCount = 5
+/** minimum frequency to consider a vocabulary word */
+private var minCount = 5
+
+def setMinCount(minCount: Int): this.type = {
--- End diff --

please add javadoc


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-15 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3693#discussion_r21866645
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -122,8 +122,13 @@ class Word2Vec extends Serializable with Logging {
   /** context words from [-window, window] */
   private val window = 5
 
-  /** minimum frequency to consider a vocabulary word */
-  private val minCount = 5
+/** minimum frequency to consider a vocabulary word */
--- End diff --

2-space indentation


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3693#issuecomment-66934430
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Added setMinCount to Word2Vec.scala

2014-12-14 Thread ganonp
GitHub user ganonp opened a pull request:

https://github.com/apache/spark/pull/3693

Added setMinCount to Word2Vec.scala

Wanted to customize the private minCount variable in the Word2Vec class. 
Added
a method to do so.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ganonp/spark my-custom-spark

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3693.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3693


commit 5eb91000cd74ddd7704c79ca69259ee48c5840f9
Author: ganonp 
Date:   2014-12-14T21:56:19Z

Added setMinCount to Word2Vec.scala

Wanted to customize the minCount variable in the Word2Vec class. Added
a method to do so.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org