[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-12-07 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17673
  
@ngopal this one can't be merged as-is and looks like it was abandoned.
Would you like to take over this PR and update it per the reviews? I'd review
that. I think CBOW could be useful in MLlib.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-12-07 Thread ngopal
Github user ngopal commented on the issue:

https://github.com/apache/spark/pull/17673
  
When can we anticipate this branch being merged?





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Can one of the admins verify this patch?





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-10-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Can one of the admins verify this patch?





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Can one of the admins verify this patch?





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-07-13 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17673
  
@shubhamchopra are you still working on this?





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-06-28 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17673
  
Jenkins OK to test.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Can one of the admins verify this patch?





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-04-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Can one of the admins verify this patch?





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Can one of the admins verify this patch?





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Can one of the admins verify this patch?





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-11-06 Thread shubhamchopra
Github user shubhamchopra commented on the issue:

https://github.com/apache/spark/pull/17673
  
@hhbyyh Thanks for your suggestions. Will try to incorporate these in a day 
or so. 





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82569/
Test PASSed.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17673
  
**[Test build #82569 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82569/testReport)**
 for PR 17673 at commit 
[`9090b96`](https://github.com/apache/spark/commit/9090b967e03e43e3a709d9c2c94fe75de5b9a8e6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17673
  
**[Test build #82569 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82569/testReport)**
 for PR 17673 at commit 
[`9090b96`](https://github.com/apache/spark/commit/9090b967e03e43e3a709d9c2c94fe75de5b9a8e6).





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82568/
Test FAILed.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17673
  
**[Test build #82568 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82568/testReport)**
 for PR 17673 at commit 
[`236e4c1`](https://github.com/apache/spark/commit/236e4c1db2051f2f8a0435e753df3579afdfeb5e).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17673
  
**[Test build #82568 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82568/testReport)**
 for PR 17673 at commit 
[`236e4c1`](https://github.com/apache/spark/commit/236e4c1db2051f2f8a0435e753df3579afdfeb5e).





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-10-05 Thread shubhamchopra
Github user shubhamchopra commented on the issue:

https://github.com/apache/spark/pull/17673
  
Thanks for your comments/suggestions @MLnick and @sethah . Working on 
incorporating these.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82005/
Test PASSed.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17673
  
**[Test build #82005 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82005/testReport)**
 for PR 17673 at commit 
[`64a5a6b`](https://github.com/apache/spark/commit/64a5a6b2b3cacedc82b24bde9347fee272b78849).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17673
  
**[Test build #82005 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82005/testReport)**
 for PR 17673 at commit 
[`64a5a6b`](https://github.com/apache/spark/commit/64a5a6b2b3cacedc82b24bde9347fee272b78849).





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81320/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17673
  
**[Test build #81320 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81320/testReport)**
 for PR 17673 at commit 
[`361d79d`](https://github.com/apache/spark/commit/361d79ddeab78889cd5a0a63f21d1e446a7a34fd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-09-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17673
  
**[Test build #81320 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81320/testReport)**
 for PR 17673 at commit 
[`361d79d`](https://github.com/apache/spark/commit/361d79ddeab78889cd5a0a63f21d1e446a7a34fd).





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81231/
Test FAILed.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17673
  
**[Test build #81231 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81231/testReport)**
 for PR 17673 at commit 
[`948cc15`](https://github.com/apache/spark/commit/948cc15b67113b8ad74b67eeef13a39f55b7313a).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17673
  
**[Test build #81231 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81231/testReport)**
 for PR 17673 at commit 
[`948cc15`](https://github.com/apache/spark/commit/948cc15b67113b8ad74b67eeef13a39f55b7313a).





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80243/
Test PASSed.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17673
  
**[Test build #80243 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80243/testReport)**
 for PR 17673 at commit 
[`feda8dc`](https://github.com/apache/spark/commit/feda8dce8c2832bd1a3c61a84bfac9a23629866a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17673
  
**[Test build #80243 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80243/testReport)**
 for PR 17673 at commit 
[`feda8dc`](https://github.com/apache/spark/commit/feda8dce8c2832bd1a3c61a84bfac9a23629866a).





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-08-04 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17673
  
ok to test





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-05-18 Thread shubhamchopra
Github user shubhamchopra commented on the issue:

https://github.com/apache/spark/pull/17673
  
Code-review comments/suggestions so far have been incorporated. Thanks for 
looking into the code. Happy to incorporate more suggestions and feedback.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-05-04 Thread shubhamchopra
Github user shubhamchopra commented on the issue:

https://github.com/apache/spark/pull/17673
  
@MLnick I half expected that. No worries. I have incorporated some of your
feedback in the meantime and have also added subsampling. Thanks for looking
into the code.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-05-03 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17673
  
FYI, realistically there won't be bandwidth to really focus on this until 
after Spark 2.2 QA is done at the earliest.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-30 Thread Krimit
Github user Krimit commented on the issue:

https://github.com/apache/spark/pull/17673
  
Thanks for the detailed response @shubhamchopra. 
I'd like to clarify my point about whether this should be implemented in
Spark: Spark MLlib is first and foremost a framework for doing ML on large
datasets where other existing implementations (such as ``scikit-learn``) are
impractical. A reality of ML is that increasing the size (and quality) of
the training data is often much more important than tweaking model
hyper-parameters. Therefore, as a community, I think our focus should be more
on robustness than on "completeness".

While having additional algorithms available for tuning can be helpful, I
would personally be more interested in additions that offer significant and
clear benefits (such as ``GloVe``, which should be much faster to train and a
really good fit for Spark due to the natural parallelization of the problem).

With that said, I'm not opposed to adding CBOW, so long as we vet it. As
part of having this merged in, I think ideally we should run an experiment on a
large-ish dataset (Wikipedia?) comparing the two implementations.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-28 Thread shubhamchopra
Github user shubhamchopra commented on the issue:

https://github.com/apache/spark/pull/17673
  
@Krimit 
_Can you provide some information about the practical differences between 
CBOW and skip-grams?_
![Model 
Architectures](https://cloud.githubusercontent.com/assets/6588487/25546610/d0f95aa8-2c31-11e7-8b47-4f9d31254f0f.png)
As mentioned in [this paper](https://arxiv.org/pdf/1301.3781.pdf), the CBOW
model looks at the words around a target word and tries to predict the target
word. SkipGram does just the opposite: given a target word, it tries to predict
the context words around it. The prediction is done using a very simple neural
network with a single hidden layer.
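
To make the difference between the two models concrete, here is a minimal sketch of how each one would turn a tokenized sentence into training examples. It is not taken from this PR or from Spark's `Word2Vec`; the function names and the `window` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: one (context_words, target) example per token position."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        # Words to the left and right of position i, excluding the target itself.
        context = tokens[lo:i] + tokens[i + 1:i + 1 + window]
        if context:
            examples.append((context, target))
    return examples


def skipgram_examples(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: one (target, context_word) example per target/context pair."""
    return [
        (target, context_word)
        for context, target in cbow_examples(tokens, window)
        for context_word in context
    ]


if __name__ == "__main__":
    sentence = ["the", "quick", "brown", "fox"]
    for context, target in cbow_examples(sentence, window=1):
        print("CBOW      ", context, "->", target)
    for target, context_word in skipgram_examples(sentence, window=1):
        print("skip-gram ", target, "->", context_word)
```

Under these assumptions, CBOW produces one example per position with the whole context as input, while skip-gram produces one example per (target, context word) pair.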

_Wikipedia quotes the author (I assume they mean Tomas) as saying that CBOW 
is faster while skip-gram is slower but does a better job for infrequent words. 
Has this been your experience as well? How pronounced is the difference?_ 
I found the current CBOW + Negative Sampling to take almost the same time
as the existing SkipGram + Hierarchical Softmax. The number of negative
samples is tunable, and training will be slower for a higher number of
negative samples.
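
As a rough illustration of why the cost grows with the number of negative samples, the sketch below computes the negative-sampling loss for a single CBOW example in plain Python. It is not the code from this PR: the function names are hypothetical, and averaging the context vectors into the hidden layer is an assumption borrowed from the reference word2vec implementation.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cbow_negative_sampling_loss(
    context_vecs: Sequence[Sequence[float]],
    target_vec: Sequence[float],
    negative_vecs: Sequence[Sequence[float]],
) -> float:
    """Loss for one CBOW example with k = len(negative_vecs) negative samples."""
    dim = len(target_vec)
    # Hidden layer: average of the context word vectors (assumption, see lead-in).
    hidden = [sum(v[d] for v in context_vecs) / len(context_vecs) for d in range(dim)]
    # Positive term: push the true target's score up.
    loss = -math.log(sigmoid(dot(hidden, target_vec)))
    # One extra dot product and sigmoid per negative sample: pushes its score down.
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(hidden, neg)))
    return loss
```

Each example costs one dot product for the true target plus one per negative sample, so the per-example work scales roughly linearly with the number of negatives.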

_In what cases would a user choose one over the other? I'm basically
seconding @hhbyyh's comment on a more in-depth comparison experiment._
There is a good amount of research around this with comparison experiments.
It appears to depend largely on the application the embeddings would be used
for. [Levy et al](http://www.aclweb.org/anthology/Q15-1016) show how different
methods perform with extensive experiments. They used the embeddings to perform
similarity, relatedness and other tests on some open datasets.

[Mikolov et 
al](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)
 found SkipGram with Negative Sampling to outperform CBOW. [Baroni et 
al](http://anthology.aclweb.org/P/P14/P14-1023.pdf) found that CBOW had a 
slight advantage. [Levy et al](http://www.aclweb.org/anthology/Q15-1016) 
explain that while CBOW did not perform as well in their experiments, others 
have shown that capturing joint contexts (CBOW does this) can improve 
performance on word similarity tasks. They also saw CBOW perform well on
analogy tasks. So again, it depends on the task being performed.

[Mikolov et al](https://arxiv.org/pdf/1309.4168.pdf) recommend using 
Skip-Gram when mono-lingual data is small and CBOW for larger datasets.

_The fact that the original paper has both implementations is not in itself 
enough of a reason for Spark to do the same, IMO_
This is an active area of research, and both methods generate embeddings
that perform well on different tasks. Since Spark is a library providing these
implementations, I think the choice is best left to the user and the
application the embeddings are being used for.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-27 Thread shubhamchopra
Github user shubhamchopra commented on the issue:

https://github.com/apache/spark/pull/17673
  
@Krimit @MLnick @hhbyyh I am working on getting your earlier queries 
answered.
@Krimit Thanks for looking into the code. I will try to get the code-review
feedback incorporated in a couple of days or so.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-26 Thread Krimit
Github user Krimit commented on the issue:

https://github.com/apache/spark/pull/17673
  
@shubhamchopra have you run this code in a distributed spark cluster yet?





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-25 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17673
  
I can maybe help out a bit in a week and a bit (I've also done some poking 
inside of Word2Vec) but I need to wrap up some travel and Python stuff first.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-25 Thread Krimit
Github user Krimit commented on the issue:

https://github.com/apache/spark/pull/17673
  
I'm happy to take a look! I'll have some time to dig in deeper tomorrow. 
Some of my initial impressions:
* There's a lot going on here. I agree with @hhbyyh that it would be 
cleaner to put the CBOW code in a new class.
* Can you provide some information about the practical differences between 
CBOW and skip-gram? Wikipedia quotes the author (I assume they mean Tomas) as 
saying that "CBOW is faster while skip-gram is slower but does a better job 
for infrequent words". Has this been your experience as well? How pronounced 
is the difference? In what cases would a user choose one over the other? I'm 
basically seconding @hhbyyh's comment on a more in-depth comparison experiment. 
The fact that the original paper has both implementations is not in itself 
enough of a reason for Spark to do the same, IMO.
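
For readers following the CBOW versus skip-gram question, here is a minimal sketch of how the two models turn the same sentence into training examples. It assumes a fixed symmetric window (real implementations typically shrink the window at random), and the function names `skipgram_examples` and `cbow_examples` are hypothetical, not part of Spark or of this PR.

```python
def skipgram_examples(tokens, window):
    """Skip-gram: one (center, context_word) pair per context word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """CBOW: one (context_words, center) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        # All words in the window except the center word itself.
        context = tokens[lo:i] + tokens[i + 1:hi]
        if context:
            examples.append((context, center))
    return examples


sentence = ["the", "quick", "brown", "fox"]
print(len(skipgram_examples(sentence, 1)), "skip-gram pairs")
print(len(cbow_examples(sentence, 1)), "CBOW examples")
```

Skip-gram emits a separate update for every context word, while CBOW averages the context into a single update per position. That difference in update count is one commonly cited reason CBOW tends to train faster, and it is only a partial picture of the trade-off asked about above.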


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-24 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17673
  
It would be ideal to have both methods, but I'm worried about reviewer 
bandwidth vs priority on this.

@Krimit you were working on Word2Vec recently - thoughts? Perhaps you have 
time to help on review also?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-20 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17673
  
Thanks for working on this. I'm traveling right now, but maybe @MLnick has 
some bandwidth to look at this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-19 Thread shubhamchopra
Github user shubhamchopra commented on the issue:

https://github.com/apache/spark/pull/17673
  
The [original paper](https://arxiv.org/abs/1301.3781) proposed two model 
architectures for generating word embeddings: the continuous skip-gram model 
and the continuous bag-of-words (CBOW) model. Spark ML currently implements 
only the skip-gram model. This PR adds the continuous bag-of-words model. The 
two models are alternatives to each other, and this implementation would let 
users choose the one that suits their data best.

The implementation is based largely on the [original C 
implementation](https://code.google.com/archive/p/word2vec/). I implemented 
this using Negative Sampling, as that was shown to have good performance 
[here](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf).
 I tried to vectorize operations using BLAS where possible.
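
As an illustration for readers of this thread, the following is a minimal sketch of a single CBOW update with negative sampling, in the style of the original C implementation. It is not the code in this PR: it uses plain Python loops instead of BLAS, the negatives are passed in rather than drawn from a unigram table, and the names `cbow_negative_sampling_step`, `syn0`, and `syn1neg` are assumptions borrowed from the C code's conventions.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def cbow_negative_sampling_step(syn0, syn1neg, context_ids, target_id,
                                negative_ids, lr=0.025):
    """Apply one SGD update in place; return the loss before the update."""
    dim = len(syn0[0])
    # Hidden layer: average of the context words' input vectors.
    h = [sum(syn0[c][k] for c in context_ids) / len(context_ids)
         for k in range(dim)]
    grad_h = [0.0] * dim
    loss = 0.0
    # The true target has label 1; each sampled negative has label 0.
    samples = [(target_id, 1.0)] + [(n, 0.0) for n in negative_ids]
    for word_id, label in samples:
        out = syn1neg[word_id]
        score = sigmoid(sum(h[k] * out[k] for k in range(dim)))
        loss -= math.log(score if label == 1.0 else 1.0 - score)
        g = lr * (label - score)
        for k in range(dim):
            grad_h[k] += g * out[k]  # uses the output vector before it changes
            out[k] += g * h[k]
    # Push the accumulated hidden-layer gradient to every context word.
    for c in context_ids:
        for k in range(dim):
            syn0[c][k] += grad_h[k]
    return loss


random.seed(0)
vocab_size, dim = 6, 4
syn0 = [[random.uniform(-0.5, 0.5) for _ in range(dim)]
        for _ in range(vocab_size)]
syn1neg = [[0.0] * dim for _ in range(vocab_size)]
for step in range(3):
    print(cbow_negative_sampling_step(syn0, syn1neg, [0, 2], 1, [3, 4]))
```

Each step touches only the target, the sampled negatives, and the context words, which is why negative sampling is cheap compared with a full softmax over the vocabulary. The caller is assumed to keep `target_id` out of `negative_ids`.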
 
I don't understand what you mean by an "MLP" implementation. Can you please 
clarify? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-18 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/17673
  
Thanks for sharing the work. To help make the review easier, I would 
recommend:
1. Provide some background info. 
 Is the new algorithm better than the existing one, and in which cases?
 How does it compare with other libraries or implementations of the algorithm?

2. Provide some description of your implementation.
 How do its accuracy and scalability compare with the existing Word2Vec? 
 Are there any known issues or limitations? 
 




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17673
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org