[GitHub] spark issue #13735: [SPARK-15328][MLLIB][ML] Word2Vec import for original bi...
GitHub user insidedctm commented on the issue: https://github.com/apache/spark/pull/13735

This seems to work fine with a small model, such as the one produced by demo_word.sh in the word2vec code repository. However, I get problems when trying a large model such as [GoogleNews-vectors-negative300.bin](https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing).

I can successfully load the model using this code (although I needed to give the driver 12GB of memory):

`import org.apache.spark.ml.feature.Word2VecModel`
`val path = "file:///Downloads/GoogleNews-vectors-negative300.bin"`
`val model = Word2VecModel.loadGoogleModel(path)`

However, synonyms are not found for a typical lookup, e.g. `model.findSynonyms("spark",20).show` responds with `java.lang.IllegalStateException: spark not in vocabulary`

In contrast, the distance tool from the word2vec toolkit, loading the same model, gives: https://cloud.githubusercontent.com/assets/5909684/18549055/0a60f9da-7b44-11e6-895c-88ee018ed1a9.png

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3650][GraphX] Triangle Count handles re...
GitHub user insidedctm commented on the pull request: https://github.com/apache/spark/pull/11290#issuecomment-186838624

@srowen good points. I've updated and pushed changes in line with your comments.
[GitHub] spark pull request: [SPARK-3650][GraphX] Triangle Count handles re...
GitHub user insidedctm opened a pull request: https://github.com/apache/spark/pull/11290

[SPARK-3650][GraphX] Triangle Count handles reverse edges incorrectly

## What changes were proposed in this pull request?

Reworking of @jegonzal PR #2495 to address the issue identified in SPARK-3650. Code amended to use the convertToCanonicalEdges method.

## How was this patch tested?

Patch was tested using the unit tests created in PR #2495.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/insidedctm/spark spark-3650

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11290.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11290

commit 428fa26880bb32f04d0799d2c227e52defb99428
Author: Robin East
Date: 2015-09-14T21:09:26Z

    Change bytes to bits in RoutingTablePartition.toMessage

commit cf66402fb77855711ffd17ddb3efa58c7d44296e
Author: Robin East
Date: 2016-02-18T18:38:37Z

    Merge remote-tracking branch 'upstream/master'

commit 96fcc0aae84450d6cc3edf046807048b2d8c2db1
Author: Joseph E. Gonzalez
Date: 2014-09-22T21:57:28Z

    Improving Triangle Count

commit 1edc09df8e32b6717aa300fe62636a9613bcbc27
Author: Joseph E. Gonzalez
Date: 2014-09-22T22:16:46Z

    fixing bug in unit tests where bi-directed edges lead to duplicate triangles.

commit 47673cadc957eb35dbab01cdcbbe21382987e691
Author: Joseph E. Gonzalez
Date: 2014-11-13T07:18:58Z

    factored out code for canonicalization

commit c6cd74792d4f82e562d1c792d322f17b1877d4af
Author: Robin East
Date: 2016-02-20T21:46:49Z

    SPARK-3650 updates to PR 2495 to work with current master

commit c8ad0bd4ed998b86a465bc36ec59ddc5dcceef5e
Author: Robin East
Date: 2016-02-21T11:27:10Z

    revert unexpected changes to R/pkg/DESCRIPTION
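To illustrate why reverse edges matter for triangle counting, here is a minimal sketch in plain Scala, with no Spark dependency. It is not the GraphX implementation and not the code in this pull request; `TriangleCountSketch`, `canonicalize`, and `countTriangles` are hypothetical names chosen for the example. It assumes only the idea named in the pull request: put edges into a canonical orientation (source ID less than destination ID) and remove duplicates before counting.

```scala
object TriangleCountSketch {
  type Edge = (Long, Long)

  // Orient every edge so that src < dst, drop self-loops, and remove duplicates.
  // After this step (1, 2) and (2, 1) are the same edge.
  def canonicalize(edges: Seq[Edge]): Set[Edge] =
    edges.collect {
      case (a, b) if a < b => (a, b)
      case (a, b) if a > b => (b, a)
    }.toSet

  // Count each triangle exactly once. For a canonical edge (a, b), every
  // common higher-numbered neighbour c of a and b closes a triangle a < b < c.
  def countTriangles(edges: Seq[Edge]): Int = {
    val canonical = canonicalize(edges)
    val higher: Map[Long, Set[Long]] =
      canonical.groupBy(_._1).map { case (v, es) => v -> es.map(_._2) }
    canonical.toSeq.map { case (a, b) =>
      val na = higher.getOrElse(a, Set.empty[Long])
      val nb = higher.getOrElse(b, Set.empty[Long])
      (na intersect nb).size
    }.sum
  }
}
```

With this sketch, a single triangle gives the same count whether it is supplied as three directed edges or as six edges that include every reverse direction, because the reverse edges collapse during canonicalization.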
[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...
GitHub user insidedctm commented on the pull request: https://github.com/apache/spark/pull/2495#issuecomment-140515615

@pwendell can this be opened again? As per my discussion on the JIRA ticket, this is an issue that came up on the mailing list recently.
[GitHub] spark pull request: [SPARK-10598][DOCS]
GitHub user insidedctm opened a pull request: https://github.com/apache/spark/pull/8756

[SPARK-10598][DOCS]

Comments preceding the toMessage method state: "The edge partition is encoded in the lower 30 bytes of the Int, and the position is encoded in the upper 2 bytes of the Int." An Int is only 32 bits wide, so the references to bytes should be changed to bits.

This contribution is my original work and I license the work to the Spark project under its open source license.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/insidedctm/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8756.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #8756

commit 428fa26880bb32f04d0799d2c227e52defb99428
Author: Robin East
Date: 2015-09-14T21:09:26Z

    Change bytes to bits in RoutingTablePartition.toMessage
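The layout described in the corrected comment (partition ID in the lower 30 bits, position in the upper 2 bits of a 32-bit Int) can be sketched as follows. This is an illustrative example only, not the Spark source: `RoutingMessageSketch`, `pack`, `unpackPid`, and `unpackPosition` are hypothetical names, and the range checks are an assumption added for the example.

```scala
object RoutingMessageSketch {
  // 30 one-bits: selects the lower 30 bits of an Int.
  private val PidMask = 0x3FFFFFFF

  // Pack a partition ID (lower 30 bits) and a position (upper 2 bits) into one Int.
  def pack(pid: Int, position: Int): Int = {
    require(pid >= 0 && pid <= PidMask, "pid must fit in 30 bits")
    require(position >= 0 && position <= 3, "position must fit in 2 bits")
    (position << 30) | (pid & PidMask)
  }

  // Recover the partition ID by masking off the upper 2 bits.
  def unpackPid(packed: Int): Int = packed & PidMask

  // Recover the position. The unsigned shift (>>>) is used so that a set
  // sign bit is not extended into the result.
  def unpackPosition(packed: Int): Int = packed >>> 30
}
```

The arithmetic shows why "bytes" cannot be right: 30 + 2 = 32 bits fills the Int exactly, whereas 30 bytes would be 240 bits.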