[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5748 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124739083 Merging with master: the first test run passed, and the second one's failure was unrelated. Thanks!
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124730905

[Test build #38395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38395/console) for PR 5748 at commit [`e308913`](https://github.com/apache/spark/commit/e308913423c4c6019b21bcb05630268bc381fa1a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124730979 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124709981 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124709822

[Test build #94 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/94/console) for PR 5748 at commit [`e308913`](https://github.com/apache/spark/commit/e308913423c4c6019b21bcb05630268bc381fa1a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124667713 [Test build #38395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38395/consoleFull) for PR 5748 at commit [`e308913`](https://github.com/apache/spark/commit/e308913423c4c6019b21bcb05630268bc381fa1a).
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124667127 Merged build started.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124666886 Merged build triggered.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124665624 Merged build triggered.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124666314 [Test build #94 has started](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/94/consoleFull) for PR 5748 at commit [`e308913`](https://github.com/apache/spark/commit/e308913423c4c6019b21bcb05630268bc381fa1a).
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124665813 Merged build started.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124665485 test this please
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124606115 LGTM pending tests. Test this please
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124600961 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124600685 Merged build started.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124600634 Merged build triggered.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124600254 done
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124599284 I just checked, and the docs for the private vals won't show up. (I checked the current docs for KMeansModel, which exposes uid but hides parentModel.) Would you mind moving that doc, just to keep things well-organized? That should be it.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124597737 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124597508

[Test build #90 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/90/console) for PR 5748 at commit [`5703116`](https://github.com/apache/spark/commit/5703116acea0f3e885061e191cb1956b7d4b2ca7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124595722 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124595446

[Test build #38375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38375/console) for PR 5748 at commit [`5703116`](https://github.com/apache/spark/commit/5703116acea0f3e885061e191cb1956b7d4b2ca7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5748#discussion_r35445290

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -484,8 +480,9 @@ class Word2VecModel private[spark] (
        * @return vector representation of word
        */
       def transform(word: String): Vector = {
    -    model.get(word) match {
    -      case Some(vec) =>
    +    wordIndex.get(word) match {
    +      case Some(ind) =>
    +        val vec = wordVectors.slice(ind * vectorSize, ind * vectorSize + vectorSize)
    --- End diff --

You're right, for this one, we have to make a copy anyways.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124588668 Thanks for the updates! It LGTM. I'm just waiting for the docs to compile to check the param doc question.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5748#discussion_r35445443

    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/Word2VecSuite.scala ---
    @@ -37,6 +37,12 @@ class Word2VecSuite extends SparkFunSuite with MLlibTestSparkContext {
         assert(syms.length == 2)
         assert(syms(0)._1 == "b")
         assert(syms(1)._1 == "c")
    +
    +    // Test that model built using Word2Vec, i.e wordVectors and wordIndec
    +    // and a Word2VecMap give the same values.
    +    val word2VecMap = model.getVectors
    +    val newModel = new Word2VecModel(word2VecMap)
    +    assert(newModel.getVectors.mapValues(_.toSeq) == word2VecMap.mapValues(_.toSeq))
    --- End diff --

Right you are
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5748#discussion_r35445108

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -431,36 +422,41 @@ class Word2Vec extends Serializable with Logging {
      * Word2Vec model
      */
     @Experimental
    -class Word2VecModel private[spark] (
    -    model: Map[String, Array[Float]]) extends Serializable with Saveable {
    -
    -  // wordList: Ordered list of words obtained from model.
    -  private val wordList: Array[String] = model.keys.toArray
    +class Word2VecModel private[mllib] (
    +    private val wordIndex: Map[String, Int],
    +    private val wordVectors: Array[Float]) extends Serializable with Saveable {
       // wordIndex: Maps each word to an index, which can retrieve the corresponding
       //            vector from wordVectors (see below).
    -  private val wordIndex: Map[String, Int] = wordList.zip(0 until model.size).toMap
    +  // wordVectors: Array of length numWords * vectorSize, vector corresponding
    --- End diff --

Good question. Do you know if it shows up in the API docs, even though it's private? (I'll check, but it may take a little while since I need to compile them.)
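The layout described in the diff comments above (a word-to-index map plus one flat array of length numWords * vectorSize) can be illustrated with a small standalone sketch. This is not the PR's code: the object name `FlatVectors`, the toy data, and the `lookup` helper are all hypothetical and exist only to show the indexing arithmetic.

```scala
object FlatVectors {
  // Hypothetical toy data: 2 words, each with a vector of length 3.
  val vectorSize: Int = 3
  val wordIndex: Map[String, Int] = Map("a" -> 0, "b" -> 1)
  // Vector for index i occupies positions [i * vectorSize, (i + 1) * vectorSize).
  val wordVectors: Array[Float] = Array(0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f)

  // Returns a copy of the sub-array belonging to `word`, or None if the word is unknown.
  def lookup(word: String): Option[Array[Float]] =
    wordIndex.get(word).map { ind =>
      wordVectors.slice(ind * vectorSize, ind * vectorSize + vectorSize)
    }
}
```

Storing one flat array avoids holding a separate `Array[Float]` object per word, at the cost of computing the offset on each lookup.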
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124582475 [Test build #38375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38375/consoleFull) for PR 5748 at commit [`5703116`](https://github.com/apache/spark/commit/5703116acea0f3e885061e191cb1956b7d4b2ca7).
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124581994 Merged build started.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124581789 [Test build #90 has started](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/90/consoleFull) for PR 5748 at commit [`5703116`](https://github.com/apache/spark/commit/5703116acea0f3e885061e191cb1956b7d4b2ca7).
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124581955 Merged build triggered.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124581586 Merged build started.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124581521 Merged build triggered.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124580801 jenkins my friend. retest this please
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124580045 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124576729 Merged build triggered.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/5748#discussion_r35440919

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -484,8 +480,9 @@ class Word2VecModel private[spark] (
        * @return vector representation of word
        */
       def transform(word: String): Vector = {
    -    model.get(word) match {
    -      case Some(vec) =>
    +    wordIndex.get(word) match {
    +      case Some(ind) =>
    +        val vec = wordVectors.slice(ind * vectorSize, ind * vectorSize + vectorSize)
    --- End diff --

It gives me a compilation error, so that also works in favor of not changing it :p
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124576755 Merged build started.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/5748#discussion_r35439718

    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/Word2VecSuite.scala ---
    @@ -37,6 +37,12 @@ class Word2VecSuite extends SparkFunSuite with MLlibTestSparkContext {
         assert(syms.length == 2)
         assert(syms(0)._1 == "b")
         assert(syms(1)._1 == "c")
    +
    +    // Test that model built using Word2Vec, i.e wordVectors and wordIndec
    +    // and a Word2VecMap give the same values.
    +    val word2VecMap = model.getVectors
    +    val newModel = new Word2VecModel(word2VecMap)
    +    assert(newModel.getVectors.mapValues(_.toSeq) == word2VecMap.mapValues(_.toSeq))
    --- End diff --

Actually, the (word, vector) pairs are compared. Sorry if the name `getVectors` sounds misleading, but I did not write that either :p
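The test in the diff above converts each vector with `_.toSeq` before comparing the two maps. A likely reason, stated here as an assumption rather than something the thread spells out, is that Scala arrays compare by reference, so two maps holding distinct arrays with identical contents are not `==`. The sketch below uses made-up names (`ArrayEquality`, `left`, `right`) and `map` instead of `mapValues` so that it behaves the same on newer Scala versions.

```scala
object ArrayEquality {
  // Two maps whose arrays have equal contents but are different objects.
  val left: Map[String, Array[Float]] = Map("a" -> Array(1.0f, 2.0f))
  val right: Map[String, Array[Float]] = Map("a" -> Array(1.0f, 2.0f))

  // Map equality delegates to == on the values; for arrays that is reference equality.
  def rawEqual: Boolean = left == right

  // Converting each array to a Seq makes the comparison element by element.
  def seqEqual: Boolean =
    left.map { case (k, v) => k -> v.toSeq } == right.map { case (k, v) => k -> v.toSeq }
}
```

Comparing the maps directly would fail even for a correct round trip, which is why the per-value conversion matters in such a test.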
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/5748#discussion_r35439383

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -484,8 +480,9 @@ class Word2VecModel private[spark] (
        * @return vector representation of word
        */
       def transform(word: String): Vector = {
    -    model.get(word) match {
    -      case Some(vec) =>
    +    wordIndex.get(word) match {
    +      case Some(ind) =>
    +        val vec = wordVectors.slice(ind * vectorSize, ind * vectorSize + vectorSize)
    --- End diff --

Are you sure? I think a copy will be produced anyway. It seems that a copy is avoided only when slicing a `collection.view`. Ref: http://stackoverflow.com/a/6799739/1170730
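The copy-versus-view point raised above can be checked with a small experiment: mutate the backing array after slicing and see which result notices the change. This is a minimal sketch with hypothetical names (`SliceVsView`, `observeAfterMutation`) and toy values, not code from the PR.

```scala
object SliceVsView {
  // Returns the first element seen through a strict slice and through a view slice,
  // both read after the backing array has been mutated.
  def observeAfterMutation(): (Float, Float) = {
    val backing = Array(1.0f, 2.0f, 3.0f, 4.0f)
    val copied = backing.slice(0, 2)       // Array.slice allocates a new array
    val viewed = backing.view.slice(0, 2)  // a view reads lazily from `backing`
    backing(0) = 99.0f
    (copied(0), viewed(0))
  }
}
```

The strict slice keeps the old value because it is an independent copy, while the view reflects the mutation because it still reads from the original array.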
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5748#discussion_r35437614

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -431,36 +422,41 @@ class Word2Vec extends Serializable with Logging {
      * Word2Vec model
      */
     @Experimental
    -class Word2VecModel private[spark] (
    -    model: Map[String, Array[Float]]) extends Serializable with Saveable {
    -
    -  // wordList: Ordered list of words obtained from model.
    -  private val wordList: Array[String] = model.keys.toArray
    +class Word2VecModel private[mllib] (
    +    private val wordIndex: Map[String, Int],
    +    private val wordVectors: Array[Float]) extends Serializable with Saveable {
       // wordIndex: Maps each word to an index, which can retrieve the corresponding
       //            vector from wordVectors (see below).
    -  private val wordIndex: Map[String, Int] = wordList.zip(0 until model.size).toMap
    +  // wordVectors: Array of length numWords * vectorSize, vector corresponding
    --- End diff --

    But this is not meant to be public at any point in time. Is that okay?
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/5748#issuecomment-124298189

    It looks good, just tiny comments. We can make sure this gets into 1.5.

    > However, if the user provides a Word2Vec map by himself to construct the Word2Vec model (in the future, since Word2Vec model is marked as private[mllib]), it creates a huge array of size numWords * numDims. Are we okay with that?

    I think that's OK, though we could make that constructor public in the future. I think it would only be useful if someone wanted to load a model (created by another library) into MLlib.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5748#discussion_r35392030

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -431,36 +422,41 @@ class Word2Vec extends Serializable with Logging {
      * Word2Vec model
      */
     @Experimental
    -class Word2VecModel private[spark] (
    -    model: Map[String, Array[Float]]) extends Serializable with Saveable {
    -
    -  // wordList: Ordered list of words obtained from model.
    -  private val wordList: Array[String] = model.keys.toArray
    +class Word2VecModel private[mllib] (
    +    private val wordIndex: Map[String, Int],
    +    private val wordVectors: Array[Float]) extends Serializable with Saveable {
       // wordIndex: Maps each word to an index, which can retrieve the corresponding
       //            vector from wordVectors (see below).
    -  private val wordIndex: Map[String, Int] = wordList.zip(0 until model.size).toMap
    +  // wordVectors: Array of length numWords * vectorSize, vector corresponding
    --- End diff --

    This doc for wordIndex and wordVectors can go in the class Scala doc and use `@param`.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5748#discussion_r35392033

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -508,7 +507,7 @@ class Word2VecModel private[mllib] (
        */
       def findSynonyms(vector: Vector, num: Int): Array[(String, Double)] = {
         require(num > 0, "Number of similar words should > 0")
    -
    +    // TODO: optimize top-k
    --- End diff --

    I see. Can you please make a JIRA and add its number to the comment here?
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5748#discussion_r35392037

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -548,6 +545,24 @@ class Word2VecModel private[spark] (
     @Experimental
     object Word2VecModel extends Loader[Word2VecModel] {
    +  private def buildWordIndex(model: Map[String, Array[Float]]): Map[String, Int] = {
    +    model.keys.zipWithIndex.toMap
    +  }
    +
    +  private def buildWordVectors(model: Map[String, Array[Float]]): Array[Float] = {
    +    require(!model.isEmpty, "Word2VecMap should be non-empty")
    +    val (vectorSize, numWords) = (model.head._2.size, model.size)
    +    val wordList = model.keys.toArray
    +    val wordVectors = new Array[Float](vectorSize * numWords)
    +    var i = 0
    +    while (i < numWords) {
    +      val vec = model.get(wordList(i)).get
    --- End diff --

    style: Use `model(wordList(i))` rather than "get"
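The hunk above is cut off inside the loop, so the following is only a sketch of how such a flattening step could look, not the PR's actual code. The object name `FlattenWord2VecMap`, the combined `flatten` helper, the length check on each vector and the use of `System.arraycopy` are assumptions added for illustration; the `model(wordList(i))` lookup and the `nonEmpty` check follow the review suggestions in this thread.

```scala
object FlattenWord2VecMap {
  /**
   * Turns a word -> vector map into (wordIndex, wordVectors), where the
   * vector of the word with index i occupies
   * wordVectors(i * vectorSize until i * vectorSize + vectorSize).
   */
  def flatten(model: Map[String, Array[Float]]): (Map[String, Int], Array[Float]) = {
    require(model.nonEmpty, "Word2VecMap should be non-empty")
    val vectorSize = model.head._2.length
    // Take the keys once so that the index and the flat array agree on the order.
    val wordList = model.keys.toArray
    val wordIndex = wordList.zipWithIndex.toMap
    val wordVectors = new Array[Float](vectorSize * wordList.length)
    var i = 0
    while (i < wordList.length) {
      val vec = model(wordList(i))
      // Extra safety check, not present in the quoted diff.
      require(vec.length == vectorSize, s"Vector for '${wordList(i)}' has the wrong length")
      System.arraycopy(vec, 0, wordVectors, i * vectorSize, vectorSize)
      i += 1
    }
    (wordIndex, wordVectors)
  }
}
```

Deriving both structures from the same `wordList` keeps the index and the array layout consistent even though a `Map` has no meaningful key order.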
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5748#discussion_r35392038

    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/Word2VecSuite.scala ---
    @@ -37,6 +37,12 @@ class Word2VecSuite extends SparkFunSuite with MLlibTestSparkContext {
         assert(syms.length == 2)
         assert(syms(0)._1 == "b")
         assert(syms(1)._1 == "c")
    +
    +    // Test that model built using Word2Vec, i.e wordVectors and wordIndec
    +    // and a Word2VecMap give the same values.
    +    val word2VecMap = model.getVectors
    +    val newModel = new Word2VecModel(word2VecMap)
    +    assert(newModel.getVectors.mapValues(_.toSeq) == word2VecMap.mapValues(_.toSeq))
    --- End diff --

    Could you change this to compare (word, vector) pairs, rather than just the vectors? (Also use triple equals `===`)
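As background for this exchange, here is a small standalone sketch of why the test converts each vector with `toSeq` before comparing. Scala arrays compare by reference, so two maps holding equal-looking `Array[Float]` values are not `==`; once the values are sequences, map equality checks every (word, vector) pair. The object and method names are invented for illustration, and the sketch uses plain `==` to stay dependency-free, whereas the review asks for ScalaTest's `===` inside the suite.

```scala
object CompareWordVectorMaps {
  /** True when both maps contain the same words mapped to element-wise equal vectors. */
  def sameEntries(a: Map[String, Array[Float]], b: Map[String, Array[Float]]): Boolean = {
    // Convert each Array to a Seq so that values are compared structurally.
    val left = a.map { case (word, vec) => word -> vec.toSeq }
    val right = b.map { case (word, vec) => word -> vec.toSeq }
    left == right
  }
}
```

The sketch uses `map` rather than `mapValues` because, on newer Scala versions, `mapValues` returns a lazy view whose equality is not structural.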
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5748#discussion_r35392035

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -548,6 +545,24 @@ class Word2VecModel private[spark] (
     @Experimental
     object Word2VecModel extends Loader[Word2VecModel] {
    +  private def buildWordIndex(model: Map[String, Array[Float]]): Map[String, Int] = {
    +    model.keys.zipWithIndex.toMap
    +  }
    +
    +  private def buildWordVectors(model: Map[String, Array[Float]]): Array[Float] = {
    +    require(!model.isEmpty, "Word2VecMap should be non-empty")
    --- End diff --

    nit: Use `model.nonEmpty`
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5748#discussion_r35392031

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -484,8 +480,9 @@ class Word2VecModel private[spark] (
        * @return vector representation of word
        */
       def transform(word: String): Vector = {
    -    model.get(word) match {
    -      case Some(vec) =>
    +    wordIndex.get(word) match {
    +      case Some(ind) =>
    +        val vec = wordVectors.slice(ind * vectorSize, ind * vectorSize + vectorSize)
    --- End diff --

    Does this work if you call `wordVectors.view.slice(...)` instead? I think "view" will tell Scala not to physically create a copy of the slice.
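To make the copy question concrete, here is a minimal standalone sketch (not code from the PR) that contrasts `slice` with `view.slice` on a flat array. The object name `SliceVsView` and the sample values are made up for illustration. Since `transform` returns an MLlib `Vector`, the elements read through a view would presumably still be materialised when that result is built; the sketch only shows how the two calls themselves behave.

```scala
object SliceVsView {
  /** Returns (first element of the copied slice, first element of the view) after a write. */
  def demo(): (Float, Float) = {
    val vectorSize = 2
    // Two words with two dimensions each, stored back to back.
    val wordVectors = Array(1.0f, 2.0f, 3.0f, 4.0f)
    val ind = 1
    val start = ind * vectorSize

    // slice allocates a new array and copies the elements immediately.
    val copied = wordVectors.slice(start, start + vectorSize)
    // view.slice only records the window; elements are read from wordVectors on access.
    val viewed = wordVectors.view.slice(start, start + vectorSize)

    // Mutate the backing array after both were created.
    wordVectors(start) = 99.0f

    (copied(0), viewed(0))
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

`SliceVsView.demo()` returns `(3.0, 99.0)`: the copied slice keeps the old value, while the view observes the later write because it shares storage with `wordVectors`.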
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-124294661 I'm sorry about the long delay! I'll take a look now.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user sujkh85 commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-115255982 NAVER - http://www.naver.com/ The mail you sent to su...@naver.com could not be delivered for the following reason: the recipient has blocked mail from your address.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-115255530 ping @jkbradley Can you have a look? I think it is one pass away from a merge?
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113629287 [Test build #35309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35309/console) for PR 5748 at commit [`fa04313`](https://github.com/apache/spark/commit/fa043131902fd5633a2ecaf5651b3414bd728669).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * ` // class ParentClass(parentField: Int)`
   * ` // class ChildClass(childField: Int) extends ParentClass(1)`
   * ` // If the class type corresponding to current slot has writeObject() defined,`
   * ` // then its not obvious which fields of the class will be serialized as the writeObject()`
   * `case class Md5(child: Expression)`
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113629316 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113608869 @jkbradley I just had a proper look at this after a long time. I think this PR succeeds in preventing the huge Word2Vec map while constructing the Word2Vec model. However, if the user provides a Word2Vec map by himself to construct the Word2Vec model (in the future, since Word2Vec model is marked as private[mllib]), it creates a huge array of size numWords * numDims. Are we okay with that?
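For a sense of scale, a back-of-the-envelope sketch of the flat array's size follows. The helper names are invented here and the figures in the usage note are hypothetical, not taken from the PR. Only the 4-byte `Float` payload is counted, not JVM object overhead. Because JVM arrays are indexed by `Int`, `numWords * vectorSize` also has to stay within `Int.MaxValue` for a single array to hold the model.

```scala
object FlatArrayFootprint {
  /** Number of Float elements, computed as a Long so the product cannot overflow Int. */
  def numElements(numWords: Int, vectorSize: Int): Long =
    numWords.toLong * vectorSize

  /** Payload size in bytes, at 4 bytes per Float. */
  def payloadBytes(numWords: Int, vectorSize: Int): Long =
    numElements(numWords, vectorSize) * 4L

  /** Upper-bound check: a single JVM array cannot exceed Int.MaxValue elements. */
  def fitsInOneArray(numWords: Int, vectorSize: Int): Boolean =
    numElements(numWords, vectorSize) <= Int.MaxValue
}
```

With a hypothetical vocabulary of 1,000,000 words and 100 dimensions, `payloadBytes` gives 400,000,000 bytes (about 400 MB), and `fitsInOneArray` is true; at 30,000,000 words and 100 dimensions the element count exceeds `Int.MaxValue` and the check fails.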
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113608349 [Test build #35309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35309/consoleFull) for PR 5748 at commit [`fa04313`](https://github.com/apache/spark/commit/fa043131902fd5633a2ecaf5651b3414bd728669).
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113607327 Merged build triggered.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113607350 Merged build started.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113596922 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113596920 [Test build #35302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35302/console) for PR 5748 at commit [`b1d61c4`](https://github.com/apache/spark/commit/b1d61c4e441d423782805dcadb017d723d812b79).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113596462 [Test build #35302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35302/consoleFull) for PR 5748 at commit [`b1d61c4`](https://github.com/apache/spark/commit/b1d61c4e441d423782805dcadb017d723d812b79).
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113596253 Merged build triggered.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113596278 Merged build started.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-110255663 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-110255651 [Test build #34486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34486/console) for PR 5748 at commit [`14ee596`](https://github.com/apache/spark/commit/14ee5960ced3079231543dfe103075ae12e40e05).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-110228546 [Test build #34486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34486/consoleFull) for PR 5748 at commit [`14ee596`](https://github.com/apache/spark/commit/14ee5960ced3079231543dfe103075ae12e40e05).
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-110228363 Merged build triggered.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-110228369 Merged build started.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-110228137 @jkbradley ping?
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-110228122 jenkins retest this please
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-108029957 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-108029940 [Test build #33999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33999/consoleFull) for PR 5748 at commit [`14ee596`](https://github.com/apache/spark/commit/14ee5960ced3079231543dfe103075ae12e40e05).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-108007641 [Test build #33999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33999/consoleFull) for PR 5748 at commit [`14ee596`](https://github.com/apache/spark/commit/14ee5960ced3079231543dfe103075ae12e40e05).
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-108007522 Merged build started.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-108007496 Merged build triggered.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-108007247 @jkbradley fixed!
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5748#discussion_r31537727

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
    @@ -508,7 +507,7 @@ class Word2VecModel private[mllib] (
        */
       def findSynonyms(vector: Vector, num: Int): Array[(String, Double)] = {
         require(num > 0, "Number of similar words should > 0")
    -
    +    // TODO: optimize top-k
    --- End diff --

    https://github.com/apache/spark/pull/5467#discussion_r29032366
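The thread does not say which optimisation the `TODO: optimize top-k` comment has in mind. As one common option, the sketch below keeps only the best `k` candidates in a bounded heap instead of sorting every (word, similarity) pair. All names are invented for illustration and nothing here is taken from the PR or from the linked discussion.

```scala
import scala.collection.mutable

object TopKSketch {
  // Orders (word, score) pairs by score only.
  private val byScore: Ordering[(String, Double)] = new Ordering[(String, Double)] {
    def compare(x: (String, Double), y: (String, Double)): Int =
      java.lang.Double.compare(x._2, y._2)
  }

  /** Returns the k highest-scoring pairs, best first, without sorting the whole input. */
  def topK(scored: Iterator[(String, Double)], k: Int): Array[(String, Double)] = {
    require(k > 0, "Number of similar words should > 0")
    // With the reversed ordering, the queue's head is the lowest score kept so far.
    val heap = mutable.PriorityQueue.empty[(String, Double)](byScore.reverse)
    scored.foreach { candidate =>
      if (heap.size < k) {
        heap.enqueue(candidate)
      } else if (byScore.gt(candidate, heap.head)) {
        // The candidate beats the weakest retained entry, so replace it.
        heap.dequeue()
        heap.enqueue(candidate)
      }
    }
    heap.toArray.sorted(byScore.reverse)
  }
}
```

For `n` candidates this does roughly `n log k` comparisons rather than the `n log n` of a full sort, which matters mainly when `k` is much smaller than the vocabulary.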
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on a diff in the pull request:

https://github.com/apache/spark/pull/5748#discussion_r31537671

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -426,38 +422,40 @@ class Word2Vec extends Serializable with Logging {
 /**
  * :: Experimental ::
  * Word2Vec model
+ *
+ * @param wordIndex: Maps each word to an index, which can retrieve the corresponding
+ *                   vector from wordVectors (see below).
+ * @param wordVectors: Array of length numWords * vectorSize, vector corresponding
+ *                     to the word mapped with index i can be retrieved by the slice
+ *                     (i * vectorSize, i * vectorSize + vectorSize)
  */
 @Experimental
 class Word2VecModel private[mllib] (
-    model: Map[String, Array[Float]]) extends Serializable with Saveable {
-
-  // wordList: Ordered list of words obtained from model.
-  private val wordList: Array[String] = model.keys.toArray
+    wordIndex: Map[String, Int],
+    wordVectors: Array[Float]) extends Serializable with Saveable {
 
-  // wordIndex: Maps each word to an index, which can retrieve the corresponding
-  //            vector from wordVectors (see below).
-  private val wordIndex: Map[String, Int] = wordList.zip(0 until model.size).toMap
-
-  // vectorSize: Dimension of each word's vector.
-  private val vectorSize = model.head._2.size
   private val numWords = wordIndex.size
+  // vectorSize: Dimension of each word's vector.
+  private val vectorSize = wordVectors.length / numWords
+
+  // wordList: Ordered list of words obtained from wordIndex.
+  private val wordList: Array[String] = wordIndex.keys.toArray
--- End diff --

I hope all this sorting does not cause regressions :P
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on a diff in the pull request:

https://github.com/apache/spark/pull/5748#discussion_r31533274

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -400,17 +400,13 @@ class Word2Vec extends Serializable with Logging {
     }
     newSentences.unpersist()

-    val word2VecMap = mutable.HashMap.empty[String, Array[Float]]
+    val wordArray = new Array[String](vocabSize)
     var i = 0
     while (i < vocabSize) {
-      val word = bcVocab.value(i).word
-      val vector = new Array[Float](vectorSize)
-      Array.copy(syn0Global, i * vectorSize, vector, 0, vectorSize)
-      word2VecMap += word -> vector
+      wordArray(i) = bcVocab.value(i).word
--- End diff --

Hmm. I just followed the convention used before.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5748#discussion_r31455502

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -400,17 +400,13 @@ class Word2Vec extends Serializable with Logging {
     }
     newSentences.unpersist()

-    val word2VecMap = mutable.HashMap.empty[String, Array[Float]]
+    val wordArray = new Array[String](vocabSize)
     var i = 0
     while (i < vocabSize) {
-      val word = bcVocab.value(i).word
-      val vector = new Array[Float](vectorSize)
-      Array.copy(syn0Global, i * vectorSize, vector, 0, vectorSize)
-      word2VecMap += word -> vector
+      wordArray(i) = bcVocab.value(i).word
--- End diff --

This is executing on the driver, so it should not use broadcast variables. Use ```vocab```

Could be shorter to do:
```
val wordArray = vocab.map(_.word)
```
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5748#discussion_r31455507

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -426,38 +422,40 @@ class Word2Vec extends Serializable with Logging {
 /**
  * :: Experimental ::
  * Word2Vec model
+ *
+ * @param wordIndex: Maps each word to an index, which can retrieve the corresponding
+ *                   vector from wordVectors (see below).
+ * @param wordVectors: Array of length numWords * vectorSize, vector corresponding
+ *                     to the word mapped with index i can be retrieved by the slice
+ *                     (i * vectorSize, i * vectorSize + vectorSize)
  */
 @Experimental
 class Word2VecModel private[mllib] (
-    model: Map[String, Array[Float]]) extends Serializable with Saveable {
-
-  // wordList: Ordered list of words obtained from model.
-  private val wordList: Array[String] = model.keys.toArray
+    wordIndex: Map[String, Int],
+    wordVectors: Array[Float]) extends Serializable with Saveable {
 
-  // wordIndex: Maps each word to an index, which can retrieve the corresponding
-  //            vector from wordVectors (see below).
-  private val wordIndex: Map[String, Int] = wordList.zip(0 until model.size).toMap
-
-  // vectorSize: Dimension of each word's vector.
-  private val vectorSize = model.head._2.size
   private val numWords = wordIndex.size
+  // vectorSize: Dimension of each word's vector.
+  private val vectorSize = wordVectors.length / numWords
+
+  // wordList: Ordered list of words obtained from wordIndex.
+  private val wordList: Array[String] = wordIndex.keys.toArray
--- End diff --

This should sort by ```wordIndex._2``` to make sure the order matches wordVectors
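A minimal sketch of the ordering concern raised in this comment: `wordIndex.keys.toArray` follows the map's iteration order, which need not match the index values, so position i of the word list may not line up with slice i of `wordVectors`. The object and method names below (`WordOrder`, `orderedWords`) are hypothetical and only illustrate sorting by the index; they are not taken from the patch.

```scala
object WordOrder {
  // Hypothetical helper, not part of the patch: returns the words ordered by
  // their index value so that position i corresponds to slice i of wordVectors.
  def orderedWords(wordIndex: Map[String, Int]): Array[String] =
    wordIndex.toSeq.sortBy(_._2).map(_._1).toArray
}
```

Sorting costs O(n log n) once at construction time, which is presumably the overhead the later "I hope all this sorting does not cause regressions" reply refers to.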
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5748#discussion_r31455519

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -508,7 +507,7 @@ class Word2VecModel private[mllib] (
    */
   def findSynonyms(vector: Vector, num: Int): Array[(String, Double)] = {
     require(num > 0, "Number of similar words should > 0")
-
+    // TODO: optimize top-k
--- End diff --

Is there a JIRA for this? If so, can you please note the JIRA number here?
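For context on the `// TODO: optimize top-k` note, here is one common way a top-k selection can avoid a full sort: keep a bounded min-heap of size k. This is only a sketch under assumptions; the thread does not say which optimization was intended, and `TopK` and `topK` are hypothetical names, not Spark APIs.

```scala
import scala.collection.mutable

object TopK {
  // Hypothetical helper: keeps only the k highest-scoring entries in a heap of
  // size k, so selection costs O(n log k) instead of the O(n log n) of a sort.
  def topK(scored: Iterator[(String, Double)], k: Int): Array[(String, Double)] = {
    require(k > 0, "k should be > 0")
    // PriorityQueue is a max-heap for the given ordering; treating a higher
    // score as "less" makes the head the lowest score currently retained.
    val heap = mutable.PriorityQueue.empty[(String, Double)](
      Ordering.fromLessThan[(String, Double)]((a, b) => a._2 > b._2))
    scored.foreach { entry =>
      if (heap.size < k) {
        heap.enqueue(entry)
      } else if (entry._2 > heap.head._2) {
        // The new entry beats the weakest retained one, so replace it.
        heap.dequeue()
        heap.enqueue(entry)
      }
    }
    // Return the retained entries in descending order of score.
    heap.toArray.sortWith(_._2 > _._2)
  }
}
```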
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5748#discussion_r31455505

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -426,38 +422,40 @@ class Word2Vec extends Serializable with Logging {
 /**
  * :: Experimental ::
  * Word2Vec model
+ *
+ * @param wordIndex: Maps each word to an index, which can retrieve the corresponding
+ *                   vector from wordVectors (see below).
+ * @param wordVectors: Array of length numWords * vectorSize, vector corresponding
+ *                     to the word mapped with index i can be retrieved by the slice
+ *                     (i * vectorSize, i * vectorSize + vectorSize)
  */
 @Experimental
 class Word2VecModel private[mllib] (
-    model: Map[String, Array[Float]]) extends Serializable with Saveable {
-
-  // wordList: Ordered list of words obtained from model.
-  private val wordList: Array[String] = model.keys.toArray
+    wordIndex: Map[String, Int],
--- End diff --

Make this and wordVectors private vals
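The flat layout described in the `@param` documentation quoted above can be illustrated with a small lookup. This is a sketch, assuming the vector for the word with index i occupies the slice from `i * vectorSize` up to (but excluding) `i * vectorSize + vectorSize`; `FlatVectors` and `vectorFor` are hypothetical names, not methods of `Word2VecModel`.

```scala
object FlatVectors {
  // Hypothetical helper: returns a copy of the vector stored for `word` in the
  // flat array, or None when the word is not in the index.
  def vectorFor(
      word: String,
      wordIndex: Map[String, Int],
      wordVectors: Array[Float],
      vectorSize: Int): Option[Array[Float]] =
    wordIndex.get(word).map { i =>
      wordVectors.slice(i * vectorSize, i * vectorSize + vectorSize)
    }
}
```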
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/5748#discussion_r31455524

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/Word2VecSuite.scala ---
@@ -38,6 +38,13 @@ class Word2VecSuite extends FunSuite with MLlibTestSparkContext {
     assert(syms.length == 2)
     assert(syms(0)._1 == "b")
     assert(syms(1)._1 == "c")
+
+    val word2VecMap = model.getVectors
+    val newModel = new Word2VecModel(word2VecMap)
+    val newSyms = newModel.findSynonyms("a", 2)
--- End diff --

Instead of testing newModel like this, can you just compare the model data with the original model?
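One detail worth keeping in mind when comparing model data as suggested here: Scala arrays compare by reference under `==`, so two `Map[String, Array[Float]]` values with identical contents are not equal as-is. A sketch of an element-wise comparison follows; `CompareVectors` and `sameVectors` are hypothetical names, and this is not necessarily how the final test was written.

```scala
object CompareVectors {
  // Hypothetical helper: converts each vector to a Seq so that the map
  // comparison checks the elements rather than array identity.
  def sameVectors(
      a: Map[String, Array[Float]],
      b: Map[String, Array[Float]]): Boolean =
    a.map { case (w, v) => w -> v.toSeq } == b.map { case (w, v) => w -> v.toSeq }
}
```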
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-104109512 @jkbradley ping?
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-100158192 @jkbradley can you have a look at this too? even if it won't be in this release?
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-97523885 The code cutoff is this Friday
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-97517846 when is the release scheduled?
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-97516897 I'll try to review this before the code cutoff, but it might slip to 1.5. I think that's OK since it's an internal improvement.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-97303637 Btw, I addressed the minor comments in this.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-97193688

[Test build #31150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31150/consoleFull) for PR 5748 at commit [`a17d9c9`](https://github.com/apache/spark/commit/a17d9c9ec568bca12f884720d7685176ce07d7d6).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-97193722 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-97193731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31150/ Test PASSed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-97165628 [Test build #31150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31150/consoleFull) for PR 5748 at commit [`a17d9c9`](https://github.com/apache/spark/commit/a17d9c9ec568bca12f884720d7685176ce07d7d6).
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-97165262 Merged build started.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-97165242 Merged build triggered.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-97164779 cc @mengxr @jkbradley
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
GitHub user MechCoder opened a pull request:

https://github.com/apache/spark/pull/5748

[SPARK-7045] [MLlib] Avoid intermediate representation when creating model

Word2Vec used to convert from an Array[Float] representation to a Map[String, Array[Float]] and then back to an Array[Float] through Word2VecModel. This change avoids that round trip while still supporting the older method of supplying a Map.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MechCoder/spark spark-7045

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5748.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #5748

commit a17d9c9ec568bca12f884720d7685176ce07d7d6
Author: MechCoder
Date:   2015-04-28T18:23:15Z

    [SPARK-7045] Avoid intermediate representation when creating model
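To make the two representations in this description concrete, the sketch below flattens a `Map[String, Array[Float]]` into a word-to-index map plus a single `Array[Float]`, which is roughly what supporting "the older method of supplying a Map" would involve. It is an illustration under assumptions (all vectors share one length, the map is non-empty); `FlattenModel` and `flatten` are hypothetical names and this is not the patch's actual constructor code.

```scala
object FlattenModel {
  // Hypothetical helper: builds (wordIndex, wordVectors) from a word -> vector
  // map, assuming every vector has the same length as the first one.
  def flatten(model: Map[String, Array[Float]]): (Map[String, Int], Array[Float]) = {
    require(model.nonEmpty, "model should not be empty")
    val vectorSize = model.head._2.length
    // Assign each word a position; the flat array is filled using that position.
    val wordIndex = model.keys.zipWithIndex.toMap
    val wordVectors = new Array[Float](vectorSize * model.size)
    wordIndex.foreach { case (word, i) =>
      Array.copy(model(word), 0, wordVectors, i * vectorSize, vectorSize)
    }
    (wordIndex, wordVectors)
  }
}
```

Training already produces the flat array, so building the model directly from it skips one copy per word into the map and one copy per word back out.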