[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12176 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-210439965 Merged to master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-210433724 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-210433732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55921/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-210433327 **[Test build #55921 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55921/consoleFull)** for PR 12176 at commit [`8107dc2`](https://github.com/apache/spark/commit/8107dc29f17996128cfe2942baa15d371bcef9dd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-210419520 **[Test build #55921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55921/consoleFull)** for PR 12176 at commit [`8107dc2`](https://github.com/apache/spark/commit/8107dc29f17996128cfe2942baa15d371bcef9dd). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-210419431 Jenkins retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user pravingadakh commented on a diff in the pull request: https://github.com/apache/spark/pull/12176#discussion_r59852468 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -536,13 +532,16 @@ private[clustering] object OnlineLDAOptimizer { * An optimization (Lee, Seung: Algorithms for non-negative matrix factorization, NIPS 2001) * avoids explicit computation of variational parameter `phi`. * @see [[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.7566]] + * + * @return A tuple of topic distribution `gammad`, `sstatsd` and `ids`. Latter two are the --- End diff -- @srowen Updated the return doc accordingly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/12176#discussion_r59848898 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -536,13 +532,16 @@ private[clustering] object OnlineLDAOptimizer { * An optimization (Lee, Seung: Algorithms for non-negative matrix factorization, NIPS 2001) * avoids explicit computation of variational parameter `phi`. * @see [[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.7566]] + * + * @return A tuple of topic distribution `gammad`, `sstatsd` and `ids`. Latter two are the --- End diff -- Sounds fine to me --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user pravingadakh commented on a diff in the pull request: https://github.com/apache/spark/pull/12176#discussion_r59844091 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -536,13 +532,16 @@ private[clustering] object OnlineLDAOptimizer { * An optimization (Lee, Seung: Algorithms for non-negative matrix factorization, NIPS 2001) * avoids explicit computation of variational parameter `phi`. * @see [[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.7566]] + * + * @return A tuple of topic distribution `gammad`, `sstatsd` and `ids`. Latter two are the --- End diff -- @srowen How about this return doc: > Returns a tuple of `gammad` - estimate of gamma, the topic distribution, `sstatsd` - statistics for updating lambda and `ids` - list of termCounts vector indices --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-209677538 LGTM too pending the doc update (thanks for that) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-209493926 @pravingadakh if you'll augment that doc a little bit more I'll merge this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/12176#discussion_r59135877 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -536,13 +532,16 @@ private[clustering] object OnlineLDAOptimizer { * An optimization (Lee, Seung: Algorithms for non-negative matrix factorization, NIPS 2001) * avoids explicit computation of variational parameter `phi`. * @see [[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.7566]] + * + * @return A tuple of topic distribution `gammad`, `sstatsd` and `ids`. Latter two are the --- End diff -- This doesn't really describe what they are, but simply one place they're called. It doesn't need to be elaborate; see the explanations of what the firs two are above. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user pravingadakh commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-207929631 @srowen Added `@return` for the method. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-207894385 @pravingadakh I'll merge if you'll doc the return value of this method. That further justifies the change. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-206674436 Sean is right about the meanings of the return values. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-206410312 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-206410316 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55113/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-206410132 **[Test build #55113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55113/consoleFull)** for PR 12176 at commit [`fdc8bb0`](https://github.com/apache/spark/commit/fdc8bb02e5b75e1b9691f072e70a0e478c24ee84). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-206388932 **[Test build #55113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55113/consoleFull)** for PR 12176 at commit [`fdc8bb0`](https://github.com/apache/spark/commit/fdc8bb02e5b75e1b9691f072e70a0e478c24ee84). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-206388117 Jenkins test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user pravingadakh commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-206119313 @jkbradley Could you please flag this PR for test build? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-205952381 Ha fair enough. From poking around the internet I'm pretty sure the first is an estimate of gamma, the topic distribution (as the comments say) and the second are sufficient statistics for updating lambda. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user pravingadakh commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-205918384 I could document the returned values, but frankly I have no idea what those values are. I can see the documentation of first returned value `gammad` in the comments but none for others. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-205882669 A-ha, understood about `.values`. This looks pretty reasonable. My only question is, does it make sense conceptually that this method also returns a list of IDs? it doesn't hurt much in practice, and it seems like there's a reasonable argument for it logically. We have one case where the caller needs it after all. Maybe finish this by documenting the three things returned from this method. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user pravingadakh commented on a diff in the pull request: https://github.com/apache/spark/pull/12176#discussion_r58553918 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -542,10 +539,11 @@ private[clustering] object OnlineLDAOptimizer { expElogbeta: BDM[Double], alpha: breeze.linalg.Vector[Double], gammaShape: Double, - k: Int): (BDV[Double], BDM[Double]) = { -val (ids: List[Int], cts: Array[Double]) = termCounts match { - case v: DenseVector => ((0 until v.size).toList, v.values) - case v: SparseVector => (v.indices.toList, v.values) + k: Int, + ids: List[Int]): (BDV[Double], BDM[Double]) = { +val cts: Array[Double] = termCounts match { --- End diff -- Yes it looks redundant, but `values` is not a member of parent trait `Vector`, it is defined at individual implementation level (i.e. `DenseVector` and `SparseVector`). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user pravingadakh commented on a diff in the pull request: https://github.com/apache/spark/pull/12176#discussion_r58553056 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -440,12 +440,9 @@ final class OnlineLDAOptimizer extends LDAOptimizer { val stat = BDM.zeros[Double](k, vocabSize) var gammaPart = List[BDV[Double]]() nonEmptyDocs.foreach { case (_, termCounts: Vector) => -val ids: List[Int] = termCounts match { - case v: DenseVector => (0 until v.size).toList - case v: SparseVector => v.indices.toList -} +val ids: List[Int] = LDAUtils.vectorAsList(termCounts) --- End diff -- Yes I agree. It is better to simply return `ids` rather than having callers to compute it. I will make that change. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/12176#discussion_r58519101 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -440,12 +440,9 @@ final class OnlineLDAOptimizer extends LDAOptimizer { val stat = BDM.zeros[Double](k, vocabSize) var gammaPart = List[BDV[Double]]() nonEmptyDocs.foreach { case (_, termCounts: Vector) => -val ids: List[Int] = termCounts match { - case v: DenseVector => (0 until v.size).toList - case v: SparseVector => v.indices.toList -} +val ids: List[Int] = LDAUtils.vectorAsList(termCounts) --- End diff -- I see what you're optimizing here, but the cost is making all the other callers compute this. Is it enough of a performance bottleneck to bother? Is it at all sensible to simply return `ids` from the existing method call instead? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/12176#discussion_r58518958 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -542,10 +539,11 @@ private[clustering] object OnlineLDAOptimizer { expElogbeta: BDM[Double], alpha: breeze.linalg.Vector[Double], gammaShape: Double, - k: Int): (BDV[Double], BDM[Double]) = { -val (ids: List[Int], cts: Array[Double]) = termCounts match { - case v: DenseVector => ((0 until v.size).toList, v.values) - case v: SparseVector => (v.indices.toList, v.values) + k: Int, + ids: List[Int]): (BDV[Double], BDM[Double]) = { +val cts: Array[Double] = termCounts match { --- End diff -- The `match` here becomes redundant right? both cases return `v.values` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12176#issuecomment-205741735 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...
GitHub user pravingadakh opened a pull request: https://github.com/apache/spark/pull/12176 [SPARK-14370][MLLIB]removed duplicate generation of ids in OnlineLDAOptimizer ## What changes were proposed in this pull request? Removed duplicated generation of `ids` in OnlineLDAOptimizer. ## How was this patch tested? tested with existing unit tests. (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) You can merge this pull request into a Git repository by running: $ git pull https://github.com/pravingadakh/spark SPARK-14370 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12176.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12176 commit 12acf536bc920eeef7518c183b61e38c06ecacd2 Author: Pravin GadakhDate: 2016-04-05T10:04:03Z removed duplicate generation of ids in OnlineLDAOptimizer --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org