[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...
Github user maji2014 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3553#discussion_r21213166

--- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
@@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
       } else {
         new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
       }
-    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
+    } else if (dep.mapSideCombine) {
--- End diff --

Yes, this seems simple and elegant.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3553#discussion_r21213023

--- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
@@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
       } else {
         new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
       }
-    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
+    } else if (dep.mapSideCombine) {
--- End diff --

Could also write this as

```scala
if (dep.aggregator.isDefined) {
  ...
} else {
  require(!dep.mapSideCombine, "Map-side combine requested without Aggregator specified!")
  // Convert the Product2s to pairs since this is what downstream RDDs currently expect
  iter.asInstanceOf[Iterator[Product2[K, C]]].map(pair => (pair._1, pair._2))
}
```
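To make the proposed restructuring concrete, here is a minimal, self-contained sketch of the same control flow. `Dep` and `readPath` are invented stand-ins for illustration only, not Spark's real `ShuffleDependency` or reader; the point is that the aggregator is tested once, and the impossible "map-side combine without an aggregator" configuration becomes a hard `require` failure instead of a silent branch.

```scala
// Hypothetical stand-in for the two ShuffleDependency fields the branch inspects.
case class Dep(aggregatorDefined: Boolean, mapSideCombine: Boolean)

// Mirrors the suggested structure: one isDefined test, with the invalid
// combination rejected by require in the else branch.
def readPath(dep: Dep): String =
  if (dep.aggregatorDefined) {
    "combine" // aggregation branch (elided in this sketch)
  } else {
    require(!dep.mapSideCombine, "Map-side combine requested without Aggregator specified!")
    "pass-through" // no aggregation: records flow through unchanged
  }
```

Calling `readPath(Dep(aggregatorDefined = false, mapSideCombine = true))` throws an `IllegalArgumentException`, which surfaces the misconfiguration early rather than letting the reader silently pick a branch.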
[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...
Github user maji2014 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3553#discussion_r21207386

--- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
@@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
       } else {
         new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
       }
-    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
+    } else if (dep.mapSideCombine) {
--- End diff --

"if (dep.aggregator.isDefined) ... else if (dep.aggregator.isEmpty)" seems redundant, since isEmpty == !isDefined. There is no need to check "dep.aggregator.isEmpty" a second time.
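The equivalence this comment relies on is part of `scala.Option` itself and is easy to spot-check: for both shapes of `Option`, `isEmpty` is exactly the negation of `isDefined`.

```scala
// Check isEmpty == !isDefined on both Option shapes (Some and None).
val cases: Seq[Option[Int]] = Seq(Some(1), None)
assert(cases.forall(o => o.isEmpty == !o.isDefined))
```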
[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3553#discussion_r21205859

--- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
@@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
       } else {
         new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
       }
-    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
+    } else if (dep.mapSideCombine) {
--- End diff --

I think the previous way is much more clear and obvious from my understanding :-).
[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3553#issuecomment-65207158

Can one of the admins verify this patch?
[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...
GitHub user maji2014 opened a pull request:

    https://github.com/apache/spark/pull/3553

[spark-4691][shuffle] code optimization for judgement

In HashShuffleReader.scala and HashShuffleWriter.scala, there is no need to check "dep.aggregator.isEmpty" again, since the preceding branch already checks "dep.aggregator.isDefined". In SortShuffleWriter.scala, isn't "dep.aggregator.isEmpty" clearer than "!dep.aggregator.isDefined"?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maji2014/spark spark-4691

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3553.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3553

commit d8f52dc7dc34bc6c1c368d790b7bdfe30c4eb529
Author: maji2014
Date: 2014-12-02T09:54:33Z

    code optimization for judgement
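The redundancy the PR removes can be sketched with plain booleans (the names below are illustrative stand-ins, not Spark's real fields): in an if/else-if chain, the second test only runs after the first has already failed, so re-asserting its negation adds nothing.

```scala
// Illustrative stand-ins for dep.aggregator.isDefined and dep.mapSideCombine.
def branch(aggregatorDefined: Boolean, mapSideCombine: Boolean): String =
  if (aggregatorDefined) {
    "aggregate"
  } else if (mapSideCombine) {
    // was: else if (!aggregatorDefined && mapSideCombine) -- the first conjunct
    // is always true here, because this branch runs only when the test above failed
    "map-side-combine-without-aggregator"
  } else {
    "pass-through"
  }
```

Both versions of the chain are behaviorally identical; the simplified form just drops a condition the compiler cannot know is redundant but the control flow guarantees.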