[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

2014-12-02 Thread maji2014
Github user maji2014 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3553#discussion_r21213166
  
--- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
@@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
       } else {
         new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
       }
-    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
+    } else if (dep.mapSideCombine) {
--- End diff --

Yes, this seems simple and elegant





[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

2014-12-02 Thread maji2014
Github user maji2014 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3553#discussion_r21213167
  
--- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
@@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
       } else {
         new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
       }
-    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
+    } else if (dep.mapSideCombine) {
--- End diff --

Yes, this seems simple and elegant





[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

2014-12-02 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/3553#discussion_r21213023
  
--- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
@@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
       } else {
         new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
       }
-    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
+    } else if (dep.mapSideCombine) {
--- End diff --

Could also write this as

```scala
if (dep.aggregator.isDefined) {
  ...
} else {
  require(!dep.mapSideCombine, "Map-side combine requested without Aggregator specified!")

  // Convert the Product2s to pairs since this is what downstream RDDs currently expect
  iter.asInstanceOf[Iterator[Product2[K, C]]].map(pair => (pair._1, pair._2))
}
```





[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

2014-12-02 Thread maji2014
Github user maji2014 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3553#discussion_r21207386
  
--- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
@@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
       } else {
         new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
       }
-    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
+    } else if (dep.mapSideCombine) {
--- End diff --

"if(dep.aggregator.isDefined) else if (dep.aggregator.isEmpty)"  seems 
duplicate. 
isEmpty == !isDefined
We need to do another one more judgement for "dep.aggregator.isEmpty".
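
A minimal, self-contained sketch of that equivalence (toy Option values only, not the real ShuffleDependency):

```scala
// Option.isEmpty is always the negation of Option.isDefined, so a branch that is
// only reached when isDefined was false already knows the option is empty.
val withAggregator: Option[String] = Some("agg")
val withoutAggregator: Option[String] = None

assert(withAggregator.isEmpty == !withAggregator.isDefined)
assert(withoutAggregator.isEmpty == !withoutAggregator.isDefined)
```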





[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

2014-12-02 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/3553#discussion_r21205859
  
--- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
@@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
       } else {
         new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
       }
-    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
+    } else if (dep.mapSideCombine) {
--- End diff --

I think the previous way is much clearer and more obvious, from my understanding :-).





[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

2014-12-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3553#issuecomment-65207158
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

2014-12-02 Thread maji2014
GitHub user maji2014 opened a pull request:

https://github.com/apache/spark/pull/3553

[spark-4691][shuffle]code optimization for judgement

In HashShuffleReader.scala and HashShuffleWriter.scala there is no need to check
"dep.aggregator.isEmpty" again, because the preceding branch has already checked
"dep.aggregator.isDefined".

In SortShuffleWriter.scala, wouldn't "dep.aggregator.isEmpty" read better than
"!dep.aggregator.isDefined"?



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maji2014/spark spark-4691

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3553.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3553


commit d8f52dc7dc34bc6c1c368d790b7bdfe30c4eb529
Author: maji2014 
Date:   2014-12-02T09:54:33Z

code optimization for judgement



