[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1447#issuecomment-49540356 Merging this in master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1447 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1447#issuecomment-49401225 This looks okay to me as is, @rxin what do you think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1447#issuecomment-49407083 Upmerged --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1447#issuecomment-49407225 QA tests have started for PR 1447. This patch merges cleanly. brView progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16821/consoleFull --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1447#issuecomment-49415317 QA results for PR 1447:br- This patch PASSES unit tests.br- This patch merges cleanlybr- This patch adds no public classesbrbrFor more information see test ouptut:brhttps://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16821/consoleFull --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15042631 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = { val map = new JHashMap[K, V] - iter.foreach { case (k, v) = -val old = map.get(k) -map.put(k, if (old == null) v else func(old, v)) + iter.foreach { pair = +val old = map.get(pair._1) --- End diff -- This I'm sure has been done a million times, but one more for good luck -- here is a `map { case (x, y) = x + y}` versus `foreach { xy = xy._1 + xy._2}`: http://www.diffchecker.com/vtv6cptx No new virtual function calls, but one new branch (trivially branch predictable) and one new throw (which will never be invoked), and several store/loads. (Perhaps I'm misreading the bytecode, but isn't `astore_2` followed by `aload_2` a no-op?) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15042857 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = { val map = new JHashMap[K, V] - iter.foreach { case (k, v) = -val old = map.get(k) -map.put(k, if (old == null) v else func(old, v)) + iter.foreach { pair = +val old = map.get(pair._1) --- End diff -- I think we tested this in the past and saw a difference, perhaps because it can now throw. But I don't remember the details. Weird that it's also doing those stores. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15042897 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = { val map = new JHashMap[K, V] - iter.foreach { case (k, v) = -val old = map.get(k) -map.put(k, if (old == null) v else func(old, v)) + iter.foreach { pair = +val old = map.get(pair._1) --- End diff -- There are all kinds of stuff that's really hard to profile at runtime. For example, the JVM might decide not to JIT the code because it is longer. It might decide not to do it because of exceptions. Branch prediction might stop working because it has too many branches. Etc. Again, I'm only in favor of this change because it is small, and it might have impact. If it is a big ugly change, I would be against it without profiling. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15042905 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -571,12 +571,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) throw new SparkException(Default partitioner cannot partition array keys.) } val cg = new CoGroupedRDD[K](Seq(self, other1, other2, other3), partitioner) -cg.mapValues { case Seq(vs, w1s, w2s, w3s) = - (vs.asInstanceOf[Seq[V]], -w1s.asInstanceOf[Seq[W1]], -w2s.asInstanceOf[Seq[W2]], -w3s.asInstanceOf[Seq[W3]]) -} +cg.mapValues { seq = seq.asInstanceOf[(Seq[V], Seq[W1], Seq[W2], Seq[W3])] } --- End diff -- And yeah as Reynold pointed out, this won't actually cast. A Seq is not a tuple4, you need to make a new one. In this case it might be okay to leave the code as is, because the non-pattern-matching way will be hairier. Basically you'd have to do `seq = (seq(0).asInstanceOf[Seq[V]], seq(1).asInstanceOf[Seq[W1]], etc)`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15043849 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -571,12 +571,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) throw new SparkException(Default partitioner cannot partition array keys.) } val cg = new CoGroupedRDD[K](Seq(self, other1, other2, other3), partitioner) -cg.mapValues { case Seq(vs, w1s, w2s, w3s) = - (vs.asInstanceOf[Seq[V]], -w1s.asInstanceOf[Seq[W1]], -w2s.asInstanceOf[Seq[W2]], -w3s.asInstanceOf[Seq[W3]]) -} +cg.mapValues { seq = seq.asInstanceOf[(Seq[V], Seq[W1], Seq[W2], Seq[W3])] } --- End diff -- Ah, right. I'll leave these as they are for now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15043860 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = { val map = new JHashMap[K, V] - iter.foreach { case (k, v) = -val old = map.get(k) -map.put(k, if (old == null) v else func(old, v)) + iter.foreach { pair = +val old = map.get(pair._1) --- End diff -- Also on removing the calls to _1, these values should be in cache, so accesses will be really fast. Will hold off on these for now, but happy to make the change if y'all want. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15044028 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -712,8 +701,8 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) val index = p.getPartition(key) def process(it: Iterator[(K, V)]): Seq[V] = { val buf = new ArrayBuffer[V] - for ((k, v) - it if k == key) { --- End diff -- Wait, actually, my understanding of this loop is that it's iterating over every record within the partition. Am I missing something? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15044062 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -712,8 +701,8 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) val index = p.getPartition(key) def process(it: Iterator[(K, V)]): Seq[V] = { val buf = new ArrayBuffer[V] - for ((k, v) - it if k == key) { --- End diff -- Actually yes my bad :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1447#issuecomment-49272425 QA tests have started for PR 1447. This patch merges cleanly. brView progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16773/consoleFull --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1447#issuecomment-49287721 QA results for PR 1447:br- This patch PASSES unit tests.br- This patch merges cleanlybr- This patch adds no public classesbrbrFor more information see test ouptut:brhttps://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16773/consoleFull --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1447 SPARK-2519 part 2. Remove pattern matching on Tuple2 in critical section... ...s of CoGroupedRDD and PairRDDFunctions This also removes an unnecessary tuple creation in cogroup. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-2519-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1447.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1447 commit 906e6b362db11a4ac5a3f8096668c6f968dc48aa Author: Sandy Ryza sa...@cloudera.com Date: 2014-07-16T21:00:17Z SPARK-2519 part 2. Remove pattern matching on Tuple2 in critical sections of CoGroupedRDD and PairRDDFunctions --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15037091 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = { val map = new JHashMap[K, V] - iter.foreach { case (k, v) = -val old = map.get(k) -map.put(k, if (old == null) v else func(old, v)) + iter.foreach { pair = +val old = map.get(pair._1) --- End diff -- can we save the key into a variable here so we don't call _1 twice? I was looking at the bytecode generated by pattern matching versus non-pattern-matching, and the main difference is that we bypass some unnecessary branches: {code} final def apply(x0$1: Tuple2): Unit = { case synthetic val x1: Tuple2 = x0$1; case4(){ if (x1.ne(null)) { val k: Int = x1._1$mcI$sp(); val v: Int = x1._2$mcI$sp(); matchEnd3({ scala.this.Predef.println(scala.Int.box(k.+(v))); scala.runtime.BoxedUnit.UNIT }) } else case5() }; case5(){ matchEnd3(throw new MatchError(x1)) }; matchEnd3(x: runtime.BoxedUnit){ () } {code} --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15037098 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = { val map = new JHashMap[K, V] - iter.foreach { case (k, v) = -val old = map.get(k) -map.put(k, if (old == null) v else func(old, v)) + iter.foreach { pair = +val old = map.get(pair._1) +map.put(pair._1, if (old == null) pair._2 else func(old, pair._2)) } Iterator(map) } def mergeMaps(m1: JHashMap[K, V], m2: JHashMap[K, V]): JHashMap[K, V] = { - m2.foreach { case (k, v) = -val old = m1.get(k) -m1.put(k, if (old == null) v else func(old, v)) + m2.foreach { pair = +val old = m1.get(pair._1) --- End diff -- again, create a variable to save the key --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15037121 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -413,11 +413,11 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) * partition the output RDD. */ def leftOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, Option[W]))] = { -this.cogroup(other, partitioner).flatMapValues { case (vs, ws) = - if (ws.isEmpty) { -vs.map(v = (v, None)) +this.cogroup(other, partitioner).flatMapValues { pair = + if (pair._2.isEmpty) { --- End diff -- save the key and value, i.e. ```scala this.cogroup(other, partitioner).flatMapValues { pair = val vs = pair._1 val ws = pair._2 ... same old code ... } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15037134 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -571,12 +571,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) throw new SparkException(Default partitioner cannot partition array keys.) } val cg = new CoGroupedRDD[K](Seq(self, other1, other2, other3), partitioner) -cg.mapValues { case Seq(vs, w1s, w2s, w3s) = - (vs.asInstanceOf[Seq[V]], -w1s.asInstanceOf[Seq[W1]], -w2s.asInstanceOf[Seq[W2]], -w3s.asInstanceOf[Seq[W3]]) -} +cg.mapValues { seq = seq.asInstanceOf[(Seq[V], Seq[W1], Seq[W2], Seq[W3])] } --- End diff -- extra space here. Does the type casting work? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15037164 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -712,8 +701,8 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) val index = p.getPartition(key) def process(it: Iterator[(K, V)]): Seq[V] = { val buf = new ArrayBuffer[V] - for ((k, v) - it if k == key) { --- End diff -- we can probably revert this change since this only works on the partitions rather than per record --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15037689 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -712,8 +701,8 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) val index = p.getPartition(key) def process(it: Iterator[(K, V)]): Seq[V] = { val buf = new ArrayBuffer[V] - for ((k, v) - it if k == key) { --- End diff -- Ah, misread --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15038052 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = { val map = new JHashMap[K, V] - iter.foreach { case (k, v) = -val old = map.get(k) -map.put(k, if (old == null) v else func(old, v)) + iter.foreach { pair = +val old = map.get(pair._1) --- End diff -- If we're using functional style anyway, do we really expect to see any sort of remotely noticeable gain from doing these kinds of optimizations? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15038340 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = { val map = new JHashMap[K, V] - iter.foreach { case (k, v) = -val old = map.get(k) -map.put(k, if (old == null) v else func(old, v)) + iter.foreach { pair = +val old = map.get(pair._1) --- End diff -- It's really hard to tell. I think it is really hard to tell when doing microbenchmarks because branch prediction takes care of the extra branches. But I don't know how this impacts real runtime in a real workload without a lot more instrumentation. My opinion is since it is a small change, it's ok to just do it this way. If it is a big refactoring, that's a very different story and deserves more profiling. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15042032 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = { val map = new JHashMap[K, V] - iter.foreach { case (k, v) = -val old = map.get(k) -map.put(k, if (old == null) v else func(old, v)) + iter.foreach { pair = +val old = map.get(pair._1) --- End diff -- Removing the call to _1 isn't hugely helpful IMO; accessor methods get inlined very fast. However, pattern matching adds a whole bunch of potential throw sites and branches and things like that, and we've noticed problems in the past. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15042052 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -571,12 +571,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) throw new SparkException(Default partitioner cannot partition array keys.) } val cg = new CoGroupedRDD[K](Seq(self, other1, other2, other3), partitioner) -cg.mapValues { case Seq(vs, w1s, w2s, w3s) = - (vs.asInstanceOf[Seq[V]], -w1s.asInstanceOf[Seq[W1]], -w2s.asInstanceOf[Seq[W2]], -w3s.asInstanceOf[Seq[W3]]) -} +cg.mapValues { seq = seq.asInstanceOf[(Seq[V], Seq[W1], Seq[W2], Seq[W3])] } --- End diff -- This should actually return Iterable, not Seq --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15042054 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -589,9 +584,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) throw new SparkException(Default partitioner cannot partition array keys.) } val cg = new CoGroupedRDD[K](Seq(self, other), partitioner) -cg.mapValues { case Seq(vs, ws) = - (vs.asInstanceOf[Seq[V]], ws.asInstanceOf[Seq[W]]) -} +cg.mapValues { pair = pair.asInstanceOf[(Seq[V], Seq[W])] } --- End diff -- Same here, should be Iterable (although the types work okay now, it's confusing) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15042061 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -604,11 +597,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) throw new SparkException(Default partitioner cannot partition array keys.) } val cg = new CoGroupedRDD[K](Seq(self, other1, other2), partitioner) -cg.mapValues { case Seq(vs, w1s, w2s) = - (vs.asInstanceOf[Seq[V]], - w1s.asInstanceOf[Seq[W1]], - w2s.asInstanceOf[Seq[W2]]) -} +cg.mapValues { seq = seq.asInstanceOf[(Seq[V], Seq[W1], Seq[W2])] } --- End diff -- Same here --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---