Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11105#discussion_r86383782

--- Diff: core/src/main/scala/org/apache/spark/rdd/ShuffledRDD.scala ---
@@ -104,10 +105,26 @@ class ShuffledRDD[K: ClassTag, V: ClassTag, C: ClassTag](
   }

   override def compute(split: Partition, context: TaskContext): Iterator[(K, C)] = {
+    // Use -1 for our Shuffle ID since we are on the read side of the shuffle.
+    val shuffleWriteId = -1
+    // If our task has data property accumulators we need to keep track of which partitions
+    // we are processing.
+    if (context.taskMetrics.hasDataPropertyAccumulators()) {
+      context.setRDDPartitionInfo(id, shuffleWriteId, split.index)
+    }
     val dep = dependencies.head.asInstanceOf[ShuffleDependency[K, V, C]]
-    SparkEnv.get.shuffleManager.getReader(dep.shuffleHandle, split.index, split.index + 1, context)
+    val itr = SparkEnv.get.shuffleManager.getReader(dep.shuffleHandle, split.index, split.index + 1,
+      context)
--- End diff ---

I am looking closely at the combiner code to try to confirm this. I think I believe it, but I don't think it's *guaranteed* to be true in the future. E.g., right now the combiners do an `insertAll` into the `ExternalAppendOnlyMap` before reading from it. But there is no reason Spark couldn't change so that it instead inserts just the *next* key from all incoming streams into the `ExternalAppendOnlyMap`, and then feeds that one key to the downstream iterators. At the very least, we need a test to ensure this doesn't break if that internal implementation were to change. (Does a test like that already exist?)

Again, I'm still mulling over whether there is even a good use case to bother supporting this at all ...
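To make the concern concrete, here is a toy sketch of the two strategies in plain Scala collections. This is *not* Spark's actual combiner or `ExternalAppendOnlyMap` code; the object and method names are made up for illustration. `eagerCombine` mirrors today's insert-everything-then-read pattern, while `streamingCombine` mirrors the hypothetical change where reads from upstream interleave with downstream consumption:

```scala
object CombinerSketch {
  // Today's pattern: drain the whole input into the map before any read,
  // so all inserts have happened by the time the first element is emitted.
  def eagerCombine[K, V](records: Iterator[(K, V)], merge: (V, V) => V): Iterator[(K, V)] = {
    val map = scala.collection.mutable.Map.empty[K, V]
    records.foreach { case (k, v) =>
      map.update(k, map.get(k).map(merge(v, _)).getOrElse(v))
    }
    map.iterator // reads begin only after every insert has happened
  }

  // Hypothetical pattern: input is sorted by key, so each key's combined value
  // is emitted as soon as that key is exhausted -- before later records are read.
  def streamingCombine[K, V](sortedRecords: Iterator[(K, V)], merge: (V, V) => V): Iterator[(K, V)] = {
    val buffered = sortedRecords.buffered
    new Iterator[(K, V)] {
      def hasNext: Boolean = buffered.hasNext
      def next(): (K, V) = {
        val (k, first) = buffered.next()
        var combined = first
        while (buffered.hasNext && buffered.head._1 == k) {
          combined = merge(buffered.next()._2, combined)
        }
        (k, combined) // emitted while upstream records remain unread
      }
    }
  }
}
```

Under the first pattern, any partition-tracking state set before the inserts (like the `setRDDPartitionInfo` call above) is stable for the entire read. Under the second, upstream reads interleave with downstream consumption, which is exactly the case a regression test should pin down.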