[ https://issues.apache.org/jira/browse/SPARK-32384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279293#comment-17279293 ]
Apache Spark commented on SPARK-32384:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/31480

> repartitionAndSortWithinPartitions avoid shuffle with same partitioner
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32384
>                 URL: https://issues.apache.org/jira/browse/SPARK-32384
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: zhengruifeng
>            Priority: Minor
>
> In {{combineByKeyWithClassTag}}, there is a check that skips the shuffle
> when the requested partitioner is the same as the RDD's existing one:
> {code:java}
> if (self.partitioner == Some(partitioner)) {
>   self.mapPartitions(iter => {
>     val context = TaskContext.get()
>     new InterruptibleIterator(context, aggregator.combineValuesByKey(iter, context))
>   }, preservesPartitioning = true)
> } else {
>   new ShuffledRDD[K, V, C](self, partitioner)
>     .setSerializer(serializer)
>     .setAggregator(aggregator)
>     .setMapSideCombine(mapSideCombine)
> }
> {code}
>
> In {{repartitionAndSortWithinPartitions}}, the shuffle can be skipped in the
> same case.
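
For illustration only, the idea in the description can be sketched at user level with public RDD APIs. This is not the change made in the pull request. The names `SortWithinPartitionsSketch` and `sortWithinPartitionsSkippingShuffle` are hypothetical, and the sketch sorts each partition fully in memory, which is an assumption made for brevity and would not suit partitions that do not fit in memory.

```scala
import scala.reflect.ClassTag

import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

// Hypothetical helper, not part of Spark.
object SortWithinPartitionsSketch {

  // If `rdd` is already partitioned by `partitioner`, sort each partition
  // in place and keep the partitioning; otherwise fall back to the regular
  // repartitionAndSortWithinPartitions, which shuffles.
  def sortWithinPartitionsSkippingShuffle[K: Ordering: ClassTag, V: ClassTag](
      rdd: RDD[(K, V)],
      partitioner: Partitioner): RDD[(K, V)] = {
    if (rdd.partitioner == Some(partitioner)) {
      // Assumption: each partition fits in memory, so a plain array sort is enough.
      rdd.mapPartitions(
        iter => iter.toArray.sortBy(_._1).iterator,
        preservesPartitioning = true)
    } else {
      rdd.repartitionAndSortWithinPartitions(partitioner)
    }
  }
}
```

With an RDD that was previously passed through `partitionBy(p)`, calling the helper with the same `p` takes the first branch, so the result depends on its parent through a narrow one-to-one dependency rather than a shuffle.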