[ https://issues.apache.org/jira/browse/SPARK-36105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-36105.
---------------------------------
Fix Version/s: 3.2.0
Resolution: Fixed
Issue resolved by pull request 33310
[https://github.com/apache/spark/pull/33310]
> OptimizeLocalShuffleReader support reading data of multiple mappers in one task
> -------------------------------------------------------------------------------
>
> Key: SPARK-36105
> URL: https://issues.apache.org/jira/browse/SPARK-36105
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.2
> Reporter: Michael Zhang
> Priority: Minor
> Fix For: 3.2.0
>
>
> Right now OptimizeLocalShuffleReader tries to match the total parallelism of
> the shuffle reader to the original shuffle partition number when there is no
> coalescing (i.e., a shuffle stage without CustomShuffleReaderExec), or to the
> coalesced partition number when there is coalescing (i.e., a shuffle stage
> with CustomShuffleReaderExec on top). It does so by calling equallyDivide.
> This is based on the assumption that the target parallelism is bigger than
> the number of mappers, so equallyDivide will assign a range of reducer ids of
> the same mapper to each downstream task, and that is why
> PartialMapperPartitionSpec has a mapIndex together with a reducerStartIndex
> and a reducerEndIndex.
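
For illustration, the behavior described above can be sketched roughly as follows. This is a minimal Python sketch, not Spark's actual Scala code: the names PartialMapperSpec, equally_divide and split_reducers_per_mapper are hypothetical, and cutting every mapper into target_parallelism // num_mappers reducer ranges is an assumption made for the example.

```python
from typing import List, NamedTuple


class PartialMapperSpec(NamedTuple):
    """Hypothetical stand-in for PartialMapperPartitionSpec."""
    map_index: int
    reducer_start: int  # inclusive
    reducer_end: int    # exclusive


def equally_divide(num_elements: int, num_buckets: int) -> List[int]:
    """Start indices of up to num_buckets contiguous, near-equal ranges."""
    per_bucket, remaining = divmod(num_elements, num_buckets)
    starts, pos = [], 0
    for i in range(num_buckets):
        # The first `remaining` buckets take one extra element.
        size = per_bucket + (1 if i < remaining else 0)
        if size == 0:
            break  # more buckets than elements: stop at the last non-empty one
        starts.append(pos)
        pos += size
    return starts


def split_reducers_per_mapper(
    num_mappers: int, num_reducers: int, target_parallelism: int
) -> List[PartialMapperSpec]:
    """One spec per downstream task: a reducer range of a single mapper."""
    if num_mappers <= 0:
        return []
    # Assumption: each mapper's output is cut into the same number of ranges.
    splits_per_mapper = max(1, target_parallelism // num_mappers)
    starts = equally_divide(num_reducers, splits_per_mapper)
    ends = starts[1:] + [num_reducers]
    return [
        PartialMapperSpec(m, s, e)
        for m in range(num_mappers)
        for s, e in zip(starts, ends)
    ]
```

With 2 mappers, 10 reducer partitions and a target parallelism of 6, the sketch yields three reducer ranges per mapper: [0, 4), [4, 7) and [7, 10). When target_parallelism is smaller than num_mappers, the sketch still emits one task per mapper, so it cannot go below num_mappers tasks.
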
> However, it is also possible that the target parallelism is smaller than the
> number of mappers, and in that case, we need to “coalesce” the mappers by
> assigning a range of mapper ids to each downstream task. For that purpose, we
> might need to introduce a new type of ShufflePartitionSpec, which has a
> mapStartIndex and a mapEndIndex, meaning that each task will read all reducer
> outputs of the mappers from mapStartIndex (inclusive) to mapEndIndex
> (exclusive).
> Note that this is different from CoalescedPartitionSpec which reads all
> mapper outputs from reduceStartIndex to reduceEndIndex.
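
The mapper coalescing proposed above could look roughly like the sketch below. It is only an illustration under stated assumptions: MapperRangeSpec and coalesce_mappers are hypothetical names that do not come from the issue, and distributing mappers as evenly as possible over the tasks is an assumed policy.

```python
from typing import List, NamedTuple


class MapperRangeSpec(NamedTuple):
    """Hypothetical spec: a task reads all reducer outputs of these mappers."""
    map_start: int  # inclusive
    map_end: int    # exclusive


def coalesce_mappers(num_mappers: int, target_parallelism: int) -> List[MapperRangeSpec]:
    """Group mappers into at most target_parallelism contiguous ranges."""
    if num_mappers <= 0 or target_parallelism <= 0:
        return []
    # Never create more tasks than there are mappers to read from.
    num_tasks = min(num_mappers, target_parallelism)
    per_task, remaining = divmod(num_mappers, num_tasks)
    specs, start = [], 0
    for i in range(num_tasks):
        # The first `remaining` tasks take one extra mapper.
        end = start + per_task + (1 if i < remaining else 0)
        specs.append(MapperRangeSpec(start, end))
        start = end
    return specs
```

For example, 10 mappers and a target parallelism of 3 give the mapper ranges [0, 4), [4, 7) and [7, 10). Each range is contiguous in mapper ids, whereas CoalescedPartitionSpec, as described above, is a range over reducer ids.
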