cloud-fan commented on a change in pull request #33310:
URL: https://github.com/apache/spark/pull/33310#discussion_r669399284
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeLocalShuffleReader.scala
##########
@@ -85,12 +85,24 @@ object OptimizeLocalShuffleReader extends CustomShuffleReaderRule {
     val expectedParallelism = advisoryParallelism.getOrElse(numReducers)
     val splitPoints = if (numMappers == 0) {
       Seq.empty
-    } else {
-      equallyDivide(numReducers, math.max(1, expectedParallelism / numMappers))
+    } else if (expectedParallelism >= numMappers) {
+      equallyDivide(numReducers, expectedParallelism / numMappers)
+    }
+    else {
+      equallyDivide(numMappers, expectedParallelism)
+    }
+    if (expectedParallelism >= numMappers) {
+      (0 until numMappers).flatMap { mapIndex =>
+        (splitPoints :+ numReducers).sliding(2).map {
+          case Seq(start, end) => PartialMapperPartitionSpec(mapIndex, start, end)
+        }
+      }
     }
-    (0 until numMappers).flatMap { mapIndex =>
-      (splitPoints :+ numReducers).sliding(2).map {
-        case Seq(start, end) => PartialMapperPartitionSpec(mapIndex, start, end)
+    else {
+      (0 until 1).flatMap { _ =>
+        (splitPoints :+ numMappers).sliding(2).map {
+          case Seq(start, end) => CoalescedMapperPartitionSpec(start, end, numReducers)

Review comment:
   I'm wondering if we should use a more meticulous algorithm. Let's say there are 3 mappers and 2 reducers, so 6 shuffle blocks in total: `(M0, R0), (M0, R1), (M1, R0), (M1, R1), (M2, R0), (M2, R1)`. If the expected parallelism is 2, I think each task should read 3 blocks:
   task 0: `(M0, R0), (M0, R1), (M1, R0)`
   task 1: `(M1, R1), (M2, R0), (M2, R1)`
   So one task can read some entire mappers plus part of one mapper.
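   For concreteness, here is a minimal Scala sketch of the block-level division suggested above. This is not the PR's code; `BlockLevelDivideSketch` and `divide` are hypothetical names. It assumes the `numMappers * numReducers` shuffle blocks are laid out in mapper-major order and cut into `parallelism` contiguous, near-equal ranges, so a task may cover whole mappers plus part of one:

   ```scala
   // Hypothetical sketch (not the PR's implementation) of the suggested
   // block-level division. Blocks are enumerated mapper-major and split into
   // `parallelism` contiguous, near-equal groups.
   object BlockLevelDivideSketch {
     // Returns, for each task, the (mapIndex, reduceIndex) blocks it should read.
     def divide(numMappers: Int, numReducers: Int, parallelism: Int): Seq[Seq[(Int, Int)]] = {
       // All shuffle blocks in mapper-major order:
       // (M0, R0), (M0, R1), ..., (M1, R0), (M1, R1), ...
       val blocks = for {
         m <- 0 until numMappers
         r <- 0 until numReducers
       } yield (m, r)
       val total = blocks.size
       // Near-equal group sizes: the first (total % parallelism) tasks
       // get one extra block.
       val sizes = Seq.tabulate(parallelism)(i =>
         total / parallelism + (if (i < total % parallelism) 1 else 0))
       // Carve the block list into consecutive groups of those sizes.
       sizes.foldLeft((Seq.empty[Seq[(Int, Int)]], blocks: Seq[(Int, Int)])) {
         case ((tasks, rest), n) => (tasks :+ rest.take(n), rest.drop(n))
       }._1
     }

     def main(args: Array[String]): Unit = {
       // 3 mappers x 2 reducers with parallelism 2 reproduces the example above:
       // task 0: (0,0), (0,1), (1,0)   task 1: (1,1), (2,0), (2,1)
       divide(3, 2, 2).zipWithIndex.foreach { case (t, i) => println(s"task $i: $t") }
     }
   }
   ```

   A task's contiguous range can start or end mid-mapper, so mapping such ranges back onto partition specs would presumably need both whole-mapper and partial-mapper specs, which is what would make this more meticulous than either branch in the diff above.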