cloud-fan commented on a change in pull request #33310:
URL: https://github.com/apache/spark/pull/33310#discussion_r669399284



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeLocalShuffleReader.scala
##########
@@ -85,12 +85,24 @@ object OptimizeLocalShuffleReader extends CustomShuffleReaderRule {
     val expectedParallelism = advisoryParallelism.getOrElse(numReducers)
     val splitPoints = if (numMappers == 0) {
       Seq.empty
-    } else {
-      equallyDivide(numReducers, math.max(1, expectedParallelism / numMappers))
+    } else if (expectedParallelism >= numMappers) {
+      equallyDivide(numReducers, expectedParallelism / numMappers)
+    }
+    else {
+      equallyDivide(numMappers, expectedParallelism)
+    }
+    if (expectedParallelism >= numMappers) {
+      (0 until numMappers).flatMap { mapIndex =>
+        (splitPoints :+ numReducers).sliding(2).map {
+          case Seq(start, end) => PartialMapperPartitionSpec(mapIndex, start, end)
+        }
+      }
     }
-    (0 until numMappers).flatMap { mapIndex =>
-      (splitPoints :+ numReducers).sliding(2).map {
-        case Seq(start, end) => PartialMapperPartitionSpec(mapIndex, start, end)
+    else {
+      (0 until 1).flatMap { _ =>
+        (splitPoints :+ numMappers).sliding(2).map {
+          case Seq(start, end) => CoalescedMapperPartitionSpec(start, end, numReducers)

Review comment:
       I'm wondering if we should have a more meticulous algorithm.
   
   Let's say there are 3 mappers and 2 reducers, so 6 shuffle blocks in total: `(M0, R0), (M0, R1), (M1, R0), (M1, R1), (M2, R0), (M2, R1)`. If the expected parallelism is 2, I think each task should read 3 blocks:
   task 0: `(M0, R0), (M0, R1), (M1, R0)`
   task 1: `(M1, R1), (M2, R0), (M2, R1)`
   
   So one task can read some entire mappers and part of one mapper.
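
   A minimal sketch of the idea above (hypothetical illustration, not the PR's code; `assignBlocks` and `BlockSplitSketch` are made-up names): enumerate the `numMappers * numReducers` blocks in mapper-major order and split them as evenly as possible across tasks, so one task can cover some whole mappers plus part of a neighboring mapper.

```scala
// Hypothetical sketch: evenly divide shuffle blocks across tasks in
// mapper-major order, so a task may span mapper boundaries.
object BlockSplitSketch {
  // Returns, for each task, the list of (mapperIndex, reducerIndex) blocks it reads.
  def assignBlocks(
      numMappers: Int,
      numReducers: Int,
      parallelism: Int): Seq[Seq[(Int, Int)]] = {
    // All blocks in mapper-major order: (M0,R0), (M0,R1), (M1,R0), ...
    val blocks = for {
      m <- 0 until numMappers
      r <- 0 until numReducers
    } yield (m, r)
    val total = blocks.length
    // Each task gets floor(total / parallelism) blocks; the first
    // (total % parallelism) tasks get one extra.
    val base = total / parallelism
    val extra = total % parallelism
    val sizes = Seq.fill(extra)(base + 1) ++ Seq.fill(parallelism - extra)(base)
    var rest = blocks
    sizes.map { n =>
      val (head, tail) = rest.splitAt(n)
      rest = tail
      head
    }
  }

  def main(args: Array[String]): Unit = {
    // 3 mappers x 2 reducers, parallelism 2: each task reads 3 blocks,
    // and task 0 reads all of M0 plus part of M1.
    assignBlocks(3, 2, 2).foreach(println)
  }
}
```

   With the example from the comment (3 mappers, 2 reducers, parallelism 2), this yields exactly the split described: task 0 gets `(0,0), (0,1), (1,0)` and task 1 gets `(1,1), (2,0), (2,1)`.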




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


