[GitHub] [spark] mridulm commented on a diff in pull request #38371: [SPARK-40968] Fix a few wrong/misleading comments in DAGSchedulerSuite

GitBox Sun, 30 Oct 2022 21:44:27 -0700


mridulm commented on code in PR #38371:
URL: https://github.com/apache/spark/pull/38371#discussion_r1009018499



##########
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:
##########
@@ -3089,13 +3089,14 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with Ti
     submit(finalRdd, Array(0, 1), properties = new Properties())
 
     // Finish the first 2 shuffle map stages.
-    completeShuffleMapStageSuccessfully(0, 0, 2)
+    completeShuffleMapStageSuccessfully(0, 0, 2, Seq("hostA", "hostB"))

Review Comment:
   This change is not required.
   The fetch failure is caused by the stage 1 partition on hostB going
   missing. By default, `completeShuffleMapStageSuccessfully` completes tasks
   progressively on hostA, hostB, and so on. This results in recomputing
   stage 0 (since it has two partitions, on hostA and hostB), stage 1 (due to
   the fetch failure), and stage 2, of course.
   
   In this case, since there is output on hostB for stage 0 (partition 1) and
   stage 1 (partition 0), they are recomputed.
   
   If this is confusing, we can add it to the Javadoc of
   `completeShuffleMapStageSuccessfully`.
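
To illustrate the default host assignment described above, here is a minimal standalone sketch. It is not the helper from `DAGSchedulerSuite`: `HostAssignmentSketch` and `hostFor` are hypothetical names, and the rule that explicit host names are applied by task index, with remaining tasks falling back to generated names, is an assumption made for illustration.

```scala
object HostAssignmentSketch {
  // Hypothetical helper (not part of Spark): picks the host for the task at
  // index `idx`. Assumption: an explicit name is used when one is supplied
  // for that index; otherwise the name is generated as hostA, hostB, ...
  def hostFor(idx: Int, hostNames: Seq[String] = Seq.empty): String =
    if (idx < hostNames.size) hostNames(idx)
    else s"host${('A' + idx).toChar}"

  def main(args: Array[String]): Unit = {
    // Two map tasks with no explicit hosts, as in the original call.
    val defaults = (0 until 2).map(hostFor(_))
    // Two map tasks with the hosts spelled out, as in the proposed change.
    val explicit = (0 until 2).map(hostFor(_, Seq("hostA", "hostB")))

    // Under the assumed rule both calls place the outputs identically.
    assert(defaults == explicit)
    println(defaults.mkString(", ")) // prints "hostA, hostB"
  }
}
```

Under that assumption, passing `Seq("hostA", "hostB")` for a two-partition stage yields the same placement as the default, which is why the extra argument would be redundant.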



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

