GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/21019
[SPARK-23948] Trigger mapstage's job listener in submitMissingTasks

## What changes were proposed in this pull request?

When `SparkContext` submits a map stage to `DAGScheduler` through `submitMapStage`, `markMapStageJobAsFinished` is called only in ();

But consider the following scenario:
1. stage0 and stage1 are both `ShuffleMapStage`s, and stage1 depends on stage0;
2. We submit stage1 through `submitMapStage`; stage1 has 10 missing tasks;
3. While stage1 is running, a `FetchFailed` occurs, and stage0 and stage1 are resubmitted as stage0_1 and stage1_1;
4. While stage0_1 is running, speculative tasks from the old stage1 attempt report success, but stage1 is not in `runningStages`. So even though all splits of stage1 (including those completed by the speculative tasks) have succeeded, the job listener of stage1 is not called;
5. stage0_1 finishes and stage1_1 starts running. When `submitMissingTasks` is called, there are no missing tasks, but in the current code the job listener is not triggered.

## How was this patch tested?

Not added yet.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinxing64/spark SPARK-23948

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21019.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21019

----
commit 685124a11b789af2a42b4978e25ed404b2a15176
Author: jinxing <jinxing6042@...>
Date:   2018-04-10T03:33:02Z

    [SPARK-23948] Trigger mapstage's job listener in submitMissingTasks

----
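
For readers who want to trace the five-step scenario from the pull request description, here is a minimal, self-contained sketch in Scala. It is a toy model and not Spark's `DAGScheduler`: `ToyMapStage`, `ToyScheduler`, `Spark23948Sketch`, and the `notifyWhenNothingMissing` flag are hypothetical names introduced only for illustration. The sketch assumes that a successful task always records its output, that the listener is notified on task completion only while the stage is in `runningStages`, and that the proposed change amounts to notifying the listener when `submitMissingTasks` finds nothing left to run.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a ShuffleMapStage submitted via submitMapStage.
final class ToyMapStage(val id: Int, val numPartitions: Int) {
  // Partitions whose map output is already available.
  val finished: mutable.Set[Int] = mutable.Set.empty[Int]
  // How many times the map-stage job listener has been notified.
  var listenerCalls: Int = 0

  def missingPartitions: Seq[Int] =
    (0 until numPartitions).filterNot(finished.contains)
}

// Hypothetical scheduler; the flag switches the assumed fix on or off.
final class ToyScheduler(notifyWhenNothingMissing: Boolean) {
  val runningStages: mutable.Set[ToyMapStage] = mutable.Set.empty[ToyMapStage]

  // The output is always recorded, but the listener is notified only
  // if the stage is currently in runningStages.
  def onTaskSuccess(stage: ToyMapStage, partition: Int): Unit = {
    stage.finished += partition
    if (runningStages.contains(stage) && stage.missingPartitions.isEmpty) {
      stage.listenerCalls += 1
      runningStages -= stage
    }
  }

  // Returns the partitions that still need tasks.
  def submitMissingTasks(stage: ToyMapStage): Seq[Int] = {
    runningStages += stage
    val missing = stage.missingPartitions
    if (missing.isEmpty) {
      // Assumed fix: nothing to run, so report the map stage as finished here.
      if (notifyWhenNothingMissing) stage.listenerCalls += 1
      runningStages -= stage
    }
    missing
  }
}

object Spark23948Sketch {
  // Replays steps 2 to 5 and returns the number of listener notifications.
  def replay(notifyWhenNothingMissing: Boolean): Int = {
    val scheduler = new ToyScheduler(notifyWhenNothingMissing)
    val stage1 = new ToyMapStage(id = 1, numPartitions = 10)

    scheduler.submitMissingTasks(stage1) // step 2: 10 missing tasks
    scheduler.runningStages -= stage1    // step 3: FetchFailed, stage1 stops running
    // Step 4: late speculative tasks succeed while stage1 is not running.
    (0 until 10).foreach(p => scheduler.onTaskSuccess(stage1, p))
    scheduler.submitMissingTasks(stage1) // step 5: retry finds nothing missing
    stage1.listenerCalls
  }

  def main(args: Array[String]): Unit = {
    println(s"listener calls without the change: ${replay(false)}")
    println(s"listener calls with the change: ${replay(true)}")
  }
}
```

In this toy model the listener is never notified when the flag is off, which mirrors the hang described in step 5; with the flag on it is notified exactly once. Whether the real patch places the call at exactly this point is not something the sketch can confirm.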