Github user markhamstra commented on a diff in the pull request:

    https://github.com/apache/spark/pull/686#discussion_r14040631
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -1062,10 +1062,15 @@ class DAGScheduler(
               // This is the only job that uses this stage, so fail the stage if it is running.
               val stage = stageIdToStage(stageId)
               if (runningStages.contains(stage)) {
    -            taskScheduler.cancelTasks(stageId, shouldInterruptThread)
    -            val stageInfo = stageToInfos(stage)
    -            stageInfo.stageFailed(failureReason)
    -            listenerBus.post(SparkListenerStageCompleted(stageToInfos(stage)))
    +            try { // cancelTasks will fail if a SchedulerBackend does not implement killTask
    +              taskScheduler.cancelTasks(stageId, shouldInterruptThread)
    +              val stageInfo = stageToInfos(stage)
    +              stageInfo.stageFailed(failureReason)
    +              listenerBus.post(SparkListenerStageCompleted(stageToInfos(stage)))
    +            } catch {
    +              case e: UnsupportedOperationException =>
    +                logInfo(s"Could not cancel tasks for stage $stageId", e)
    +            }
    --- End diff --
    
    Hmmm... not sure that I agree.  A job being cancelled, stages being 
cancelled, and tasks being cancelled are all different things.  The expectation 
is that job cancellation will lead to cancellation of independent stages and 
their associated tasks; but if no stages and tasks get cancelled, it's probably 
still worthwhile for the information to be sent that the job itself was 
cancelled.  I expect that eventually all of the backends will support task 
killing, so this whole no-kill path should never be hit.  But moving the job 
cancellation notification within the try-to-cancelTasks block will result in 
multiple notifications that the parent job was cancelled -- one for each 
independent stage cancellation.  Or am I misunderstanding what you are 
suggesting?
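
To make the concern above concrete, here is a minimal, self-contained sketch. It is not the DAGScheduler code: `Event`, `StageCancelled`, `JobCancelled`, `CancelSketch`, and both `notify*` methods are hypothetical names invented for illustration, and `cancelTasks` only simulates a backend that may not implement killTask. It contrasts posting the job-level notification once, outside the per-stage try block, with posting it inside that block.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for listener events; not Spark's actual classes.
sealed trait Event
case class StageCancelled(stageId: Int) extends Event
case class JobCancelled(jobId: Int) extends Event

object CancelSketch {
  // Simulates a backend that may not implement killTask (assumption for illustration).
  def cancelTasks(stageId: Int, supportsKill: Boolean): Unit =
    if (!supportsKill)
      throw new UnsupportedOperationException(s"killTask not supported (stage $stageId)")

  // Job notification posted once, after the per-stage loop.
  def notifyOutside(jobId: Int, stageIds: Seq[Int], supportsKill: Boolean): Seq[Event] = {
    val events = ArrayBuffer.empty[Event]
    stageIds.foreach { stageId =>
      try {
        cancelTasks(stageId, supportsKill)
        events += StageCancelled(stageId)
      } catch {
        case _: UnsupportedOperationException => () // the stage keeps running
      }
    }
    events += JobCancelled(jobId) // sent even if no stage could be cancelled
    events.toSeq
  }

  // Job notification posted inside the per-stage try block.
  def notifyInside(jobId: Int, stageIds: Seq[Int], supportsKill: Boolean): Seq[Event] = {
    val events = ArrayBuffer.empty[Event]
    stageIds.foreach { stageId =>
      try {
        cancelTasks(stageId, supportsKill)
        events += StageCancelled(stageId)
        events += JobCancelled(jobId) // repeated for every independent stage
      } catch {
        case _: UnsupportedOperationException => () // no job notification at all
      }
    }
    events.toSeq
  }
}
```

Under these assumptions, `notifyInside` emits one `JobCancelled` per cancelled stage and none when the backend cannot kill tasks, whereas `notifyOutside` emits exactly one in both cases.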

