Marcelo Vanzin created SPARK-20342:
--------------------------------------

             Summary: DAGScheduler sends SparkListenerTaskEnd before updating task's accumulators
                 Key: SPARK-20342
                 URL: https://issues.apache.org/jira/browse/SPARK-20342
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.0
            Reporter: Marcelo Vanzin


Hit this on 2.2, but it has probably been there forever. This is similar in spirit to SPARK-20205.

The event is sent here, around L1154 of {{DAGScheduler.scala}}:

{code}
    listenerBus.post(SparkListenerTaskEnd(
       stageId, task.stageAttemptId, taskType, event.reason, event.taskInfo, taskMetrics))
{code}

Accumulators are updated later, around L1173:

{code}
    val stage = stageIdToStage(task.stageId)
    event.reason match {
      case Success =>
        task match {
          case rt: ResultTask[_, _] =>
            // Cast to ResultStage here because it's part of the ResultTask
            // TODO Refactor this out to a function that accepts a ResultStage
            val resultStage = stage.asInstanceOf[ResultStage]
            resultStage.activeJob match {
              case Some(job) =>
                if (!job.finished(rt.outputId)) {
                  updateAccumulators(event)
{code}
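
Because the event goes out before the update, anything that snapshots the task's state at post time sees the pre-update values, while anything that re-reads the live object later sees the updated ones. A self-contained toy illustrating just the ordering (the names here are mine, not Spark's):

{code}
import scala.collection.mutable.ListBuffer

object StaleSnapshotDemo extends App {
  // Stand-in for the mutable TaskInfo structure.
  class MutableTaskInfo { val accumulables = ListBuffer[String]() }

  val info = new MutableTaskInfo

  // 1. Event is posted: an eager consumer (e.g. the event log) copies
  //    the state now...
  val logged = info.accumulables.toList

  // 2. ...and the accumulator update mutates the same object afterwards.
  info.accumulables += "myAccum = 42"

  // A lazy consumer (e.g. the UI) reads the live object later and sees
  // the update; the eager snapshot is stale.
  println(s"live object:    ${info.accumulables}")  // ListBuffer(myAccum = 42)
  println(s"eager snapshot: $logged")               // List()
}
{code}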

The same caveat as in SPARK-20205 applies here: the UI shows the correct information because it reads the mutable {{TaskInfo}} structure after the fact, but the event log, for example, may record the wrong (stale) accumulator values.
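
For reference, any listener that copies {{taskInfo.accumulables}} eagerly in {{onTaskEnd}}, the way the event logging listener effectively does when it serializes the event to JSON, can observe the stale state. A minimal sketch (the listener name is mine, not Spark's):

{code}
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Snapshot the accumulables at post time, like the event log does.
class AccumSnapshotListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val snapshot = taskEnd.taskInfo.accumulables.toList
    // With the ordering above, this can miss updates that land on the
    // mutable TaskInfo only after the event was posted.
    println(s"task ${taskEnd.taskInfo.taskId}: ${snapshot.size} accumulables at post time")
  }
}
{code}

Registered via {{sc.addSparkListener(new AccumSnapshotListener)}}, it should see the pre-update accumulable list for the affected tasks.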


