Marcelo Vanzin created SPARK-20342:
--------------------------------------

             Summary: DAGScheduler sends SparkListenerTaskEnd before updating task's accumulators
                 Key: SPARK-20342
                 URL: https://issues.apache.org/jira/browse/SPARK-20342
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.0
            Reporter: Marcelo Vanzin
Hit this on 2.2, but it has probably been there forever. This is similar in spirit to SPARK-20205.

The event is sent here, around L1154:

{code}
listenerBus.post(SparkListenerTaskEnd(
  stageId, task.stageAttemptId, taskType, event.reason, event.taskInfo, taskMetrics))
{code}

The accumulators are updated later, around L1173:

{code}
val stage = stageIdToStage(task.stageId)
event.reason match {
  case Success =>
    task match {
      case rt: ResultTask[_, _] =>
        // Cast to ResultStage here because it's part of the ResultTask
        // TODO Refactor this out to a function that accepts a ResultStage
        val resultStage = stage.asInstanceOf[ResultStage]
        resultStage.activeJob match {
          case Some(job) =>
            if (!job.finished(rt.outputId)) {
              updateAccumulators(event)
{code}

The same issue as SPARK-20205 applies here: the UI shows correct info because it's pointing at the mutable {{TaskInfo}} structure, but the event log, for example, may record the wrong information.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
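A minimal sketch of one possible fix, for illustration only: reorder the two steps in {{DAGScheduler.handleTaskCompletion}} so that {{updateAccumulators(event)}} runs before the {{SparkListenerTaskEnd}} event is posted, making listeners (such as the event log writer) observe the final accumulator values. The surrounding job-completion logic is elided and the exact placement is an assumption, not the actual patch:

{code}
// Hypothetical reordering inside DAGScheduler.handleTaskCompletion.
// Update the task's accumulators *before* publishing the end-of-task
// event, so listeners see final values rather than stale ones.
event.reason match {
  case Success =>
    updateAccumulators(event)
  case _ =>
    // Other completion reasons carry no accumulator updates to merge.
}

// Only now post the event; taskMetrics reflects the merged accumulators.
listenerBus.post(SparkListenerTaskEnd(
  stageId, task.stageAttemptId, taskType, event.reason, event.taskInfo, taskMetrics))
{code}

Note this sketch glosses over the fact that {{updateAccumulators}} is currently only called on certain paths (e.g. when the result stage's job hasn't already finished that partition), so a real fix would need to preserve those guards while still posting the event last.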