Github user artemrd commented on the issue: https://github.com/apache/spark/pull/21114

A long-running job and memory pressure alone are not enough. You need several attempts for a stage: each new attempt updates Stage._latestInfo, so the previous StageInfo and its accumulators can be GCed. After that, AccumulatorContext.get() throws an exception until the GCed accumulators are removed by ContextCleaner. It's also important to send an accumulator update for an old attempt before all tasks finish; otherwise the stage is marked as completed and removed from DAGScheduler.stageIdToStage, and DAGScheduler.handleTaskSetFailed() is ignored.
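To make the failure window concrete, here is a minimal, self-contained sketch of the weak-reference mechanism described above. This is not Spark's actual source: `AccumulatorRegistry` and `Accumulator` are hypothetical stand-ins for `AccumulatorContext` and `AccumulatorV2` in org.apache.spark.util.

```scala
import java.lang.ref.WeakReference
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for AccumulatorV2; only its identity matters here.
class Accumulator(val id: Long)

// Sketch of the registry: accumulators are held via weak references, so once
// the owning StageInfo is collected, get() can only fail until remove() runs.
object AccumulatorRegistry {
  private val originals =
    new ConcurrentHashMap[Long, WeakReference[Accumulator]]

  def register(acc: Accumulator): Unit =
    originals.putIfAbsent(acc.id, new WeakReference(acc))

  def get(id: Long): Option[Accumulator] =
    Option(originals.get(id)).map { ref =>
      val acc = ref.get()
      if (acc == null) {
        // The accumulator was GCed (its StageInfo was dropped when a newer
        // stage attempt replaced it) but the registry entry is still present.
        throw new IllegalStateException(
          s"Attempted to access garbage collected accumulator $id")
      }
      acc
    }

  // What ContextCleaner eventually does for each dead accumulator,
  // closing the window in which get() throws.
  def remove(id: Long): Unit = originals.remove(id)
}
```

An accumulator update from a task of the old attempt that arrives in this window (after GC, before cleanup) hits the exception path above.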