Yavgeni Hotimsky created SPARK-24848:
----------------------------------------

             Summary: When a stage fails onStageCompleted is called before 
onTaskEnd
                 Key: SPARK-24848
                 URL: https://issues.apache.org/jira/browse/SPARK-24848
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.0
            Reporter: Yavgeni Hotimsky


It seems that when a stage fails because one of its tasks failed too many
times, the SparkListener's onStageCompleted callback is called before the
onTaskEnd callback for the failing task. We're using Structured Streaming in
this case.

We noticed this because we built a listener that tracks the exact number of
active tasks for each of our processes and exports it as a metric. The listener
used the stage callbacks to maintain a map from stage IDs to some metadata
extracted from the jobGroupId, and onStageCompleted removed the stage's entry
from the map to keep memory bounded. In this failure case, the onTaskEnd
callback for the failing task arrived after onStageCompleted, so it could no
longer find the stageId in the map. We worked around it by replacing the map
with a timed cache.
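
A minimal sketch of the ordering problem and the timed-cache workaround,
assuming a simplified event model. This is a hypothetical, Spark-free Python
simulation, not Spark's API: the method names only mirror the SparkListener
callbacks (onStageSubmitted, onStageCompleted, onTaskStart, onTaskEnd), and
ActiveTaskTracker, retention_s and the injectable clock are illustrative
names. Instead of deleting a stage's entry on completion, the tracker marks it
to expire after a retention period, so a late task-end event can still be
attributed to its job group. Setting retention_s=0 reproduces the original
remove-on-completion map. Stage attempts are ignored for brevity.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ActiveTaskTracker:
    """Counts active tasks per job group via a stage-id -> group mapping.

    Hypothetical model: method names mirror SparkListener callbacks, but this
    is plain Python and does not use Spark.
    """

    def __init__(self, retention_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.retention_s = retention_s
        self.clock = clock
        # stage_id -> (job_group_id, expiry time; None while the stage is live)
        self._stages: Dict[int, Tuple[str, Optional[float]]] = {}
        self.active: Dict[str, int] = {}
        self.missed_task_ends = 0

    def _evict_expired(self) -> None:
        # Drop only completed stages whose retention period has elapsed.
        now = self.clock()
        expired = [sid for sid, (_, exp) in self._stages.items()
                   if exp is not None and exp <= now]
        for sid in expired:
            del self._stages[sid]

    def on_stage_submitted(self, stage_id: int, job_group_id: str) -> None:
        # Assumption: a real listener would read the job group id from the
        # stage-submitted event's properties.
        self._evict_expired()
        self._stages[stage_id] = (job_group_id, None)

    def on_stage_completed(self, stage_id: int) -> None:
        # Do not delete immediately: onTaskEnd for a failing task may still come.
        entry = self._stages.get(stage_id)
        if entry is not None:
            self._stages[stage_id] = (entry[0], self.clock() + self.retention_s)
        self._evict_expired()

    def on_task_start(self, stage_id: int) -> None:
        entry = self._stages.get(stage_id)
        if entry is not None:
            group = entry[0]
            self.active[group] = self.active.get(group, 0) + 1

    def on_task_end(self, stage_id: int) -> None:
        entry = self._stages.get(stage_id)
        if entry is None:
            # The stage metadata is already gone, so the count cannot be fixed.
            self.missed_task_ends += 1
            return
        group = entry[0]
        self.active[group] = self.active.get(group, 0) - 1


if __name__ == "__main__":
    # Replay the reported ordering with immediate removal vs. a 30 s retention.
    for retention in (0.0, 30.0):
        tracker = ActiveTaskTracker(retention_s=retention, clock=lambda: 0.0)
        tracker.on_stage_submitted(7, "query-a")
        tracker.on_task_start(7)
        tracker.on_task_start(7)
        tracker.on_task_end(7)         # first task finishes normally
        tracker.on_stage_completed(7)  # stage fails; reported before last task end
        tracker.on_task_end(7)         # late onTaskEnd for the failing task
        print(retention, tracker.active, tracker.missed_task_ends)
    # 0.0 {'query-a': 1} 1
    # 30.0 {'query-a': 0} 0
```

With immediate removal the late task-end is lost and the active count stays
stuck at 1; with a retention window the count returns to 0 while memory stays
bounded, because expired entries are dropped on later stage events.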



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
