liupengcheng created SPARK-26634:
------------------------------------

             Summary: OutputCommitCoordinator may allow task of 
FetchFailureStage commit again
                 Key: SPARK-26634
                 URL: https://issues.apache.org/jira/browse/SPARK-26634
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.0, 2.1.0
            Reporter: liupengcheng


In our production Spark cluster, we encountered a case in which a task of a 
stage retried due to FetchFailure was denied permission to commit, even though 
the task was the first attempt in that retry stage.

After careful investigation, we found that the canCommit call of 
OutputCommitCoordinator can allow a task of the FetchFailure stage (with the 
same partition number as a new task of the retry stage) to commit, which 
results in TaskCommitDenied for all tasks of the retry stage. This is a 
correctness bug.
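
The scenario described above can be illustrated with a small toy model. This 
is only a sketch, not Spark's OutputCommitCoordinator: the class 
ToyCommitCoordinator, its methods, and the check_stage_attempt flag are all 
hypothetical names introduced here, and the stale-attempt check is just one 
possible guard, not necessarily the fix adopted in Spark. The model assumes 
commit rights are tracked per (stage, partition) and granted to the first 
requester.

```python
# Toy model only: NOT Spark's OutputCommitCoordinator. All names are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ToyCommitCoordinator:
    # When True, requests from stage attempts older than the latest are rejected
    # (an assumed guard, used here only to contrast the two behaviours).
    check_stage_attempt: bool = False
    # stage_id -> most recent stage attempt number
    latest_stage_attempt: Dict[int, int] = field(default_factory=dict)
    # (stage_id, partition) -> (stage_attempt, task_attempt) holding the commit right
    committers: Dict[Tuple[int, int], Tuple[int, int]] = field(default_factory=dict)

    def stage_start(self, stage_id: int, stage_attempt: int) -> None:
        self.latest_stage_attempt[stage_id] = stage_attempt

    def can_commit(self, stage_id: int, stage_attempt: int,
                   partition: int, task_attempt: int) -> bool:
        if (self.check_stage_attempt
                and stage_attempt < self.latest_stage_attempt[stage_id]):
            return False  # request comes from a superseded stage attempt
        requester = (stage_attempt, task_attempt)
        # First requester for this (stage, partition) wins; others are denied.
        holder = self.committers.setdefault((stage_id, partition), requester)
        return holder == requester


def run_scenario(check_stage_attempt: bool) -> Tuple[bool, bool]:
    coord = ToyCommitCoordinator(check_stage_attempt=check_stage_attempt)
    coord.stage_start(stage_id=1, stage_attempt=0)
    coord.stage_start(stage_id=1, stage_attempt=1)  # retry after FetchFailure
    # A leftover task from stage attempt 0 asks first, for the same partition.
    stale = coord.can_commit(1, 0, partition=3, task_attempt=0)
    # The first task attempt of the retry stage asks afterwards.
    retry = coord.can_commit(1, 1, partition=3, task_attempt=0)
    return stale, retry


if __name__ == "__main__":
    print(run_scenario(check_stage_attempt=False))  # (True, False)
    print(run_scenario(check_stage_attempt=True))   # (False, True)
```

In the first run the leftover task takes the commit right and the retry 
stage's task is denied, which mirrors the reported symptom for a single 
partition. In the second run the stale request is rejected and the retry 
task is allowed to commit.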



