liupengcheng created SPARK-26634:
------------------------------------

             Summary: OutputCommitCoordinator may allow task of FetchFailureStage commit again
                 Key: SPARK-26634
                 URL: https://issues.apache.org/jira/browse/SPARK-26634
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.0, 2.1.0
            Reporter: liupengcheng

In our production Spark cluster, we encountered a case in which a task of a stage retried after a FetchFailure was denied permission to commit, even though it was the first attempt of that task in the retry stage.

After careful investigation, we found that a call to canCommit on OutputCommitCoordinator allows a task of the FetchFailure stage (with the same partition number as the new task of the retry stage) to commit. This results in TaskCommitDenied for all tasks of the retry stage.

This is a correctness bug.
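
To make the reported sequence concrete, here is a minimal toy model in Python. It is not Spark's OutputCommitCoordinator code: the class `ToyCommitCoordinator`, its `can_commit` method, and the assumption that the commit lock is keyed by (stage id, partition) without the stage attempt are all hypothetical, chosen only to reproduce the ordering described above (a task left over from the FetchFailure stage attempt asks first, then the retry stage's task for the same partition is denied).

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ToyCommitCoordinator:
    """Toy model only; hypothetical, not Spark's implementation."""

    # (stage_id, partition) -> (stage_attempt, task_attempt) holding the commit lock
    authorized: Dict[Tuple[int, int], Tuple[int, int]] = field(default_factory=dict)

    def can_commit(self, stage_id: int, stage_attempt: int,
                   partition: int, task_attempt: int) -> bool:
        # Assumption for illustration: the stage attempt is NOT part of the key,
        # so tasks from different attempts of one stage compete for the same lock.
        key = (stage_id, partition)
        holder = self.authorized.setdefault(key, (stage_attempt, task_attempt))
        return holder == (stage_attempt, task_attempt)


coordinator = ToyCommitCoordinator()

# A task left over from the stage attempt that hit FetchFailure asks first.
stale_ok = coordinator.can_commit(stage_id=3, stage_attempt=0,
                                  partition=7, task_attempt=0)

# The first task of the retry stage, same partition number, asks afterwards.
retry_ok = coordinator.can_commit(stage_id=3, stage_attempt=1,
                                  partition=7, task_attempt=0)

print(f"stale task allowed: {stale_ok}, retry task allowed: {retry_ok}")
```

In this sketch the stale task wins the lock and the retry task is refused, which mirrors the TaskCommitDenied symptom in the report. Whether the real coordinator fails for exactly this reason is not established here.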