[ https://issues.apache.org/jira/browse/SPARK-48292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866966#comment-17866966 ]
Steve Loughran commented on SPARK-48292:
----------------------------------------

What happens if a TA (task attempt) is authorized to commit but doesn't return? A network partition can trigger this. The output file may appear consistent with the committed task after a second task attempt is told to commit, but the partitioned TA may still commit later.

The core MapReduce commit protocols say "exactly one of the TAs shall have its output committed", but they don't guarantee that it is the second one.

> Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-48292
>                 URL: https://issues.apache.org/jira/browse/SPARK-48292
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: L. C. Hsieh
>            Assignee: angerszhu
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0, 3.5.2, 3.4.4
>
>
> When a task attempt fails after it has been authorized to do the task commit,
> OutputCommitCoordinator fails the stage with a reason message saying that the
> task commit succeeded, but the driver never actually knows whether a task
> commit was successful or not. We should update the reason message to make
> it less confusing.
> See https://github.com/apache/spark/pull/36564#discussion_r1598660630

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
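
The partition scenario raised in the comment can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyCoordinator`, `ToyStore` and `simulate` are hypothetical names, not Spark's OutputCommitCoordinator or any Hadoop committer API, and the model assumes a destination where a later task commit simply replaces an earlier one.

```python
# Toy model only: ToyCoordinator and ToyStore are hypothetical names,
# not Spark's OutputCommitCoordinator or any Hadoop committer API.


class ToyCoordinator:
    """Grants commit permission to one attempt at a time for a single task."""

    def __init__(self):
        self.authorized = None  # attempt id currently allowed to commit

    def can_commit(self, attempt):
        # First caller wins; everyone else is refused until the slot is released.
        if self.authorized is None:
            self.authorized = attempt
        return self.authorized == attempt

    def attempt_lost(self, attempt):
        # The driver stops hearing from the attempt and releases the permission.
        # It cannot tell whether the attempt committed, or will still commit.
        if self.authorized == attempt:
            self.authorized = None


class ToyStore:
    """Assumed destination where a task commit replaces what is already there."""

    def __init__(self):
        self.committed_by = None
        self.history = []

    def commit(self, attempt):
        self.committed_by = attempt
        self.history.append(attempt)


def simulate():
    coordinator, store = ToyCoordinator(), ToyStore()

    # TA1 is authorized, then a network partition hides it from the driver.
    assert coordinator.can_commit("TA1")
    coordinator.attempt_lost("TA1")

    # TA2 is authorized and commits; the output now matches TA2.
    assert coordinator.can_commit("TA2")
    store.commit("TA2")
    driver_view = coordinator.authorized

    # TA1 never learned that its permission was released, so it commits late.
    store.commit("TA1")
    return driver_view, store.committed_by, store.history
```

Calling `simulate()` in this model shows the driver believing TA2 is the committer while the stored output comes from TA1, which is the gap between "exactly one TA is committed" and "the second TA is the one committed".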