wankunde opened a new pull request #29197: URL: https://github.com/apache/spark/pull/29197
# What changes were proposed in this pull request?

Generally, distributed jobs commit files in two stages: committing each task's output files, then committing the job's output files. If one task attempt fails, another attempt retries the task; after all tasks succeed, the job commits the output of all tasks.

However, if we run a dynamic partition overwrite job, for example `INSERT OVERWRITE table dst partition(part) SELECT * from src`, and one of the final-stage tasks fails, the whole job fails. The first task attempt's data writer in the final stage writes its output data directly to the Spark staging directory. If that first task attempt fails, the second task attempt's data writer fails to set up, because the task's output file already exists, and the job fails.

Therefore, I think we should write the temporary data to the task attempt's work directory and commit the result files only after the task attempt succeeds.

### Why are the changes needed?

Bug fix for the case where the dynamic partition data writer of a final-stage task fails.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added UT
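The commit protocol described above can be sketched as follows. This is a minimal, hypothetical illustration (not Spark's actual `FileCommitProtocol` code): each attempt writes under its own per-attempt work directory, and only a successful attempt moves its file into the shared staging directory, so a retried attempt never collides with files left behind by a failed one. All class, method, and path names here are invented for the sketch.

```java
import java.nio.file.*;

// Hypothetical sketch of the two-phase task commit the PR describes.
public class AttemptCommitSketch {

    // Write into a per-attempt work directory, not the shared staging dir,
    // so a second attempt never finds its output file already present.
    static Path writeTaskOutput(Path stagingDir, int taskId, int attemptId, String data)
            throws Exception {
        Path attemptDir = stagingDir.resolve("task-" + taskId + "-attempt-" + attemptId);
        Files.createDirectories(attemptDir);
        Path out = attemptDir.resolve("part-" + taskId);
        Files.write(out, data.getBytes());
        return attemptDir;
    }

    // On task commit (attempt succeeded), move the attempt's file
    // from its work directory to the final per-task location.
    static Path commitTaskOutput(Path stagingDir, Path attemptDir, int taskId)
            throws Exception {
        Path finalPath = stagingDir.resolve("part-" + taskId);
        Files.move(attemptDir.resolve("part-" + taskId), finalPath,
                StandardCopyOption.REPLACE_EXISTING);
        return finalPath;
    }

    public static void main(String[] args) throws Exception {
        Path staging = Files.createTempDirectory("staging");
        // Attempt 0 writes but "fails" before commit; its data stays in its own dir.
        writeTaskOutput(staging, 0, 0, "partial");
        // Attempt 1 can still set up and write, because its work dir is separate.
        Path attempt1 = writeTaskOutput(staging, 0, 1, "complete");
        Path committed = commitTaskOutput(staging, attempt1, 0);
        System.out.println(new String(Files.readAllBytes(committed)));
    }
}
```

Without the per-attempt directory, attempt 1's write would fail because `part-0` would already exist at the shared path, which is the failure mode this PR fixes.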