[ https://issues.apache.org/jira/browse/SPARK-41094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
pengfei zhao updated SPARK-41094:
---------------------------------
    Affects Version/s: 3.3.1
                       (was: 2.4.4)

> The saveAsTable method fails to be executed, resulting in data file loss
> ------------------------------------------------------------------------
>
>                 Key: SPARK-41094
>                 URL: https://issues.apache.org/jira/browse/SPARK-41094
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.1
>            Reporter: pengfei zhao
>            Priority: Major
>
> We hit this problem in our production environment.
> The code is: df.write.mode(SaveMode.Overwrite).saveAsTable("xxx").
> While saveAsTable was executing, an executor exited due to OOM. By that
> point part of the data files had already been written to HDFS, and the
> subsequent Spark retries failed as well.
> This is very similar to the scenario described in SPARK-22504, but it
> really did happen to us.
> I read the source code. Why does Spark need to drop the table first and
> then execute the plan? What happens if the execution fails after the
> table has been dropped?
> I understand the community's position, but dropping the table first is
> too risky. Could Spark instead adopt a three-step process like Hive's:
> 1. WRITE: create a temporary table and write the data to it
> 2. SWAP: swap the temporary table with the target table using a rename
>    operation
> 3. CLEAN: clean up the old data

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
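[Editor's note: the WRITE/SWAP/CLEAN flow proposed above could be sketched in Scala roughly as follows. This is a minimal illustration, not Spark's actual implementation; the function name `safeOverwrite`, the `_staging`/`_old` table-name suffixes, and the use of `ALTER TABLE ... RENAME TO` for the swap are all assumptions for the sake of the example.]

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Hypothetical sketch of the proposed WRITE / SWAP / CLEAN overwrite.
// The staging/old table names are illustrative assumptions.
def safeOverwrite(spark: SparkSession, df: DataFrame, target: String): Unit = {
  val staging = s"${target}_staging"
  val old     = s"${target}_old"

  // WRITE: materialize the new data in a staging table first, so a
  // failed or OOM-killed job never touches the existing target table.
  df.write.mode(SaveMode.Overwrite).saveAsTable(staging)

  // SWAP: only reached if the write above fully succeeded; replace the
  // target with the staging table via cheap metadata renames.
  if (spark.catalog.tableExists(target)) {
    spark.sql(s"ALTER TABLE $target RENAME TO $old")
  }
  spark.sql(s"ALTER TABLE $staging RENAME TO $target")

  // CLEAN: drop the old data last; the new table is already in place,
  // so a failure here loses nothing but leftover storage.
  spark.sql(s"DROP TABLE IF EXISTS $old")
}
```

Under this ordering, every failure mode leaves either the old table or the new table fully intact, unlike the current drop-then-write behavior described in the report.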