pengfei zhao created SPARK-41094:
------------------------------------

             Summary: The saveAsTable method fails to be executed, resulting in 
data file loss
                 Key: SPARK-41094
                 URL: https://issues.apache.org/jira/browse/SPARK-41094
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.4
            Reporter: pengfei zhao


We hit a problem in our production environment.
The code is: df.write.mode(SaveMode.Overwrite).saveAsTable("xxx").
While saveAsTable was executing, an executor exited due to OOM, so only part of
the data files had been written to HDFS, and the subsequent Spark retries failed
as well. As a result the original table data was lost.
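
For reference, a minimal sketch of the write pattern involved (the source table
name is a placeholder; the real df is computed upstream):

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder()
      .appName("saveAsTable-overwrite")
      .enableHiveSupport()
      .getOrCreate()

    // df stands for whatever upstream DataFrame we produce; "source_table" is a placeholder.
    val df = spark.table("source_table")

    // With SaveMode.Overwrite the existing target table is deleted before the new
    // data is fully written, so an executor OOM mid-write leaves only partial files
    // on HDFS while the old data is already gone.
    df.write.mode(SaveMode.Overwrite).saveAsTable("xxx")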

This is very similar to the scenario described in SPARK-22504, but in our case it
actually happened.

I read the source code. Why does Spark need to delete the table first and then
execute the plan? If the execution fails after the table has been deleted, the
original data cannot be recovered.

I understand the community's position, but deleting the table first is too risky.
Could we instead adopt the following approach, similar to what Hive does (a rough
sketch follows the list)?
1. WRITE: create a temporary table and write the data to it
2. SWAP: swap the temporary table with the target table using a rename operation
3. CLEAN: clean up the old data
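
In case it helps the discussion, here is a rough sketch of that WRITE/SWAP/CLEAN
flow expressed with the existing DataFrame and Spark SQL APIs against a Hive
metastore. The table names (tempTable, targetTable, targetTable_old) are
placeholders, and this only illustrates the idea, it is not an actual patch:

    // 1. WRITE: materialize the new data into a temporary table first.
    df.write.mode(SaveMode.ErrorIfExists).saveAsTable("tempTable")

    // 2. SWAP: rename the old table out of the way, then rename the temporary
    //    table into place. The target is only unavailable between the two renames.
    spark.sql("ALTER TABLE targetTable RENAME TO targetTable_old")
    spark.sql("ALTER TABLE tempTable RENAME TO targetTable")

    // 3. CLEAN: drop the old data only after the swap has succeeded, so a failure
    //    during WRITE or SWAP never destroys the existing table.
    spark.sql("DROP TABLE targetTable_old")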



