[ https://issues.apache.org/jira/browse/SPARK-42714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047961#comment-18047961 ]

zuotingbing commented on SPARK-42714:
-------------------------------------

spark-sql> CREATE TABLE test_parquet (a INT, b INT, c INT, d INT) USING PARQUET 
PARTITIONED BY (b, c);

INSERT OVERWRITE TABLE test_parquet PARTITION(b=1, c=1) SELECT 1, 2;   -- app1

INSERT OVERWRITE TABLE test_parquet PARTITION(b=2, c=2) SELECT 3, 4;   -- app2

Two apps write to the same table, but to different partitions, at the same time. 
When each one commits its job/tasks, Hadoop's FileOutputCommitter cleans up the 
shared $dest/_temporary directory, deleting the other app's in-flight files and 
causing the failure.
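The race can be sketched outside Spark. This is a simplified, hypothetical model (the paths and helper names are illustrative, not the committer's actual API): each app stages task files under the table root's shared _temporary directory, and commit cleanup removes the whole _temporary tree, not just the committing app's subtree.

```python
import shutil
import tempfile
from pathlib import Path

# Shared table root, standing in for $dest.
dest = Path(tempfile.mkdtemp())

def write_task_file(app: str, partition: str) -> Path:
    """Stage a task output file under the SHARED $dest/_temporary."""
    staging = dest / "_temporary" / app / partition
    staging.mkdir(parents=True, exist_ok=True)
    out = staging / "part-00000"
    out.write_text("data")
    return out

def commit(app: str) -> None:
    """Promote this app's staged files, then clean up _temporary.
    The cleanup removes the *shared* _temporary dir, wiping any other
    app's still-pending files -- the conflict described above."""
    staging = dest / "_temporary" / app
    for f in staging.rglob("part-*"):
        final = dest / f.relative_to(staging)
        final.parent.mkdir(parents=True, exist_ok=True)
        f.rename(final)
    shutil.rmtree(dest / "_temporary", ignore_errors=True)

f1 = write_task_file("app1", "b=1/c=1")
f2 = write_task_file("app2", "b=2/c=2")

commit("app1")  # app1 finishes first; app2's staged file is gone before it can commit
```

After `commit("app1")`, app1's output sits in `$dest/b=1/c=1/`, but app2's staged `part-00000` no longer exists, so app2's commit would fail.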

> Sparksql temporary file conflict
> --------------------------------
>
>                 Key: SPARK-42714
>                 URL: https://issues.apache.org/jira/browse/SPARK-42714
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.2
>            Reporter: hao
>            Priority: Major
>
> When Spark SQL executes INSERT OVERWRITE, the name of the intermediate 
> temporary directory is not unique. When multiple applications concurrently 
> write different partitions of the same partitioned table, they can delete 
> each other's temporary files, resulting in task failures.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
