[
https://issues.apache.org/jira/browse/SPARK-42714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047961#comment-18047961
]
zuotingbing commented on SPARK-42714:
-------------------------------------
spark-sql> CREATE TABLE test_parquet (a INT, b INT, c INT, d INT) USING PARQUET
PARTITIONED BY (b, c);
INSERT OVERWRITE TABLE test_parquet PARTITION(b=1, c=1) SELECT 1, 2; -- app1
INSERT OVERWRITE TABLE test_parquet PARTITION(b=2, c=2) SELECT 3, 4; -- app2
Two applications write to the same table, but to different partitions, at the same time.
When each commits its job/tasks, Hadoop's FileOutputCommitter cleans up the shared
$dest/_temporary directory, so whichever application commits first deletes the other
application's in-flight task files, which causes the problem.
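A minimal sketch of the race described above, using plain filesystem operations to stand in for the committer (the directory names mimic the FileOutputCommitter layout; `stage_task` and `commit_job` are hypothetical helpers, not Spark APIs):

```python
import shutil
import tempfile
from pathlib import Path

# Both jobs stage their task output under the SAME <table>/_temporary
# directory, because the staging path is derived from the table location
# rather than from a per-application unique name.
table = Path(tempfile.mkdtemp()) / "test_parquet"

def stage_task(app: str, partition: str) -> Path:
    """Stage one task's output file under the shared _temporary dir."""
    task_dir = table / "_temporary" / "0" / f"task_{app}" / partition
    task_dir.mkdir(parents=True, exist_ok=True)
    (task_dir / "part-00000.parquet").touch()
    return task_dir

def commit_job(app: str) -> None:
    """On job commit, the committer-style cleanup removes the whole
    _temporary dir -- including files staged by the other, still-running app."""
    shutil.rmtree(table / "_temporary", ignore_errors=True)

app1_files = stage_task("app1", "b=1/c=1")
app2_files = stage_task("app2", "b=2/c=2")

commit_job("app1")          # app1 finishes first and cleans up _temporary
print(app2_files.exists())  # False: app2's staged output is gone
```

The key point is that the cleanup is keyed on the table's output path, not on the job, so correctness depends on only one writer using the table at a time.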
> Sparksql temporary file conflict
> --------------------------------
>
> Key: SPARK-42714
> URL: https://issues.apache.org/jira/browse/SPARK-42714
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.2
> Reporter: hao
> Priority: Major
>
> When Spark SQL runs INSERT OVERWRITE, the name of the intermediate temporary
> directory is not unique. As a result, when multiple applications write
> different partitions of the same partitioned table concurrently, they can
> delete each other's temporary files, causing task failures.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)