[ https://issues.apache.org/jira/browse/SPARK-42694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697614#comment-17697614 ]
Bjørn Jørgensen commented on SPARK-42694:
-----------------------------------------

Spark 3.1 [is EOL|https://github.com/apache/spark-website/commit/40f58f884bd258d6a332d583dc91c717b6b461f0]. Try Spark 3.3.2 or 3.2.3.

> Data duplication and loss occur after executing 'insert overwrite...' in Spark 3.1.1
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-42694
>                 URL: https://issues.apache.org/jira/browse/SPARK-42694
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.1.1
>         Environment: Spark 3.1.1
>                      Hadoop 3.2.1
>                      Hive 3.1.2
>            Reporter: FengZhou
>            Priority: Critical
>              Labels: shuffle, spark
>         Attachments: image-2023-03-07-15-59-08-818.png,
>                      image-2023-03-07-15-59-27-665.png
>
>
> We are currently using Spark version 3.1.1 in our production environment. We
> have noticed that occasionally, after executing 'insert overwrite ...
> select', the resulting data is inconsistent, with some data being duplicated
> or lost. This issue does not occur all the time and seems to be more
> prevalent on large tables with tens of millions of records.
>
> We compared the execution plans for two runs of the same SQL and found that
> they were identical. In the case where the SQL was executed successfully, the
> amount of data being written and read during the shuffle stage was the same.
> However, in the case where the problem occurred, the amount of data being
> written and read during the shuffle stage was different. Please see the
> attached screenshots for the write/read data during shuffle stage.
>
> Normal SQL:
> !image-2023-03-07-15-59-08-818.png!
>
> SQL with issues:
> !image-2023-03-07-15-59-27-665.png!
>
> Is this problem caused by a bug in version 3.1.1, specifically (SPARK-34534):
> 'New protocol FetchShuffleBlocks in OneForOneBlockFetcher lead to data loss
> or correctness'? Or is it caused by something else? What could be the root
> cause of this problem?
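The shuffle write/read comparison described in the report can also be done programmatically instead of by reading screenshots. The following is only a rough sketch, not a diagnosis tool: it assumes you have already collected per-stage metrics for one run as a list of dicts, and that the keys are named `shuffleWriteRecords` and `shuffleReadRecords` (as in the stage JSON of Spark's monitoring REST API; treat the exact names as an assumption and adjust to your data). The helper names and the sample numbers are made up for illustration. Totals can legitimately differ in some plans (for example when a downstream stage does not consume all shuffle output), so a mismatch is a hint to investigate, not proof of data loss.

```python
from typing import Iterable, Tuple


def shuffle_totals(stages: Iterable[dict]) -> Tuple[int, int]:
    """Sum shuffle records written and read across all stages of one run.

    Each stage is a dict; missing keys are treated as 0.
    The key names are an assumption (see the note above).
    """
    stages = list(stages)
    written = sum(s.get("shuffleWriteRecords", 0) for s in stages)
    read = sum(s.get("shuffleReadRecords", 0) for s in stages)
    return written, read


def shuffle_is_balanced(stages: Iterable[dict]) -> bool:
    """Return True when total shuffle records written equals total read."""
    written, read = shuffle_totals(stages)
    return written == read


# Hypothetical metrics for two runs of the same SQL (numbers are invented).
good_run = [
    {"stageId": 0, "shuffleWriteRecords": 1000, "shuffleReadRecords": 0},
    {"stageId": 1, "shuffleWriteRecords": 0, "shuffleReadRecords": 1000},
]
bad_run = [
    {"stageId": 0, "shuffleWriteRecords": 1000, "shuffleReadRecords": 0},
    {"stageId": 1, "shuffleWriteRecords": 0, "shuffleReadRecords": 990},
]

for name, run in (("good_run", good_run), ("bad_run", bad_run)):
    written, read = shuffle_totals(run)
    status = "balanced" if shuffle_is_balanced(run) else "MISMATCH"
    print(f"{name}: written={written} read={read} -> {status}")
```

Running a check like this after each 'insert overwrite ... select' job would flag the suspicious runs automatically, which may help correlate them with executor loss or fetch retries.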