[ https://issues.apache.org/jira/browse/SPARK-42694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824279#comment-17824279 ]
liukai commented on SPARK-42694:
--------------------------------

I have also encountered this issue on Spark 3.1.1. Setting the parameter spark.shuffle.useOldFetchProtocol to true avoids it, but I have not found the root cause.

> Data duplication and loss occur after executing 'insert overwrite...' in
> Spark 3.1.1
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-42694
>                 URL: https://issues.apache.org/jira/browse/SPARK-42694
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.1.1
>         Environment: Spark 3.1.1
>                      Hadoop 3.2.1
>                      Hive 3.1.2
>            Reporter: FengZhou
>            Priority: Critical
>              Labels: shuffle, spark
>         Attachments: image-2023-03-07-15-59-08-818.png,
>                      image-2023-03-07-15-59-27-665.png
>
>
> We are currently using Spark 3.1.1 in our production environment. We have
> noticed that, occasionally, after executing 'insert overwrite ... select',
> the resulting data is inconsistent: some rows are duplicated and others are
> lost. The issue does not occur every time and seems to be more prevalent on
> large tables with tens of millions of records.
>
> We compared the execution plans for two runs of the same SQL and found that
> they were identical. In the run that completed correctly, the amount of data
> written and read during the shuffle stage was the same. In the run that
> produced incorrect results, the shuffle write and shuffle read sizes
> differed. Please see the attached screenshots of the shuffle write/read
> metrics.
>
> Normal SQL:
> !image-2023-03-07-15-59-08-818.png!
> SQL with issues:
> !image-2023-03-07-15-59-27-665.png!
>
> Is this caused by a known bug in 3.1.1, specifically SPARK-34534 ('New
> protocol FetchShuffleBlocks in OneForOneBlockFetcher lead to data loss or
> correctness'), or by something else? What could be the root cause?
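For readers hitting the same symptom, the workaround mentioned above is a standard Spark configuration flag (it forces executors to fall back to the pre-3.0 OpenBlocks shuffle fetch protocol instead of FetchShuffleBlocks). A minimal sketch of how it might be applied, assuming an ordinary spark-submit deployment; the application name and script are placeholders:

```
# Per-job: pass the flag on the command line (placeholder job name/script)
spark-submit \
  --conf spark.shuffle.useOldFetchProtocol=true \
  --class com.example.MyJob \
  my-job.jar

# Cluster-wide alternative: add the line to conf/spark-defaults.conf
# spark.shuffle.useOldFetchProtocol  true
```

Note this only sidesteps the suspected FetchShuffleBlocks path; it does not confirm the root cause, and correctness should still be re-verified after enabling it.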
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org