[ https://issues.apache.org/jira/browse/SPARK-37473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
haiyangyu updated SPARK-37473:
------------------------------
    Description:
!image-2021-11-27-15-07-41-516.png|width=715,height=484!

While the segment files produced by `BypassMergeSortShuffleWriter` are being merged, a segment file that does not exist is assumed to contain no data.
However, `file.exists()` may also return `false` when the disk holding the segment file is missing while its root directory still exists; the missing disk only causes `file.exists()` to return `false` and does not raise an exception. The task then runs to completion without error, even though the current segment file was never written.
The segment data is ignored, which leads to shuffle data loss.

  was:
!image-2021-11-27-15-07-41-516.png|width=715,height=484!

While the segments produced by `BypassMergeSortShuffleWriter` are being merged, a segment file that does not exist is assumed to contain no data.
However, `file.exists()` may also return `false` when the disk holding the segment file is missing while its root directory still exists; the missing disk only causes `file.exists()` to return `false` and does not raise an exception. The task then runs to completion without error, even though the current segment file was never written.
The segment data is ignored, which leads to shuffle data loss.

> BypassMergeSortShuffleWriter may lose data when the disk is missing but the directory is present
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37473
>                 URL: https://issues.apache.org/jira/browse/SPARK-37473
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 2.4.0, 2.4.1, 2.4.8, 3.0.0, 3.2.0
>            Reporter: haiyangyu
>            Priority: Major
>         Attachments: image-2021-11-27-15-07-41-516.png
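
The ambiguity described in this issue can be illustrated with a minimal Java sketch. It is not the Spark code and not the eventual patch: `SegmentCheck`, `checkSegment`, `SegmentState`, and the `bytesWritten` parameter are hypothetical names introduced here. The sketch assumes the writer knows how many bytes it wrote for a partition, so a missing file with a non-zero byte count can be reported as an error instead of being treated as an empty segment.

```java
import java.io.File;

/** Hypothetical helper for illustration only; not part of Spark. */
final class SegmentCheck {

    /** Outcome of inspecting one per-partition segment file. */
    enum SegmentState { EMPTY, PRESENT }

    /**
     * Classifies a segment file before merging.
     *
     * @param segment      the per-partition segment file
     * @param bytesWritten bytes the writer believes it wrote to this segment
     *                     (assumed to be tracked by the caller)
     */
    static SegmentState checkSegment(File segment, long bytesWritten) {
        if (segment.exists()) {
            return SegmentState.PRESENT;
        }
        // File.exists() returns false both for "never created" and for
        // "not reachable"; it does not throw in either case. The byte
        // count is used here to tell the two situations apart.
        if (bytesWritten > 0) {
            throw new IllegalStateException(
                "Segment file " + segment + " is missing although "
                    + bytesWritten + " bytes were written to it");
        }
        // Nothing was written, so a missing file really is an empty segment.
        return SegmentState.EMPTY;
    }
}
```

With a check of this kind, a task whose segment file disappeared would fail and could be retried, rather than completing with the segment silently dropped.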