[ https://issues.apache.org/jira/browse/SPARK-33022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
duanmeng updated SPARK-33022: ----------------------------- Description: A data file might be empty even after DiskBlockObjectWriter committing it in BypassMergeSortShuffleWriter, returned wrong lengths in writePartitionedFile used by MapStatus, and then cause data lost. This is related to disk/kernel but we can avoid it in spark without any performance loss. We can compare partitionWriterSegments[i].length with the length[i] after Utils.copyStream. I added some logs and reproduce the issue, The log when this issue happened {code:java} 20/09/28 00:42:44 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_38244ef5-8e97-4428-97b8-feffc16fc9f7, offset=0, length=1462) 20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: File length: 0 20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 0{code} The peer log when this issue didn't happen {code:java} 20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_f6937469-39fd-4576-b40e-69f4276cc8e4, offset=0, length=1462) 20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: File length: 1462 20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 1462 {code} was: A data file might be empty even after DiskBlockObjectWriter committing it in BypassMergeSortShuffleWriter, returned wrong lengths in writePartitionedFile used by MapStatus, and then cause data lost. This is related to disk/kernel but we can avoid it in spark without any performance loss. We can compare partitionWriterSegments[i].length with the length[i] after Utils.copyStream. I added some logs and reproduce the issue, The log when this issue happened {code:java} 20/09/28 00:42:44 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_38244ef5-8e97-4428-97b8-feffc16fc9f7, offset=0, length=1462) 20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: File length: 0 20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 0{code} The peer log when this issue didn't happen {code:java} 20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_f6937469-39fd-4576-b40e-69f4276cc8e4, offset=0, length=1462) 20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: File length: 1462 20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 1462 {code} > partition length is wrong after merge partition segments in > BypassMergeSortShuffleWriter > ---------------------------------------------------------------------------------------- > > Key: SPARK-33022 > URL: https://issues.apache.org/jira/browse/SPARK-33022 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 2.4.6 > Reporter: duanmeng > Priority: Major > > A data file might be empty even after DiskBlockObjectWriter committing it in > BypassMergeSortShuffleWriter, returned wrong lengths in writePartitionedFile > used by MapStatus, and then cause data lost. > This is related to disk/kernel but we can avoid it in spark without any > performance loss. We can compare partitionWriterSegments[i].length with the > length[i] after Utils.copyStream. > I added some logs and reproduce the issue, > The log when this issue happened > {code:java} > 20/09/28 00:42:44 INFO sort.BypassMergeSortShuffleWriter: > partitionWriterSegments[0]: > (name=temp_shuffle_38244ef5-8e97-4428-97b8-feffc16fc9f7, offset=0, > length=1462) > 20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: File length: 0 > 20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: Copied stream > length: 0{code} > > The peer log when this issue didn't happen > {code:java} > 20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: > partitionWriterSegments[0]: > (name=temp_shuffle_f6937469-39fd-4576-b40e-69f4276cc8e4, offset=0, > length=1462) > 20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: File length: 1462 > 20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: Copied stream > length: 1462 > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org