[jira] [Updated] (SPARK-33022) partition length is wrong after merge partition segments in BypassMergeSortShuffleWriter

2020-09-29 Thread duanmeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

duanmeng updated SPARK-33022:
-
Description: 
The segment file may be empty even after the writer commits it, caused by disk/kernel problems on a heavily loaded cluster. Comparing the segment length with the copied length guards against this.

A data file might be empty even after DiskBlockObjectWriter has committed it in BypassMergeSortShuffleWriter; writePartitionedFile then returns wrong lengths, which feed into MapStatus, and data is lost.

This is caused by the disk/kernel, but we can guard against it in Spark without any performance loss: compare partitionWriterSegments[i].length with length[i] after Utils.copyStream.
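
To illustrate the idea, a minimal sketch of how such a guard could look in the copy loop of writePartitionedFile is shown below. The loop is simplified and the local variable names are assumptions; only the comparison after Utils.copyStream would be new.
{code:java}
// Sketch only: simplified copy loop of writePartitionedFile, plus the proposed guard.
for (int i = 0; i < numPartitions; i++) {
  final FileSegment segment = partitionWriterSegments[i];
  final File file = segment.file();
  if (file.exists()) {
    final FileInputStream in = new FileInputStream(file);
    boolean copyThrewException = true;
    try {
      // Copy this partition's segment into the merged shuffle data file.
      lengths[i] = Utils.copyStream(in, out, false, transferToEnabled);
      copyThrewException = false;
    } finally {
      Closeables.close(in, copyThrewException);
    }
    // Proposed guard: the length recorded when the segment was committed must match
    // the number of bytes actually copied into the merged file.
    if (lengths[i] != segment.length()) {
      throw new IOException("Segment length " + segment.length() + " for partition " + i
          + " does not match copied length " + lengths[i]);
    }
  }
}
{code}
Because the guard is a single comparison of two long values per partition, it should not add any measurable overhead, which is why no performance loss is expected.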

I added some logs and reproduced the issue.

The log when the issue happened (the segment was committed with length 1462, but the file reads back as 0 bytes, so 0 bytes were copied):
{code:java}
20/09/28 00:42:44 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_38244ef5-8e97-4428-97b8-feffc16fc9f7, offset=0, length=1462)
20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: File length: 0
20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 0
{code}
 

The corresponding log when the issue did not happen (file length and copied length both match the segment length):
{code:java}
20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_f6937469-39fd-4576-b40e-69f4276cc8e4, offset=0, length=1462)
20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: File length: 1462
20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 1462
{code}
 

 

  was:
A data file might be empty even after DiskBlockObjectWriter has committed it in BypassMergeSortShuffleWriter; writePartitionedFile then returns wrong lengths, which feed into MapStatus, and data is lost.

This is caused by the disk/kernel, but we can guard against it in Spark without any performance loss: compare partitionWriterSegments[i].length with length[i] after Utils.copyStream.

I added some logs and reproduced the issue.

The log when the issue happened:
{code:java}
20/09/28 00:42:44 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_38244ef5-8e97-4428-97b8-feffc16fc9f7, offset=0, length=1462)
20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: File length: 0
20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 0
{code}

The corresponding log when the issue did not happen:
{code:java}
20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_f6937469-39fd-4576-b40e-69f4276cc8e4, offset=0, length=1462)
20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: File length: 1462
20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 1462
{code}
 

 


> partition length is wrong after merge partition segments in 
> BypassMergeSortShuffleWriter
> 
>
> Key: SPARK-33022
> URL: https://issues.apache.org/jira/browse/SPARK-33022
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.6
>Reporter: duanmeng
>Priority: Major
>
> The segment file may be empty even after the writer commits it, caused by disk/kernel problems on a heavily loaded cluster. Comparing the segment length with the copied length guards against this.
> A data file might be empty even after DiskBlockObjectWriter has committed it in BypassMergeSortShuffleWriter; writePartitionedFile then returns wrong lengths, which feed into MapStatus, and data is lost.
> This is caused by the disk/kernel, but we can guard against it in Spark without any performance loss: compare partitionWriterSegments[i].length with length[i] after Utils.copyStream.
> I added some logs and reproduced the issue.
> The log when the issue happened:
> {code:java}
> 20/09/28 00:42:44 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_38244ef5-8e97-4428-97b8-feffc16fc9f7, offset=0, length=1462)
> 20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: File length: 0
> 20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 0
> {code}
>  
> The corresponding log when the issue did not happen:
> {code:java}
> 20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_f6937469-39fd-4576-b40e-69f4276cc8e4, offset=0, length=1462)
> 20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: File length: 1462
> 20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 1462
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


