[jira] [Updated] (SPARK-36121) Write data loss caused by stage retry when enable v2 FileOutputCommitter

2021-07-13 Thread gaoyajun02 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoyajun02 updated SPARK-36121:
---
Description: 
All our ETL jobs are configured with 
mapreduce.fileoutputcommitter.algorithm.version=2. When a shuffle FetchFailed 
occurs, a stage retry is triggered, and the zombie stage and the retry stage 
may then write tasks for the same part at the same time, with exactly the same 
task directory and file name. This can cause the loss of a data part due to 
conflicts between the delete and rename operations.
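
For reference, this is roughly how the v2 algorithm ends up enabled for our writes (a minimal sketch; the SparkSession setup is illustrative, and the same value can be passed via spark-submit with the spark.hadoop. prefix):

{code:scala}
import org.apache.spark.sql.SparkSession

// Minimal sketch: the v2 commit algorithm is selected through the Hadoop
// configuration that the file output committer reads. The same setting can be
// passed on the command line as
// --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
val spark = SparkSession.builder().appName("etl-job").getOrCreate()
spark.sparkContext.hadoopConfiguration
  .set("mapreduce.fileoutputcommitter.algorithm.version", "2")
{code}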

For example, here is a data loss case I encountered recently: stage 5.0 is a 
zombie stage caused by a shuffle FetchFailed, and stage 5.1 is its retry stage. 
They have two tasks concurrently writing the same part file: part-00298.
 # The task of stage 5.1 has preemptively created the part file part-00298 and 
written its data.
 # At the same time as the task commit of stage 5.1, the task of stage 5.0 
tries to create this part file to write its data; because the file already 
exists, it throws an exception and deletes the task's temporary directory.
 # Then stage 5.0 starts commitTask, which traverses the sub-directories and 
executes the renames. Because the files have already been deleted at this 
point, it ends up moving nothing, without raising any exception, and the data 
is lost (see the sketch below).
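
To make step 3 concrete, below is a simplified, illustrative sketch of what a v2-style task commit does (this is not the actual FileOutputCommitter code; taskAttemptPath and outputPath are placeholder names). If the task attempt directory has already been deleted by the other stage attempt, there is simply nothing left to rename and the commit still reports success:

{code:scala}
import org.apache.hadoop.fs.{FileSystem, Path}

// Simplified, illustrative sketch of a v2-style task commit: files are moved
// straight from the task attempt directory into the final job output directory.
// (Not the actual FileOutputCommitter code; the real merge is recursive.)
def commitTaskV2Sketch(fs: FileSystem, taskAttemptPath: Path, outputPath: Path): Unit = {
  if (fs.exists(taskAttemptPath)) {
    // Rename every file the task produced into the output directory.
    fs.listStatus(taskAttemptPath).foreach { status =>
      fs.rename(status.getPath, new Path(outputPath, status.getPath.getName))
    }
  }
  // If the concurrent stage attempt already deleted the directory (step 2),
  // we fall through with nothing renamed and no exception: the task is still
  // reported as committed even though part-00298 never reached outputPath.
}
{code}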

 

I have read this part of the code, and I currently see two possible fixes:
 # Add the stageAttemptNumber to the taskAttemptPath, so that the two stage 
attempts never share a task directory and cannot conflict.
 # Check the number of files after commitTask, and throw an exception 
immediately when any of them are found to be missing (a rough sketch is shown 
below).
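
As a rough illustration of the second idea, the check could look something like the hypothetical helper below; committedFileNames would have to come from the write task's own bookkeeping, and the real check would sit next to the committer's commitTask call:

{code:scala}
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical post-commit check (idea 2): after commitTask has run, confirm
// that every file the task reported as written actually exists in the output
// directory, and fail loudly instead of letting the loss go unnoticed.
def verifyCommittedFiles(fs: FileSystem, outputPath: Path,
    committedFileNames: Seq[String]): Unit = {
  val missing = committedFileNames.filterNot(name => fs.exists(new Path(outputPath, name)))
  if (missing.nonEmpty) {
    throw new IllegalStateException(
      s"Task commit lost ${missing.size} file(s): ${missing.mkString(", ")}")
  }
}
{code}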

 

 


> Write data loss caused by stage retry when enable v2 FileOutputCommitter
> 
>
> Key: SPARK-36121
> URL: https://issues.apache.org/jira/browse/SPARK-36121
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>  Affects Versions: 2.2.1, 3.0.1
>  Reporter: gaoyajun02
>  Priority: Critical
>


