[ 
https://issues.apache.org/jira/browse/HADOOP-18706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18706:
------------------------------------
    Description: 
If an application crashes during an S3ABlockOutputStream upload and 
fs.s3a.fast.upload.buffer is set to disk, it is possible to complete the upload 
by uploading the s3ablock file with putObject as the final part of the 
multipart upload. If the application had multiple uploads running in parallel, 
however, and they were on the same part number when it failed, there is no way 
to determine which file belongs to which object, and recovery of either upload 
is impossible.

If the temporary file name for disk buffering included the s3 key, then every 
partial upload would be recoverable.
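As an illustration of why embedding the key helps, here is a minimal sketch of one possible naming scheme. The class name, prefix, and format below are assumptions for illustration, not the actual S3A code; the point is only that two parallel uploads on the same part number now produce distinguishable buffer files.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class BufferNameSketch {
    // Hypothetical scheme: embed the URL-encoded destination key and the
    // block id in the local buffer file name, so buffer files from parallel
    // uploads on the same part number remain distinguishable.
    static String bufferFileName(String s3Key, int blockId) {
        String safeKey = URLEncoder.encode(s3Key, StandardCharsets.UTF_8);
        return "s3ablock-" + safeKey + "-" + String.format("%04d", blockId) + ".tmp";
    }

    public static void main(String[] args) {
        System.out.println(bufferFileName("logs/app/part-00", 3));
        // prints s3ablock-logs%2Fapp%2Fpart-00-0003.tmp
    }
}
```

URL-encoding the key keeps path separators out of the local filename while leaving the original key recoverable.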

h3. Important disclaimer

This change does not add the Syncable semantics required by applications which 
expect {{Syncable.hsync()}} to return only after all pending data has been 
durably written to the destination path. S3 is not a filesystem and this 
change does not make it so.

What it does do is assist anyone trying to implement a post-crash recovery 
process which
# interrogates S3 to identify pending uploads to a specific path and gets a 
list of uploaded blocks yet to be committed
# scans the local fs.s3a.buffer.dir directories to identify in-progress-write 
blocks for the same target destination: those which were being uploaded, those 
queued for upload, and the single "new data being written to" block for an 
output stream
# uploads all those pending blocks
# generates a new POST to complete a multipart upload with all the blocks in 
the correct order
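The grouping and ordering part of the steps above can be sketched with JDK-only code. This is a minimal sketch assuming the hypothetical buffer-file naming scheme {{s3ablock-<encodedKey>-<blockId>.tmp}}; it is not the actual S3A implementation, and the uploads plus the completing POST would still need the AWS SDK.

```java
import java.util.*;
import java.util.regex.*;

public class RecoveryOrderSketch {
    // Assumed (hypothetical) buffer-file name: s3ablock-<encodedKey>-<blockId>.tmp
    private static final Pattern NAME =
        Pattern.compile("s3ablock-(.+)-(\\d+)\\.tmp");

    // Map each destination key to its block ids in ascending upload order.
    static Map<String, List<Integer>> planUploads(List<String> fileNames) {
        Map<String, List<Integer>> plan = new TreeMap<>();
        for (String name : fileNames) {
            Matcher m = NAME.matcher(name);
            if (!m.matches()) continue;           // not a buffer file, skip
            plan.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                .add(Integer.parseInt(m.group(2)));
        }
        plan.values().forEach(Collections::sort); // parts must be in order
        return plan;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "s3ablock-logs%2Fa-0002.tmp",
            "s3ablock-logs%2Fa-0001.tmp",
            "s3ablock-logs%2Fb-0001.tmp");
        System.out.println(planUploads(files));
        // prints {logs%2Fa=[1, 2], logs%2Fb=[1]}
    }
}
```

With the key in the filename, each destination gets its own ordered block list even when two uploads crashed on the same part number.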

All this patch does is ensure the buffered block filenames include the final 
path and block ID, to aid in identifying which blocks need to be uploaded and 
in what order.

h2. Warning
This change causes HADOOP-18744; always include that fix when backporting.



> Improve S3ABlockOutputStream recovery
> -------------------------------------
>
>                 Key: HADOOP-18706
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18706
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>            Reporter: Chris Bevard
>            Assignee: Chris Bevard
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
