[GitHub] [hadoop] cbevard1 commented on pull request #5563: HADOOP-18706: Improve S3ABlockOutputStream recovery

via GitHub Tue, 18 Apr 2023 12:40:03 -0700


cbevard1 commented on PR #5563:
URL: https://github.com/apache/hadoop/pull/5563#issuecomment-1513699708


   @steveloughran thanks for your feedback. I've added the span ID to the file 
name as you suggested for better debugging.
   
   > If you really want upload to be recoverable then you need to be able to 
combine blocks on the hard disk with the in-progress multipart upload such that 
you can finish the upload, build the list of etags and then POST the 
complete operation.
   
   With the part number and key derived from the local file name, I've been 
using calls to `list-multipart-uploads`/`list-parts` to get the upload ID and 
ETags and complete partial uploads. For single-part files I call putObject with 
the key, and for multipart uploads I use the upload ID and part number returned 
by `list-multipart-uploads`/`list-parts` to submit the local file as the final 
part. The key could exceed an OS's file name length limit though, so I think 
including the span ID is a very good idea.
   
   I know it's not a typical use case to recover a partial upload rather than 
retry the entire file, but it's very helpful when using S3A as the underlying 
file system in Accumulo. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

