[ https://issues.apache.org/jira/browse/HADOOP-18706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713742#comment-17713742 ]

ASF GitHub Bot commented on HADOOP-18706:
-----------------------------------------

cbevard1 commented on PR #5563:
URL: https://github.com/apache/hadoop/pull/5563#issuecomment-1513699708

   @steveloughran thanks for your feedback. I've added the span ID to the file 
name as you suggested for better debugging.
   
   > If you really want upload to be recoverable then you need to be able to 
   > combine blocks on the hard disk with the in-progress multipart upload such 
   > that you can finish the upload, build the list of etags and then POST the 
   > complete operation.
   
   With the part number and key derived from the local file name, I've been 
using calls to `list-multipart-uploads`/`list-parts` to get the upload ID/ETags 
and complete partial uploads. For single-part files I call `putObject` with the 
key, and for multipart uploads I use the upload ID and part number returned by 
`list-multipart-uploads`/`list-parts` to submit the local file as the final 
part. The key could exceed an OS's file-name character limit though, so I think 
including the span ID is a very good idea.
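
   For anyone following along, here is a rough sketch of that multipart 
recovery path against the AWS SDK v2. The class, method, and parameter names 
below are mine, not from PR #5563, and it assumes the key and final part 
number have already been recovered from the leftover s3ablock file name:

   ```java
   import java.nio.file.Path;
   import java.util.ArrayList;
   import java.util.List;

   import software.amazon.awssdk.core.sync.RequestBody;
   import software.amazon.awssdk.services.s3.S3Client;
   import software.amazon.awssdk.services.s3.model.*;

   public class PartialUploadRecovery {
     static void recover(S3Client s3, String bucket, String key,
                         int finalPartNumber, Path leftoverBlockFile) {
       // Find the in-progress multipart upload for this key.
       String uploadId = s3.listMultipartUploads(
               ListMultipartUploadsRequest.builder()
                   .bucket(bucket).prefix(key).build())
           .uploads().stream()
           .filter(u -> u.key().equals(key))
           .findFirst()
           .orElseThrow(() ->
               new IllegalStateException("no pending upload for " + key))
           .uploadId();

       // Collect the ETags of the parts that completed before the crash;
       // list-parts returns them in ascending part-number order.
       List<CompletedPart> parts = new ArrayList<>();
       for (Part p : s3.listParts(ListPartsRequest.builder()
               .bucket(bucket).key(key).uploadId(uploadId).build()).parts()) {
         parts.add(CompletedPart.builder()
             .partNumber(p.partNumber()).eTag(p.eTag()).build());
       }

       // Upload the leftover local block file as the final part.
       UploadPartResponse last = s3.uploadPart(
           UploadPartRequest.builder().bucket(bucket).key(key)
               .uploadId(uploadId).partNumber(finalPartNumber).build(),
           RequestBody.fromFile(leftoverBlockFile));
       parts.add(CompletedPart.builder()
           .partNumber(finalPartNumber).eTag(last.eTag()).build());

       // POST the complete operation to stitch the object together.
       s3.completeMultipartUpload(CompleteMultipartUploadRequest.builder()
           .bucket(bucket).key(key).uploadId(uploadId)
           .multipartUpload(CompletedMultipartUpload.builder()
               .parts(parts).build())
           .build());
     }
   }
   ```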
   
   I know it's not a typical use case to recover a partial upload rather than 
retry the entire file, but it's very helpful when using S3A as the underlying 
file system in Accumulo. 




> The temporary files for the disk-block buffer aren't unique enough to recover 
> partial uploads. 
> -------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18706
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18706
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>            Reporter: Chris Bevard
>            Priority: Minor
>              Labels: pull-request-available
>
> If an application crashes during an S3ABlockOutputStream upload, it's 
> possible to complete the upload when fast.upload.buffer is set to disk by 
> uploading the s3ablock file with putObject as the final part of the multipart 
> upload. If the application has multiple uploads running in parallel, though, 
> and they're on the same part number when the application fails, then there is 
> no way to determine which file belongs to which object, and recovery of 
> either upload is impossible.
> If the temporary file name for disk buffering included the s3 key, then every 
> partial upload would be recoverable.
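
As a hedged illustration of the naming idea in the description above (the 
exact format adopted in PR #5563 may differ), a buffer file name that encodes 
the pieces needed for recovery could be built like this; note the comment 
above points out that the key alone can exceed file-name length limits, which 
is why carrying the span ID matters:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class BlockFileNames {
  // Hypothetical naming sketch, not the actual S3A format: encode the part
  // number, span ID, and URL-encoded S3 key so a crashed upload's buffer
  // file can be matched back to its object and position.
  static String blockFileName(String key, String spanId, int partNumber) {
    // URL-encode the key so '/' and other unsafe characters are legal in a
    // file name; the span ID keeps names unique across parallel uploads.
    String safeKey = URLEncoder.encode(key, StandardCharsets.UTF_8);
    return String.format("s3ablock-%s-%04d-%s.tmp", spanId, partNumber, safeKey);
  }
}
```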


