[ https://issues.apache.org/jira/browse/HADOOP-18706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714981#comment-17714981 ]
ASF GitHub Bot commented on HADOOP-18706:
-----------------------------------------

steveloughran commented on PR #5563:
URL: https://github.com/apache/hadoop/pull/5563#issuecomment-1517764238

Looks good; just two issues to worry about.

Minor: checkstyle is unhappy about line length...please keep lines at 100 chars or less.

One bigger issue, which you already mentioned: excessively long filenames. S3 supports 1024 chars of path, so this should work through the other block buffers, and MUST work here too. Looking at a table of filename-length limits, there are 255 chars to play with, including block id, span id, etc.: https://www.baeldung.com/linux/bash-filename-limit

How about adding a new test case, or modifying testRegularUpload(), to create a file with a name > 256 chars just to see what happens? Oh, and we have to remember about Windows too, though as the Java APIs go through the Unicode ones, its 255-char limit doesn't always hold.

Maybe the solution is to do some cutting down of paths such that the first few and final chars are always preserved. Along with the span ID, that should be good, though it does depend on the filenames generated...does Accumulo generate sufficiently unique ones that the last, say, 128 chars will be something you can map to an upload?

> The temporary files for disk-block buffer aren't unique enough to recover
> partial uploads.
> -------------------------------------------------------------------------
>
> Key: HADOOP-18706
> URL: https://issues.apache.org/jira/browse/HADOOP-18706
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Reporter: Chris Bevard
> Priority: Minor
> Labels: pull-request-available
>
> If an application crashes during an S3ABlockOutputStream upload, it's
> possible to complete the upload if fast.upload.buffer is set to disk by
> uploading the s3ablock file with putObject as the final part of the multipart
> upload.
> If the application has multiple uploads running in parallel, though,
> and they're on the same part number when the application fails, then there is
> no way to determine which file belongs to which object, and recovery of
> either upload is impossible.
> If the temporary file name for disk buffering included the s3 key, then every
> partial upload would be recoverable.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
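The path-shortening idea floated in the review comment — keep the first few and the final characters of the key, plus the span ID, within the 255-char filename limit — could be sketched roughly as below. This is not Hadoop's actual implementation; the class and method names are hypothetical, and the short digest of the full key is an added assumption to keep two long keys with the same prefix and suffix from colliding after truncation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical sketch: shorten an S3 key into a single temp-file name
 * that fits a given filename-length limit (e.g. 255 chars on Linux),
 * preserving the leading and trailing characters so a human can still
 * map the file back to its upload.
 */
public final class TempNameShortener {

    /** Shorten {@code key} + span id to at most {@code maxLen} chars. */
    public static String shorten(String key, String spanId, int maxLen) {
        // Replace path separators so the key is a legal single filename.
        String safe = key.replace('/', '_');
        String base = safe + "-" + spanId;
        if (base.length() <= maxLen) {
            return base;                       // short enough: keep whole key
        }
        String digest = shortHash(safe);       // disambiguates truncated names
        // Budget: head + "..." + digest + "..." + tail + "-" + spanId
        int overhead = 3 + digest.length() + 3 + 1 + spanId.length();
        int keep = maxLen - overhead;
        int head = keep / 2;
        int tail = keep - head;
        return safe.substring(0, head) + "..." + digest + "..."
            + safe.substring(safe.length() - tail) + "-" + spanId;
    }

    /** First 4 bytes of SHA-256, hex-encoded (8 chars). */
    private static String shortHash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                sb.append(String.format("%02x", d[i]));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape, a key that already fits comes through unchanged, while an over-long one keeps its first and last ~115 characters, so the "last 128 chars" that Steve asks about would survive verbatim at the end of the temp-file name.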