[GitHub] [hadoop] steveloughran commented on pull request #5563: HADOOP-18706: Improve S3ABlockOutputStream recovery
steveloughran commented on PR #5563: URL: https://github.com/apache/hadoop/pull/5563#issuecomment-1536085389

Thanks, merged to trunk. Can you cherry-pick locally to branch-3.3, rerun the tests, and then put that up as a new PR? I will then merge that.

FWIW, I still think what you are trying to do for recovery is "bold", in the rock-climbing "I wonder if they will survive" sense of the word, but the filename tracking could be useful in other ways.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
steveloughran commented on PR #5563: URL: https://github.com/apache/hadoop/pull/5563#issuecomment-1535241489

Thanks for the run. One final bit of outstanding style: I know there are lots of existing violations already, but new code should always start clean whenever possible.

```
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ABlockOutputArray.java:114: diskBlockFactory.create("spanId", s3Key, 1, blockSize, null);: 'diskBlockFactory' has incorrect indentation level 12, expected level should be 13. [Indentation]
```
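For readers unfamiliar with this checkstyle message: it is the `Indentation` check complaining about the column a wrapped method-call line starts at; the exact expected level depends on the surrounding code and the project's checkstyle config. A minimal, compilable illustration of the pattern (the `BlockFactory` interface and names here are invented for the example, not the Hadoop source):

```java
// Illustration only (not Hadoop code): a wrapped argument line should sit at
// the continuation-indent column the project's checkstyle config expects,
// rather than being aligned ad hoc.
public final class IndentExample {

  /** Stand-in for the real disk block factory; purely hypothetical. */
  interface BlockFactory {
    Object create(String spanId, String key, long index, long size, Object stats);
  }

  public static Object build(BlockFactory diskBlockFactory, String s3Key, long blockSize) {
    // The trailing argument is wrapped onto a continuation line with a
    // consistent indent, which is what the Indentation check verifies.
    return diskBlockFactory.create("spanId", s3Key, 1, blockSize,
        null);
  }

  public static void main(String[] args) {
    Object block = build((span, key, index, size, stats) -> key, "bucket/key", 1 << 20);
    System.out.println(block);
  }
}
```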
steveloughran commented on PR #5563: URL: https://github.com/apache/hadoop/pull/5563#issuecomment-1533527822

Can you stop rebasing this during the review process? It makes it *impossible* for me to use the "review changes since your last commit" feature.

Once you have confirmed that you are going to stop, I will review the PR again.
steveloughran commented on PR #5563: URL: https://github.com/apache/hadoop/pull/5563#issuecomment-1517764238

Looks good, just two issues to worry about.

Minor: checkstyle is unhappy about line length...please keep lines at 100 chars or less.

One bigger issue, which you already mentioned: excessively long filenames. S3 supports 1024 chars of path, so this works through the other block buffers, and MUST work here too. Looking at a table of limits, there are only 255 chars to play with for a filename, which has to include the block ID, span ID etc.: https://www.baeldung.com/linux/bash-filename-limit

How about adding a new test case, or modifying testRegularUpload(), to create a file with a name > 256 chars and just see what happens? Oh, and we have to remember about Windows too, though as the Java APIs go through the Unicode ones, its 255-char limit doesn't always hold.

Maybe the solution is to do some cutting down of paths such that the first few and final chars are always preserved. Along with the span ID that should be good, though it does depend on the filenames generated...does Accumulo generate sufficiently unique names that the last, say, 128 chars will be something you can map to an upload?
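The "cut down the path but preserve the first few and final chars" idea could be sketched roughly as below. This is a hypothetical helper, not the PR's actual code; the class and method names are invented, and real block filenames would also need the block/span IDs appended within the same budget:

```java
// Hypothetical sketch: shorten a candidate block file name so it fits the
// ~255-char per-filename limit of most Linux filesystems, keeping the head
// and tail of the original S3 key (the middle is the least distinctive part).
public final class BlockFileNames {

  private BlockFileNames() {
  }

  /**
   * Keep the head and tail of {@code name}, eliding the middle,
   * so the result is at most {@code maxLen} characters.
   */
  public static String shorten(String name, int maxLen) {
    if (name.length() <= maxLen) {
      return name;
    }
    final String ellipsis = "...";
    int keep = maxLen - ellipsis.length();
    int head = keep / 2;
    int tail = keep - head;
    return name.substring(0, head)
        + ellipsis
        + name.substring(name.length() - tail);
  }

  public static void main(String[] args) {
    // A key well over the 255-char filename limit, though under S3's 1024.
    String longKey = "bucket/very/deep/" + "x".repeat(300) + "/part-00000";
    String shortName = BlockFileNames.shorten(longKey, 255);
    System.out.println(shortName.length() + " -> " + shortName);
  }
}
```

Whether this is enough hinges on exactly the question raised above: the tail of the generated filename has to stay unique enough to map back to an upload.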
steveloughran commented on PR #5563: URL: https://github.com/apache/hadoop/pull/5563#issuecomment-1516264118

OK, and they all work? Good to know. Don't be afraid to mention any which do fail, as failures often come from differences in developer config, and it's good to find out what is wrong with an existing test (or worse, production code) before we ship...
steveloughran commented on PR #5563: URL: https://github.com/apache/hadoop/pull/5563#issuecomment-1514770186

OK. I think the design is incomplete as it is; you would really want to be a bit more sophisticated and:

* write the .pending multipart manifest to the temp dir as soon as the multipart upload is created
* update it after every block is written
* don't allow more than one block to be written at a time (this is already done for S3-CSE)

But this bit seems ready to go in: low risk and potentially useful for others.

Now, test policy: which AWS S3 region did you run the full "mvn verify" tests for the hadoop-aws module against, and what options did you have on the command line?
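The manifest idea in the bullets above could look roughly like this. Everything here is an illustrative sketch under assumed names (`PendingManifest`, `recordPart`, a line-oriented file format); it is not the S3A implementation, which would persist real part numbers, etags and the upload ID in its own serialization:

```java
// Hypothetical sketch of the ".pending manifest" idea: write a small
// manifest next to the buffered blocks as soon as the multipart upload is
// created, and rewrite it after each part completes, so a crashed writer
// leaves enough on disk to resume or abort the upload.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public final class PendingManifest {
  private final Path file;
  private final String uploadId;
  private final List<String> parts = new ArrayList<>();

  public PendingManifest(Path tempDir, String uploadId) throws IOException {
    this.file = tempDir.resolve(uploadId + ".pending");
    this.uploadId = uploadId;
    flush();  // persist immediately, before any part is uploaded
  }

  /** Record a completed part and rewrite the manifest. Serialized, since
   *  only one block should be in flight at a time anyway. */
  public synchronized void recordPart(int partNumber, String etag) throws IOException {
    parts.add(partNumber + "=" + etag);
    flush();
  }

  private void flush() throws IOException {
    List<String> lines = new ArrayList<>();
    lines.add("uploadId=" + uploadId);
    lines.addAll(parts);
    // Write to a sibling temp file, then move over the manifest, so a crash
    // mid-write never leaves a truncated manifest behind.
    Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
    Files.write(tmp, lines);
    Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
  }

  public Path getFile() {
    return file;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("s3ablock");
    PendingManifest manifest = new PendingManifest(dir, "upload-0001");
    manifest.recordPart(1, "etag-1");
    manifest.recordPart(2, "etag-2");
    System.out.println(Files.readAllLines(manifest.getFile()));
  }
}
```

The write-then-rename in `flush()` is the part that matters for recovery: whatever survives a crash is always a complete manifest for some prefix of the upload.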