cbevard1 commented on PR #5563: URL: https://github.com/apache/hadoop/pull/5563#issuecomment-1517915320
> minor: checkstyle unhappy about line length...please keep at 100 chars or less

No problem, will do.

> How about adding a new test case or modifying testRegularUpload() to create a file with a name > 256 chars just see what happens?

Initially I had some code in the S3ADataBlock to trim the key if the file name was nearing 255 chars, but after testing what happens when a massive S3 key is passed to `File.createTempFile()` I noticed that it automatically truncates the end of the prefix to fit within the filesystem's limit, with the random number and suffix (".tmp" if one isn't specified) appended. I'll add a unit test like you suggested so we can detect if `createTempFile` ever stops truncating names that exceed the filesystem's max length. Here's a snippet from the javadoc for `createTempFile()` that describes how prefixes are handled:

> "To create the new file, the prefix and the suffix may first be adjusted to fit the limitations of the underlying platform. If the prefix is too long then it will be truncated, but its first three characters will always be preserved."

> Maybe the solution is to do some cutting down of paths such that first few and final chars are always preserved. along with span ID that should be good, though it does depend on filenames generated...does accumulo generate sufficiently unique ones that the last, say, 128 chars will be something you can map to an upload?

With Accumulo, it's the WALs that are important to recover, and they're named with a UUID, so they're very unique even without a prefix. The full WAL key is built as `bucket + instance_volume(folder/prefix) + tserver_hostname + UUID`, so avoiding the 255 char limit is pretty doable if you keep your bucket and instance volume names short. The longest file name I've seen so far in my testing with the spanId now included was 175 chars.
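For reference, the truncation behavior described above can be demonstrated with a small standalone sketch (not part of the PR; the 300-char prefix is an arbitrary test value, and the exact truncated length depends on the JDK version and underlying filesystem):

```java
import java.io.File;
import java.io.IOException;

public class TempFileTruncationDemo {
    public static void main(String[] args) throws IOException {
        // Build a prefix far longer than the 255-char name limit of most filesystems.
        StringBuilder sb = new StringBuilder("abc");
        for (int i = 0; i < 300; i++) {
            sb.append('k');
        }
        String prefix = sb.toString(); // 303 chars

        // Per the javadoc, createTempFile may truncate the prefix to fit the
        // platform limit, but its first three characters are preserved.
        File f = File.createTempFile(prefix, ".tmp");
        try {
            String name = f.getName();
            System.out.println("name length      = " + name.length());
            System.out.println("starts with abc  = " + name.startsWith("abc"));
            System.out.println("ends with .tmp   = " + name.endsWith(".tmp"));
        } finally {
            f.delete();
        }
    }
}
```

On a modern JDK and a typical Linux filesystem this succeeds with a name trimmed to fit the 255-char limit; a unit test along these lines would catch any future change in that behavior.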
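To illustrate the recovery-mapping idea, here's a rough sketch of matching a generated block file name (of the s3ablock-PARTNUM-SPAN_ID-ESCAPED_S3_KEY-RANDOM_NUM.tmp form discussed in this thread) back to its parts. The regex and the sample name are illustrative assumptions, not the exact format the PR produces:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockNameParserSketch {
    // Assumed shape: s3ablock-<part>-<spanId>-<escapedKey>-<random>.tmp
    // This pattern is a sketch; the real escaping of the S3 key may differ.
    private static final Pattern BLOCK_NAME =
        Pattern.compile("s3ablock-(\\d+)-([^-]+)-(.+)-(\\d+)\\.tmp");

    public static void main(String[] args) {
        // Hypothetical file name; Accumulo WALs end in a UUID-like component.
        String name = "s3ablock-0001-span42-mybucket_wal_host_uuid-123456789.tmp";
        Matcher m = BLOCK_NAME.matcher(name);
        if (m.matches()) {
            System.out.println("part   = " + m.group(1)); // part   = 0001
            System.out.println("spanId = " + m.group(2)); // spanId = span42
            System.out.println("key    = " + m.group(3)); // key    = mybucket_wal_host_uuid
        }
    }
}
```

Even a truncated key component plus the part number is usually enough to match a leftover block file to a pending multipart upload, which is the point made below.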
If I removed the S3 key prefix from the name and kept just the UUID, that would be a little safer for Accumulo, but it could cause issues with other systems where the prefix is more important contextually than the name. I lean towards leaving the naming convention as it is (s3ablock-PARTNUM-SPAN_ID-ESCAPED_S3_KEY-RANDOM_NUM.tmp). Even if the UUID is truncated, just a few of its chars along with the part number will almost always be enough to uniquely match it to a pending multipart upload. If the key gets trimmed enough that it's not possible to derive the recovery information you need from the file name, then there are the logs and span ID to fall back on.

I'll get that test added and correct the style issue. Let me know if you'd like me to make any changes to the file name that's generated.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org