steveloughran commented on PR #5519: URL: https://github.com/apache/hadoop/pull/5519#issuecomment-1495748445
@cnauroth thanks for the comments; will update.

I've converted this to a draft as I am working on the next step: streaming the list of files to rename from each manifest into a SequenceFile saved to the local FS, with the rename stage reading that back in and spreading the renames across the worker pool, maybe in batches.

This will eliminate the need to store the list of files to rename in memory at all, so there is no need to worry about the number of files or path lengths. The file will be on the local FS, so on a machine with an SSD it should be fairly quick to write and read back, especially if the OS buffers well or is optimised for transient files.

It does complicate propagation of data, hence the extra work and the need for some more tests, including some of the save/restore process itself.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
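
For readers following the thread, the spill-and-replay idea in the comment above could look roughly like the sketch below. It is not the code in the PR. To stay self-contained it uses a plain `java.io` data stream as a stand-in for a Hadoop SequenceFile, and every name in it (`RenameSpill`, `Entry`, `spill`, `renameInBatches`, the `maxInFlight` limit) is hypothetical. The actual rename is left as a callback.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from the PR.
class RenameSpill {
    /** A (source, destination) pair, standing in for one manifest entry. */
    static final class Entry {
        final String source, dest;
        Entry(String source, String dest) { this.source = source; this.dest = dest; }
    }

    /** Append entries to the spill stream as each manifest is loaded. */
    static void spill(DataOutputStream out, Iterable<Entry> entries) throws IOException {
        for (Entry e : entries) {
            out.writeUTF(e.source);   // writeUTF caps each string at 65535 encoded bytes
            out.writeUTF(e.dest);
        }
    }

    /** Replay the spill file in batches; at most maxInFlight batches are queued at once. */
    static int renameInBatches(Path spillFile, int batchSize, int maxInFlight,
                               ExecutorService pool, Consumer<Entry> renamer) throws Exception {
        Deque<Future<Integer>> inFlight = new ArrayDeque<>();
        int done = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(spillFile)))) {
            List<Entry> batch = new ArrayList<>(batchSize);
            while (true) {
                String source;
                try {
                    source = in.readUTF();
                } catch (EOFException eof) {
                    break;                      // no more records
                }
                batch.add(new Entry(source, in.readUTF()));
                if (batch.size() == batchSize) {
                    if (inFlight.size() >= maxInFlight) {
                        done += inFlight.remove().get();   // wait for the oldest batch
                    }
                    inFlight.add(submit(pool, batch, renamer));
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                inFlight.add(submit(pool, batch, renamer));
            }
        }
        while (!inFlight.isEmpty()) {
            done += inFlight.remove().get();
        }
        return done;
    }

    private static Future<Integer> submit(ExecutorService pool, List<Entry> batch,
                                          Consumer<Entry> renamer) {
        return pool.submit(() -> {
            for (Entry e : batch) renamer.accept(e);
            return batch.size();
        });
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("renames", ".bin");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            spill(out, List.of(new Entry("/tmp/a", "/dest/a"), new Entry("/tmp/b", "/dest/b")));
            spill(out, List.of(new Entry("/tmp/c", "/dest/c")));
        }
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            int renamed = renameInBatches(file, 2, 4, pool,
                    e -> System.out.println(e.source + " -> " + e.dest));
            System.out.println("renamed " + renamed + " files");
        } finally {
            pool.shutdown();
            Files.delete(file);
        }
    }
}
```

In this sketch, memory use is bounded by `batchSize * (maxInFlight + 1)` entries rather than by the total number of files, which is the property the comment is after. A real implementation would also need to decide how a failed batch is reported and whether the spill file can be reused on retry.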