steveloughran commented on PR #5519:
URL: https://github.com/apache/hadoop/pull/5519#issuecomment-1495748445

   @cnauroth thanks for the comments; will update.
   
   I've converted this to a draft as I am working on the next step: 
streaming the list of files to rename from each manifest into a SequenceFile 
saved to the local FS, with the rename stage reading that file back in and 
spreading the renames across the worker pool, maybe in batches.
   
   This will eliminate the need to store the list of files to rename in memory 
at all, so there is no need to worry about the number of files or path lengths. 
The file will be on the local FS, so on a machine with an SSD it should be 
fairly quick to write and read back, especially if the OS buffers well or is 
optimised for transient files.
   It does complicate propagation of data, hence the extra work and the need 
for some more tests, including some of the save/restore process itself.
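
   To make the save/restore idea concrete, here is a minimal sketch and not the 
code in the PR. So that it runs on its own, it swaps Hadoop's SequenceFile for 
plain JDK `DataOutputStream`/`DataInputStream` on a local file. Every name in it 
(`RenameSpill`, `Entry`, `save`, `replay`, `maxInFlight`) is hypothetical, as is 
the choice to cap the number of outstanding batches so the reader cannot run 
arbitrarily far ahead of the worker pool.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

/** Hypothetical sketch: spill rename pairs to a local file, then replay them in batches. */
class RenameSpill {

  /** One (source, dest) rename entry; illustrative only. */
  static final class Entry {
    final String source;
    final String dest;
    Entry(String source, String dest) { this.source = source; this.dest = dest; }
  }

  /** Stream entries to the file one at a time; returns the number written. */
  static long save(Path file, Iterable<Entry> entries) throws IOException {
    long count = 0;
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(Files.newOutputStream(file)))) {
      for (Entry e : entries) {
        // writeUTF is limited to 65535 encoded bytes per string: fine for a sketch.
        out.writeUTF(e.source);
        out.writeUTF(e.dest);
        count++;
      }
    }
    return count;
  }

  /** Read the file back, submitting batches to the pool; returns entries processed. */
  static long replay(Path file, int batchSize, int maxInFlight, ExecutorService pool,
      BiConsumer<String, String> rename) throws Exception {
    Deque<Future<Integer>> pending = new ArrayDeque<>();
    long done = 0;
    try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(Files.newInputStream(file)))) {
      List<Entry> batch = new ArrayList<>(batchSize);
      while (true) {
        String source;
        try {
          source = in.readUTF();
        } catch (EOFException eof) {
          break; // clean end of file: no more records
        }
        batch.add(new Entry(source, in.readUTF()));
        if (batch.size() == batchSize) {
          // Wait for the oldest batch if too many are outstanding, bounding memory use.
          if (pending.size() >= maxInFlight) {
            done += pending.removeFirst().get();
          }
          pending.addLast(submit(pool, batch, rename));
          batch = new ArrayList<>(batchSize);
        }
      }
      if (!batch.isEmpty()) {
        pending.addLast(submit(pool, batch, rename));
      }
    }
    for (Future<Integer> f : pending) {
      done += f.get(); // rethrows any failure from a worker
    }
    return done;
  }

  private static Future<Integer> submit(ExecutorService pool, List<Entry> batch,
      BiConsumer<String, String> rename) {
    return pool.submit(() -> {
      for (Entry e : batch) {
        rename.accept(e.source, e.dest);
      }
      return batch.size();
    });
  }
}
```

   In this sketch memory use is bounded by roughly `batchSize * (maxInFlight + 1)` 
entries rather than by the total number of files; a real version would also 
need to decide what to do with a partially written file if the save fails.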


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

