steveloughran commented on pull request #2971: URL: https://github.com/apache/hadoop/pull/2971#issuecomment-1059545705
Just updated with changes from sseth's review:

* Renamed StoreOperations to ManifestStoreOperations, set scope up. That makes for a change which touches many classes.
* Lots of other review points, all minor in comparison.

+ New DirEntry type in the manifest, for dest dirs only; it contains dest and status. Status is always 0, "unknown", for now.

I think that, based on future stats of mkdir performance, we may want to evolve dir preparation with two options.

* Probing for dest dirs in task commit. No side effects, and something we can do in parallel with the listing process. It will allow all probes for dest dirs to be omitted from job commit. There will be duplication in the tasks, but off the critical path/parallelised with the treewalk.
* Actually attempting to create dest dirs in task commit. As well as being slightly side-effecting (but no new files..), we would have to deal with
  * two task commits clashing: use the same recovery as job commit.
  * a file at dest: note and report for job commit to process.

mkdir in task is clearly more complex; I will ignore it for now and leave it for a future iteration based on job stats analysis of real-world jobs. getFileStatus is low cost and low complexity.

Job commit will

1. merge dir list and status
2. those with files: delete and create (do this first)
3. those not present

One issue here though: the final task commit will be slower, and all previous tasks will have repeated the operation. Will it actually speed things up?
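
To make the job commit steps above concrete, here is a rough sketch of how the merge and classification might look. It is not the code in this PR: the class name, the `Status` values other than "unknown", and the `Plan` grouping are all hypothetical, and it assumes that a dest which is not present simply gets created and that a dest already known to be a directory needs no work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; none of these names come from the PR. */
final class DirPreparationSketch {

  /** Hypothetical status values; the comment only defines 0 = unknown. */
  enum Status { UNKNOWN, FILE, DIR, NOT_FOUND }

  /** Hypothetical stand-in for a manifest dir entry: dest path plus probed status. */
  static final class DirEntry {
    final String dest;
    final Status status;

    DirEntry(String dest, Status status) {
      this.dest = dest;
      this.status = status;
    }
  }

  /** Work for job commit, grouped by action. */
  static final class Plan {
    final List<String> deleteThenCreate = new ArrayList<>(); // file at dest: handle first
    final List<String> create = new ArrayList<>();           // dest not present (assumed: mkdir)
    final List<String> probeFirst = new ArrayList<>();       // no task knew the status
  }

  /** Step 1: merge per-task entries by dest, letting any known status replace UNKNOWN. */
  static Map<String, Status> merge(List<List<DirEntry>> perTask) {
    Map<String, Status> merged = new LinkedHashMap<>();
    for (List<DirEntry> entries : perTask) {
      for (DirEntry e : entries) {
        merged.merge(e.dest, e.status,
            (old, latest) -> old == Status.UNKNOWN ? latest : old);
      }
    }
    return merged;
  }

  /** Steps 2 and 3: classify each merged dest by the action job commit must take. */
  static Plan plan(Map<String, Status> merged) {
    Plan plan = new Plan();
    for (Map.Entry<String, Status> e : merged.entrySet()) {
      switch (e.getValue()) {
        case FILE:
          plan.deleteThenCreate.add(e.getKey());
          break;
        case NOT_FOUND:
          plan.create.add(e.getKey());
          break;
        case UNKNOWN:
          plan.probeFirst.add(e.getKey());
          break;
        default:
          break; // DIR: already a directory, nothing to do
      }
    }
    return plan;
  }

  public static void main(String[] args) {
    List<DirEntry> task1 = Arrays.asList(
        new DirEntry("/out/a", Status.UNKNOWN),
        new DirEntry("/out/b", Status.FILE));
    List<DirEntry> task2 = Arrays.asList(
        new DirEntry("/out/a", Status.NOT_FOUND),
        new DirEntry("/out/c", Status.DIR));
    Plan p = plan(merge(Arrays.asList(task1, task2)));
    System.out.println("delete+create: " + p.deleteThenCreate);
    System.out.println("create: " + p.create);
    System.out.println("probe first: " + p.probeFirst);
  }
}
```

With status always "unknown", as it is for now, every dest lands in `probeFirst`, which matches today's behaviour of job commit doing all the probes itself. The sketch keeps the first known status when tasks disagree; a real implementation would need a deliberate policy for that case.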