steveloughran commented on pull request #2971:
URL: https://github.com/apache/hadoop/pull/2971#issuecomment-1059545705


   Just updated with changes from sseth's review:
   
   * Renamed StoreOperations to ManifestStoreOperations and raised its scope;
     that makes for a change which touches many classes.
   * Lots of other review points, all minor in comparison.
   
   + New DirEntry type in the manifest, for dest dirs only; it contains
   dest and status. Status is always 0, "unknown", for now.
   
   Depending on future stats of mkdir performance, I think we may want to
   evolve dir preparation, with two options.
   
   Probing for dest dirs in task commit. This has no side effects and is
   something we can do in parallel with the listing process. It would allow
   all probes for dest dirs to be omitted from job commit. There will be
   duplication across the tasks, but it is off the critical path and
   parallelised with the treewalk.
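
A minimal sketch of what such a probe could look like. It uses `java.nio.file` as a stand-in for the filesystem client, so it is not the committer's code. The class name, the `probe`/`probeAll` helpers and every status value other than 0 ("unknown") are hypothetical, chosen only for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class DestDirProbe {
    // 0 = "unknown" comes from the comment above; the other codes are assumptions.
    static final int UNKNOWN = 0;
    static final int DIR = 1;
    static final int FILE = 2;
    static final int NOT_FOUND = 3;

    /** Hypothetical stand-in for the manifest's DirEntry: dest plus status. */
    static final class DirEntry {
        final String dest;
        final int status;

        DirEntry(String dest, int status) {
            this.dest = dest;
            this.status = status;
        }
    }

    /** Read-only probe of one destination directory: no side effects. */
    static DirEntry probe(Path dest) {
        if (Files.isDirectory(dest)) {
            return new DirEntry(dest.toString(), DIR);
        }
        if (Files.exists(dest)) {
            // Something other than a directory is at dest; job commit must handle it.
            return new DirEntry(dest.toString(), FILE);
        }
        return new DirEntry(dest.toString(), NOT_FOUND);
    }

    /** Probe every destination directory a task would need. */
    static List<DirEntry> probeAll(List<Path> dests) {
        List<DirEntry> entries = new ArrayList<>();
        for (Path dest : dests) {
            entries.add(probe(dest));
        }
        return entries;
    }
}
```

Because the probe only reads, each task can run it alongside its own listing, and the results travel to job commit in the manifest.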
   
   Actually attempting to create dest dirs in task commit. As well as being
   slightly side-effecting (though it creates no new files), we would have to
   deal with:
   * two task commits clashing: use the same recovery as job commit.
   * a file at dest: note it and report it for job commit to process.
   
   mkdir in task commit is clearly more complex; I will ignore it for now and
   leave it for a future iteration, based on analysis of job stats from
   real-world jobs.
   getFileStatus is low cost and low complexity.
   
   Job commit will:
   1. merge the dir lists and statuses
   2. for those with files at dest: delete and create (do this first)
   3. for those not present: create
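
The job commit steps above could be sketched as below. This is an in-memory illustration under stated assumptions, not the committer's implementation: the class and method names are hypothetical, so are all status codes other than 0 ("unknown"), and so is the rule for resolving tasks that disagree about a directory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JobCommitDirPlan {
    // 0 = "unknown" comes from the comment above; the other codes are assumptions.
    static final int UNKNOWN = 0;
    static final int DIR = 1;
    static final int FILE = 2;
    static final int NOT_FOUND = 3;

    /** Step 1: merge the per-task (dest -> status) maps into a single map. */
    static Map<String, Integer> merge(List<Map<String, Integer>> taskManifests) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (Map<String, Integer> manifest : taskManifests) {
            for (Map.Entry<String, Integer> e : manifest.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), JobCommitDirPlan::resolve);
            }
        }
        return merged;
    }

    /**
     * Assumed conflict rule: a known status beats unknown; two different
     * known statuses fall back to unknown so that job commit probes again.
     */
    static int resolve(int a, int b) {
        if (a == b) {
            return a;
        }
        if (a == UNKNOWN) {
            return b;
        }
        if (b == UNKNOWN) {
            return a;
        }
        return UNKNOWN;
    }

    /** Steps 2 and 3: order the work, handling files at dest first. */
    static List<String> plan(Map<String, Integer> merged) {
        List<String> actions = new ArrayList<>();
        for (Map.Entry<String, Integer> e : merged.entrySet()) {
            if (e.getValue() == FILE) {
                actions.add("delete " + e.getKey());
                actions.add("mkdir " + e.getKey());
            }
        }
        for (Map.Entry<String, Integer> e : merged.entrySet()) {
            if (e.getValue() == NOT_FOUND) {
                actions.add("mkdir " + e.getKey());
            }
        }
        // Assumption: anything still unknown is probed in job commit itself.
        for (Map.Entry<String, Integer> e : merged.entrySet()) {
            if (e.getValue() == UNKNOWN) {
                actions.add("probe " + e.getKey());
            }
        }
        return actions;
    }
}
```

Directories already known to exist produce no action, which is where the saving in job commit would come from.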
   
   One issue here, though: the final task commit will be slower, and all
   previous tasks will have repeated the same operation.
   Will it actually speed things up?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


