[ https://issues.apache.org/jira/browse/MAPREDUCE-6713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mayank Bansal reassigned MAPREDUCE-6713:
----------------------------------------

    Assignee: Mayank Bansal

> Distcp doesn't provide any option to override the default staging directory
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6713
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6713
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>   Affects Versions: 2.5.1
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Mayank Bansal
>
> *Current state and shortcoming*
> =======================
> By default, distcp writes temporary files into
> $TARGET_PATH/.distcp.tmp/$taskatttempttid (see
> RetriableFileCopyCommand#getTmpFile). There is no way for a user to override
> this staging/tmp directory. The problem is most visible on S3 with versioning.
> For example, a user may want to turn on S3 versioning only for the target
> directory, not for the staging/tmp directory. Because the staging directory
> lives under the target path, distcp currently creates versions for it as well,
> and it can contain a lot of temporary files. If the user could point staging
> at a non-versioned S3 path, the result would be cleaner.
>
> *Proposed solution*
> ==============
> Provide a new option (-stage) that lets the user optionally supply a path on
> the target FS. Distcp mapper tasks will write their temporary files into that
> directory.
>
> *Possible Confusions*
> =================
> There is another distcp option (-tmp) that could be assumed to serve the same
> purpose. However, that option works only together with the "-atomic" option,
> where "temporary files" has a different meaning.
> Another source of confusion could be the staging directory used by the
> mapreduce framework. The proposed temp directory is specific to distcp.
>
> Working on a patch to upload.
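
To make the proposal above concrete, here is a minimal Java sketch of how the temporary path could be chosen with and without an override. It is not the code in RetriableFileCopyCommand#getTmpFile and not the forthcoming patch: the class name, the method names, and the example paths are hypothetical, and the default layout is taken only from the description in this issue.

```java
import java.util.Optional;

// Hypothetical illustration only; not the actual DistCp implementation.
public class StagingPathSketch {

    // Default temp directory name, as described in the issue text.
    static final String DEFAULT_TMP_DIR = ".distcp.tmp";

    /**
     * Picks the temporary path a mapper task would write to.
     *
     * @param targetPath    the distcp target path
     * @param stagingDir    models the value of the proposed -stage option;
     *                      empty means the option was not given
     * @param taskAttemptId the id of the running task attempt
     */
    static String resolveTmpFile(String targetPath,
                                 Optional<String> stagingDir,
                                 String taskAttemptId) {
        // Fall back to $TARGET_PATH/.distcp.tmp when no override is set.
        String base = stagingDir.orElse(
                stripTrailingSlash(targetPath) + "/" + DEFAULT_TMP_DIR);
        return stripTrailingSlash(base) + "/" + taskAttemptId;
    }

    // Removes one trailing slash so joined paths do not contain "//".
    static String stripTrailingSlash(String path) {
        return (path.length() > 1 && path.endsWith("/"))
                ? path.substring(0, path.length() - 1)
                : path;
    }

    public static void main(String[] args) {
        String attempt = "attempt_0001_m_000000_0"; // made-up id

        // Without an override, temp files land under the target path.
        System.out.println(
                resolveTmpFile("/warehouse/out", Optional.empty(), attempt));

        // With an override, temp files land under the staging path instead.
        System.out.println(
                resolveTmpFile("/warehouse/out",
                        Optional.of("/scratch/distcp-stage"), attempt));
    }
}
```

In this sketch the override replaces the whole base directory rather than being appended to the target path, which is what would keep temporary files out of a versioned target location.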