[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ravi Prakash updated HADOOP-13114:
----------------------------------
    Attachment: HADOOP-13114.06.patch

Here's a rebase of the patch from Suraj and Yongjun. To try it out, you could use this command:
{code}
hadoop distcp -Ddistcp.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec --compressoutput /input /output
{code}

> DistCp should have option to compress data on write
> ---------------------------------------------------
>
>                 Key: HADOOP-13114
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13114
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>    Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>            Reporter: Suraj Nayak
>            Assignee: Suraj Nayak
>            Priority: Minor
>              Labels: distcp
>         Attachments: HADOOP-13114-trunk_2016-05-07-1.patch,
> HADOOP-13114-trunk_2016-05-08-1.patch, HADOOP-13114-trunk_2016-05-10-1.patch,
> HADOOP-13114-trunk_2016-05-12-1.patch, HADOOP-13114.05.patch,
> HADOOP-13114.06.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The DistCp utility should be able to store data in a user-specified
> compression format. This avoids the extra hop of compressing the data after
> the transfer. Backup strategies that copy to a different cluster also benefit,
> because they save one I/O operation to and from HDFS, and with it resources,
> time, and effort.
> * Create an option -compressOutput defaulting to
> {{org.apache.hadoop.io.compress.BZip2Codec}}.
> * Users will be able to change the codec with {{-D
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}
> * If distcp compression is enabled, suffix the filenames with the codec's
> default extension to indicate that the file is compressed. This tells users
> which codec was used to compress the data.
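
The filename-suffix rule in the last bullet of the description could look roughly like the sketch below. It is only an illustration and is not taken from any of the attached patches: the class name, the method name, and the hard-coded lookup table are hypothetical. Real code would more likely ask the codec itself via {{CompressionCodec.getDefaultExtension()}}; the table here just mirrors the extensions those two codecs report, so that the example runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of DistCp or of the attached patches.
final class CompressedNameSketch {

    // Assumption: stand-in for CompressionCodec.getDefaultExtension(),
    // limited to the two codecs named in the issue description.
    private static final Map<String, String> DEFAULT_EXTENSIONS = new HashMap<>();
    static {
        DEFAULT_EXTENSIONS.put("org.apache.hadoop.io.compress.BZip2Codec", ".bz2");
        DEFAULT_EXTENSIONS.put("org.apache.hadoop.io.compress.GzipCodec", ".gz");
    }

    // Returns the target file name with the codec's default extension appended.
    static String targetName(String sourceName, String codecClass) {
        String extension = DEFAULT_EXTENSIONS.get(codecClass);
        if (extension == null) {
            throw new IllegalArgumentException("Unknown codec: " + codecClass);
        }
        return sourceName + extension;
    }

    public static void main(String[] args) {
        // Default codec proposed in the description.
        System.out.println(targetName("part-00000",
                "org.apache.hadoop.io.compress.BZip2Codec"));
        // Codec overridden by the user.
        System.out.println(targetName("part-00000",
                "org.apache.hadoop.io.compress.GzipCodec"));
    }
}
```

With the proposed default codec, a source file named part-00000 would be written as part-00000.bz2, so the extension alone tells a reader which codec to use when decompressing.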