[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Suraj Nayak updated HADOOP-13114:
---------------------------------
    Affects Version/s: 3.0.0
               Status: Patch Available  (was: Open)

> DistCp should have option to compress data on write
> ---------------------------------------------------
>
>                 Key: HADOOP-13114
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13114
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Suraj Nayak
>            Assignee: Suraj Nayak
>            Priority: Minor
>              Labels: distcp
>             Fix For: 3.0.0
>
>         Attachments: HADOOP-13114-trunk_2016-05-07-1.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The DistCp utility should be able to store data in a user-specified
> compression format. This avoids a separate pass to compress the data after
> the transfer. Backups to a different cluster also benefit: one round trip of
> IO to and from HDFS is saved, which saves resources, time and effort.
> * Create an option -compressOutput defaulting to
>   {{org.apache.hadoop.io.compress.BZip2Codec}}.
> * Users will be able to change the codec with
>   {{-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}
> * If DistCp compression is enabled, suffix the filenames with the codec's
>   default extension to indicate that the file is compressed. This way users
>   can tell which codec was used to compress the data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
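To make the proposed behaviour concrete, here is a minimal local-filesystem sketch in Python. It is not DistCp's implementation and does not use Hadoop: the helper copy_compressed, the CODECS table and the dict standing in for the job configuration are all hypothetical. It assumes that BZip2Codec and GzipCodec map to Hadoop's default extensions .bz2 and .gz, and it shows three points from the proposal: BZip2 as the default, an override through the mapreduce.output.fileoutputformat.compress.codec property, and the codec extension appended to the destination filename.

```python
import bz2
import gzip
import os
import shutil

# Property named in the proposal for overriding the codec.
CODEC_KEY = "mapreduce.output.fileoutputformat.compress.codec"
# Default codec proposed for -compressOutput.
DEFAULT_CODEC = "org.apache.hadoop.io.compress.BZip2Codec"

# Hypothetical mapping from Hadoop codec class names to a Python
# compressing opener and the extension the codec uses by default.
CODECS = {
    "org.apache.hadoop.io.compress.BZip2Codec": (bz2.open, ".bz2"),
    "org.apache.hadoop.io.compress.GzipCodec": (gzip.open, ".gz"),
}


def copy_compressed(src_path, dst_dir, conf):
    """Copy src_path into dst_dir, compressing on write (hypothetical helper).

    conf is a plain dict standing in for the job configuration.
    Returns the destination path, suffixed with the codec's extension.
    """
    codec_name = conf.get(CODEC_KEY, DEFAULT_CODEC)
    if codec_name not in CODECS:
        raise ValueError("unsupported codec: " + codec_name)
    opener, extension = CODECS[codec_name]
    dst_path = os.path.join(dst_dir, os.path.basename(src_path) + extension)
    # Stream the bytes through the compressor in one pass, so no separate
    # compression step has to re-read the copied data afterwards.
    with open(src_path, "rb") as src, opener(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

With an empty conf, copying part-00000 produces part-00000.bz2; setting CODEC_KEY to org.apache.hadoop.io.compress.GzipCodec produces part-00000.gz instead. Compressing inside the copy loop, rather than in a follow-up job, is what saves the extra read and write the issue describes.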