[ https://issues.apache.org/jira/browse/HBASE-21810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yechao Chen updated HBASE-21810:
--------------------------------
    Attachment: HBASE-21810.master.002.patch

> bulkload support set hfile compression on client
> --------------------------------------------------
>
>                 Key: HBASE-21810
>                 URL: https://issues.apache.org/jira/browse/HBASE-21810
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 1.3.3, 1.4.9, 1.2.10, 2.0.4, 2.1.3
>            Reporter: Yechao Chen
>            Assignee: Yechao Chen
>            Priority: Major
>         Attachments: HBASE-21810.branch-1.001.patch,
> HBASE-21810.branch-1.2.001.patch, HBASE-21810.branch-2.001.patch,
> HBASE-21810.master.001.patch, HBASE-21810.master.001.patch,
> HBASE-21810.master.002.patch
>
>
> HBase bulkload (HFileOutputFormat2) generates HFiles using the compression
> configured on the table's column family (cf).
> It is sometimes useful to be able to set the compression on the client instead.
> Some cases from our production:
> 1. When bulkloaded HFiles are replicated between data centers over a
> bandwidth-limited link, we can set the compression of the bulkload HFiles
> without changing the table compression.
> 2. If the bulkload HFiles are written without compression while the table
> compression is gz/zstd/snappy..., the HFiles are created faster, and
> compaction will eventually rewrite them with the table's compression.
> 3. Sometimes the YARN nodes (where the reducers create the HFiles) or the
> doBulkLoad client have no compression library, but the HBase cluster does;
> a client-side setting is useful in this case.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
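
The behaviour requested above amounts to a simple fallback rule: use a compression set on the client when one is given, otherwise keep using the column family's compression. The following is a minimal, self-contained sketch of that rule only. It does not use the HBase API, and the class name, method name, and configuration key are all hypothetical and are not taken from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

class CompressionOverrideSketch {
    // Hypothetical key name, used only for this illustration.
    static final String OVERRIDE_KEY = "example.bulkload.hfile.compression";

    /**
     * Returns the client-side override when it is set and non-blank,
     * otherwise the compression configured on the column family.
     */
    static String resolveCompression(Map<String, String> clientConf,
                                     String familyCompression) {
        String override = clientConf.get(OVERRIDE_KEY);
        if (override == null || override.trim().isEmpty()) {
            return familyCompression;
        }
        return override.trim().toUpperCase(java.util.Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // No override: the HFile would use the column family's setting.
        System.out.println(resolveCompression(conf, "SNAPPY")); // SNAPPY

        // Case 2 from the description: write uncompressed HFiles and let
        // compaction apply the table's compression later.
        conf.put(OVERRIDE_KEY, "none");
        System.out.println(resolveCompression(conf, "SNAPPY")); // NONE
    }
}
```

In this sketch the override is a single job-wide value, which matches the three use cases in the description; a per-column-family override would need a richer configuration format.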