Ishan Chhabra created HBASE-10323:
-------------------------------------

             Summary: Auto detect data block encoding in HFileOutputFormat
                 Key: HBASE-10323
                 URL: https://issues.apache.org/jira/browse/HBASE-10323
             Project: HBase
          Issue Type: Improvement
            Reporter: Ishan Chhabra
            Assignee: Ishan Chhabra

Currently, one has to specify the data block encoding of the table explicitly using the config parameter "hbase.mapreduce.hfileoutputformat.datablock.encoding" when doing a bulk load. This option is easily missed, is not documented, and works differently from compression, block size, and bloom filter type, which are auto-detected. The solution is to add support for auto-detecting the data block encoding, as is already done for the other parameters.

The current patch does the following:
1. Automatically detects the data block encoding in HFileOutputFormat.
2. Keeps the legacy option of manually specifying the data block encoding as a way to override auto-detection.
3. Moves string conf parsing to the start of the program so that it fails fast during startup instead of failing during record writes. It also makes the internals of the program type safe.
4. Adds missing doc strings and unit tests for the code serializing and deserializing config parameters for bloom filter type, block size, and data block encoding.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
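
The override and fail-fast behaviour described in points 2 and 3 of the issue might look roughly like the following. This is a minimal sketch in plain Java, not the patch itself: the `Encoding` enum is a hypothetical stand-in for HBase's own encoding type, a `Map` stands in for the Hadoop configuration object, and the `parse` and `resolve` helpers are names invented for illustration. Only the config key is taken from the issue text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class EncodingResolutionSketch {
    // Hypothetical stand-in for the real HBase encoding enum.
    enum Encoding { NONE, PREFIX, DIFF, FAST_DIFF }

    // Config key quoted in the issue description.
    static final String OVERRIDE_KEY =
        "hbase.mapreduce.hfileoutputformat.datablock.encoding";

    // Parse the string once, up front, into a typed value.
    // An unknown name fails here rather than later during record writes.
    static Encoding parse(String value) {
        try {
            return Encoding.valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Unknown data block encoding: " + value, e);
        }
    }

    // Use the explicit override if present; otherwise fall back to the
    // encoding detected from the table (passed in here as an assumption).
    static Encoding resolve(Map<String, String> conf, Encoding tableEncoding) {
        String override = conf.get(OVERRIDE_KEY);
        return override != null ? parse(override) : tableEncoding;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolve(conf, Encoding.FAST_DIFF)); // FAST_DIFF (auto-detected)

        conf.put(OVERRIDE_KEY, "PREFIX");
        System.out.println(resolve(conf, Encoding.FAST_DIFF)); // PREFIX (override wins)
    }
}
```

Resolving the value to an enum at job setup means the record-writing path only ever sees a typed, already-validated encoding, which is the type-safety benefit the issue mentions.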