I noticed that HBase now has an "HFileInputFormat", which can read an HFile
directly into KeyValues for a MapReduce job:

https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileInputFormat.java

The "MapReduceHFileSplitterJob" is a sample job with this input format.

With this feature, it is possible to merge segments directly from their
HFiles instead of from Kylin's cuboid files, without going through the
HBase server. The cuboid files could then be removed after a build, which
would save a lot of storage space.
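
To make the merge step concrete, here is a rough, library-free sketch of
what a merge over sorted key/value files amounts to. It is not Kylin or
HBase code: the class name SegmentMergeSketch, the use of String row keys
with Long measures, and SUM as the only aggregation are all assumptions
made for illustration. In a real job the sorted inputs would be the Cells
read through HFileInputFormat rather than in-memory maps.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical sketch: k-way merge of segments that are each sorted by row key.
class SegmentMergeSketch {

    // Cursor over one sorted segment; 'head' is the entry currently at its front.
    private static final class Cursor {
        final Iterator<Map.Entry<String, Long>> it;
        Map.Entry<String, Long> head;

        Cursor(Iterator<Map.Entry<String, Long>> it) {
            this.it = it;
            this.head = it.next(); // only constructed for non-empty iterators
        }
    }

    // Merges the segments in key order; measures that share a row key are
    // summed (assumption: SUM stands in for Kylin's measure aggregators).
    static Map<String, Long> merge(List<? extends Map<String, Long>> sortedSegments) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head.getKey().compareTo(b.head.getKey()));
        for (Map<String, Long> segment : sortedSegments) {
            Iterator<Map.Entry<String, Long>> it = segment.entrySet().iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it));
            }
        }

        // LinkedHashMap keeps insertion order, which here is ascending key order.
        Map<String, Long> merged = new LinkedHashMap<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            merged.merge(c.head.getKey(), c.head.getValue(), Long::sum);
            if (c.it.hasNext()) {
                c.head = c.it.next();
                heap.add(c); // re-insert with the cursor's next entry as its head
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Long> seg1 = new TreeMap<>();
        seg1.put("k1", 1L);
        seg1.put("k3", 5L);
        Map<String, Long> seg2 = new TreeMap<>();
        seg2.put("k1", 2L);
        seg2.put("k2", 7L);
        System.out.println(merge(Arrays.asList(seg1, seg2))); // prints {k1=3, k2=7, k3=5}
    }
}
```

Only one entry per segment is held in the heap at a time, so the same
pattern would work on streamed input without loading whole files.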

Does anyone want to investigate this? We welcome community contributions.

-- 
Best regards,

Shaofeng Shi 史少锋
