[ https://issues.apache.org/jira/browse/HBASE-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HBASE-9815 started by Manukranth Kolloju.

> Add Histogram representative of row key distribution inside a region.
> ---------------------------------------------------------------------
>
>                 Key: HBASE-9815
>                 URL: https://issues.apache.org/jira/browse/HBASE-9815
>             Project: HBase
>          Issue Type: New Feature
>          Components: HFile
>    Affects Versions: 0.89-fb
>            Reporter: Manukranth Kolloju
>            Assignee: Manukranth Kolloju
>             Fix For: 0.89-fb
>
>
> Using histogram information, users can parallelize a scan workload into
> equal-sized scans based on the sizes estimated from the histogram. This will
> help systems that run queries on top of HBase perform cost-based optimization
> while scanning. The idea is to keep this histogram information in the HFile
> trailer and to populate it on compaction and flush.
> The HRegionInterface can expose an API that returns the histogram information
> of a region, which can be generated by merging the histograms of all of its
> HFiles.
> The histogram implementation is based on
> http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
> http://dl.acm.org/citation.cfm?id=1951376
> and NumericHistogram from Hive.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
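A minimal Java sketch of the kind of histogram the issue describes, under several assumptions. It assumes each row key has already been projected to a double (for example from a fixed-width key prefix). StreamingHistogram and its methods are hypothetical names, not an HBase or Hive API. Bins are (centroid, mass) pairs, and when the bin limit is exceeded the two adjacent bins with the closest centroids are merged, in the spirit of the Ben-Haim/Tom-Tov update and merge procedures. merge() shows how per-HFile histograms could be combined into a region histogram, and splitPoints() estimates boundaries for equal-sized scans. The split estimate uses a simplified linear interpolation between centroids, not the paper's trapezoid-based procedure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: streaming histogram over numeric projections of row keys.
// Bins are {centroid, mass}; when there are more than maxBins bins, the two adjacent
// bins with the closest centroids are merged (Ben-Haim & Tom-Tov style).
public class StreamingHistogram {
    private final int maxBins;
    private final List<double[]> bins = new ArrayList<>(); // kept sorted by centroid

    public StreamingHistogram(int maxBins) {
        if (maxBins < 1) throw new IllegalArgumentException("maxBins must be >= 1");
        this.maxBins = maxBins;
    }

    // Adds one point; mass could be 1 per row or, e.g., the row's size in bytes.
    public void add(double value, double mass) {
        insert(value, mass);
        trim();
    }

    // Union of both bin sets followed by trimming, e.g. to combine per-HFile histograms.
    public void merge(StreamingHistogram other) {
        for (double[] b : new ArrayList<>(other.bins)) insert(b[0], b[1]);
        trim();
    }

    public int binCount() { return bins.size(); }

    public double totalMass() {
        double total = 0;
        for (double[] b : bins) total += b[1];
        return total;
    }

    // Returns parts - 1 boundaries that split the estimated mass into roughly equal ranges.
    public List<Double> splitPoints(int parts) {
        List<Double> splits = new ArrayList<>();
        if (parts < 2 || bins.isEmpty()) return splits;
        // cum[i] = mass of all bins before i plus half of bin i.
        double[] cum = new double[bins.size()];
        double running = 0;
        for (int i = 0; i < bins.size(); i++) {
            cum[i] = running + bins.get(i)[1] / 2;
            running += bins.get(i)[1];
        }
        for (int k = 1; k < parts; k++) {
            splits.add(valueAtMass(cum, running * k / parts));
        }
        return splits;
    }

    // Linearly interpolates between neighbouring centroids (a simplification of the paper).
    private double valueAtMass(double[] cum, double target) {
        int n = bins.size();
        if (target <= cum[0]) return bins.get(0)[0];
        for (int i = 0; i + 1 < n; i++) {
            if (target <= cum[i + 1]) {
                double x0 = bins.get(i)[0], x1 = bins.get(i + 1)[0];
                double f = (target - cum[i]) / (cum[i + 1] - cum[i]);
                return x0 + f * (x1 - x0);
            }
        }
        return bins.get(n - 1)[0];
    }

    private void insert(double value, double mass) {
        if (!(mass > 0)) throw new IllegalArgumentException("mass must be positive");
        int i = 0;
        while (i < bins.size() && bins.get(i)[0] < value) i++;
        if (i < bins.size() && bins.get(i)[0] == value) {
            bins.get(i)[1] += mass; // same centroid: just accumulate mass
        } else {
            bins.add(i, new double[] {value, mass});
        }
    }

    private void trim() {
        while (bins.size() > maxBins) {
            int best = 0; // index of the left bin of the closest adjacent pair
            for (int j = 1; j + 1 < bins.size(); j++) {
                double gap = bins.get(j + 1)[0] - bins.get(j)[0];
                if (gap < bins.get(best + 1)[0] - bins.get(best)[0]) best = j;
            }
            double[] a = bins.get(best), b = bins.get(best + 1);
            double m = a[1] + b[1];
            a[0] = (a[0] * a[1] + b[0] * b[1]) / m; // mass-weighted centroid
            a[1] = m;
            bins.remove(best + 1);
        }
    }
}
```

For example, with bins at 0, 1, 2 and 3 of mass 1 each, splitPoints(4) returns [0.5, 1.5, 2.5]. Turning those values back into scan start/stop rows would require inverting the assumed key projection, which this sketch does not attempt.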