[ https://issues.apache.org/jira/browse/HBASE-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manukranth Kolloju updated HBASE-9815:
--------------------------------------

    Attachment: Histogram-9815.diff

Attaching the implementation based on the paper referenced in the issue description.

> Add Histogram representative of row key distribution inside a region.
> ---------------------------------------------------------------------
>
>                 Key: HBASE-9815
>                 URL: https://issues.apache.org/jira/browse/HBASE-9815
>             Project: HBase
>          Issue Type: New Feature
>          Components: HFile
>    Affects Versions: 0.89-fb
>            Reporter: Manukranth Kolloju
>            Assignee: Manukranth Kolloju
>             Fix For: 0.89-fb
>
>         Attachments: Histogram-9815.diff
>
>
> Using histogram information, users can parallelize a scan workload by
> splitting it into equal-sized scans based on the sizes estimated from the
> histogram. This will let systems that run queries on top of HBase perform
> cost-based optimization while scanning. The idea is to keep this histogram
> information in the HFile trailer and populate it on compaction and flush.
> The HRegionInterface can expose an API to return the histogram information of
> a region, which can be generated by merging the histograms of all the HFiles.
> The histogram is implemented on the basis of
> http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
> http://dl.acm.org/citation.cfm?id=1951376
> and NumericHistogram from Hive.
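
For readers unfamiliar with the referenced approach, here is a minimal sketch of a Ben-Haim/Tom-Tov style streaming histogram with the two operations the description relies on: updating on write (flush/compaction) and merging per-file histograms into a per-region one. This is not the code in Histogram-9815.diff. The class name StreamingHistogram, its methods, and the assumption that row keys have already been mapped to double values are all hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: a fixed-size streaming histogram in the style of
 * Ben-Haim and Tom-Tov. Assumes row keys were already mapped to doubles.
 */
final class StreamingHistogram {
    /** One bin: a centroid value and the number of points it represents. */
    static final class Bin {
        double value;
        long count;
        Bin(double value, long count) { this.value = value; this.count = count; }
    }

    private final int maxBins;
    private final List<Bin> bins = new ArrayList<>(); // kept sorted by value

    StreamingHistogram(int maxBins) {
        if (maxBins < 1) throw new IllegalArgumentException("maxBins must be >= 1");
        this.maxBins = maxBins;
    }

    /** Update step: add one point, then shrink back to at most maxBins bins. */
    void add(double value) {
        insert(value, 1);
        shrink();
    }

    /** Merge step: fold another histogram (e.g. from another file) into this one. */
    void merge(StreamingHistogram other) {
        for (Bin b : other.bins) insert(b.value, b.count);
        shrink();
    }

    /** Insert a weighted point at its sorted position, or bump an equal centroid. */
    private void insert(double value, long count) {
        int i = 0;
        while (i < bins.size() && bins.get(i).value < value) i++;
        if (i < bins.size() && bins.get(i).value == value) {
            bins.get(i).count += count;
        } else {
            bins.add(i, new Bin(value, count));
        }
    }

    /** Repeatedly replace the two closest adjacent bins with their weighted mean. */
    private void shrink() {
        while (bins.size() > maxBins) {
            int best = 0;
            double bestGap = Double.POSITIVE_INFINITY;
            for (int i = 0; i + 1 < bins.size(); i++) {
                double gap = bins.get(i + 1).value - bins.get(i).value;
                if (gap < bestGap) { bestGap = gap; best = i; }
            }
            Bin a = bins.get(best);
            Bin b = bins.remove(best + 1);
            long total = a.count + b.count;
            a.value = (a.value * a.count + b.value * b.count) / total;
            a.count = total;
        }
    }

    long totalCount() {
        long n = 0;
        for (Bin b : bins) n += b.count;
        return n;
    }

    int binCount() { return bins.size(); }
}
```

Because merging reuses the same shrink step as single-point updates, a per-region histogram can be built from per-HFile histograms without rescanning the data, which is the property the description depends on.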



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
