[jira] [Updated] (HBASE-9815) Add Histogram representative of row key distribution inside a region.

2014-01-14 Thread Manukranth Kolloju (JIRA)

 [ https://issues.apache.org/jira/browse/HBASE-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manukranth Kolloju updated HBASE-9815:
--

Attachment: Histogram-9815.diff

Attaching the implementation, based on the paper referenced in the description.

 Add Histogram representative of row key distribution inside a region.
 -

 Key: HBASE-9815
 URL: https://issues.apache.org/jira/browse/HBASE-9815
 Project: HBase
  Issue Type: New Feature
  Components: HFile
Affects Versions: 0.89-fb
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
 Fix For: 0.89-fb

 Attachments: Histogram-9815.diff


 Using histogram information, users can parallelize the scan workload into
 equal-sized scans based on the sizes estimated from the histogram. This will
 help systems that run queries on top of HBase perform cost-based
 optimization while scanning. The idea is to keep this histogram information
 in the HFile trailer and populate it during compactions and flushes.
 The HRegionInterface can expose an API that returns the histogram of a
 region, generated by merging the histograms of all of its HFiles.
 The histogram implementation is based on
 http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
 http://dl.acm.org/citation.cfm?id=1951376
 and NumericHistogram from Hive.
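The update-and-merge scheme from the Ben-Haim and Tom-Tov paper cited above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in Histogram-9815.diff: the class StreamingHistogram, its methods, and the rowKeyToDouble helper (a hypothetical, order-preserving mapping from the first 8 bytes of a row key to a double) are names introduced here. Each HFile would hold at most maxBins (centroid, count) bins, filled while a flush or compaction writes rows; a region-level histogram would come from merging the per-HFile ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Sketch of a Ben-Haim/Tom-Tov style streaming histogram: at most maxBins
 * (centroid, count) bins; on overflow the two closest adjacent bins are merged.
 */
final class StreamingHistogram {
    static final class Bin {
        final double centroid;
        final double count;
        Bin(double centroid, double count) { this.centroid = centroid; this.count = count; }
    }

    private final int maxBins;
    private final List<Bin> bins = new ArrayList<>(); // kept sorted by centroid

    StreamingHistogram(int maxBins) {
        if (maxBins < 1) throw new IllegalArgumentException("maxBins must be >= 1");
        this.maxBins = maxBins;
    }

    /**
     * Hypothetical row-key mapping: the first 8 bytes (zero-padded) as a big-endian
     * unsigned long, keeping the top 53 bits so the conversion to double is exact.
     * Preserves unsigned lexicographic order of keys, up to ties.
     */
    static double rowKeyToDouble(byte[] rowKey) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (i < rowKey.length ? (rowKey[i] & 0xFF) : 0);
        }
        return (double) (v >>> 11) * 2048.0;
    }

    /** Update step: add one value, e.g. a mapped row key written during a flush. */
    void add(double value) {
        insert(value, 1.0);
        trim();
    }

    /** Merge step: take the union of both bin sets, then trim back to maxBins. */
    void merge(StreamingHistogram other) {
        for (Bin b : other.bins) insert(b.centroid, b.count);
        trim();
    }

    private void insert(double centroid, double count) {
        int i = 0;
        while (i < bins.size() && bins.get(i).centroid < centroid) i++;
        if (i < bins.size() && bins.get(i).centroid == centroid) {
            bins.set(i, new Bin(centroid, bins.get(i).count + count)); // same centroid: add counts
        } else {
            bins.add(i, new Bin(centroid, count));
        }
    }

    /** While too many bins remain, replace the closest adjacent pair by its weighted average. */
    private void trim() {
        while (bins.size() > maxBins) {
            int best = 0;
            for (int i = 1; i + 1 < bins.size(); i++) {
                double gap = bins.get(i + 1).centroid - bins.get(i).centroid;
                if (gap < bins.get(best + 1).centroid - bins.get(best).centroid) best = i;
            }
            Bin a = bins.get(best), b = bins.get(best + 1);
            double m = a.count + b.count;
            bins.set(best, new Bin((a.centroid * a.count + b.centroid * b.count) / m, m));
            bins.remove(best + 1); // removes by index
        }
    }

    List<Bin> bins() { return Collections.unmodifiableList(bins); }

    double totalCount() {
        double total = 0;
        for (Bin b : bins) total += b.count;
        return total;
    }
}
```

Because trimming always leaves at most maxBins bins, the space a histogram would take in a trailer stays bounded no matter how many rows a flush or compaction writes; the price is that the centroids only approximate the true key distribution.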



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-9815) Add Histogram representative of row key distribution inside a region.

2013-10-22 Thread Manukranth Kolloju (JIRA)


Manukranth Kolloju updated HBASE-9815:
--

Description: 
Using histogram information, users can parallelize the scan workload into
equal-sized scans based on the sizes estimated from the histogram. This will
help systems that run queries on top of HBase perform cost-based optimization
while scanning. The idea is to keep this histogram information in the HFile
trailer and populate it during compactions and flushes.

The HRegionInterface can expose an API that returns the histogram of a region,
generated by merging the histograms of all of its HFiles.

The histogram implementation is based on
http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
http://dl.acm.org/citation.cfm?id=1951376
and NumericHistogram from Hive.

  was:
Using histogram information, users can parallelize the scan workload into
equal-sized scans based on the sizes estimated from the histogram. This will
help systems that run queries on top of HBase perform cost-based optimization
while scanning. The idea is to keep this histogram information into the HFile
in the trailer and populate this on compaction and/or flush.

The HRegionInterface can expose an API that returns the histogram of a region,
generated by merging the histograms of all of its HFiles.

The histogram implementation is based on
http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
http://dl.acm.org/citation.cfm?id=1951376
and NumericHistogram from Hive.
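Splitting a scan into equal-sized pieces, as the description proposes, needs boundaries that divide the estimated row count evenly. The paper describes a "uniform" procedure for this, assuming counts vary linearly between adjacent centroids; the sketch below follows that idea. HistogramSplitter and splitPoints are hypothetical names, the histogram is assumed to arrive as sorted centroid and count arrays (for example, from a merged region histogram), and clamping targets that fall before the first or after the last centroid is a simplification made here.

```java
/**
 * Sketch of the paper's "uniform" procedure: given sorted bin centroids and their
 * counts, return numRanges - 1 boundaries that split the estimated points into
 * numRanges ranges of roughly equal size.
 */
final class HistogramSplitter {
    static double[] splitPoints(double[] centroids, double[] counts, int numRanges) {
        int n = centroids.length;
        if (n == 0 || counts.length != n || numRanges < 1) {
            throw new IllegalArgumentException("need matching, non-empty bins and numRanges >= 1");
        }
        // cum[i] estimates the number of points <= centroids[i]:
        // all earlier bins plus half of bin i (the paper's "sum" at p_i).
        double[] cum = new double[n];
        double total = 0;
        for (int i = 0; i < n; i++) {
            cum[i] = total + counts[i] / 2.0;
            total += counts[i];
        }
        double[] bounds = new double[numRanges - 1];
        int i = 0;
        for (int j = 1; j < numRanges; j++) {
            double s = total * j / numRanges; // target cumulative count for boundary j
            // Simplification: targets outside [cum[0], cum[n-1]] snap to the end centroids.
            if (s <= cum[0]) { bounds[j - 1] = centroids[0]; continue; }
            if (s >= cum[n - 1]) { bounds[j - 1] = centroids[n - 1]; continue; }
            while (cum[i + 1] < s) i++; // now cum[i] < s <= cum[i + 1]
            // With counts interpolated linearly (trapezoid), the fraction z in [0, 1]
            // solves (a / 2) z^2 + m z - d = 0, where a = m_{i+1} - m_i.
            double d = s - cum[i];
            double m = counts[i];
            double a = counts[i + 1] - counts[i];
            // Numerically stable root; when a == 0 it reduces to z = d / m.
            double z = 2 * d / (m + Math.sqrt(m * m + 2 * a * d));
            bounds[j - 1] = centroids[i] + (centroids[i + 1] - centroids[i]) * z;
        }
        return bounds;
    }
}
```

With ten bins of count 10 centered on 0 through 9, asking for four ranges yields the boundaries 2.0, 4.5 and 7.0. Each boundary would still have to be turned back into a row key before it could serve as a scan's start or stop row; that step depends on how keys were mapped to numbers and is not shown.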




