[ 
https://issues.apache.org/jira/browse/PHOENIX-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464167#comment-16464167
 ] 

Vincent Poon commented on PHOENIX-4724:
---------------------------------------

[~aertoria] Yes, the idea is that when someone wants to build an index table,
we sample the data table for some index column values and feed them into this
histogram.  Since this is an equi-depth histogram, each bucket will have the
same number of elements.  So we take the bucket bounds from the histogram and
use them as the split points when pre-splitting the index table.
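
To make the pre-split idea concrete, here is a rough sketch (not the code in
the attached patch).  It assumes the sampled index column values fit in memory
and simply sorts them, rather than using the streaming histogram; the class
and method names (EquiDepthSplits, splitPoints) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Phoenix: derives split points from a
// sample by sorting it, instead of maintaining a streaming histogram.
final class EquiDepthSplits {

    // Returns up to (numBuckets - 1) split points such that each bucket
    // covers about the same number of sampled values.
    static List<String> splitPoints(List<String> samples, int numBuckets) {
        if (numBuckets < 1) {
            throw new IllegalArgumentException("numBuckets must be >= 1");
        }
        List<String> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);

        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numBuckets; i++) {
            // Index of the first element of bucket i in the sorted sample.
            int idx = (int) ((long) i * sorted.size() / numBuckets);
            if (idx >= sorted.size()) {
                break; // only reachable when the sample is empty
            }
            String candidate = sorted.get(idx);
            // Skip repeated values so the split keys stay strictly increasing.
            if (splits.isEmpty()
                    || !splits.get(splits.size() - 1).equals(candidate)) {
                splits.add(candidate);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> sample =
            Arrays.asList("h", "a", "d", "b", "g", "c", "f", "e");
        // Four buckets of two values each: [a,b] [c,d] [e,f] [g,h]
        System.out.println(splitPoints(sample, 4));
    }
}
```

The returned values would then be passed as the split keys when the index
table is created.  The streaming histogram avoids the full sort and the need
to hold every sampled value, which is the point of the patch.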

The histogram is relatively small, so we can keep it in memory, and perhaps
save it for each table as you suggest.  It could then also be used for query
optimization or approximate count queries.
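
As an illustration of the approximate count idea, here is a minimal sketch.
It assumes numeric keys, sorted bucket bounds, and a known per-bucket depth;
the names (ApproxCount, estimateCountBelow) are hypothetical and nothing like
this exists in the patch.

```java
// Hypothetical helper, not part of Phoenix: estimates how many rows have a
// key below a given value, using only equi-depth bucket bounds.
final class ApproxCount {

    // splits: sorted bucket bounds (one fewer than the number of buckets).
    // depth:  number of rows per bucket (equal for all buckets).
    // Counts the full buckets below the key and assumes the key sits in the
    // middle of its own bucket, so the error is at most half a bucket.
    static long estimateCountBelow(long[] splits, long depth, long key) {
        int fullBuckets = 0;
        for (long bound : splits) {
            if (bound <= key) {
                fullBuckets++;
            }
        }
        return fullBuckets * depth + depth / 2;
    }

    public static void main(String[] args) {
        long[] splits = {10L, 20L, 30L}; // four buckets of 100 rows each
        // Two full buckets below 25, plus half of the bucket [20, 30).
        System.out.println(estimateCountBelow(splits, 100L, 25L));
    }
}
```

The estimate is coarse by design: with equal-depth buckets the only stored
state is the bounds, so accuracy improves only by adding buckets.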

> Efficient Equi-Depth histogram for streaming data
> -------------------------------------------------
>
>                 Key: PHOENIX-4724
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4724
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Vincent Poon
>            Assignee: Vincent Poon
>            Priority: Major
>         Attachments: PHOENIX-4724.v1.patch
>
>
> Equi-Depth histogram from 
> http://web.cs.ucla.edu/~zaniolo/papers/Histogram-EDBT2011-CamReady.pdf, but 
> without the sliding window - we assume a single window over the entire data 
> set.
> Used to generate the bucket boundaries of a histogram where each bucket has 
> the same # of items.
> This is useful, for example, for pre-splitting an index table, by feeding in 
> data from the indexed column.
> Works on streaming data - the histogram is dynamically updated for each new 
> value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
