[ 
https://issues.apache.org/jira/browse/HBASE-19722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491343#comment-16491343
 ] 

Xu Cang commented on HBASE-19722:
---------------------------------

"7 / e number of items (350 in this example with error rate being 0.02)" is 
the extreme case in which every element occurs equally often 
(for example, 17500 data points in which each item appears exactly 50 times, 
which gives 17500 / 50 = 350 distinct items).

 

Another interesting characteristic of this algorithm: any item whose frequency 
is lower than 'CurrentTerm - errorRate' is swept out of the bucket. 
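
To make the sweep rule concrete, here is a minimal sketch in Python. It is only an illustration of the rule as described in this comment, not the code from the attached patches: the class name, method names, and the choice to sweep once every bucket_size insertions are assumptions made for the example.

```python
import math


class LossyCounter:
    """Hypothetical sketch of a lossy counter with a periodic sweep."""

    def __init__(self, error_rate: float):
        if not 0.0 < error_rate < 1.0:
            raise ValueError("error_rate must be between 0 and 1 (exclusive)")
        self.error_rate = error_rate
        # Bucket size is 1 / error rate, rounded up (assumption for non-integer results).
        self.bucket_size = math.ceil(1 / error_rate)
        self.total = 0          # number of data points seen so far
        self.current_term = 0   # index of the bucket currently being filled
        self.counts = {}        # item -> observed occurrences

    def add(self, key):
        """Count one occurrence of key; return the set of items swept, if any."""
        self.counts[key] = self.counts.get(key, 0) + 1
        self.total += 1
        self.current_term = math.ceil(self.total / self.bucket_size)
        # Assumption: sweep only when a bucket has just been filled.
        if self.total % self.bucket_size == 0:
            return self._sweep()
        return set()

    def _sweep(self):
        # Items with frequency lower than CurrentTerm - errorRate are removed.
        threshold = self.current_term - self.error_rate
        swept = {k for k, c in self.counts.items() if c < threshold}
        for k in swept:
            del self.counts[k]
        return swept
```

With an error rate of 0.25 the bucket size is 4, so feeding a, a, a, b sweeps nothing (threshold 0.75), while a further a, a, a, c raises the threshold to 1.75 and sweeps both b and c, leaving only the hot item a.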

E.g. 1

Let's say we have 10k data points and the error rate is 0.05. Then the 
bucket size is 1 / 0.05 = 20. 

CurrentTerm after inputting all data will be 10k / 20 = 500.

So all data with fewer than 499.95 occurrences (500 - 0.05) will be removed. 

 

E.g. 2

Let's change the error rate in the last example to 0.02.

The bucket size will be 1 / 0.02 = 50.

CurrentTerm will be 10k / 50 = 200.

So only data with fewer than 199.98 occurrences (200 - 0.02) will be removed.
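
The arithmetic in the two examples above can be reproduced with a small helper. This is a sketch only: the function name is hypothetical, the error rate is passed as a string so that Decimal keeps the results exact, and rounding up with ceil is an assumption for inputs that do not divide evenly (it makes no difference for the numbers used here).

```python
import math
from decimal import Decimal


def sweep_threshold(total_items: int, error_rate: str):
    """Return (bucket_size, current_term, threshold) for the given inputs."""
    rate = Decimal(error_rate)
    bucket_size = math.ceil(1 / rate)                           # 1 / errorRate
    current_term = math.ceil(Decimal(total_items) / bucket_size)  # total / bucket size
    threshold = current_term - rate                             # CurrentTerm - errorRate
    return bucket_size, current_term, threshold


print(sweep_threshold(10_000, "0.05"))  # (20, 500, Decimal('499.95'))
print(sweep_threshold(10_000, "0.02"))  # (50, 200, Decimal('199.98'))
```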

 

The intuitive observation from the above is that if the error rate is too big, 
the algorithm may exclude many elements with fairly high frequency. 

So this algorithm is a great fit for finding things like hot clients or hot 
topics. It is not a good candidate when less frequent items also need to be 
tracked. 

> Implement a meta query statistics metrics source
> ------------------------------------------------
>
>                 Key: HBASE-19722
>                 URL: https://issues.apache.org/jira/browse/HBASE-19722
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Xu Cang
>            Priority: Major
>         Attachments: HBASE-19722.branch-1.v001.patch, 
> HBASE-19722.master.010.patch, HBASE-19722.master.011.patch, 
> HBASE-19722.master.012.patch, HBASE-19722.master.013.patch
>
>
> Implement a meta query statistics metrics source, created whenever a 
> regionserver starts hosting meta, removed when meta hosting moves. Provide 
> views on top tables by request counts, top meta rowkeys by request count, top 
> clients making requests by their hostname. 
> Can be implemented as a coprocessor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
