[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT

Ethan Wang (JIRA) Sun, 30 Jul 2017 23:38:46 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106864#comment-16106864
 ]


Ethan Wang commented on PHOENIX-418:
------------------------------------

Regarding the syntax of approximate count distinct: carrying on from the 
discussion in PHOENIX-3390, I am proposing the following syntax.

Original cardinality count function:
select count(distinct name) from person

With approximate:
select count(distinct name) from person APPROXIMATE 
select count(distinct name) from person APPROXIMATE 'hll' 
select count(distinct name) from person APPROXIMATE 'algorithm ABC' (WITHIN 10 PERCENT)
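
For readers unfamiliar with what an 'hll' option might compute, here is a 
minimal sketch, assuming 'hll' refers to HyperLogLog. It is written in Python 
for brevity and is not Phoenix code; the class name, the precision parameter p 
and the choice of SHA-1 as the hash are all illustrative assumptions. It shows 
that the state is a fixed-size register array (4096 bytes at p=12) regardless 
of how many distinct values are seen, and that partial states can be merged, 
which is roughly what a server-side aggregate would need to hand back to the 
client.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch (hypothetical, not Phoenix's implementation)."""

    def __init__(self, p=12):
        # p index bits give m = 2**p one-byte registers; p=12 means 4096 bytes of state.
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(self.m)

    def add(self, value):
        # Derive a 64-bit hash from the value (SHA-1 is an arbitrary choice here).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                    # top p bits select a register
        rest = h & ((1 << width) - 1)       # remaining bits feed the rank
        # Rank = 1-based position of the leftmost 1 bit within the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Combining two partial states is an element-wise max of the registers.
        if self.p != other.p:
            raise ValueError("precision mismatch")
        for i, r in enumerate(other.registers):
            if r > self.registers[i]:
                self.registers[i] = r

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)    # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw
```

With p=12 the expected relative standard error is about 1.04/sqrt(4096), or 
roughly 1.6 percent, so a clause such as WITHIN 10 PERCENT could plausibly be 
mapped to a choice of p. That mapping is a guess and not part of the proposal.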


> Support approximate COUNT DISTINCT
> ----------------------------------
>
>                 Key: PHOENIX-418
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-418
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: James Taylor
>            Assignee: Ethan Wang
>              Labels: gsoc2016
>
> Support an "approximation" of count distinct to prevent having to hold on to 
> all distinct values (since this will not scale well when the number of 
> distinct values is huge). The Apache Drill folks have had some interesting 
> discussions on this 
> [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E).
>  They recommend using  [Welford's 
> method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm).
>  I'm open to having a config option that uses exact versus approximate. I 
> don't have experience implementing an approximate implementation, so I'm not 
> sure how much state is required to keep on the server and return to the 
> client (other than realizing it'd be much less than returning all distinct 
> values and their counts).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
