[jira] [Commented] (PHOENIX-4164) APPROX_COUNT_DISTINCT becomes imprecise at 20m unique values.

Lars Hofhansl (JIRA) Wed, 06 Sep 2017 13:29:45 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155983#comment-16155983
 ]


Lars Hofhansl commented on PHOENIX-4164:
----------------------------------------

Hmm... Let me retry my test. I'll update tonight (I do not have access to my 
personal machine right now).

> APPROX_COUNT_DISTINCT becomes imprecise at 20m unique values.
> -------------------------------------------------------------
>
>                 Key: PHOENIX-4164
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4164
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Ethan Wang
>
> {code}
> 0: jdbc:phoenix:localhost> select count(*) from test;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 26931816  |
> +-----------+
> 1 row selected (14.604 seconds)
> 0: jdbc:phoenix:localhost> select approx_count_distinct(v1) from test;
> +----------------------------+
> | APPROX_COUNT_DISTINCT(V1)  |
> +----------------------------+
> | 17221394                   |
> +----------------------------+
> 1 row selected (21.619 seconds)
> {code}
> The table is generated from random numbers, and the cardinality of v1 is 
> close to the number of rows.
> (I cannot run a COUNT(DISTINCT(v1)), as it uses up all memory on my machine 
> and eventually kills the regionserver - that's another story and another jira)
> [~aertoria]
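
To put a number on the imprecision reported above, the following is a minimal sketch of the arithmetic. It assumes, as the description states, that the true cardinality of v1 is close to the row count; the exact distinct count was never measured, so the result is only an estimate of the error. The variable names are illustrative and not part of Phoenix.

```python
# Figures copied from the sqlline output in the issue description.
row_count = 26_931_816        # result of COUNT(*)
approx_distinct = 17_221_394  # result of APPROX_COUNT_DISTINCT(V1)

# Assumption: the true distinct count is roughly equal to the row count,
# because v1 was generated from random numbers.
shortfall = row_count - approx_distinct
relative_error = shortfall / row_count

print(f"shortfall: {shortfall}")
print(f"relative underestimate: {relative_error:.1%}")
```

Under that assumption the estimate falls short by about 9.7 million values, or roughly 36% of the expected cardinality.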



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
