[jira] [Commented] (PHOENIX-2965) Use DistinctPrefixFilter logic for COUNT(DISTINCT ...) and COUNT(...) GROUP BY

James Taylor (JIRA) Sat, 11 Jun 2016 17:36:07 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326107#comment-15326107 ]


James Taylor commented on PHOENIX-2965:
---------------------------------------

+1, but if there isn't already one, can you add a test for {{SELECT 
COUNT(DISTINCT nonPKCol)}}? I just want to make sure that having 
GroupBy.expressions on an ungrouped aggregation doesn't throw off any logic in 
this case. 

Also, one more optimization would really benefit the {{SELECT 
COUNT(DISTINCT pkCol)}} case: if there's only a single COUNT(DISTINCT pkCol) 
and the GroupBy ends up being order preserving, you can replace 
{{COUNT(DISTINCT pkCol)}} with {{COUNT(pkCol)}} in the select expression 
nodes. Just pass {{select}} through in the call to groupBy.compile() in 
QueryCompiler so you can do the replacement in place. That prevents the 
DistinctValueWithCountServerAggregator, which keeps a Map of all unique 
values, from being used; instead we just keep a single overall count, which is 
all we need thanks to your DistinctPrefixFilter.
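A minimal sketch of the suggested optimization, under stated assumptions. Everything below is hypothetical: {{CountCall}}, {{rewrite}}, {{countDistinct}} and {{count}} are illustrative names, not Phoenix classes, and a HashSet stands in for the Map of unique values kept by DistinctValueWithCountServerAggregator. It shows the rewrite condition (exactly one COUNT(DISTINCT ...) plus an order-preserving GROUP BY) and why a single running counter is enough once the scan returns each value only once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model only; none of these types are Phoenix classes.
public class CountDistinctRewriteSketch {

    // A COUNT call in the select list: COUNT(col) or COUNT(DISTINCT col).
    record CountCall(String column, boolean distinct) {}

    // Compile-time rewrite (assumed condition): with exactly one
    // COUNT(DISTINCT ...) and an order-preserving GROUP BY, the scan already
    // yields each value once, so COUNT(DISTINCT col) can become COUNT(col).
    static List<CountCall> rewrite(List<CountCall> select, boolean orderPreserving) {
        long distinctCalls = select.stream().filter(CountCall::distinct).count();
        if (!orderPreserving || distinctCalls != 1) {
            return select; // leave the select list untouched
        }
        List<CountCall> out = new ArrayList<>();
        for (CountCall c : select) {
            out.add(c.distinct() ? new CountCall(c.column(), false) : c);
        }
        return out;
    }

    // Set-based aggregation: memory grows with the number of distinct values.
    static long countDistinct(List<String> values) {
        Set<String> seen = new HashSet<>();
        for (String v : values) {
            if (v != null) seen.add(v); // NULLs are not counted
        }
        return seen.size();
    }

    // Plain COUNT: one running counter, correct only when the input has
    // already been de-duplicated by a distinct-prefix skip scan.
    static long count(List<String> values) {
        long n = 0;
        for (String v : values) {
            if (v != null) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(List.of(new CountCall("pkCol", true)), true));
        // prints [CountCall[column=pkCol, distinct=false]]
        System.out.println(countDistinct(List.of("a", "a", "b", "c", "c"))); // 3
        System.out.println(count(List.of("a", "b", "c")));                   // 3
    }
}
```

Both aggregations return 3 here, but the second one only needs a single long of state regardless of how many distinct values the scan produces.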

> Use DistinctPrefixFilter logic for COUNT(DISTINCT ...) and COUNT(...) GROUP BY
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2965
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2965
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 4.8.0
>
>         Attachments: 2965-v2.txt, 2965-v3.txt, 2965-v4.txt, 2965-v5.txt, 
> 2965-v6.txt, 2965-v7.txt, 2965-v8.txt, 2965-v9.txt, 2965.txt, 
> PHOENIX-2965_wip.patch
>
>
> The parent issue uses skip scanning to optimize DISTINCT and certain GROUP BY 
> operations along the row key.
> COUNT queries are optimized differently and could be sped up significantly as 
> well.
> [~giacomotaylor], I might need some help finding where COUNT(DISTINCT) queries 
> are planned and optimized.
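To illustrate the skip-scan idea from the description, here is a hypothetical sketch, not the actual DistinctPrefixFilter code. Assumptions: a sorted set stands in for a region, and each row key packs two non-negative int PK columns into a long so that numeric order matches row-key order. Counting distinct leading-column values then needs one seek per distinct prefix rather than one read per row.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: a TreeSet stands in for an HBase region, and a row key
// is two non-negative int PK columns packed into one long so that numeric
// order matches (pk1, pk2) order. Not Phoenix's DistinctPrefixFilter.
public class DistinctPrefixSketch {

    // Assumes pk1 >= 0 and pk2 >= 0.
    static long rowKey(int pk1, int pk2) {
        return ((long) pk1 << 32) | pk2;
    }

    static int leadingColumn(long rowKey) {
        return (int) (rowKey >>> 32);
    }

    // Counts distinct values of the leading PK column by seeking past every
    // row that shares the current prefix instead of reading those rows.
    static long countDistinctLeading(NavigableSet<Long> rows) {
        long count = 0;
        Long current = rows.isEmpty() ? null : rows.first();
        while (current != null) {
            count++;
            int prefix = leadingColumn(current);
            if (prefix == Integer.MAX_VALUE) {
                break; // no larger prefix can exist
            }
            // Seek to the first row whose leading column is greater than prefix.
            current = rows.ceiling(rowKey(prefix + 1, 0));
        }
        return count;
    }

    public static void main(String[] args) {
        NavigableSet<Long> rows = new TreeSet<>();
        int[][] data = {{1, 10}, {1, 11}, {1, 12}, {2, 5}, {7, 1}, {7, 2}};
        for (int[] r : data) {
            rows.add(rowKey(r[0], r[1]));
        }
        System.out.println(countDistinctLeading(rows)); // 3
    }
}
```

With the six rows above, the loop visits only one row per distinct prefix (1, 2 and 7); the savings grow with the number of rows sharing each prefix.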



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
