[jira] [Commented] (PHOENIX-2965) Use DistinctPrefixFilter logic for COUNT(DISTINCT ...) and COUNT(...) GROUP BY

James Taylor (JIRA) Sat, 11 Jun 2016 18:40:49 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326128#comment-15326128
 ]


James Taylor commented on PHOENIX-2965:
---------------------------------------

+1 on the patch. I filed PHOENIX-2988 for the additional optimization. 

A couple of considerations:
- The check for COUNT(DISTINCT ...) in the select is too simplistic. You really 
should use a ParseNodeVisitor, probably something derived from 
StatelessTraverseAllParseNodeVisitor. For example, the following wouldn't be 
optimized:
{code}
SELECT COUNT(DISTINCT foo) / 10 FROM T;
{code}
It's not the end of the world, though, if the optimization doesn't happen - 
just something to improve.
- If the HAVING clause uses any aggregate function other than the same 
COUNT(DISTINCT foo) call, you can't do this optimization. Perhaps that's 
detected by the context.getAggregationManager().isEmpty() call? If so, I think 
using the same COUNT(DISTINCT) call in the HAVING clause would prevent the 
optimization too. Some examples:
{code}
SELECT COUNT(DISTINCT pk) FROM T HAVING COUNT(DISTINCT pk) > 10; // should optimize
SELECT COUNT(DISTINCT pk) FROM T HAVING COUNT(other_col) > 10; // can't optimize
{code}
Same as above - I believe the optimization simply wouldn't be applied in the 
cases where it can't be, which is ok.
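
To make both considerations concrete, here is a minimal Java sketch of a 
traversal-based check. It does not use Phoenix's parser classes: Node, Leaf, 
Binary, Aggregate, aggregatesIn and canOptimize are all hypothetical names for a 
toy parse tree, and the rule that the select expression must contain exactly one 
aggregate is an assumption made for illustration. It also ignores the row key 
requirements of the real optimization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CountDistinctCheck {
    /** Hypothetical, simplified parse tree node; NOT Phoenix's ParseNode. */
    static abstract class Node {
        final List<Node> children;
        Node(Node... children) { this.children = Arrays.asList(children); }
    }

    /** A column reference or a literal. */
    static final class Leaf extends Node {
        final String text;
        Leaf(String text) { super(); this.text = text; }
        @Override public String toString() { return text; }
    }

    /** An arithmetic or comparison expression such as "a / 10" or "a > 10". */
    static final class Binary extends Node {
        final String op;
        Binary(String op, Node left, Node right) { super(left, right); this.op = op; }
        @Override public String toString() {
            return children.get(0) + " " + op + " " + children.get(1);
        }
    }

    /** An aggregate call such as COUNT(DISTINCT pk) or COUNT(other_col). */
    static final class Aggregate extends Node {
        final String name;
        final boolean distinct;
        Aggregate(String name, boolean distinct, Node arg) {
            super(arg);
            this.name = name;
            this.distinct = distinct;
        }
        @Override public String toString() {
            return name + "(" + (distinct ? "DISTINCT " : "") + children.get(0) + ")";
        }
    }

    /** Collects every aggregate call in the tree, however deeply it is nested. */
    static List<Aggregate> aggregatesIn(Node root) {
        List<Aggregate> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(Node node, List<Aggregate> found) {
        if (node instanceof Aggregate) found.add((Aggregate) node);
        for (Node child : node.children) collect(child, found);
    }

    /**
     * Assumed rule: the select holds exactly one aggregate, a COUNT(DISTINCT ...),
     * and HAVING (null if absent) uses no aggregate other than that same call.
     */
    static boolean canOptimize(Node select, Node having) {
        List<Aggregate> inSelect = aggregatesIn(select);
        if (inSelect.size() != 1) return false;
        Aggregate target = inSelect.get(0);
        if (!target.name.equals("COUNT") || !target.distinct) return false;
        if (having != null) {
            for (Aggregate agg : aggregatesIn(having)) {
                // Compare by rendered text; a real check would compare parse nodes.
                if (!agg.toString().equals(target.toString())) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node countPk = new Aggregate("COUNT", true, new Leaf("pk"));
        Node countOther = new Aggregate("COUNT", false, new Leaf("other_col"));
        Node nested = new Binary("/", new Aggregate("COUNT", true, new Leaf("foo")), new Leaf("10"));

        System.out.println(canOptimize(nested, null));                                           // true
        System.out.println(canOptimize(countPk, new Binary(">", countPk, new Leaf("10"))));    // true
        System.out.println(canOptimize(countPk, new Binary(">", countOther, new Leaf("10")))); // false
    }
}
```

Because the traversal visits every node, COUNT(DISTINCT foo) / 10 is recognized 
even though the aggregate is not the top-level select expression, and a HAVING 
clause is accepted only when it reuses the same COUNT(DISTINCT ...) call.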

> Use DistinctPrefixFilter logic for COUNT(DISTINCT ...) and COUNT(...) GROUP BY
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2965
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2965
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 4.8.0
>
>         Attachments: 2965-v10.txt, 2965-v2.txt, 2965-v3.txt, 2965-v4.txt, 
> 2965-v5.txt, 2965-v6.txt, 2965-v7.txt, 2965-v8.txt, 2965-v9.txt, 2965.txt, 
> PHOENIX-2965_wip.patch
>
>
> Parent uses skip scanning to optimize DISTINCT and certain GROUP BY 
> operations along the row key.
> COUNT queries are optimized differently and could be sped up significantly as 
> well.
> [~giacomotaylor], I might need help finding where COUNT(DISTINCT) queries are 
> planned and optimized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
