[ https://issues.apache.org/jira/browse/PHOENIX-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326128#comment-15326128 ]
James Taylor commented on PHOENIX-2965:
---------------------------------------
+1 on the patch. I filed PHOENIX-2988 for the additional optimization.
A couple of considerations:
- The check for COUNT(DISTINCT ...) in the SELECT list is too simplistic. You
should really use a ParseNodeVisitor, probably something derived from
StatelessTraverseAllParseNodeVisitor. For example, the following wouldn't be
optimized:
{code}
SELECT COUNT(DISTINCT foo) / 10 FROM T;
{code}
It's not the end of the world if the optimization doesn't happen, though;
it's just something to improve.
- If the HAVING clause uses any aggregate function other than the same
COUNT(DISTINCT foo) call, you can't do this optimization. Perhaps that's
detected by the context.getAggregationManager().isEmpty() call? If so, I think
that check would also prevent the optimization when HAVING uses the same
COUNT(DISTINCT) call, even though it could be applied there. Some examples:
{code}
SELECT COUNT(DISTINCT pk) FROM T HAVING COUNT(DISTINCT pk) > 10; -- should optimize
SELECT COUNT(DISTINCT pk) FROM T HAVING COUNT(other_col) > 10; -- can't optimize
{code}
Same as above: I believe the optimization simply wouldn't be applied in cases
where it can't be, which is OK.
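A minimal sketch of both checks, assuming a simplified expression tree: `Node`, `Column`, `Literal`, `Op`, `Agg`, `list`, and `canUseDistinctPrefix` are hypothetical stand-ins, not Phoenix's ParseNode API. It visits every node (the idea behind StatelessTraverseAllParseNodeVisitor), so a COUNT(DISTINCT ...) nested inside an expression is still found. The rule it applies, "every aggregate in the SELECT list and HAVING clause is the same COUNT(DISTINCT col)", is an assumption drawn from the examples above, and it ignores the row-key requirements the real optimization also has.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CountDistinctCheck {
    // Hypothetical, simplified stand-ins for parse nodes; NOT Phoenix's ParseNode API.
    interface Node {
        List<Node> children();
    }

    static final class Column implements Node {
        final String name;
        Column(String name) { this.name = name; }
        public List<Node> children() { return Collections.emptyList(); }
    }

    static final class Literal implements Node {
        final Object value;
        Literal(Object value) { this.value = value; }
        public List<Node> children() { return Collections.emptyList(); }
    }

    // Any non-aggregate operator, e.g. "/" or ">".
    static final class Op implements Node {
        final String op;
        final List<Node> args;
        Op(String op, Node... args) { this.op = op; this.args = Arrays.asList(args); }
        public List<Node> children() { return args; }
    }

    // An aggregate call such as COUNT(x) or COUNT(DISTINCT x).
    static final class Agg implements Node {
        final String fn;
        final boolean distinct;
        final Node arg;
        Agg(String fn, boolean distinct, Node arg) {
            this.fn = fn;
            this.distinct = distinct;
            this.arg = arg;
        }
        public List<Node> children() { return Collections.singletonList(arg); }
    }

    static List<Node> list(Node... nodes) { return Arrays.asList(nodes); }

    // Traverse-all walk: visits every node, so an aggregate nested inside an
    // expression such as COUNT(DISTINCT foo) / 10 is still found.
    static void collectAggregates(Node node, List<Agg> out) {
        if (node instanceof Agg) {
            out.add((Agg) node);
        }
        for (Node child : node.children()) {
            collectAggregates(child, out);
        }
    }

    // Assumed rule: allow the optimization only if there is at least one
    // aggregate and every aggregate in SELECT and HAVING is COUNT(DISTINCT c)
    // over one and the same column c.
    static boolean canUseDistinctPrefix(List<Node> select, Node having) {
        List<Agg> aggs = new ArrayList<>();
        for (Node n : select) {
            collectAggregates(n, aggs);
        }
        if (having != null) {
            collectAggregates(having, aggs);
        }
        String column = null;
        for (Agg a : aggs) {
            if (!a.fn.equals("COUNT") || !a.distinct || !(a.arg instanceof Column)) {
                return false;
            }
            String c = ((Column) a.arg).name;
            if (column == null) {
                column = c;
            } else if (!column.equals(c)) {
                return false;
            }
        }
        return column != null;
    }

    public static void main(String[] args) {
        Node cdFoo = new Agg("COUNT", true, new Column("foo"));
        Node cdPk = new Agg("COUNT", true, new Column("pk"));
        Node countOther = new Agg("COUNT", false, new Column("other_col"));
        // SELECT COUNT(DISTINCT foo) / 10 FROM T
        System.out.println(canUseDistinctPrefix(list(new Op("/", cdFoo, new Literal(10))), null)); // true
        // SELECT COUNT(DISTINCT pk) FROM T HAVING COUNT(DISTINCT pk) > 10
        System.out.println(canUseDistinctPrefix(list(cdPk), new Op(">", cdPk, new Literal(10)))); // true
        // SELECT COUNT(DISTINCT pk) FROM T HAVING COUNT(other_col) > 10
        System.out.println(canUseDistinctPrefix(list(cdPk), new Op(">", countOther, new Literal(10)))); // false
    }
}
```

Collecting every aggregate first and then checking them as a group keeps the SELECT and HAVING rules in one place, so a new disqualifying case only needs a change in `canUseDistinctPrefix`.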
> Use DistinctPrefixFilter logic for COUNT(DISTINCT ...) and COUNT(...) GROUP BY
> ------------------------------------------------------------------------------
>
> Key: PHOENIX-2965
> URL: https://issues.apache.org/jira/browse/PHOENIX-2965
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 4.8.0
>
> Attachments: 2965-v10.txt, 2965-v2.txt, 2965-v3.txt, 2965-v4.txt,
> 2965-v5.txt, 2965-v6.txt, 2965-v7.txt, 2965-v8.txt, 2965-v9.txt, 2965.txt,
> PHOENIX-2965_wip.patch
>
>
> Parent uses skip scanning to optimize DISTINCT and certain GROUP BY
> operations along the row key.
> COUNT queries are optimized differently and could be sped up significantly as
> well.
> [~giacomotaylor], I might need help figuring out where COUNT(DISTINCT) queries
> are planned and optimized.
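A minimal in-memory sketch of the skip-scan idea applied to COUNT(DISTINCT leading_pk_col). It is not DistinctPrefixFilter's actual code: a `TreeSet<String>` stands in for the sorted row keys, the leading key column is assumed to be a fixed-width prefix of `prefixLen` characters, and `DistinctPrefixCount` and `countDistinctPrefixes` are hypothetical names.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

public class DistinctPrefixCount {
    // In-memory analogue of a distinct-prefix skip scan: instead of reading
    // every row, jump from the first key of one prefix directly to the first
    // key of the next prefix. The sorted set stands in for sorted row keys.
    // Assumes every key has at least prefixLen chars and no '\uffff' chars.
    static int countDistinctPrefixes(NavigableSet<String> sortedKeys, int prefixLen) {
        int distinct = 0;
        String key = sortedKeys.isEmpty() ? null : sortedKeys.first();
        while (key != null) {
            String prefix = key.substring(0, prefixLen);
            distinct++;
            // Every key starting with this prefix sorts below prefix + '\uffff',
            // so higher() lands on the first key of the next distinct prefix.
            key = sortedKeys.higher(prefix + '\uffff');
        }
        return distinct;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys =
            new TreeSet<>(Arrays.asList("a1", "a2", "a3", "b1", "b2", "c1"));
        // COUNT(DISTINCT <1-char leading column>): 3 seeks instead of 6 row reads
        System.out.println(countDistinctPrefixes(keys, 1)); // 3
    }
}
```

Each distinct prefix costs one seek rather than one read per row, so the savings grow with the number of rows that share each leading-column value.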
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)