[ https://issues.apache.org/jira/browse/PHOENIX-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326128#comment-15326128 ]
James Taylor commented on PHOENIX-2965:
---------------------------------------

+1 on the patch. I filed PHOENIX-2988 for the additional optimization. A couple of considerations:

- The check for COUNT(DISTINCT ...) in the select is too simplistic. You really should use a ParseNodeVisitor, probably something derived from StatelessTraverseAllParseNodeVisitor. For example, the following wouldn't be optimized:
{code}
SELECT COUNT(DISTINCT foo) / 10 FROM T;
{code}
It's not the end of the world, though, if the optimization doesn't happen; it's just something to improve.

- If the HAVING clause uses any aggregate function other than the same COUNT(DISTINCT foo) call, you can't do this optimization. Perhaps that's detected by the context.getAggregationManager().isEmpty() call? If so, I think that check would also block the optimization when the HAVING clause uses the same COUNT(DISTINCT) call. Some examples:
{code}
SELECT COUNT(DISTINCT pk) FROM T HAVING COUNT(DISTINCT pk) > 10; -- should optimize
SELECT COUNT(DISTINCT pk) FROM T HAVING COUNT(other_col) > 10;   -- can't optimize
{code}
Same as above: I believe the optimization just wouldn't be applied when it can't be, which is OK.

> Use DistinctPrefixFilter logic for COUNT(DISTINCT ...) and COUNT(...) GROUP BY
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2965
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2965
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 4.8.0
>
>         Attachments: 2965-v10.txt, 2965-v2.txt, 2965-v3.txt, 2965-v4.txt,
> 2965-v5.txt, 2965-v6.txt, 2965-v7.txt, 2965-v8.txt, 2965-v9.txt, 2965.txt,
> PHOENIX-2965_wip.patch
>
>
> The parent issue uses skip scanning to optimize DISTINCT and certain GROUP BY
> operations along the row key.
> COUNT queries are optimized differently and could be sped up significantly as
> well.
> [~giacomotaylor], I might need help finding where COUNT(DISTINCT) queries are
> planned and optimized.
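A minimal Java sketch of the visitor-based check suggested in the comment above. It is not Phoenix code: Node, Column, Literal, Aggregate, BinaryOp, TraverseAllVisitor, naiveCheck and canOptimize are hypothetical names on a toy expression tree, standing in for Phoenix's ParseNode classes and StatelessTraverseAllParseNodeVisitor. It assumes the rule "every aggregate in SELECT and HAVING must be the same COUNT(DISTINCT x) call" and deliberately ignores the row-key prefix requirement that DistinctPrefixFilter would also need.

```java
import java.util.ArrayList;
import java.util.List;

public class CountDistinctCheck {

    // Hypothetical mini parse tree; NOT Phoenix's ParseNode hierarchy.
    interface Node {
        default List<Node> children() { return List.of(); }
    }

    record Column(String name) implements Node {}

    record Literal(long value) implements Node {}

    // An aggregate call such as COUNT(DISTINCT foo); record equality compares fn, distinct and arg.
    record Aggregate(String fn, boolean distinct, Node arg) implements Node {
        @Override public List<Node> children() { return List.of(arg); }
    }

    // A binary expression such as "x / 10" or "x > 10".
    record BinaryOp(String op, Node lhs, Node rhs) implements Node {
        @Override public List<Node> children() { return List.of(lhs, rhs); }
    }

    // Hypothetical "traverse all" visitor: visits every node depth first; subclasses override visit().
    static class TraverseAllVisitor {
        final void traverse(Node n) {
            visit(n);
            for (Node child : n.children()) {
                traverse(child);
            }
        }
        void visit(Node n) {}
    }

    // Collects every aggregate call, however deeply it is nested in an expression.
    static class AggregateCollector extends TraverseAllVisitor {
        final List<Aggregate> found = new ArrayList<>();
        @Override void visit(Node n) {
            if (n instanceof Aggregate a) {
                found.add(a);
            }
        }
    }

    // The "too simplistic" check: only accepts a bare top-level COUNT(DISTINCT ...).
    static boolean naiveCheck(List<? extends Node> select) {
        return select.size() == 1
                && select.get(0) instanceof Aggregate a
                && a.fn().equals("COUNT") && a.distinct();
    }

    // Optimize only if every aggregate in SELECT and HAVING is the same COUNT(DISTINCT x) call.
    static boolean canOptimize(List<? extends Node> select, Node having) {
        AggregateCollector collector = new AggregateCollector();
        for (Node n : select) {
            collector.traverse(n);
        }
        if (having != null) {
            collector.traverse(having);
        }
        if (collector.found.isEmpty()) {
            return false;
        }
        Aggregate first = collector.found.get(0);
        if (!first.fn().equals("COUNT") || !first.distinct()) {
            return false;
        }
        for (Aggregate a : collector.found) {
            if (!a.equals(first)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node cdFoo = new Aggregate("COUNT", true, new Column("foo"));
        Node cdPk = new Aggregate("COUNT", true, new Column("pk"));
        Node countOther = new Aggregate("COUNT", false, new Column("other_col"));
        Node ten = new Literal(10);

        // SELECT COUNT(DISTINCT foo) / 10 FROM T
        List<Node> q1 = List.of(new BinaryOp("/", cdFoo, ten));
        System.out.println(naiveCheck(q1) + " " + canOptimize(q1, null)); // false true
        // SELECT COUNT(DISTINCT pk) FROM T HAVING COUNT(DISTINCT pk) > 10
        System.out.println(canOptimize(List.of(cdPk), new BinaryOp(">", cdPk, ten))); // true
        // SELECT COUNT(DISTINCT pk) FROM T HAVING COUNT(other_col) > 10
        System.out.println(canOptimize(List.of(cdPk), new BinaryOp(">", countOther, ten))); // false
    }
}
```

Walking the whole tree is what lets the COUNT(DISTINCT foo) / 10 case qualify, whereas naiveCheck only inspects the top-level expression. Comparing every collected aggregate against the first one covers both HAVING examples: a repeated COUNT(DISTINCT pk) still qualifies, while COUNT(other_col) does not.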
-- This message was sent by Atlassian JIRA (v6.3.4#6332)