James Taylor created PHOENIX-2988:
-------------------------------------

             Summary: Replace COUNT(DISTINCT...) with COUNT(...) when possible
                 Key: PHOENIX-2988
                 URL: https://issues.apache.org/jira/browse/PHOENIX-2988
             Project: Phoenix
          Issue Type: Sub-task
            Reporter: James Taylor


An optimization that would really benefit the SELECT COUNT(DISTINCT pkCol) 
case: if there's only a single COUNT(DISTINCT pkCol) and the GroupBy ends up 
being order preserving, you can replace the COUNT(DISTINCT pkCol) with a 
COUNT(pkCol) in the select expression nodes. That prevents the 
DistinctValueWithCountServerAggregator, which keeps a Map of all unique 
values, from being used. Instead we just keep a single overall count, which 
is all we need thanks to your DistinctPrefixFilter.
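
To illustrate why the plain count is enough, here is a minimal sketch in plain Java. It is not Phoenix code: the class and method names are hypothetical, and firstOfEachRun only mimics the effect described above (each distinct value reaching the aggregator once when rows arrive in key order).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Phoenix.
final class DistinctCountSketch {

    // Map-based approach: memory grows with the number of unique values.
    static long countDistinctWithMap(List<String> values) {
        Map<String, Integer> seen = new HashMap<>();
        for (String v : values) {
            seen.merge(v, 1, Integer::sum);
        }
        return seen.size();
    }

    // Stand-in for a distinct-prefix style filter: on sorted input, keep
    // only the first row of each run of equal values.
    static List<String> firstOfEachRun(List<String> sorted) {
        List<String> out = new ArrayList<>();
        String previous = null;
        for (String v : sorted) {
            if (!v.equals(previous)) {
                out.add(v);
            }
            previous = v;
        }
        return out;
    }

    // Plain COUNT: a single running counter, no per-value state.
    static long count(List<String> values) {
        long n = 0;
        for (String ignored : values) {
            n++;
        }
        return n;
    }
}
```

On sorted input, count(firstOfEachRun(rows)) and countDistinctWithMap(rows) agree, but only the second holds every unique value in memory.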

A few considerations in the implementation:
* Pass select through in the call to groupBy.compile() in QueryCompiler and 
replace the COUNT(DISTINCT ...) in place.
* The same replacements need to be done for the HAVING clause. We have a 
ParseNodeRewriter that'll help with that. You could do that by creating a 
derived class, overriding the {{visitLeave(final FunctionParseNode node, 
List<ParseNode> nodes)}} method to return a new COUNT parse node with the 
{{nodes}} passed in as children if {{node}} equals the DistinctCountParseNode 
that you replaced in the select statement.
* The compilation of the HAVING clause should be moved after the call to 
groupBy.compile() in QueryCompiler, like this:
{code}
        groupBy = groupBy.compile(context, select, innerPlanTupleProjector);
        Expression having = HavingCompiler.compile(context, select, groupBy);
{code}
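
The HAVING rewrite described in the second bullet could look roughly like the sketch below. It is self-contained and does not use the real ParseNodeRewriter API: FnNode, TreeRewriter and DistinctCountRewriter are hypothetical stand-ins that only mirror the shape of the idea (override visitLeave and return a COUNT node with the already-rewritten children when the node equals the one replaced in the select).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a function parse node; not a Phoenix class.
final class FnNode {
    final String name;
    final List<FnNode> children;

    FnNode(String name, FnNode... children) {
        this(name, Arrays.asList(children));
    }

    FnNode(String name, List<FnNode> children) {
        this.name = name;
        this.children = children;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FnNode)) {
            return false;
        }
        FnNode other = (FnNode) o;
        return name.equals(other.name) && children.equals(other.children);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, children);
    }

    @Override
    public String toString() {
        if (children.isEmpty()) {
            return name;
        }
        StringBuilder sb = new StringBuilder(name).append('(');
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(children.get(i));
        }
        return sb.append(')').toString();
    }
}

// Hypothetical base rewriter: rebuilds the tree bottom-up.
class TreeRewriter {
    final FnNode rewrite(FnNode node) {
        List<FnNode> rewritten = new ArrayList<>();
        for (FnNode child : node.children) {
            rewritten.add(rewrite(child));
        }
        return visitLeave(node, rewritten);
    }

    protected FnNode visitLeave(FnNode node, List<FnNode> nodes) {
        return new FnNode(node.name, nodes);
    }
}

// Swaps the one DISTINCT_COUNT node that was replaced in the select.
final class DistinctCountRewriter extends TreeRewriter {
    private final FnNode replaced;

    DistinctCountRewriter(FnNode replaced) {
        this.replaced = replaced;
    }

    @Override
    protected FnNode visitLeave(FnNode node, List<FnNode> nodes) {
        if (node.equals(replaced)) {
            return new FnNode("COUNT", nodes);
        }
        return super.visitLeave(node, nodes);
    }
}
```

Calling new DistinctCountRewriter(distinctNode).rewrite(havingRoot) on a HAVING tree leaves every node alone except those equal to distinctNode, which come back as COUNT over the same children.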



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
