[ 
https://issues.apache.org/jira/browse/PHOENIX-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321286#comment-15321286
 ] 

James Taylor commented on PHOENIX-2965:
---------------------------------------

Good stuff, [~lhofhansl]. More for my understanding, but a couple of questions:
- For these queries, do we end up with the DistinctCountAggregateFunction in 
the GROUP BY expressions? Or only the second one?
{code}
+        testCount("SELECT %s COUNT(DISTINCT prefix1) FROM " + testTable, 4);
+        testCount("SELECT COUNT(*) FROM (SELECT %s DISTINCT prefix1, prefix2 
FROM " + testTable + ")", 11);
{code}
And is that why you need the following change?
{code}
@@ -325,7 +346,7 @@ public class GroupByCompiler {
             ParseNode node = groupByNodes.get(i);
             Expression expression = node.accept(compiler);
             if (!expression.isStateless()) {
-                if (compiler.isAggregate()) {
+                if (!isUngroupedAggregate && compiler.isAggregate()) {
                     throw new 
SQLExceptionInfo.Builder(SQLExceptionCode.AGGREGATE_IN_GROUP_BY)
                         
.setMessage(expression.toString()).build().buildException();
                 }
{code}
- What about the corner case of queries that have multiple COUNT(DISTINCT) 
aggregations? Not clear that the optimization would work since we've got a 
single scan running but need to find distinct values along more than one axis. 
Like in this contrived example:
{code}
SELECT COUNT(DISTINCT pk1), COUNT(DISTINCT pk2) FROM t WHERE pk1='foo'
{code}
Or slightly less contrived (and perhaps not working due to the RVC bug I think 
you found):
{code}
SELECT COUNT(DISTINCT pk1), COUNT(DISTINCT (pk1,pk2)) FROM t
{code}
- Do we need the check for the statement being distinct here, as at this point 
it's already been compiled to be an aggregation?
{code}
+++ 
b/phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java
@@ -230,7 +230,7 @@ public abstract class BaseResultIterators extends 
ExplainTable implements Result
                 
!plan.getStatement().getHint().hasHint(HintNode.Hint.RANGE_SCAN) &&
                 cols < 
plan.getTableRef().getTable().getRowKeySchema().getFieldCount() &&
                 plan.getGroupBy().isOrderPreserving() && 
-                (plan.getStatement().isDistinct() || 
context.getAggregationManager().isEmpty()))
+                (plan.getStatement().isDistinct() || 
context.getAggregationManager().isEmpty() || 
plan.getGroupBy().isUngroupedAggregate()))
{code}
- Does the above check ever pass when it shouldn't? For example, the following 
query cannot use the optimization since it needs to count every row, but would 
it pass that check? 
{code}
SELECT COUNT(pk1), COUNT(DISTINCT pk1) FROM t;
{code}
Might be good to have a QueryCompilerTest (similar to 
testNotGroupByOrderPreserving) that confirms that the optimization isn't used 
when it shouldn't be.

> Use DistinctPrefixFilter logic for COUNT(DISTINCT ...) and COUNT(...) GROUP BY
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2965
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2965
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Lars Hofhansl
>             Fix For: 4.8.0
>
>         Attachments: 2965-v2.txt, 2965-v3.txt, 2965.txt
>
>
> Parent uses skip scanning to optimize DISTINCT and certain GROUP BY 
> operations along the row key.
> COUNT queries are optimized differently, could be sped up significantly as 
> well.
> [~giacomotaylor], I might need to help into where COUNT(DISTINCT) queries are 
> planned and optimized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to