[jira] [Commented] (PHOENIX-2965) Use DistinctPrefixFilter logic for COUNT(DISTINCT ...) and COUNT(...) GROUP BY

James Taylor (JIRA) Wed, 08 Jun 2016 14:07:45 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321443#comment-15321443 ]


James Taylor commented on PHOENIX-2965:
---------------------------------------

I'd remove the statement.isDistinct() check here as I don't think it's needed 
and it might even lead to an issue:
{code}
--- a/phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java
+++ b/phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java
@@ -230,7 +230,7 @@ public abstract class BaseResultIterators extends ExplainTable implements Result
                 !plan.getStatement().getHint().hasHint(HintNode.Hint.RANGE_SCAN) &&
                 cols < plan.getTableRef().getTable().getRowKeySchema().getFieldCount() &&
                 plan.getGroupBy().isOrderPreserving() &&
-                (plan.getStatement().isDistinct() || context.getAggregationManager().isEmpty()))
+                (plan.getStatement().isDistinct() || context.getAggregationManager().isEmpty() || plan.getGroupBy().isUngroupedAggregate()))
{code}

One more test to add would be an aggregate query that does a distinct (in which
case plan.getStatement().isDistinct() would be true). In this case, the
distinct is executed by deduping on the client side. I don't think you'd want
to use the optimization here, but it might kick in unless the above is changed.
{code}
SELECT DISTINCT sum(pk2) FROM t GROUP BY pk1;
{code}
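
To make the concern concrete, here is a minimal sketch of the last clause of the condition as plain boolean logic. The class and method names are hypothetical stand-ins, not Phoenix APIs, and the three flag values chosen for the query above are assumptions: isDistinct is true (as stated above), the aggregation manager is assumed non-empty because of sum(pk2), and the query is assumed not to be an ungrouped aggregate because it has a GROUP BY.

```java
// Hypothetical, simplified model of the final clause of the guard in the diff above.
// None of these names exist in Phoenix; the booleans stand in for
// plan.getStatement().isDistinct(), context.getAggregationManager().isEmpty()
// and plan.getGroupBy().isUngroupedAggregate().
final class DistinctPrefixGuard {

    /** Final clause as written in the patch under review. */
    static boolean patchedGuard(boolean isDistinct,
                                boolean aggregatorsEmpty,
                                boolean ungroupedAggregate) {
        return isDistinct || aggregatorsEmpty || ungroupedAggregate;
    }

    /** Final clause with the isDistinct() check removed, as suggested above. */
    static boolean suggestedGuard(boolean aggregatorsEmpty,
                                  boolean ungroupedAggregate) {
        return aggregatorsEmpty || ungroupedAggregate;
    }

    public static void main(String[] args) {
        // Assumed flags for: SELECT DISTINCT sum(pk2) FROM t GROUP BY pk1;
        boolean isDistinct = true;          // DISTINCT keyword present
        boolean aggregatorsEmpty = false;   // sum(pk2) is an aggregator (assumption)
        boolean ungroupedAggregate = false; // query has a GROUP BY (assumption)

        // With the isDistinct() check in place, the clause passes.
        System.out.println("patched guard:   "
                + patchedGuard(isDistinct, aggregatorsEmpty, ungroupedAggregate));
        // Without it, the clause does not pass for this query.
        System.out.println("suggested guard: "
                + suggestedGuard(aggregatorsEmpty, ungroupedAggregate));
    }
}
```

Under the same assumptions, a plain SELECT DISTINCT pk1 FROM t has no aggregators, so suggestedGuard(true, false) still passes, which would be consistent with the isDistinct() check not being needed. The remaining conditions in the real guard (hint, column count, order preservation) are left out of this sketch.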

> Use DistinctPrefixFilter logic for COUNT(DISTINCT ...) and COUNT(...) GROUP BY
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2965
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2965
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Lars Hofhansl
>             Fix For: 4.8.0
>
>         Attachments: 2965-v2.txt, 2965-v3.txt, 2965-v4.txt, 2965-v5.txt, 2965.txt
>
>
> Parent uses skip scanning to optimize DISTINCT and certain GROUP BY 
> operations along the row key.
> COUNT queries are optimized differently and could be sped up significantly as
> well.
> [~giacomotaylor], I might need some help with where COUNT(DISTINCT) queries are
> planned and optimized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
