[ 
https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530425#comment-16530425
 ] 

Gerald Sangudi commented on PHOENIX-4751:
-----------------------------------------

Ready for review. Pull request https://github.com/apache/phoenix/pull/308


> Support client-side hash aggregation with SORT_MERGE_JOIN
> ---------------------------------------------------------
>
>                 Key: PHOENIX-4751
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4751
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.14.0, 4.13.1
>            Reporter: Gerald Sangudi
>            Priority: Major
>
> A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash 
> aggregation in some cases, for improved performance.
> When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash 
> aggregation. It instead performs a CLIENT SORT followed by a CLIENT 
> AGGREGATE. The performance can be improved if (a) the GROUP BY output does 
> not need to be sorted, and (b) the GROUP BY input is large enough and has low 
> cardinality.
> The hash aggregation can initially be a hint. Here is an example from Phoenix 
> 4.13.1 that would benefit from hash aggregation if the GROUP BY input is 
> large with low cardinality.
> CREATE TABLE unsalted (
>  keyA BIGINT NOT NULL,
>  keyB BIGINT NOT NULL,
>  val SMALLINT,
>  CONSTRAINT pk PRIMARY KEY (keyA, keyB)
>  );
> EXPLAIN
>  SELECT /*+ USE_SORT_MERGE_JOIN */ 
>  t1.val v1, t2.val v2, COUNT(\*) c 
>  FROM unsalted t1 JOIN unsalted t2 
>  ON (t1.keyA = t2.keyA) 
>  GROUP BY t1.val, t2.val;
>  
> +-------------------------------------------------------------+----------------++------------------+
> |PLAN|EST_BYTES_READ|EST_ROWS_READ| |
> +-------------------------------------------------------------+----------------++------------------+
> |SORT-MERGE-JOIN (INNER) TABLES|null|null| |
> |    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED|null|null| |
> |AND|null|null| |
> |    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED|null|null| |
> |CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]|null|null| |
> |CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]|null|null| |
> +-------------------------------------------------------------+----------------++------------------+



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to