[jira] [Commented] (IMPALA-4805) Avoid hash exchanges before analytic functions in more situations.

ASF subversion and git services (Jira) Fri, 12 Feb 2021 09:33:06 -0800


    [ 
https://issues.apache.org/jira/browse/IMPALA-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283863#comment-17283863
 ]


ASF subversion and git services commented on IMPALA-4805:
---------------------------------------------------------

Commit 4721978e8fb6d80a9f023e568b983b12b14f8acc in impala's branch 
refs/heads/master from Aman Sinha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4721978 ]

IMPALA-4805: Avoid hash exchange before analytic function if appropriate

This patch avoids adding a hash exchange below an analytic function
that has partition by b as long as the child can satisfy that
requirement through an equivalence relationship, i.e. an exact match
is not required.

For example:
select count(*) over (partition by b) from t1, t2 where a = b

In this case, the analytic sort has a required partitioning on b, but the
child is an inner join whose output partition key could be either 'a' or
'b' (it happens to be 'a' given how the data partition was populated).
We should still be able to use the child's partitioning without
adding a hash exchange. Note that for outer joins the logic is slightly
different.
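
The equivalence check described above can be illustrated with a small
sketch. This is not Impala's planner code: the class and function names
(EquivalenceClasses, needs_hash_exchange) are hypothetical, columns are
plain strings, and the comparison is a deliberately conservative
assumption (both sides must map to the same set of equivalence classes).
It covers only inner-join equalities; the outer-join case mentioned
above is not modeled.

```python
class EquivalenceClasses:
    """Union-find over column names linked by inner-join equality predicates.

    Hypothetical helper for illustration; not part of Impala.
    """

    def __init__(self):
        self._parent = {}

    def find(self, col):
        # Register unseen columns as their own class representative.
        self._parent.setdefault(col, col)
        while self._parent[col] != col:
            # Path halving keeps later lookups short.
            self._parent[col] = self._parent[self._parent[col]]
            col = self._parent[col]
        return col

    def add_equality(self, left, right):
        # Record a predicate such as "a = b" by merging the two classes.
        self._parent[self.find(left)] = self.find(right)


def needs_hash_exchange(required_partition, child_partition, equiv):
    """Return True if the child's partitioning cannot satisfy the requirement.

    Assumption: the requirement is satisfied when both column lists map to
    the same set of equivalence classes, so an exact column match is not
    needed.
    """
    required = {equiv.find(col) for col in required_partition}
    provided = {equiv.find(col) for col in child_partition}
    return required != provided


# select count(*) over (partition by b) from t1, t2 where a = b
equiv = EquivalenceClasses()
equiv.add_equality("a", "b")
print(needs_hash_exchange(["b"], ["a"], equiv))  # child partitioned on a
print(needs_hash_exchange(["b"], ["c"], equiv))  # child partitioned on unrelated c
```

With the join predicate a = b registered, a child partitioned on 'a'
satisfies a requirement on 'b', so the first call reports that no
exchange is needed; the second call, with an unrelated column 'c',
still asks for one.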

Testing:
 - Added a new planner test with analytic function + inner join
   (outer join test case already existed before).

Change-Id: Icb6289d1e70cfb6bbd5b38eedb00856dbc85ac77
Reviewed-on: http://gerrit.cloudera.org:8080/16888
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Avoid hash exchanges before analytic functions in more situations.
> ------------------------------------------------------------------
>
>                 Key: IMPALA-4805
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4805
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.8.0
>            Reporter: Alexander Behm
>            Assignee: Aman Sinha
>            Priority: Major
>              Labels: performance, ramp-up
>
> This case works as expected. There is no hash exchange before 
> sort+analytic:
> {code}
> explain select /* +straight_join */ count(*) over (partition by t1.id)
> from functional.alltypes t1
> inner join /* +shuffle */ functional.alltypes t2
>   on t1.id = t2.id
> +-----------------------------------------------------------+
> | Explain String                                            |
> +-----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=168.01MB VCores=2 |
> |                                                           |
> | PLAN-ROOT SINK                                            |
> | |                                                         |
> | 07:EXCHANGE [UNPARTITIONED]                               |
> | |                                                         |
> | 04:ANALYTIC                                               |
> | |  functions: count(*)                                    |
> | |  partition by: t1.id                                    |
> | |                                                         |
> | 03:SORT                                                   |
> | |  order by: id ASC NULLS FIRST                           |
> | |                                                         |
> | 02:HASH JOIN [INNER JOIN, PARTITIONED]                    |
> | |  hash predicates: t1.id = t2.id                         |
> | |  runtime filters: RF000 <- t2.id                        |
> | |                                                         |
> | |--06:EXCHANGE [HASH(t2.id)]                              |
> | |  |                                                      |
> | |  01:SCAN HDFS [functional.alltypes t2]                  |
> | |     partitions=24/24 files=24 size=478.45KB             |
> | |                                                         |
> | 05:EXCHANGE [HASH(t1.id)]                                 |
> | |                                                         |
> | 00:SCAN HDFS [functional.alltypes t1]                     |
> |    partitions=24/24 files=24 size=478.45KB                |
> |    runtime filters: RF000 -> t1.id                        |
> +-----------------------------------------------------------+
> {code}
> This equivalent case has an unnecessary hash exchange:
> {code}
> explain select /* +straight_join */ count(*) over (partition by t2.id)
> from functional.alltypes t1
> inner join /* +shuffle */ functional.alltypes t2
>   on t1.id = t2.id
> +-----------------------------------------------------------+
> | Explain String                                            |
> +-----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=168.01MB VCores=3 |
> |                                                           |
> | PLAN-ROOT SINK                                            |
> | |                                                         |
> | 08:EXCHANGE [UNPARTITIONED]                               |
> | |                                                         |
> | 04:ANALYTIC                                               |
> | |  functions: count(*)                                    |
> | |  partition by: t2.id                                    |
> | |                                                         |
> | 03:SORT                                                   |
> | |  order by: id ASC NULLS FIRST                           |
> | |                                                         |
> | 07:EXCHANGE [HASH(t2.id)]                                 |
> | |                                                         |
> | 02:HASH JOIN [INNER JOIN, PARTITIONED]                    |
> | |  hash predicates: t1.id = t2.id                         |
> | |  runtime filters: RF000 <- t2.id                        |
> | |                                                         |
> | |--06:EXCHANGE [HASH(t2.id)]                              |
> | |  |                                                      |
> | |  01:SCAN HDFS [functional.alltypes t2]                  |
> | |     partitions=24/24 files=24 size=478.45KB             |
> | |                                                         |
> | 05:EXCHANGE [HASH(t1.id)]                                 |
> | |                                                         |
> | 00:SCAN HDFS [functional.alltypes t1]                     |
> |    partitions=24/24 files=24 size=478.45KB                |
> |    runtime filters: RF000 -> t1.id                        |
> +-----------------------------------------------------------+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
