[ https://issues.apache.org/jira/browse/IMPALA-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Sinha resolved IMPALA-4805. -------------------------------- Fix Version/s: Impala 4.0 Resolution: Fixed > Avoid hash exchanges before analytic functions in more situations. > ------------------------------------------------------------------ > > Key: IMPALA-4805 > URL: https://issues.apache.org/jira/browse/IMPALA-4805 > Project: IMPALA > Issue Type: Improvement > Components: Frontend > Affects Versions: Impala 2.8.0 > Reporter: Alexander Behm > Assignee: Aman Sinha > Priority: Major > Labels: performance, ramp-up > Fix For: Impala 4.0 > > > This case works as expected. There is no no hash exchange before > sort+analytic: > {code} > explain select /* +straight_join */ count(*) over (partition by t1.id) > from functional.alltypes t1 > inner join /* +shuffle */ functional.alltypes t2 > on t1.id = t2.id > +-----------------------------------------------------------+ > | Explain String | > +-----------------------------------------------------------+ > | Estimated Per-Host Requirements: Memory=168.01MB VCores=2 | > | | > | PLAN-ROOT SINK | > | | | > | 07:EXCHANGE [UNPARTITIONED] | > | | | > | 04:ANALYTIC | > | | functions: count(*) | > | | partition by: t1.id | > | | | > | 03:SORT | > | | order by: id ASC NULLS FIRST | > | | | > | 02:HASH JOIN [INNER JOIN, PARTITIONED] | > | | hash predicates: t1.id = t2.id | > | | runtime filters: RF000 <- t2.id | > | | | > | |--06:EXCHANGE [HASH(t2.id)] | > | | | | > | | 01:SCAN HDFS [functional.alltypes t2] | > | | partitions=24/24 files=24 size=478.45KB | > | | | > | 05:EXCHANGE [HASH(t1.id)] | > | | | > | 00:SCAN HDFS [functional.alltypes t1] | > | partitions=24/24 files=24 size=478.45KB | > | runtime filters: RF000 -> t1.id | > +-----------------------------------------------------------+ > {code} > This equivalent case has an unnecessary hash exchange: > {code} > explain select /* +straight_join */ count(*) over (partition by t2.id) > from functional.alltypes t1 > inner join /* +shuffle */ functional.alltypes t2 > on t1.id = t2.id > +-----------------------------------------------------------+ > | Explain String | > +-----------------------------------------------------------+ > | Estimated Per-Host Requirements: Memory=168.01MB VCores=3 | > | | > | PLAN-ROOT SINK | > | | | > | 08:EXCHANGE [UNPARTITIONED] | > | | | > | 04:ANALYTIC | > | | functions: count(*) | > | | partition by: t2.id | > | | | > | 03:SORT | > | | order by: id ASC NULLS FIRST | > | | | > | 07:EXCHANGE [HASH(t2.id)] | > | | | > | 02:HASH JOIN [INNER JOIN, PARTITIONED] | > | | hash predicates: t1.id = t2.id | > | | runtime filters: RF000 <- t2.id | > | | | > | |--06:EXCHANGE [HASH(t2.id)] | > | | | | > | | 01:SCAN HDFS [functional.alltypes t2] | > | | partitions=24/24 files=24 size=478.45KB | > | | | > | 05:EXCHANGE [HASH(t1.id)] | > | | | > | 00:SCAN HDFS [functional.alltypes t1] | > | partitions=24/24 files=24 size=478.45KB | > | runtime filters: RF000 -> t1.id | > +-----------------------------------------------------------+ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)