Alexander Behm created IMPALA-6286: -------------------------------------- Summary: Wrong results with outer join and RUNTIME_FILTER_MODE=GLOBAL Key: IMPALA-6286 URL: https://issues.apache.org/jira/browse/IMPALA-6286 Project: IMPALA Issue Type: Bug Components: Frontend Affects Versions: Impala 2.10.0, Impala 2.9.0, Impala 2.8.0, Impala 2.7.0, Impala 2.6.0, Impala 2.5.0 Reporter: Alexander Behm Priority: Blocker
Queries with the following characteristics may produce wrong results due to an incorrectly assigned runtime filter: * The query option RUNTIME_FILTER_MODE is set to GLOBAL * The query has an outer join * A scan on the nullable side of that outer join has a runtime filter with a NULL-checking expression such as COALESCE/IFNULL/CASE * The latter point imples that there is another join above the outer join with a NULL-checking expression in it's join condition Reproduction: {code} select count(*) from functional.alltypestiny t1 left outer join functional.alltypestiny t2 on t1.id = t2.id where coalesce(t2.id + 10, 100) in (select 100) +----------+ | count(*) | +----------+ | 8 | +----------+ {code} We expect a count of 0. A count of 8 is incorrect. Query plan: {code} +---------------------------------------------------------------+ | Explain String | +---------------------------------------------------------------+ | Max Per-Host Resource Reservation: Memory=3.88MB | | Per-Host Resource Estimates: Memory=87.88MB | | Codegen disabled by planner | | | | PLAN-ROOT SINK | | | | | 10:AGGREGATE [FINALIZE] | | | output: count:merge(*) | | | | | 09:EXCHANGE [UNPARTITIONED] | | | | | 05:AGGREGATE | | | output: count(*) | | | | | 04:HASH JOIN [LEFT SEMI JOIN, BROADCAST] | | | hash predicates: coalesce(t2.id + 10, 100) = `$a$1`.`$c$1` | | | runtime filters: RF000 <- `$a$1`.`$c$1` | | | | | |--08:EXCHANGE [BROADCAST] | | | | | | | 02:UNION | | | constant-operands=1 | | | | | 03:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] | | | hash predicates: t1.id = t2.id | | | | | |--07:EXCHANGE [HASH(t2.id)] | | | | | | | 01:SCAN HDFS [functional.alltypestiny t2] | | | partitions=4/4 files=4 size=460B | | | runtime filters: RF000 -> coalesce(t2.id + 10, 100) <--- This runtime filter is not correct | | | | | 06:EXCHANGE [HASH(t1.id)] | | | | | 00:SCAN HDFS [functional.alltypestiny t1] | | partitions=4/4 files=4 size=460B | +---------------------------------------------------------------+ {code} Explanation: * RF000 filters out all rows in the scan * In join 03 there are no join matches since the right-hand is empty. All rows from the right-hand side are nulled. * The join condition in join 04 now satisfies all input rows because every "t2.id" is NULL, so after the COALESCE() the join condition becomes 100 = 100 *Workaround* * Set RUNTIME_FILTER_MODE to LOCAL or OFF -- This message was sent by Atlassian JIRA (v6.4.14#64029)