[ https://issues.apache.org/jira/browse/IMPALA-8386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang updated IMPALA-8386: ----------------------------------- Description: skyyws <sky...@163.com> reported a bug [in the mailing list|https://lists.apache.org/thread.html/0bdbbaa6bb35b552f050ae30587b7d75b78a72ec60007a8bc0a4a8a9@%3Cdev.impala.apache.org%3E] on the following data set: {code:java} table A +------+ | a_id | +------+ | 1 | | 2 | +------+ table B +------+--------+ | b_id | amount | +------+--------+ | 1 | 10 | | 1 | 20 | | 2 | NULL | +------+--------+ table C +------+------+ | a_id | b_id | +------+------+ | 1 | 1 | | 2 | 2 | +------+------+{code} The following query returns a wrong result "1": {code:java} select count(1) from ( select t2.a_id,t2.amount1,t2.amount2 from( select a_id from a) t1 left outer join ( select c.a_id,sum(amount) as amount1,sum(amount) as amount2 from b join c on b.b_id = c.b_id group by c.a_id) t2 on t1.a_id = t2.a_id ) t; {code} Removing "t2.amount2" can get the right result "2": {code:java} select count(1) from ( select t2.a_id,t2.amount1 from( select a_id from a) t1 left outer join ( select c.a_id,sum(amount) as amount1,sum(amount) as amount2 from b join c on b.b_id = c.b_id group by c.a_id) t2 on t1.a_id = t2.a_id ) t; {code} The problem is that in query 1, Impala generates a wrong predicate "sum(amount) = sum(amount)" which rejects nulls: {code:java} +-------------------------------------------------------------+ | Explain String | +-------------------------------------------------------------+ | Max Per-Host Resource Reservation: Memory=10.84MB Threads=8 | | Per-Host Resource Estimates: Memory=143MB | | Codegen disabled by planner | | | | PLAN-ROOT SINK | | | | | 12:AGGREGATE [FINALIZE] | | | output: count:merge(*) | | | row-size=8B cardinality=1 | | | | | 11:EXCHANGE [UNPARTITIONED] | | | | | 06:AGGREGATE | | | output: count(*) | | | row-size=8B cardinality=1 | | | | | 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | | | hash predicates: c.a_id = a_id | | | other predicates: sum(amount) = sum(amount) <---------- Wrong inferred predicate which incorrectly reject nulls | | runtime filters: RF000 <- a_id | | | row-size=16B cardinality=2 | | | | | |--10:EXCHANGE [HASH(a_id)] | | | | | | | 00:SCAN HDFS [default.a] | | | partitions=1/1 files=1 size=4B | | | row-size=4B cardinality=2 | | | | | 09:AGGREGATE [FINALIZE] | | | output: sum:merge(amount) | | | group by: c.a_id | | | row-size=12B cardinality=2 | | | | | 08:EXCHANGE [HASH(c.a_id)] | | | | | 04:AGGREGATE [STREAMING] | | | output: sum(amount) | | | group by: c.a_id | | | row-size=12B cardinality=2 | | | | | 03:HASH JOIN [INNER JOIN, BROADCAST] | | | hash predicates: b.b_id = c.b_id | | | runtime filters: RF002 <- c.b_id | | | row-size=16B cardinality=3 | | | | | |--07:EXCHANGE [BROADCAST] | | | | | | | 02:SCAN HDFS [default.c] | | | partitions=1/1 files=1 size=8B | | | runtime filters: RF000 -> default.c.a_id | | | row-size=8B cardinality=2 | | | | | 01:SCAN HDFS [default.b] | | partitions=1/1 files=1 size=15B | | runtime filters: RF002 -> b.b_id | | row-size=8B cardinality=3 | +-------------------------------------------------------------+ {code} was: skyyws <sky...@163.com> reported a bug [in the mailing list|https://lists.apache.org/thread.html/0bdbbaa6bb35b552f050ae30587b7d75b78a72ec60007a8bc0a4a8a9@%3Cdev.impala.apache.org%3E] on the following data set: {code:java} table A +------+ | a_id | +------+ | 1 | | 2 | +------+ table B +------+--------+ | b_id | amount | +------+--------+ | 1 | 10 | | 1 | 20 | | 2 | NULL | +------+--------+ table C +------+------+ | a_id | b_id | +------+------+ | 1 | 1 | | 2 | 2 | +------+------+{code} The following query returns a wrong result "1": {code:java} select count(1) from ( select t2.a_id,t2.amount1,t2.amount2 from( select a_id from a) t1 left outer join ( select c.a_id,sum(amount) as amount1,sum(amount) as amount2 from b join c on b.b_id = c.b_id group by c.a_id) t2 on t1.a_id = t2.a_id ) t; {code} Removing "t2.amount2" can get the right result "2": {code:java} select count(1) from ( select t2.a_id,t2.amount1 from( select a_id from a) t1 left outer join ( select c.a_id,sum(amount) as amount1,sum(amount) as amount2 from b join c on b.b_id = c.b_id group by c.a_id) t2 on t1.a_id = t2.a_id ) t; {code} > Incorrect predicate in a left outer join query > ---------------------------------------------- > > Key: IMPALA-8386 > URL: https://issues.apache.org/jira/browse/IMPALA-8386 > Project: IMPALA > Issue Type: Bug > Components: Frontend > Affects Versions: Impala 2.12.0, Impala 3.2.0 > Reporter: Quanlong Huang > Assignee: Quanlong Huang > Priority: Critical > Labels: correctness > > skyyws <sky...@163.com> reported a bug [in the mailing > list|https://lists.apache.org/thread.html/0bdbbaa6bb35b552f050ae30587b7d75b78a72ec60007a8bc0a4a8a9@%3Cdev.impala.apache.org%3E] > on the following data set: > {code:java} > table A > +------+ > | a_id | > +------+ > | 1 | > | 2 | > +------+ > table B > +------+--------+ > | b_id | amount | > +------+--------+ > | 1 | 10 | > | 1 | 20 | > | 2 | NULL | > +------+--------+ > table C > +------+------+ > | a_id | b_id | > +------+------+ > | 1 | 1 | > | 2 | 2 | > +------+------+{code} > The following query returns a wrong result "1": > {code:java} > select count(1) from ( > select t2.a_id,t2.amount1,t2.amount2 > from( select a_id from a) t1 > left outer join ( > select c.a_id,sum(amount) as amount1,sum(amount) as amount2 > from b join c on b.b_id = c.b_id group by c.a_id) t2 > on t1.a_id = t2.a_id > ) t; > {code} > Removing "t2.amount2" can get the right result "2": > {code:java} > select count(1) from ( > select t2.a_id,t2.amount1 > from( select a_id from a) t1 > left outer join ( > select c.a_id,sum(amount) as amount1,sum(amount) as amount2 > from b join c on b.b_id = c.b_id group by c.a_id) t2 > on t1.a_id = t2.a_id > ) t; > {code} > The problem is that in query 1, Impala generates a wrong predicate > "sum(amount) = sum(amount)" which rejects nulls: > {code:java} > +-------------------------------------------------------------+ > | Explain String | > +-------------------------------------------------------------+ > | Max Per-Host Resource Reservation: Memory=10.84MB Threads=8 | > | Per-Host Resource Estimates: Memory=143MB | > | Codegen disabled by planner | > | | > | PLAN-ROOT SINK | > | | | > | 12:AGGREGATE [FINALIZE] | > | | output: count:merge(*) | > | | row-size=8B cardinality=1 | > | | | > | 11:EXCHANGE [UNPARTITIONED] | > | | | > | 06:AGGREGATE | > | | output: count(*) | > | | row-size=8B cardinality=1 | > | | | > | 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | > | | hash predicates: c.a_id = a_id | > | | other predicates: sum(amount) = sum(amount) <---------- Wrong > inferred predicate which incorrectly reject nulls > | | runtime filters: RF000 <- a_id | > | | row-size=16B cardinality=2 | > | | | > | |--10:EXCHANGE [HASH(a_id)] | > | | | | > | | 00:SCAN HDFS [default.a] | > | | partitions=1/1 files=1 size=4B | > | | row-size=4B cardinality=2 | > | | | > | 09:AGGREGATE [FINALIZE] | > | | output: sum:merge(amount) | > | | group by: c.a_id | > | | row-size=12B cardinality=2 | > | | | > | 08:EXCHANGE [HASH(c.a_id)] | > | | | > | 04:AGGREGATE [STREAMING] | > | | output: sum(amount) | > | | group by: c.a_id | > | | row-size=12B cardinality=2 | > | | | > | 03:HASH JOIN [INNER JOIN, BROADCAST] | > | | hash predicates: b.b_id = c.b_id | > | | runtime filters: RF002 <- c.b_id | > | | row-size=16B cardinality=3 | > | | | > | |--07:EXCHANGE [BROADCAST] | > | | | | > | | 02:SCAN HDFS [default.c] | > | | partitions=1/1 files=1 size=8B | > | | runtime filters: RF000 -> default.c.a_id | > | | row-size=8B cardinality=2 | > | | | > | 01:SCAN HDFS [default.b] | > | partitions=1/1 files=1 size=15B | > | runtime filters: RF002 -> b.b_id | > | row-size=8B cardinality=3 | > +-------------------------------------------------------------+ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org