[
https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kamil Gorlo updated HIVE-9146:
--
Description:
I have two queries which should be equal (I only swap two join conditions) but
they are not. They are simplest queries I could produce to reproduce bug.
I have two simple tables:
desc kgorlo_comm;
| col_name | data_type | comment |
| id| bigint | |
| dest_id | bigint | |
desc kgorlo_log;
| col_name | data_type | comment |
| id| bigint | |
| dest_id | bigint | |
| tstamp| bigint | |
With data:
select * from kgorlo_comm;
| kgorlo_comm.id | kgorlo_comm.dest_id |
| 1 | 2|
| 2 | 1|
| 1 | 3|
| 2 | 3|
| 3 | 5|
| 4 | 5|
select * from kgorlo_log;
| kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 5 | 0 |
| 3 | 1 | 0 |
And when I run this query (query no. 1):
{quote}
select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group
by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group
by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id;
{quote}
I get result (which is correct):
| log.id | log.dest_id | com1.msgs | com2.msgs |
| 1 | 2| 1 | 1 |
| 1 | 3| 1 | NULL |
| 1 | 5| NULL | NULL |
| 3 | 1| NULL | 1 |
But when I run second query (query no. 2):
{quote}
select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group
by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group
by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id;
{quote}
I get different (and bad, in my opinion) result:
|log.id | log.dest_id | com1.msgs | com2.msgs|
|1|2|1|1|
|1|3|1|1|
|1|5|NULL|NULL|
|3|1|NULL|NULL|
Query no. 1 and query no. 2 are different in only one place, it is second join
condition:
bf. com2.dest_id=log.id and com2.id=log.dest_id
vs
bf. com2.id=log.dest_id and com2.dest_id=log.id
which in my opinion are equal.
Explains for both queries are of course slightly different (columns are
swapped) and they are here:
https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good)
https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad)
was:
I have two queries which should be equal (I only swap two join conditions) but
they are not. They are simplest queries I could produce to reproduce bug.
I have two simple tables:
desc kgorlo_comm;
| col_name | data_type | comment |
| id| bigint | |
| dest_id | bigint | |
desc kgorlo_log;
| col_name | data_type | comment |
| id| bigint | |
| dest_id | bigint | |
| tstamp| bigint | |
With data:
select * from kgorlo_comm;
| kgorlo_comm.id | kgorlo_comm.dest_id |
| 1 | 2|
| 2 | 1|
| 1 | 3|
| 2 | 3|
| 3 | 5|
| 4 | 5|
select * from kgorlo_log;
| kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 5 | 0 |
| 3 | 1 | 0 |
And when I run this query (query no. 1):
{quote}
select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group
by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group
by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id;
{quote}
I get result (which is correct):
| log.id | log.dest_id | com1.msgs | com2.msgs |
| 1 | 2| 1 | 1 |
| 1 | 3| 1 | NULL |
| 1 | 5| NULL | NULL |
| 3 | 1| NU