[jira] [Updated] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order

2014-12-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-9146:
---
Component/s: Logical Optimizer

> Query with left joins produces wrong result when join condition is written in 
> different order
> -
>
> Key: HIVE-9146
> URL: https://issues.apache.org/jira/browse/HIVE-9146
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 0.13.1
>Reporter: Kamil Gorlo
>Assignee: Ashutosh Chauhan
> Fix For: 0.14.0
>
>
> I have two queries which should be equal (I only swap two join conditions) 
> but they are not. They are simplest queries I could produce to reproduce bug.
> I have two simple tables:
> desc kgorlo_comm;
> | col_name  | data_type  | comment  |
> | id| bigint |  |
> | dest_id   | bigint |  |
> desc kgorlo_log; 
> | col_name  | data_type  | comment  |
> | id| bigint |  |
> | dest_id   | bigint |  |
> | tstamp| bigint |  |
> With data:
> select * from kgorlo_comm; 
> | kgorlo_comm.id  | kgorlo_comm.dest_id  |
> | 1   | 2|
> | 2   | 1|
> | 1   | 3|
> | 2   | 3|
> | 3   | 5|
> | 4   | 5|
> select * from kgorlo_log; 
> | kgorlo_log.id  | kgorlo_log.dest_id  | kgorlo_log.tstamp  |
> | 1  | 2   | 0  |
> | 1  | 3   | 0  |
> | 1  | 5   | 0  |
> | 3  | 1   | 0  |
> And when I run this query (query no. 1):
> {quote}
> select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
> left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm 
> group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
> left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm 
> group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id;
> {quote}
> I get result (which is correct):
> | log.id  | log.dest_id  | com1.msgs  | com2.msgs  |
> | 1   | 2| 1  | 1  |
> | 1   | 3| 1  | NULL   |
> | 1   | 5| NULL   | NULL   |
> | 3   | 1| NULL   | 1  |
> But when I run second query (query no. 2):
> {quote}
> select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
> left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm 
> group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
> left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm 
> group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id;
> {quote}
> I get different (and bad, in my opinion) result:
> |log.id | log.dest_id | com1.msgs | com2.msgs|
> |1|2|1|1|
> |1|3|1|1|
> |1|5|NULL|NULL|
> |3|1|NULL|NULL|
> Query no. 1 and query no. 2 are different in only one place, it is second 
> join condition:
> bf. com2.dest_id=log.id and com2.id=log.dest_id
> vs
> bf. com2.id=log.dest_id and com2.dest_id=log.id
> which in my opinion are equal.
> Explains for both queries are of course slightly different (columns are 
> swapped) and they are here:
> https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good)
> https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order

2014-12-17 Thread Kamil Gorlo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Gorlo updated HIVE-9146:
--
Description: 
I have two queries which should be equal (I only swap two join conditions) but 
they are not. They are simplest queries I could produce to reproduce bug.

I have two simple tables:

desc kgorlo_comm;
| col_name  | data_type  | comment  |
| id| bigint |  |
| dest_id   | bigint |  |

desc kgorlo_log; 
| col_name  | data_type  | comment  |
| id| bigint |  |
| dest_id   | bigint |  |
| tstamp| bigint |  |

With data:

select * from kgorlo_comm; 
| kgorlo_comm.id  | kgorlo_comm.dest_id  |
| 1   | 2|
| 2   | 1|
| 1   | 3|
| 2   | 3|
| 3   | 5|
| 4   | 5|

select * from kgorlo_log; 
| kgorlo_log.id  | kgorlo_log.dest_id  | kgorlo_log.tstamp  |
| 1  | 2   | 0  |
| 1  | 3   | 0  |
| 1  | 5   | 0  |
| 3  | 1   | 0  |

And when I run this query (query no. 1):
{quote}
select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group 
by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group 
by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id;
{quote}

I get result (which is correct):
| log.id  | log.dest_id  | com1.msgs  | com2.msgs  |
| 1   | 2| 1  | 1  |
| 1   | 3| 1  | NULL   |
| 1   | 5| NULL   | NULL   |
| 3   | 1| NULL   | 1  |

But when I run second query (query no. 2):
{quote}
select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group 
by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group 
by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id;
{quote}

I get different (and bad, in my opinion) result:
|log.id | log.dest_id | com1.msgs | com2.msgs|
|1|2|1|1|
|1|3|1|1|
|1|5|NULL|NULL|
|3|1|NULL|NULL|

Query no. 1 and query no. 2 are different in only one place, it is second join 
condition:
bf. com2.dest_id=log.id and com2.id=log.dest_id
vs
bf. com2.id=log.dest_id and com2.dest_id=log.id

which in my opinion are equal.

Explains for both queries are of course slightly different (columns are 
swapped) and they are here:

https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good)
https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad)

  was:
I have two queries which should be equal (I only swap two join conditions) but 
they are not. They are simplest queries I could produce to reproduce bug.

I have two simple tables:

desc kgorlo_comm;
| col_name  | data_type  | comment  |
| id| bigint |  |
| dest_id   | bigint |  |

desc kgorlo_log; 
| col_name  | data_type  | comment  |
| id| bigint |  |
| dest_id   | bigint |  |
| tstamp| bigint |  |

With data:

select * from kgorlo_comm; 
| kgorlo_comm.id  | kgorlo_comm.dest_id  |
| 1   | 2|
| 2   | 1|
| 1   | 3|
| 2   | 3|
| 3   | 5|
| 4   | 5|

select * from kgorlo_log; 
| kgorlo_log.id  | kgorlo_log.dest_id  | kgorlo_log.tstamp  |
| 1  | 2   | 0  |
| 1  | 3   | 0  |
| 1  | 5   | 0  |
| 3  | 1   | 0  |

And when I run this query (query no. 1):
{quote}
select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group 
by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group 
by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id;
{quote}

I get result (which is correct):
| log.id  | log.dest_id  | com1.msgs  | com2.msgs  |
| 1   | 2| 1  | 1  |
| 1   | 3| 1  | NULL   |
| 1   | 5| NULL   | NULL   |
| 3   | 1| NU