[ 
https://issues.apache.org/jira/browse/HIVE-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srini Pindi updated HIVE-11034:
-------------------------------
    Description: 
Join between one main table with other tables with different join columns 
returns wrong results in hive. Changing the order of the joins between main 
table and other tables is producing different results.

Here are the steps to reproduce the issue:
1. Create tables as follows:
    create table p(ck string, email string);
    create table a1(ck string, flag string);
    create table a2(email string, flag string);
    create table a3(ck string, flag string);
2. Load data into the tables as follows:
    P
    ||ck||email||
    |10|e10|
    |20|e20|
    |30|e30|
    |40|e40|
 
    A1
    ||ck||flag||
    |10|N|
    |20|Y|
    |30|Y|
    |40|Y|

    A2
    ||email||flag||
    |e10|Y|
    |e20|N|
    |e30|Y|
    |e40|Y|
 
    A3
    ||ck||flag||
    |10|Y|
    |20|Y|
    |30|N|
    |40|Y|
  
 3. Good query:
{panel}
select p.ck 
from p 
left outer join a1 on p.ck = a1.ck 
left outer join a3 on p.ck = a3.ck 
left outer join a2 on p.email = a2.email 
where a1.flag = 'Y'
  and a3.flag = 'Y'
  and a2.flag = 'Y'
;
{panel}

and results are
  40

4. Bad query
{panel}
select p.ck 
from p 
left outer join a1 on p.ck = a1.ck 
left outer join a2 on p.email = a2.email 
left outer join a3 on p.ck = a3.ck 
where a1.flag = 'Y'
  and a2.flag = 'Y'
  and a3.flag = 'Y'
;
{panel}

 Producing results as:
 30
 40




  was:
Join between one main table with other tables with different join columns 
returns wrong results in hive. Changing the order of the joins between main 
table and other tables is producing different results.

Here are the steps to reproduce the issue:
1. Create tables as follows:
    create table p(ck string, email string);
    create table a1(ck string, flag string);
    create table a2(email string, flag string);
    create table a3(ck string, flag string);
2. Load data into the tables as follows:
    P
    ||ck||email||
    |10|e10|
    |20|e20|
    |30|e30|
    |40|e40|
 
    A1
    ||ck||flag||
    |10||N|
    |20||Y|
    |30||Y|
    |40||Y|

    A2
    ||email||flag||
    |e10||Y|
    |e20||N|
    |e30||Y|
    |e40||Y|
 
    A3
    ||ck||flag||
    |10||Y|
    |20||Y|
    |30||N|
    |40||Y|
  
 3. Good query:
{panel}
select p.ck 
from p 
left outer join a1 on p.ck = a1.ck 
left outer join a3 on p.ck = a3.ck 
left outer join a2 on p.email = a2.email 
where a1.flag = 'Y'
  and a3.flag = 'Y'
  and a2.flag = 'Y'
;
{panel}

and results are
  40

4. Bad query
{panel}
select p.ck 
from p 
left outer join a1 on p.ck = a1.ck 
left outer join a2 on p.email = a2.email 
left outer join a3 on p.ck = a3.ck 
where a1.flag = 'Y'
  and a2.flag = 'Y'
  and a3.flag = 'Y'
;
{panel}

 Producing results as:
 30
 40





> Joining multiple tables producing different results with different order of 
> join
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-11034
>                 URL: https://issues.apache.org/jira/browse/HIVE-11034
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.0
>         Environment: Linux 2.6.32-279.19.1.el6.x86_64
>            Reporter: Srini Pindi
>            Priority: Critical
>
> Join between one main table with other tables with different join columns 
> returns wrong results in hive. Changing the order of the joins between main 
> table and other tables is producing different results.
> Here are the steps to reproduce the issue:
> 1. Create tables as follows:
>     create table p(ck string, email string);
>     create table a1(ck string, flag string);
>     create table a2(email string, flag string);
>     create table a3(ck string, flag string);
> 2. Load data into the tables as follows:
>     P
>     ||ck||email||
>     |10|e10|
>     |20|e20|
>     |30|e30|
>     |40|e40|
>  
>     A1
>     ||ck||flag||
>     |10|N|
>     |20|Y|
>     |30|Y|
>     |40|Y|
>     A2
>     ||email||flag||
>     |e10|Y|
>     |e20|N|
>     |e30|Y|
>     |e40|Y|
>  
>     A3
>     ||ck||flag||
>     |10|Y|
>     |20|Y|
>     |30|N|
>     |40|Y|
>   
>  3. Good query:
> {panel}
> select p.ck 
> from p 
> left outer join a1 on p.ck = a1.ck 
> left outer join a3 on p.ck = a3.ck 
> left outer join a2 on p.email = a2.email 
> where a1.flag = 'Y'
>   and a3.flag = 'Y'
>   and a2.flag = 'Y'
> ;
> {panel}
> and results are
>   40
> 4. Bad query
> {panel}
> select p.ck 
> from p 
> left outer join a1 on p.ck = a1.ck 
> left outer join a2 on p.email = a2.email 
> left outer join a3 on p.ck = a3.ck 
> where a1.flag = 'Y'
>   and a2.flag = 'Y'
>   and a3.flag = 'Y'
> ;
> {panel}
>  Producing results as:
>  30
>  40



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to