Ping Lu created HIVE-13228:
------------------------------

             Summary: when set hive.auto.convert.join and 
hive.optimize.skewjoin to true, the SELECT query which contains 3-way join is 
not properly executed
                 Key: HIVE-13228
                 URL: https://issues.apache.org/jira/browse/HIVE-13228
             Project: Hive
          Issue Type: Bug
    Affects Versions: 2.0.0, 0.13.1
         Environment: Hadoop2.4.0 Hive0.13.1/2.0.0
            Reporter: Ping Lu
            Assignee: Ping Lu


when set hive.auto.convert.join and hive.optimize.skewjoin to true, the SELECT 
query which contains 3-way join(like the following Query1) is not properly 
executed. According to hive execution logs, the query ran finished without any 
errors, but didn’t return any results. If I set hive.optimize.skewjoin to 
false, or set hive.auto.convert.join to false and  hive.optimize.skewjoin to 
true, the SELECT query can both execute properly and return correct results. 

Steps to reproduce
Prepare: 
create three test tables and load data 
        create table test1(col1 string,col2 string) row format delimited fields 
terminated by "\t";
        create table test2(col1 string,col2 string) row format delimited fields 
terminated by "\t";
        create table test3(col1 string,col2 string) row format delimited fields 
terminated by "\t";
        load data local inpath "test1" into table test1;  //11870882 rows
        load data local inpath "test2" into table test2;  //11870882 rows
        load data local inpath "test3" into table test3;  //11870882 rows

Query1: SELECT query got wrong result
        set hive.optimize.skewjoin = true;
        set hive.auto.convert.join = true;
select
                t1.col1
        from
                test1 t1
        join
                test2 t2
        on t1.col1 = t2.col1
        join
                test3 t3
        on t1.col2 = t3.col1
        ;
Query2: SELECT query got correct result

        set hive.optimize.skewjoin = false;
        set hive.auto.convert.join = true;
select
                t1.col1
        from
                test1 t1
        join
                test2 t2
        on t1.col1 = t2.col1
        join
                test3 t3
        on t1.col2 = t3.col1
        ;

the execute plan for Query1 names explain1.txt .
the hive execution logs for Query1: SELECT statement names execution1.txt .
the execute plan for the Query2 names explain2.txt .
the hive execution logs for Query2 names execution2.txt .




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to