Ping Lu created HIVE-13265:
------------------------------

             Summary: Query consisting of union all and mapjoin throws exception “Unable to deserialize reduce input key”
                 Key: HIVE-13265
                 URL: https://issues.apache.org/jira/browse/HIVE-13265
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.13.1
         Environment: Hadoop 2.4.0, Hive 0.13.1
            Reporter: Ping Lu


Steps to reproduce:
Preparation: create four test tables and load data.
create table tmp_test1(col1 string);
create table tmp_test2(col1 string);
create table tmp_test3(col1 string, col2 string) row format delimited fields terminated by "\t";
create table tmp_test4(col1 string);
load data local inpath "test3" into table tmp_test1;  -- 6 rows
load data local inpath "test3" into table tmp_test2;  -- 5 rows
load data local inpath "test3" into table tmp_test3;  -- 6 rows
load data local inpath "test4" into table tmp_test4;  -- 3000011 rows, 26670421 bytes (>25M)

Query1: fails with an error during execution
set hive.auto.convert.join=true;
select
    sq.col1,
    count(distinct sq.col2) num
from(
    select
        col1,
        null col2
    from
        tmp_test1
    union all
    select
        col1,
        null col2
    from
        tmp_test2
    union all
    select
        col1,
        col2
    from
        tmp_test3
) sq -- the size of sq is far smaller than 25M
join
    tmp_test4 ta
ON sq.col1 = ta.col1
group by sq.col1;
With hive.auto.convert.join set to true, the join is converted to a MapJoin and sq is chosen as the small table.

Query2: returns the correct result (same query as Query1, but with hive.auto.convert.join=false)
set hive.auto.convert.join=false;
select
    sq.col1,
    count(distinct sq.col2) num
from(
    select
        col1,
        null col2
    from
        tmp_test1
    union all
    select
        col1,
        null col2
    from
        tmp_test2
    union all
    select
        col1,
        col2
    from
        tmp_test3
)sq
join
    tmp_test4 ta
ON sq.col1 = ta.col1
group by sq.col1; 

The execution plan for Query1 is in explain1.txt.
The Hive execution log for Query1 is in execution1.txt.
The execution plan for Query2 is in explain2.txt.
The Hive execution log for Query2 is in execution2.txt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)