Ping Lu created HIVE-13265:
------------------------------
Summary: Query consisting of union all and mapjoin throws exception “Unable to deserialize reduce input key”
Key: HIVE-13265
URL: https://issues.apache.org/jira/browse/HIVE-13265
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.1
Environment: Hadoop 2.4.0, Hive 0.13.1
Reporter: Ping Lu
Steps to reproduce:
Prepare: create four test tables and load data.
create table tmp_test1(col1 string);
create table tmp_test2(col1 string);
create table tmp_test3(col1 string,col2 string) row format delimited
fields terminated by "\t";
create table tmp_test4(col1 string);
load data local inpath "test3" into table tmp_test1; -- 6 rows
load data local inpath "test3" into table tmp_test2; -- 5 rows
load data local inpath "test3" into table tmp_test3; -- 6 rows
load data local inpath "test4" into table tmp_test4; -- 3000011 rows, 26670421 bytes (> 25M)
Query1: fails with the exception above during execution
set hive.auto.convert.join=true;
select
sq.col1,
count(distinct sq.col2) num
from(
select
col1,
null col2
from
tmp_test1
union all
select
col1,
null col2
from
tmp_test2
union all
select
col1,
col2
from
tmp_test3
)sq -- the size of sq is far smaller than 25M
join
tmp_test4 ta
ON sq.col1 = ta.col1
group by sq.col1;
With hive.auto.convert.join set to true, the join is converted to a MapJoin and
sq is chosen as the small table.
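To check which side Hive picks as the small table, the plan for Query1 can be printed with EXPLAIN. This is a sketch assuming Hive 0.13.x; if the conversion happened, the plan should contain a Map Join Operator. The 25M threshold mentioned above most likely corresponds to hive.mapjoin.smalltable.filesize (default 25000000 bytes), but that mapping is an assumption, not something stated in this report.

```sql
-- Sketch: print the plan Hive chooses for Query1 (assumes Hive 0.13.x).
set hive.auto.convert.join=true;
explain
select
sq.col1,
count(distinct sq.col2) num
from(
select col1, null col2 from tmp_test1
union all
select col1, null col2 from tmp_test2
union all
select col1, col2 from tmp_test3
)sq
join
tmp_test4 ta
ON sq.col1 = ta.col1
group by sq.col1;
```

Running the same statement with hive.auto.convert.join=false should show a common (reduce-side) join instead, which is the plan Query2 uses.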
Query2: the same query with hive.auto.convert.join=false returns the correct result
set hive.auto.convert.join=false;
select
sq.col1,
count(distinct sq.col2) num
from(
select
col1,
null col2
from
tmp_test1
union all
select
col1,
null col2
from
tmp_test2
union all
select
col1,
col2
from
tmp_test3
)sq
join
tmp_test4 ta
ON sq.col1 = ta.col1
group by sq.col1;
The execution plan for Query1 is in explain1.txt.
The Hive execution log for Query1 is in execution1.txt.
The execution plan for Query2 is in explain2.txt.
The Hive execution log for Query2 is in execution2.txt.