Yongzhi Chen created HIVE-10880:
-----------------------------------
Summary: The bucket number is not respected in insert overwrite.
Key: HIVE-10880
URL: https://issues.apache.org/jira/browse/HIVE-10880
Project: Hive
Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Priority: Blocker
When hive.enforce.bucketing is true, INSERT OVERWRITE no longer respects the number of buckets defined for the table in current master and in 1.2: the insert writes fewer files than the table has buckets. This is a regression.
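For context, an enforced bucketed insert should route every row to one of the table's N buckets and write one file per bucket. The Python sketch below illustrates that routing under stated assumptions: `java_style_string_hash` is a hypothetical stand-in (a 31-based rolling hash over UTF-8 bytes with signed 32-bit wraparound), not necessarily Hive's exact string hash. The `(hash & Integer.MAX_VALUE) % numBuckets` step is the usual Java idiom for turning a possibly negative hash into a bucket index.

```python
from collections import defaultdict


def java_style_string_hash(s: str) -> int:
    """31-based rolling hash over UTF-8 bytes, wrapped to a signed 32-bit int.

    Assumption: a stand-in for Hive's string hash, not a copy of it.
    """
    h = 0
    for b in s.encode("utf-8"):
        signed_byte = b - 256 if b > 127 else b  # Java bytes are signed
        h = (h * 31 + signed_byte) & 0xFFFFFFFF  # keep only 32 bits
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed


def bucket_for(key: str, num_buckets: int) -> int:
    # Clear the sign bit, then take the remainder: result is in [0, num_buckets).
    return (java_style_string_hash(key) & 0x7FFFFFFF) % num_buckets


def assign_buckets(rows, num_buckets):
    """Group rows by bucket index; each group would become one output file."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row, num_buckets)].append(row)
    return buckets


rows = [f"firstinsert{i}" for i in range(1, 9)]
buckets = assign_buckets(rows, 2)
for b in sorted(buckets):
    # Hive reducers typically name their outputs 000000_0, 000001_0, ...
    print(f"bucket {b}: {len(buckets[b])} rows")
```

Under this stand-in hash the eight `firstinsert*` rows split 4/4 across the two buckets. Whatever the real hash does, a correct insert into a 2-bucket table is expected to leave one file per bucket, i.e. two files.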
Steps to reproduce:
{noformat}
CREATE TABLE IF NOT EXISTS buckettestinput(
data string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE IF NOT EXISTS buckettestoutput1(
data string
)CLUSTERED BY(data)
INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE IF NOT EXISTS buckettestoutput2(
data string
)CLUSTERED BY(data)
INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Then I inserted the following rows into the buckettestinput table:
firstinsert1
firstinsert2
firstinsert3
firstinsert4
firstinsert5
firstinsert6
firstinsert7
firstinsert8
secondinsert1
secondinsert2
secondinsert3
secondinsert4
secondinsert5
secondinsert6
secondinsert7
secondinsert8
set hive.enforce.bucketing = true;
set hive.enforce.sorting=true;
insert overwrite table buckettestoutput1
select * from buckettestinput where data like 'first%';
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
Error: Error while compiling statement: FAILED: SemanticException [Error
10141]: Bucketed table metadata is not correct. Fix the metadata or don't use
bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of
buckets for table buckettestoutput1 is 2, whereas the number of files is 1
(state=42000,code=10141)
{noformat}
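The SemanticException above comes down to a simple invariant: a bucketed table's directory should hold exactly as many data files as the table has buckets. The Python sketch below checks that invariant against a local directory. `check_bucketed_table` and `count_data_files` are hypothetical helpers, and skipping names that start with `_` or `.` (such as `_SUCCESS` or `.hive-staging_*`) is an assumption modeled on Hadoop's usual hidden-file convention.

```python
import os
import tempfile


def is_hidden(name: str) -> bool:
    # Assumption: skip "_SUCCESS", ".hive-staging_*" and similar entries,
    # as Hadoop input listing usually does for names starting with "_" or ".".
    return name.startswith(("_", "."))


def count_data_files(table_dir: str) -> int:
    """Count visible regular files directly under table_dir."""
    return sum(
        1
        for name in os.listdir(table_dir)
        if not is_hidden(name) and os.path.isfile(os.path.join(table_dir, name))
    )


def check_bucketed_table(table: str, table_dir: str, num_buckets: int) -> None:
    """Hypothetical check mirroring the condition reported as error 10141."""
    num_files = count_data_files(table_dir)
    if num_files != num_buckets:
        raise ValueError(
            f"The number of buckets for table {table} is {num_buckets}, "
            f"whereas the number of files is {num_files}"
        )


# Simulate the buggy result: a 2-bucket table that received a single file.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "000000_0"), "w").close()
    try:
        check_bucketed_table("buckettestoutput1", demo_dir, 2)
    except ValueError as err:
        print(err)
```

Checking file count rather than file contents keeps the sketch close to what the error message reports; it does not verify that rows actually landed in the right bucket.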
Debug output from the insert overwrite:
{noformat}
0: jdbc:hive2://localhost:10000> insert overwrite table buckettestoutput1
select * from buckettestinput where data like 'first%';
INFO : Number of reduce tasks determined at compile time: 2
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapred.reduce.tasks=<number>
INFO : Job running in-process (local Hadoop)
INFO : 2015-06-01 11:09:29,650 Stage-1 map = 86%, reduce = 100%
INFO : Ended Job = job_local107155352_0001
INFO : Loading data to table default.buckettestoutput1 from file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-10000
INFO : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, totalSize=52, rawDataSize=48]
No rows affected (1.692 seconds)
{noformat}
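The INFO lines already expose the symptom: the compiler planned 2 reduce tasks (one per bucket), yet the table stats report numFiles=1. The Python sketch below is a hypothetical triage helper that extracts both numbers from Beeline output. The regular expressions assume the exact message formats in this log and may not match other Hive versions.

```python
import re

# Two lines copied from the log in this report.
SAMPLE_LOG = """\
INFO : Number of reduce tasks determined at compile time: 2
INFO : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, totalSize=52, rawDataSize=48]
"""

REDUCERS_RE = re.compile(r"Number of reduce tasks determined at compile time: (\d+)")
NUM_FILES_RE = re.compile(r"numFiles=(\d+)")


def parse_insert_log(log: str) -> tuple[int, int]:
    """Return (planned reduce tasks, numFiles from table stats)."""
    reducers = REDUCERS_RE.search(log)
    files = NUM_FILES_RE.search(log)
    if reducers is None or files is None:
        raise ValueError("expected INFO lines not found in log")
    return int(reducers.group(1)), int(files.group(1))


def bucket_files_missing(log: str, num_buckets: int) -> bool:
    # With enforced bucketing, each bucket should end up as its own file.
    _, num_files = parse_insert_log(log)
    return num_files != num_buckets


reducers, num_files = parse_insert_log(SAMPLE_LOG)
print(f"reducers={reducers} numFiles={num_files} "
      f"mismatch={bucket_files_missing(SAMPLE_LOG, 2)}")
```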