dima machlin created HIVE-7045: ---------------------------------- Summary: Wrong results in multi-table insert aggregating without group by clause Key: HIVE-7045 URL: https://issues.apache.org/jira/browse/HIVE-7045 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.10.0 Reporter: dima machlin
The scenario : CREATE TABLE t1 (a int, b int); CREATE TABLE t2 (cnt int) PARTITIONED BY (var_name string); insert into table t1 select 1,1 from asd limit 1; insert into table t1 select 2,2 from asd limit 1; from t1 insert overwrite table t2 partition(var_name='a') select count(a) cnt insert overwrite table t2 partition(var_name='b') select count(b) cnt ; select * from t2; returns : 2 a 2 b as expected. Setting the number of reducers higher than 1 : set mapred.reduce.tasks=2; from t1 insert overwrite table t2 partition(var_name='a') select count(a) cnt insert overwrite table t2 partition(var_name='b') select count(b) cnt; select * from t2; 1 a 1 a 1 b 1 b Wrong results. This happens when ever t1 is big enough to automatically generate more than 1 reducers and without specifying it directly. adding "group by 1" in the end of each insert solves the problem : from t1 insert overwrite table t2 partition(var_name='a') select count(a) cnt group by 1 insert overwrite table t2 partition(var_name='b') select count(b) cnt group by 1; generates : 2 a 2 b This should work without the group by... The number of rows for each partition will be the amount of reducers. Each reducer calculated a sub total of the count. -- This message was sent by Atlassian JIRA (v6.2#6252)