[jira] [Commented] (HIVE-7045) Wrong results in multi-table insert aggregating without group by clause

Hive QA (JIRA) Fri, 04 Jul 2014 04:02:46 -0700

    [ https://issues.apache.org/jira/browse/HIVE-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052353#comment-14052353 ]


Hive QA commented on HIVE-7045:
-------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12654053/HIVE-7045.1.patch.txt

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5676 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/679/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/679/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-679/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12654053

> Wrong results in multi-table insert aggregating without group by clause
> -----------------------------------------------------------------------
>
>                 Key: HIVE-7045
>                 URL: https://issues.apache.org/jira/browse/HIVE-7045
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.10.0, 0.12.0
>            Reporter: dima machlin
>            Assignee: Navis
>            Priority: Blocker
>         Attachments: HIVE-7045.1.patch.txt
>
>
> This happens whenever there is more than one reducer.
> The scenario:
> CREATE  TABLE t1 (a int, b int);
> CREATE  TABLE t2 (cnt int) PARTITIONED BY (var_name string);
> insert into table t1 select 1,1 from asd limit 1;
> insert into table t1 select 2,2 from asd limit 1;
> t1 contains :
> 1 1
> 2 2
> from  t1
> insert overwrite table t2 partition(var_name='a') select count(a) cnt 
> insert overwrite table t2 partition(var_name='b') select count(b) cnt ;
> select * from t2;
> returns : 
> 2 a
> 2 b
> as expected.
> Setting the number of reducers higher than 1:
> set mapred.reduce.tasks=2;
> from  t1
> insert overwrite table t2 partition(var_name='a') select count(a) cnt
> insert overwrite table t2 partition(var_name='b') select count(b) cnt;
> select * from t2;
> 1     a
> 1     a
> 1     b
> 1     b
> Wrong results.
> This also happens whenever t1 is big enough for Hive to automatically use
> more than one reducer, without the number being set explicitly.
> Adding "group by 1" at the end of each insert works around the problem:
> from  t1
> insert overwrite table t2 partition(var_name='a') select count(a) cnt group by 1
> insert overwrite table t2 partition(var_name='b') select count(b) cnt group by 1;
> generates : 
> 2 a
> 2 b
> This should work without the group by.
> The number of rows written to each partition equals the number of reducers:
> each reducer writes its own subtotal of the count instead of one combined total.
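
The behaviour described in the report (one subtotal row per reducer) can be illustrated with a toy model. This is a minimal Python sketch, not Hive's actual execution plan: the round-robin row distribution and all function names are assumptions made only for illustration.

```python
def partial_counts(rows, column, num_reducers):
    """Count non-null values of `column` separately in each simulated reducer."""
    counts = [0] * num_reducers
    for i, row in enumerate(rows):
        # Assumption: rows are spread round-robin across reducers.
        if row[column] is not None:
            counts[i % num_reducers] += 1
    return counts


def buggy_insert(rows, column, num_reducers):
    # Models the reported symptom: every reducer writes its own subtotal row.
    return partial_counts(rows, column, num_reducers)


def expected_insert(rows, column, num_reducers):
    # A global aggregate without GROUP BY should yield exactly one row.
    return [sum(partial_counts(rows, column, num_reducers))]


# Same contents as t1 in the report.
t1 = [{"a": 1, "b": 1}, {"a": 2, "b": 2}]

print(buggy_insert(t1, "a", 2))     # [1, 1]
print(expected_insert(t1, "a", 2))  # [2]
```

With a single reducer both functions return `[2]`, which matches the report: the wrong result only shows up once more than one reducer is involved.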



--
This message was sent by Atlassian JIRA
(v6.2#6252)
