[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions

2016-01-12 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094401#comment-15094401
 ] 

Pengcheng Xiong commented on HIVE-12788:


[~ctang.ma], i assume i already +1 on review board. if that is not enough, +1 
here. Thanks.

> Setting hive.optimize.union.remove to TRUE will break UNION ALL with 
> aggregate functions
> 
>
> Key: HIVE-12788
> URL: https://issues.apache.org/jira/browse/HIVE-12788
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1
>Reporter: Eric Lin
>Assignee: Chaoyu Tang
> Attachments: HIVE-12788.1.patch, HIVE-12788.patch
>
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:1/default> create table test (a int);
> 0: jdbc:hive2://localhost:1/default> insert overwrite table test values 
> (1);
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:1/default> set 
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> +--+--+
> {code}
> UNION ALL without COUNT function will work as expected:
> {code}
> 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT 
> * FROM test;
> ++--+
> | _u1.a  |
> ++--+
> | 1  |
> | 1  |
> ++--+
> {code}
> Run the same query without setting hive.mapred.supports.subdirectories and 
> hive.optimize.union.remove to true will give correct result:
> {code}
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove;
> +---+--+
> |set|
> +---+--+
> | hive.optimize.union.remove=false  |
> +---+--+
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> | 1|
> | 1|
> +--+--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions

2016-01-11 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092739#comment-15092739
 ] 

Chaoyu Tang commented on HIVE-12788:


[~pxiong] Thanks for reviewing the patch. Yeah, I agree with you that it is 
ideal that we should optimize every subquery whenever possible, and will look 
into this possibility. But I wonder if we can treat it as a further enhancement 
with a separate JIRA so that here we can at least unblock the critical query 
data incorrect issue? Thanks

> Setting hive.optimize.union.remove to TRUE will break UNION ALL with 
> aggregate functions
> 
>
> Key: HIVE-12788
> URL: https://issues.apache.org/jira/browse/HIVE-12788
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1
>Reporter: Eric Lin
>Assignee: Chaoyu Tang
> Attachments: HIVE-12788.1.patch, HIVE-12788.patch
>
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:1/default> create table test (a int);
> 0: jdbc:hive2://localhost:1/default> insert overwrite table test values 
> (1);
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:1/default> set 
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> +--+--+
> {code}
> UNION ALL without COUNT function will work as expected:
> {code}
> 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT 
> * FROM test;
> ++--+
> | _u1.a  |
> ++--+
> | 1  |
> | 1  |
> ++--+
> {code}
> Run the same query without setting hive.mapred.supports.subdirectories and 
> hive.optimize.union.remove to true will give correct result:
> {code}
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove;
> +---+--+
> |set|
> +---+--+
> | hive.optimize.union.remove=false  |
> +---+--+
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> | 1|
> | 1|
> +--+--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions

2016-01-11 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092720#comment-15092720
 ] 

Chaoyu Tang commented on HIVE-12788:


The failed tests seem not related to the patch.

> Setting hive.optimize.union.remove to TRUE will break UNION ALL with 
> aggregate functions
> 
>
> Key: HIVE-12788
> URL: https://issues.apache.org/jira/browse/HIVE-12788
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1
>Reporter: Eric Lin
>Assignee: Chaoyu Tang
> Attachments: HIVE-12788.1.patch, HIVE-12788.patch
>
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:1/default> create table test (a int);
> 0: jdbc:hive2://localhost:1/default> insert overwrite table test values 
> (1);
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:1/default> set 
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> +--+--+
> {code}
> UNION ALL without COUNT function will work as expected:
> {code}
> 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT 
> * FROM test;
> ++--+
> | _u1.a  |
> ++--+
> | 1  |
> | 1  |
> ++--+
> {code}
> Run the same query without setting hive.mapred.supports.subdirectories and 
> hive.optimize.union.remove to true will give correct result:
> {code}
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove;
> +---+--+
> |set|
> +---+--+
> | hive.optimize.union.remove=false  |
> +---+--+
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> | 1|
> | 1|
> +--+--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions

2016-01-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091878#comment-15091878
 ] 

Hive QA commented on HIVE-12788:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12781454/HIVE-12788.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10004 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6587/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6587/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6587/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12781454 - PreCommit-HIVE-TRUNK-Build

> Setting hive.optimize.union.remove to TRUE will break UNION ALL with 
> aggregate functions
> 
>
> Key: HIVE-12788
> URL: https://issues.apache.org/jira/browse/HIVE-12788
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1
>Reporter: Eric Lin
>Assignee: Chaoyu Tang
> Attachments: HIVE-12788.1.patch, HIVE-12788.patch
>
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:1/default> create table test (a int);
> 0: jdbc:hive2://localhost:1/default> insert overwrite table test values 
> (1);
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:1/default> set 
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> +--+--+
> {code}
> UNION ALL without COUNT function will work as expected:
> {code}
> 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT 
> * FROM test;
> ++--+
> | _u1.a  |
> ++--+
> | 1  |
> | 1  |
> ++--+
> {code}
> Run the same query without setting hive.mapred.supports.subdirectories and 
> hive.optimize.union.remove to true will give correct result:
> {code}
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove;
> +---+--+
> |set|
> +---+--+
> | hive.optimize.union.remove=false  |
> +---+--+
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> | 1|
> | 1|
> +--+--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions

2016-01-10 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091171#comment-15091171
 ] 

Pengcheng Xiong commented on HIVE-12788:


[~ctang.ma], I read your reply to my comments and I quoted here
{code}
Could you please be more specific on the reason why "if TS[0] branch for src1 
is not optimized, then there is no need to continue processing TS[6] branch? 
Thanks.
The AST tree of query (select max(value) from src1 union all select max(value) 
from src2) passed to StatsOptimizator after UnionRemove optimization is:
  TS[0]->SEL[1]->GBY[2]-RS[3]->GBY[4]->FS[17]   --- for subquery src1
  TS[6]->SEL[7]->GBY[8]-RS[9]->GBY[10]->FS[18]  --- for subquery src2
It has two top Operators, TS[0] for table src1 and TS[6] for table src2. If the 
TS[0] branch (for subquery src1) is not optimized but TS[6] branch (for 
subquery src2) is, in existing code, TS[6] branch result will be set to 
FetchTask in ParseContext and the entire query is not further compiled into 
MRTasks (in SemanticAnalyzer.analyzeInternal step 9). So the union query will 
return result with only the row from TS[6] (the subquery src2). It is obviously 
not right. So for union query, if any one of its subqueries could not be Stats 
Optimizated, the whole query should not be optimized and fails back to regular 
plan. I wonder if it is a littler clear. Thanks

- Chaoyu
{code}

Did you investigate the possibility to merge the result from a FetchTask and a 
MR task? I would still prefer a solution that can best leverage the stats 
optimizer whenever possible (even only for a branch of a union). [~ashutoshc], 
could you provide some comments on the possibility here? Thanks.

> Setting hive.optimize.union.remove to TRUE will break UNION ALL with 
> aggregate functions
> 
>
> Key: HIVE-12788
> URL: https://issues.apache.org/jira/browse/HIVE-12788
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1
>Reporter: Eric Lin
>Assignee: Chaoyu Tang
> Attachments: HIVE-12788.1.patch, HIVE-12788.patch
>
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:1/default> create table test (a int);
> 0: jdbc:hive2://localhost:1/default> insert overwrite table test values 
> (1);
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:1/default> set 
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> +--+--+
> {code}
> UNION ALL without COUNT function will work as expected:
> {code}
> 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT 
> * FROM test;
> ++--+
> | _u1.a  |
> ++--+
> | 1  |
> | 1  |
> ++--+
> {code}
> Run the same query without setting hive.mapred.supports.subdirectories and 
> hive.optimize.union.remove to true will give correct result:
> {code}
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove;
> +---+--+
> |set|
> +---+--+
> | hive.optimize.union.remove=false  |
> +---+--+
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> | 1|
> | 1|
> +--+--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions

2016-01-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090936#comment-15090936
 ] 

Hive QA commented on HIVE-12788:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12781288/HIVE-12788.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10001 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_26
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6573/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6573/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6573/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12781288 - PreCommit-HIVE-TRUNK-Build

> Setting hive.optimize.union.remove to TRUE will break UNION ALL with 
> aggregate functions
> 
>
> Key: HIVE-12788
> URL: https://issues.apache.org/jira/browse/HIVE-12788
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1
>Reporter: Eric Lin
>Assignee: Chaoyu Tang
> Attachments: HIVE-12788.patch
>
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:1/default> create table test (a int);
> 0: jdbc:hive2://localhost:1/default> insert overwrite table test values 
> (1);
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:1/default> set 
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> +--+--+
> {code}
> UNION ALL without COUNT function will work as expected:
> {code}
> 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT 
> * FROM test;
> ++--+
> | _u1.a  |
> ++--+
> | 1  |
> | 1  |
> ++--+
> {code}
> Run the same query without setting hive.mapred.supports.subdirectories and 
> hive.optimize.union.remove to true will give correct result:
> {code}
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove;
> +---+--+
> |set|
> +---+--+
> | hive.optimize.union.remove=false  |
> +---+--+
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> | 1|
> | 1|
> +--+--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions

2016-01-08 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089867#comment-15089867
 ] 

Chaoyu Tang commented on HIVE-12788:


This patch mainly addresses the StatsOptimization issue in union with aggregate 
function. [~ashutoshc], [~pxiong], [~xuefuz], could you help to review the 
patch? Thanks.

I also created HIVE-12812 to set mapred.input.dir.recursive=true by default in 
Hive to address the above case 2 separately.

> Setting hive.optimize.union.remove to TRUE will break UNION ALL with 
> aggregate functions
> 
>
> Key: HIVE-12788
> URL: https://issues.apache.org/jira/browse/HIVE-12788
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1
>Reporter: Eric Lin
>Assignee: Chaoyu Tang
> Attachments: HIVE-12788.patch
>
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:1/default> create table test (a int);
> 0: jdbc:hive2://localhost:1/default> insert overwrite table test values 
> (1);
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:1/default> set 
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> +--+--+
> {code}
> UNION ALL without COUNT function will work as expected:
> {code}
> 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT 
> * FROM test;
> ++--+
> | _u1.a  |
> ++--+
> | 1  |
> | 1  |
> ++--+
> {code}
> Run the same query without setting hive.mapred.supports.subdirectories and 
> hive.optimize.union.remove to true will give correct result:
> {code}
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove;
> +---+--+
> |set|
> +---+--+
> | hive.optimize.union.remove=false  |
> +---+--+
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> | 1|
> | 1|
> +--+--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions

2016-01-08 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089817#comment-15089817
 ] 

Chaoyu Tang commented on HIVE-12788:


Patch has been uploaded to https://reviews.apache.org/r/42081/ . Thanks for 
review in advanced.

> Setting hive.optimize.union.remove to TRUE will break UNION ALL with 
> aggregate functions
> 
>
> Key: HIVE-12788
> URL: https://issues.apache.org/jira/browse/HIVE-12788
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1
>Reporter: Eric Lin
>Assignee: Chaoyu Tang
> Attachments: HIVE-12788.patch
>
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:1/default> create table test (a int);
> 0: jdbc:hive2://localhost:1/default> insert overwrite table test values 
> (1);
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:1/default> set 
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> +--+--+
> {code}
> UNION ALL without COUNT function will work as expected:
> {code}
> 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT 
> * FROM test;
> ++--+
> | _u1.a  |
> ++--+
> | 1  |
> | 1  |
> ++--+
> {code}
> Run the same query without setting hive.mapred.supports.subdirectories and 
> hive.optimize.union.remove to true will give correct result:
> {code}
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove;
> +---+--+
> |set|
> +---+--+
> | hive.optimize.union.remove=false  |
> +---+--+
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> | 1|
> | 1|
> +--+--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions

2016-01-06 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086764#comment-15086764
 ] 

Chaoyu Tang commented on HIVE-12788:


1. When hive.compute.query.using.stats is enabled, the union all with aggregate 
function with union.remove optimization only returns one row, which I think is 
due to an issue in StatsOptimizator and I am working on now.
{code}
set hive.compute.query.using.stats=true;
set hive.optimize.union.remove=true;
hive> Select count(*) as scount from default.sample02 union all Select count(*) 
as scount from default.sample01;
OK
723
{code}
2. When hive.compute.query.using.stats is disabled, you have to set 
mapred.input.dir.recursive=true in order to make hive.optimize.union.remove 
work. 
{code}
set hive.compute.query.using.stats=false;
set hive.optimize.union.remove=true;
set mapred.input.dir.recursive=true;
hive> Select count(*) as scount from default.sample02 union all Select count(*) 
as scount from default.sample01;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. tez, spark) 
or using Hive 1.X releases.
Query ID = ctang_20160106151655_c0eb9943-2963-4162-b9f4-c964005bf1a3
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Job running in-process (local Hadoop)
2016-01-06 22:47:52,677 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local51783692_0010
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Job running in-process (local Hadoop)
2016-01-06 22:47:55,278 Stage-2 map = 100%,  reduce = 100%
Ended Job = job_local1194656206_0011
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Stage-Stage-2:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
823
723
Time taken: 8.339 seconds, Fetched: 2 row(s)
{code}
3. With union remove optimization disabled, union all with aggregation function 
always works regardless StatsOptimization is enabled or not since the 
StatsOptimization is not applicable.


> Setting hive.optimize.union.remove to TRUE will break UNION ALL with 
> aggregate functions
> 
>
> Key: HIVE-12788
> URL: https://issues.apache.org/jira/browse/HIVE-12788
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1
>Reporter: Eric Lin
>Assignee: Chaoyu Tang
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:1/default> create table test (a int);
> 0: jdbc:hive2://localhost:1/default> insert overwrite table test values 
> (1);
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:1/default> set 
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> +--+--+
> {code}
> UNION ALL without COUNT function will work as expected:
> {code}
> 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT 
> * FROM test;
> ++--+
> | _u1.a  |
> ++--+
> | 1  |
> | 1  |
> ++--+
> {code}
> Run the same query without setting hive.mapred.supports.subdirectories and 
> hive.optimize.union.remove to true will give correct result:
> {code}
> 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove;
> +---+--+
> |set|
> +---+--+
> | hive.optimize.union.remove=false  |
> +---+--+
> 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +--+--+
> | _u1._c0  |
> +--+--+
> | 1|
> | 1|
> +--+--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)