[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions
[ https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094401#comment-15094401 ] Pengcheng Xiong commented on HIVE-12788: [~ctang.ma], i assume i already +1 on review board. if that is not enough, +1 here. Thanks. > Setting hive.optimize.union.remove to TRUE will break UNION ALL with > aggregate functions > > > Key: HIVE-12788 > URL: https://issues.apache.org/jira/browse/HIVE-12788 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.1.1 >Reporter: Eric Lin >Assignee: Chaoyu Tang > Attachments: HIVE-12788.1.patch, HIVE-12788.patch > > > See the test case below: > {code} > 0: jdbc:hive2://localhost:1/default> create table test (a int); > 0: jdbc:hive2://localhost:1/default> insert overwrite table test values > (1); > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true; > No rows affected (0.01 seconds) > 0: jdbc:hive2://localhost:1/default> set > hive.mapred.supports.subdirectories=true; > No rows affected (0.007 seconds) > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > +--+--+ > {code} > UNION ALL without COUNT function will work as expected: > {code} > 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT > * FROM test; > ++--+ > | _u1.a | > ++--+ > | 1 | > | 1 | > ++--+ > {code} > Run the same query without setting hive.mapred.supports.subdirectories and > hive.optimize.union.remove to true will give correct result: > {code} > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove; > +---+--+ > |set| > +---+--+ > | hive.optimize.union.remove=false | > +---+--+ > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > | 1| > | 1| > +--+--+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions
[ https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092739#comment-15092739 ] Chaoyu Tang commented on HIVE-12788: [~pxiong] Thanks for reviewing the patch. Yeah, I agree with you that it is ideal that we should optimize every subquery whenever possible, and will look into this possibility. But I wonder if we can treat it as a further enhancement with a separate JIRA so that here we can at least unblock the critical query data incorrect issue? Thanks > Setting hive.optimize.union.remove to TRUE will break UNION ALL with > aggregate functions > > > Key: HIVE-12788 > URL: https://issues.apache.org/jira/browse/HIVE-12788 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.1.1 >Reporter: Eric Lin >Assignee: Chaoyu Tang > Attachments: HIVE-12788.1.patch, HIVE-12788.patch > > > See the test case below: > {code} > 0: jdbc:hive2://localhost:1/default> create table test (a int); > 0: jdbc:hive2://localhost:1/default> insert overwrite table test values > (1); > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true; > No rows affected (0.01 seconds) > 0: jdbc:hive2://localhost:1/default> set > hive.mapred.supports.subdirectories=true; > No rows affected (0.007 seconds) > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > +--+--+ > {code} > UNION ALL without COUNT function will work as expected: > {code} > 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT > * FROM test; > ++--+ > | _u1.a | > ++--+ > | 1 | > | 1 | > ++--+ > {code} > Run the same query without setting hive.mapred.supports.subdirectories and > hive.optimize.union.remove to true will give correct result: > {code} > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove; > +---+--+ > |set| > +---+--+ > | hive.optimize.union.remove=false | > +---+--+ > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > | 1| > | 1| > +--+--+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions
[ https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092720#comment-15092720 ] Chaoyu Tang commented on HIVE-12788: The failed tests seem not related to the patch. > Setting hive.optimize.union.remove to TRUE will break UNION ALL with > aggregate functions > > > Key: HIVE-12788 > URL: https://issues.apache.org/jira/browse/HIVE-12788 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.1.1 >Reporter: Eric Lin >Assignee: Chaoyu Tang > Attachments: HIVE-12788.1.patch, HIVE-12788.patch > > > See the test case below: > {code} > 0: jdbc:hive2://localhost:1/default> create table test (a int); > 0: jdbc:hive2://localhost:1/default> insert overwrite table test values > (1); > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true; > No rows affected (0.01 seconds) > 0: jdbc:hive2://localhost:1/default> set > hive.mapred.supports.subdirectories=true; > No rows affected (0.007 seconds) > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > +--+--+ > {code} > UNION ALL without COUNT function will work as expected: > {code} > 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT > * FROM test; > ++--+ > | _u1.a | > ++--+ > | 1 | > | 1 | > ++--+ > {code} > Run the same query without setting hive.mapred.supports.subdirectories and > hive.optimize.union.remove to true will give correct result: > {code} > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove; > +---+--+ > |set| > +---+--+ > | hive.optimize.union.remove=false | > +---+--+ > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > | 1| > | 1| > +--+--+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions
[ https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091878#comment-15091878 ] Hive QA commented on HIVE-12788: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12781454/HIVE-12788.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10004 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6587/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6587/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6587/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12781454 - PreCommit-HIVE-TRUNK-Build > Setting hive.optimize.union.remove to TRUE will break UNION ALL with > aggregate functions > > > Key: HIVE-12788 > URL: https://issues.apache.org/jira/browse/HIVE-12788 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.1.1 >Reporter: Eric Lin >Assignee: Chaoyu Tang > Attachments: HIVE-12788.1.patch, HIVE-12788.patch > > > See the test case below: > {code} > 0: jdbc:hive2://localhost:1/default> create table test (a int); > 0: jdbc:hive2://localhost:1/default> insert overwrite table test values > (1); > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true; > No rows affected (0.01 seconds) > 0: jdbc:hive2://localhost:1/default> set > hive.mapred.supports.subdirectories=true; > No rows affected (0.007 seconds) > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > +--+--+ > {code} > UNION ALL without COUNT function will work as expected: > {code} > 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT > * FROM test; > ++--+ > | _u1.a | > ++--+ > | 1 | > | 1 | > ++--+ > {code} > Run the same query without setting hive.mapred.supports.subdirectories and > hive.optimize.union.remove to true will give correct result: > {code} > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove; > +---+--+ > |set| > +---+--+ > | hive.optimize.union.remove=false | > +---+--+ > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > | 1| > | 1| > +--+--+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions
[ https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091171#comment-15091171 ] Pengcheng Xiong commented on HIVE-12788: [~ctang.ma], I read your reply to my comments and I quoted here {code} Could you please be more specific on the reason why "if TS[0] branch for src1 is not optimized, then there is no need to continue processing TS[6] branch? Thanks. The AST tree of query (select max(value) from src1 union all select max(value) from src2) passed to StatsOptimizator after UnionRemove optimization is: TS[0]->SEL[1]->GBY[2]-RS[3]->GBY[4]->FS[17] --- for subquery src1 TS[6]->SEL[7]->GBY[8]-RS[9]->GBY[10]->FS[18] --- for subquery src2 It has two top Operators, TS[0] for table src1 and TS[6] for table src2. If the TS[0] branch (for subquery src1) is not optimized but TS[6] branch (for subquery src2) is, in existing code, TS[6] branch result will be set to FetchTask in ParseContext and the entire query is not further compiled into MRTasks (in SemanticAnalyzer.analyzeInternal step 9). So the union query will return result with only the row from TS[6] (the subquery src2). It is obviously not right. So for union query, if any one of its subqueries could not be Stats Optimizated, the whole query should not be optimized and fails back to regular plan. I wonder if it is a littler clear. Thanks - Chaoyu {code} Did you investigate the possibility to merge the result from a FetchTask and a MR task? I would still prefer a solution that can best leverage the stats optimizer whenever possible (even only for a branch of a union). [~ashutoshc], could you provide some comments on the possibility here? Thanks. > Setting hive.optimize.union.remove to TRUE will break UNION ALL with > aggregate functions > > > Key: HIVE-12788 > URL: https://issues.apache.org/jira/browse/HIVE-12788 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.1.1 >Reporter: Eric Lin >Assignee: Chaoyu Tang > Attachments: HIVE-12788.1.patch, HIVE-12788.patch > > > See the test case below: > {code} > 0: jdbc:hive2://localhost:1/default> create table test (a int); > 0: jdbc:hive2://localhost:1/default> insert overwrite table test values > (1); > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true; > No rows affected (0.01 seconds) > 0: jdbc:hive2://localhost:1/default> set > hive.mapred.supports.subdirectories=true; > No rows affected (0.007 seconds) > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > +--+--+ > {code} > UNION ALL without COUNT function will work as expected: > {code} > 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT > * FROM test; > ++--+ > | _u1.a | > ++--+ > | 1 | > | 1 | > ++--+ > {code} > Run the same query without setting hive.mapred.supports.subdirectories and > hive.optimize.union.remove to true will give correct result: > {code} > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove; > +---+--+ > |set| > +---+--+ > | hive.optimize.union.remove=false | > +---+--+ > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > | 1| > | 1| > +--+--+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions
[ https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090936#comment-15090936 ] Hive QA commented on HIVE-12788: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12781288/HIVE-12788.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10001 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_26 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles org.apache.hive.spark.client.TestSparkClient.testCounters org.apache.hive.spark.client.TestSparkClient.testErrorJob org.apache.hive.spark.client.TestSparkClient.testJobSubmission org.apache.hive.spark.client.TestSparkClient.testMetricsCollection org.apache.hive.spark.client.TestSparkClient.testRemoteClient org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob org.apache.hive.spark.client.TestSparkClient.testSyncRpc {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6573/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6573/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6573/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12781288 - PreCommit-HIVE-TRUNK-Build > Setting hive.optimize.union.remove to TRUE will break UNION ALL with > aggregate functions > > > Key: HIVE-12788 > URL: https://issues.apache.org/jira/browse/HIVE-12788 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.1.1 >Reporter: Eric Lin >Assignee: Chaoyu Tang > Attachments: HIVE-12788.patch > > > See the test case below: > {code} > 0: jdbc:hive2://localhost:1/default> create table test (a int); > 0: jdbc:hive2://localhost:1/default> insert overwrite table test values > (1); > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true; > No rows affected (0.01 seconds) > 0: jdbc:hive2://localhost:1/default> set > hive.mapred.supports.subdirectories=true; > No rows affected (0.007 seconds) > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > +--+--+ > {code} > UNION ALL without COUNT function will work as expected: > {code} > 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT > * FROM test; > ++--+ > | _u1.a | > ++--+ > | 1 | > | 1 | > ++--+ > {code} > Run the same query without setting hive.mapred.supports.subdirectories and > hive.optimize.union.remove to true will give correct result: > {code} > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove; > +---+--+ > |set| > +---+--+ > | hive.optimize.union.remove=false | > +---+--+ > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > | 1| > | 1| > +--+--+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions
[ https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089867#comment-15089867 ] Chaoyu Tang commented on HIVE-12788: This patch mainly addresses the StatsOptimization issue in union with aggregate function. [~ashutoshc], [~pxiong], [~xuefuz], could you help to review the patch? Thanks. I also created HIVE-12812 to set mapred.input.dir.recursive=true by default in Hive to address the above case 2 separately. > Setting hive.optimize.union.remove to TRUE will break UNION ALL with > aggregate functions > > > Key: HIVE-12788 > URL: https://issues.apache.org/jira/browse/HIVE-12788 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.1.1 >Reporter: Eric Lin >Assignee: Chaoyu Tang > Attachments: HIVE-12788.patch > > > See the test case below: > {code} > 0: jdbc:hive2://localhost:1/default> create table test (a int); > 0: jdbc:hive2://localhost:1/default> insert overwrite table test values > (1); > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true; > No rows affected (0.01 seconds) > 0: jdbc:hive2://localhost:1/default> set > hive.mapred.supports.subdirectories=true; > No rows affected (0.007 seconds) > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > +--+--+ > {code} > UNION ALL without COUNT function will work as expected: > {code} > 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT > * FROM test; > ++--+ > | _u1.a | > ++--+ > | 1 | > | 1 | > ++--+ > {code} > Run the same query without setting hive.mapred.supports.subdirectories and > hive.optimize.union.remove to true will give correct result: > {code} > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove; > +---+--+ > |set| > +---+--+ > | hive.optimize.union.remove=false | > +---+--+ > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > | 1| > | 1| > +--+--+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions
[ https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089817#comment-15089817 ] Chaoyu Tang commented on HIVE-12788: Patch has been uploaded to https://reviews.apache.org/r/42081/ . Thanks for review in advanced. > Setting hive.optimize.union.remove to TRUE will break UNION ALL with > aggregate functions > > > Key: HIVE-12788 > URL: https://issues.apache.org/jira/browse/HIVE-12788 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.1.1 >Reporter: Eric Lin >Assignee: Chaoyu Tang > Attachments: HIVE-12788.patch > > > See the test case below: > {code} > 0: jdbc:hive2://localhost:1/default> create table test (a int); > 0: jdbc:hive2://localhost:1/default> insert overwrite table test values > (1); > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true; > No rows affected (0.01 seconds) > 0: jdbc:hive2://localhost:1/default> set > hive.mapred.supports.subdirectories=true; > No rows affected (0.007 seconds) > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > +--+--+ > {code} > UNION ALL without COUNT function will work as expected: > {code} > 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT > * FROM test; > ++--+ > | _u1.a | > ++--+ > | 1 | > | 1 | > ++--+ > {code} > Run the same query without setting hive.mapred.supports.subdirectories and > hive.optimize.union.remove to true will give correct result: > {code} > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove; > +---+--+ > |set| > +---+--+ > | hive.optimize.union.remove=false | > +---+--+ > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > | 1| > | 1| > +--+--+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions
[ https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086764#comment-15086764 ] Chaoyu Tang commented on HIVE-12788: 1. When hive.compute.query.using.stats is enabled, the union all with aggregate function with union.remove optimization only returns one row, which I think is due to an issue in StatsOptimizator and I am working on now. {code} set hive.compute.query.using.stats=true; set hive.optimize.union.remove=true; hive> Select count(*) as scount from default.sample02 union all Select count(*) as scount from default.sample01; OK 723 {code} 2. When hive.compute.query.using.stats is disabled, you have to set mapred.input.dir.recursive=true in order to make hive.optimize.union.remove work. {code} set hive.compute.query.using.stats=false; set hive.optimize.union.remove=true; set mapred.input.dir.recursive=true; hive> Select count(*) as scount from default.sample02 union all Select count(*) as scount from default.sample01; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases. Query ID = ctang_20160106151655_c0eb9943-2963-4162-b9f4-c964005bf1a3 Total jobs = 2 Launching Job 1 out of 2 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Job running in-process (local Hadoop) 2016-01-06 22:47:52,677 Stage-1 map = 100%, reduce = 100% Ended Job = job_local51783692_0010 Launching Job 2 out of 2 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Job running in-process (local Hadoop) 2016-01-06 22:47:55,278 Stage-2 map = 100%, reduce = 100% Ended Job = job_local1194656206_0011 MapReduce Jobs Launched: Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS Stage-Stage-2: HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK 823 723 Time taken: 8.339 seconds, Fetched: 2 row(s) {code} 3. With union remove optimization disabled, union all with aggregation function always works regardless StatsOptimization is enabled or not since the StatsOptimization is not applicable. > Setting hive.optimize.union.remove to TRUE will break UNION ALL with > aggregate functions > > > Key: HIVE-12788 > URL: https://issues.apache.org/jira/browse/HIVE-12788 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.1.1 >Reporter: Eric Lin >Assignee: Chaoyu Tang > > See the test case below: > {code} > 0: jdbc:hive2://localhost:1/default> create table test (a int); > 0: jdbc:hive2://localhost:1/default> insert overwrite table test values > (1); > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true; > No rows affected (0.01 seconds) > 0: jdbc:hive2://localhost:1/default> set > hive.mapred.supports.subdirectories=true; > No rows affected (0.007 seconds) > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > +--+--+ > {code} > UNION ALL without COUNT function will work as expected: > {code} > 0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT > * FROM test; > ++--+ > | _u1.a | > ++--+ > | 1 | > | 1 | > ++--+ > {code} > Run the same query without setting hive.mapred.supports.subdirectories and > hive.optimize.union.remove to true will give correct result: > {code} > 0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove; > +---+--+ > |set| > +---+--+ > | hive.optimize.union.remove=false | > +---+--+ > 0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL > SELECT COUNT(1) FROM test; > +--+--+ > | _u1._c0 | > +--+--+ > | 1| > | 1| > +--+--+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)