[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2014-07-05 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052987#comment-14052987
 ] 

Lefty Leverenz commented on HIVE-4002:
--

*hive.fetch.task.aggr* is documented in the wiki here:

* [Configuration Properties -- hive.fetch.task.aggr | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.fetch.task.aggr]

Also see doc comments on HIVE-5793 (Update hive-default.xml.template for 
HIVE-4002).

 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, 
 HIVE-4002.D8739.3.patch, HIVE-4002.D8739.4.patch, HIVE-4002.patch


 Aggregation queries with no group-by clause (for example, select count(*) 
 from src) execute the final aggregation in a single reduce task. But that 
 input is too small even for a single reducer, because most UDAFs generate 
 just a single row in map-side aggregation. If the final fetch task can 
 aggregate the outputs from the map tasks, shuffling time can be removed.
 This optimization transforms an operator tree like
 TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
 into 
 TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
 With the patch, the time taken for the auto_join_filters.q test dropped to 
 6 min (from 10 min before).
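
 The idea in the description can be illustrated with a minimal sketch. It is 
 not Hive code: the function names map_side_count and fetch_side_merge are 
 hypothetical, and plain Python lists stand in for input splits. Each "map 
 task" reduces its split to one partial row (the role of GBY1), and the fetch 
 step merges those few rows itself (the role of GBY2 inside the fetch task), 
 so no shuffle or reduce stage is needed for count(*).

```python
from typing import Iterable, List


def map_side_count(rows: Iterable[object]) -> int:
    """Stand-in for GBY1: one map task emits a single partial count."""
    count = 0
    for _ in rows:
        count += 1
    return count


def fetch_side_merge(partial_counts: List[int]) -> int:
    """Stand-in for GBY2 in the fetch task: merge the partial rows.

    There is one partial row per map task, so this is cheap enough to do
    while fetching results instead of in a separate reduce task.
    """
    return sum(partial_counts)


# Three hypothetical input splits of a table.
splits = [["a", "b", "c"], ["d", "e"], ["f"]]
partials = [map_side_count(split) for split in splits]
total = fetch_side_merge(partials)
```

 Here partials holds one value per split, and total is the answer to 
 select count(*) over all of them. The same shape would apply to any 
 aggregate whose partial states can be merged.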



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-09-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756015#comment-13756015
 ] 

Hudson commented on HIVE-4002:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2303 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2303/])
HIVE-4002 Fetch task aggregation for simple group by query (Navis Ryu and Yin 
Huai via egc) (ecapriolo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519306)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchAggregation.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/MapReduceCompiler.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientpositive/fetch_aggregation.q
* /hive/trunk/ql/src/test/results/clientpositive/fetch_aggregation.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755785#comment-13755785
 ] 

Hive QA commented on HIVE-4002:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12600993/HIVE-4002.patch

{color:green}SUCCESS:{color} +1 2903 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/588/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/588/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.



[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-09-01 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755790#comment-13755790
 ] 

Edward Capriolo commented on HIVE-4002:
---

+1. With all tests passing, we should commit. If we get caught up in another 
rebase, it could be weeks before we get it all settled again. This feature 
is off by default, so if there is an issue with it we can tackle it in a 
follow-up.



[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-09-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755821#comment-13755821
 ] 

Hudson commented on HIVE-4002:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #80 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/80/])
HIVE-4002 Fetch task aggregation for simple group by query (Navis Ryu and Yin 
Huai via egc) (ecapriolo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519306)




[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-09-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755850#comment-13755850
 ] 

Hudson commented on HIVE-4002:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #147 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/147/])
HIVE-4002 Fetch task aggregation for simple group by query (Navis Ryu and Yin 
Huai via egc) (ecapriolo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519306)




[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-09-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755860#comment-13755860
 ] 

Hudson commented on HIVE-4002:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #395 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/395/])
HIVE-4002 Fetch task aggregation for simple group by query (Navis Ryu and Yin 
Huai via egc) (ecapriolo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519306)




[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-31 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755530#comment-13755530
 ] 

Edward Capriolo commented on HIVE-4002:
---

[~yhuai][~navis] Are you two discussing possible revisions or is this patch 
ready to be committed?



[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-27 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751330#comment-13751330
 ] 

Phabricator commented on HIVE-4002:
---

yhuai has commented on the revision HIVE-4002 [jira] Fetch task aggregation 
for simple group by query.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java:493 I think 
that flush is only needed for blocking operators. With this optimization, the 
operator tree in the fetch task seems to have only a single blocking operator, 
which is GBY. Since GBY is the first operator in the fetch task (the operator 
shown in flush() in this class), I do not think we need to call flush on all 
operators in the operator tree. Is it possible that GBY is not the first 
operator?
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6985 There 
are other places where we are using colInfo.getInternalName(). I think it is 
better to also change those places if we want to use field.
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java:582 Let's say we 
have a chain of operators OP1-OP2-OP3. With this change, when flush in OP1 is 
called, it will call its own flushOp and then call flushOp in OP2. It seems 
that flush or flushOp in OP3 will never be called. Also, when I introduced 
flush with the Correlation Optimizer, this method was not designed to 
propagate the signal to its children.
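
The OP1-OP2-OP3 concern can be shown with a small sketch. This is not Hive's 
Operator.java: the class and the method names flush_one_level and 
flush_recursive are hypothetical, chosen only to contrast a flush that reaches 
direct children with one that propagates through the whole chain.

```python
from typing import List


class Operator:
    """Toy operator node; records its name when its flush hook runs."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.children: List["Operator"] = []
        self.log = log

    def flush_op(self) -> None:
        # Operator-specific work, e.g. a blocking operator emitting its buffer.
        self.log.append(self.name)

    def flush_one_level(self) -> None:
        # Own hook, then the hook of direct children only: grandchildren
        # are never reached.
        self.flush_op()
        for child in self.children:
            child.flush_op()

    def flush_recursive(self) -> None:
        # Own hook, then the full flush of each child, so the signal
        # propagates through the whole subtree.
        self.flush_op()
        for child in self.children:
            child.flush_recursive()


def build_chain(log: List[str]) -> Operator:
    """Build OP1 -> OP2 -> OP3 and return the root."""
    op1, op2, op3 = (Operator(n, log) for n in ("OP1", "OP2", "OP3"))
    op1.children.append(op2)
    op2.children.append(op3)
    return op1


one_level_log: List[str] = []
build_chain(one_level_log).flush_one_level()

recursive_log: List[str] = []
build_chain(recursive_log).flush_recursive()
```

In this sketch one_level_log never contains OP3, while recursive_log visits 
all three operators in order.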

REVISION DETAIL
  https://reviews.facebook.net/D8739

To: JIRA, navis
Cc: yhuai




[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-27 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751333#comment-13751333
 ] 

Phabricator commented on HIVE-4002:
---

yhuai has commented on the revision HIVE-4002 [jira] Fetch task aggregation 
for simple group by query.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java:582 I did not mean 
we cannot have a recursive flush method. I meant that Demux and Mux operators 
should not use a recursive flush method.

REVISION DETAIL
  https://reviews.facebook.net/D8739

To: JIRA, navis
Cc: yhuai




[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-26 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750638#comment-13750638
 ] 

Phabricator commented on HIVE-4002:
---

yhuai has commented on the revision HIVE-4002 [jira] Fetch task aggregation 
for simple group by query.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:3631 It 
seems that this line is the same as line 3633.
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6985 Why do 
we need to change getInternalName to field? If we want to use field instead of 
getInternalName, can you also make this change in the other places in this 
class?
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java:582 Why do we need 
flushOp? I think it is not necessary to have flushOp. Also, can you change "an 
blocking operator" to "a blocking operator"? I am sorry about the typo I made...
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java:493 I think we 
can just use operator.flush() to tell GBY to process its buffer.

REVISION DETAIL
  https://reviews.facebook.net/D8739

To: JIRA, navis
Cc: yhuai




[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-26 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750945#comment-13750945
 ] 

Phabricator commented on HIVE-4002:
---

navis has commented on the revision HIVE-4002 [jira] Fetch task aggregation 
for simple group by query.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:3631 Right. 
I'll fix that.
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6985 It's 
the same thing. I just want to be more consistent.
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java:582 I need a 
recursive flush method to implement this, like what the init and close methods 
do. I think I broke something while rebasing the patch. Can I ask which query 
was not working with this patch? The test framework does not seem to have been 
working recently.
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java:493 For this 
patch, flush should be called on all operators in the execution tree.

REVISION DETAIL
  https://reviews.facebook.net/D8739

To: JIRA, navis
Cc: yhuai




[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-25 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749714#comment-13749714
 ] 

Edward Capriolo commented on HIVE-4002:
---

{quote}
[edward@jackintosh hive-trunk]$ patch -p0 < D8739\?download\=true
patching file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchAggregation.java
patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/MapReduceCompiler.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java
Hunk #3 succeeded at 119 (offset 9 lines).
Hunk #4 succeeded at 679 (offset 26 lines).
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Hunk #1 succeeded at 3503 (offset -19 lines).
Hunk #2 succeeded at 3609 (offset -19 lines).
Hunk #3 succeeded at 3622 (offset -19 lines).
Hunk #4 succeeded at 3634 (offset -19 lines).
Hunk #5 succeeded at 3684 (offset -19 lines).
Hunk #6 succeeded at 3713 (offset -19 lines).
Hunk #7 succeeded at 3820 (offset -19 lines).
Hunk #8 succeeded at 6964 (offset -18 lines).
Hunk #9 succeeded at 6990 (offset -18 lines).
patching file ql/src/test/queries/clientpositive/fetch_aggregation.q
patching file ql/src/test/results/clientpositive/fetch_aggregation.q.out
patching file ql/src/test/results/compiler/plan/groupby1.q.xml
Hunk #5 succeeded at 1312 (offset -10 lines).
Hunk #6 succeeded at 1326 (offset -10 lines).
Hunk #7 succeeded at 1345 (offset -10 lines).
Hunk #8 succeeded at 1426 (offset -10 lines).
Hunk #9 succeeded at 1478 (offset -10 lines).
patching file ql/src/test/results/compiler/plan/groupby2.q.xml
Hunk #10 succeeded at 1087 (offset -10 lines).
Hunk #11 succeeded at 1428 (offset -10 lines).
Hunk #12 succeeded at 1482 (offset -10 lines).
Hunk #13 succeeded at 1508 (offset -10 lines).
Hunk #14 succeeded at 1541 (offset -10 lines).
Hunk #15 succeeded at 1618 (offset -10 lines).
Hunk #16 succeeded at 1647 (offset -10 lines).
Hunk #17 succeeded at 1715 (offset -10 lines).
Hunk #18 succeeded at 1734 (offset -10 lines).
Hunk #19 succeeded at 1819 (offset -10 lines).
Hunk #20 succeeded at 1832 (offset -10 lines).
patching file ql/src/test/results/compiler/plan/groupby3.q.xml
Hunk #8 succeeded at 1299 (offset -7 lines).
Hunk #9 succeeded at 1627 (offset -7 lines).
Hunk #10 succeeded at 1640 (offset -7 lines).
Hunk #11 succeeded at 1653 (offset -7 lines).
Hunk #12 succeeded at 1695 (offset -7 lines).
Hunk #13 succeeded at 1709 (offset -7 lines).
Hunk #14 succeeded at 1723 (offset -7 lines).
Hunk #15 succeeded at 1770 (offset -7 lines).
Hunk #16 succeeded at 1846 (offset -7 lines).
Hunk #17 succeeded at 1859 (offset -7 lines).
Hunk #18 succeeded at 1872 (offset -7 lines).
Hunk #19 succeeded at 1938 (offset -7 lines).
Hunk #20 succeeded at 2144 (offset -7 lines).
Hunk #21 succeeded at 2157 (offset -7 lines).
Hunk #22 succeeded at 2170 (offset -7 lines).
patching file ql/src/test/results/compiler/plan/groupby5.q.xml
Hunk #5 succeeded at 1175 (offset -10 lines).
Hunk #6 succeeded at 1189 (offset -10 lines).
Hunk #7 succeeded at 1208 (offset -10 lines).
Hunk #8 succeeded at 1295 (offset -10 lines).
Hunk #9 succeeded at 1347 (offset -10 lines).
patching file serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java

{quote}

This did not patch perfectly cleanly. Running the tests manually now.
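To see at a glance which files applied with shifted hunks, the patch output above can be summarized with a short script. This is only a sketch: the function name `summarize_patch_log` is made up for illustration, and it assumes the log format shown in the quoted output ("patching file ..." followed by "Hunk #N succeeded at X (offset Y lines)."), including paths wrapped onto the next line.

```python
import re

# Matches lines such as "Hunk #3 succeeded at 119 (offset 9 lines)."
HUNK_OFFSET_RE = re.compile(r"Hunk #\d+ succeeded at \d+ \(offset (-?\d+) lines?\)")


def summarize_patch_log(log_text):
    """Return {file: [offsets]} for hunks that applied with a line offset."""
    offsets = {}
    current = None
    pending_path = False  # True when "patching file" was wrapped before the path
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if pending_path:
            current, pending_path = line, False
            offsets.setdefault(current, [])
            continue
        if line.startswith("patching file"):
            path = line[len("patching file"):].strip()
            if path:
                current = path
                offsets.setdefault(current, [])
            else:
                pending_path = True
            continue
        match = HUNK_OFFSET_RE.search(line)
        if match and current is not None:
            offsets[current].append(int(match.group(1)))
    return offsets


sample = """patching file ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java
Hunk #3 succeeded at 119 (offset 9 lines).
Hunk #4 succeeded at 679 (offset 26 lines).
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java
"""

offsets = summarize_patch_log(sample)
# Files where at least one hunk applied at a shifted position.
shifted = [path for path, deltas in offsets.items() if deltas]
print(shifted)
```

Files that end up in `shifted` are the ones where the patch was generated against a slightly different revision, which is usually a sign that a rebase is due.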

 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, 
 HIVE-4002.D8739.3.patch



[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-25 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749766#comment-13749766
 ] 

Yin Huai commented on HIVE-4002:


[~appodictic] Sorry for jumping in late. It seems the changes in DemuxOperator 
and MuxOperator will break plans optimized by the Correlation Optimizer. Let me 
take a look and leave my comments on Phabricator.

 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, 
 HIVE-4002.D8739.3.patch


 Aggregation queries with no group-by clause (for example, select count(*) 
 from src) execute the final aggregation in a single reduce task. But that work 
 is too small even for a single reducer, because most UDAFs generate just a 
 single row for map aggregation. If the final fetch task can aggregate the 
 outputs from the map tasks, shuffling time can be removed.
 This optimization transforms an operator tree something like
 TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
 into 
 TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
 With the patch, the time taken for the auto_join_filters.q test was reduced to 
 6 min (from 10 min before).
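The idea in the description can be illustrated with a toy model in Python. This is a sketch only, not Hive code: the function names are hypothetical, and count(*) stands in for any UDAF whose map-side partial result is a single row. Each "map task" plays the role of GBY1 and emits one partial row; the "fetch task" plays the role of GBY2 and merges the partial rows directly, so no reduce stage (RS) or shuffle is involved.

```python
def map_side_partial_count(split):
    """GBY1: a map task emits a single partial row, its local row count."""
    return len(split)


def fetch_side_final_count(partials):
    """GBY2 inside the fetch task: merge the partial rows without a shuffle."""
    return sum(partials)


# Three input splits, one per (simulated) map task.
splits = [["a", "b", "c"], ["d"], ["e", "f"]]

partials = [map_side_partial_count(split) for split in splits]
total = fetch_side_final_count(partials)
print(partials, total)  # [3, 1, 2] 6
```

The fetch side only ever sees one row per map task here, which is why running the final aggregation on the client can be cheaper than scheduling a reducer for it.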

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-23 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13748576#comment-13748576
 ] 

Edward Capriolo commented on HIVE-4002:
---

+1. This is a very exciting feature. I will commit when the tests pass.

 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, 
 HIVE-4002.D8739.3.patch




[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-07-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723315#comment-13723315
 ] 

Edward Capriolo commented on HIVE-4002:
---

[~navis] Sorry, I dropped the ball on this review. Can you rebase?

 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch




[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-07-08 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702865#comment-13702865
 ] 

Navis commented on HIVE-4002:
-

Yes, some threshold might be more useful. 

 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch




[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-07-08 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702867#comment-13702867
 ] 

Edward Capriolo commented on HIVE-4002:
---

Testing now. The threshold can be a follow-on. I will do a more critical review 
in the next couple of days.

 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch




[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-07-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13701661#comment-13701661
 ] 

Edward Capriolo commented on HIVE-4002:
---

This is a nice feature. There are times when I know that count(distinct(col)) 
and other operations like the one you have requested produce small result sets 
and the shuffle is the bottleneck. I do like this feature, but turning it on 
manually is cumbersome for the end user.

I wonder if we can convert the last step at runtime somehow. (Probably not 
easily, but that would be nice.)
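One way to picture an automatic switch is a simple size check on the expected map-side output. The sketch below is purely hypothetical: the function name, its parameters, and the default limit of 10,000 rows are assumptions made for illustration and do not come from the patch or from Hive.

```python
def should_aggregate_in_fetch(num_map_tasks, partial_rows_per_task,
                              max_fetch_rows=10_000):
    """Hypothetical heuristic: aggregate in the fetch task only when the
    total number of partial rows is small enough for a single client."""
    return num_map_tasks * partial_rows_per_task <= max_fetch_rows


# A count(*)-style query: every map task emits a single partial row.
small_case = should_aggregate_in_fetch(200, 1)
# A query whose map tasks each emit many partial rows.
large_case = should_aggregate_in_fetch(200, 50_000)
print(small_case, large_case)  # True False
```

A real implementation would need an estimate of the partial row count before or during execution, which is the hard part alluded to above.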

 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch

