[ https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052987#comment-14052987 ]
Lefty Leverenz commented on HIVE-4002:
--------------------------------------

*hive.fetch.task.aggr* is documented in the wiki here:

* [Configuration Properties -- hive.fetch.task.aggr | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.fetch.task.aggr]

Also see doc comments on HIVE-5793 (Update hive-default.xml.template for HIVE-4002).

> Fetch task aggregation for simple group by query
> ------------------------------------------------
>
>                 Key: HIVE-4002
>                 URL: https://issues.apache.org/jira/browse/HIVE-4002
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>             Fix For: 0.12.0
>
>         Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch,
> HIVE-4002.D8739.3.patch, HIVE-4002.D8739.4.patch, HIVE-4002.patch
>
>
> Aggregation queries with no group-by clause (for example, select count(*)
> from src) execute the final aggregation in a single reduce task. But that
> work is too small even for a single reducer, because most UDAFs generate
> just a single row for map aggregation. If the final fetch task can aggregate
> the outputs from the map tasks, shuffling time can be removed.
> This optimization transforms an operator tree such as
> TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
> into
> TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
> With the patch, the time taken for the auto_join_filters.q test was reduced
> to 6 min (from 10 min before).
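
The idea in the description can be illustrated with a small simulation. This is only a sketch in Python, not Hive code: the function names and the sample splits are hypothetical, and it assumes a count(*) query with no group-by clause. Each simulated map task plays the role of GBY1 and emits one partial count; the simulated fetch task plays the role of GBY2 and merges those partials directly, so no reduce stage or shuffle is needed.

```python
# Illustrative simulation only; none of these names come from Hive.

def map_side_count(rows):
    """Role of GBY1: a map task emits one partial count for its split."""
    return sum(1 for _ in rows)


def fetch_task_aggregate(partials):
    """Role of GBY2 inside the fetch task: merge the partial counts."""
    return sum(partials)


# Three hypothetical input splits of a table named src.
splits = [["a", "b", "c"], ["d"], ["e", "f"]]

# Each map task produces a single row (its partial count).
partials = [map_side_count(split) for split in splits]

# The fetch task combines the partial rows without a shuffle.
total = fetch_task_aggregate(partials)

print(partials, total)
```

Because each map task contributes only one row, the final merge is cheap enough to run in the client-side fetch step, which is the motivation the description gives for dropping the reducer.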