[
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751330#comment-13751330
]
Phabricator commented on HIVE-4002:
-----------------------------------
yhuai has commented on the revision "HIVE-4002 [jira] Fetch task aggregation
for simple group by query".
INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java:493 I think
that flush is only needed for blocking operators. With this optimization, the
operator tree in the fetch task seems to have only a single blocking operator,
which is GBY. Since GBY is the first operator in the fetch task (the operator
shown in flush() in this class), I do not think we need to call flush on all
operators in the operator tree. Is it possible that GBY is not the first
operator?
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6985 There
are other places where we are using colInfo.getInternalName(). I think it is
better to also change those places if we want to use field.
ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java:582 Let's say we
have a chain of operators OP1-OP2-OP3. With this change, when flush in OP1 is
called, it will call its own flushOp and then call flushOp in OP2. It seems
that neither flush nor flushOp in OP3 will ever be called. Also, when I
introduced flush with the Correlation Optimizer, this method was not designed
to propagate the signal to its children.
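
The OP1-OP2-OP3 concern above can be illustrated with a minimal sketch. It
does not use Hive's real Operator class: the Op class, its single child field,
and the flushOneLevel/flushRecursive method names are all hypothetical and
exist only to show the difference between flushing one level down and
propagating through the whole chain.

```java
import java.util.ArrayList;
import java.util.List;

class FlushChainSketch {

    /** Hypothetical stand-in for an operator; NOT Hive's Operator class. */
    static class Op {
        final String name;
        final List<String> log; // records which operators had flushOp called
        Op child;               // one child is enough for a linear chain

        Op(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        /** Operator-specific flush work; here it only records the call. */
        void flushOp() {
            log.add(name);
        }

        /** Behaviour described in the comment: own flushOp, then the child's flushOp only. */
        void flushOneLevel() {
            flushOp();
            if (child != null) {
                child.flushOp();
            }
        }

        /** Assumed alternative: pass the signal down the entire chain. */
        void flushRecursive() {
            flushOp();
            if (child != null) {
                child.flushRecursive();
            }
        }
    }

    /** Builds OP1-OP2-OP3, flushes OP1, and returns the operators that were flushed. */
    static List<String> run(boolean recursive) {
        List<String> log = new ArrayList<>();
        Op op1 = new Op("OP1", log);
        Op op2 = new Op("OP2", log);
        Op op3 = new Op("OP3", log);
        op1.child = op2;
        op2.child = op3;
        if (recursive) {
            op1.flushRecursive();
        } else {
            op1.flushOneLevel();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [OP1, OP2]
        System.out.println(run(true));  // [OP1, OP2, OP3]
    }
}
```

In the one-level variant OP3 is never reached, which is the gap raised in the
comment. Whether flush should propagate at all is a separate design question.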
REVISION DETAIL
https://reviews.facebook.net/D8739
To: JIRA, navis
Cc: yhuai
> Fetch task aggregation for simple group by query
> ------------------------------------------------
>
> Key: HIVE-4002
> URL: https://issues.apache.org/jira/browse/HIVE-4002
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
> Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch,
> HIVE-4002.D8739.3.patch, HIVE-4002.D8739.4.patch
>
>
> Aggregation queries with no group-by clause (for example, select count(*)
> from src) execute the final aggregation in a single reduce task. But that
> work is too small even for a single reducer, because most UDAFs generate
> just a single row for map aggregation. If the final fetch task can aggregate
> the outputs from the map tasks, the shuffling time can be removed.
> This optimization transforms an operator tree such as
> TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
> into
> TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
> With the patch, the time taken for the auto_join_filters.q test was reduced
> to 6 min (from 10 min before).
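
As a rough illustration of the idea in the description, the sketch below
merges per-map-task partial results for count(*) on the fetch side instead of
in a reducer. The class name, the mergeCounts method, and the sample partial
counts are all hypothetical; this is not the patch's code, only a picture of
why the final step is cheap when each map task contributes a single row.

```java
import java.util.List;

class FetchSideMergeSketch {

    /**
     * Merges partial counts (assumed: one row per map task) into the final
     * count, the role GBY2 plays inside the fetch task in the plan above.
     */
    static long mergeCounts(List<Long> partialCounts) {
        long total = 0L;
        for (long partial : partialCounts) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three hypothetical map tasks, each emitting one partial count.
        System.out.println(mergeCounts(List.of(120L, 75L, 305L))); // 500
    }
}
```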