[
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556608#comment-13556608
]
Yin Huai commented on HIVE-2340:
--------------------------------
Let me explain why I introduced the fake RS operator instead of
just removing the original RS. When I was developing the patch for 2206, I
found that the aggregation operator (GBY) and the join operator (JOIN) use
different logic to process the rows forwarded to them. Although they both buffer
rows, a GBY determines whether it needs to forward results to its children in
processOp, while a JOIN relies on endGroup to know when it should forward
results. When we have plans like GBY-GBY or JOIN-GBY, that difference in
processing logic is fine. However, when we have plans like
{code}
GBY----                 GBY----
       \                       \
        ----JOIN   or           ----JOIN
       /                       /
GBY----                 JOIN---
{code}
We need operators between the child JOIN and its parent GBYs and JOINs to make sure
the JOIN processes rows correctly. This is also the reason that
CorrelationLocalSimulativeReduceSinkOperator determines when to start the
group of its children in processOp and leaves startGroup and endGroup empty.
Also, by replacing the RSs with those fake RSs, I do not need to touch the GBYs
and JOINs that will be merged into the same Reduce phase. Since the input of
the first operator on the Reduce side is in the format [key, value, tag],
I use those fake RSs to generate rows in the same format.
But this part of the work was implemented almost 2 years ago. Definitely let
me know if anything has changed and this fake RS is no longer needed.
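To make the difference between the two operators concrete, here is a minimal,
self-contained sketch. It is not Hive code: SketchOperator, SketchGroupBy,
SketchJoin, SketchBoundaryForwarder and GroupBoundaryDemo are hypothetical names
invented for illustration, and the sketch collapses the multiple parents of the
plans above into a single key-sorted stream of [key, value, tag] rows. It only
shows that a GBY-like operator can emit results on its own, a JOIN-like operator
emits nothing unless someone calls endGroup, and a fake-RS-like forwarder can
supply those group boundaries from inside processOp.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified operator model; NOT Hive's real Operator API.
abstract class SketchOperator {
    void startGroup() {}
    void endGroup() {}
    abstract void processOp(String key, String value, int tag);
    void close() {}
}

// GBY-like: notices the key change itself inside processOp.
class SketchGroupBy extends SketchOperator {
    private final List<String> out;
    private String currentKey;
    private int count;
    SketchGroupBy(List<String> out) { this.out = out; }
    @Override void processOp(String key, String value, int tag) {
        if (currentKey != null && !currentKey.equals(key)) flush();
        currentKey = key;
        count++;
    }
    @Override void close() { if (currentKey != null) flush(); }
    private void flush() {
        out.add("GBY " + currentKey + ": count=" + count);
        currentKey = null;
        count = 0;
    }
}

// JOIN-like: buffers rows per tag and emits only when endGroup is called.
class SketchJoin extends SketchOperator {
    private final List<String> out;
    private final List<String> left = new ArrayList<>();
    private final List<String> right = new ArrayList<>();
    private String currentKey;
    SketchJoin(List<String> out) { this.out = out; }
    @Override void startGroup() { left.clear(); right.clear(); }
    @Override void processOp(String key, String value, int tag) {
        currentKey = key;
        (tag == 0 ? left : right).add(value);
    }
    @Override void endGroup() {
        for (String l : left)
            for (String r : right)
                out.add("JOIN " + currentKey + ": " + l + "," + r);
    }
}

// Fake-RS-like: its own startGroup/endGroup stay empty (inherited no-ops);
// it detects key changes in processOp and drives the child's group boundaries.
class SketchBoundaryForwarder extends SketchOperator {
    private final SketchOperator child;
    private String currentKey;
    SketchBoundaryForwarder(SketchOperator child) { this.child = child; }
    @Override void processOp(String key, String value, int tag) {
        if (!key.equals(currentKey)) {
            if (currentKey != null) child.endGroup();
            currentKey = key;
            child.startGroup();
        }
        child.processOp(key, value, tag);
    }
    @Override void close() {
        if (currentKey != null) child.endGroup();
        currentKey = null;
        child.close();
    }
}

class GroupBoundaryDemo {
    // Rows in [key, value, tag] form, already sorted by key (made-up data).
    static final String[][] ROWS = {
        {"k1", "a", "0"}, {"k1", "x", "1"},
        {"k2", "b", "0"}, {"k2", "y", "1"}, {"k2", "z", "1"}
    };

    // Pushes every row into the operator, then closes it.
    static void feed(SketchOperator op) {
        for (String[] r : ROWS) op.processOp(r[0], r[1], Integer.parseInt(r[2]));
        op.close();
    }

    public static void main(String[] args) {
        List<String> gby = new ArrayList<>();
        feed(new SketchGroupBy(gby));
        System.out.println("GBY alone:           " + gby);

        List<String> bareJoin = new ArrayList<>();
        feed(new SketchJoin(bareJoin));
        System.out.println("JOIN alone:          " + bareJoin);

        List<String> wrappedJoin = new ArrayList<>();
        feed(new SketchBoundaryForwarder(new SketchJoin(wrappedJoin)));
        System.out.println("JOIN behind fake RS: " + wrappedJoin);
    }
}
```

In this toy model the bare JOIN produces no output because nothing ever calls
endGroup on it, while the same JOIN placed behind the forwarder emits one result
per matching pair. The real operator handles more than this (multiple parents,
tag assignment, serialization), so treat the sketch only as an illustration of
the group-boundary issue.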
> optimize orderby followed by a groupby
> --------------------------------------
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
> Issue Type: Sub-task
> Components: Query Processor
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
> Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY
> optimizer(cluster-by following group-by).
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira