[
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556513#comment-13556513
]
Ashutosh Chauhan commented on HIVE-2340:
----------------------------------------
Yeah, correct: JOIN-GBY and GBY-GBY are taken care of in ysmart also. It's the
group-by followed by order-by case which is also of interest to me, and which
this patch already covers.
Besides the scenarios covered by these two patches, I am also comparing the
approaches taken in them. I have only briefly looked at this patch, but the
fundamental difference I can make out between this approach and the ysmart
approach is that here the RS is deduplicated, that is, completely removed from
the operator pipeline wherever it can be (i.e. when the keys of the subsequent
RS are a superset of those of the earlier one), thus fusing multiple MR jobs.
Ysmart, on the other hand, replaces the second RS with a new operator it is
introducing (LocalSimulatedReduceSink?), which fakes the RS but doesn't let the
plan split into 2 MR jobs, thus generating one MR job. I haven't thought this
through completely, but on an initial pass it seems like the approach of this
patch is better than ysmart's because:
* Here you don't need a new operator.
* Here you are simplifying the plan by eliminating operators, as opposed to
ysmart, which replaces the operator and thereby increases the complexity of
the plan (by adding a new type of operator).
* In that new operator, ysmart currently serializes and deserializes the data
passing through it, thereby introducing an unnecessary performance penalty.
Granted, this could be improved, but the problem doesn't exist in the patch
proposed on this jira to begin with.
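To make the deduplication idea concrete, here is a minimal Python sketch. It is
not taken from either patch: the Op class, dedup_reduce_sinks and mr_jobs are
hypothetical names, the plan is modelled as a simple linear list of operators,
and the rule "a later RS whose keys are a superset of the earlier RS's keys is
dropped, and the earlier RS takes over those keys" is an assumption based only
on the description above. Sort order, partitioning and reducer counts are
ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for a plan operator (not a Hive class)."""
    kind: str                    # e.g. "TS", "GBY", "RS", "FS"
    keys: Tuple[str, ...] = ()   # only meaningful for RS operators


def dedup_reduce_sinks(pipeline: List[Op]) -> List[Op]:
    """Drop an RS whose keys are a superset of the previous surviving RS's keys.

    Assumption: the surviving (earlier) RS adopts the wider key list, so a
    single shuffle serves both consumers.
    """
    result: List[Op] = []
    last_rs: Optional[int] = None  # index in result of the last surviving RS
    for op in pipeline:
        if op.kind == "RS":
            if last_rs is not None and set(op.keys) >= set(result[last_rs].keys):
                # Redundant shuffle: fold it into the earlier RS and skip it.
                result[last_rs] = Op("RS", op.keys)
                continue
            last_rs = len(result)
        result.append(op)
    return result


def mr_jobs(pipeline: List[Op]) -> int:
    """In this toy model every RS is one shuffle, i.e. one MR job."""
    return max(1, sum(1 for op in pipeline if op.kind == "RS"))


# Toy plan for: SELECT key, count(*) FROM t GROUP BY key ORDER BY key
plan = [Op("TS"), Op("RS", ("key",)), Op("GBY"), Op("RS", ("key",)), Op("FS")]
fused = dedup_reduce_sinks(plan)
print(mr_jobs(plan), mr_jobs(fused))  # prints "2 1"
```

In this model the second RS simply disappears from the plan, whereas the ysmart
approach as described above would keep an operator in that position.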
There are certainly other scenarios which ysmart can cover and which this patch
does not (Yin, can you list those?), but for the scenarios that are common to
both, this approach seems to be better.
There might be other differences between the approaches; please feel free to
raise those.
> optimize orderby followed by a groupby
> --------------------------------------
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
> Issue Type: Sub-task
> Components: Query Processor
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
> Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY
> optimizer (cluster-by following group-by).