[
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575336#comment-13575336
]
Ashutosh Chauhan commented on HIVE-3403:
----------------------------------------
Thinking more about my point a) above, there are three potential join
optimization opportunities:
a) Convert a JoinOperator to non-bucketed MapJoinOperator.
b) Convert a JoinOperator to bucketed MapJoinOperator.
c) Convert a JoinOperator to sort-merge-bucketed MapJoinOperator.
Among these, c) doesn't need to buffer data in memory, so it can be decided
completely at compile time, which this patch enables. a) and b) buffer data in
memory, so they need to be decided at run time. a) is already taken care of in
HIVE-3784.
So, we are left with b) now. With this patch, we convert a JoinOperator to a
bucketed MapJoinOperator at compile time by attempting to convert a map-join
operator (which will be there because the user provided the hint). But ideally
this should also be done at run time, just like a). At run time we should first
check whether the tables are bucketed. If they are, check whether the required
buckets of the smaller table fit in memory and, if they do, convert the
JoinOperator to a bucketed map-join (BMJ). If the tables are not bucketed, check
the size of the whole small table and, if it fits, convert the JoinOperator to
a non-bucketed map-join. If we do this, we can get rid of map-join hints
completely. That would benefit users, since they would never have to provide
hints in their queries; the Hive optimizer would generate the best plan it can.
It would also benefit Hive devs, since they would never have to deal with
map-join operators in the query compilation phase: a map-join operator would
never be part of the plan at compile time and would only appear at run time,
when a JoinOperator is optimized into a MapJoinOperator. This would simplify
semantic analysis, plan generation and compile-time optimizations a lot.
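To make the run-time decision for a) and b) concrete, here is a rough sketch in
Java. Everything in it is hypothetical: JoinStrategySketch, Strategy, choose and
the byte-count parameters are names made up for illustration and are not Hive
classes or APIs. It also assumes that when the data does not fit in memory the
JoinOperator is simply left as is, which the analysis above implies but does
not spell out. Case c) is absent because it is decided at compile time.

```java
/**
 * Illustrative sketch only; none of these names exist in Hive.
 * Models the run-time choice between a plain join, a non-bucketed
 * map-join and a bucketed map-join (BMJ).
 */
public class JoinStrategySketch {

    /** Possible outcomes of the run-time decision. */
    public enum Strategy { JOIN, MAP_JOIN, BUCKETED_MAP_JOIN }

    /**
     * @param tablesBucketed      whether the joined tables are bucketed on the join key
     * @param requiredBucketBytes size of the small-table buckets one task must load
     * @param smallTableBytes     size of the whole small table
     * @param memoryBudgetBytes   memory available for buffering (assumed to be known)
     */
    public static Strategy choose(boolean tablesBucketed,
                                  long requiredBucketBytes,
                                  long smallTableBytes,
                                  long memoryBudgetBytes) {
        if (tablesBucketed) {
            // Bucketed tables: only the required buckets have to fit in memory.
            return requiredBucketBytes <= memoryBudgetBytes
                    ? Strategy.BUCKETED_MAP_JOIN
                    : Strategy.JOIN; // assumption: otherwise keep the JoinOperator
        }
        // Not bucketed: the whole small table has to fit in memory.
        return smallTableBytes <= memoryBudgetBytes
                ? Strategy.MAP_JOIN
                : Strategy.JOIN;     // assumption: otherwise keep the JoinOperator
    }

    public static void main(String[] args) {
        long budget = 128L * 1024 * 1024; // made-up 128 MB budget
        System.out.println(choose(true, 64L * 1024 * 1024, 4096L * 1024 * 1024, budget));
        System.out.println(choose(false, 0L, 100L * 1024 * 1024, budget));
        System.out.println(choose(false, 0L, 4096L * 1024 * 1024, budget));
    }
}
```

With no hint in the picture, the planner would only ever see a JoinOperator at
compile time, and a check along these lines would run once table sizes are
known.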
Namit, is this analysis correct?
> user should not specify mapjoin to perform sort-merge bucketed join
> -------------------------------------------------------------------
>
> Key: HIVE-3403
> URL: https://issues.apache.org/jira/browse/HIVE-3403
> Project: Hive
> Issue Type: Bug
> Reporter: Namit Jain
> Assignee: Namit Jain
> Attachments: hive.3403.10.patch, hive.3403.11.patch,
> hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch,
> hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch,
> hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch,
> hive.3403.21.patch, hive.3403.22.patch, hive.3403.23.patch,
> hive.3403.24.patch, hive.3403.25.patch, hive.3403.26.patch,
> hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch,
> hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch
>
>
> Currently, in order to perform a sort merge bucketed join, the user needs
> to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the
> mapjoin hint.
> The user should not specify any hints.