[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439418#comment-13439418 ]
Namit Jain commented on HIVE-3403:
----------------------------------
If the input tables are bucketed/sorted on the join keys, and we convert
the join to a sort-merge bucketed map join, the only drawback is that we will
use BucketizedHiveInputFormat (i.e. one mapper per file), which may restrict
parallelism if the input files are large.
In most cases, the advantages of a sort-merge join (and no reducer) should
outweigh the reduced parallelism. If need be, I can add a configuration
parameter that controls whether this conversion is triggered.
In general, it seems like a good idea to automatically use a sort-merge join
whenever possible.
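To make the current requirement concrete, here is a minimal sketch of what a
user has to write today to get a sort-merge bucketed map join. The table names
(a, b) and their columns are hypothetical, and both tables are assumed to be
bucketed and sorted on key with compatible bucket counts. The
hive.optimize.bucketmapjoin and hive.input.format settings are not mentioned
in the issue description; they reflect my understanding of the usual setup.

```sql
-- Hypothetical tables a and b, both assumed CLUSTERED BY (key) SORTED BY (key).
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
-- One mapper per bucket file, which is the parallelism trade-off noted above.
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- The MAPJOIN hint is the part this issue proposes to make unnecessary.
SELECT /*+ MAPJOIN(b) */ a.key, a.value, b.value
FROM a JOIN b ON a.key = b.key;
```

With the proposed change, the same query written without the hint would be
eligible for the conversion.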
> user should not specify mapjoin to perform sort-merge bucketed join
> -------------------------------------------------------------------
>
> Key: HIVE-3403
> URL: https://issues.apache.org/jira/browse/HIVE-3403
> Project: Hive
> Issue Type: Bug
> Reporter: Namit Jain
> Assignee: Namit Jain
>
> Currently, in order to perform a sort merge bucketed join, the user needs
> to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the
> mapjoin hint.
> The user should not specify any hints.