[
https://issues.apache.org/jira/browse/HIVE-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810501#comment-13810501
]
Laljo John Pullokkaran commented on HIVE-5709:
----------------------------------------------
Ideally this should be a cost-based decision. Pulling one join key out of the
join condition and applying it as a filter has the following consequences:
Pro:
1. It saves the cost of one shuffle.
Con:
1. The degree of parallelism may be reduced, because the mapper's result set
is partitioned on the join key and, with hf as the partitioning hash function,
hf(a,b) != hf(a)
2. The intermediate result set may be large when some join keys are moved
above the join and applied as a filter.
Given these factors, this should be a cost-based decision.
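
To make the parallelism point concrete, here is a minimal sketch in Python. It
is not Hive code: hf below is a toy hash over integer key columns and
reducers_used is a hypothetical helper, both assumed only for illustration. It
shows that partitioning on (a, b) can spread rows over more reducers than
partitioning on a alone when a has few distinct values.

```python
def hf(*keys):
    """Toy hash over integer key columns (31*h + k); a stand-in, not Hive's hash."""
    h = 0
    for k in keys:
        h = 31 * h + k
    return h


def reducers_used(rows, key_cols, num_reducers):
    """Return the set of reducer ids that receive at least one row."""
    return {hf(*(row[c] for c in key_cols)) % num_reducers for row in rows}


# Column a has only 2 distinct values, column b has 100.
rows = [{"a": a, "b": b} for a in range(2) for b in range(100)]

on_a_b = reducers_used(rows, ["a", "b"], num_reducers=8)
on_a = reducers_used(rows, ["a"], num_reducers=8)

print(len(on_a_b), len(on_a))  # prints "8 2"
```

With 8 reducers available, the composite key keeps all of them busy, while the
single low-cardinality key uses only two. How much this matters in practice
depends on the key distribution, which is why a cost model seems appropriate.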
> Extend Join merging logic to merge 2 Joins when one Join expression list is a
> subset of the other.
> --------------------------------------------------------------------------------------------------
>
> Key: HIVE-5709
> URL: https://issues.apache.org/jira/browse/HIVE-5709
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Harish Butani
>
> As pointed out by [~ashutoshc] here: https://reviews.apache.org/r/14953/
> For the following query
> {noformat}
> select p1.name, p2.name, p3.name
> from part p1 join p2 on p1.name = p2.name and p1.key = p2.key join
> part p3 on p1.name = p3.name
> {noformat}
> 2 jobs are generated:
> - p1 join p2 on name, key
> - join p3 on name
> This can be done as:
> - 1 3-way join of p1,p2,p3 on name
> - followed by a Filter on p1.key = p2.key
> This is valid only for inner joins.
> This can be done by extending the Merge Join logic to check for a subset
> relation between 2 QBJoinTree expression lists.
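
The rewrite described in the issue can be checked on a small in-memory example.
The sketch below is plain Python, not Hive; the sample rows and the function
names two_step_plan and merged_plan are made up for illustration. It compares
the two plans for inner joins and also shows the larger intermediate result
that the merged plan produces before the filter is applied.

```python
from itertools import product

# Hypothetical sample data; p2 and p3 play the roles of the aliases in the query.
p1 = [{"name": "x", "key": 1}, {"name": "x", "key": 2}, {"name": "y", "key": 3}]
p2 = [{"name": "x", "key": 1}, {"name": "x", "key": 3}, {"name": "y", "key": 3}]
p3 = [{"name": "x"}, {"name": "y"}]


def two_step_plan():
    # Job 1: p1 join p2 on (name, key). Job 2: join p3 on name.
    step1 = [(a, b) for a, b in product(p1, p2)
             if a["name"] == b["name"] and a["key"] == b["key"]]
    return [(a, b, c) for (a, b), c in product(step1, p3)
            if a["name"] == c["name"]]


def merged_plan():
    # One 3-way join on name, followed by a filter on p1.key = p2.key.
    joined = [(a, b, c) for a, b, c in product(p1, p2, p3)
              if a["name"] == b["name"] and a["name"] == c["name"]]
    filtered = [(a, b, c) for a, b, c in joined if a["key"] == b["key"]]
    return joined, filtered


joined, filtered = merged_plan()
print(filtered == two_step_plan())  # prints "True"
print(len(joined), len(filtered))   # prints "5 2"
```

Both plans return the same two rows, but the merged plan materializes five rows
before filtering. That gap is the intermediate result growth mentioned in the
comment, and it only stays harmless when the extra join key is not very
selective.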