[
https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419027#comment-13419027
]
Namit Jain commented on HIVE-3286:
----------------------------------
Navis, Nadeem is already working on this using a different approach:
https://cwiki.apache.org/Hive/skewed-join-optimization.html
I am not sure whether there is a JIRA for it yet, but I know he is pretty close
to getting one out.
> Explicit skew join on user provided condition
> ---------------------------------------------
>
> Key: HIVE-3286
> URL: https://issues.apache.org/jira/browse/HIVE-3286
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
>
> A join on a table with skewed data spends most of its execution time
> handling the skewed keys. But in most cases we already know the data is
> skewed, and even know what the skewed keys look like.
> If we could explicitly assign reducer slots to the skewed keys, total
> execution time could be greatly reduced.
> As a start, I've extended the join grammar along these lines:
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1
> < 100, a.key < 150);
> {code}
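> This means that if the above query is executed by 20 reducers, one reducer
> handles a.key+1 < 50, one handles 50 <= a.key+1 < 100, one handles
> 99 <= a.key < 150, and the remaining 17 reducers handle all other keys.
> The conditions are applied in order, so each one covers only keys not
> matched by an earlier one; that is why the third reducer starts at
> a.key = 99 (where a.key+1 reaches 100). This could later be extended to
> assign more than one reducer per condition.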
> This can only be used with common inner equi-joins, and the skew
> conditions must be composed of join keys only.
> The work done so far will be uploaded shortly, after code cleanup.
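A minimal sketch of the reducer assignment described in the issue, for the example query run with 20 reducers. It assumes the skew conditions are checked in order with the first match winning, and that keys matching no condition are hash-partitioned across the remaining reducers. `assign_reducer` and `SKEW_CONDITIONS` are hypothetical names, and Python's built-in `hash` stands in for whatever hash function Hive actually uses; this is not the patch's implementation.

```python
from typing import Callable, Sequence

# Hypothetical encoding of: skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150)
SKEW_CONDITIONS: list[Callable[[int], bool]] = [
    lambda key: key + 1 < 50,
    lambda key: key + 1 < 100,
    lambda key: key < 150,
]


def assign_reducer(key: int,
                   conditions: Sequence[Callable[[int], bool]],
                   num_reducers: int) -> int:
    """Return the reducer index for a join key (hypothetical helper).

    Conditions are checked in order; the first matching condition i sends
    the key to dedicated reducer i. Keys matching no condition are spread
    over the remaining num_reducers - len(conditions) reducers by hash.
    """
    n_skew = len(conditions)
    if num_reducers <= n_skew:
        raise ValueError("need at least one reducer for non-skewed keys")
    for i, cond in enumerate(conditions):
        if cond(key):
            return i
    # Non-skewed keys land on reducers n_skew .. num_reducers - 1.
    # Python's % with a positive divisor never returns a negative value.
    return n_skew + hash(key) % (num_reducers - n_skew)


if __name__ == "__main__":
    for k in (10, 49, 98, 99, 149, 200):
        print(k, assign_reducer(k, SKEW_CONDITIONS, 20))
```

With 20 reducers, key 98 goes to reducer 1 (98+1 < 100) while key 99 falls through to reducer 2, matching the "99 <= a.key < 150" range above; key 200 matches no condition and is hashed onto one of reducers 3 to 19.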