Navis created HIVE-3286:
---------------------------
Summary: Explicit skew join on user provided condition
Key: HIVE-3286
URL: https://issues.apache.org/jira/browse/HIVE-3286
Project: Hive
Issue Type: Improvement
Components: Query Processor
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
Priority: Minor
Joins on tables with skewed data spend most of their execution time handling
the skewed keys. In most cases we already know that the data is skewed, and
even know what the skewed keys look like.
If we could explicitly assign reducer slots to the skewed keys, total execution
time could be greatly shortened.
As a start, I've extended the join grammar along these lines:
{code}
select * from src a join src b on a.key=b.key
skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150);
{code}
This means that if the above query is executed by 20 reducers, one reducer
handles a.key+1 < 50, one handles 50 <= a.key+1 < 100, one handles
99 <= a.key < 150, and the remaining 17 handle all other keys. (This could
later be extended to assign more than one reducer per condition.)
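To illustrate the intended assignment, here is a rough sketch in Python rather
than the actual Hive code. It assumes the skew conditions are evaluated in
order with the first match winning, and that the remaining keys are spread over
the other reducers by a simple modulo. The names SKEW_CONDITIONS and
assign_reducer are hypothetical and exist only for this example.

```python
# Hypothetical model of the reducer assignment described above.
# Each skew condition from the example query gets one dedicated reducer;
# conditions are checked in order and the first match wins (an assumption).
SKEW_CONDITIONS = [
    lambda key: key + 1 < 50,
    lambda key: key + 1 < 100,
    lambda key: key < 150,
]


def assign_reducer(key, num_reducers):
    """Return the reducer index for an integer join key."""
    num_skew = len(SKEW_CONDITIONS)
    if num_reducers <= num_skew:
        raise ValueError("need at least one reducer for non-skewed keys")
    for index, condition in enumerate(SKEW_CONDITIONS):
        if condition(key):
            return index
    # Keys matching no skew condition share the remaining reducers.
    # Plain modulo stands in for whatever hash partitioning is really used.
    remaining = num_reducers - num_skew
    return num_skew + (key % remaining)
```

With 20 reducers, keys up to 48 go to reducer 0, keys 49 through 98 to
reducer 1, keys 99 through 149 to reducer 2, and every other key lands on one
of reducers 3 through 19.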
This can only be used with common inner equi-joins, and each skew condition
must be composed of join keys only.
The work done so far will be posted shortly, after code cleanup.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira