Navis created HIVE-3286: --------------------------- Summary: Explicit skew join on user provided condition Key: HIVE-3286 URL: https://issues.apache.org/jira/browse/HIVE-3286 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Priority: Minor
Join operation on table with skewed data takes most of execution time handling the skewed keys. But mostly we already know about that and even know what is look like the skewed keys. If we can explicitly assign reducer slots for the skewed keys, total execution time could be greatly shortened. As for a start, I've extended join grammar something like this. {code} select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150); {code} which means if above query is executed by 20 reducers, one reducer for a.key+1 < 50, one reducer for 50 <= a.key+1 < 100, one reducer for 99 <= a.key < 150, and 17 reducers for others (could be extended to assign more than one reducer later) This can be only used with common-inner-equi joins. And skew condition should be composed of join keys only. Work till done now will be updated shortly after code cleanup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira