[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420008#comment-13420008 ]
Namit Jain commented on HIVE-3286:
----------------------------------

@Navis, can you explain the semantics of the above grammar? What do SKEWED BY and DISTRIBUTE BY imply?

Also, in the base case:

select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150);

are you expecting skewed keys for key <= 49? Is it true that the skewed keys will only be handled by reducers? If yes, why would it reduce the execution time? The main advantage should be that the reducer won't get any other key, so it won't be burdened. Is that the idea?

> Explicit skew join on user provided condition
> ---------------------------------------------
>
>                 Key: HIVE-3286
>                 URL: https://issues.apache.org/jira/browse/HIVE-3286
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>
> A join on a table with skewed data spends most of its execution time
> handling the skewed keys. But we usually already know about the skew, and
> even know what the skewed keys look like.
> If we can explicitly assign reducer slots for the skewed keys, total
> execution time could be greatly shortened.
> As a start, I've extended the join grammar like this:
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150);
> {code}
> which means that if the above query is executed by 20 reducers, one reducer
> is assigned to a.key+1 < 50, one to 50 <= a.key+1 < 100, one to
> 99 <= a.key < 150, and 17 to all other keys (this could be extended later
> to assign more than one reducer per condition).
> This can only be used with common inner equi-joins, and the skew condition
> should be composed of join keys only.
> The work done so far will be updated shortly after code cleanup.

--
This message is automatically generated by JIRA.
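
The reducer assignment described in the quoted issue can be illustrated with a small sketch. This is not Hive's implementation: the names `SKEW_CONDITIONS` and `assign_reducer` are hypothetical, keys are assumed to be integers, the conditions are assumed to be evaluated in order with the first match winning, and non-skewed keys are assumed to be spread over the remaining reducers by a simple modulo.

```python
# Hypothetical sketch of the routing described in HIVE-3286 (not Hive code).
# Each skew condition gets one dedicated reducer; everything else is spread
# across the remaining reducers.

SKEW_CONDITIONS = [
    lambda key: key + 1 < 50,   # reducer 0: a.key+1 < 50
    lambda key: key + 1 < 100,  # reducer 1: 50 <= a.key+1 < 100
    lambda key: key < 150,      # reducer 2: 99 <= a.key < 150
]


def assign_reducer(key, num_reducers, conditions=SKEW_CONDITIONS):
    """Return the reducer index for an integer join key.

    Assumption: conditions are checked in order and the first match wins,
    which yields the non-overlapping ranges listed in the issue.
    """
    if num_reducers <= len(conditions):
        raise ValueError("need at least one reducer for non-skewed keys")

    for index, matches in enumerate(conditions):
        if matches(key):
            return index

    # Assumption: remaining keys are distributed by a plain modulo.
    remaining = num_reducers - len(conditions)
    return len(conditions) + key % remaining


if __name__ == "__main__":
    for sample in (10, 49, 99, 150):
        print(sample, "->", assign_reducer(sample, 20))
```

With 20 reducers, three are reserved for the skew conditions and 17 are left for all other keys, which matches the split given in the issue description.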