[ 
https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420100#comment-13420100
 ] 

Navis commented on HIVE-3286:
-----------------------------

This is for assigning some number of reducers exclusively for a key (or group 
of keys). 

"SKEWED BY 30 PERCENT" means if the total number of reducer for MR is 20, hive 
assign 20*0.3=6 reducers for the group. If not specified, one reducer is 
assigned for that group. For above example, resultant partition number of group 
1 is distributed in the range of 12~17, group 2 is 18, group 3 is 19, and 
remaining keys are distributed in the range of 0~11.

"DISTRIBUTED BY a.key-1" means if partition range is more than 1(like group 1), 
distribution in the range(12~17) is based on hash of evaluated value by the 
expression 1.key-1. I think this is not yet enough for real usage.
                
> Explicit skew join on user provided condition
> ---------------------------------------------
>
>                 Key: HIVE-3286
>                 URL: https://issues.apache.org/jira/browse/HIVE-3286
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>
> Join operation on table with skewed data takes most of execution time 
> handling the skewed keys. But mostly we already know about that and even know 
> what is look like the skewed keys.
> If we can explicitly assign reducer slots for the skewed keys, total 
> execution time could be greatly shortened.
> As for a start, I've extended join grammar something like this.
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 
> < 100, a.key < 150);
> {code}
> which means if above query is executed by 20 reducers, one reducer for 
> a.key+1 < 50, one reducer for 50 <= a.key+1 < 100, one reducer for 99 <= 
> a.key < 150, and 17 reducers for others (could be extended to assign more 
> than one reducer later)
> This can be only used with common-inner-equi joins. And skew condition should 
> be composed of join keys only.
> Work till done now will be updated shortly after code cleanup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to