Navis created HIVE-3286:
---------------------------

             Summary: Explicit skew join on user provided condition
                 Key: HIVE-3286
                 URL: https://issues.apache.org/jira/browse/HIVE-3286
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
    Affects Versions: 0.10.0
            Reporter: Navis
            Assignee: Navis
            Priority: Minor


A join on a table with skewed data spends most of its execution time handling 
the skewed keys. In most cases, though, we already know that the data is skewed 
and even know which keys are skewed.

If we can explicitly assign reducer slots to the skewed keys, the total 
execution time could be greatly shortened.

As a start, I've extended the join grammar like this:
{code}
select * from src a join src b on a.key=b.key
skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150);
{code}

This means that if the above query is executed by 20 reducers, one reducer 
handles a.key+1 < 50, one handles 50 <= a.key+1 < 100, one handles 
99 <= a.key < 150, and the remaining 17 handle all other keys. (This could 
later be extended to assign more than one reducer per condition.)
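
The reducer assignment described above can be modelled with a small sketch. 
This is only an illustration under stated assumptions, not the code in the 
patch: the function name assign_reducer, the integer keys, and the use of a 
simple modulo for the non-skewed keys are all hypothetical. It shows how 
testing the conditions in order gives each later condition an implicit lower 
bound.

```python
# Hypothetical model of the proposed routing; not Hive's actual implementation.
# Assumes join keys are integers.

# The three skew conditions from the example query, in declaration order.
SKEW_CONDITIONS = [
    lambda key: key + 1 < 50,
    lambda key: key + 1 < 100,
    lambda key: key < 150,
]


def assign_reducer(key, num_reducers=20, conditions=SKEW_CONDITIONS):
    """Return the index of the reducer that would receive this join key.

    Conditions are tested in order and the first match wins, so a later
    condition only ever sees keys that every earlier condition rejected.
    """
    for index, condition in enumerate(conditions):
        if condition(key):
            return index  # one dedicated reducer per skew condition

    remaining = num_reducers - len(conditions)
    if remaining <= 0:
        raise ValueError("need more reducers than skew conditions")
    # Assumption: all other keys are spread over the remaining reducers
    # with a plain modulo, standing in for a hash partitioner.
    return len(conditions) + key % remaining
```

With 20 reducers, keys up to 48 go to reducer 0, keys 49 to 98 go to 
reducer 1, keys 99 to 149 go to reducer 2, and every other key lands on one 
of reducers 3 to 19.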

This can only be used with common inner equi-joins, and the skew condition 
must be composed of join keys only.

The work done so far will be posted shortly, after code cleanup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
