[ https://issues.apache.org/jira/browse/HIVE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427239#comment-13427239 ]
Namit Jain commented on HIVE-3086:
----------------------------------
@Yongqiang, the current skew join applies its optimization only after most of
the damage has already been done: the reducer detects that a particular key is
skewed, and then processes that key in a separate MR job.

In this approach, by contrast, we plan to know the skewed keys beforehand
(stored in the metastore), and then use them to do a map-join for the skewed
keys and a normal join for the other keys. This does require some change from
the user, who needs to store the skewed keys in the metastore. However, this
approach can work very well for repetitive workloads, where similar queries run
every day over similar data. Most probably the skew does not change every day,
so it can be recalculated periodically.
> Skewed Join Optimization
> ------------------------
>
> Key: HIVE-3086
> URL: https://issues.apache.org/jira/browse/HIVE-3086
> Project: Hive
> Issue Type: New Feature
> Reporter: Nadeem Moidu
> Assignee: Nadeem Moidu
>
> During a join operation, if one of the columns has a skewed key, it can cause
> that particular reducer to become the bottleneck. The following feature will
> address it:
> https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization