Adrian Popescu created HIVE-6041:
------------------------------------
Summary: Incorrect task dependency graph for skewed join optimization
Key: HIVE-6041
URL: https://issues.apache.org/jira/browse/HIVE-6041
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Environment: Hadoop 1.0.3
Reporter: Adrian Popescu
Priority: Critical
The dependency graph among task stages is incorrect in the plan produced by the
skewed join optimization, which is enabled through "hive.optimize.skewjoin".
When no skewed keys exist, all tasks that follow the common join are filtered
out.
In particular, the conditional task in the optimized plan is not linked to the
child tasks of the common join task from the original plan. The conditional
task wraps the map join task, which does maintain all of these dependencies.
However, when the map join task is filtered out (i.e., no skewed keys exist),
the dependencies are lost with it. Hence, all remaining task stages of the
query are skipped.
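
The effect described above can be reproduced with a small toy model. This is
only a sketch: the Task and ConditionalTask classes below are simplified,
hypothetical stand-ins and not Hive's actual classes, and the stage names and
the "linkConditionalToDownstream" flag are assumptions made for illustration.
The model shows that when the downstream stage is reachable only through the
map join task, it disappears once the map join task is filtered out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SkewJoinDependencyDemo {

    // Hypothetical, simplified stand-in for a plan stage (not Hive's Task class).
    static class Task {
        final String name;
        final List<Task> children = new ArrayList<>();
        Task(String name) { this.name = name; }
        void addChild(Task child) { children.add(child); }
    }

    // A task that wraps candidate tasks; candidates run only if skewed keys exist.
    static class ConditionalTask extends Task {
        final List<Task> candidates = new ArrayList<>();
        ConditionalTask(String name) { super(name); }
    }

    // Returns the names of the stages that would run, in visiting order.
    static List<String> run(Task root, boolean skewedKeysExist) {
        Set<String> executed = new LinkedHashSet<>();
        visit(root, skewedKeysExist, executed);
        return new ArrayList<>(executed);
    }

    private static void visit(Task task, boolean skewedKeysExist, Set<String> executed) {
        if (!executed.add(task.name)) {
            return; // stage already scheduled through another path
        }
        if (task instanceof ConditionalTask && skewedKeysExist) {
            // Candidates survive the filtering step only when skewed keys exist.
            for (Task candidate : ((ConditionalTask) task).candidates) {
                visit(candidate, skewedKeysExist, executed);
            }
        }
        for (Task child : task.children) {
            visit(child, skewedKeysExist, executed);
        }
    }

    // Builds: common-join -> conditional[map-join] ; map-join -> group-by.
    // If linkConditionalToDownstream is true, group-by is also made a child
    // of the conditional task itself (an assumed remedy, for comparison only).
    static Task buildPlan(boolean linkConditionalToDownstream) {
        Task commonJoin = new Task("common-join");
        Task mapJoin = new Task("map-join");
        Task groupBy = new Task("group-by"); // stage that followed the join originally
        mapJoin.addChild(groupBy);

        ConditionalTask conditional = new ConditionalTask("conditional");
        conditional.candidates.add(mapJoin);
        if (linkConditionalToDownstream) {
            conditional.addChild(groupBy);
        }
        commonJoin.addChild(conditional);
        return commonJoin;
    }

    public static void main(String[] args) {
        System.out.println(run(buildPlan(false), true));
        System.out.println(run(buildPlan(false), false));
        System.out.println(run(buildPlan(true), false));
    }
}
```

In this model, the second call (no skewed keys, no direct link) never reaches
"group-by", which mirrors the skipped stages reported here. The third call
keeps "group-by" because the conditional task itself carries the dependency.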
The bug resides in the processSkewJoin() function of
"ql/optimizer/physical/GenMRSkewJoinProcessor.java", immediately after the
ConditionalTask is created and its dependencies are set.
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)