[ https://issues.apache.org/jira/browse/HIVE-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920505#comment-13920505 ]
Navis commented on HIVE-6520:
-----------------------------

MapJoinOperator cannot handle skew join, which needs to know the total number of rows for each join key. We could disable conversion to MapJoin when the join is marked for skew join. But if the join can be converted to a MapJoin, that would be faster than handling it with the classical skew join.

> Skew Join optimization doesn't work if parent gets converted to MapJoin task
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-6520
>                 URL: https://issues.apache.org/jira/browse/HIVE-6520
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.11.0
>            Reporter: Ankit Kamboj
>
> Skew join optimization (GenMRSkewJoinProcessor.java) assumes that its parent
> stage (which creates the directory structure for skewed keys) will have a
> Reduce Join Operator. GenMRSkewJoinProcessor sets the "handleSkewJoin" flag
> only in that case.
> But it is possible that the parent stage gets converted to a MapJoin task
> (because of the hive.auto.convert.join flag). In that case "handleSkewJoin" is
> not set for the parent stage, and it will not create the directory structure
> for skewed keys in HDFS. This eventually leads to elimination of the skew join
> conditional task (and its children), because the conditional task is not able
> to find the skewed key directories.
> Shouldn't the MapJoinOperator also handle skew join and create the directory
> structure for skewed keys, in addition to performing the map join for the
> non-skewed keys?
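
A possible workaround along the lines of the comment above, as a minimal sketch only: turn off automatic MapJoin conversion for the session so the parent stage keeps its reduce-side join and can write the skewed-key directories. hive.auto.convert.join is the flag named in the issue; hive.optimize.skewjoin and hive.skewjoin.key are the usual skew join settings. The table and column names are hypothetical, and whether this avoids the problem on a given Hive version is an assumption, not something confirmed in this thread.

```sql
-- Enable the runtime skew join optimization.
SET hive.optimize.skewjoin=true;
-- Rows-per-key threshold above which a join key is treated as skewed
-- (100000 is the documented default).
SET hive.skewjoin.key=100000;
-- Assumption: with auto-conversion off, the parent stage stays a
-- reduce-side join, so the skewed-key directories get created.
SET hive.auto.convert.join=false;

-- Hypothetical tables, for illustration only.
SELECT a.id, b.val
FROM big_a a
JOIN big_b b ON a.id = b.id;
```

The trade-off is the one the comment points out: the join no longer gets the faster MapJoin path, even when one side would have been small enough for it.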