[ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016871#comment-13016871
 ] 

Liyin Tang commented on HIVE-2095:
----------------------------------

I will take a look

> auto convert map join bug
> -------------------------
>
>                 Key: HIVE-2095
>                 URL: https://issues.apache.org/jira/browse/HIVE-2095
>             Project: Hive
>          Issue Type: Bug
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: HIVE-2095.1.patch
>
>
> 1) 
> when considering to choose one table as the big table candidate for a map 
> join, if at compile time, hive can find out that the total known size of all 
> other tables excluding the big table in consideration is bigger than a 
> configured value, this big table candidate is a bad one, and should not put 
> into plan. Otherwise, at runtime to filter this out may cause more time.
> 2)
> added a null check for back up tasks. Otherwise will see NullPointerException
> 3)
> CommonJoinResolver needs to know a full mapping of pathToAliases. Otherwise 
> it will make wrong decision.
> 4)
> changes made to the ConditionalResolverCommonJoin: added pathToAliases, 
> aliasToSize (alias's input size that is known at compile time, by 
> inputSummary), and intermediate dir path.
> So the logic is, go over all the pathToAliases, and for each path, if it is 
> from intermediate dir path, add this path's size to all aliases. And finally 
> based on the size information and others like aliasToTask to choose the big 
> table. 
> 5)
> Conditional task's children contains wrong options, which may cause join fail 
> or incorrect results. Basically when getting all possible children for the 
> conditional task, should use a whitelist of big tables. Only tables in this 
> while list can be considered as a big table.
> Here is the logic:
> +   * Get a list of big table candidates. Only the tables in the returned set 
> can
> +   * be used as big table in the join operation.
> +   * 
> +   * The logic here is to scan the join condition array from left to right. 
> If
> +   * see a inner join and the bigTableCandidates is empty, add both side of 
> this
> +   * inner join to big table candidates. If see a left outer join, and the
> +   * bigTableCandidates is empty, add the left side to it, and if the
> +   * bigTableCandidates is not empty, do nothing (which means the
> +   * bigTableCandidates is from left side). If see a right outer join, clear 
> the
> +   * bigTableCandidates, and add right side to the bigTableCandidates, it 
> means
> +   * the right side of a right outer join always win. If see a full outer 
> join,
> +   * return null immediately (no one can be the big table, can not do a
> +   * mapjoin).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to