[jira] Commented: (HIVE-1158) Introducing a new parameter for Map-side join bucket size

Namit Jain (JIRA) Wed, 17 Feb 2010 06:47:50 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834784#action_12834784
 ]


Namit Jain commented on HIVE-1158:
----------------------------------

+1

0.5 patch looks good - will commit if the tests pass

> Introducing a new parameter for Map-side join bucket size
> ---------------------------------------------------------
>
>                 Key: HIVE-1158
>                 URL: https://issues.apache.org/jira/browse/HIVE-1158
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.5.0, 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1158.patch, HIVE-1158_branch_0_5.patch
>
>
> Map-side join caches the small table in memory and joins it with each split
> of the large table on the mapper side. If the small table is too large, it
> uses RowContainer to cache a number of rows given by the parameter
> hive.join.cache.size, whose default value is 25000. This parameter is also
> used by regular reducer-side joins to cache all input tables except the
> streaming table. The default is too large for the map-side join bucket
> size and sometimes causes OOM exceptions. We should define a separate
> parameter so that the two cache sizes can be set independently.
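
To illustrate the idea in the description, here is a minimal sketch of a row
buffer whose in-memory size is controlled by a per-join-type setting. It is
not Hive's RowContainer and not the code in the attached patches. The class
SpillingRowBuffer, the helper buffer_for, the key "mapjoin.bucket.cache.size"
and its value of 100 are all assumptions made up for illustration; only
hive.join.cache.size and its default of 25000 come from the issue text.

```python
import pickle
import tempfile


class SpillingRowBuffer:
    """Hypothetical buffer: holds at most `cache_size` rows in memory and
    spills each full block to a temporary file."""

    def __init__(self, cache_size):
        if cache_size <= 0:
            raise ValueError("cache_size must be positive")
        self.cache_size = cache_size
        self._block = []                        # rows currently in memory
        self._spill = tempfile.TemporaryFile()  # pickled blocks, back to back
        self.spilled_blocks = 0

    def add(self, row):
        if len(self._block) == self.cache_size:
            # The in-memory block is full: append it to the spill file
            # and start a fresh block.
            self._spill.seek(0, 2)
            pickle.dump(self._block, self._spill)
            self.spilled_blocks += 1
            self._block = []
        self._block.append(row)

    def __iter__(self):
        # Replay spilled blocks in insertion order, then the in-memory block.
        self._spill.seek(0)
        for _ in range(self.spilled_blocks):
            yield from pickle.load(self._spill)
        yield from self._block


# Two independent settings instead of one shared value.
conf = {
    "hive.join.cache.size": 25000,     # named in the issue description
    "mapjoin.bucket.cache.size": 100,  # assumed name and value, illustration only
}


def buffer_for(join_kind, conf):
    """Pick the cache size by join type ("map" or "reduce")."""
    if join_kind == "map":
        key = "mapjoin.bucket.cache.size"
    else:
        key = "hive.join.cache.size"
    return SpillingRowBuffer(conf[key])
```

With separate keys, lowering the map-side value bounds the memory held per
bucket on the mapper without changing how reducer-side joins cache their
non-streaming inputs.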

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
