[jira] Updated: (HIVE-1158) Introducing a new parameter for Map-side join bucket size

Ning Zhang (JIRA) Wed, 17 Feb 2010 00:20:52 -0800

    [ https://issues.apache.org/jira/browse/HIVE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ning Zhang updated HIVE-1158:
-----------------------------

    Attachment: HIVE-1158_branch_0_5.patch

Uploading HIVE-1158_branch_0_5.patch for branch 0.5. This patch includes 
changes pulled from other patches in trunk to make the backport possible. 

Unit tests are still running, but all relevant tests appear to have passed so 
far. I will post the test results once they are done. 

> Introducing a new parameter for Map-side join bucket size
> ---------------------------------------------------------
>
>                 Key: HIVE-1158
>                 URL: https://issues.apache.org/jira/browse/HIVE-1158
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.5.0, 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1158.patch, HIVE-1158_branch_0_5.patch
>
>
> Map-side join caches the small table in memory and joins it with each split 
> of the large table on the mapper side. If the small table is too large, it 
> uses RowContainer to cache a number of rows given by the parameter 
> hive.join.cache.size, whose default value is 25000. This parameter is also 
> used by regular reducer-side joins to cache all input tables except the 
> streaming table. This default value is too large for the map-side join 
> bucket size and sometimes results in OOM exceptions. We should define a 
> separate parameter so that the two cache sizes can be set independently. 
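
To illustrate the proposal in the description above, here is a minimal sketch 
of how the two cache sizes could be looked up separately. It is not taken from 
the patch. Only hive.join.cache.size and its default of 25000 come from the 
issue text; the second key name, its default of 100, and the helper 
cache_size() are assumptions made for illustration.

```python
# Key and default stated in the issue description.
JOIN_CACHE_SIZE_KEY = "hive.join.cache.size"

# Assumed name for the new map-side join parameter (not given in this message).
MAPJOIN_CACHE_SIZE_KEY = "hive.mapjoin.bucket.cache.size"

DEFAULTS = {
    JOIN_CACHE_SIZE_KEY: 25000,   # default quoted in the issue
    MAPJOIN_CACHE_SIZE_KEY: 100,  # illustrative smaller default, an assumption
}


def cache_size(conf, map_side):
    """Return the number of rows to cache for the given join type.

    conf is a plain dict standing in for a job configuration; values may be
    strings, as they would be when read from a config file.
    """
    key = MAPJOIN_CACHE_SIZE_KEY if map_side else JOIN_CACHE_SIZE_KEY
    return int(conf.get(key, DEFAULTS[key]))


if __name__ == "__main__":
    conf = {JOIN_CACHE_SIZE_KEY: "5000"}
    # Tuning the reducer-side value leaves the map-side value untouched.
    print(cache_size(conf, map_side=False))
    print(cache_size(conf, map_side=True))
```

With separate keys, lowering the map-side value to avoid OOM does not shrink 
the cache used by reducer-side joins, and vice versa.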

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
