[jira] Updated: (HIVE-1158) Introducing a new parameter for Map-side join bucket size

Namit Jain (JIRA) Wed, 17 Feb 2010 08:37:51 -0800

     [ https://issues.apache.org/jira/browse/HIVE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Namit Jain updated HIVE-1158:
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.5.0
           Status: Resolved  (was: Patch Available)

Committed in 0.5 also. Thanks, Ning.

> Introducing a new parameter for Map-side join bucket size
> ---------------------------------------------------------
>
>                 Key: HIVE-1158
>                 URL: https://issues.apache.org/jira/browse/HIVE-1158
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.5.0, 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.5.0
>
>         Attachments: HIVE-1158.patch, HIVE-1158_branch_0_5.patch
>
>
> Map-side join caches the small table in memory and joins it with the split of 
> the large table on the mapper side. If the small table is too large, it uses 
> RowContainer to cache a number of rows given by the parameter 
> hive.join.cache.size, whose default value is 25000. This parameter is also 
> used by regular reducer-side joins to cache all input tables except the 
> streaming table. The default value is too large for the map-side join bucket 
> size and sometimes results in OOM exceptions. We should define a separate 
> parameter so that the two cache sizes can be set independently. 
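
The idea in the description above can be sketched roughly as follows. This is
an illustration only, not Hive's code: the map-side parameter name
example.mapjoin.bucket.cache.size and its default of 100 are hypothetical,
BoundedRowCache is a simplified stand-in for RowContainer, and the behavior of
moving rows past the limit out of memory is an assumption.

```python
# Illustrative sketch only; names marked "hypothetical" are not from the issue.

JOIN_CACHE_KEY = "hive.join.cache.size"
DEFAULT_JOIN_CACHE_SIZE = 25000  # default quoted in the issue description

# Hypothetical parameter name and default for the separate map-side setting.
MAPJOIN_CACHE_KEY = "example.mapjoin.bucket.cache.size"
DEFAULT_MAPJOIN_CACHE_SIZE = 100


def cache_size(conf, map_side):
    """Return the number of rows to hold in memory for the given join type."""
    if map_side:
        return int(conf.get(MAPJOIN_CACHE_KEY, DEFAULT_MAPJOIN_CACHE_SIZE))
    return int(conf.get(JOIN_CACHE_KEY, DEFAULT_JOIN_CACHE_SIZE))


class BoundedRowCache:
    """Simplified stand-in for RowContainer: keeps at most `limit` rows in
    memory and moves the rest to a list that represents spilled storage."""

    def __init__(self, limit):
        self.limit = limit
        self.in_memory = []
        self.spilled = []

    def add(self, row):
        if len(self.in_memory) < self.limit:
            self.in_memory.append(row)
        else:
            self.spilled.append(row)
```

With two separate keys, lowering the map-side value does not change how many
rows a reducer-side join keeps in memory, which is the separation the issue
asks for.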

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
