Introducing a new parameter for Map-side join bucket size
---------------------------------------------------------

                 Key: HIVE-1158
                 URL: https://issues.apache.org/jira/browse/HIVE-1158
             Project: Hadoop Hive
          Issue Type: Improvement
    Affects Versions: 0.5.0, 0.6.0
            Reporter: Ning Zhang
            Assignee: Ning Zhang


A map-side join caches the small table in memory and joins it with each split 
of the large table on the mapper side. If the small table is too large, it uses 
RowContainer to cache the number of rows given by the parameter 
hive.join.cache.size, whose default value is 25000. The same parameter is also 
used by regular reducer-side joins to cache all input tables except the 
streaming table. This default is too large for the map-side join bucket size 
and sometimes results in OOM exceptions. We should define a separate parameter 
so that the two cache sizes can be set independently. 
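
A minimal sketch of the proposed separation, written in Python for brevity 
rather than as Hive code. Only hive.join.cache.size and its default of 25000 
come from this issue. The second key name, hive.mapjoin.bucket.cache.size, its 
default of 100, and the helper cache_size() are assumptions made up for 
illustration.

```python
# Sketch only: shows one way two independent cache sizes could be resolved.

JOIN_CACHE_KEY = "hive.join.cache.size"  # existing parameter named in the issue
JOIN_CACHE_DEFAULT = 25000               # default stated in the issue

# Assumed name and default for the new map-side parameter (not from the issue).
MAPJOIN_CACHE_KEY = "hive.mapjoin.bucket.cache.size"
MAPJOIN_CACHE_DEFAULT = 100


def cache_size(conf: dict, map_side: bool) -> int:
    """Return the number of rows to cache for the given join type."""
    if map_side:
        # Map-side joins read only their own key, so a large reducer-side
        # setting no longer inflates the mapper's in-memory cache.
        return int(conf.get(MAPJOIN_CACHE_KEY, MAPJOIN_CACHE_DEFAULT))
    return int(conf.get(JOIN_CACHE_KEY, JOIN_CACHE_DEFAULT))
```

With this split, changing hive.join.cache.size for reducer-side joins leaves 
the map-side cache size untouched, and the reverse also holds.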

