[ https://issues.apache.org/jira/browse/HIVE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834784#action_12834784 ]
Namit Jain commented on HIVE-1158:
----------------------------------

+1

0.5 patch looks good - will commit if the tests pass

> Introducing a new parameter for Map-side join bucket size
> ---------------------------------------------------------
>
>                 Key: HIVE-1158
>                 URL: https://issues.apache.org/jira/browse/HIVE-1158
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.5.0, 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1158.patch, HIVE-1158_branch_0_5.patch
>
>
> Map-side join caches the small table in memory and joins it with each split
> of the large table on the mapper side. If the small table is too large, it
> uses RowContainer to cache a number of rows indicated by the parameter
> hive.join.cache.size, whose default value is 25000. This parameter is also
> used for regular reducer-side joins to cache all input tables except the
> streaming table. This default value is too large for the map-side join
> bucket size and sometimes results in OOM exceptions. We should define a
> separate parameter so that the two cache sizes can be set independently.
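
To illustrate why the row-count limit matters for memory use, here is a minimal Python sketch of a row cache that keeps at most a fixed number of rows in memory and flushes full blocks to a temporary file. It is not Hive's RowContainer implementation; the class name `SpillingRowCache` and its `cache_size` argument are hypothetical and only model the idea behind a setting like hive.join.cache.size.

```python
import pickle
import tempfile


class SpillingRowCache:
    """Toy model of a row container (hypothetical, not Hive code).

    Holds at most `cache_size` rows in memory; full blocks are
    pickled to a temporary file.
    """

    def __init__(self, cache_size):
        if cache_size <= 0:
            raise ValueError("cache_size must be positive")
        self.cache_size = cache_size
        self.in_memory = []                     # current in-memory block
        self.spill = tempfile.TemporaryFile()   # blocks flushed to disk
        self.spilled_blocks = 0

    def add(self, row):
        # When the in-memory block is full, flush it before accepting more rows.
        if len(self.in_memory) == self.cache_size:
            self.spill.seek(0, 2)               # always append at end of file
            pickle.dump(self.in_memory, self.spill)
            self.spilled_blocks += 1
            self.in_memory = []
        self.in_memory.append(row)

    def rows(self):
        # Replay spilled blocks in insertion order, then the in-memory tail.
        self.spill.seek(0)
        for _ in range(self.spilled_blocks):
            yield from pickle.load(self.spill)
        yield from self.in_memory


# A small cache_size bounds peak memory at the cost of more disk I/O.
cache = SpillingRowCache(cache_size=100)
for i in range(250):
    cache.add((i, f"value-{i}"))
```

In this model the peak number of rows held in memory per cache is `cache_size`. If a mapper keeps one such cache per join key or bucket, a limit that works for a reducer-side join can add up to far more memory on the map side, which is the motivation for a separate parameter.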