[ https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403468#comment-13403468 ]
Dmitriy V. Ryaboy commented on PIG-2774:
----------------------------------------

I like the first paragraph of what you said; the second applies more to skew join (reduce side) than to map join (map side), I think. With a map-side join, other operations may be queued up after the join on the same mapper, and tracing through separate split files would get unnecessarily complicated.

> Fix merge join to work with many duplicate left keys
> ----------------------------------------------------
>
>                 Key: PIG-2774
>                 URL: https://issues.apache.org/jira/browse/PIG-2774
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Aneesh Sharma
>
> A merge join can throw an OOM error if the number of duplicate left tuples is
> large, because it accumulates all of them in memory. There are two solutions to
> this problem:
> 1. Serialize the accumulated tuples to disk if they exceed a certain size.
> 2. Spit out join output periodically, and re-seek on the right-hand side
> index.
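To make option 2 concrete, here is a minimal sketch, assuming both inputs are sorted by join key and that the right-hand-side index can be modeled as a binary-search "seek to first row with key >= k". This is not Pig's actual merge join operator; the class `BatchedMergeJoin`, the `Row` record, `seek`, `flush`, and the `maxBatch` parameter are all hypothetical names for illustration, and in-memory lists stand in for the streamed left input and the indexed right input. Duplicate left rows are buffered only up to `maxBatch`; each full batch is joined against the right-side rows for that key, and the right side is re-seeked for the next batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class BatchedMergeJoin {
    // Hypothetical row type: a join key plus a payload.
    public record Row(int key, String value) {}

    // Stand-in for the right-hand-side index (assumption): returns the position
    // of the first row whose key is >= the requested key. Requires `right` sorted by key.
    static int seek(List<Row> right, int key) {
        int lo = 0, hi = right.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (right.get(mid).key() < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Re-seeks the right side to `key`, emits the cross product of the buffered
    // left rows with every right row carrying that key, then empties the buffer.
    static void flush(List<Row> batch, int key, List<Row> right, BiConsumer<Row, Row> out) {
        for (int i = seek(right, key); i < right.size() && right.get(i).key() == key; i++) {
            for (Row l : batch) out.accept(l, right.get(i));
        }
        batch.clear();
    }

    // Streams the sorted left input, never holding more than maxBatch left rows
    // for the same key in memory at once.
    public static void join(List<Row> left, List<Row> right, int maxBatch, BiConsumer<Row, Row> out) {
        List<Row> batch = new ArrayList<>();
        for (Row l : left) {
            // Key changed: finish the previous key before buffering the new row.
            if (!batch.isEmpty() && batch.get(0).key() != l.key()) {
                flush(batch, batch.get(0).key(), right, out);
            }
            batch.add(l);
            // Buffer full: emit output now and re-seek the right side for the next batch.
            if (batch.size() == maxBatch) {
                flush(batch, l.key(), right, out);
            }
        }
        if (!batch.isEmpty()) flush(batch, batch.get(0).key(), right, out);
    }
}
```

The trade-off is memory versus I/O: for a key with n duplicate left rows, buffered memory is bounded by `maxBatch`, but the right-side rows for that key are re-read roughly n / `maxBatch` times. Option 1 (spilling to disk) would instead read the right side once and pay for serializing the left rows.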