[ https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402923#comment-13402923 ]
Dmitriy V. Ryaboy commented on PIG-2774:
----------------------------------------

Generating non-standard splits can get tricky in the solution Thejas proposed. Also, I'd like to avoid having the user encode these details in the Pig script.

> Fix merge join to work with many duplicate left keys
> ----------------------------------------------------
>
>                 Key: PIG-2774
>                 URL: https://issues.apache.org/jira/browse/PIG-2774
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Aneesh Sharma
>
> A merge join can throw an OOM error if the number of duplicate left tuples is
> large as it accumulates all of them in memory. There are two solutions around
> this problem:
> 1. Serialize the accumulated tuples to disk if they exceed a certain size.
> 2. Spit out join output periodically, and re-seek on the right hand side
> index.
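
The second option in the issue description (emit join output periodically and re-seek on the right-hand side) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Pig's actual implementation: the names `merge_join_bounded` and `max_buffered` are hypothetical, both inputs are in-memory lists of (key, value) pairs sorted by key, and a binary search over the right-hand keys stands in for the right-hand side index.

```python
from bisect import bisect_left


def merge_join_bounded(left, right, max_buffered=2):
    """Inner merge join of two lists of (key, value) pairs sorted by key.

    At most `max_buffered` left values sharing a key are held at once.
    When the buffer fills (or the key changes), the batch is joined
    against the right-hand run for that key and then discarded.
    """
    # Hypothetical stand-in for the right-hand side index.
    right_keys = [k for k, _ in right]

    def flush(key, batch):
        # Re-seek to the first right tuple with this key.
        pos = bisect_left(right_keys, key)
        while pos < len(right) and right[pos][0] == key:
            for left_value in batch:
                yield (key, left_value, right[pos][1])
            pos += 1

    batch, current = [], None
    for key, value in left:
        # Flush on key change or when the memory bound is reached.
        if batch and (key != current or len(batch) >= max_buffered):
            yield from flush(current, batch)
            batch = []
        current = key
        batch.append(value)
    if batch:
        yield from flush(current, batch)


if __name__ == "__main__":
    left = [(1, "a"), (1, "b"), (1, "c"), (2, "d")]
    right = [(1, "x"), (1, "y"), (3, "z")]
    for row in merge_join_bounded(left, right, max_buffered=2):
        print(row)
```

The trade-off in this sketch is that the right-hand run for a heavily duplicated key is scanned once per flushed batch, so memory stays bounded by `max_buffered` at the cost of repeated seeks and reads on the right side.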