Aneesh Sharma created PIG-2774:
----------------------------------

             Summary: Fix merge join to work with many duplicate left keys
                 Key: PIG-2774
                 URL: https://issues.apache.org/jira/browse/PIG-2774
             Project: Pig
          Issue Type: Bug
            Reporter: Aneesh Sharma

A merge join can throw an out-of-memory (OOM) error when the number of duplicate left tuples for a key is large, because it accumulates all of them in memory. There are two possible solutions to this problem:

1. Serialize the accumulated tuples to disk once they exceed a certain size.
2. Emit join output periodically, and re-seek on the right-hand-side index.
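
To make the first option concrete, here is a minimal sketch in Python of a merge join whose buffer of duplicate left tuples spills to disk past a threshold. It is an illustration only, not Pig's Java implementation: the names `SpillableBuffer`, `merge_join`, and `max_in_memory` are hypothetical, and the threshold counts tuples rather than bytes for simplicity.

```python
import pickle
import tempfile


class SpillableBuffer:
    """Hypothetical buffer: keeps items in memory, spilling to a temp file
    whenever the in-memory list reaches `max_in_memory` items."""

    def __init__(self, max_in_memory=1000):
        self.max_in_memory = max_in_memory
        self._memory = []
        self._spill = None       # temp file, created lazily on first spill
        self.spilled_count = 0   # number of items currently on disk

    def add(self, item):
        self._memory.append(item)
        if len(self._memory) >= self.max_in_memory:
            self._flush()

    def _flush(self):
        if self._spill is None:
            self._spill = tempfile.TemporaryFile()
        self._spill.seek(0, 2)   # always append at the end of the file
        for item in self._memory:
            pickle.dump(item, self._spill)
        self.spilled_count += len(self._memory)
        self._memory = []

    def __iter__(self):
        # Replay spilled items first, then in-memory ones (insertion order).
        if self._spill is not None:
            self._spill.seek(0)
            for _ in range(self.spilled_count):
                yield pickle.load(self._spill)
        yield from self._memory

    def close(self):
        if self._spill is not None:
            self._spill.close()
            self._spill = None


def merge_join(left, right, max_in_memory=1000):
    """Inner-join two iterables of (key, value) pairs, both sorted by key."""
    left_it, right_it = iter(left), iter(right)
    l = next(left_it, None)
    r = next(right_it, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left_it, None)
        elif l[0] > r[0]:
            r = next(right_it, None)
        else:
            key = l[0]
            buf = SpillableBuffer(max_in_memory)
            try:
                # Accumulate every left tuple for this key; may spill to disk.
                while l is not None and l[0] == key:
                    buf.add(l[1])
                    l = next(left_it, None)
                # Stream the right side, replaying the buffer per right tuple.
                while r is not None and r[0] == key:
                    for left_value in buf:
                        yield (key, left_value, r[1])
                    r = next(right_it, None)
            finally:
                buf.close()


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c"), (2, "d"), (3, "e")]
    right = [(2, "x"), (2, "y"), (3, "z")]
    for row in merge_join(left, right, max_in_memory=2):
        print(row)
```

The trade-off in this sketch is that the spilled tuples are re-read from disk once per matching right tuple, so memory stays bounded at the cost of extra I/O. The second option avoids the buffer altogether but depends on being able to re-seek the right-hand input, which this sketch does not model.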