Aneesh Sharma created PIG-2774:
----------------------------------
Summary: Fix merge join to work with many duplicate left keys
Key: PIG-2774
URL: https://issues.apache.org/jira/browse/PIG-2774
Project: Pig
Issue Type: Bug
Reporter: Aneesh Sharma
A merge join can throw an out-of-memory (OOM) error when many left tuples share
the same key, because it accumulates all of them in memory. There are two
possible ways to address this problem:
1. Serialize the accumulated tuples to disk if they exceed a certain size.
2. Emit join output periodically, and re-seek on the right-hand side index.
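
As a rough illustration of option 2, the sketch below joins two key-sorted
inputs while holding at most a fixed number of left tuples in memory. It is not
Pig's implementation: BoundedMergeJoin, Row, and maxBuffered are hypothetical
names, and in-memory lists stand in for the left input and the right-hand side
index. Each time the buffer fills, the batch is joined against the right-side
run for the current key, which is rescanned from its start (the "re-seek").

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Pig code: a merge join with a bounded left buffer.
final class BoundedMergeJoin {
    /** A keyed tuple; stands in for a Pig tuple in this sketch. */
    static final class Row {
        final int key;
        final String value;
        Row(int key, String value) { this.key = key; this.value = value; }
    }

    /**
     * Inner-joins two lists that are sorted by key, buffering at most
     * maxBuffered left rows at a time.
     */
    static List<String> join(List<Row> left, List<Row> right, int maxBuffered) {
        if (maxBuffered < 1) {
            throw new IllegalArgumentException("maxBuffered must be >= 1");
        }
        List<String> out = new ArrayList<>();
        List<Row> buffer = new ArrayList<>();
        int r = 0; // index where the right-side run for the current key starts
        int i = 0; // next unread left row
        while (i < left.size()) {
            int key = left.get(i).key;
            // Seek: skip right rows whose key is smaller than the current left key.
            while (r < right.size() && right.get(r).key < key) {
                r++;
            }
            // Consume the left run for this key in bounded batches.
            while (i < left.size() && left.get(i).key == key) {
                buffer.add(left.get(i++));
                boolean runEnded = i == left.size() || left.get(i).key != key;
                if (buffer.size() == maxBuffered || runEnded) {
                    emitBatch(buffer, right, r, key, out);
                    buffer.clear(); // release the batch before reading more left rows
                }
            }
        }
        return out;
    }

    /** Joins one batch of left rows against the right run, rescanning it from runStart. */
    private static void emitBatch(List<Row> buffer, List<Row> right,
                                  int runStart, int key, List<String> out) {
        for (Row l : buffer) {
            for (int j = runStart; j < right.size() && right.get(j).key == key; j++) {
                out.add(key + ":" + l.value + "," + right.get(j).value);
            }
        }
    }
}
```

The trade-off in this sketch is extra reads on the right side: the right run is
rescanned once per left row here, so a smaller maxBuffered lowers peak memory
but does not reduce right-side I/O. A real implementation would presumably scan
the right run once per batch rather than once per row.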