[ https://issues.apache.org/jira/browse/PIG-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894586#action_12894586 ]
Thejas M Nair commented on PIG-1516:
------------------------------------

The core and contrib tests pass on my machine. The release audit warning concerns the javadoc HTML files. The patch is ready for review.

> finalize in bag implementations causes pig to run out of memory in reduce
> --------------------------------------------------------------------------
>
>                 Key: PIG-1516
>                 URL: https://issues.apache.org/jira/browse/PIG-1516
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: PIG-1516.2.patch, PIG-1516.patch
>
>
> *Problem:*
> Pig bag implementations that are subclasses of DefaultAbstractBag implement
> finalize methods. As a result, the garbage collector moves them to a
> finalization queue, and their memory is freed only after finalization has
> run on them.
> If the bags are not finalized fast enough, the finalization queue consumes a
> lot of memory and Pig runs out of memory. This can happen when a large
> number of small bags are created.
> *Solution:*
> The finalize function exists to delete the spill files that are created when
> a bag is too large. But if the bags are small enough, no spill files are
> created, and the finalize function serves no purpose.
> A new class that holds a list of files (FileList) will be introduced. This
> class will have a finalize method that deletes the files. The bags will no
> longer have finalize methods, and they will use FileList instead of
> ArrayList<File>.
> *Possible workaround for earlier releases:*
> Since the fix is going into 0.8, here is a workaround:
> Disabling the combiner reduces the number of bags created, because there is
> no longer a stage that combines intermediate merge results. However, I would
> recommend disabling it only if you have this problem, as it is likely to
> slow down the query.
> To disable the combiner, set the property: -Dpig.exec.nocombiner=true
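A rough Java sketch of the proposed design, under stated assumptions: the names `SketchBag` and `FileListDemo`, the methods `deleteFiles`, `spill`, and `cleanup`, and the lazy allocation of the `FileList` on first spill are all hypothetical and not taken from the patch. It only illustrates the shape of the change: the finalizer lives on a small file-holder object, so a bag that never spills carries no finalizer and can be reclaimed by the GC without passing through the finalization queue.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the FileList idea from PIG-1516 (not Pig's source):
// only this holder has a finalizer, and only bags that actually spill own one.
class FileList extends ArrayList<File> {
    private static final long serialVersionUID = 1L;

    // Deletes every spill file in the list; returns how many were removed.
    int deleteFiles() {
        int deleted = 0;
        for (File f : this) {
            if (f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            deleteFiles(); // last-chance cleanup when the GC collects the list
        } finally {
            super.finalize();
        }
    }
}

// Hypothetical, heavily simplified bag with no finalize() of its own.
class SketchBag {
    private final List<Object> inMemory = new ArrayList<>();
    private FileList spillFiles; // assumption: stays null until the first spill

    void add(Object tuple) {
        inMemory.add(tuple);
    }

    // Stand-in for spilling: creates an empty temp file instead of serializing tuples.
    void spill() {
        try {
            File f = File.createTempFile("bagspill", ".tmp");
            f.deleteOnExit(); // demo-only safety net so temp files do not linger
            if (spillFiles == null) {
                spillFiles = new FileList(); // allocated lazily, only when needed
            }
            spillFiles.add(f);
            inMemory.clear();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean hasSpillFiles() {
        return spillFiles != null;
    }

    int spillFileCount() {
        return spillFiles == null ? 0 : spillFiles.size();
    }

    // Explicit cleanup hook for this demo; the FileList finalizer is the fallback.
    int cleanup() {
        return spillFiles == null ? 0 : spillFiles.deleteFiles();
    }
}

class FileListDemo {
    public static void main(String[] args) {
        SketchBag small = new SketchBag();
        small.add("a");
        System.out.println(small.hasSpillFiles()); // false: no FileList, no finalizer

        SketchBag large = new SketchBag();
        large.add("b");
        large.spill();
        large.spill();
        System.out.println(large.spillFileCount()); // 2
        System.out.println(large.cleanup());        // 2
    }
}
```

Running `FileListDemo` should print `false`, `2`, and `2`. `finalize()` was the usual tool when this issue was filed; it has since been deprecated (Java 9), and `java.lang.ref.Cleaner` is the common replacement on newer JDKs.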