[ https://issues.apache.org/jira/browse/PIG-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894586#action_12894586 ]
Thejas M Nair commented on PIG-1516:
------------------------------------
The core and contrib tests pass on my machine.
The release audit warning is about javadoc html files.
Patch is ready for review.
> finalize in bag implementations causes pig to run out of memory in reduce
> --------------------------------------------------------------------------
>
> Key: PIG-1516
> URL: https://issues.apache.org/jira/browse/PIG-1516
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Thejas M Nair
> Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1516.2.patch, PIG-1516.patch
>
>
> *Problem:*
> Pig bag implementations that subclass DefaultAbstractBag implement finalize
> methods. As a result, the garbage collector moves them to a finalization
> queue, and their memory is freed only after finalization has run on them.
> If the bags are not finalized fast enough, the finalization queue consumes a
> lot of memory, and Pig runs out of memory. This can happen when a large
> number of small bags are created.
> *Solution:*
> The finalize method exists to delete the spill files that are created when
> a bag is too large. But if the bags are small enough, no spill files are
> created, and the finalize method serves no purpose.
> A new class that holds a list of files (FileList) will be introduced. This
> class will have a finalize method that deletes the files. The bags will no
> longer have finalize methods, and they will use FileList instead of
> ArrayList<File>.
> *Possible workaround for earlier releases:*
> Since the fix is going into 0.8, here is a workaround:
> Disabling the combiner reduces the number of bags created, because the
> stage that combines intermediate merge results is skipped. However, I
> recommend disabling it only if you hit this problem, as doing so is likely
> to slow down the query.
> To disable the combiner, set the property: -Dpig.exec.nocombiner=true
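
For illustration, the FileList approach described under *Solution* could be sketched roughly as follows. This is not the code from PIG-1516.2.patch: the method names, the SketchBag class, and the choice to create the list lazily on first spill are all assumptions made for the example. The point it shows is that only the file holder carries a finalizer, so a bag that never spills creates no finalizable object.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a FileList: a list of spill files whose
 * finalizer deletes them. Method names are assumptions, not Pig's API.
 */
class FileList {
    private final List<File> files = new ArrayList<File>();

    void add(File f) {
        files.add(f);
    }

    int size() {
        return files.size();
    }

    /** Deletes every tracked file and empties the list. */
    void deleteAll() {
        for (File f : files) {
            f.delete();
        }
        files.clear();
    }

    /** Safety net: remove spill files if the owner never cleaned up. */
    @Override
    protected void finalize() throws Throwable {
        try {
            deleteAll();
        } finally {
            super.finalize();
        }
    }
}

/**
 * Hypothetical bag with no finalize method of its own. The FileList is
 * created only on the first spill (an assumption of this sketch), so
 * small bags never reach the finalization queue.
 */
class SketchBag {
    private FileList spillFiles; // stays null unless the bag spills

    boolean hasSpillList() {
        return spillFiles != null;
    }

    /** Simulates a spill by creating an empty temporary file. */
    File spill() {
        if (spillFiles == null) {
            spillFiles = new FileList();
        }
        try {
            File f = File.createTempFile("sketchbag", ".tmp");
            spillFiles.add(f);
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Explicit cleanup; the finalizer in FileList is only a fallback. */
    void clear() {
        if (spillFiles != null) {
            spillFiles.deleteAll();
        }
    }
}
```

On current JVMs finalize is deprecated, and java.lang.ref.Cleaner (Java 9 and later) is the usual replacement for this kind of fallback cleanup; the sketch keeps finalize only to mirror the description above.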
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.