[
https://issues.apache.org/jira/browse/PIG-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892176#action_12892176
]
Ankur commented on PIG-1516:
----------------------------
Having a finalize method AT ALL for the purpose of deleting files when the
object is garbage collected is NOT a good solution. Generally speaking,
using finalizers to release non-memory resources like file handles should be
avoided, as it can introduce an insidious bug. From the article "Object
finalization and cleanup" - http://www.javaworld.com/jw-06-1998/jw-06-techniques.html:
"Don't rely on finalizers to release non-memory resources"
An example of an object that breaks this rule is one that opens a file in its
constructor and closes the file in its finalize() method. Although this design
seems neat, tidy, and symmetrical, it potentially creates an insidious bug. A
Java program generally will have only a finite number of file handles at its
disposal. When all those handles are in use, the program won't be able to open
any more files.
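
A minimal sketch of the alternative the article points toward, releasing the
resource explicitly instead of waiting for the garbage collector. The class
name SpillingResource and its methods are hypothetical and are not part of
Pig; the sketch assumes Java 8 or later.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration only; not part of Pig.
class SpillingResource implements AutoCloseable {
    private final List<File> spillFiles = new ArrayList<File>();

    // Creates a temporary spill file and remembers it for later cleanup.
    File newSpillFile() {
        try {
            File f = File.createTempFile("spill", ".tmp");
            spillFiles.add(f);
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes every tracked file right away, independent of garbage collection.
    @Override
    public void close() {
        for (File f : spillFiles) {
            f.delete();
        }
        spillFiles.clear();
    }
}
```

A caller would typically hold the object in a try-with-resources statement,
so that close() runs as soon as the block exits and the files do not linger
until some later finalization pass.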
> finalize in bag implementations causes pig to run out of memory in reduce
> --------------------------------------------------------------------------
>
> Key: PIG-1516
> URL: https://issues.apache.org/jira/browse/PIG-1516
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Thejas M Nair
> Assignee: Thejas M Nair
> Fix For: 0.8.0
>
>
> *Problem:*
> Pig bag implementations that are subclasses of DefaultAbstractBag have
> finalize methods implemented. As a result, the garbage collector moves them
> to a finalization queue, and the memory they use is freed only after
> finalization has run on them.
> If the bags are not finalized fast enough, a lot of memory is consumed by the
> finalization queue, and Pig runs out of memory. This can happen if a large
> number of small bags are being created.
> *Solution:*
> The finalize function exists for the purpose of deleting the spill files that
> are created when the bag is too large. But if the bags are small enough, no
> spill files are created, and the finalize function serves no purpose.
> A new class that holds a list of files will be introduced (FileList). This
> class will have a finalize method that deletes the files. The bags will no
> longer have finalize methods, and the bags will use FileList instead of
> ArrayList<File>.
> *Possible workaround for earlier releases:*
> Since the fix is going into 0.8, here is a workaround:
> Disabling the combiner will reduce the number of bags being created, as
> there will be no stage that combines intermediate merge results. But I
> would recommend disabling it only if you have this problem, as it is likely to
> slow down the query.
> To disable the combiner, set the property: -Dpig.exec.nocombiner=true
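
The FileList approach proposed in the issue description above could look
roughly like the following. This is only a sketch, not the actual Pig code:
SketchBag, deleteAll(), spill() and the lazy allocation of the FileList are
assumptions made for illustration. Note also that finalize() is deprecated in
later Java versions, so the sketch compiles with a warning there.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of the FileList idea; the real Pig class may differ.
class FileList {
    private final List<File> files = new ArrayList<File>();

    void add(File f) { files.add(f); }

    int size() { return files.size(); }

    // Deletes all tracked files (assumed helper, also callable directly).
    void deleteAll() {
        for (File f : files) {
            f.delete();
        }
        files.clear();
    }

    // Only this small holder is finalizable, not the bag itself.
    @Override
    protected void finalize() throws Throwable {
        try {
            deleteAll();
        } finally {
            super.finalize();
        }
    }
}

// Hypothetical bag with no finalize() of its own.
class SketchBag {
    // Assumption: allocated only when the bag actually spills, so a small
    // bag never creates a finalizable object.
    private FileList spillFiles;

    // Creates a spill file and registers it for cleanup.
    File spill() {
        try {
            File f = File.createTempFile("pigbag", ".tmp");
            if (spillFiles == null) {
                spillFiles = new FileList();
            }
            spillFiles.add(f);
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean hasSpillFiles() {
        return spillFiles != null && spillFiles.size() > 0;
    }

    // Explicit cleanup path: deletes spill files without waiting for GC.
    void clear() {
        if (spillFiles != null) {
            spillFiles.deleteAll();
            spillFiles = null;
        }
    }
}
```

With this shape, a bag that never spills carries no finalizer at all, so it
can be reclaimed without passing through the finalization queue. Bags that do
spill still depend on finalization of the FileList unless clear() is called
explicitly, which is the concern raised in the comment above.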