[ 
https://issues.apache.org/jira/browse/PIG-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796477#comment-16796477
 ] 

Koji Noguchi commented on PIG-5384:
-----------------------------------

Pig has to keep the entire bag in memory until the corresponding spill is 
done because Pig is designed to keep running when spilling to a file fails.  
(It drops the spill file and keeps on using the bag in memory.)

Spilling can fail when disks are full, but I'm guessing the task would 
eventually fail anyway when that happens.
Spilling can also fail when a user passes a custom List instance that doesn't 
support clear().  But in that case, the bag shouldn't be part of the 
spillables in the first place.

So I'm wondering if we can provide an option to fail the task when spilling 
fails, and let Pig release each Tuple as soon as it has been written to the 
spill file (before the file is closed).
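
A minimal sketch of the idea, not Pig's actual bag code.  The class and method 
names (SpillSketch, spillAndRelease) are hypothetical, and Strings stand in 
for Tuples.  Each element is dropped from the list right after it is written, 
and any IOException is propagated so that the task fails instead of falling 
back to the in-memory copy, which is incomplete by then.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;

/** Hypothetical illustration only; Strings stand in for Pig Tuples. */
class SpillSketch {

    /**
     * Writes every element of the bag to spillFile and returns the number
     * of elements written.  The list slot is set to null right after each
     * write, so the element can be garbage collected before the file is
     * closed.  An IOException is not caught here: once elements have been
     * released, the in-memory bag is no longer a valid fallback, so the
     * caller is expected to fail the task.
     */
    static long spillAndRelease(List<String> bag, File spillFile)
            throws IOException {
        long written = 0;
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(spillFile)))) {
            for (int i = 0; i < bag.size(); i++) {
                out.writeUTF(bag.get(i));
                bag.set(i, null);   // release the reference immediately
                written++;
            }
        }
        bag.clear();                // only empty slots are left at this point
        return written;
    }
}
```

The trade-off is the one described above: memory is freed while the spill is 
still in progress, at the cost of no longer being able to continue with the 
in-memory bag if the write fails halfway.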

> OOM while spilling large bag 
> -----------------------------
>
>                 Key: PIG-5384
>                 URL: https://issues.apache.org/jira/browse/PIG-5384
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>
> One of the common OOM issues in Pig is hitting OOM while trying to spill 
> a large bag. The current solution is to increase the heap size or tweak 
> {noformat}
> pig.spill.memory.usage.threshold.fraction
> pig.spill.collection.threshold.fraction
> pig.spill.unused.memory.threshold.size
> {noformat}
> and make sure spilling starts early enough.  These params are still critical, 
> but I'm wondering if any improvement can be made to increase the chances of 
> avoiding OOM while spilling a single large bag.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
