[
https://issues.apache.org/jira/browse/PIG-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796483#comment-16796483
]
Koji Noguchi commented on PIG-5384:
-----------------------------------
My first thought was to let spill() create multiple spill files, but this would
require a change on the reader side. Given that users may have their own custom
Bag implementations, I don't think this would work.
My second thought was to let spill() call {{out.flush()}} every N
records/bytes/seconds and release the tuples flushed so far. This may be a good
middle ground, but it requires the exception handling to re-read the flushed spill
file to recover the in-memory state.
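
A minimal sketch of this second idea, assuming a bag of plain strings in place
of Pig tuples. The class ChunkedSpillSketch, its method names, and the
flushEvery parameter are hypothetical and are not part of Pig's Bag API; the
sketch only illustrates flushing every N records, releasing what was flushed,
and re-reading the spill file when the write fails.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/** Hypothetical illustration only; this is not Pig's DefaultAbstractBag. */
final class ChunkedSpillSketch {

    /** Spills the whole bag to a file, flushing every flushEvery records. */
    static long spill(Deque<String> bag, Path file, int flushEvery) throws IOException {
        return spillTo(bag, Files.newOutputStream(file), file, flushEvery);
    }

    /**
     * Writes records to raw (which must write into file). Records already
     * flushed are no longer referenced, so they can be garbage collected.
     * On failure the bag is restored to its original content and order.
     */
    static long spillTo(Deque<String> bag, OutputStream raw, Path file, int flushEvery)
            throws IOException {
        List<String> pending = new ArrayList<>(); // written but not yet flushed
        long flushed = 0;                         // records known to be flushed
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(raw))) {
            while (!bag.isEmpty()) {
                String rec = bag.pollFirst();
                pending.add(rec);
                out.writeUTF(rec);
                if (pending.size() >= flushEvery) {
                    out.flush();
                    flushed += pending.size();
                    pending.clear();              // release the flushed chunk
                }
            }
            out.flush();
            flushed += pending.size();
            pending.clear();
            return flushed;
        } catch (IOException e) {
            // The stream is closed here, so the flushed prefix can be re-read.
            restore(bag, pending, file, flushed);
            throw e;
        }
    }

    /** Puts flushed records (from the file) and pending ones back in front of the bag. */
    static void restore(Deque<String> bag, List<String> pending, Path file, long flushed)
            throws IOException {
        List<String> head = readBack(file, flushed);
        head.addAll(pending);
        for (int i = head.size() - 1; i >= 0; i--) {
            bag.addFirst(head.get(i));
        }
    }

    /** Reads the first count records from a spill file. */
    static List<String> readBack(Path file, long count) throws IOException {
        List<String> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            for (long i = 0; i < count; i++) {
                records.add(in.readUTF());
            }
        }
        return records;
    }
}
```

Note that the recovery path has to load the flushed records back into memory,
which is the cost mentioned above; a real implementation would also need to
decide what to do if that re-read itself fails.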
> OOM while spilling large bag
> -----------------------------
>
> Key: PIG-5384
> URL: https://issues.apache.org/jira/browse/PIG-5384
> Project: Pig
> Issue Type: Improvement
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Major
>
> One of the common OOM issues in Pig is hitting OOM while trying to spill
> a large bag. The current solution is to give a larger heap size or tweak
> {noformat}
> pig.spill.memory.usage.threshold.fraction
> pig.spill.collection.threshold.fraction
> pig.spill.unused.memory.threshold.size
> {noformat}
> and make sure spilling starts early enough. These params are still critical,
> but I am wondering whether any improvement can be made to increase the chances of
> avoiding OOM while spilling a single large bag.