[
https://issues.apache.org/jira/browse/PIG-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohini Palaniswamy updated PIG-5083:
------------------------------------
Attachment: PIG-5083-1.patch
While analyzing a heap dump taken during an OOM in the Combiner, I found that
while the next value is being deserialized in NullableTuple
{code}
mValue = bis.readTuple(in);
{code}
the previous value of mValue cannot be garbage collected, because the field
still references it until the assignment completes. With DISTINCT inside a
nested foreach and pig.exec.mapPartAgg=true, that value can be a very large bag
and can lead to OOM. That is also fixed in this patch.
I just noticed that the change could be applied to LitePackager as well and
added that to the patch, so I am rerunning the full unit and e2e tests now.
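To illustrate the mValue point, here is a minimal sketch of the general idea
and not the code from the attached patch. ReusableHolder, readValues and encode
are hypothetical names made up for this example, and readValues only stands in
for bis.readTuple(in). Clearing the field before reading means the old value is
no longer reachable from the holder while the new one is being built.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reusable Writable-style wrapper; not Pig's NullableTuple.
class ReusableHolder {
    private List<Long> value; // plays the role of mValue

    // Hypothetical stand-in for bis.readTuple(in): reads a count, then that many longs.
    private static List<Long> readValues(DataInput in) throws IOException {
        int n = in.readInt();
        List<Long> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(in.readLong());
        }
        return out;
    }

    void readFields(DataInput in) throws IOException {
        // Drop the old reference first, so the previous (possibly huge) value is
        // not kept alive by this object while the next one is deserialized.
        value = null;
        value = readValues(in);
    }

    List<Long> get() {
        return value;
    }

    // Test helper: encodes longs in the format readValues expects.
    static byte[] encode(long... xs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(xs.length);
        for (long x : xs) {
            out.writeLong(x);
        }
        out.flush();
        return bytes.toByteArray();
    }
}
```

Without the `value = null` line, the old and the new value are both strongly
reachable for the duration of the read, so peak memory is roughly the sum of
the two. This only helps if nothing else still holds a reference to the old
value.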
> CombinerPackager and LitePackager should not materialize bags
> -------------------------------------------------------------
>
> Key: PIG-5083
> URL: https://issues.apache.org/jira/browse/PIG-5083
> Project: Pig
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.17.0
>
> Attachments: PIG-5083-1.patch
>
>
> Before PIG-3591 and the creation of CombinerPackager, POCombinerPackage read
> directly from the combiner/reducer input instead of materializing the bag.
> https://github.com/apache/pig/blob/branch-0.12/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCombinerPackage.java#L140-L161
> The unnecessary materialization leads to a lot of spills and, in some cases, OOMs.