[
https://issues.apache.org/jira/browse/PIG-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878294#action_12878294
]
Ashutosh Chauhan commented on PIG-1448:
---------------------------------------
The problem here is not as bad as it may sound. All physical operators already
detach the input tuple once they are done with it. In getNext(), a physical
operator first calls processInput(), which attaches the input tuple and then
detaches it at the end. Physical operators contained within inner plans do the
same. The problem arises with a Bin Cond: Pig short-circuits one of the
branches of the inner plan, so getNext() of the operator on that branch is
never called and the tuple is never detached. Note that in these cases the
tuple was already attached, by the operator that owns the inner plan, to all
the roots of the plan. So in this particular use case the tuple gets attached
but is never detached, leaving a stray reference that keeps it from being
GC'ed. This is still not a problem if there is only a single pipeline in the
mapper or reducer: the next time a key/value pair is read and run through the
pipeline, the reference is overwritten, and the tuple that was not detached in
the previous run can then be GC'ed. Only in a Multi Query optimized script may
the same pipeline not be run when the next key/value pair is read in map() or
reduce(), in which case the stray reference is not overwritten. If all of
these conditions are met and the tuple itself is large or contains large bags,
we may end up with an OOME.
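The scenario above can be sketched in a few lines of Java. This is only an
illustration under stated assumptions: Op, BinCondPlan, process() and
strayReferences() are hypothetical stand-ins, not Pig's actual classes, and the
detachAfter flag is one possible shape of a fix, not necessarily what the patch
does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Pig's real classes.
class DetachSketch {

    /** Minimal operator: keeps a reference to its input until detached. */
    static class Op {
        Object input; // becomes a stray reference if never cleared

        void attachInput(Object tuple) { input = tuple; }
        void detachInput() { input = null; }

        /** Mirrors the behaviour described above: use the input, then detach it. */
        Object getNext() {
            Object result = input;
            detachInput();
            return result;
        }
    }

    /** Bin-Cond-like inner plan with two roots; only one branch is evaluated. */
    static class BinCondPlan {
        final Op lhs = new Op();
        final Op rhs = new Op();

        List<Op> roots() { return Arrays.asList(lhs, rhs); }

        /** Short circuit: getNext() is called on one branch only. */
        Object eval(boolean cond) {
            return cond ? lhs.getNext() : rhs.getNext();
        }
    }

    /** Outer operator: attaches the tuple to every root of the inner plan. */
    static Object process(BinCondPlan plan, Object tuple, boolean cond,
                          boolean detachAfter) {
        for (Op root : plan.roots()) {
            root.attachInput(tuple);
        }
        Object out = plan.eval(cond);
        if (detachAfter) {
            // Assumed fix: explicitly detach from all roots once the plan has run.
            for (Op root : plan.roots()) {
                root.detachInput();
            }
        }
        return out;
    }

    /** Counts the roots that still hold a reference to a tuple. */
    static int strayReferences(BinCondPlan plan) {
        int count = 0;
        for (Op root : plan.roots()) {
            if (root.input != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BinCondPlan leaky = new BinCondPlan();
        process(leaky, "large tuple", true, false);
        System.out.println("without explicit detach: " + strayReferences(leaky));

        BinCondPlan fixed = new BinCondPlan();
        process(fixed, "large tuple", true, true);
        System.out.println("with explicit detach: " + strayReferences(fixed));
    }
}
```

In the first call the skipped branch keeps its reference until the same plan
is run again, which with Multi Query may not happen for a while. In the second
call no root holds the tuple once process() returns.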
> Detach tuple from inner plans of physical operator
> ---------------------------------------------------
>
> Key: PIG-1448
> URL: https://issues.apache.org/jira/browse/PIG-1448
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.1.0, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0
> Reporter: Ashutosh Chauhan
> Fix For: 0.8.0
>
>
> This is a follow-up to PIG-1446, which addresses this general problem only for
> the specific case of For Each. In general, all physical operators that can
> have inner plans are vulnerable to this. A few of them are POLocalRearrange,
> POFilter, and POCollectedGroup. All of these need to be fixed.