[ 
https://issues.apache.org/jira/browse/PIG-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900440#action_12900440
 ] 

Thejas M Nair commented on PIG-1544:
------------------------------------

bq. While computing the number of bags, we should remember to consider the 
multi-query case as well.
In the multi-query case, the sub-plans for each query are executed one at a 
time for a given tuple with large bags. So the number of large bags that 
can't be garbage collected would be similar to that of a single query.

Another thing to keep in mind is that multiple bags working on a common 
input (as with distinct/order-by in a nested foreach) would share some or 
most of their memory with the input bag, because Pig does not create copies 
of the column objects.
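
To illustrate the sharing point above, here is a minimal sketch in plain Java. 
It does not use Pig's bag classes; SharedColumnsSketch, distinctOf and 
sharesObjects are hypothetical names, and a List and TreeSet stand in for the 
input bag and a distinct bag. It only shows that a derived container holding 
references adds little memory beyond the input's own objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for bags; NOT Pig's DataBag API.
final class SharedColumnsSketch {

    // Builds a "distinct" view of the input. The set stores references to
    // the input's objects; no column object is copied.
    static TreeSet<String> distinctOf(List<String> input) {
        return new TreeSet<>(input);
    }

    // True if every element of 'derived' is the very same object (checked
    // by reference, not by equals) as some element of 'input'.
    static boolean sharesObjects(List<String> input, TreeSet<String> derived) {
        for (String d : derived) {
            boolean found = false;
            for (String s : input) {
                if (s == d) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add(new String("a"));
        input.add(new String("b"));
        input.add(new String("a"));
        System.out.println(sharesObjects(input, distinctOf(input)));
    }
}
```

If this holds for the real bags, summing their sizes independently would 
overestimate the memory actually held.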


> proactive-spill bags should share the memory allotted for it
> ------------------------------------------------------------
>
>                 Key: PIG-1544
>                 URL: https://issues.apache.org/jira/browse/PIG-1544
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Thejas M Nair
>
> Initially, proactive-spill bags were designed for use in (co)group 
> (InternalCacheBag); they knew the total number of proactive bags that were 
> present, and shared the memory limit specified using the property 
> pig.cachedbag.memusage.
> But the two proactive bag implementations that were added later, 
> InternalDistinctBag and InternalSortedBag, are not aware of the actual 
> number of bags being used; their users always assume total-numbags = 3. 
> This needs to be fixed so that all proactive-spill bags share the 
> memory limit.
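
One way to picture the sharing described in the issue is a common budget 
that every proactive-spill bag registers with, so the per-bag limit follows 
the actual bag count instead of a fixed 3. The sketch below is an assumption 
for illustration only: SharedBagBudget and its methods are hypothetical and 
are not Pig's implementation; memUsageFraction stands for the value of 
pig.cachedbag.memusage.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical shared budget; NOT part of Pig.
final class SharedBagBudget {
    private final long totalBytes;
    private final AtomicInteger activeBags = new AtomicInteger();

    // maxHeapBytes: heap size to budget against.
    // memUsageFraction: fraction of the heap all proactive bags may use
    // together (assumed to come from pig.cachedbag.memusage).
    SharedBagBudget(long maxHeapBytes, double memUsageFraction) {
        this.totalBytes = (long) (maxHeapBytes * memUsageFraction);
    }

    // Called when a proactive-spill bag is created.
    void register() {
        activeBags.incrementAndGet();
    }

    // Called when a bag is no longer in use.
    void unregister() {
        activeBags.decrementAndGet();
    }

    // Bytes one bag may hold in memory before it should spill, given the
    // number of bags currently registered (treated as at least 1).
    long perBagLimit() {
        int n = Math.max(1, activeBags.get());
        return totalBytes / n;
    }
}
```

With a budget like this, each bag would re-read perBagLimit() as it grows, 
so the limit tightens when more bags are created and relaxes as they are 
released.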

