[ https://issues.apache.org/jira/browse/PIG-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900440#action_12900440 ]
Thejas M Nair commented on PIG-1544:
------------------------------------

bq. While computing the number of bags, we should remember to consider the multi-query case as well.

In the multi-query case, the sub-plans for each query are executed one at a time for a given tuple with large bags. So the number of large bags that can't be garbage collected would be similar to that of a single query.

Another thing to keep in mind is that multiple bags working on a common input (as with distinct/order-by in a nested foreach) would share some or most of their memory with the input bag, because Pig does not create copies of the column objects.

> proactive-spill bags should share the memory alloted for it
> -----------------------------------------------------------
>
>                 Key: PIG-1544
>                 URL: https://issues.apache.org/jira/browse/PIG-1544
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Thejas M Nair
>
> Initially proactive spill bags were designed for use in (co)group
> (InternalCacheBag) and they knew the total number of proactive bags that were
> present, and shared the memory limit specified using the property
> pig.cachedbag.memusage .
> But the two proactive bag implementations were added later -
> InternalDistinctBag and InternalSortedBag are not aware of actual number of
> bags being used - their users always assume total-numbags = 3.
> This needs to be fixed and all proactive-spill bags should share the
> memory-limit .

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
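A rough sketch of the arithmetic behind this issue, for illustration only. It is not Pig's code: `SharedBagBudget`, `fixed_assumption_limit`, the heap size, and the 0.25 fraction are all hypothetical, and treating pig.cachedbag.memusage as a fraction of the heap is an assumption made here. It compares a per-bag limit computed from the actual number of registered bags with one computed from a hard-coded count of 3.

```python
from dataclasses import dataclass


@dataclass
class SharedBagBudget:
    """Hypothetical tracker for a memory budget shared by all proactive-spill bags."""

    max_heap_bytes: int
    mem_usage_fraction: float  # stands in for pig.cachedbag.memusage (assumed to be a heap fraction)
    active_bags: int = 0

    def register_bag(self) -> None:
        # Each new bag announces itself so the budget can be re-divided.
        self.active_bags += 1

    def total_budget(self) -> int:
        return int(self.max_heap_bytes * self.mem_usage_fraction)

    def per_bag_limit(self) -> int:
        # Divide the shared budget by the real number of bags (at least 1).
        return self.total_budget() // max(self.active_bags, 1)


def fixed_assumption_limit(max_heap_bytes: int, mem_usage_fraction: float,
                           assumed_bags: int = 3) -> int:
    """Per-bag limit when the caller always assumes a fixed bag count."""
    return int(max_heap_bytes * mem_usage_fraction) // assumed_bags


if __name__ == "__main__":
    heap = 1_200_000_000   # arbitrary example heap size
    fraction = 0.25        # arbitrary example value, not a Pig default

    budget = SharedBagBudget(heap, fraction)
    for _ in range(6):     # six bags are actually in use
        budget.register_bag()

    shared = budget.per_bag_limit()
    fixed = fixed_assumption_limit(heap, fraction)
    print("shared per-bag limit:", shared)
    print("fixed-assumption per-bag limit:", fixed)
    print("worst-case total with fixed assumption:", fixed * budget.active_bags)
```

With six bags in use, the fixed assumption lets the bags together hold up to twice the intended budget before spilling, whereas the shared calculation keeps the total within it. As noted in the comment above, bags that share column objects with their input bag would use less than this worst case.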