[ https://issues.apache.org/jira/browse/DATAFU-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189158#comment-15189158 ]
Eyal Allweil commented on DATAFU-116:
-------------------------------------

As far as I know, the behavior you're describing is how Pig handles UDFs that implement the Accumulator interface. If a UDF doesn't implement it (that is, if it only extends EvalFunc), its parameters, including bags, are passed in memory in their entirety. I'm basing this on [this quote from Programming Pig|http://stackoverflow.com/a/15813789/150992]. That's why I'm suggesting this change.

> Make SetIntersect and SetDifference implement Accumulator
> ---------------------------------------------------------
>
>                 Key: DATAFU-116
>                 URL: https://issues.apache.org/jira/browse/DATAFU-116
>             Project: DataFu
>          Issue Type: Improvement
>    Affects Versions: 1.3.0
>            Reporter: Eyal Allweil
>
> SetIntersect and SetDifference accept only sorted bags, and the output is
> always smaller than the inputs. Therefore an accumulator implementation
> should be possible and it will improve memory usage (somewhat) and allow Pig
> to optimize loops with these operations better.
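To illustrate why sorted inputs make an accumulating version plausible, here is a minimal sketch of an intersection that consumes its inputs chunk by chunk with a merge-walk. It is not DataFu's implementation, and it avoids Pig's classes on purpose: `ChunkAccumulator` and `SortedIntersectSketch` are hypothetical names. The interface only mirrors the method names of Pig's `org.apache.pig.Accumulator` (`accumulate`, `getValue`, `cleanup`), but takes plain `List<Integer>` chunks for exactly two inputs instead of a `Tuple` of `DataBag`s. Because both inputs are sorted, any element smaller than the other side's current head can never match and is discarded immediately, so only the not-yet-decided elements stay buffered rather than whole bags.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical stand-in for Pig's Accumulator: same method names, but plain
 * Java lists instead of Pig Tuples/DataBags so the sketch runs without Pig.
 */
interface ChunkAccumulator<T> {
    void accumulate(List<Integer> leftChunk, List<Integer> rightChunk);
    T getValue();
    void cleanup();
}

/** Set intersection of two ascending-sorted inputs delivered in chunks. */
class SortedIntersectSketch implements ChunkAccumulator<List<Integer>> {
    // Elements received but not yet matched or ruled out.
    private final Deque<Integer> left = new ArrayDeque<>();
    private final Deque<Integer> right = new ArrayDeque<>();
    private final List<Integer> result = new ArrayList<>();

    @Override
    public void accumulate(List<Integer> leftChunk, List<Integer> rightChunk) {
        left.addAll(leftChunk);
        right.addAll(rightChunk);
        // Merge-walk while both sides still have pending elements.
        while (!left.isEmpty() && !right.isEmpty()) {
            int cmp = Integer.compare(left.peekFirst(), right.peekFirst());
            if (cmp < 0) {
                left.pollFirst();   // right side is sorted, so this can never match
            } else if (cmp > 0) {
                right.pollFirst();  // same reasoning for the right side
            } else {
                int v = left.pollFirst();
                right.pollFirst();
                // Set semantics: matches arrive in ascending order, so checking
                // the last emitted value is enough to drop duplicates.
                if (result.isEmpty() || result.get(result.size() - 1) != v) {
                    result.add(v);
                }
            }
        }
    }

    @Override
    public List<Integer> getValue() {
        // Leftovers on one side cannot match anything once the other side ends.
        return new ArrayList<>(result);
    }

    @Override
    public void cleanup() {
        left.clear();
        right.clear();
        result.clear();
    }
}
```

For example, feeding `[1, 3, 5] / [2, 3]`, then `[7, 9] / [5, 6, 7]`, then `[] / [8, 9, 10]` yields `[3, 5, 7, 9]`. In this sketch the buffers grow only as far as one input runs ahead of the other, and the output can never exceed the smaller input. A SetDifference variant could use the same walk, emitting left elements that fall below the right head and flushing any remaining left elements in `getValue`.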