[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775158#action_12775158 ]
Alan Gates commented on PIG-979:
--------------------------------

A test should be added that checks that accumulator UDFs work properly when mixed with non-accumulator UDFs.

Why is the optimization not applied when inner is set on POPackage? It seems the accumulator interface should still work in this case.

Some comments on what AccumulatorOptimizer.check() is and what it allows would be helpful.

The code contains tabs in some spots instead of 4 spaces.

The cases in which the accumulator interface can be used have been greatly extended by adding support for unary and binary operators. But this comes at a cost. Every binary and unary comparison now has to make the accumChild call. 99% of the time this will be false. It's not clear to me how often users will do things like:

{code}
foreach C generate accumfunc1(A) + accumfunc2(A)
OR
foreach C generate (accumfunc1(A) > 100 ? 0 : 1)
{code}

which is the only case I can see where this additional functionality is useful, since we don't currently allow these functions in filters. It's possible that JIT along with branch prediction will remove this extra cost, since the branch will always go the same way for a given query. But I'd like to see this tested. It would be interesting to compare a query with heavy use of binary operators (but no accumulator UDFs) with and without this change.

I don't understand why you need the new interface AccumulativeTupleBuffer and class AccumulativeBag. Why can't the block of tuples read off of the iterator just be put in a regular bag and then passed to the UDFs?

In all the sum implementations of accumulate you calculate the sum of the block of tuples twice. It should be done once and cached.

In COUNT.accumulate, rather than making intermediateCount a Long and then forcing the creation of a new Long each time you add one, you should instead keep it as a long and depend on boxing to convert it to Long when you return it in getValue.
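
A minimal sketch of the COUNT suggestion, not the code in the patch. The class name CountSketch and the List<Object> parameter are hypothetical stand-ins (the real UDF receives a Pig tuple wrapping a bag), and null handling is simplified. The point is only that the running total stays a primitive long and is boxed once in getValue.

```java
import java.util.List;

// Hypothetical stand-in for COUNT's accumulator path; not the patch's code.
class CountSketch {
    // Primitive running total: incrementing it allocates nothing.
    private long intermediateCount = 0L;

    // Called once per block of tuples; a List<Object> stands in for the bag.
    void accumulate(List<Object> batch) {
        for (Object tuple : batch) {
            if (tuple != null) { // simplified null handling (assumption)
                intermediateCount++;
            }
        }
    }

    // Autoboxing converts long to Long only here, when the result is requested.
    Long getValue() {
        return intermediateCount;
    }

    // Reset state so the instance can be reused for the next key.
    void cleanup() {
        intermediateCount = 0L;
    }
}
```

With a Long field, each `intermediateCount++` unboxes, adds, and boxes again, which can allocate a new object per tuple once the value leaves the small-integer cache.
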
Same in COUNT_STAR.accumulate.

> Acummulator Interface for UDFs
> ------------------------------
>
>                 Key: PIG-979
>                 URL: https://issues.apache.org/jira/browse/PIG-979
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>            Assignee: Ying He
>         Attachments: PIG-979.patch
>
>
> Add an accumulator interface for UDFs that would allow them to take a set
> number of records at a time instead of the entire bag.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.