[
https://issues.apache.org/jira/browse/PIG-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272968#comment-13272968
]
Jonathan Coveney commented on PIG-2651:
---------------------------------------
Please find attached a patch with tests. Note: in the process of adding the
tests, I ran into this:
https://issues.apache.org/jira/browse/PIG-2694
It's not blocking, but something to consider...
Also: this patch includes the contents of
https://issues.apache.org/jira/browse/PIG-2066. The new files are:
.../apache/pig/IteratingAccumulatorEvalFunc.java
.../udf/evalfunc/IteratingAccumulatorCount.java
.../udf/evalfunc/IteratingAccumulatorIsEmpty.java
.../test/udf/evalfunc/IteratingAccumulatorSum.java
And of course, new e2e tests in nightly.conf
> Provide a much easier to use accumulator interface
> --------------------------------------------------
>
> Key: PIG-2651
> URL: https://issues.apache.org/jira/browse/PIG-2651
> Project: Pig
> Issue Type: New Feature
> Reporter: Jonathan Coveney
> Assignee: Jonathan Coveney
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2651-0.patch, PIG-2651-1.patch
>
>
> This introduces a new interface, IteratingAccumulatorEvalFunc (that name is
> NOT final...). The cool thing about this patch is that it is built purely on
> top of the existing Accumulator code (well, it uses PIG-2066, but it could
> easily work without it). That is to say, it's an easier way to write
> accumulators without having to fork the Pig codebase.
> The downside is that the only way I am able to provide such a clean interface
> is by using a second thread. I need to explore any potential performance
> implications, but given that most of the easy-to-use Pig stuff has
> performance implications, I think as long as we measure and document
> them, it's worth the much more usable interface. Plus I don't think it will
> be too bad, as one thread does the heavy lifting while another just ferries
> values in between. SUM could now be written as:
> {code}
> public class SUM extends IteratingAccumulatorEvalFunc<Long> {
>     public Long exec(Iterator<Tuple> it) throws IOException {
>         long sum = 0;
>         while (it.hasNext()) {
>             sum += (Long)it.next().get(0);
>         }
>         return sum;
>     }
> }
> {code}
> Besides performance tests, I need to figure out how to properly test this
> sort of thing. I particularly welcome advice on that front.
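
For readers wondering how a second thread can turn batch-style accumulate calls into a plain Iterator, here is a minimal, Pig-independent sketch of the general idea. It is not the code from the attached patch: the class name IteratorBridgeSketch, its methods, and the 10 ms polling interval are all hypothetical, and it assumes non-null values. The caller's thread hands values over one at a time, while a worker thread runs the user's iterator-style function.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical illustration only; not the class introduced by the patch. */
public class IteratorBridgeSketch<T, R> {
    private static final Object END = new Object(); // sentinel: no more input

    // Hand-off point between the caller's thread and the worker thread.
    private final BlockingQueue<Object> queue = new SynchronousQueue<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // do not keep the JVM alive if getValue() is never called
        return t;
    });
    private final Future<R> result;

    public IteratorBridgeSketch(Function<Iterator<T>, R> exec) {
        // The worker thread does the heavy lifting: it runs the user's function.
        Callable<R> task = () -> exec.apply(new QueueIterator());
        result = worker.submit(task);
    }

    /** Called once per batch by the caller's thread; values must be non-null. */
    public void accumulate(List<T> batch) throws InterruptedException {
        for (T value : batch) {
            handOver(value);
        }
    }

    /** Signals end of input, then waits for the worker's result. */
    public R getValue() throws InterruptedException, ExecutionException {
        handOver(END);
        try {
            return result.get();
        } finally {
            worker.shutdown();
        }
    }

    // Offer the item until the worker takes it, or give up once the worker
    // has finished (for example because the function returned early).
    private void handOver(Object item) throws InterruptedException {
        while (!result.isDone() && !queue.offer(item, 10, TimeUnit.MILLISECONDS)) {
            // retry
        }
    }

    /** Iterator seen by the user's function; blocks until a value arrives. */
    private final class QueueIterator implements Iterator<T> {
        private Object next; // buffered element, or null if nothing fetched yet

        @Override
        public boolean hasNext() {
            if (next == null) {
                try {
                    next = queue.take();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    next = END; // treat interruption as end of input
                }
            }
            return next != END;
        }

        @Override
        @SuppressWarnings("unchecked")
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            T value = (T) next;
            next = null;
            return value;
        }
    }

    /** Demo helper: sums several batches through the bridge. */
    public static long sum(List<List<Long>> batches) throws Exception {
        Function<Iterator<Long>, Long> exec = it -> {
            long total = 0;
            while (it.hasNext()) {
                total += it.next();
            }
            return total;
        };
        IteratorBridgeSketch<Long, Long> bridge = new IteratorBridgeSketch<>(exec);
        for (List<Long> batch : batches) {
            bridge.accumulate(batch);
        }
        return bridge.getValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sum(Arrays.asList(Arrays.asList(1L, 2L), Arrays.asList(3L, 4L))));
    }
}
```

A SynchronousQueue is used here so that no values pile up in memory between the two threads; a bounded queue would be another reasonable choice and might reduce hand-off overhead, which is the kind of trade-off the performance measurements mentioned above would need to settle.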