[ 
https://issues.apache.org/jira/browse/PIG-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228745#comment-13228745
 ] 

Jonathan Coveney commented on PIG-2551:
---------------------------------------

I have LongSum, COUNT, AlgSum, and AlgCount. AlgSum and AlgCount are just 
wrappers which extend AlgebraicEvalFunc, returning the static classes from 
LongSum and COUNT respectively (the point being that their Algebraic 
implementations are identical, so what you're testing is the overhead of the 
extra function calls in the Accumulator that AlgebraicEvalFunc gives you).

I then used Caliper to run a benchmark which instantiated each as an 
Accumulator<Long> and streamed a DataBag through it.

See the code to set up:

{code}
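@Override protected void setUp() {
    try {
        // Build one bag of `size` single-field tuples holding 0..size-1.
        theBag = mBagFactory.newDefaultBag();
        for (int i = 0; i < size; i++) {
            Tuple t = mTupleFactory.newTuple(1);
            t.set(0, i);
            theBag.add(t);
        }
    } catch (Exception e) {
        // Keep the original exception as the cause so failures are debuggable.
        throw new RuntimeException("Error in setup", e);
    }
}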
{code}

See the code to run:

{code}
public long go(Accumulator<Long> acc) {
    try {
        Iterator<Tuple> it = theBag.iterator();
        while (it.hasNext()) {
            // Collect up to perAcc tuples into a temporary bag.
            DataBag tempBag = mBagFactory.newDefaultBag();
            for (int j = 0; it.hasNext() && j < perAcc; j++) {
                tempBag.add(it.next());
            }
            // Wrap the bag in a tuple, which is what accumulate() expects.
            Tuple t = mTupleFactory.newTuple(1);
            t.set(0, tempBag);
            acc.accumulate(t);
        }
        return acc.getValue();
    } catch (Exception e) {
        // Keep the original exception as the cause so failures are debuggable.
        throw new RuntimeException("Error in go", e);
    }
}
{code}

The parameter "perAcc" is how many elements will be streamed through the 
accumulate function at a time, and was set to 1000. The size was set to 
1000000. There were 10 trials.
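
To make the batching concrete, here is a minimal standalone sketch of the same 
loop, with plain Java collections standing in for Pig's DataBag and Tuple. The 
names BatchAccumulator, SumAccumulator, and range are hypothetical and exist 
only for illustration; they are not Pig APIs and this is not the benchmark code 
itself. It only shows that with size 1000000 and perAcc 1000, accumulate is 
called 1000 times.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ChunkedAccumulateSketch {

    // Hypothetical stand-in for Pig's Accumulator: a batch is a plain List<Long>.
    interface BatchAccumulator {
        void accumulate(List<Long> batch);
        long getValue();
    }

    // Sums every value it is handed and counts how often accumulate() is called.
    static class SumAccumulator implements BatchAccumulator {
        private long sum = 0;
        int calls = 0;

        public void accumulate(List<Long> batch) {
            calls++;
            for (long v : batch) {
                sum += v;
            }
        }

        public long getValue() {
            return sum;
        }
    }

    // Same shape as go() above: feed the input through acc in batches of perAcc.
    static long go(Iterator<Long> it, int perAcc, BatchAccumulator acc) {
        while (it.hasNext()) {
            List<Long> batch = new ArrayList<>(perAcc);
            for (int j = 0; it.hasNext() && j < perAcc; j++) {
                batch.add(it.next());
            }
            acc.accumulate(batch);
        }
        return acc.getValue();
    }

    // Builds the values 0..size-1, mirroring what setUp() puts in the bag.
    static List<Long> range(int size) {
        List<Long> values = new ArrayList<>(size);
        for (long i = 0; i < size; i++) {
            values.add(i);
        }
        return values;
    }

    public static void main(String[] args) {
        SumAccumulator acc = new SumAccumulator();
        long total = go(range(1_000_000).iterator(), 1000, acc);
        // prints "499999500000 in 1000 accumulate calls"
        System.out.println(total + " in " + acc.calls + " accumulate calls");
    }
}
```

The last batch is simply shorter when size is not a multiple of perAcc, which 
is why the inner loop checks it.hasNext() as well as the counter.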
                
> Create an AlgebraicEvalFunc and AccumulatorEvalFunc abstract class which 
> gives you the lower levels for free
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2551
>                 URL: https://issues.apache.org/jira/browse/PIG-2551
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>            Priority: Minor
>             Fix For: 0.11
>
>         Attachments: PIG-2551-0.patch, PIG-2551-1.patch
>
>
> This is more of a win for the Algebraic interface than the Accumulator 
> interface, but the idea is that if you implement the Algebraic interface, you 
> should get Accumulator/EvalFunc for free, and if you implement Accumulator, 
> you should get EvalFunc for free. The win of this is that in cases such as 
> JRuby, you don't have to muck around doing this yourself...you have them 
> implement the algebraic portion, and the rest comes free (that is where this 
> came out of, but I feel like it is generally useful enough).
> The next piece of work I'd like to do is making an easier to implement way to 
> make Algebraic UDFs, but then again, my to do is huge :) Would love thoughts 
> on this. If it doesn't make it into Pig, it's still going to come in the 
> JRuby stuff, so I thought it'd at least be worth having it separate, tested, 
> and available to everyone.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
