[
https://issues.apache.org/jira/browse/PIG-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228745#comment-13228745
]
Jonathan Coveney commented on PIG-2551:
---------------------------------------
I have LongSum, COUNT, AlgSum, and AlgCount. AlgSum and AlgCount are just
wrappers which extend AlgebraicEvalFunc, returning the static classes from
LongSum and COUNT respectively. Their Algebraic implementations are therefore
identical to the built-ins, so the benchmark measures only the overhead of the
extra function calls in the Accumulator that AlgebraicEvalFunc gives you.
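To make the "extra function calls" concrete, here is a rough, self-contained sketch of how an accumulator can be derived from algebraic stages. It is a simplified model and not the code in the patch: DerivedAccumulator, its two Function fields, and sumOf are hypothetical names, and plain lists of Long stand in for Pig's Tuple and DataBag.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified model. These are NOT Pig's Algebraic or
// Accumulator interfaces; every name here is made up for illustration.
final class DerivedAccumulator {
    // Folds one batch of raw values into a partial result ("initial" stage).
    private final Function<List<Long>, Long> initial;
    // Merges partial results into one ("intermediate"/"final" stage).
    private final Function<List<Long>, Long> combine;
    // Running partial result; null until the first batch arrives.
    private Long partial;

    DerivedAccumulator(Function<List<Long>, Long> initial,
                       Function<List<Long>, Long> combine) {
        this.initial = initial;
        this.combine = combine;
    }

    // Each call goes through two delegated stages instead of updating a
    // field directly; that indirection is the overhead being modelled.
    void accumulate(List<Long> batch) {
        Long batchPartial = initial.apply(batch);
        partial = (partial == null)
                ? batchPartial
                : combine.apply(Arrays.asList(partial, batchPartial));
    }

    Long getValue() {
        return partial;
    }

    void cleanup() {
        partial = null;
    }

    // Helper usable as both stages of a sum, or as the combine stage of a count.
    static long sumOf(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }
}
```

Passing DerivedAccumulator::sumOf for both stages models a sum; passing batch -> (long) batch.size() as the initial stage and sumOf as the combine stage models a count.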
I then used Caliper to run a benchmark that instantiated each one as an
Accumulator<Long> and streamed a DataBag through it in batches.
See the code to set up:
{code}
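@Override protected void setUp() {
    try {
        // Build a bag of `size` single-field tuples holding 0 .. size-1.
        theBag = mBagFactory.newDefaultBag();
        for (int i = 0; i < size; i++) {
            Tuple t = mTupleFactory.newTuple(1);
            t.set(0, i);
            theBag.add(t);
        }
    } catch (Exception e) {
        // Keep the original exception as the cause so failures are debuggable.
        throw new RuntimeException("Error in setup", e);
    }
}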
{code}
See the code to run:
{code}
public long go(Accumulator<Long> acc) {
    try {
        Iterator<Tuple> it = theBag.iterator();
        while (it.hasNext()) {
            // Collect up to perAcc tuples into a temporary bag.
            DataBag tempBag = mBagFactory.newDefaultBag();
            for (int j = 0; it.hasNext() && j < perAcc; j++) {
                tempBag.add(it.next());
            }
            // accumulate() expects a tuple whose first field is the bag.
            Tuple t = mTupleFactory.newTuple(1);
            t.set(0, tempBag);
            acc.accumulate(t);
        }
        return acc.getValue();
    } catch (Exception e) {
        // Keep the original exception as the cause so failures are debuggable.
        throw new RuntimeException("Error in go", e);
    }
}
{code}
The parameter "perAcc" is the number of elements passed to each call of the
accumulate function, and was set to 1000. The size was set to 1000000, so each
run makes 1000 accumulate calls. There were 10 trials.
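For anyone who wants to check the batching arithmetic without a Pig classpath, the loop can be sketched in plain Java. This is only a stand-in under stated assumptions: BatchingSketch, BatchAccumulator, SumAccumulator, and range are hypothetical names, lists of Long replace Tuple and DataBag, and it says nothing about the timings Caliper reports.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the benchmark loop; not Pig or Caliper code.
final class BatchingSketch {
    // Simplified accumulator: Pig's real interface takes a Tuple wrapping a DataBag.
    interface BatchAccumulator {
        void accumulate(List<Long> batch);
        long getValue();
    }

    // Sums every value it sees and counts how often accumulate() is called.
    static final class SumAccumulator implements BatchAccumulator {
        private long total;
        int calls;

        @Override
        public void accumulate(List<Long> batch) {
            calls++;
            for (long v : batch) {
                total += v;
            }
        }

        @Override
        public long getValue() {
            return total;
        }
    }

    // Streams `values` through `acc` in batches of at most `perAcc` elements.
    static long go(List<Long> values, int perAcc, BatchAccumulator acc) {
        if (perAcc <= 0) {
            // A non-positive batch size would never consume the iterator.
            throw new IllegalArgumentException("perAcc must be positive");
        }
        Iterator<Long> it = values.iterator();
        while (it.hasNext()) {
            List<Long> batch = new ArrayList<>(perAcc);
            for (int j = 0; it.hasNext() && j < perAcc; j++) {
                batch.add(it.next());
            }
            acc.accumulate(batch);
        }
        return acc.getValue();
    }

    // Builds the values 0 .. size-1, mirroring the setup loop.
    static List<Long> range(int size) {
        List<Long> values = new ArrayList<>(size);
        for (long i = 0; i < size; i++) {
            values.add(i);
        }
        return values;
    }
}
```

With size = 1000000 and perAcc = 1000 this makes 1000 accumulate calls, and a sum accumulator ends at 0 + 1 + ... + 999999 = 499999500000. When the size is not a multiple of perAcc, the final batch is simply shorter.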
> Create an AlgebraicEvalFunc and AccumulatorEvalFunc abstract class which
> gives you the lower levels for free
> ------------------------------------------------------------------------------------------------------------
>
> Key: PIG-2551
> URL: https://issues.apache.org/jira/browse/PIG-2551
> Project: Pig
> Issue Type: Improvement
> Reporter: Jonathan Coveney
> Assignee: Jonathan Coveney
> Priority: Minor
> Fix For: 0.11
>
> Attachments: PIG-2551-0.patch, PIG-2551-1.patch
>
>
> This is more of a win for the Algebraic interface than the Accumulator
> interface, but the idea is that if you implement the Algebraic interface, you
> should get Accumulator/EvalFunc for free, and if you implement Accumulator,
> you should get EvalFunc for free. The win of this is that in cases such as
> JRuby, you don't have to muck around doing this yourself...you have them
> implement the algebraic portion, and the rest comes free (that is where this
> came out of, but I feel like it is generally useful enough).
> The next piece of work I'd like to do is making an easier to implement way to
> make Algebraic UDFs, but then again, my to do is huge :) Would love thoughts
> on this. If it doesn't make it into Pig, it's still going to come in the
> JRuby stuff, so I thought it'd at least be worth having it separate, tested,
> and available to everyone.