[ 
https://issues.apache.org/jira/browse/PIG-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2551:
----------------------------------

    Attachment: PIG-2551-1.patch

Thanks for your comments, Julien and Daniel!

All, please find attached the revised patch, per your notes.

- I added comments
- I added a basic heuristic to apply the intermediate EvalFunc in cases where 
applying it gives a useful reduction in size.
- I added PigCounterHelper to Pig from ElephantBird. Pig is a more reasonable 
home for it, and it is useful: it makes it easier to log to Pig from UDFs. I 
use it to collect stats on the combining activity when an Algebraic UDF is 
used as an Accumulator.

Also, Daniel, I did some benchmarking per Dmitriy's comment, and I don't know 
that it's appreciably slower. On 1M bags, here is a benchmark on the 
accumulator piece:

   AlgSum 14.9 ============================
 AlgCount 15.9 ==============================
      Sum 13.7 =========================
    Count 13.4 =========================

AlgSum and AlgCount are just versions of AlgebraicEvalFunc that return the 
static classes from LongSum and COUNT, but in this benchmark I called 
accumulate. I did so because accumulate is where the function-calling 
overhead is going to be largest.

As you can see, the falloff is minimal, so I don't know that some big 
disclaimer is necessary (any more than it's necessary to say that Jython UDFs 
are slower than Java UDFs or whatnot).
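
For readers who want a picture of what "an Algebraic UDF used as an 
Accumulator" means, here is a minimal, self-contained sketch. It is not the 
code in the patch and does not use Pig's classes: AlgebraicSketch, 
AlgSumSketch and AlgCountSketch are hypothetical names, a bag is modelled as a 
List<Long>, and the three stages are plain methods, whereas Pig's Algebraic 
interface hands back class names for the initial, intermediate and final 
functions. The point is only that accumulate can be derived from those three 
stages, at the cost of a few extra calls per chunk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an algebraic function over longs (NOT Pig's API).
abstract class AlgebraicSketch {
    abstract long initial(long value);                // one input -> one partial
    abstract long intermediate(List<Long> partials);  // partials -> one partial
    abstract long finish(List<Long> partials);        // partials -> final result

    private final List<Long> partials = new ArrayList<>();

    // Accumulator-style entry point built only from the three stages above.
    void accumulate(List<Long> chunk) {
        List<Long> mapped = new ArrayList<>();
        for (long v : chunk) {
            mapped.add(initial(v));
        }
        // Collapse the chunk right away so only one partial per chunk is kept.
        partials.add(intermediate(mapped));
    }

    long getValue() {
        return finish(partials);
    }

    void cleanup() {
        partials.clear();
    }

    static long sum(List<Long> xs) {
        long total = 0;
        for (long x : xs) {
            total += x;
        }
        return total;
    }
}

// Sum: each value is its own partial, and partials are added together.
class AlgSumSketch extends AlgebraicSketch {
    long initial(long value) { return value; }
    long intermediate(List<Long> partials) { return sum(partials); }
    long finish(List<Long> partials) { return sum(partials); }
}

// Count: each value contributes 1, and partial counts are added together.
class AlgCountSketch extends AlgebraicSketch {
    long initial(long value) { return 1L; }
    long intermediate(List<Long> partials) { return sum(partials); }
    long finish(List<Long> partials) { return sum(partials); }
}
```

In this sketch every chunk is combined as soon as it arrives. The heuristic 
mentioned above decides when applying the intermediate step is worthwhile; 
that logic is not reproduced here.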

For the accumulator eval func, there is no overhead, and a lot of people I know 
already do basically the same thing by hand when implementing accumulative UDFs.
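
As an illustration of the "do that manually" pattern, here is a small sketch 
of getting exec for free from an accumulator. It is an assumption-laden 
stand-in, not the class in the patch: AccumulatorSketch and CountSketch are 
hypothetical names, and a bag is modelled as a List<Long> rather than Pig's 
Tuple and DataBag types. The method names accumulate, getValue and cleanup 
mirror Pig's Accumulator interface.

```java
import java.util.List;

// Hypothetical base class (NOT Pig's API): subclasses write the three
// accumulator methods, and exec is derived from them.
abstract class AccumulatorSketch<T> {
    abstract void accumulate(List<Long> chunk);
    abstract T getValue();
    abstract void cleanup();

    // Treat the whole bag as a single chunk, then reset for the next call.
    final T exec(List<Long> bag) {
        cleanup();
        accumulate(bag);
        T result = getValue();
        cleanup();
        return result;
    }
}

// Example subclass: counts the elements it has seen.
class CountSketch extends AccumulatorSketch<Long> {
    private long count = 0;

    void accumulate(List<Long> chunk) { count += chunk.size(); }
    Long getValue() { return count; }
    void cleanup() { count = 0; }
}
```

Since exec only forwards to the three methods the author already wrote, it 
does nothing beyond what a hand-written exec would do.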
                
> Create an AlgebraicEvalFunc and AccumulatorEvalFunc abstract class which 
> gives you the lower levels for free
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2551
>                 URL: https://issues.apache.org/jira/browse/PIG-2551
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>            Priority: Minor
>             Fix For: 0.11
>
>         Attachments: PIG-2551-0.patch, PIG-2551-1.patch
>
>
> This is more of a win for the Algebraic interface than the Accumulator 
> interface, but the idea is that if you implement the Algebraic interface, you 
> should get Accumulator/EvalFunc for free, and if you implement Accumulator, 
> you should get EvalFunc for free. The win of this is that in cases such as 
> JRuby, you don't have to muck around doing this yourself...you have them 
> implement the algebraic portion, and the rest comes free (that is where this 
> came out of, but I feel like it is generally useful enough).
> The next piece of work I'd like to do is making an easier to implement way to 
> make Algebraic UDFs, but then again, my to do is huge :) Would love thoughts 
> on this. If it doesn't make it into Pig, it's still going to come in the 
> JRuby stuff, so I thought it'd at least be worth having it separate, tested, 
> and available to everyone.

