[
https://issues.apache.org/jira/browse/PIG-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Le Dem updated PIG-2421:
-------------------------------
Attachment: examples.patch
Hi Alan,
Thanks for starting this.
I like the annotation approach. I'm attaching a list of examples
(examples.patch) of what I had in mind (sorry, it's been sitting on my computer
for a while).
It is pretty close to what you are proposing.
In particular, I'm suggesting:
- Not to have to extend EvalFunc at all: the UDF context is provided through a
@Context annotation and is different per call of the UDF (not per FuncSpec).
The UDF context also specifies whether we are in the frontend or the backend,
and provides methods for optional information passed by the UDF (output schema,
...) and access to the distributed cache (name-spaced by the UDF context).
- To have @Mapper @Combiner @Reducer instead of @Initial, @Intermediate,
@Final.
- We don't need to define @Accumulate if we allow the UDF to take an
Iterator<Tuple> as an input.
- I was trying to define schema-aware Tuples but did not get there yet.
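To make the points above concrete, here is a rough sketch of how a COUNT-like UDF could look under these suggestions. It is not taken from examples.patch: the annotation types, UdfContext, the minimal Tuple stand-in, and the reflection-based invoke helper are all hypothetical names made up for illustration, and none of them exist in Pig.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class AnnotatedUdfSketch {
    // Hypothetical annotations (assumptions for this sketch, not Pig API).
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    public @interface Context {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    public @interface Mapper {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    public @interface Combiner {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    public @interface Reducer {}

    // Minimal stand-in for Pig's Tuple so the sketch compiles on its own.
    public static final class Tuple {
        private final Object[] fields;
        public Tuple(Object... fields) { this.fields = fields; }
        public Object get(int i) { return fields[i]; }
    }

    // Hypothetical per-call context; only the frontend/backend flag is modeled.
    public static final class UdfContext {
        private final boolean frontend;
        public UdfContext(boolean frontend) { this.frontend = frontend; }
        public boolean isFrontend() { return frontend; }
    }

    // A plain class: it does not extend EvalFunc or implement Algebraic.
    public static class Count {
        @Context UdfContext context; // injected before every call

        @Mapper // taking an Iterator<Tuple> makes a separate @Accumulate unnecessary
        public long countTuples(Iterator<Tuple> input) {
            long n = 0;
            while (input.hasNext()) { input.next(); n++; }
            return n;
        }

        @Combiner
        public long combine(Iterator<Long> partials) {
            long sum = 0;
            while (partials.hasNext()) { sum += partials.next(); }
            return sum;
        }

        @Reducer
        public long reduce(Iterator<Long> partials) { return combine(partials); }
    }

    // Toy runtime: injects a fresh context, then calls the method tagged with `phase`.
    static Object invoke(Object udf, Class<? extends Annotation> phase,
                         UdfContext ctx, Iterator<?> input) {
        try {
            for (Field f : udf.getClass().getDeclaredFields()) {
                if (f.isAnnotationPresent(Context.class)) {
                    f.setAccessible(true);
                    f.set(udf, ctx);
                }
            }
            for (Method m : udf.getClass().getDeclaredMethods()) {
                if (m.isAnnotationPresent(phase)) {
                    m.setAccessible(true);
                    return m.invoke(udf, input);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalArgumentException("no @" + phase.getSimpleName() + " method");
    }

    // Simulates two map-side splits, one combine step, and one reduce step.
    static long demo() {
        Count udf = new Count();
        List<Tuple> split1 = Arrays.asList(new Tuple("a"), new Tuple("b"));
        List<Tuple> split2 = Arrays.asList(new Tuple("c"));
        long p1 = (Long) invoke(udf, Mapper.class, new UdfContext(false), split1.iterator());
        long p2 = (Long) invoke(udf, Mapper.class, new UdfContext(false), split2.iterator());
        long combined = (Long) invoke(udf, Combiner.class, new UdfContext(false),
                Arrays.asList(p1, p2).iterator());
        return (Long) invoke(udf, Reducer.class, new UdfContext(false),
                Arrays.asList(combined).iterator());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The three phases live in one class and share code directly (reduce just calls combine), which is the kind of duplication the separate Initial, Intermediate, and Final classes make hard to avoid.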
Let me know what you think.
Julien
> EvalFuncs need redesigned
> -------------------------
>
> Key: PIG-2421
> URL: https://issues.apache.org/jira/browse/PIG-2421
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Affects Versions: 0.11
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: PIG-newudf.patch, examples.patch
>
>
> The current EvalFunc interface (and the associated Algebraic and Accumulator
> interfaces) has grown unwieldy. In particular, people have noted the
> following issues:
> # Writing a UDF requires a lot of boilerplate code.
> # Since UDFs always pass a tuple, users are required to manage their own type
> checking for input.
> # Declaring schemas for output data is confusing.
> # Writing a UDF that accepts multiple different parameters (using
> getArgToFuncMapping) is confusing.
> # Using Algebraic and Accumulator interfaces often entails duplicating code
> from the initial implementation.
> # UDF implementors are exposed to the internals of Pig since they have to
> know when to return a tuple (Initial, Intermediate) and when not to (exec,
> Final).
> # The separation of Initial, Intermediate, and Final into separate classes
> forces code duplication and makes it hard for UDFs in other languages to use
> those interfaces.
> # There is unused code in the current interface that occasionally causes
> confusion (e.g. isAsynchronous)
> Any change must be done in a way that allows existing UDFs to continue
> working essentially forever.