[ 
https://issues.apache.org/jira/browse/PIG-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127330#comment-13127330
 ] 

Dmitriy V. Ryaboy commented on PIG-2234:
----------------------------------------

I am starting to think there are enough things wrong with the interfaces of 
Tuple, Bag, Algebraic, Accumulative, and EvalFunc itself that, rather than 
attempting incremental improvements that would add significant complexity 
to an already complex codebase, we should rethink the whole thing from the 
ground up, mapred/mapreduce style. Some core aspects of the APIs 
limit our ability to evolve further.

                
> Algebraic UDF Init and Intermediate functions should be able to return 
> non-tuple data types
> ------------------------------------------------------------------------------------------
>
>                 Key: PIG-2234
>                 URL: https://issues.apache.org/jira/browse/PIG-2234
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Thejas M Nair
>
> The exec() methods of an Algebraic UDF's Initial and Intermediate classes 
> are required to return a Tuple. This is because their output is collected 
> in a DataBag and passed to the Intermediate.exec() and Final.exec() calls, 
> and a DataBag in Pig can only contain Tuples. But this results in 
> additional Tuple objects being created and adds extra (de)serialization 
> costs. Functions such as COUNT and SUM also have to wrap their initial and 
> intermediate results in Tuples.
> The Algebraic interface needs to change to reduce these costs for UDFs 
> that don't need an intermediate Tuple.
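
To make the cost described in the issue concrete, here is a rough sketch of 
a COUNT-like aggregate written both ways. It deliberately does not use Pig's 
classes: OneFieldTuple, initialWrapped, initialBare and the other names are 
hypothetical stand-ins invented for this illustration, and a plain List 
plays the role of the DataBag. It only shows where the extra per-partial 
wrapper object comes from, not how Pig actually implements COUNT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: none of these types are Pig's real Tuple/DataBag/Algebraic.
public class AlgebraicCountSketch {

    // Hypothetical stand-in for a single-field Tuple.
    static final class OneFieldTuple {
        final Object field;
        OneFieldTuple(Object field) { this.field = field; }
    }

    // Tuple-returning style: each partial count is boxed and wrapped.
    static OneFieldTuple initialWrapped(List<String> chunk) {
        return new OneFieldTuple(Long.valueOf((long) chunk.size()));
    }

    // The final step must unwrap and cast every partial result.
    static long finalWrapped(List<OneFieldTuple> partials) {
        long total = 0;
        for (OneFieldTuple t : partials) {
            total += (Long) t.field;
        }
        return total;
    }

    // Non-tuple style asked for in the issue: return the primitive directly.
    static long initialBare(List<String> chunk) {
        return chunk.size();
    }

    static long finalBare(long[] partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    // Drive the wrapped pipeline over several input chunks.
    static long countWrapped(List<List<String>> chunks) {
        List<OneFieldTuple> partials = new ArrayList<>();
        for (List<String> chunk : chunks) {
            partials.add(initialWrapped(chunk)); // one wrapper object per chunk
        }
        return finalWrapped(partials);
    }

    // Drive the bare pipeline over the same chunks; no wrapper objects.
    static long countBare(List<List<String>> chunks) {
        long[] partials = new long[chunks.size()];
        for (int i = 0; i < chunks.size(); i++) {
            partials[i] = initialBare(chunks.get(i));
        }
        return finalBare(partials);
    }

    public static void main(String[] args) {
        List<List<String>> chunks = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"));
        System.out.println("wrapped: " + countWrapped(chunks));
        System.out.println("bare:    " + countBare(chunks));
    }
}
```

Both paths compute the same total; the difference is that the wrapped path 
allocates (and, in a real job, would serialize) one extra object per partial 
result, which is the overhead the issue wants UDFs to be able to avoid.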

