[ https://issues.apache.org/jira/browse/PIG-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-1836:
--------------------------------
Fix Version/s: (was: 0.10)
> Accumulator like interface should be used with Pig operators after (co)group
> in certain cases
> ---------------------------------------------------------------------------------------------
>
> Key: PIG-1836
> URL: https://issues.apache.org/jira/browse/PIG-1836
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
>
> There are a number of cases where people (co)group their data and then pass
> it to an operator other than a foreach with a UDF, but where an
> accumulator-like interface would still make sense. A few examples:
> {code}
> C = group B by $0;
> D = foreach C generate flatten(B);
> ...
> C = group B by $0;
> D = stream C through 'script.py';
> ...
> C = group B by $0;
> store C into 'output';
> {code}
> In all of these cases the operator that follows the (co)group does not
> require all of a group's data to be held in memory at once. There may be
> other such cases as well. Changing this part of the pipeline would greatly
> speed up these types of queries and make them less likely to die with
> out-of-memory errors.
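
To make the idea in the description concrete, here is a minimal sketch of the group-then-flatten case. It is not Pig code and does not use Pig's API. The names FlattenAccumulator, run_grouped, batches and the batch_size parameter are hypothetical, and the input is assumed to arrive already sorted by the group key. The point is only that the consumer sees each group's values a batch at a time instead of as one fully materialized bag.

```python
from itertools import groupby, islice
from operator import itemgetter


class FlattenAccumulator:
    """Hypothetical accumulator-style consumer for 'generate flatten(B)'.

    It receives a group's values in batches and emits one (key, value) row
    per value, so it never needs the whole bag in memory.
    """

    def __init__(self, sink):
        self.sink = sink  # any callable that accepts one output row

    def accumulate(self, key, batch):
        for value in batch:
            self.sink((key, value))

    def cleanup(self):
        # Nothing is buffered between groups in this sketch.
        pass


def batches(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_grouped(sorted_rows, consumer, batch_size=2):
    """Feed (key, value) rows, assumed sorted by key, to `consumer` in batches."""
    for key, rows in groupby(sorted_rows, key=itemgetter(0)):
        values = (row[1] for row in rows)  # lazy: the group is never materialized
        for chunk in batches(values, batch_size):
            consumer.accumulate(key, chunk)
        consumer.cleanup()


out = []
rows = [("a", 1), ("a", 2), ("a", 3), ("b", 4)]
run_grouped(rows, FlattenAccumulator(out.append), batch_size=2)
print(out)
```

In this sketch peak memory is bounded by batch_size rather than by the size of the largest group. The stream and store cases would fit the same shape, with a sink that writes to a script's stdin or to the output file.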