[ https://issues.apache.org/jira/browse/PIG-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1836:
--------------------------------

    Fix Version/s:     (was: 0.10)
    
> Accumulator like interface should be used with Pig operators after (co)group 
> in certain cases
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-1836
>                 URL: https://issues.apache.org/jira/browse/PIG-1836
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Alan Gates
>
> There are a number of cases where people (co)group their data and then pass 
> it to an operator other than foreach with a UDF, but where an 
> accumulator-like interface would still make sense.  A few examples:
> {code}
> C = group B by $0;
> D = foreach C generate flatten(B);
> ...
> C = group B by $0;
> D = stream C through 'script.py';
> ...
> C = group B by $0;
> store C into 'output';
> {code}
> In all these cases the operator that follows the (co)group does not need to 
> hold all of a group's data in memory at once.  There may be other such cases 
> as well.  Changing this part of the pipeline would greatly speed up these 
> queries and make them less likely to die with out-of-memory errors.
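
A minimal sketch of the idea, written in Python rather than against Pig's Java internals, and assuming that reduce-side input arrives already sorted by group key (as it does after a (co)group). The names `GroupAccumulator`, `FlattenToSink`, `run_accumulative` and the `batch_size` parameter are hypothetical. They are loosely modeled on the accumulate/getValue/cleanup shape of Pig's `Accumulator` UDF interface but are not Pig's API. Instead of materializing each group's bag, the driver hands the operator at most `batch_size` tuples at a time. That is sufficient for an operator like `flatten(B)`, which can emit tuples as soon as it sees them.

```python
import itertools
from operator import itemgetter
from typing import Any, Iterable, List, Protocol, Tuple


class GroupAccumulator(Protocol):
    """Hypothetical accumulator-style operator interface (not Pig's API)."""

    def accumulate(self, key: Any, batch: List[Tuple[Any, ...]]) -> None: ...

    def finish_group(self, key: Any) -> None: ...


class FlattenToSink:
    """Accumulator-style stand-in for `D = foreach C generate flatten(B);`.

    Tuples are forwarded to the sink as soon as they arrive, so the
    operator itself never buffers a group's bag.
    """

    def __init__(self, sink: List[Tuple[Any, ...]]) -> None:
        self.sink = sink

    def accumulate(self, key: Any, batch: List[Tuple[Any, ...]]) -> None:
        self.sink.extend(batch)

    def finish_group(self, key: Any) -> None:
        pass  # nothing buffered, so nothing to flush


def run_accumulative(
    tuples: Iterable[Tuple[Any, ...]], op: GroupAccumulator, batch_size: int = 2
) -> int:
    """Feed each group (keyed on field $0) to `op` in batches.

    `tuples` must already be sorted by field 0, as reduce-side input is.
    Returns the size of the largest batch handed to `op`.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    peak = 0
    for key, group in itertools.groupby(tuples, key=itemgetter(0)):
        while True:
            # Pull at most batch_size tuples of the current group.
            batch = list(itertools.islice(group, batch_size))
            if not batch:
                break
            peak = max(peak, len(batch))
            op.accumulate(key, batch)
        op.finish_group(key)
    return peak


if __name__ == "__main__":
    B = [("a", 1), ("a", 2), ("a", 3), ("b", 4)]
    out: List[Tuple[Any, ...]] = []
    print(run_accumulative(B, FlattenToSink(out), batch_size=2), out)
```

With `batch_size=2`, the input `[("a", 1), ("a", 2), ("a", 3), ("b", 4)]` is emitted unchanged and in order, while no more than two tuples of any group are handed over at once (`run_accumulative` returns 2). The batch size trades per-call overhead against memory held per group.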

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
