[ 
https://issues.apache.org/jira/browse/PIG-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547218
 ] 

Alan Gates commented on PIG-7:
------------------------------

Utkarsh,

Thanks for the feedback.  I'll address the first two issues.

On the third issue, I have a question.  Can you give me an example of a Pig 
script that will not be combinable in this case?  If I do something like:

a = load '/user/pig/tests/data/singlefile/studenttab10k';
b = group a by $0;
c = foreach b generate group, COUNT($1), SUM($1.$2);
d = filter c by $1 > '10';
dump d;

That still makes use of the combiner.
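
For illustration only, here is a minimal Python sketch (not Pig's 
implementation) of why COUNT and SUM stay combinable when a filter follows 
the foreach.  The helper names combine and reduce_partials and the sample 
data are hypothetical.  Each map-side split is collapsed into per-key partial 
(count, sum) pairs, the reducer merges those partials, and the filter only 
sees the final aggregates:

```python
from collections import defaultdict


def combine(records):
    """Map-side combiner: collapse one split's (key, value) pairs
    into a per-key partial (count, sum)."""
    partial = defaultdict(lambda: (0, 0))
    for key, value in records:
        count, total = partial[key]
        partial[key] = (count + 1, total + value)
    return dict(partial)


def reduce_partials(partials):
    """Reducer: merge the partial (count, sum) pairs from every split."""
    merged = defaultdict(lambda: (0, 0))
    for partial in partials:
        for key, (count, total) in partial.items():
            merged_count, merged_total = merged[key]
            merged[key] = (merged_count + count, merged_total + total)
    return dict(merged)


# Two hypothetical map-task inputs of (group key, value) pairs.
split_1 = [("alice", 3), ("bob", 5), ("alice", 4)]
split_2 = [("alice", 1), ("bob", 2)]

result = reduce_partials([combine(split_1), combine(split_2)])

# The filter is applied to the final aggregates, after combining.
filtered = {key: agg for key, agg in result.items() if agg[0] > 2}
```

Because the filter reads only the finished count, it does not constrain how 
the partial results were accumulated.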

> Optimize execution of algebraic functions
> -----------------------------------------
>
>                 Key: PIG-7
>                 URL: https://issues.apache.org/jira/browse/PIG-7
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>         Attachments: combiner.patch
>
>
> Algebraic functions are functions that can be computed incrementally, such as 
> COUNT(X), SUM(X), etc. They can be computed efficiently by doing the 
> first-level computation in the Hadoop combiner. This can give a significant 
> (2-3x) speedup for many aggregation queries. 
> Several users asked us for this feature so it is pretty high priority.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
