[ 
https://issues.apache.org/jira/browse/PIG-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-7:
-------------------------

    Attachment: combiner2.patch

Here is a second pass at the combiner patch.  I fixed:

1) the possible null pointer exception in EvalSpecVisitor pointed out by 
Utkarsh in pt 2 of his previous comment.

2) I dealt with the overly lax choice of the combiner (pt 1 in Utkarsh's 
previous mail) by changing the way combinability is determined.  I have 
removed amenableToCombiner from EvalSpec and its children.  There is now a 
visitor CombineDeterminer that extends EvalSpecVisitor and is a private class 
of MadreducePlanCompiler.  This visitor will only decide to use the combiner if 
it sees at least one FuncEvalSpec that has an Algebraic function and it does 
not see any eval specs that cannot be combined.  The previous version (using 
amenableToCombiner) only did the latter.
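
To make the rule in pt 2 concrete, here is a rough, self-contained sketch of the decision logic. It is not the code in the patch: the class shapes below (the visitor methods, OtherEvalSpec, the boolean flags, the CombinerSketch wrapper) are hypothetical stand-ins, and treating a non-algebraic function as "cannot be combined" is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {
    // Hypothetical stand-ins for Pig's eval spec classes, not the real definitions.
    public interface EvalSpec {
        void visit(EvalSpecVisitor v);
    }

    // Base visitor with no-op defaults, so subclasses override only what they need.
    public static class EvalSpecVisitor {
        public void visitFunc(FuncEvalSpec spec) { }
        public void visitOther(OtherEvalSpec spec) { }
    }

    public static final class FuncEvalSpec implements EvalSpec {
        final String name;
        final boolean algebraic; // assumed flag: true if the function is Algebraic
        public FuncEvalSpec(String name, boolean algebraic) {
            this.name = name;
            this.algebraic = algebraic;
        }
        public void visit(EvalSpecVisitor v) { v.visitFunc(this); }
    }

    // Any non-function spec; 'combinable' is an assumed property for this sketch.
    public static final class OtherEvalSpec implements EvalSpec {
        final boolean combinable;
        public OtherEvalSpec(boolean combinable) { this.combinable = combinable; }
        public void visit(EvalSpecVisitor v) { v.visitOther(this); }
    }

    // Use the combiner only if at least one Algebraic function was seen
    // AND no spec that cannot be combined was seen.
    public static final class CombineDeterminer extends EvalSpecVisitor {
        private boolean sawAlgebraic = false;
        private boolean sawNonCombinable = false;

        @Override
        public void visitFunc(FuncEvalSpec spec) {
            if (spec.algebraic) {
                sawAlgebraic = true;
            } else {
                sawNonCombinable = true; // assumption: non-algebraic funcs block combining
            }
        }

        @Override
        public void visitOther(OtherEvalSpec spec) {
            if (!spec.combinable) {
                sawNonCombinable = true;
            }
        }

        public boolean useCombiner() {
            return sawAlgebraic && !sawNonCombinable;
        }
    }

    // Walks every spec with a fresh CombineDeterminer and returns its verdict.
    public static boolean useCombiner(List<? extends EvalSpec> specs) {
        CombineDeterminer determiner = new CombineDeterminer();
        for (EvalSpec spec : specs) {
            spec.visit(determiner);
        }
        return determiner.useCombiner();
    }

    public static void main(String[] args) {
        // Only combinable specs but no Algebraic function: the old check
        // (nothing non-combinable seen) would have accepted this case.
        System.out.println(useCombiner(Arrays.asList(new OtherEvalSpec(true))));
        System.out.println(useCombiner(Arrays.asList(
                new FuncEvalSpec("SUM", true), new OtherEvalSpec(true))));
    }
}
```

The first call in main is the case the previous patch got wrong: nothing blocks combining, but there is also nothing for the combiner to do, so the sketch declines to use it.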

I have rerun the unit tests and all pass.

> Optimize execution of algebraic functions
> -----------------------------------------
>
>                 Key: PIG-7
>                 URL: https://issues.apache.org/jira/browse/PIG-7
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>         Attachments: combiner.patch, combiner2.patch
>
>
> Algebraic functions are functions that can be computed incrementally, like 
> COUNT(X), SUM(X), etc. They can be computed efficiently by doing the 
> first-level computation in the Hadoop combiner. This can give a significant 
> (2-3x) speedup for many aggregation queries. 
> Several users asked us for this feature so it is pretty high priority.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
