[
https://issues.apache.org/jira/browse/PIG-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548020
]
Utkarsh Srivastava commented on PIG-7:
--------------------------------------
Looks good. But as Alan said, we should rework this code when there is more
time.
Two comments, one major and one minor, both in PigCombine.java.
Major:
Lines 93, 94: Don't add the indexed tuple directly to the bag. We had a nasty
bug a while back regarding this. Convert it into a regular tuple before adding
it. See lines 148, 149 in PigMapReduce.java.
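To illustrate the conversion suggested above, here is a minimal sketch. It uses hypothetical stand-in classes (SimpleTuple, SimpleIndexedTuple, BagDemo, and the toTuple helper are assumptions for illustration, not the actual Pig classes or the code in PigMapReduce.java). The idea is only that the bag receives a plain copy of the fields rather than the indexed object itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Pig's real classes.
class SimpleTuple {
    final List<Object> fields;

    SimpleTuple(List<Object> fields) {
        this.fields = new ArrayList<>(fields); // defensive copy of the field values
    }
}

class SimpleIndexedTuple extends SimpleTuple {
    final int index; // which input this tuple came from

    SimpleIndexedTuple(List<Object> fields, int index) {
        super(fields);
        this.index = index;
    }

    // Return a plain tuple holding the same fields, without the index.
    SimpleTuple toTuple() {
        return new SimpleTuple(fields);
    }
}

class BagDemo {
    // Add the plain form of the tuple to the bag, never the indexed one.
    static void addToBag(List<SimpleTuple> bag, SimpleIndexedTuple it) {
        bag.add(it.toTuple());
    }
}
```

With this shape, nothing downstream that iterates over the bag can see an indexed tuple, which is the property the review comment asks for.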
Minor: It would be nice to clean up the comments in PigCombine. Also, there
are some fragments that don't make sense given the restricted setting in which
we are applying the combiner.
For example,
    for (int i = 0; i < inputCount; i++) { // XXX: shouldn't we only do this if INNER flag is set?
        if (t.getBagField(1 + i).isEmpty())
            return;
    }
Since we are currently running for the case when inputCount == 1, the bag will
never be empty. (If the bag is empty, that group would never have been created).
> Optimize execution of algebraic functions
> -----------------------------------------
>
> Key: PIG-7
> URL: https://issues.apache.org/jira/browse/PIG-7
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Olga Natkovich
> Assignee: Alan Gates
> Attachments: combiner.patch, combiner2.patch, combiner3.patch
>
>
> Algebraic functions are functions that can be computed incrementally, like
> count(X), SUM(X), etc. They can be computed efficiently by doing the
> first-level computation using the Hadoop combiner. This can give a significant
> (2-3x) speedup for many aggregation queries.
> Several users asked us for this feature, so it is pretty high priority.
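As a rough illustration of what "computed incrementally" means in the description above, the sketch below splits a count into three stages. It is plain Java with no Hadoop or Pig classes, and the class and method names (AlgebraicCount, initial, intermediate, finalCount) are assumptions made for this example, not the API introduced by the patch.

```java
import java.util.List;

// Illustration only: plain Java, no Hadoop or Pig classes involved.
class AlgebraicCount {
    // Map side: count the values in one chunk of a group.
    static long initial(List<?> chunk) {
        return chunk.size();
    }

    // Combiner: merge partial counts; may be applied any number of times.
    static long intermediate(List<Long> partials) {
        long sum = 0;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    // Reduce side: merge the remaining partial counts into the final answer.
    static long finalCount(List<Long> partials) {
        return intermediate(partials);
    }
}
```

Because the intermediate step only adds partial counts, it gives the same total no matter how the chunks are grouped, which is what makes it safe to run in a combiner before the data reaches the reducer.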