[ https://issues.apache.org/jira/browse/PIG-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-2888:
-----------------------------------

    Attachment: partialagg_patch_2.patch

Attaching a second version. It's ready for review.

This version takes care of memory estimation (it actually looks at the number of
operators instead of hardcoding a magic "3"), and it turns partial aggregation
off if the reduction is insufficient.
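
A minimal sketch of the two policies described above, assuming a hash-based partial aggregator. The class `PartialAggPolicy`, the 100-byte per-entry estimate, and the 1.5x minimum reduction factor are hypothetical illustrations, not values taken from the patch. The memory budget is divided by the actual number of partial-aggregation operators rather than a fixed constant, and aggregation is switched off when a sample shows that distinct keys are nearly as numerous as input records.

```java
/**
 * Hypothetical helper illustrating two ideas from the patch description.
 * All names, constants, and thresholds are assumptions for illustration;
 * they are not taken from Pig's POPartialAgg.
 */
final class PartialAggPolicy {
    private PartialAggPolicy() {}

    /**
     * Number of hash-table entries one operator may cache. The budget is
     * split by the real count of partial-aggregation operators in the task
     * instead of by a hardcoded divisor such as 3.
     */
    static long maxEntriesPerOperator(long availableBytes,
                                      int numPartialAggOperators,
                                      long estimatedBytesPerEntry) {
        if (numPartialAggOperators <= 0 || estimatedBytesPerEntry <= 0) {
            throw new IllegalArgumentException("counts and sizes must be positive");
        }
        return availableBytes / numPartialAggOperators / estimatedBytesPerEntry;
    }

    /**
     * After sampling some input, decide whether partial aggregation is worth
     * keeping. If distinct keys are close to the number of records, hashing
     * costs CPU and memory without shrinking the map output.
     */
    static boolean shouldDisable(long recordsSampled,
                                 long distinctKeys,
                                 double minReductionFactor) {
        if (recordsSampled == 0) {
            return false; // nothing observed yet, keep aggregating
        }
        double reductionFactor = (double) recordsSampled / Math.max(1, distinctKeys);
        return reductionFactor < minReductionFactor;
    }

    public static void main(String[] args) {
        // 64 MB budget shared by 2 operators, ~100 bytes per cached entry.
        System.out.println(maxEntriesPerOperator(64L * 1024 * 1024, 2, 100)); // 335544
        // 10,000 records collapsing to 9,500 keys: barely any reduction.
        System.out.println(shouldDisable(10_000, 9_500, 1.5)); // true
        // 10,000 records collapsing to 500 keys: 20x reduction, keep going.
        System.out.println(shouldDisable(10_000, 500, 1.5)); // false
    }
}
```

Measuring reduction as records per distinct key keeps the decision cheap: it only needs two counters gathered during the sample.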

Would love to get third-party verification of the speed improvements. Maybe
someone who has recent PigMix results can rerun them with this patch?

One of the test cases (TestPOPartialAgg.testPartialMultiInput1HashMemEmpty)
still fails because it assumes that consecutive keys still get aggregated even
when no memory is allocated to the internal cached bags. That assumption is
pretty specific to the old implementation. Does anyone think that feature is
critical? If not, I would like to remove the test.
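
To make the assumption concrete, here is a hypothetical sketch of aggregating consecutive keys with no cache at all: only the current run of equal keys is held, so the input keys `a, a, b, a` collapse to three rows. `ConsecutiveKeyCombiner` and `combineRuns` are invented names; this is not the old POPartialAgg code, only an illustration of the behavior the test expects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical illustration (not Pig code): even with no cache, records
 * whose keys are adjacent in the input are merged, because only the
 * current run of equal keys has to be held. Needs Java 16+ for records.
 */
final class ConsecutiveKeyCombiner {
    record KeyedSum(String key, long sum) {}

    static List<KeyedSum> combineRuns(List<String> keys, List<Long> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        List<KeyedSum> out = new ArrayList<>();
        boolean inRun = false;
        String runKey = null;
        long runSum = 0;
        for (int i = 0; i < keys.size(); i++) {
            String key = keys.get(i);
            if (inRun && !Objects.equals(runKey, key)) {
                out.add(new KeyedSum(runKey, runSum)); // key changed: close the run
                runSum = 0;
            }
            runKey = key;
            runSum += values.get(i);
            inRun = true;
        }
        if (inRun) {
            out.add(new KeyedSum(runKey, runSum)); // close the last run
        }
        return out;
    }

    public static void main(String[] args) {
        List<KeyedSum> out = combineRuns(
                List.of("a", "a", "b", "a"),
                List.of(1L, 2L, 3L, 4L));
        // Three rows: a=3, b=3, a=4. The trailing "a" is not adjacent to
        // the first run, so it is emitted separately.
        System.out.println(out);
    }
}
```

A hash-based operator given zero memory has no table to merge into, so it would pass such records through unmerged; that difference is what the test trips over.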
                
> Improve performance of POPartialAgg
> -----------------------------------
>
>                 Key: PIG-2888
>                 URL: https://issues.apache.org/jira/browse/PIG-2888
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: partialagg_patch_1.patch, partialagg_patch_2.patch
>
>
> During performance testing, we found that POPartialAgg can cause performance 
> degradation for Pig jobs when the Algebraic UDFs it's being applied to aren't 
> well suited to the operator's assumptions. Changing the implementation to a 
> more flexible hash-based model can provide significant performance 
> improvements.
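
For readers unfamiliar with the approach, the following is a minimal sketch of hash-based partial aggregation for a SUM, assuming string keys and long values. `HashPartialSum`, `maxEntries`, and `reduce` are hypothetical names, and the flush policy (emit the whole table once it is full) is one simple choice, not necessarily what the patch does. Unlike combining only adjacent keys, the hash table merges a key wherever it reappears, as long as it has not been flushed in between.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of hash-based partial aggregation for SUM; not Pig code.
 * Records are folded into a hash table keyed by group key, so repeated keys
 * merge even when not adjacent. When the table reaches maxEntries (a stand-in
 * for a memory budget), its contents are emitted downstream.
 */
final class HashPartialSum {
    private final int maxEntries;
    private final Map<String, Long> table = new HashMap<>();
    private final List<Map.Entry<String, Long>> emitted = new ArrayList<>();

    HashPartialSum(int maxEntries) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be at least 1");
        }
        this.maxEntries = maxEntries;
    }

    /** Folds one input record into the running partial sum for its key. */
    void add(String key, long value) {
        table.merge(key, value, Long::sum);
        if (table.size() >= maxEntries) {
            flush(); // budget reached: pass partial results downstream
        }
    }

    /** Emits every cached partial sum and empties the table. */
    void flush() {
        for (Map.Entry<String, Long> e : table.entrySet()) {
            emitted.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        table.clear();
    }

    List<Map.Entry<String, Long>> emitted() {
        return emitted;
    }

    /** Reduce-side step: one key's partial sums may arrive from several flushes. */
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> p : partials) {
            totals.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        HashPartialSum agg = new HashPartialSum(2);
        for (String key : new String[] {"a", "b", "a", "a", "c", "b"}) {
            agg.add(key, 1L);
        }
        agg.flush(); // end of input: emit whatever is still cached
        System.out.println(agg.emitted().size()); // 5 partial records for 6 inputs
        System.out.println(reduce(agg.emitted())); // {a=3, b=2, c=1}
    }
}
```

Because a key can be flushed more than once, the reduce side must still combine partial results; the map-side table only shrinks the data, it never produces final answers on its own.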

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
