[ 
https://issues.apache.org/jira/browse/PIG-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2888:
-----------------------------------

    Attachment: partialagg_patch_4.patch

Significant improvements to transitions from raw to processed map. Better mem 
utilization estimation. Better logging.

While profiling, also noticed an inordinate amount of time being spent in 
Distinct$Initial's bag registration, fixed that.

The task that I cited as taking 57 seconds with this patch earlier? It now 
takes 30 seconds. Also saw 40% speed improvement vs older version of this patch 
on a production job.

Please review :).
                
> Improve performance of POPartialAgg
> -----------------------------------
>
>                 Key: PIG-2888
>                 URL: https://issues.apache.org/jira/browse/PIG-2888
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: partialagg_patch_1.patch, partialagg_patch_2.patch, 
> partialagg_patch_3.patch, partialagg_patch_4.patch
>
>
> During performance testing, we found that POPartialAgg can cause performance 
> degradation for Pig jobs when the Algebraic UDFs it's being applied to aren't 
> well suited to the operator's assumptions. Changing the implementation to a 
> more flexible hash-based model can provide significant performance 
> improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to