[ https://issues.apache.org/jira/browse/PIG-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-2888:
-----------------------------------
    Attachment: partialagg_patch_4.patch

Significant improvements to transitions from raw to processed map. Better mem utilization estimation. Better logging.

While profiling, also noticed an inordinate amount of time being spent in Distinct$Initial's bag registration, fixed that.

The task that I cited as taking 57 seconds with this patch earlier? It now takes 30 seconds. Also saw 40% speed improvement vs older version of this patch on a production job.

Please review :).

> Improve performance of POPartialAgg
> -----------------------------------
>
>                 Key: PIG-2888
>                 URL: https://issues.apache.org/jira/browse/PIG-2888
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: partialagg_patch_1.patch, partialagg_patch_2.patch, partialagg_patch_3.patch, partialagg_patch_4.patch
>
>
> During performance testing, we found that POPartialAgg can cause performance
> degradation for Pig jobs when the Algebraic UDFs it's being applied to aren't
> well suited to the operator's assumptions. Changing the implementation to a
> more flexible hash-based model can provide significant performance
> improvements.
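
For readers unfamiliar with the idea, the "hash-based model" mentioned in the issue description can be illustrated with a minimal sketch of map-side partial aggregation. This is not the code from the attached patches: the class name PartialAggSketch, the maxKeys threshold, and the use of a simple sum as the aggregate are all assumptions made for illustration. The sketch folds records into a hash map keyed by group and flushes the partial results downstream once the map grows past the threshold.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; NOT the POPartialAgg code from the patch.
final class PartialAggSketch {
    // Assumed flush threshold: maximum number of distinct keys held in memory.
    private final int maxKeys;
    // In-memory partial sums, one entry per group key.
    private final Map<String, Long> partial = new HashMap<>();
    // Stand-in for "downstream" (for example, the combiner or reducer input).
    private final List<Map.Entry<String, Long>> emitted = new ArrayList<>();

    PartialAggSketch(int maxKeys) {
        this.maxKeys = maxKeys;
    }

    // Fold one (key, value) record into the partial sum for its key.
    void add(String key, long value) {
        partial.merge(key, value, Long::sum);
        if (partial.size() > maxKeys) {
            flush();
        }
    }

    // Emit every partial sum downstream and release the memory.
    void flush() {
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            emitted.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        partial.clear();
    }

    // Partial results emitted so far; a key may appear more than once.
    List<Map.Entry<String, Long>> emitted() {
        return emitted;
    }
}
```

Because a key can be flushed more than once, the emitted values are only partial aggregates; a later stage still has to merge them per key, which is why this approach only works for aggregates that can be combined incrementally.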