[ 
https://issues.apache.org/jira/browse/PIG-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443793#comment-13443793
 ] 

Dmitriy V. Ryaboy commented on PIG-2888:
----------------------------------------

bq. There's a "pig.exec.nocombiner" that was not replaced by a constant.

Fixed.

bq. It would be nice to have a consistent way of getting booleans (and floats) 
from the conf

Feels like scope creep; maybe in another ticket? I don't want to get into how 
to design that around Properties, Configurations, and PigConfigurations.

bq. some of the class description was still applicable

Added better docs.

bq. what is the reason for this particular value?

Bad math :). Fixed the math and added an explanation of how I got there.

bq. Don't you want a visitor to just list them all once and set the count? That 
way you would not have to worry about keeping a reference on them.

I could do that, but this feels much cleaner -- no visitors, no serialization, 
no changes to the MRCompiler/JCCompiler, very self-contained, and works at 
runtime instead of having to be preset by the planner.

bq. +0.5 so that it is never 0 ? Math.min(1, ...) is more readable.

No, the +0.5 is there so that it's a round() instead of a floor().
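
To illustrate the difference, here is a minimal sketch. The class and method names are hypothetical and are not taken from the patch; it only shows how adding 0.5 before an int cast turns truncation into rounding for non-negative values.

```java
// Hypothetical helper for illustration only; not code from the patch.
final class RoundingSketch {
    /** Plain cast: truncates toward zero, which equals floor() for non-negative input. */
    static int truncate(double value) {
        return (int) value;
    }

    /** Adding 0.5 before the cast rounds non-negative input to the nearest int. */
    static int roundHalfUp(double value) {
        return (int) (value + 0.5);
    }

    public static void main(String[] args) {
        System.out.println(truncate(2.7));    // 2
        System.out.println(roundHalfUp(2.7)); // 3
        System.out.println(roundHalfUp(0.3)); // 0: the +0.5 does not by itself prevent a zero result
    }
}
```

Note that the result can still be 0 for inputs below 0.5, so the +0.5 is not a lower-bound guard. If a floor of 1 were wanted, that would be a separate clamp (in Java, Math.max(1, ...) rather than Math.min).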

bq. LOG.info() should be wrapped in if (LOG.isInfoEnabled()) { ... } for perf

Done for places where it matters (functions invoked more than once and messages 
where args are not constants).
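
A minimal sketch of why the guard matters when the message is not a constant. To stay self-contained it uses the JDK's java.util.logging (isLoggable(Level.INFO)) as a stand-in for the LOG.isInfoEnabled() call discussed above; the class, method, and message text are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo class; java.util.logging stands in for the logging API used in Pig.
final class LogGuardSketch {
    private static final Logger LOG = Logger.getLogger(LogGuardSketch.class.getName());

    // Counts how often the message string is actually built.
    static int messagesBuilt = 0;

    static String describe(int records, int keys) {
        messagesBuilt++;
        return "Aggregated " + records + " records into " + keys + " keys";
    }

    /** Unguarded: the argument is evaluated even when INFO is disabled. */
    static void logUnguarded(int records, int keys) {
        LOG.info(describe(records, keys));
    }

    /** Guarded: the message is built only when INFO is enabled. */
    static void logGuarded(int records, int keys) {
        if (LOG.isLoggable(Level.INFO)) {
            LOG.info(describe(records, keys));
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.WARNING); // disable INFO for the demonstration
        logUnguarded(1000, 10);      // builds the message, then discards it
        logGuarded(1000, 10);        // skips building the message
        System.out.println(messagesBuilt); // 1
    }
}
```

For a constant message in a function that runs once, the guard saves nothing measurable, which is why it is only worth adding on hot paths with computed arguments.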

bq. in aggregateSecondLevel() can't the processedInputMap be reused?

No -- aggregate() adds to the list of tuples in the target map, but we want to 
overwrite in this case.
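
The distinction between the two update styles can be sketched as follows. This is not the POPartialAgg code: the class and method names are hypothetical, String keys and Long values stand in for Pig tuples, and it uses Java 8 syntax for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of append vs. overwrite semantics on a map of lists.
final class MapUpdateSketch {
    /** Append: the value joins whatever is already stored under the key. */
    static void append(Map<String, List<Long>> target, String key, long value) {
        target.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Overwrite: the entry is replaced by a list holding only the new value. */
    static void overwrite(Map<String, List<Long>> target, String key, long value) {
        List<Long> replacement = new ArrayList<>();
        replacement.add(value);
        target.put(key, replacement);
    }

    public static void main(String[] args) {
        Map<String, List<Long>> map = new HashMap<>();
        append(map, "k", 1L);
        append(map, "k", 2L);
        System.out.println(map.get("k")); // [1, 2]
        overwrite(map, "k", 3L);
        System.out.println(map.get("k")); // [3]
    }
}
```

With append semantics, re-aggregating into the same map would keep the already-combined values alongside the new result, which is the behavior the reply above wants to avoid.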

bq. in getMinOutputReductionFromProp(), if minReduction <= 0 it should throw an 
exception.

Added a log message instead.

> Improve performance of POPartialAgg
> -----------------------------------
>
>                 Key: PIG-2888
>                 URL: https://issues.apache.org/jira/browse/PIG-2888
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: partialagg_patch_1.patch, partialagg_patch_2.patch, 
> partialagg_patch_3.patch, partialagg_patch_4.patch, partialagg_patch_5.patch
>
>
> During performance testing, we found that POPartialAgg can cause performance 
> degradation for Pig jobs when the Algebraic UDFs it's being applied to aren't 
> well suited to the operator's assumptions. Changing the implementation to a 
> more flexible hash-based model can provide significant performance 
> improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
