[ 
https://issues.apache.org/jira/browse/PIG-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442974#comment-13442974
 ] 

Julien Le Dem commented on PIG-2888:
------------------------------------

Awesome. Stop now or it will start to be negative soon.
Comments:
* There's a "pig.exec.nocombiner" that was not replaced by a constant.
* It would be nice to have a consistent way of getting booleans (and floats) 
from the conf. Something like:
{noformat}
PigConfiguration.getBoolean(Properties p, key) {
  return "true".equals(p.getProperty(key, "false"));
}
{noformat}
* some of the class description was still applicable
{noformat}
/**
 * Do partial aggregation in map plan. It uses a hash-map to aggregate. 
 * ...
 */
 public class POPartialAgg extends PhysicalOperator {
{noformat}
* what is the reason for this particular value?
{noformat}
 private static final int MAX_LIST_SIZE = 1 << 13 - 1;
{noformat}
* It looks like this could be a HashSet as the value never gets used (but 
there's no WeakHashSet so I gues I got my answer). It could be as well 
WeakHashMap<POPartialAgg, ?>. Don't you want a visitor to just list them all 
once and set the count? That way you would not have to worry about keeping a 
reference on them. 
{noformat}
private static final WeakHashMap<POPartialAgg, Byte> ALL_POPARTS = new 
WeakHashMap<POPartialAgg, Byte>();
{noformat}
* +0.5 so that it is never 0 ? Math.min(1, ...) is more readable. 
{noformat}
 firstTierThreshold = (int) (0.5 + totalTuples * (1f - (1f / sizeReduction)));
 secondTierThreshold = (int) (0.5 + totalTuples *  (1f / sizeReduction));
{noformat}
* LOG.info() should be wrapped in if (LOG.isInfoEnabled()) { ... } for perf
* in aggregateSecondLevel() can't the processedInputMap be reused?
* in getMinOutputReductionFromProp(), if minReduction <= 0 it should throw an 
exception.


                
> Improve performance of POPartialAgg
> -----------------------------------
>
>                 Key: PIG-2888
>                 URL: https://issues.apache.org/jira/browse/PIG-2888
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: partialagg_patch_1.patch, partialagg_patch_2.patch, 
> partialagg_patch_3.patch, partialagg_patch_4.patch
>
>
> During performance testing, we found that POPartialAgg can cause performance 
> degradation for Pig jobs when the Algebraic UDFs it's being applied to aren't 
> well suited to the operator's assumptions. Changing the implementation to a 
> more flexible hash-based model can provide significant performance 
> improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to