[ 
https://issues.apache.org/jira/browse/FLINK-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201482#comment-15201482
 ] 

ASF GitHub Bot commented on FLINK-3179:
---------------------------------------

Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1553#issuecomment-198364412
  
    Hi @ramkrish86, I thought about this PR and came to the conclusion that we 
should not continue. The optimizer's design does not allow modifying operators 
in, or injecting operators into, enumerated subplans. Doing so might produce 
invalid execution plans and, in the worst case, wrong results without anybody 
noticing.
    
    I would simply log a WARN message that no combiner was added when the 
optimizer identifies a Partition operator in front of a Reduce or combinable 
GroupReduce operator, and include a hint that an explicit CombinerFunction can 
be added with groupCombine in front of the Partition operator.
    
    Sorry again @ramkrish86 that I led you into a dead end with this PR.
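
To illustrate why the hint matters, here is a minimal sketch in plain Java. It does not use Flink's API or the optimizer; the class `CombinerSketch` and its methods `rec`, `combine`, and `run` are hypothetical names made up for this example. It models a word count whose input is explicitly hash-partitioned and compares how many records cross the shuffle with and without a local combine step in front of the partitioning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.AbstractMap.SimpleEntry;

// Hypothetical model, not Flink code: simulates combine -> partition -> reduce.
final class CombinerSketch {

    /** Final counts plus the number of records that went through the shuffle. */
    static final class Result {
        final Map<String, Integer> counts;
        final int shuffledRecords;

        Result(Map<String, Integer> counts, int shuffledRecords) {
            this.counts = counts;
            this.shuffledRecords = shuffledRecords;
        }
    }

    /** Convenience factory for a (word, count) record. */
    static Map.Entry<String, Integer> rec(String word, int count) {
        return new SimpleEntry<>(word, count);
    }

    /** Combine step: sums the values per key within one input partition. */
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> partition) {
        Map<String, Integer> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> record : partition) {
            partial.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return new ArrayList<>(partial.entrySet());
    }

    /** Hash-partitions the input (optionally combined first) and reduces per key. */
    static Result run(List<List<Map.Entry<String, Integer>>> inputPartitions,
                      int parallelism,
                      boolean combineFirst) {
        // One map of running sums per target partition (the reduce side).
        List<Map<String, Integer>> targets = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            targets.add(new TreeMap<>());
        }

        int shuffled = 0;
        for (List<Map.Entry<String, Integer>> partition : inputPartitions) {
            List<Map.Entry<String, Integer>> toSend =
                    combineFirst ? combine(partition) : partition;
            for (Map.Entry<String, Integer> record : toSend) {
                // Every record sent to a target partition counts as shuffled.
                int target = Math.floorMod(record.getKey().hashCode(), parallelism);
                targets.get(target).merge(record.getKey(), record.getValue(), Integer::sum);
                shuffled++;
            }
        }

        // Each key lives in exactly one target, so merging the maps is safe.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Integer> target : targets) {
            counts.putAll(target);
        }
        return new Result(counts, shuffled);
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> input = Arrays.asList(
                Arrays.asList(rec("a", 1), rec("b", 1), rec("a", 1)),
                Arrays.asList(rec("a", 1), rec("a", 1), rec("b", 1)));

        Result plain = run(input, 2, false);
        Result combined = run(input, 2, true);
        System.out.println("without combine: " + plain.counts
                + ", shuffled=" + plain.shuffledRecords);
        System.out.println("with combine:    " + combined.counts
                + ", shuffled=" + combined.shuffledRecords);
    }
}
```

In this toy input both variants produce the same counts, but the combined variant sends 4 records through the shuffle instead of 6. In an actual Flink job, the corresponding step would be an explicit combine function placed in front of the partition operator, as suggested above.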


> Combiner is not injected if Reduce or GroupReduce input is explicitly 
> partitioned
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-3179
>                 URL: https://issues.apache.org/jira/browse/FLINK-3179
>             Project: Flink
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 0.10.1
>            Reporter: Fabian Hueske
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 1.0.0, 0.10.2
>
>
> The optimizer does not inject a combiner if the input of a Reducer or 
> GroupReducer is explicitly partitioned, as in the following example:
> {code}
> DataSet<Tuple2<String,Integer>> words = ...
> DataSet<Tuple2<String,Integer>> counts = words
>   .partitionByHash(0)
>   .groupBy(0)
>   .sum(1);
> {code}
> Explicit partitioning can be useful to enforce partitioning on a subset of 
> keys or to use a different partitioning method (custom or range partitioning).
> This issue should be fixed by changing the {{instantiate()}} methods of the 
> {{ReduceProperties}} and {{GroupReduceWithCombineProperties}} classes such 
> that a combine is injected in front of a {{PartitionPlanNode}} if it is the 
> input of a Reduce or GroupReduce operator. This should happen only if the 
> Reducer is the only successor of the Partition operator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
