[ 
https://issues.apache.org/jira/browse/FLINK-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879297#comment-15879297
 ] 

Fabian Hueske commented on FLINK-5888:
--------------------------------------

Yes, [~ggevay] is right. The hash partitioning is moved between the two maps 
(which is semantically OK due to the annotations).
Without debugging the optimizer, I see two reasons why that happened:

1. Both plans have identical unknown estimated costs because no stats are 
available.
2. The plan without Combiner has lower estimated costs (the combiner is not 
injected before the Reduce, because the data is already partitioned).

One more thing. Adding a {{CombineHint.HASH}} to the {{CentroidAccumulator}} as 
follows: 
{code}
.groupBy(0).reduce(new 
CentroidAccumulator()).setCombineHint(ReduceOperatorBase.CombineHint.HASH)
{code}

might speed up things as well.

> ForwardedFields annotation is not generating optimised execution plan in 
> example KMeans job
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-5888
>                 URL: https://issues.apache.org/jira/browse/FLINK-5888
>             Project: Flink
>          Issue Type: Bug
>          Components: DataSet API, Examples, Java API
>    Affects Versions: 1.1.3
>            Reporter: Ziyad Muhammed Mohiyudheen
>
> Flink KMeans java example [1] shows the usage of ForwardedFields function 
> annotation. How ever, the example job was taking more time than expected on 
> medium sized data itself. By merely removing the function annotation from the 
> example code (with out any other change), a better execution plan and run 
> time was obtained. The execution plan shows that no combiner is used and the 
> two Map tasks are not chained when ForwardedFields is enabled. The experiment 
> is documented in [2]
> [1] 
> https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java
> [2] https://drive.google.com/open?id=0B0IlZv0uHBuvVEZ5ZmNpN19jVVU



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to