[ https://issues.apache.org/jira/browse/FLINK-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096403#comment-16096403 ]
ASF GitHub Bot commented on FLINK-7234:
---------------------------------------

Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4372

    I think you are right @greghogan. It's not about the ratio of #distinct keys to the size of the data set. But it's also not only about the ratio of #distinct keys to the size of the memory: the skew of the key distribution has an effect as well (hash-based combiners should handle skew better than sort-based combiners).

> Fix CombineHint documentation
> -----------------------------
>
>                 Key: FLINK-7234
>             Project: Flink
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 1.2.2, 1.4.0, 1.3.2
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>
> The {{CombineHint}}
> [documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/batch/index.html]
> applies to {{DataSet#reduce}}, not {{DataSet#reduceGroup}}, and should also be
> noted for {{DataSet#distinct}}. It is also set with
> {{.setCombineHint(CombineHint)}} rather than alongside the UDF parameter.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
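The point about memory and skew can be illustrated with a toy model in plain Java. This is not Flink's combiner code; it is a minimal sketch under stated assumptions: memory is counted in records, the sort-based combiner buffers raw records and emits one partial result per distinct key each time its buffer fills, and the hash-based combiner keeps one partial aggregate per key and flushes the whole table when a new key does not fit. The class and method names (`CombinerModel`, `sortCombinerOutput`, `hashCombinerOutput`, `skewedKeys`) are hypothetical. In Flink itself, the hint is chosen with `.setCombineHint(CombineHint)` on the result of `DataSet#reduce` or `DataSet#distinct`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy model (not Flink code) comparing how many partial records
 * a sort-based and a hash-based combiner emit for a count-by-key reduce.
 * "memory" is measured in records.
 */
public class CombinerModel {

    /** Sort-based: buffer up to `memory` raw records, then sort, combine, and emit. */
    public static int sortCombinerOutput(int[] keys, int memory) {
        int emitted = 0;
        List<Integer> buffer = new ArrayList<>();
        for (int key : keys) {
            buffer.add(key);
            if (buffer.size() == memory) {
                emitted += runsAfterSort(buffer);
                buffer.clear();
            }
        }
        emitted += runsAfterSort(buffer); // flush the partially filled last buffer
        return emitted;
    }

    /** Sorting groups equal keys; one combined record is emitted per run of equal keys. */
    static int runsAfterSort(List<Integer> buffer) {
        if (buffer.isEmpty()) {
            return 0;
        }
        Integer[] sorted = buffer.toArray(new Integer[0]);
        Arrays.sort(sorted);
        int runs = 1;
        for (int i = 1; i < sorted.length; i++) {
            if (!sorted[i].equals(sorted[i - 1])) {
                runs++;
            }
        }
        return runs;
    }

    /** Hash-based: one partial count per key; flush the whole table when a new key does not fit. */
    public static int hashCombinerOutput(int[] keys, int memory) {
        int emitted = 0;
        Map<Integer, Long> table = new HashMap<>();
        for (int key : keys) {
            if (!table.containsKey(key) && table.size() == memory) {
                emitted += table.size();
                table.clear();
            }
            table.merge(key, 1L, Long::sum); // update the partial count for this key
        }
        emitted += table.size(); // flush whatever remains at end of input
        return emitted;
    }

    /** Hypothetical skewed input: 9 of every 10 records carry hot key 0, the rest are unique keys 1..n/10. */
    public static int[] skewedKeys(int n) {
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = (i % 10 == 9) ? 1 + i / 10 : 0;
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] skewed = skewedKeys(1000);
        System.out.println("sort-based: " + sortCombinerOutput(skewed, 10)); // prints "sort-based: 200"
        System.out.println("hash-based: " + hashCombinerOutput(skewed, 10)); // prints "hash-based: 111"
    }
}
```

In this model the hot key occupies a single hash-table slot but most of every sort buffer, so the hash-based combiner emits fewer partial records on the skewed input (111 versus 200). With all-distinct keys (`keys[i] = i`, 1000 records, memory 10) both models emit all 1000 records, so neither strategy reduces the data when the number of distinct keys far exceeds memory.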