[ https://issues.apache.org/jira/browse/FLINK-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379606#comment-15379606 ]
Greg Hogan commented on FLINK-3279:
-----------------------------------

I fixed the wording of my comment. I think Fabian's suggestion was to investigate changing {{DistinctOperator}} from using {{GroupReduce}} to using {{Reduce}}. Then we could add {{setCombineHint}} to {{DistinctOperator}} rather than my suggestion above.

> Optionally disable DistinctOperator combiner
> --------------------------------------------
>
> Key: FLINK-3279
> URL: https://issues.apache.org/jira/browse/FLINK-3279
> Project: Flink
> Issue Type: New Feature
> Components: DataSet API
> Affects Versions: 1.0.0
> Reporter: Greg Hogan
> Assignee: Greg Hogan
> Priority: Minor
>
> Calling {{DataSet.distinct()}} executes {{DistinctOperator.DistinctFunction}}
> which is a combinable {{RichGroupReduceFunction}}. Sometimes we know that
> there will be few duplicate records and disabling the combine would improve
> performance.
> I propose adding {{public DistinctOperator<T> setCombinable(boolean
> combinable)}} to {{DistinctOperator}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
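To illustrate the trade-off described in the issue, here is a minimal toy model in plain Java. It is not Flink code and does not use any Flink API: the class {{DistinctCombineSketch}}, the {{Result}} type, and the {{distinct(partitions, combinable)}} method are hypothetical names invented for this sketch. Partitions are modeled as in-memory lists, and the "shuffle" is just a count of records handed to the final deduplication step. The point it tries to show is that a local combine pass only pays off when it removes records before the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only (assumption): this does NOT reproduce Flink's DistinctOperator.
final class DistinctCombineSketch {

    /** Distinct output plus the number of records that crossed the simulated shuffle. */
    static final class Result {
        final Set<String> distinct;
        final int shuffledRecords;

        Result(Set<String> distinct, int shuffledRecords) {
            this.distinct = distinct;
            this.shuffledRecords = shuffledRecords;
        }
    }

    // Sample inputs: one with a single cross-partition duplicate, one with many duplicates.
    static final List<List<String>> FEW_DUPLICATES =
            Arrays.asList(Arrays.asList("a", "b", "c"), Arrays.asList("d", "e", "a"));
    static final List<List<String>> MANY_DUPLICATES =
            Arrays.asList(Arrays.asList("a", "a", "a"), Arrays.asList("b", "b", "a"));

    static Result distinct(List<List<String>> partitions, boolean combinable) {
        List<String> shuffled = new ArrayList<>();
        for (List<String> partition : partitions) {
            if (combinable) {
                // Combine step: deduplicate within the partition before shuffling.
                shuffled.addAll(new LinkedHashSet<>(partition));
            } else {
                // No combine: every record is shuffled as-is.
                shuffled.addAll(partition);
            }
        }
        // Final reduce step: deduplicate across all partitions.
        return new Result(new LinkedHashSet<>(shuffled), shuffled.size());
    }

    public static void main(String[] args) {
        for (boolean combinable : new boolean[] {true, false}) {
            Result few = distinct(FEW_DUPLICATES, combinable);
            Result many = distinct(MANY_DUPLICATES, combinable);
            System.out.println("combinable=" + combinable
                    + " few-duplicates shuffled=" + few.shuffledRecords
                    + " many-duplicates shuffled=" + many.shuffledRecords);
        }
    }
}
```

In this model the result set is the same either way. With few duplicates the combine pass does extra local work without reducing the number of shuffled records, which is the case the issue wants to be able to opt out of. With many duplicates the combine pass shrinks the shuffle. Real costs in Flink depend on the combine strategy and data sizes, which this sketch does not measure.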