Fabian Hueske created FLINK-5722:
------------------------------------

             Summary: Implement DISTINCT as dedicated operator
                 Key: FLINK-5722
                 URL: https://issues.apache.org/jira/browse/FLINK-5722
             Project: Flink
          Issue Type: Improvement
          Components: Table API & SQL
    Affects Versions: 1.2.0, 1.3.0
            Reporter: Fabian Hueske


For the batch Table API / SQL, DISTINCT is currently implemented as an
aggregate that groups on all fields. Grouped aggregates are translated into a
GroupReduce with a sort-based combiner.

DISTINCT can be implemented more efficiently by using a ReduceFunction and
hinting a hash-based combine strategy. The same ReduceFunction can be used for
all DISTINCT operations and can be annotated with the appropriate
forwarded-field annotations.
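
To illustrate the idea, here is a minimal plain-Java sketch of a hash-based
DISTINCT. It deliberately does not use any Flink classes; DistinctSketch,
KEEP_FIRST and distinct are hypothetical names chosen for this example, and
the in-memory hash table only stands in for what a hash-based combiner would
do per partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Illustration only: hypothetical names, no Flink dependency.
class DistinctSketch {

    // One reduce function serves every DISTINCT: of two equal rows, keep the first.
    // Each output field is copied unchanged from the same input field, which is
    // why forwarded-field annotations would hold for all positions.
    static final BinaryOperator<List<Object>> KEEP_FIRST = (left, right) -> left;

    // Hash-based combine: one table entry per distinct row, no sorting needed.
    static List<List<Object>> distinct(List<List<Object>> rows) {
        Map<List<Object>, List<Object>> table = new LinkedHashMap<>();
        for (List<Object> row : rows) {
            // The grouping key is the whole row (grouping on all fields).
            table.merge(row, row, KEEP_FIRST);
        }
        return new ArrayList<>(table.values());
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList(1, "a"),
            Arrays.<Object>asList(2, "b"),
            Arrays.<Object>asList(1, "a"));
        System.out.println(distinct(rows)); // prints [[1, a], [2, b]]
    }
}
```

Because the reduce function never builds a new record, it does not depend on
the row schema, which is what would allow a single implementation to be shared
by all DISTINCT operations.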

This requires a custom conversion rule that translates distinct aggregations
(those that group on all fields and return all fields) into a custom
DataSetRelNode.
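
The matching condition of such a rule could be sketched as the predicate
below. This is an assumption-laden illustration in plain Java, not the planner
API: DistinctMatchSketch and isDistinct are hypothetical names, and the group
key is modelled as a java.util.BitSet of input field indices.

```java
import java.util.BitSet;

// Illustration only: hypothetical names, not the planner's rule API.
class DistinctMatchSketch {

    // An aggregation is a plain DISTINCT if it computes no aggregate functions
    // and its group key covers exactly the input fields 0 .. inputFieldCount - 1.
    static boolean isDistinct(BitSet groupSet, int aggCallCount, int inputFieldCount) {
        return aggCallCount == 0
            && groupSet.cardinality() == inputFieldCount
            && groupSet.nextClearBit(0) >= inputFieldCount;
    }
}
```

Any aggregation that fails this check would continue to be translated by the
existing aggregate rule.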



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
