Stamatis Zampetakis created HIVE-29339:
------------------------------------------

             Summary: Remove DISTINCT indicator from SqlAggFunctions
                 Key: HIVE-29339
                 URL: https://issues.apache.org/jira/browse/HIVE-29339
             Project: Hive
          Issue Type: Task
          Components: CBO
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


The 
[CanAggregateDistinct|https://github.com/apache/hive/blob/d9ec04156d84bedbaa9f8dc40c27dbb88a3b9f49/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/CanAggregateDistinct.java]
 interface provides an extra indicator to aggregate functions allowing them to 
indicate if they use DISTINCT or not. 

However, this indicator is redundant at the operator level because the 
information is already present in the query plan (AggregateCall/RexOver). 

Having the indicator in multiple places is also problematic because the 
information in the call and in the operator may become misaligned, which may 
lead to bugs depending on which field the rules/planner check.
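
To illustrate the misalignment, here is a minimal, self-contained sketch. It 
does not use the Calcite or Hive classes; AggFunction, AggCall and the two 
check methods are hypothetical stand-ins for an operator that carries its own 
DISTINCT flag and for a call in the plan that records DISTINCT as well.

```java
public class DistinctIndicatorSketch {

    /** Hypothetical stand-in for an aggregate operator with its own DISTINCT flag. */
    public static final class AggFunction {
        final String name;
        final boolean distinct;

        public AggFunction(String name, boolean distinct) {
            this.name = name;
            this.distinct = distinct;
        }
    }

    /** Hypothetical stand-in for a call in the plan, which also records DISTINCT. */
    public static final class AggCall {
        final AggFunction function;
        final boolean distinct;

        public AggCall(AggFunction function, boolean distinct) {
            this.function = function;
            this.distinct = distinct;
        }
    }

    /** A rule that trusts the flag stored on the call. */
    public static boolean isDistinctPerCall(AggCall call) {
        return call.distinct;
    }

    /** A rule that trusts the flag stored on the operator. */
    public static boolean isDistinctPerFunction(AggCall call) {
        return call.function.distinct;
    }

    /** True when the two copies of the flag disagree. */
    public static boolean isMisaligned(AggCall call) {
        return isDistinctPerCall(call) != isDistinctPerFunction(call);
    }

    public static void main(String[] args) {
        // The operator says "not DISTINCT" while the call says "DISTINCT".
        AggFunction count = new AggFunction("count", false);
        AggCall call = new AggCall(count, true);

        System.out.println("per call:     " + isDistinctPerCall(call));
        System.out.println("per function: " + isDistinctPerFunction(call));
        System.out.println("misaligned:   " + isMisaligned(call));
    }
}
```

In this sketch the two checks give different answers for the same call, so 
the outcome depends on which field a rule happens to read. Keeping the flag 
only on the call removes that possibility by construction.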

Finally, the presence of the DISTINCT indicator at the operator level 
means that for each aggregate function there are two operators (one 
that supports DISTINCT and one that doesn't), so we effectively double the 
available operators.

Removing the DISTINCT indicator from all aggregate functions leads to simpler 
and more generic code, increases code coverage, and facilitates maintenance 
since it removes Hive-specific interfaces.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
