Stamatis Zampetakis created HIVE-29339:
------------------------------------------
Summary: Remove DISTINCT indicator from SqlAggFunctions
Key: HIVE-29339
URL: https://issues.apache.org/jira/browse/HIVE-29339
Project: Hive
Issue Type: Task
Components: CBO
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
The
[CanAggregateDistinct|https://github.com/apache/hive/blob/d9ec04156d84bedbaa9f8dc40c27dbb88a3b9f49/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/CanAggregateDistinct.java]
interface adds an extra indicator to aggregate functions, allowing them to
signal whether they use DISTINCT.
However, this indicator is redundant at the operator level because the
information is already present in the query plan (AggregateCall/RexOver).
Having the indicator in multiple places is also problematic because the
information in the call and in the operator may become misaligned, which may
lead to bugs depending on which of the two fields the rules/planner check.
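To illustrate the misalignment risk, here is a minimal, self-contained sketch. It does not use the real Hive or Calcite classes: AggFunction is a hypothetical stand-in for an aggregate operator carrying its own DISTINCT flag (loosely modeled on CanAggregateDistinct, whose exact shape is assumed here), and AggCall is a hypothetical stand-in for a call-level object such as Calcite's AggregateCall. Two hypothetical rules consult different fields and disagree on the same call.

```java
// Hypothetical, simplified stand-ins; NOT the real Hive/Calcite classes.
final class DistinctIndicatorSketch {

    // Operator that carries its own DISTINCT flag (assumed shape,
    // in the spirit of CanAggregateDistinct).
    record AggFunction(String name, boolean distinct) {}

    // Call site that also records whether DISTINCT applies
    // (in the spirit of AggregateCall/RexOver).
    record AggCall(AggFunction function, boolean distinct) {}

    // Hypothetical rule that trusts the operator-level flag.
    static boolean ruleA(AggCall call) {
        return call.function().distinct();
    }

    // Hypothetical rule that trusts the call-level flag.
    static boolean ruleB(AggCall call) {
        return call.distinct();
    }

    public static void main(String[] args) {
        // Operator says DISTINCT, call says not DISTINCT: the two fields are misaligned.
        AggFunction countDistinctOp = new AggFunction("count", true);
        AggCall misaligned = new AggCall(countDistinctOp, false);
        // Prints "true vs false": the rules reach different conclusions.
        System.out.println(ruleA(misaligned) + " vs " + ruleB(misaligned));
    }
}
```

If the flag lived only on the call, both rules would have to read the same field, so this kind of disagreement could not arise.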
Finally, the presence of the DISTINCT indicator at the operator level means
that each aggregate function has two operators (one that supports DISTINCT and
one that doesn't), which doubles the number of available operators.
Removing the DISTINCT indicator from all aggregate functions leads to simpler
and more generic code, increases code coverage, and facilitates maintenance
since it removes Hive-specific interfaces.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)