[ https://issues.apache.org/jira/browse/CALCITE-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090175#comment-18090175 ]
Stamatis Zampetakis commented on CALCITE-7612:
----------------------------------------------
This looks like a duplicate of CALCITE-6461. If so, this issue should be
closed and the discussion should continue under CALCITE-6461.
> Track whether a column origin is derived from an aggregate
> ----------------------------------------------------------
>
> Key: CALCITE-7612
> URL: https://issues.apache.org/jira/browse/CALCITE-7612
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.38.0
> Reporter: yibo wen
> Priority: Major
>
> Description:
> RelColumnOrigin currently exposes whether an output column is derived from
> an origin column via isDerived(), but it does not distinguish ordinary
> expression derivation from aggregate derivation.
> For example:
> SELECT a + b AS c FROM t
> and
> SELECT SUM(a) AS s FROM t
> both produce derived column origins, but downstream lineage or
> impact-analysis tools may need to distinguish whether the output column was
> derived by an aggregate call.
> Expected behavior:
> Column-origin metadata should be able to tell whether an origin is derived
> from an aggregate expression.
> Motivation:
> For column lineage, data governance, and impact analysis, aggregate-derived
> columns often need to be handled differently from ordinary expression-derived
> columns. For example, SUM(a), COUNT(a), AVG(a) and a + b all depend on source
> columns, but their semantic lineage is different.
> Possible design direction:
> Extend RelColumnOrigin or related metadata to expose aggregate-derived
> information. This may require discussion because RelColumnOrigin is part of
> Calcite's public metadata API.
> Open questions:
> Should aggregate derivation be represented as a new boolean flag, a
> derivation kind enum, or a separate metadata API?
> Should this information affect equals/hashCode semantics of RelColumnOrigin?
> How should aggregate calls with zero arguments, such as COUNT(*), be
> represented?
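One way to picture the "derivation kind enum" option from the open questions above is the minimal sketch below. Everything in it is hypothetical: DerivationKind, ColumnOrigin, and isAggregateDerived() are illustrative names that do not exist in Calcite, and the table is reduced to a plain string rather than a RelOptTable. The sketch assumes the kind takes part in equals/hashCode, which is only one possible answer to the second question, and it leaves COUNT(*) open because a zero-argument call has no source column to attach an origin to.

```java
import java.util.Objects;

/** Hypothetical sketch only; none of these types exist in Calcite. */
public class ColumnOriginSketch {

  /** Hypothetical derivation kinds; unlike a boolean flag, more kinds can be added later. */
  enum DerivationKind { NONE, EXPRESSION, AGGREGATE }

  /** Simplified stand-in for RelColumnOrigin, using a table name instead of a RelOptTable. */
  static final class ColumnOrigin {
    final String table;
    final int columnOrdinal;
    final DerivationKind kind;

    ColumnOrigin(String table, int columnOrdinal, DerivationKind kind) {
      this.table = Objects.requireNonNull(table);
      this.columnOrdinal = columnOrdinal;
      this.kind = Objects.requireNonNull(kind);
    }

    /** Keeps the meaning of the existing isDerived(): true for any kind other than NONE. */
    boolean isDerived() {
      return kind != DerivationKind.NONE;
    }

    /** Hypothetical accessor answering the question raised in this issue. */
    boolean isAggregateDerived() {
      return kind == DerivationKind.AGGREGATE;
    }

    // Assumption: the kind participates in equality, so the same source column
    // reached through SUM(a) and through a + b yields two distinct origins.
    @Override public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof ColumnOrigin)) {
        return false;
      }
      ColumnOrigin that = (ColumnOrigin) o;
      return table.equals(that.table)
          && columnOrdinal == that.columnOrdinal
          && kind == that.kind;
    }

    @Override public int hashCode() {
      return Objects.hash(table, columnOrdinal, kind);
    }
  }

  public static void main(String[] args) {
    // Origin of column a (ordinal 0 of t) in SELECT a + b AS c FROM t
    ColumnOrigin viaPlus = new ColumnOrigin("t", 0, DerivationKind.EXPRESSION);
    // Origin of column a (ordinal 0 of t) in SELECT SUM(a) AS s FROM t
    ColumnOrigin viaSum = new ColumnOrigin("t", 0, DerivationKind.AGGREGATE);

    System.out.println(viaPlus.isDerived() && viaSum.isDerived());
    System.out.println(viaSum.isAggregateDerived());
    System.out.println(viaPlus.equals(viaSum));
  }
}
```

Under these assumptions both origins still report isDerived() as true, so existing callers would see no change, while lineage tools could additionally ask isAggregateDerived(). If the kind were excluded from equals/hashCode instead, the two origins would collapse into one entry in a Set, which is the trade-off the second open question points at.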