[ 
https://issues.apache.org/jira/browse/CALCITE-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090175#comment-18090175
 ] 

Stamatis Zampetakis commented on CALCITE-7612:
----------------------------------------------

This seems like a duplicate of CALCITE-6461. If that's the case, this issue
should be closed and the discussion should continue under CALCITE-6461.

> Track whether a column origin is derived from an aggregate
> ----------------------------------------------------------
>
>                 Key: CALCITE-7612
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7612
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.38.0
>            Reporter: yibo wen
>            Priority: Major
>
> Description:
>   RelColumnOrigin currently exposes whether an output column is derived from
>   an origin column via isDerived(), but it does not distinguish ordinary
>   expression derivation from aggregate derivation.
>
>   For example:
>     SELECT a + b AS c FROM t
>   and
>     SELECT SUM(a) AS s FROM t
>   both produce derived column origins, but downstream lineage or
>   impact-analysis tools may need to distinguish whether the output column
>   was derived by an aggregate call.
>
>   Expected behavior:
>   Column-origin metadata should be able to tell whether an origin is derived
>   from an aggregate expression.
>
>   Motivation:
>   For column lineage, data governance, and impact analysis, aggregate-derived
>   columns often need to be handled differently from ordinary
>   expression-derived columns. For example, SUM(a), COUNT(a), AVG(a) and a + b
>   all depend on source columns, but their semantic lineage is different.
>
>   Possible design direction:
>   Extend RelColumnOrigin or related metadata to expose aggregate-derived
>   information. This may require discussion because RelColumnOrigin is part of
>   Calcite's public metadata API.
>
>   Open questions:
>   Should aggregate derivation be represented as a new boolean flag, a
>   derivation kind enum, or a separate metadata API?
>   Should this information affect equals/hashCode semantics of RelColumnOrigin?
>   How should aggregate calls with zero arguments, such as COUNT(*), be
>   represented?
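
To make the "derivation kind enum" option from the open questions concrete,
here is a minimal standalone sketch. It does not use or modify Calcite:
DerivationKind, ColumnOriginSketch, getKind() and isDerivedFromAggregate() are
hypothetical names invented for illustration, and the table is reduced to a
plain string. Only isDerived() corresponds to an existing RelColumnOrigin
method. Including the kind in equals/hashCode is one possible answer to the
second open question, not a recommendation.

```java
import java.util.Objects;

// Hypothetical sketch: none of these types exist in Calcite.
enum DerivationKind {
  NONE,        // output column is a direct copy of the origin column
  EXPRESSION,  // derived by an ordinary expression, e.g. a + b
  AGGREGATE    // derived by an aggregate call, e.g. SUM(a)
}

// Stand-in for RelColumnOrigin; the table is a plain name to keep this self-contained.
final class ColumnOriginSketch {
  private final String table;
  private final int columnOrdinal;
  private final DerivationKind kind;

  ColumnOriginSketch(String table, int columnOrdinal, DerivationKind kind) {
    this.table = Objects.requireNonNull(table);
    this.columnOrdinal = columnOrdinal;
    this.kind = Objects.requireNonNull(kind);
  }

  // Keeps the existing boolean view: anything but a direct copy counts as derived.
  boolean isDerived() {
    return kind != DerivationKind.NONE;
  }

  // Assumed convenience accessor for lineage tools.
  boolean isDerivedFromAggregate() {
    return kind == DerivationKind.AGGREGATE;
  }

  DerivationKind getKind() {
    return kind;
  }

  // Assumption: the kind participates in equality, so the same source column
  // reached via SUM(a) and via a + b yields two distinct origins.
  @Override public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof ColumnOriginSketch)) {
      return false;
    }
    ColumnOriginSketch that = (ColumnOriginSketch) o;
    return columnOrdinal == that.columnOrdinal
        && table.equals(that.table)
        && kind == that.kind;
  }

  @Override public int hashCode() {
    return Objects.hash(table, columnOrdinal, kind);
  }
}
```

With this shape, the origin of column a in "SELECT a + b AS c FROM t" and in
"SELECT SUM(a) AS s FROM t" both report isDerived() as true, and only the
second reports isDerivedFromAggregate() as true. COUNT(*) references no input
column, so this per-origin sketch has nothing to attach a kind to; that
question stays open.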



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
