*Hi Apache IoTDB Community,* I would like to propose a complete fix for the following issue:
https://github.com/apache/iotdb/issues/17797 This issue aims to support reuse of SELECT output aliases in IoTDB table model SQL. The goal is to support the following three cases in one integrated implementation: 1. GROUP BY <select_alias> 2. ORDER BY <select_alias> 3. Later SELECT items referencing earlier SELECT aliases, also known as lateral column alias references This change does not require parser changes. The current SQL grammar can already parse these statements. The required work is in analyzer-stage name resolution, expression rewriting, Analysis recording, and planner consistency. Functional Definition The intended syntax is: select_item := expression [AS] alias group_by_item := expression | ordinal_position | select_alias order_by_item := expression [ASC | DESC] [NULLS FIRST | NULLS LAST] | ordinal_position | select_alias lateral_column_alias_reference := unqualified_identifier Here, expression means any expression currently supported by the table model, including column references, scalar functions, arithmetic expressions, comparisons, aggregate functions, window functions, date_bin(...), and date_bin_gapfill(...). alias means an explicit SELECT item output alias, for example: s1 + 1 AS x AVG(s1) AS avg_s1 Only explicit SingleColumn aliases should be included in SELECT alias metadata. SELECT *, columns(*), and implicit output names should not become lateral column aliases. select_alias means an output alias defined in the current SELECT list. It is visible only in GROUP BY, ORDER BY, and later SELECT items. It is not visible in WHERE or HAVING. ordinal_position means the existing positional reference behavior, such as GROUP BY 1 and ORDER BY 2. This behavior must remain unchanged. unqualified_identifier means an identifier such as x or avg_s1. Qualified names such as t.x, table1.x, and db.table1.x must not be treated as lateral column alias references. Semantic Rules GROUP BY alias should be expanded to the corresponding SELECT expression and then validated by the existing GROUP BY rules. The expanded expression must be comparable and must not contain aggregate functions, window functions, or grouping functions. For example: SELECT date_bin(1h, time) AS hour_time, AVG(s1) AS avg_s1 FROM table1 GROUP BY hour_time ORDER BY hour_time; Here, GROUP BY hour_time is equivalent to GROUP BY date_bin(1h, time). The following query must fail: SELECT AVG(s1) AS avg_s1 FROM table1 GROUP BY avg_s1; It is equivalent to GROUP BY AVG(s1), which is invalid because an aggregate expression cannot be used as a grouping key. ORDER BY alias should resolve to the output column. The output type must be orderable, and the existing sort direction and null ordering behavior should remain unchanged. For example: SELECT s1 + 1 AS x FROM table1 ORDER BY x DESC NULLS LAST; Lateral column alias references should allow later SELECT items to reference aliases defined earlier in the same SELECT list. For example: SELECT s1 + 1 AS x, x * 2 AS y FROM table1; SELECT AVG(s1) AS avg_s1, avg_s1 + 10 AS shifted_avg FROM table1; Chained lateral column aliases should also be supported: SELECT s1 + 1 AS x, x * 2 AS y, y + 3 AS z FROM table1; This should be analyzed as if y were expanded from the already rewritten expression for x. Name Resolution Rules In ORDER BY, if an unqualified name matches both a SELECT alias and an input column, the SELECT alias takes precedence. In GROUP BY, if an unqualified name matches both a SELECT alias and an input column, the input column takes precedence. In lateral column alias resolution, if an unqualified name matches both an input column and a previous SELECT alias, the input column takes precedence. Lateral column aliases may only reference aliases defined earlier in the SELECT list. They cannot reference themselves or aliases defined later. For example, the following query must fail because x is referenced before it is defined: SELECT x + 1 AS y, s1 AS x FROM table1; If multiple previous aliases have the same name, referencing that name should produce an ambiguity error unless an input column with the same name takes precedence. For example: SELECT s1 AS x, s2 AS x, x + 1 FROM table1; If the input table does not contain a column named x, the third x should be reported as ambiguous. Similarly: SELECT s1 AS x, s2 AS x FROM table1 ORDER BY x; ORDER BY x should be reported as ambiguous. Subqueries must have independent SELECT alias scopes. Outer SELECT aliases must not be visible inside subqueries, and aliases defined inside a subquery must not affect the outer query scope. Unsupported Cases The following cases should remain unsupported: SELECT s1 AS x FROM table1 WHERE x > 1; SELECT AVG(s1) AS avg_s1 FROM table1 HAVING avg_s1 > 1; WHERE and HAVING should continue to use the existing name resolution rules and should not resolve SELECT aliases. Implementation Plan The implementation should add SELECT alias metadata in StatementAnalyzer. For each explicit SELECT alias, the analyzer should record: canonical alias name rewritten expression SELECT item position The canonical alias name should use Identifier.getCanonicalValue() to preserve existing quoted and unquoted identifier behavior. analyzeSelect should process SELECT items in order. For each SingleColumn, it should first rewrite the expression using previously defined aliases, then analyze the rewritten expression, and then record the rewritten expression in both outputExpressions and Analysis.SelectExpression. After the current SELECT item is analyzed, if it has an explicit alias, that alias should be registered with the rewritten expression. Lateral column alias rewriting should be implemented with the existing ExpressionTreeRewriter. The rewriter should replace only unqualified Identifier nodes. It should first try to resolve the identifier as an input column in the source scope. If the input column exists, no alias replacement should occur. If the input column does not exist, the rewriter should look up previous SELECT aliases. A single match should be replaced by the corresponding rewritten expression. Multiple matches should produce an ambiguity error. No match should leave the identifier unchanged so that the normal analyzer can report unresolved columns. The rewriter must not traverse into SubqueryExpression. GROUP BY analysis should be updated so that a simple Identifier first resolves against the source scope. If it resolves as an input column, the existing behavior is preserved. If it does not resolve as an input column, it should be resolved against the SELECT alias metadata. A unique alias match should be replaced with the alias’s rewritten expression before existing GROUP BY validation. After alias expansion, GROUP BY should continue to use the existing checks: no aggregate functions no window functions no grouping functions expression analysis comparable type validation gap-fill grouping key detection GroupingSetAnalysis.originalExpressions, complexExpressions, grouping expressions, and gap-fill grouping keys should all store the expanded expression, not the original alias Identifier. ORDER BY should keep output alias precedence. The current order-by scope already exposes output fields before source fields, but the implementation should ensure that duplicate aliases, input-column conflicts, aggregation queries, and ordinal references are handled correctly. analysis.setOrderByExpressions should record expressions that the planner can translate reliably. computeAndAssignOutputScope must use rewritten SELECT expressions rather than the original SingleColumn.getExpression(). Output field type, source-column analysis, and origin-column inference must be based on the same rewritten expression stored in Analysis.SelectExpression. This avoids inconsistencies where an expression has been analyzed after rewriting but the output scope still refers to the original AST. The planner should not introduce new alias semantics. It should consume the already rewritten expressions from: analysis.getSelectExpressions() analysis.getOrderByExpressions() analysis.getGroupingSets() This keeps projection, aggregation, sorting, gap-fill, distinct, and window validation consistent with the analyzer result. Validation Plan The fix should include analyzer tests and table-model integration tests for the following cases: GROUP BY alias ORDER BY alias lateral column alias with scalar expressions lateral column alias with aggregate aliases chained lateral column aliases duplicate alias ambiguity input column precedence over alias in GROUP BY output alias precedence in ORDER BY input column precedence over alias in LCA qualified names not rewritten as aliases subquery alias scope isolation WHERE alias rejected HAVING alias rejected GROUP BY ordinal unchanged ORDER BY ordinal unchanged full expression GROUP BY unchanged full expression ORDER BY unchanged date_bin behavior unchanged date_bin_gapfill behavior unchanged The implementation should be verified with: mvn spotless:apply -pl iotdb-core/datanode mvn compile -pl iotdb-core/datanode mvn test -pl iotdb-core/datanode -Dtest=<new-or-modified-test-class> Expected Result After this fix, IoTDB table model SQL should fully support SELECT alias reuse in GROUP BY, ORDER BY, and later SELECT items. The implementation should preserve existing ordinal behavior, existing expression semantics, and existing query scope boundaries. *Best regards,* *Bryan Yang(杨易达)*
