[ 
https://issues.apache.org/jira/browse/FLINK-40039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramin Gharib updated FLINK-40039:
---------------------------------
    Summary: CTAS/RTAS do not work for PTFs with set semantics  (was: CTAS/RTAS 
do not work for PTFs with set semantics and PARTITION BY )

> CTAS/RTAS do not work for PTFs with set semantics
> -------------------------------------------------
>
>                 Key: FLINK-40039
>                 URL: https://issues.apache.org/jira/browse/FLINK-40039
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / API, Table SQL / Planner
>            Reporter: Ramin Gharib
>            Priority: Major
>
> {{CREATE TABLE ... AS SELECT}} and REPLACE TABLE ... AS over a set-semantic 
> process table function (a PTF whose table argument uses PARTITION BY) crashed 
> during planning with IllegalStateException: must call validate first.
> h3. Steps to re-produce
> {code:java}
> CREATE TABLE sink_tbl 
> WITH ('connector' = 'values') 
> AS 
>   SELECT * FROM f(r => TABLE t PARTITION BY name, i => 1); {code}
> f beinig a PTF with set-semantics. This will ouput:
> {code:java}
> java.lang.IllegalStateException: must call validate first  
> at IdentifierNamespace.resolve(...)
> at SqlToRelConverter.substituteSubQueryOfSetSemanticsInputTable(...)  
> ...  
> at SqlCreateTableAsConverter.convertSqlNode(...) {code}
> h3. Root Cause
> Planning a query is a pipeline: parse → validate → convert (sql-to-rel). 
> During validate, Calcite attaches a "namespace" to each table. During 
> convert, the engine needs the namespace. 
> CTAS and RTAS validated and convert the AS-query {*}twice{*}:
>  # once to figure out the new table's columns, and
>  # again inside a helper ({{{}MergeTableAsUtil.maybeRewriteQuery{}}}) that 
> *rebuilt the query as new SQL and re-converted it* to reconcile columns with 
> the sink.
> On that second pass the set-semantic input lost its namespace, so sql-to-rel 
> hit an un-namespaced table and threw {{{}must call validate first{}}}.
> Plain {{INSERT and MT-AS}} validates once, so it was always fine.
> h3. The Fix
> {{maybeRewriteQuery}} reconciles the query's columns with the target table 
> (reorder columns, add NULL for columns the query doesn't produce). It used to 
> do this by rewriting the SQL text and re-validating:
> {code:java}
> // BEFORE: build a new SQL query and validate + convert it again  <-- second 
> validation
> SqlCall newSelect = rewriteCall(origQueryNode, ...);
> return convert(newSelect); {code}
> Now it reshapes the columns as a relational projection bolted on top of the 
> already-converted plan — no second validation:
> {code:java}
> // AFTER: add one projection over the plan we already built
> // existing column -> RexInputRef;  missing column -> NULL literal
> RelNode projected = relBuilder.push(queryRelNode).project(projects, 
> fieldNames, true).build();
> return new PlannerQueryOperation(projected, ...); {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to