[
https://issues.apache.org/jira/browse/FLINK-39558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079259#comment-18079259
]
Sergey Nuyanzin commented on FLINK-39558:
-----------------------------------------
Merged as
[0d0a8977d561b628bcf1dc7aa84ca9b4c8e58a61|https://github.com/apache/flink/commit/0d0a8977d561b628bcf1dc7aa84ca9b4c8e58a61]
> LogicalUnnestRule produces incorrect TableFunctionScan rowType, masking
> nullability and breaking LEFT JOIN UNNEST
> -----------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-39558
> URL: https://issues.apache.org/jira/browse/FLINK-39558
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 2.0.1, 2.1.1
> Reporter: Jim Hughes
> Assignee: Jim Hughes
> Priority: Major
> Labels: pull-request-available
>
> h2. Problem
> {{LogicalUnnestRule}} converts {{Correlate(Uncollect)}} into
> {{Correlate(LogicalTableFunctionScan)}}. When deriving the TFS rowType, the
> rule round-trips through Flink's logical type system:
> Calcite RelDataType
>   -> Flink LogicalType (via FlinkTypeFactory.toLogicalType)
>   -> UnnestRowsFunctionBase.getUnnestedType(...)
>   -> RowType wrapper
>   -> Calcite RelDataType (via createFieldTypeFromLogicalType)
> This round-trip drops per-field nullability that Calcite's upstream Correlate
> has derived, and synthesizes field names ({{f0}}, {{EXPR$0}}) instead of
> using Calcite's source-derived names. The most visible consequence is the
> {{LEFT JOIN UNNEST}} crash documented in FLINK-33217, but the round-trip
> breaks more plan shapes than that fix covers.
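To make the loss concrete, here is a minimal toy model of the round-trip described above. It is a sketch only: the `Field` class and both helper functions are hypothetical stand-ins, not Flink or Calcite APIs. It assumes that only each element's type and nullability survive the conversion, and that the new row wrapper assigns default names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type_name: str
    nullable: bool


def rewrap_element_types(element_types):
    """Toy stand-in for the round-trip: only (type, nullable) of each element
    survives, and fields are renamed f0, f1, ... in a fresh row wrapper."""
    return [
        Field(f"f{i}", type_name, nullable)
        for i, (type_name, nullable) in enumerate(element_types)
    ]


def describe_drift(derived, rebuilt):
    """List what changed, field by field, between two row types."""
    changes = []
    for old, new in zip(derived, rebuilt):
        if old.name != new.name:
            changes.append(f"name: {old.name} -> {new.name}")
        if old.nullable != new.nullable:
            changes.append(f"nullable: {old.nullable} -> {new.nullable}")
    return changes


# Hypothetical row type as derived on the Calcite side (nullable column).
calcite_derived = [Field("business_data0", "VARCHAR(2147483647)", True)]
# What the toy round-trip rebuilds from the element type STRING NOT NULL.
round_tripped = rewrap_element_types([("VARCHAR(2147483647)", False)])

print(describe_drift(calcite_derived, round_tripped))
```

In this model both the name and the nullability drift, which mirrors the two differences the description attributes to the round-trip.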
> h3. Reproducer
> For a table with a nullable array of NOT-NULL elements:
> {code:sql}
> CREATE TABLE nested_not_null (
>   business_data ARRAY<STRING NOT NULL>,
>   nested ROW<`data` ARRAY<STRING NOT NULL>>,
>   nested_array ARRAY<ROW<`data` ARRAY<STRING NOT NULL>> NOT NULL>
> );
>
> -- (a) Bare Uncollect under LEFT correlate
> SELECT * FROM nested_not_null
>   LEFT JOIN UNNEST(nested_not_null.business_data) AS exploded_bd ON TRUE;
>
> -- (b) ON-predicate adds a Filter between Correlate and Uncollect
> SELECT * FROM nested_not_null
>   LEFT JOIN UNNEST(nested_not_null.business_data) AS exploded_bd
>   ON exploded_bd <> 'debug';
> {code}
> Both queries fail compilation with:
> {noformat}
> java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(... VARCHAR(2147483647) business_data0) NOT NULL
> expression type is RecordType(... VARCHAR(2147483647) NOT NULL f0) NOT NULL
> Difference:
> business_data0: VARCHAR(2147483647) -> VARCHAR(2147483647) NOT NULL
> {noformat}
> h2. Root Cause
> The TFS rowType produced by the rule does not match the rowType Calcite's
> Correlate derives for the original {{Correlate(Uncollect)}} tree.
> Specifically:
> * Calcite's {{Uncollect.deriveUncollectRowType}} produces field names derived
> from the source array column (e.g. {{business_data0}}) and propagates
> container nullability.
> * Flink's {{UnnestRowsFunctionBase.getUnnestedType}} returns just the element
> type, which is then wrapped into a synthetic {{RowType.of(...)}} with default
> field names ({{f0}}). The container's nullability is dropped.
> When {{LogicalUnnestRule}} replaces the {{Uncollect}} with a
> {{LogicalTableFunctionScan}} carrying this synthesized rowType, the new
> {{Correlate}}'s derived rowType diverges from the original.
> {{RelOptUtil.verifyTypeEquivalence}} then fails.
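The failing check can be approximated with a small positional comparison. This is a sketch, not Calcite's implementation: `verify_type_equivalence` and `render` are hypothetical names, and the assumption that field names are ignored while type and nullability must match is inferred only from the error text above, which reports a nullability difference and no name difference.

```python
def render(type_name, nullable):
    """Format a type the way the error message above does."""
    return type_name if nullable else f"{type_name} NOT NULL"


def verify_type_equivalence(set_type, expression_type):
    """Compare two row types position by position.

    Each row type is a list of (name, type_name, nullable) tuples. Names are
    ignored (an assumption of this sketch); type and nullability must match.
    """
    if len(set_type) != len(expression_type):
        raise AssertionError("Cannot add expression of different type to set: "
                             "field count differs")
    differences = []
    for (name, t1, n1), (_, t2, n2) in zip(set_type, expression_type):
        if (t1, n1) != (t2, n2):
            differences.append(f"{name}: {render(t1, n1)} -> {render(t2, n2)}")
    if differences:
        raise AssertionError(
            "Cannot add expression of different type to set:\nDifference:\n"
            + "\n".join(differences)
        )


original = [("business_data0", "VARCHAR(2147483647)", True)]
rewritten = [("f0", "VARCHAR(2147483647)", False)]

message = None
try:
    verify_type_equivalence(original, rewritten)
except AssertionError as err:
    message = str(err)
    print(message)

# Identical row types pass the same check without raising.
verify_type_equivalence(original, list(original))
```

The difference line this toy check prints has the same shape as the one in the stack trace, while a replacement node that carries an identical row type goes through untouched.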
> h3. Why the previous fix is incomplete
> FLINK-33217 added {{getLogicalProjectWithAdjustedNullability}}, which inserts
> {{CAST}}-to-nullable wrappers on each projection in the LEFT-correlate
> Project branch. This patches the symptom but leaves the underlying round-trip
> in place, so it covers only one of the four shapes the rule matches:
> || Shape || Pre-fix behavior ||
> | {{Correlate(LEFT) -> Project(Uncollect)}} | Patched by FLINK-33217 |
> | {{Correlate(LEFT) -> Uncollect}} | *Crashes*: no Project to attach CASTs to |
> | {{Correlate(LEFT) -> Filter(Uncollect)}} | *Crashes*: same reason |
> | {{Correlate(LEFT) -> Filter(Project(Uncollect))}} | Coincidentally fine on master, but documented as a gap in an earlier investigation |
> h2. Fix
> Remove the LogicalType round-trip and use Calcite's
> {{Uncollect.getRowType()}} directly as the TFS rowType. The new
> {{Correlate}}'s rowType is then identical to the original, regardless of
> which shape the rule matched.
> This also makes the {{getLogicalProjectWithAdjustedNullability}} helper dead
> code, so it and the imports only it used can be removed.
> h3. Field naming
> Calcite's Uncollect derives names from the source array, which means plan
> output for UNNEST changes:
> * {{ARRAY<T>}} / {{MULTISET<T>}}: the unnested column is named after the
> source array column (e.g., {{tags0}} instead of synthetic {{f0}} or
> {{EXPR$0}}).
> * {{MAP<K,V>}}: key/value columns are named {{KEY}} and {{VALUE}} instead of
> {{f0}}/{{f1}}.
> * {{WITH ORDINALITY}}: the ordinality column is {{ORDINALITY}} (unchanged).
> Multiple unnests in the same query are auto-disambiguated by Calcite's outer
> Correlate (e.g. two MAP unnests produce {{KEY, VALUE, KEY0, VALUE0}}).
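The disambiguation described above can be sketched as a suffix-based uniquifier. This is a simplified model, not Calcite's code: `uniquify` here is a hypothetical helper, and it assumes that a duplicate name receives the first free integer suffix, counting from 0.

```python
def uniquify(names):
    """Return names in order, with each duplicate given the first free
    integer suffix (0, 1, 2, ...) appended to its original name."""
    used = set()
    result = []
    for name in names:
        candidate = name
        suffix = 0
        while candidate in used:
            candidate = f"{name}{suffix}"
            suffix += 1
        used.add(candidate)
        result.append(candidate)
    return result


# Two MAP unnests side by side, as in the example above.
print(uniquify(["KEY", "VALUE", "KEY", "VALUE"]))
# ['KEY', 'VALUE', 'KEY0', 'VALUE0']

# Table columns followed by an unnested column that reuses a column name.
print(uniquify(["business_data", "nested", "nested_array", "business_data"]))
# ['business_data', 'nested', 'nested_array', 'business_data0']
```

Under this model the second case yields `business_data0`, which is at least consistent with the field name shown in the error message of the reproducer.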
> The {{INTERNAL_UNNEST_ROWS}} runtime function is positional, so persisted
> CompiledPlans continue to restore correctly; this is verified by
> {{CorrelateRestoreTest}}.
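Why a positional function is insensitive to the rename can be illustrated with a toy example. It does not model CompiledPlan serialization at all: `attach_names` is a hypothetical helper, and the only assumption is that the function emits values by position, with names attached afterwards by the plan.

```python
def attach_names(values, field_names):
    """Pair positionally emitted values with the field names of a plan."""
    return dict(zip(field_names, values))


# Values emitted by position; no field names travel with them.
emitted = ("flink", 1)

# Same values labeled with the old synthetic name and with the new name.
old_plan_row = attach_names(emitted, ["f0", "ORDINALITY"])
new_plan_row = attach_names(emitted, ["tags0", "ORDINALITY"])

# Only the labels differ; the values at each position are unchanged.
print(list(old_plan_row.values()) == list(new_plan_row.values()))
```

The labels change between the two rows, but the value at each position does not, which is the property the restore path relies on according to the description.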
> h2. Plan-fixture impact
> Recorded {{verifyRelPlan}} fixtures change uniformly to reflect the new
> naming:
> * {{flink-table/flink-table-planner/src/test/resources/.../UnnestTest.xml}} (batch + stream)
> * {{flink-table/flink-table-planner/src/test/resources/.../LogicalUnnestRuleTest.xml}}
> * {{flink-table/flink-table-planner/src/test/resources/.../MultiJoinTest.xml}} (one entry)
> * {{flink-table/flink-table-planner/src/test/resources/.../JavaCatalogTableTest.xml}} (one entry)
> h2. Test coverage
> Two reproducers added to {{UnnestTestBase}} (commit 1, intentionally failing
> without the fix):
> * {{testNullMismatchLeftJoinNoAliasList}}: bare {{Uncollect}} under LEFT
> correlate
> * {{testNullMismatchLeftJoinOnPredicate}}: {{Filter(Uncollect)}} under LEFT
> correlate
> Generated-by: Claude Code