Jim Hughes created FLINK-39558:
----------------------------------
Summary: LogicalUnnestRule produces incorrect TableFunctionScan
rowType, masking nullability and breaking LEFT JOIN UNNEST
Key: FLINK-39558
URL: https://issues.apache.org/jira/browse/FLINK-39558
Project: Flink
Issue Type: Bug
Components: Table SQL / Planner
Affects Versions: 2.1.1, 2.0.1
Reporter: Jim Hughes
h2. Problem
{{LogicalUnnestRule}} converts {{Correlate(Uncollect)}} into
{{Correlate(LogicalTableFunctionScan)}}. When deriving the TFS rowType, the
rule round-trips through Flink's logical type system:
Calcite RelDataType
-> Flink LogicalType (via FlinkTypeFactory.toLogicalType)
-> UnnestRowsFunctionBase.getUnnestedType(...)
-> RowType wrapper
-> Calcite RelDataType (via createFieldTypeFromLogicalType)
This round-trip drops per-field nullability that Calcite's upstream Correlate
has derived, and synthesizes field names ({{f0}}, {{EXPR$0}}) instead of using
Calcite's source-derived names. The most visible consequence is the {{LEFT JOIN
UNNEST}} crash documented in FLINK-33217, but the round-trip is broader than
that fix addresses.
h3. Reproducer
For a table with a nullable array of NOT-NULL elements:
{code:sql}
CREATE TABLE nested_not_null (
business_data ARRAY<STRING NOT NULL>,
nested ROW<`data` ARRAY<STRING NOT NULL>>,
nested_array ARRAY<ROW<`data` ARRAY<STRING NOT NULL>> NOT NULL>
);
-- (a) Bare Uncollect under LEFT correlate
SELECT * FROM nested_not_null
LEFT JOIN UNNEST(nested_not_null.business_data) AS exploded_bd ON TRUE;
-- (b) ON-predicate adds a Filter between Correlate and Uncollect
SELECT * FROM nested_not_null
LEFT JOIN UNNEST(nested_not_null.business_data) AS exploded_bd
ON exploded_bd <> 'debug';
{code}
Both queries fail compilation with:
{noformat}
java.lang.AssertionError: Cannot add expression of different type to set:
set type is RecordType(... VARCHAR(2147483647) business_data0) NOT NULL
expression type is RecordType(... VARCHAR(2147483647) NOT NULL f0) NOT NULL
Difference:
business_data0: VARCHAR(2147483647) -> VARCHAR(2147483647) NOT NULL
{noformat}
h2. Root Cause
The TFS rowType produced by the rule does not match the rowType Calcite's
Correlate derives for the original {{Correlate(Uncollect)}} tree. Specifically:
* Calcite's {{Uncollect.deriveUncollectRowType}} produces field names derived
from the source array column (e.g. {{business_data0}}) and propagates container
nullability.
* Flink's {{UnnestRowsFunctionBase.getUnnestedType}} returns just the element
type, which is then wrapped into a synthetic {{RowType.of(...)}} with default
field names ({{f0}}). The container's nullability is dropped.
When {{LogicalUnnestRule}} replaces the {{Uncollect}} with a
{{LogicalTableFunctionScan}} carrying this synthesised rowType, the new
{{Correlate}}'s derived rowType diverges from the original.
{{RelOptUtil.verifyTypeEquivalence}} then fails.
h3. Why the previous fix is incomplete
FLINK-33217 added {{getLogicalProjectWithAdjustedNullability}}, which inserts
{{CAST}}-to-nullable wrappers on each projection in the LEFT-correlate Project
branch. This patches the symptom but leaves the underlying round-trip in place,
so it covers only one of the four shapes the rule matches:
|| Shape || Pre-fix behavior ||
| {{Correlate(LEFT) -> Project(Uncollect)}} | Patched by FLINK-33217 |
| {{Correlate(LEFT) -> Uncollect}} | *Crashes* — no Project to attach CASTs to |
| {{Correlate(LEFT) -> Filter(Uncollect)}} | *Crashes* — same reason |
| {{Correlate(LEFT) -> Filter(Project(Uncollect))}} | Coincidentally fine on
master, but documented in earlier investigation as a gap |
h2. Fix
Remove the LogicalType round-trip and use Calcite's {{Uncollect.getRowType()}}
directly as the TFS rowType. The new {{Correlate}}'s rowType then matches the
original byte-for-byte regardless of the rule's intermediate shape.
This also lets the dead {{getLogicalProjectWithAdjustedNullability}} helper and
its dependent imports be removed.
h3. Field naming
Calcite's Uncollect derives names from the source array, which means plan
output for UNNEST changes:
* {{ARRAY<T>}} / {{MULTISET<T>}}: the unnested column is named after the source
array column (e.g., {{tags0}} instead of synthetic {{f0}} or {{EXPR$0}}).
* {{MAP<K,V>}}: key/value columns are named {{KEY}} and {{VALUE}} instead of
{{f0}}/{{f1}}.
* {{WITH ORDINALITY}}: the ordinality column is {{ORDINALITY}} (unchanged).
Multiple unnests in the same query are auto-disambiguated by Calcite's outer
Correlate (e.g. two MAP unnests produce {{KEY, VALUE, KEY0, VALUE0}}).
The {{INTERNAL_UNNEST_ROWS}} runtime function is positional, so persisted
CompiledPlans continue to restore correctly. Verified by
{{CorrelateRestoreTest}}.
h2. Plan-fixture impact
Recorded {{verifyRelPlan}} fixtures change uniformly to reflect the new naming:
* {{flink-table/flink-table-planner/src/test/resources/.../UnnestTest.xml}}
(batch + stream)
*
{{flink-table/flink-table-planner/src/test/resources/.../LogicalUnnestRuleTest.xml}}
* {{flink-table/flink-table-planner/src/test/resources/.../MultiJoinTest.xml}}
(one entry)
*
{{flink-table/flink-table-planner/src/test/resources/.../JavaCatalogTableTest.xml}}
(one entry)
h2. Test coverage
Two reproducers added to {{UnnestTestBase}} (commit 1, intentionally failing
without the fix):
* {{testNullMismatchLeftJoinNoAliasList}}: bare {{Uncollect}} under LEFT
correlate
* {{testNullMismatchLeftJoinOnPredicate}}: {{Filter(Uncollect)}} under LEFT
correlate
Generated-by: Claude Code
--
This message was sent by Atlassian Jira
(v8.20.10#820010)