jnh5y opened a new pull request, #28053:
URL: https://github.com/apache/flink/pull/28053
## What is the purpose of the change
Fixes a class of internal errors in `LogicalUnnestRule` where the
`TableFunctionScan` rowType diverges from the rowType Calcite derives for the
original `Correlate(Uncollect)` tree. The divergence makes
`RelOptUtil.verifyTypeEquivalence` fail for `LEFT JOIN UNNEST` shapes that do
not fit the FLINK-33217 patch path.
Repro (fails on `release-2.0` / master, passes with this PR):
```sql
CREATE TABLE nested_not_null (
business_data ARRAY<STRING NOT NULL>,
nested ROW<`data` ARRAY<STRING NOT NULL>>,
nested_array ARRAY<ROW<`data` ARRAY<STRING NOT NULL>> NOT NULL>
);
-- Bare Uncollect under LEFT correlate
SELECT * FROM nested_not_null
LEFT JOIN UNNEST(nested_not_null.business_data) AS exploded_bd ON TRUE;
-- ON-predicate adds a Filter between Correlate and Uncollect
SELECT * FROM nested_not_null
LEFT JOIN UNNEST(nested_not_null.business_data) AS exploded_bd
ON exploded_bd <> 'debug';
```
Both crash with `java.lang.AssertionError: Cannot add expression of
different type to set`.
Jira: [FLINK-39558](https://issues.apache.org/jira/browse/FLINK-39558)
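For readers unfamiliar with this kind of failure, the assertion can be pictured with a toy model. The sketch below is not Calcite or Flink code: `RowTypeCheckSketch`, `Field`, and `verifyEquivalent` are hypothetical names, the field values are illustrative rather than taken from the real plans, and the comparison (type plus nullability, names ignored) is a simplifying assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical names, not the Calcite or Flink API.
final class RowTypeCheckSketch {

    // Hypothetical stand-in for a planner field: name, SQL type name, nullability.
    static final class Field {
        final String name;
        final String type;
        final boolean nullable;

        Field(String name, String type, boolean nullable) {
            this.name = name;
            this.type = type;
            this.nullable = nullable;
        }

        @Override
        public String toString() {
            return name + " " + type + (nullable ? "" : " NOT NULL");
        }
    }

    // Strict check in the spirit of a planner type assertion: field count,
    // type, and nullability must all agree. Names are ignored in this sketch.
    static void verifyEquivalent(List<Field> expected, List<Field> actual) {
        boolean same = expected.size() == actual.size();
        for (int i = 0; same && i < expected.size(); i++) {
            Field e = expected.get(i);
            Field a = actual.get(i);
            same = e.type.equals(a.type) && e.nullable == a.nullable;
        }
        if (!same) {
            throw new AssertionError(
                "row types differ: expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // Illustrative values: the rewritten node disagrees on nullability.
        List<Field> original = Arrays.asList(new Field("exploded_bd", "STRING", true));
        List<Field> rewritten = Arrays.asList(new Field("f0", "STRING", false));

        verifyEquivalent(original, original); // same type, no error
        try {
            verifyEquivalent(original, rewritten);
        } catch (AssertionError err) {
            System.out.println("mismatch detected: " + err.getMessage());
        }
    }
}
```

The point of the sketch is only that a rewrite rule has to hand back a node whose derived type agrees with the node it replaces. Deriving that type independently, as the old round-trip did, leaves room for this kind of disagreement.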
## Brief change log
- Replace the `UnnestRowsFunctionBase.getUnnestedType(...)` round-trip in
`LogicalUnnestRule` with Calcite's `uncollect.getRowType()`, so that the
rewritten `Correlate`'s derived rowType matches the original byte-for-byte.
- Remove the now-dead `getLogicalProjectWithAdjustedNullability` and
`createNullableType` helpers (FLINK-33217's CAST-to-nullable patchwork is no
longer needed because the divergence at the source is gone).
- Add reproducers for the two LEFT-JOIN-UNNEST shapes that were not covered
by FLINK-33217 (bare `Uncollect`, `Filter(Uncollect)`).
- Re-record plan fixtures in `UnnestTest.xml` (batch + stream),
`LogicalUnnestRuleTest.xml`, `MultiJoinTest.xml`, and
`JavaCatalogTableTest.xml` to reflect Calcite-derived field naming.
### Field naming change
Calcite's `Uncollect` derives field names from the source array, so plan
output for UNNEST columns changes:
- `ARRAY<T>` / `MULTISET<T>`: the unnested column is named after the source
array column (e.g., `tags0` instead of synthetic `f0` / `EXPR$0`).
- `MAP<K,V>`: key/value columns are named `KEY` and `VALUE` instead of `f0`
/ `f1`.
- `WITH ORDINALITY`: ordinality column is named `ORDINALITY` (unchanged).
Multiple unnests in the same query are auto-disambiguated by Calcite's outer
Correlate (e.g. two MAP unnests produce `KEY, VALUE, KEY0, VALUE0`).
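The suffixing pattern described above can be sketched as follows. This is a simplified model written for illustration, not Calcite's actual implementation: `FieldNameDedupSketch` and `uniquify` are hypothetical names, and the rule (keep the first occurrence, append the smallest free numeric suffix starting at 0 to later duplicates) is an assumption inferred from the names quoted in this description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model only: hypothetical names, not the Calcite API.
final class FieldNameDedupSketch {

    // Keeps the first occurrence of each name unchanged and appends the
    // smallest free numeric suffix (0, 1, 2, ...) to later duplicates.
    static List<String> uniquify(List<String> names) {
        Set<String> used = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String name : names) {
            String candidate = name;
            // Set.add returns false when the candidate is already taken.
            for (int i = 0; !used.add(candidate); i++) {
                candidate = name + i;
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two MAP unnests joined into one row type.
        System.out.println(uniquify(Arrays.asList("KEY", "VALUE", "KEY", "VALUE")));
        // prints [KEY, VALUE, KEY0, VALUE0]

        // Source column `tags` plus the unnested column derived from it.
        System.out.println(uniquify(Arrays.asList("id", "tags", "tags")));
        // prints [id, tags, tags0]
    }
}
```

Under this model the `tags0` name in the `ARRAY<T>` case and the `KEY0` / `VALUE0` names in the two-MAP case come from the same collision rule.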
The runtime `INTERNAL_UNNEST_ROWS` function is positional, so persisted
`CompiledPlan` instances continue to restore correctly (verified by
`CorrelateRestoreTest`).
## Verifying this change
This change adds two reproducer tests in commit 1. Both fail at HEAD~1 in
`RelOptUtil.verifyTypeEquivalence` and pass with the fix in commit 2:
- `UnnestTestBase.testNullMismatchLeftJoinNoAliasList`
- `UnnestTestBase.testNullMismatchLeftJoinOnPredicate`
The full `flink-table-planner` suite passes (10691 tests, 0 failures, 41
skipped). `CorrelateRestoreTest` also passes, demonstrating that persisted
`CompiledPlan` instances containing UNNEST continue to restore correctly.
## Does this pull request potentially affect one of the following parts?
- Dependencies (does it add or upgrade a dependency): **no**
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: **no**
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **no**
- Anything that affects deployment or recovery: **no**
- The S3 file system connector: **no**
## Documentation
- Does this pull request introduce a new feature? **no**
- If yes, how is the feature documented? n/a