[
https://issues.apache.org/jira/browse/FLINK-38444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gustavo de Morais updated FLINK-38444:
--------------------------------------
Description:
The planner currently considers the union of both the upsert keys from the left
and from the right to be a valid resulting upsert key. That's true for inner
joins but for left joins that leads to a resulting upsert key that contains
columns that can be null, which is not valid.
Example and simplified repro steps:
{code:java}
-- Table 'orders' with an incomplete primary key
CREATE TABLE `orders_with_composite_key` (
`order_id` BIGINT NOT NULL,
`user_id` INT NOT NULL,
`item_name` STRING,
CONSTRAINT `PRIMARY` PRIMARY KEY (`order_id`, `user_id`) NOT ENFORCED
);
-- Table 'users'
CREATE TABLE `users` (
`user_id` INT NOT NULL,
`other_data` STRING,
CONSTRAINT `PRIMARY` PRIMARY KEY (`user_id`) NOT ENFORCED
);
SELECT
o.user_id,
o.order_id,
u.user_id
FROM `users` AS u
LEFT JOIN `orders_with_composite_key` AS o
ON o.user_id = u.user_id
-- Bug: this incorrectly infers the following upsert join from the join
(user_id,order_id,user_id0)
{code}
This is not valid since the columns on the right can be null and can't be used
as part of an upsert key.
This is where we calculate the unique keys and where we need to consider the
nullability of the fields as we do for the other cases below
[https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/metadata/FlinkRelMdUniqueKeys.scala#L593]
was:
The planner currently considers a union of both the upsert keys from the left
and from the right to be a valid resulting upsert key. That's true for inner
joins but for left joins that leads to a resulting upsert key that contains
columns that can be null, which is not valid.
Example and simplified repro steps:
{code:java}
-- Table 'orders' with an incomplete primary key
CREATE TABLE `orders_with_composite_key` (
`order_id` BIGINT NOT NULL,
`user_id` INT NOT NULL,
`item_name` STRING,
CONSTRAINT `PRIMARY` PRIMARY KEY (`order_id`, `user_id`) NOT ENFORCED
);
-- Table 'users'
CREATE TABLE `users` (
`user_id` INT NOT NULL,
`other_data` STRING,
CONSTRAINT `PRIMARY` PRIMARY KEY (`user_id`) NOT ENFORCED
);
SELECT
o.user_id,
o.order_id,
u.user_id
FROM `users` AS u
LEFT JOIN `orders_with_composite_key` AS o
ON o.user_id = u.user_id
-- Bug: this incorrectly infers the following upsert join from the join
(user_id,order_id,user_id0)
{code}
This is not valid since the columns on the right can be null and can't be used
as part of an upsert key.
This is where we calculate the unique keys and where we need to consider the
nullability of the fields as we do for the other cases below
[https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/metadata/FlinkRelMdUniqueKeys.scala#L593]
> Join calculates upsert keys with null keys for left join
> --------------------------------------------------------
>
> Key: FLINK-38444
> URL: https://issues.apache.org/jira/browse/FLINK-38444
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Reporter: Gustavo de Morais
> Priority: Major
> Fix For: 2.2.0
>
>
> The planner currently considers the union of both the upsert keys from the
> left and from the right to be a valid resulting upsert key. That's true for
> inner joins but for left joins that leads to a resulting upsert key that
> contains columns that can be null, which is not valid.
> Example and simplified repro steps:
> {code:java}
> -- Table 'orders' with an incomplete primary key
> CREATE TABLE `orders_with_composite_key` (
> `order_id` BIGINT NOT NULL,
> `user_id` INT NOT NULL,
> `item_name` STRING,
> CONSTRAINT `PRIMARY` PRIMARY KEY (`order_id`, `user_id`) NOT ENFORCED
> );
>
> -- Table 'users'
> CREATE TABLE `users` (
> `user_id` INT NOT NULL,
> `other_data` STRING,
> CONSTRAINT `PRIMARY` PRIMARY KEY (`user_id`) NOT ENFORCED
> );
> SELECT
> o.user_id,
> o.order_id,
> u.user_id
> FROM `users` AS u
> LEFT JOIN `orders_with_composite_key` AS o
> ON o.user_id = u.user_id
> -- Bug: this incorrectly infers the following upsert join from the join
> (user_id,order_id,user_id0)
> {code}
> This is not valid since the columns on the right can be null and can't be
> used as part of an upsert key.
> This is where we calculate the unique keys and where we need to consider the
> nullability of the fields as we do for the other cases below
> [https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/metadata/FlinkRelMdUniqueKeys.scala#L593]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)