[
https://issues.apache.org/jira/browse/FLINK-39125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gustavo de Morais updated FLINK-39125:
--------------------------------------
Description:
When users cast a VARCHAR key column to VARBINARY, the upsert key uniqueness is
lost because the cast is not recognized as injective. UTF-8 encoding is itself
injective - distinct strings always produce distinct byte sequences - so we can
safely mark these casts as injective when the binary target has sufficient
capacity. The cast is injective under the following conditions:
* VARCHAR(MAX) → VARBINARY(MAX): both sides are unbounded
* VARCHAR(x) → VARBINARY(y) where y >= x * 4: target can hold the worst-case
UTF-8 encoding (4 bytes per character)
* Bounded source to unbounded (MAX) target: always fits
This applies to all four cross-family combinations: CHAR/VARCHAR to
BINARY/VARBINARY.
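
The conditions above can be sketched as a small length check. This is only an
illustration, not the planner's actual code: the helper names are hypothetical,
`None` stands in for a MAX (unbounded) length, and the truncating cast is an
assumed model of what happens when the binary target is too small.

```python
from typing import Optional

# Worst-case number of bytes UTF-8 needs for a single code point.
MAX_UTF8_BYTES_PER_CHAR = 4


def is_injective_string_to_binary_cast(
    source_len: Optional[int], target_len: Optional[int]
) -> bool:
    """Hypothetical check; None models a MAX (unbounded) length."""
    if target_len is None:
        # Unbounded target: any source, bounded or not, always fits.
        return True
    if source_len is None:
        # Unbounded source into a bounded target may not fit.
        return False
    # Bounded to bounded: target must hold the worst-case encoding.
    return target_len >= source_len * MAX_UTF8_BYTES_PER_CHAR


def cast_to_binary(value: str, target_len: Optional[int]) -> bytes:
    """Assumed cast model: UTF-8 encode, then truncate to the target length."""
    encoded = value.encode("utf-8")
    return encoded if target_len is None else encoded[:target_len]


# Two distinct VARCHAR(2) keys collide in BINARY(1) but stay distinct in BINARY(8).
collides = cast_to_binary("ab", 1) == cast_to_binary("ac", 1)
distinct = cast_to_binary("ab", 8) != cast_to_binary("ac", 8)
```

Under this model, a target that is too small can map two distinct keys to the
same bytes, which is why the capacity condition is needed before the cast can
be treated as injective.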
was:
When users cast a VARCHAR key column to VARBINARY, the upsert key uniqueness is
lost because the cast is not recognized as injective. UTF-8 encoding is itself
injective - distinct strings always produce distinct byte sequences - so we can
safely mark these casts as injective when the binary target has sufficient
capacity. The cast is injective under the following conditions:
* VARCHAR(MAX) → VARBINARY(MAX): both sides are unbounded
* VARCHAR(x) → VARBINARY(y) where y >= x * 4: target can hold the worst-case
UTF-8 encoding (4 bytes per character)
* Bounded source to unbounded (MAX) target: always fits
This applies to all four cross-family combinations: CHAR/VARCHAR to
BINARY/VARBINARY.
> Support injective casts from CHAR/VARCHAR to BINARY/VARBINARY for upsert key
> preservation
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-39125
> URL: https://issues.apache.org/jira/browse/FLINK-39125
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Affects Versions: 2.2.0
> Reporter: Gustavo de Morais
> Assignee: Gustavo de Morais
> Priority: Major
> Fix For: 2.3.0
>
>
> When users cast a VARCHAR key column to VARBINARY, the upsert key uniqueness
> is lost because the cast is not recognized as injective. UTF-8 encoding is
> itself injective - distinct strings always produce distinct byte sequences -
> so we can safely mark these casts as injective when the binary target has
> sufficient capacity. The cast is injective under the following conditions:
> * VARCHAR(MAX) → VARBINARY(MAX): both sides are unbounded
> * VARCHAR(x) → VARBINARY(y) where y >= x * 4: target can hold the worst-case
> UTF-8 encoding (4 bytes per character)
> * Bounded source to unbounded (MAX) target: always fits
> This applies to all four cross-family combinations: CHAR/VARCHAR to
> BINARY/VARBINARY.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)