zjjiang created FLINK-38024:
-------------------------------
Summary: Unify the timestamp types to use TimestampData with time
zone information.
Key: FLINK-38024
URL: https://issues.apache.org/jira/browse/FLINK-38024
Project: Flink
Issue Type: Improvement
Components: Flink CDC
Affects Versions: cdc-3.4.0, cdc-3.5.0
Reporter: zjjiang
In the current Flink CDC implementation of timestamp types, the definitions of
the TIMESTAMP and TIMESTAMP_LTZ types do not match how the corresponding
internal CDC classes are actually used. This semantic confusion makes the code
harder for users to understand, develop against, debug, and maintain.
In addition, the lack of time zone information can cause time offsets when
values are synchronized between these two types, as described in
[FLINK-36806|https://issues.apache.org/jira/browse/FLINK-36806].
*1. Semantic deviation of TimestampData class*
* Semantics: `TimestampData` was originally designed to represent a “timestamp
of UTC+0”.
* Practical use: in the code, it is widely used as the carrier structure for
TIMESTAMP WITHOUT TIME ZONE, which is supposed to represent local time, i.e.,
a value with no time zone semantics that is not equivalent to UTC.
* Confusion points:
** Users may mistakenly believe that TimestampData represents UTC time;
** What is actually stored is a timestamp with no time zone (e.g. “2025-06-27
10:00:00”), not UTC;
** Ambiguity may occur during time zone conversion or cross-system
synchronization (e.g. Kafka -> Iceberg).
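The ambiguity in point 1 can be illustrated with plain `java.time`, without any
Flink CDC classes. This is only a sketch: the class name
`LocalTimestampAmbiguityDemo`, the helper `interpret`, and the choice of
Asia/Shanghai as the second zone are assumptions made for illustration. The
same zone-less value “2025-06-27 10:00:00” resolves to different instants
depending on which zone the reader assumes.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical demo class; not part of Flink CDC.
class LocalTimestampAmbiguityDemo {

    /** Interprets a zone-less wall-clock value as if it were recorded in the given zone. */
    static Instant interpret(LocalDateTime wallClock, ZoneId zone) {
        return wallClock.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        // A TIMESTAMP WITHOUT TIME ZONE value: wall-clock digits only.
        LocalDateTime wallClock = LocalDateTime.of(2025, 6, 27, 10, 0, 0);

        // Reader A assumes the digits are UTC.
        Instant asUtc = interpret(wallClock, ZoneOffset.UTC);
        // Reader B assumes the digits are local time in Asia/Shanghai (UTC+8).
        Instant asShanghai = interpret(wallClock, ZoneId.of("Asia/Shanghai"));

        System.out.println(asUtc);      // 2025-06-27T10:00:00Z
        System.out.println(asShanghai); // 2025-06-27T02:00:00Z
        // The two readings disagree by the zone offset.
        System.out.println(Duration.between(asShanghai, asUtc)); // PT8H
    }
}
```

If a sink treats the value as UTC while the source meant local time, the
written data is shifted by exactly this offset.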
*2. Usage deviation of LocalZonedTimestampData class*
* Semantics: theoretically, it should represent an arbitrary epoch timestamp,
i.e., the local time interpreted according to the specified time zone.
* Practical use: in practice, it carries a value of type TIMESTAMP_LTZ, which
represents a timestamp that has already been converted to UTC.
* Confusion points:
** “LocalZoned” in the name is easily misinterpreted as “local time with
time zone”, but the value is in fact already in UTC;
** The class semantics are inconsistent with how TIMESTAMP_LTZ is used in the
Flink CDC type system;
** Type mismatches can easily arise when interacting with the Flink Planner
or Table API.
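The TIMESTAMP_LTZ behaviour described in point 2 can be sketched with
`java.time` alone. The class name `LtzRenderingDemo`, the helper `render`, and
the sample zones are assumptions for illustration; no Flink CDC API is used.
The stored value is a single absolute instant, and only its rendering changes
with the session time zone.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical demo class; not part of Flink CDC.
class LtzRenderingDemo {

    /** Renders an absolute instant (epoch milliseconds) as wall-clock time in a session zone. */
    static LocalDateTime render(long epochMillis, ZoneId sessionZone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), sessionZone);
    }

    public static void main(String[] args) {
        // What a TIMESTAMP_LTZ value stores: one point on the UTC timeline.
        long epochMillis = Instant.parse("2025-06-27T10:00:00Z").toEpochMilli();

        // The same stored value shown to sessions in different zones.
        System.out.println(render(epochMillis, ZoneId.of("UTC")));                 // 2025-06-27T10:00
        System.out.println(render(epochMillis, ZoneId.of("Asia/Shanghai")));       // 2025-06-27T18:00
        System.out.println(render(epochMillis, ZoneId.of("America/Los_Angeles"))); // 2025-06-27T03:00
    }
}
```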
Both the type design and the actual semantics are inconsistent and confusing.
To avoid time offsets and cross-system ambiguity, all timestamp types should be
unified into representations that carry time zone information.
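One possible shape for a timestamp that carries its zone is sketched below.
This is purely hypothetical: `ZoneAwareTimestampSketch` and its methods are
invented for illustration and are not the existing `TimestampData` API or a
proposed design from this issue. The point is only that, once the zone travels
with the value, both the local reading and the absolute instant can be
recovered without guessing.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical illustration only; not a Flink CDC class.
final class ZoneAwareTimestampSketch {
    private final long epochMillis; // absolute point on the UTC timeline
    private final ZoneId zone;      // zone in which the value was recorded

    private ZoneAwareTimestampSketch(long epochMillis, ZoneId zone) {
        this.epochMillis = epochMillis;
        this.zone = zone;
    }

    /** Builds the value from wall-clock digits plus the zone they were recorded in. */
    static ZoneAwareTimestampSketch fromLocal(LocalDateTime wallClock, ZoneId zone) {
        return new ZoneAwareTimestampSketch(
                wallClock.atZone(zone).toInstant().toEpochMilli(), zone);
    }

    /** Builds the value from an absolute instant plus the zone to render it in. */
    static ZoneAwareTimestampSketch fromInstant(Instant instant, ZoneId zone) {
        return new ZoneAwareTimestampSketch(instant.toEpochMilli(), zone);
    }

    /** Absolute reading, suitable for a TIMESTAMP_LTZ-style target. */
    Instant toInstant() {
        return Instant.ofEpochMilli(epochMillis);
    }

    /** Wall-clock reading in the carried zone, suitable for a TIMESTAMP-style target. */
    LocalDateTime toLocalDateTime() {
        return LocalDateTime.ofInstant(toInstant(), zone);
    }
}
```

For example, `fromLocal(LocalDateTime.of(2025, 6, 27, 10, 0), ZoneId.of("Asia/Shanghai"))`
yields the instant 2025-06-27T02:00:00Z from `toInstant()` and the original
10:00 wall-clock value from `toLocalDateTime()`.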