doyong365 opened a new issue, #10685:
URL: https://github.com/apache/seatunnel/issues/10685

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22)
 and found no similar issues.
   
   
   ### What happened
   
   NTZ (timestamp without time zone) columns from source databases are stored 
as LTZ (timestamp with local time zone) in the target, without any timezone 
conversion. SeaTunnel's JDBC connectors do not distinguish NTZ from LTZ: both 
are mapped to the same internal type (`SqlType.TIMESTAMP` / 
`LOCAL_DATE_TIME_TYPE`) and unconditionally written as LTZ to the target (e.g. 
Iceberg `TimestampType.withZone()`). On read, `rs.getTimestamp()` followed by 
`.toLocalDateTime()` applies the JVM default timezone, and 
`Timestamp.valueOf(LocalDateTime).toInstant()` in the Spark layer reverses it. 
The two offsets cancel out, so the wall-clock value is stored directly as UTC 
regardless of the source timezone.
   
   **Affected source types per database:**
   
   | Source DB | Type | NTZ/LTZ | Result |
   |---|---|---|---|
   | MySQL | `DATETIME` | NTZ | ⛔ Stored as LTZ |
   | MySQL | `TIMESTAMP` | LTZ | ✅ Correct |
   | MariaDB | `DATETIME` | NTZ | ⛔ Stored as LTZ |
   | PostgreSQL | `timestamp` | NTZ | ⛔ Stored as LTZ |
   | PostgreSQL | `timestamptz` | LTZ | ✅ Correct |
   | Oracle | `DATE` | NTZ | ⛔ Stored as LTZ |
   | Oracle | `TIMESTAMP` | NTZ | ⛔ Stored as LTZ |
   | MSSQL | `DATETIME` | NTZ | ⛔ Stored as LTZ |
   | DB2 | `TIMESTAMP` | NTZ | ⛔ Stored as LTZ |
   
   **Verified with the following test cases (Spark engine, JVM=KST+9):**
   
   | Source TZ | Source type | Source value | Result |
   |---|---|---|---|
   | KST(+9) | TIMESTAMP_NTZ | 2026-01-01 00:00:00 | 2026-01-01 00:00:00 UTC ❌ |
   | KST(+9) | TIMESTAMP_LTZ | 2026-01-01 00:00:00 KST | 2025-12-31 15:00:00 UTC ✅ |
   | UTC | TIMESTAMP_NTZ | 2026-01-01 00:00:00 | 2026-01-01 00:00:00 UTC ➖ |
   | CST(-6) | TIMESTAMP_NTZ | 2026-01-01 00:00:00 | 2026-01-01 00:00:00 UTC ❌ |
   | CST(-6) | TIMESTAMP_LTZ | 2026-01-01 00:00:00 CST | 2026-01-01 06:00:00 UTC ✅ |
   
   The NTZ case at UTC timezone appears correct only by coincidence (zero 
offset), not because of correct handling.
   
   ### Root cause
   
   The core issue is that SeaTunnel's internal type system has no way to 
distinguish NTZ from LTZ timestamp types. All DB-specific TypeConverters 
collapse both into the same internal type (`SqlType.TIMESTAMP` / 
`LOCAL_DATE_TIME_TYPE`), and the entire pipeline unconditionally writes every 
timestamp as LTZ to the target.
   
   **1. TypeConverters — NTZ and LTZ both mapped to `LOCAL_DATE_TIME_TYPE`**
   All DB-specific TypeConverters map NTZ and LTZ timestamp types to the 
identical SeaTunnel internal type, making it structurally impossible to 
distinguish them downstream.
   - `MySqlTypeConverter.java` `convert(BasicTypeDefine)` — `DATETIME` (NTZ) 
and `TIMESTAMP` (LTZ) both map to `LOCAL_DATE_TIME_TYPE`
   - `PostgresTypeConverter.java` `convert(BasicTypeDefine)` — `timestamp` 
(NTZ) and `timestamptz` (LTZ) both map to `LOCAL_DATE_TIME_TYPE`
   - `OracleTypeConverter.java` `convert(BasicTypeDefine)` — `DATE`, 
`TIMESTAMP` (NTZ) and `TIMESTAMP WITH TIME ZONE`, `TIMESTAMP WITH LOCAL TIME 
ZONE` (LTZ) all map to `LOCAL_DATE_TIME_TYPE`
   - `SqlServerTypeConverter.java` `convert(BasicTypeDefine)` — `DATETIME`, 
`DATETIME2` (NTZ) and `DATETIMEOFFSET` (LTZ) all map to `LOCAL_DATE_TIME_TYPE`
   - `GenericTypeConverter.java` `convert(BasicTypeDefine)` — `Types.TIMESTAMP` 
(covers both NTZ and LTZ reported by JDBC driver) maps to `LOCAL_DATE_TIME_TYPE`
   
   **2. `AbstractJdbcRowConverter.java` — `toInternal()`**
   NTZ and LTZ columns are read through the same code path. `rs.getTimestamp()` 
applies the JVM default timezone, which is correct for LTZ but corrupts NTZ 
wall-clock values. There is no separate `TIMESTAMP_TZ` case.
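   A driver-free sketch of why this corrupts NTZ values: `rs.getTimestamp()` 
returns a `java.sql.Timestamp`, i.e. an epoch instant, and `toLocalDateTime()` 
renders that instant in the JVM default zone, so the wall clock an NTZ column 
ends up with depends on where the job happens to run (the epoch value here is 
an assumption for illustration):

   ```java
   import java.sql.Timestamp;
   import java.time.Instant;
   import java.time.LocalDateTime;
   import java.util.TimeZone;

   // Illustration of the read path: the same epoch instant produced by the
   // driver yields different wall clocks depending on the JVM default zone.
   public class ReadPathDemo {
       // Assumed epoch materialized by the driver for this illustration.
       static final Instant EPOCH_FROM_DRIVER = Instant.parse("2026-01-01T00:00:00Z");

       public static LocalDateTime readAsInternal(String jvmZone) {
           TimeZone.setDefault(TimeZone.getTimeZone(jvmZone));
           // What toInternal() effectively does after rs.getTimestamp():
           return Timestamp.from(EPOCH_FROM_DRIVER).toLocalDateTime();
       }

       public static void main(String[] args) {
           System.out.println(readAsInternal("UTC"));        // 2026-01-01T00:00
           System.out.println(readAsInternal("Asia/Seoul")); // 2026-01-01T09:00
       }
   }
   ```

   For an LTZ column this zone application is exactly right; for an NTZ 
column the wall clock should never depend on the JVM zone at all.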
   
   **3. `AbstractJdbcRowConverter.java` — `setValueToStatementByDataType()`**
   On write, `Timestamp.valueOf(LocalDateTime)` re-applies the JVM default 
timezone, shifting the epoch value regardless of whether the original source 
type was NTZ or LTZ.
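   The mirror image on the write side, again as a driver-free sketch: 
`Timestamp.valueOf(LocalDateTime)` interprets the wall clock in the JVM 
default zone, so the epoch handed to the JDBC driver shifts with wherever the 
job runs:

   ```java
   import java.sql.Timestamp;
   import java.time.Instant;
   import java.time.LocalDateTime;
   import java.util.TimeZone;

   // Illustration of the write path: the same internal wall-clock value
   // produces different epochs depending on the JVM default zone.
   public class WritePathDemo {
       static final LocalDateTime WALL_CLOCK = LocalDateTime.parse("2026-01-01T00:00:00");

       public static Instant writtenEpoch(String jvmZone) {
           TimeZone.setDefault(TimeZone.getTimeZone(jvmZone));
           // What setValueToStatementByDataType() effectively hands the driver:
           return Timestamp.valueOf(WALL_CLOCK).toInstant();
       }

       public static void main(String[] args) {
           System.out.println(writtenEpoch("UTC"));        // 2026-01-01T00:00:00Z
           System.out.println(writtenEpoch("Asia/Seoul")); // 2025-12-31T15:00:00Z
       }
   }
   ```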
   
   **4. `InternalRowConverter.java` — `convert()` / `reconvert()`**
   In the Spark engine translation layer, `toLocalDateTime()` in 
`AbstractJdbcRowConverter` first maps the NTZ wall-clock value to a 
`LocalDateTime` while applying the JVM timezone offset (e.g. 
`2026-01-01T00:00:00Z` epoch → KST+9 → `2026-01-01T09:00:00`), and then 
`Timestamp.valueOf()` in `InternalRowConverter` immediately reverses it (e.g. 
KST-9 → epoch `2026-01-01T00:00:00Z`). The two JVM timezone offsets cancel each 
other out, so the NTZ wall-clock numeric value is preserved as-is in UTC — 
meaning a value that represents `2026-01-01 00:00:00 KST` is incorrectly stored 
as `2026-01-01 00:00:00 UTC` instead of the correct `2025-12-31 15:00:00 UTC`.
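   The cancellation can be reproduced with the JDK alone: for any epoch the 
driver produced, `toLocalDateTime()` followed by `Timestamp.valueOf()` is an 
identity round trip (DST gaps aside), so the conversions add nothing and the 
driver's epoch passes straight through. The starting epoch below is an assumed 
value for illustration:

   ```java
   import java.sql.Timestamp;
   import java.time.Instant;
   import java.time.LocalDateTime;
   import java.util.TimeZone;

   // Illustration: the +offset in toLocalDateTime() and the -offset in
   // Timestamp.valueOf() cancel, regardless of which zone the JVM runs in.
   public class CancellationDemo {
       public static Instant roundTrip(String jvmZone, Instant fromDriver) {
           TimeZone.setDefault(TimeZone.getTimeZone(jvmZone));
           // AbstractJdbcRowConverter: epoch -> wall clock (+JVM offset)
           LocalDateTime internal = Timestamp.from(fromDriver).toLocalDateTime();
           // InternalRowConverter (Spark layer): wall clock -> epoch (-JVM offset)
           return Timestamp.valueOf(internal).toInstant();
       }

       public static void main(String[] args) {
           // Assumed: driver materialized NTZ "2026-01-01 00:00:00" as a UTC epoch.
           Instant fromDriver = Instant.parse("2026-01-01T00:00:00Z");
           // Same epoch comes back out under KST — no KST->UTC shift ever happens.
           System.out.println(roundTrip("Asia/Seoul", fromDriver)); // 2026-01-01T00:00:00Z
       }
   }
   ```

   This is why the bug produces no exception: every step is individually 
well-typed, and the net effect is simply that whatever epoch the driver chose 
is stored verbatim.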
   
   **5. `TypeConverterUtils.java` — `convert(SeaTunnelDataType)`**
   SeaTunnel's `TIMESTAMP` (NTZ) is mapped to Spark's `TimestampType` which is 
always UTC-based (LTZ). There is no NTZ-aware handling anywhere in the codebase.
   
   **6. `IcebergTypeMapper.java` — `toIcebergType()`**
   SeaTunnel's `TIMESTAMP` is unconditionally mapped to 
`Types.TimestampType.withZone()` (LTZ). `withoutZone()` (NTZ) is never called.
   
   ### Expected behavior
   - NTZ source columns should be stored as NTZ in the target (e.g. Iceberg 
`TimestampType.withoutZone()`)
   - LTZ source columns should be stored as LTZ in the target (e.g. Iceberg 
`TimestampType.withZone()`)
   - The distinction should be preserved end-to-end through the SeaTunnel type 
system using `SqlType.TIMESTAMP` (NTZ) vs `SqlType.TIMESTAMP_TZ` (LTZ)
   
   ### Actual behavior
   All NTZ timestamp columns are stored as LTZ in the target. The wall-clock 
value is written directly as UTC without timezone conversion, because SeaTunnel 
has no mechanism to carry NTZ/LTZ distinction through its internal type system.
   
   ### SeaTunnel Version
   
   2.3.10
   
   ### SeaTunnel Config
   
   ```conf
   env {
     parallelism = 1
     job.mode = "BATCH"
   }
   
   source {
     Jdbc {
       url = "jdbc:mysql://localhost:3306/testdb"
       driver = "com.mysql.cj.jdbc.Driver"
       user = "root"
       password = "password"
       query = "SELECT datetime_col, timestamp_col FROM test_table"
     }
   }
   
   sink {
     Iceberg {
       catalog_name = "local"
       namespace = "default"
       table = "test_iceberg"
     }
   }
   ```
   
   ### Running Command
   
   ```shell
   ./bin/start-seatunnel-spark-3-connector-v2.sh --config config/test.conf
   ```
   
   ### Error Exception
   
   ```log
   No explicit exception is thrown. The job completes successfully but NTZ 
timestamp values are stored with incorrect timezone semantics in the target.
   ```
   
   ### Zeta or Flink or Spark Version
   
   zeta 2.3.10
   flink 1.17.2
   spark 3.3.3
   
   ### Java or Scala Version
   
   Java 11
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

