JNSimba opened a new pull request, #63219:
URL: https://github.com/apache/doris/pull/63219
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
PG streaming jobs with a temporal chunk-key column (e.g. `DATE` PK) fail
during snapshot read with:
```
PSQLException: ERROR: operator does not exist: date <= character varying
Hint: No operator matches the given name and argument types.
```
**Root cause**
cdc_client reports snapshot splits to FE, and FE persists the offset as JSON via
Spring Jackson, where `SqlDateSerializer` writes `java.sql.Date` as a
`"yyyy-MM-dd"` string. When the offset is read back,
`ObjectMapper.convertValue(offset, SnapshotSplit.class)` restores the bound values
into `Object[]` as `String`, because the JSON carries no type information. At read
time, `PostgresQueryUtils.readTableSplitDataStatement` calls a bare
`statement.setObject(idx, splitEnd[i])`. PG JDBC sends a `String` with the VARCHAR
OID, and PostgreSQL, which is strict about operator resolution, rejects
`date <= varchar` because there is no implicit cast from varchar to date.
The same FE round-trip happens for MySQL. The MySQL server is lenient enough
to implicitly coerce the bound, so no error surfaces, but
`SplitKeyUtils.compareObjects` falls back to `toString()` comparison whenever
the restored type doesn't match the Debezium connect-schema type. That is a
latent issue, so both dialects are handled the same way for consistency.
**Fix**
Restore the original Java types at the cdc_client side, where the loss
happens:
- `AbstractCdcSourceReader` adds:
    - abstract `probeSplitKeyClass(TableId, Column, JobBaseConfig)`:
      dialect-specific lookup
    - `resolveSplitKeyClass(...)`: per-column cached wrapper (one probe per
      table.column, reused across splits)
    - static `convertBounds(Object[], Class<?>, ObjectMapper)`: restores
      `Object[]` elements to the target class
- `PostgresSourceReader` / `MySqlSourceReader` override
`probeSplitKeyClass`: they run `SELECT col FROM table WHERE 1=0` and use
`ResultSetMetaData.getColumnClassName(1)` so the JDBC driver itself decides the
Java type. Probe failure throws; there is no silent fallback, because that would
let the original bug recur.
- `JdbcIncrementalSourceReader.createSnapshotSplit` / `createStreamSplit`
and `MySqlSourceReader.createSnapshotSplit` / `createBinlogSplit` (4 sites
total) apply `convertBounds` before constructing Flink CDC's `SnapshotSplit` /
`FinishedSnapshotSplitInfo`.
- `convertBounds` special-cases `java.sql.Date` / `Timestamp` / `Time` via
`valueOf` to match the JVM-default-TZ semantics of `rs.getObject` (Jackson's
default `SqlDateDeserializer` hard-codes GMT, which would shift the value in
non-UTC TZs). Other types fall through to `ObjectMapper.convertValue`.
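
The pieces listed above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's code: `SplitKeyTypes`, `resolve`, and `probeViaJdbc` are hypothetical names, `convertBounds` here drops the `ObjectMapper` parameter to stay dependency-free, identifiers are assumed to be trusted and already quoted, and temporal bounds are assumed to arrive in the JDBC escape format that `valueOf` accepts.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Time;
import java.sql.Timestamp;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of the probe, cache, and bound restoration. */
final class SplitKeyTypes {
    // Keyed by "table.column" so each column is probed at most once.
    private final Map<String, Class<?>> cache = new ConcurrentHashMap<>();
    // Stand-in for the dialect-specific probe.
    private final Function<String, Class<?>> probe;

    SplitKeyTypes(Function<String, Class<?>> probe) {
        this.probe = probe;
    }

    Class<?> resolve(String table, String column) {
        // An exception thrown by the probe propagates; nothing is cached.
        return cache.computeIfAbsent(table + "." + column, probe);
    }

    // Zero-row query: only the metadata is read, so the driver decides the type.
    static Class<?> probeViaJdbc(Connection conn, String table, String column)
            throws SQLException, ClassNotFoundException {
        String sql = "SELECT " + column + " FROM " + table + " WHERE 1=0";
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            return Class.forName(rs.getMetaData().getColumnClassName(1));
        }
    }

    // Restores bounds to the probed class. Temporal types go through valueOf,
    // which interprets the text in the JVM default time zone.
    static Object[] convertBounds(Object[] bounds, Class<?> target) {
        if (bounds == null) {
            return null;
        }
        Object[] out = new Object[bounds.length];
        for (int i = 0; i < bounds.length; i++) {
            Object v = bounds[i];
            if (v == null || target.isInstance(v)) {
                out[i] = v;
            } else if (target == Date.class) {
                out[i] = Date.valueOf(v.toString());
            } else if (target == Timestamp.class) {
                out[i] = Timestamp.valueOf(v.toString());
            } else if (target == Time.class) {
                out[i] = Time.valueOf(v.toString());
            } else {
                // The PR hands other types to ObjectMapper.convertValue;
                // this sketch passes them through unchanged.
                out[i] = v;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A stub probe replaces the JDBC call so the example runs without a database.
        SplitKeyTypes types = new SplitKeyTypes(key -> Date.class);
        Class<?> cls = types.resolve("public.orders", "order_date");
        Object[] restored = convertBounds(new Object[] {"2024-01-15"}, cls);
        System.out.println(restored[0].getClass().getName()); // java.sql.Date
    }
}
```

Because `Date.valueOf` builds the value from local calendar fields, `Date.valueOf("2024-01-15").toString()` yields `2024-01-15` whatever the JVM default zone is, which is the property the `valueOf` special case relies on.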
### Release note
Fix PG/MySQL streaming snapshot failing on temporal chunk-key columns after
FE offset JSON round-trip strips the original Java type.
### Check List (For Author)
- Test
    - [x] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason
- Behavior changed:
    - [x] No.
    - [ ] Yes.
- Does this need documentation?
    - [x] No.
    - [ ] Yes.
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]