davidzollo commented on issue #10641:
URL: https://github.com/apache/seatunnel/issues/10641#issuecomment-4148240742
Hi @ricky2129, thanks for the incredibly detailed design doc. This is
indeed a critical pain point for enterprise CDC-to-data-lake pipelines, and
relying on strict transaction log coordinates instead of `EventTime` is
absolutely the right way to go.
Overall, your design looks very elegant and safe:
1. Storing these extra descriptors in `SeaTunnelRow.options`, without
touching the core row layout or serialization logic, is the right call and
keeps checkpoints safe.
2. Returning `null` gracefully for non-MySQL sources while preserving backward
compatibility is noted and agreed.
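To illustrate the two points above, here is a minimal toy sketch. It is not SeaTunnel code. `Row` is a hypothetical stand-in for `SeaTunnelRow`, and the keys `binlog_file`/`binlog_pos` are placeholder names, since naming is still open. The positional fields, which are what gets serialized, stay untouched. The binlog descriptors live in a side map, and any source that never sets them yields `None`.

```python
from typing import Any, Dict, List, Optional

# Hypothetical metadata keys; the final names are still being discussed.
BINLOG_FILE_KEY = "binlog_file"
BINLOG_POS_KEY = "binlog_pos"


class Row:
    """Toy stand-in for a row: positional fields plus a side-channel options map."""

    def __init__(self, fields: List[Any], options: Optional[Dict[str, Any]] = None):
        self.fields = fields          # data layout handled by serialization
        self.options = options or {}  # extra descriptors, outside the layout


def binlog_metadata(row: Row) -> Dict[str, Optional[Any]]:
    # Sources that never set these keys (e.g. non-MySQL sources) yield None.
    return {
        BINLOG_FILE_KEY: row.options.get(BINLOG_FILE_KEY),
        BINLOG_POS_KEY: row.options.get(BINLOG_POS_KEY),
    }
```

Because the metadata never enters `fields`, rows from older jobs or other sources deserialize exactly as before.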
A few points and suggestions before we proceed with the PR:
1. **About Field Naming Generalization:** Since different databases use
different terms for log positions (e.g., LSN for PostgreSQL, SCN for Oracle),
it may be worth considering generic names such as `LogFile`, `LogPos`, or
`LogSequence` if we want the `Metadata` transform to work across JDBC sources
down the road. If you prefer to stick with explicit MySQL `BinlogXX`
terminology in this PR for exactness, that is perfectly fine with me; we can
document it specifically.
2. **Snapshot Phase Null Handling:** As you mentioned, when `startup.mode`
includes a historical snapshot, rows read during the snapshot phase will have
`null` for these binlog positions. Please add a short best-practice note to the
config documentation showing how users should write a `COALESCE` query in
downstream engines to correctly order snapshot data relative to incremental
binlog data.
3. **E2E Testing:** Please cover this new metadata extraction with both unit
tests and a test case in our E2E module that verifies both value population
and the `null` fallback logic.
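As an illustration of the ordering such a documentation note could describe, here is a minimal Python sketch. It rests on three assumptions. First, the column names `binlog_file`/`binlog_pos` are hypothetical. Second, binlog files follow MySQL's default `<basename>.<NNNNNN>` naming. Third, every snapshot row for a key predates that key's incremental events. Snapshot rows, which have both positions `null`, sort before any binlog event. This mirrors `COALESCE(..., -1)` on each part of the sort key in SQL. The file suffix is parsed numerically, so `mysql-bin.000010` sorts after `mysql-bin.000009`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ChangeRow:
    key: int                    # primary key of the source row
    value: str                  # payload
    binlog_file: Optional[str]  # None for snapshot-phase rows
    binlog_pos: Optional[int]   # None for snapshot-phase rows


def binlog_sequence(binlog_file: Optional[str]) -> int:
    """Numeric suffix of a binlog file name, e.g. 'mysql-bin.000042' -> 42.

    Returns -1 for snapshot rows, mirroring COALESCE(file_seq, -1).
    Assumes the '<basename>.<number>' naming pattern.
    """
    if binlog_file is None:
        return -1
    return int(binlog_file.rsplit(".", 1)[1])


def order_key(row: ChangeRow) -> Tuple[int, int]:
    # Like ORDER BY COALESCE(file_seq, -1), COALESCE(binlog_pos, -1)
    pos = row.binlog_pos if row.binlog_pos is not None else -1
    return (binlog_sequence(row.binlog_file), pos)


def latest_per_key(rows: Iterable[ChangeRow]) -> Dict[int, ChangeRow]:
    # Keep the row with the highest log coordinate for each primary key;
    # snapshot rows only survive when no binlog event exists for that key.
    latest: Dict[int, ChangeRow] = {}
    for row in rows:
        current = latest.get(row.key)
        if current is None or order_key(row) > order_key(current):
            latest[row.key] = row
    return latest
```

In a downstream SQL engine, the same key would typically feed a `ROW_NUMBER() OVER (PARTITION BY <pk> ORDER BY ... DESC)` deduplication. The exact syntax, and how the file suffix is extracted, depend on the engine.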
The design doc looks great to me. Feel free to assign this to yourself and
open a PR!