davidzollo commented on issue #10641:
URL: https://github.com/apache/seatunnel/issues/10641#issuecomment-4148240742

   Hi @ricky2129, thanks for the very detailed design doc. This is indeed a 
critical pain point for enterprise CDC-to-data-lake pipelines, and relying on 
strict transaction log coordinates instead of `EventTime` is absolutely the 
right way to go.
   
   Overall, your design looks elegant and safe:
   1. Putting these extra descriptors in `SeaTunnelRow.options` without 
interfering with the core layout and serialization logic is correct and 
preserves checkpoint safety.
   2. Returning `null` gracefully for non-MySQL sources while keeping backward 
compatibility is the right behavior.
   
   A few points/suggestions before we proceed with the PR:
   
    1. **About Field Naming Generalization:** Different databases use different 
position terminology (e.g., LSN for PostgreSQL, SCN for Oracle), so it is worth 
considering generalized names like `LogFile`, `LogPos`, or `LogSequence` if we 
want to make the `Metadata` transform generic across JDBC sources down the 
road. If you prefer sticking with explicit MySQL `BinlogXX` terminology for 
this PR for the sake of exactness, that is perfectly fine with me. We can 
document it specifically.
    2. **Snapshot Phase Null Handling:** As you mentioned, rows read in the 
historical snapshot phase (as controlled by `startup.mode`) will carry `null` 
for these binlog positions. Please add a small best-practice note to the config 
documentation on how users should write their `COALESCE` query in downstream 
engines to correctly sequence snapshot data vs. incremental binlog data.
    3. **E2E Testing:** Please ensure that we cover this new metadata 
extraction via both Unit Tests and a test case in our E2E module verifying both 
the value population and the `null` fallback logic.
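
To illustrate the `COALESCE` note in point 2, here is a minimal sketch rather than a proposed implementation. It uses Python's built-in `sqlite3` as a stand-in for a downstream engine. The table name `changes` and the columns `binlog_file` and `binlog_pos` are hypothetical, not the names from the design doc. It also assumes binlog file names are zero-padded so that string comparison matches log order. Since NULL ordering differs between engines, mapping `null` to explicit sentinel values makes snapshot rows sort before any incremental row regardless of the engine default.

```python
import sqlite3

# Hypothetical sample: snapshot rows carry NULL coordinates,
# incremental rows carry a binlog file name and an offset.
SAMPLE_ROWS = [
    (1, "update-a2", "mysql-bin.000003", 620),
    (2, "snapshot-b", None, None),
    (1, "update-a3", "mysql-bin.000004", 4),
    (1, "snapshot-a", None, None),
    (1, "update-a1", "mysql-bin.000003", 154),
]

# COALESCE maps NULL coordinates to sentinels ('' and -1) that sort
# before every real binlog file name and offset.
ORDER_QUERY = """
    SELECT id, val
    FROM changes
    ORDER BY id,
             COALESCE(binlog_file, ''),
             COALESCE(binlog_pos, -1)
"""


def order_changes(rows):
    """Return (id, val) pairs with snapshot rows ahead of binlog rows per key."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE changes ("
            "id INTEGER, val TEXT, binlog_file TEXT, binlog_pos INTEGER)"
        )
        conn.executemany("INSERT INTO changes VALUES (?, ?, ?, ?)", rows)
        return conn.execute(ORDER_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in order_changes(SAMPLE_ROWS):
        print(row)
```

In a real downstream engine the same `COALESCE` expressions would typically go into the ordering clause of whatever deduplication or merge query the user already runs.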
   
   The design doc looks great to me. Feel free to assign this to yourself and 
open a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
