JNSimba opened a new pull request, #63490: URL: https://github.com/apache/doris/pull/63490
## Summary

- Add an optional `server_id` source property for MySQL CDC streaming jobs. It accepts a single value (e.g. `5400`) or a range (e.g. `5400-5408`). When unset, the value is derived from the jobId hash, so existing jobs keep their current server_id when `snapshot_parallelism = 1`.
- Fix a latent collision: when `snapshot_parallelism > 1` and source-side DML happens during snapshot, all parallel `SnapshotSplitReader` instances previously shared the same server_id, and their backfill BinaryLogClient connections kicked each other out of MySQL's dump-thread slot, dropping binlog events between the low and high watermarks. Each subtask now gets a distinct server_id from the resolved range; the single binlog reader uses the range start.
- Cross-field check: reject a `server_id` range whose width is smaller than `snapshot_parallelism` at job startup, with a clear fix-it suggestion.

## Test plan

- [x] `ConfigUtilTest`: 15 cases covering default-derive (Integer.MAX_VALUE hash clamp, hash=0 bump), user single value / range, malformed input, blank input, range-vs-parallelism width check, non-positive parallelism.
- [ ] `test_streaming_mysql_job_server_id.groovy`: 4 rejection cases (format / zero / backward / width) under `offset=latest` for synchronous CREATE JOB feedback, plus 3 happy-path cases verifying snapshot data syncs under single value / range / default-derive.
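For reviewers, the resolution rules described in the summary can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class `ServerIdRangeSketch`, its method names, the exact hash formula, the way the default range is clamped, and the treatment of blank input as "unset" are all assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical sketch of server_id resolution; not the actual Doris implementation. */
final class ServerIdRangeSketch {
    private ServerIdRangeSketch() {}

    /** Returns {start, end} (inclusive) for a single value, a range, or an unset property. */
    static long[] resolveRange(String serverId, long jobId, int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException(
                    "snapshot_parallelism must be positive, got " + parallelism);
        }
        if (serverId == null || serverId.trim().isEmpty()) {
            // Assumed default: non-negative jobId hash, bumped off 0 because
            // a replica cannot use server_id 0.
            long start = Long.hashCode(jobId) & Integer.MAX_VALUE;
            if (start == 0) {
                start = 1;
            }
            // Assumed clamp: keep the whole derived range within Integer.MAX_VALUE.
            // With parallelism = 1 this leaves the derived value unchanged.
            start = Math.min(start, (long) Integer.MAX_VALUE - parallelism + 1);
            return new long[] {start, start + parallelism - 1};
        }
        String[] parts = serverId.trim().split("-", -1);
        if (parts.length > 2) {
            throw new IllegalArgumentException("Malformed server_id: " + serverId);
        }
        long start = parsePositive(parts[0]);
        long end = parts.length == 2 ? parsePositive(parts[1]) : start;
        if (end < start) {
            throw new IllegalArgumentException("Backward server_id range: " + serverId);
        }
        long width = end - start + 1;
        if (width < parallelism) {
            throw new IllegalArgumentException(
                    "server_id range width " + width + " is smaller than snapshot_parallelism "
                            + parallelism + "; widen the range or lower the parallelism");
        }
        return new long[] {start, end};
    }

    /** Each snapshot subtask takes start + index; the binlog reader uses the range start. */
    static long serverIdForSubtask(long[] range, int subtaskIndex) {
        long id = range[0] + subtaskIndex;
        if (subtaskIndex < 0 || id > range[1]) {
            throw new IllegalArgumentException("Subtask index out of range: " + subtaskIndex);
        }
        return id;
    }

    private static long parsePositive(String token) {
        // Long.parseLong throws NumberFormatException (an IllegalArgumentException)
        // on empty or non-numeric tokens.
        long value = Long.parseLong(token.trim());
        if (value <= 0) {
            throw new IllegalArgumentException("server_id must be positive, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        long[] range = resolveRange("5400-5408", 42L, 4);
        System.out.println(Arrays.toString(range));
        System.out.println(serverIdForSubtask(range, 3));
    }
}
```

Under these assumptions a single value such as `5400` is treated as a range of width 1, so it is only accepted when `snapshot_parallelism = 1`, which matches the cross-field check described above.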
