JNSimba opened a new pull request, #63490:
URL: https://github.com/apache/doris/pull/63490

   ## Summary
   
   - Add an optional `server_id` source property for MySQL CDC streaming jobs. 
Accepts a single value (e.g. `5400`) or a range (e.g. `5400-5408`). When unset, 
the value is derived from the jobId hash so existing jobs keep their current 
server_id when `snapshot_parallelism = 1`.
   - Fix a latent collision: when `snapshot_parallelism > 1` and source-side 
DML happens during the snapshot, all parallel `SnapshotSplitReader` instances 
previously shared the same server_id, so their backfill BinaryLogClient 
connections kicked each other out of MySQL's dump-thread slot, dropping binlog 
events between the low and high watermarks. Each subtask now gets a distinct 
server_id from the resolved range; the single binlog reader uses the range 
start.
   - Cross-field check: reject a `server_id` range narrower than 
`snapshot_parallelism` at job startup, with a clear fix-it suggestion.
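
The resolution rules in the summary above can be sketched roughly as follows. This is an illustrative Python model only, not the code in this PR: the names `ServerIdRange`, `parse_server_id`, `check_width`, and `server_id_for_subtask` are hypothetical, the `start + subtask_index` mapping is an assumption (the summary only says each subtask gets a distinct id from the range), and the default derivation from the jobId hash is not modeled.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerIdRange:
    """Inclusive server_id range; a single value is a range of width 1."""
    start: int
    end: int

    @property
    def width(self) -> int:
        return self.end - self.start + 1


# Accepts "5400" or "5400-5408" (ASCII digits only).
_SERVER_ID_RE = re.compile(r"(\d+)(?:-(\d+))?", re.ASCII)


def parse_server_id(raw: str) -> ServerIdRange:
    """Parse a user-supplied server_id property (hypothetical helper)."""
    text = raw.strip()
    if not text:
        raise ValueError("server_id must not be blank")
    match = _SERVER_ID_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed server_id: {raw!r}")
    start = int(match.group(1))
    # A single value has no second group, so the range collapses to one id.
    end = int(match.group(2)) if match.group(2) is not None else start
    if start <= 0:
        raise ValueError("server_id must be positive")
    if end < start:
        raise ValueError("server_id range is backward")
    return ServerIdRange(start, end)


def check_width(ids: ServerIdRange, snapshot_parallelism: int) -> None:
    """Cross-field check: the range must cover every parallel subtask."""
    if snapshot_parallelism <= 0:
        raise ValueError("snapshot_parallelism must be positive")
    if ids.width < snapshot_parallelism:
        suggested_end = ids.start + snapshot_parallelism - 1
        raise ValueError(
            f"server_id range has {ids.width} id(s) but snapshot_parallelism "
            f"is {snapshot_parallelism}; widen it to at least "
            f"{ids.start}-{suggested_end}"
        )


def server_id_for_subtask(ids: ServerIdRange, subtask_index: int) -> int:
    """Assumed mapping: subtask i takes the i-th id of the range."""
    if not 0 <= subtask_index < ids.width:
        raise ValueError("subtask index falls outside the server_id range")
    return ids.start + subtask_index


def binlog_reader_server_id(ids: ServerIdRange) -> int:
    """Per the summary, the single binlog reader uses the range start."""
    return ids.start
```

With `server_id = "5400-5408"` and `snapshot_parallelism = 4`, this sketch would hand ids 5400 through 5403 to the four snapshot subtasks, while `server_id = "5400"` with the same parallelism would be rejected by `check_width`.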
   
   ## Test plan
   
   - [x] `ConfigUtilTest`: 15 cases covering default-derive (Integer.MAX_VALUE 
hash clamp, hash=0 bump), user single value / range, malformed input, blank 
input, range-vs-parallelism width check, non-positive parallelism.
   - [ ] `test_streaming_mysql_job_server_id.groovy`: 4 rejection cases (format 
/ zero / backward / width) under `offset=latest` for synchronous CREATE JOB 
feedback, plus 3 happy-path cases verifying snapshot data syncs under single 
value / range / default-derive.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

