JNSimba opened a new pull request, #64511:
URL: https://github.com/apache/doris/pull/64511

   ### What problem does this PR solve?
   
   The cdc_client builds debezium's `ChangeEventQueue` with only a count-based 
bound (`max.queue.size=8192`) while the byte bound (`max.queue.size.in.bytes`) 
defaults to `0` (disabled). With wide rows (e.g. ~2MB each), the in-memory 
queue can grow to `2MB * 8192 ≈ 16GB` and OOM the process. Both PostgreSQL and 
MySQL paths build the queue from `getMaxQueueSizeInBytes()`, so a single 
property covers both, and it applies to both the snapshot and streaming phases.
   
   ### What this PR does
   
   Set a heap-adaptive byte cap on the queue buffer in 
`ConfigUtil.getDefaultDebeziumProps()`, which is shared by the Postgres and 
MySQL source readers:
   
   - Default cap is `clamp(heap/16, 64MB, 256MB)`: heap 1G -> 64MB, 2G -> 
128MB, >= 4G -> 256MB.
   - The cap is intentionally conservative because a single cdc_client JVM can 
run many queues concurrently (one per split, across multiple jobs), and the 
real batching/backpressure happens downstream in the sink rather than in this 
queue.
   - Escape hatch: `-Dcdc.max.queue.size.in.bytes=<bytes>` overrides the 
adaptive value (absolute bytes; `<= 0` disables the byte bound).
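
The sizing rule in the list above can be sketched as follows. This is a minimal illustration, not the code in `ConfigUtil.getDefaultDebeziumProps()`: the class and method names are hypothetical, the heap size is assumed to come from `Runtime.getRuntime().maxMemory()`, and the handling of a malformed override (here `Long.parseLong` simply throws) is an assumption.

```java
// Hypothetical sketch of the heap-adaptive byte cap described in this PR.
final class QueueByteCapSketch {
    static final long MB = 1024L * 1024L;
    static final long MIN_CAP = 64 * MB;   // lower clamp from the PR description
    static final long MAX_CAP = 256 * MB;  // upper clamp from the PR description
    // System property named in the PR description.
    static final String OVERRIDE_PROP = "cdc.max.queue.size.in.bytes";

    /** clamp(heap / 16, 64MB, 256MB) */
    static long adaptiveCap(long maxHeapBytes) {
        return Math.max(MIN_CAP, Math.min(MAX_CAP, maxHeapBytes / 16));
    }

    /**
     * Returns the value to use for max.queue.size.in.bytes.
     * An explicit override wins; a value <= 0 maps to 0, which disables the byte bound.
     */
    static long resolveCap(String override, long maxHeapBytes) {
        if (override != null && !override.trim().isEmpty()) {
            long value = Long.parseLong(override.trim()); // assumption: fail fast on bad input
            return value <= 0 ? 0L : value;
        }
        return adaptiveCap(maxHeapBytes);
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        long cap = resolveCap(System.getProperty(OVERRIDE_PROP), heap);
        System.out.println("max.queue.size.in.bytes=" + cap);
    }
}
```

Running it with, for example, `-Dcdc.max.queue.size.in.bytes=33554432` would print the override instead of the heap-derived value.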
   
   Narrow tables are unaffected: as long as rows average under 8KB 
(64MB / 8192), 8192 rows stay below even the minimum 64MB cap, so the count 
bound is reached first and behavior is unchanged.
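
To make the interaction between the two bounds concrete, the sketch below estimates which one is reached first for a given average row size. The class and method names are hypothetical, and the row sizes are illustrative; the actual per-event size accounting is done by debezium and may differ from a plain row-size multiplication.

```java
// Hypothetical sketch: which queue bound is reached first for a given row size.
final class QueueBoundSketch {
    static final int MAX_QUEUE_SIZE = 8192; // count bound from the PR description

    /** Approximate bytes buffered when only the count bound applies. */
    static long countBoundBytes(long avgRowBytes) {
        return MAX_QUEUE_SIZE * avgRowBytes;
    }

    /** True if the byte cap is hit before the queue holds MAX_QUEUE_SIZE rows. */
    static boolean byteCapBindsFirst(long avgRowBytes, long capBytes) {
        return capBytes > 0 && countBoundBytes(avgRowBytes) > capBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long cap = 64 * mb; // minimum adaptive cap
        // Wide rows (~2MB): the count bound alone would allow about 16GB.
        System.out.println(countBoundBytes(2 * mb) / (1024 * mb)); // prints 16
        System.out.println(byteCapBindsFirst(2 * mb, cap));        // prints true
        // Narrow rows (1KB): 8192 rows are about 8MB, so the count bound is hit first.
        System.out.println(byteCapBindsFirst(1024, cap));          // prints false
    }
}
```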
   
   ### Release note
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

