JNSimba opened a new pull request, #63833:
URL: https://github.com/apache/doris/pull/63833

   ## Summary
   - Skip flink-cdc's in-snapshot backfill by default on the from-to path, so 
large splits no longer accumulate the entire chunk plus its backfill stream in 
the fetcher's `outputBuffer`. The from-to path is at-least-once and tolerates 
the duplicates this introduces. TVF (job-driven and standalone) keeps the 
standard `false` default, preserving exactly-once semantics via per-task offset 
commit.
   - Expose `skip_snapshot_backfill` as a user-facing property with strict 
`true`/`false` validation on both the from-to (`CREATE JOB`) and TVF 
(`SELECT FROM cdc_stream(...)`) entry points.
   - Fix snapshot completion under `pollWithoutBuffer`: a split is now marked 
complete only after its high-watermark event has been consumed 
(`splitState.getHighWatermark() != null`), not on the first non-empty fetcher 
batch. Without this fix, enabling the new default truncates any split larger 
than Debezium's `max.batch.size` and causes an NPE during offset extraction.
   - Read `streaming_task_timeout_multiplier` on each call to 
`StreamingMultiTblTask.isTimeout()`, so that `admin set frontend config` also 
affects already-running tasks, matching the `@ConfField(mutable=true)` contract.
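
   A minimal sketch of how the per-path default and the strict validation of 
`skip_snapshot_backfill` described above could fit together. 
`SkipSnapshotBackfillOption`, `EntryPoint` and `resolve` are hypothetical names, 
`IllegalArgumentException` stands in for whatever analysis error Doris actually 
raises, and the exact, case-sensitive match is an assumption: the PR does not 
say whether `TRUE` is accepted.

```java
import java.util.Map;

/**
 * Hypothetical sketch: resolving skip_snapshot_backfill from user properties.
 * The class, enum and exception type are illustrative, not Doris code.
 */
public class SkipSnapshotBackfillOption {

    static final String KEY = "skip_snapshot_backfill";

    /** The two entry points described in the PR. */
    enum EntryPoint { FROM_TO_JOB, TVF }

    static boolean resolve(EntryPoint entryPoint, Map<String, String> properties) {
        String raw = properties.get(KEY);
        if (raw == null) {
            // Per-path default: from-to is at-least-once and tolerates duplicates,
            // so it skips backfill; TVF keeps false to stay exactly-once.
            return entryPoint == EntryPoint.FROM_TO_JOB;
        }
        // Strict validation: Boolean.parseBoolean("foo") would silently return false.
        switch (raw) {
            case "true":
                return true;
            case "false":
                return false;
            default:
                throw new IllegalArgumentException(
                        KEY + " must be 'true' or 'false', got '" + raw + "'");
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(EntryPoint.FROM_TO_JOB, Map.of()));    // true
        System.out.println(resolve(EntryPoint.TVF, Map.of()));            // false
        System.out.println(resolve(EntryPoint.TVF, Map.of(KEY, "true"))); // true
        try {
            resolve(EntryPoint.FROM_TO_JOB, Map.of(KEY, "foo"));
        } catch (IllegalArgumentException e) {
            // prints: skip_snapshot_backfill must be 'true' or 'false', got 'foo'
            System.out.println(e.getMessage());
        }
    }
}
```

Rejecting `foo` outright matters because `Boolean.parseBoolean` maps every 
string other than `true` (ignoring case) to `false`, which would silently turn 
a typo into the TVF default.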
   
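   The high-watermark fix, and what the fat-split tests check, can be 
illustrated with a simplified model. This is a hypothetical sketch, not 
flink-cdc's fetcher API: it assumes a split's rows arrive in batches of at most 
`max.batch.size` records, that watermark events do not count toward that limit, 
and that the high watermark arrives with the final batch. With 2100 rows and a 
batch size of 2048, completing on the first non-empty batch stops at row 2048, 
while waiting for the high watermark reads all 2100 rows, including the 52 rows 
in `id BETWEEN 2049 AND 2100`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model contrasting two snapshot-split completion rules.
 * Batch and the drain methods are illustrative names, not flink-cdc classes.
 */
public class SnapshotCompletionSketch {

    /** One fetcher batch: its row ids and whether the split's high watermark came with it. */
    record Batch(List<Integer> rowIds, boolean carriesHighWatermark) {}

    /** Splits rows 1..totalRows into batches of at most maxBatchSize; the last carries the high watermark. */
    static List<Batch> fetchBatches(int totalRows, int maxBatchSize) {
        List<Batch> batches = new ArrayList<>();
        for (int start = 1; start <= totalRows; start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize - 1, totalRows);
            List<Integer> ids = new ArrayList<>();
            for (int id = start; id <= end; id++) {
                ids.add(id);
            }
            batches.add(new Batch(ids, end == totalRows));
        }
        return batches;
    }

    /** Old rule: treat the split as complete after the first non-empty batch. */
    static List<Integer> drainUntilFirstNonEmptyBatch(List<Batch> batches) {
        List<Integer> emitted = new ArrayList<>();
        for (Batch batch : batches) {
            emitted.addAll(batch.rowIds());
            if (!batch.rowIds().isEmpty()) {
                break; // premature completion: later batches of this split are never read
            }
        }
        return emitted;
    }

    /** New rule: keep draining until the high watermark has been consumed. */
    static List<Integer> drainUntilHighWatermark(List<Batch> batches) {
        List<Integer> emitted = new ArrayList<>();
        for (Batch batch : batches) {
            emitted.addAll(batch.rowIds());
            if (batch.carriesHighWatermark()) {
                break; // stands in for splitState.getHighWatermark() != null
            }
        }
        return emitted;
    }

    static long countBetween(List<Integer> ids, int low, int high) {
        return ids.stream().filter(id -> id >= low && id <= high).count();
    }

    public static void main(String[] args) {
        List<Batch> batches = fetchBatches(2100, 2048);
        List<Integer> oldRule = drainUntilFirstNonEmptyBatch(batches);
        List<Integer> newRule = drainUntilHighWatermark(batches);
        // prints "old rule: 2048 rows, 0 in 2049..2100"
        System.out.println("old rule: " + oldRule.size() + " rows, "
                + countBetween(oldRule, 2049, 2100) + " in 2049..2100");
        // prints "new rule: 2100 rows, 52 in 2049..2100"
        System.out.println("new rule: " + newRule.size() + " rows, "
                + countBetween(newRule, 2049, 2100) + " in 2049..2100");
    }
}
```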
   ## Test plan
   - [ ] `mvn compile` passes for `fe-core` and `cdc_client`
   - [ ] New `test_streaming_postgres_job_snapshot_fat_split` / 
`test_streaming_mysql_job_snapshot_fat_split` pass: 2100 rows with 
`snapshot_split_size=3000` (single split exceeds `max.batch.size=2048`), 
asserting count=2100, distinct=2100, `id BETWEEN 2049 AND 2100`=52, and 
post-snapshot DML still flows
   - [ ] Existing `test_streaming_*_id_gap_completeness` / 
`test_streaming_*_snapshot` / `test_streaming_*_async_split*` regressions 
still pass
   - [ ] Validator rejects `skip_snapshot_backfill=foo` at SQL analysis on 
both CREATE JOB and `cdc_stream` TVF
   - [ ] `admin set frontend config 
("streaming_task_timeout_multiplier"="N")` while a from-to task is running 
takes effect on the running task
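
   The last check depends on `isTimeout()` reading the multiplier when it is 
called rather than using a value fixed earlier in the task's life. A minimal 
sketch of the difference, assuming (not confirmed by the PR) that the old code 
captured the multiplier when the task was created. `Config` and `Task` are 
hypothetical stand-ins for Doris's `Config` class and `StreamingMultiTblTask`, 
and the timeout formula `baseTimeoutMillis * multiplier` is an illustration only.

```java
/**
 * Hypothetical sketch: captured vs. live reads of a mutable config value.
 * Not Doris code; names mirror the PR only for readability.
 */
public class LiveConfigTimeoutSketch {

    /** Stand-in for a config field annotated @ConfField(mutable = true). */
    static final class Config {
        // volatile so a change made by an admin thread is visible to task threads
        static volatile int streaming_task_timeout_multiplier = 2;
    }

    static final class Task {
        private final long startMillis;
        private final long baseTimeoutMillis;
        // Copied once at construction, so later config changes never reach it.
        private final int capturedMultiplier;

        Task(long startMillis, long baseTimeoutMillis) {
            this.startMillis = startMillis;
            this.baseTimeoutMillis = baseTimeoutMillis;
            this.capturedMultiplier = Config.streaming_task_timeout_multiplier;
        }

        /** Captured read: ignores later config changes for a running task. */
        boolean isTimeoutCaptured(long nowMillis) {
            return nowMillis - startMillis > baseTimeoutMillis * capturedMultiplier;
        }

        /** Live read: picks up the current multiplier on every call. */
        boolean isTimeoutLive(long nowMillis) {
            return nowMillis - startMillis
                    > baseTimeoutMillis * Config.streaming_task_timeout_multiplier;
        }
    }

    public static void main(String[] args) {
        Task task = new Task(0L, 1_000L);                    // created with multiplier 2
        Config.streaming_task_timeout_multiplier = 5;         // admin raises it mid-run
        System.out.println(task.isTimeoutCaptured(2_500L));  // true: limit still 2000 ms
        System.out.println(task.isTimeoutLive(2_500L));      // false: limit now 5000 ms
    }
}
```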


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
