[I] [Bug](load) INSERT SELECT data invisible after quorum success with cancelled node channel [doris]

via GitHub Mon, 30 Mar 2026 20:28:41 -0700


xiaobijuan2026 opened a new issue, #61916:
URL: https://github.com/apache/doris/issues/61916


   ### Problem
   When `INSERT INTO ... SELECT ...` writes to multiple replicas, if one node 
channel is slow and times out during `close_wait`, it gets cancelled but NOT 
marked as failed. This causes:
   
   1. `close_wait` returns OK even though a node channel was cancelled
   2. FE is unaware of the failure and commits the transaction
   3. The PUBLISH_VERSION task is sent to ALL nodes, including the cancelled one
   4. The cancelled node can't find the rowset, so publish fails on that node
   5. Data stays COMMITTED but not VISIBLE for a long time (30+ minutes, until 
retry)
   
   ### Root Cause
   In `IndexChannel::close_wait()` (vtablet_writer.cpp), when unfinished node 
channels are cancelled due to timeout, `mark_as_failed()` is not called. FE 
receives no error tablet info for the cancelled replicas.
   
   ### Fix
   When `close_wait` times out and cancels the unfinished node channels:
   1. Call `mark_as_failed()` to record failed tablets
   2. Call `check_intolerable_failure()` - if failures exceed tolerance, fail 
the entire load
   3. Call `set_error_tablet_in_state()` to propagate error info to FE
   
   This allows FE to skip the failed replicas during PUBLISH_VERSION, so that:
   - Data becomes visible immediately on healthy replicas
   - The background TabletScheduler auto-repairs the failed replica
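
The three fix steps can be sketched as a small, self-contained model. This is not the actual Doris code: `NodeChannel`, `CloseResult`, `IndexChannelModel` and `close_wait_on_timeout()` are hypothetical stand-ins, and the tolerance rule (a tablet needs a strict majority of healthy replicas) is an assumption chosen to match the behavior table in this issue.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the BE classes; not the real Doris types.
struct NodeChannel {
    int64_t node_id;
    std::vector<int64_t> tablet_ids;  // tablets with a replica on this node
    bool finished;                    // close completed before the timeout
    bool cancelled;
};

struct CloseResult {
    bool ok;  // false means the whole load must fail
    // (tablet_id, node_id) pairs to report to FE as failed replicas
    std::vector<std::pair<int64_t, int64_t>> error_tablets;
};

struct IndexChannelModel {
    int num_replicas;  // replicas per tablet
    std::vector<NodeChannel> channels;
    std::map<int64_t, std::set<int64_t>> failed_nodes_by_tablet;

    // Step 1: record every tablet replica hosted by the failed node.
    void mark_as_failed(const NodeChannel& ch) {
        for (int64_t tablet_id : ch.tablet_ids) {
            failed_nodes_by_tablet[tablet_id].insert(ch.node_id);
        }
    }

    // Step 2 (assumed rule): each tablet needs a strict majority of good replicas.
    bool has_intolerable_failure() const {
        const int quorum = num_replicas / 2 + 1;
        for (const auto& entry : failed_nodes_by_tablet) {
            const int succeeded =
                num_replicas - static_cast<int>(entry.second.size());
            if (succeeded < quorum) return true;
        }
        return false;
    }

    // Models what happens once the close_wait deadline has passed.
    CloseResult close_wait_on_timeout() {
        assert(num_replicas > 0);
        for (NodeChannel& ch : channels) {
            if (ch.finished) continue;
            ch.cancelled = true;
            mark_as_failed(ch);  // the call this issue reports as missing
        }
        CloseResult result;
        result.ok = !has_intolerable_failure();
        // Step 3: expose the failed replicas so the caller can forward them to FE.
        for (const auto& entry : failed_nodes_by_tablet) {
            for (int64_t node_id : entry.second) {
                result.error_tablets.push_back(
                    std::make_pair(entry.first, node_id));
            }
        }
        return result;
    }
};
```

In this model, a cancelled channel always appears in `error_tablets`, so a caller never sees a plain OK for a replica that did not finish writing.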
   
   ### Behavior after fix
   | Scenario | Successful replicas | Result |
   |----------|---------------------|--------|
   | 3 replicas, 1 timeout | 2/3 | ✅ Publish succeeds, failed replica auto-repairs |
   | 3 replicas, 2 timeout | 1/3 | ❌ Load fails, user gets error |
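
The two rows above are consistent with a strict-majority rule. As a rough check of the arithmetic, the sketch below assumes the load is tolerated only while more than half of the replicas succeed. Both helper names are hypothetical and are not part of Doris.

```cpp
#include <cassert>

// Hypothetical helper: how many replica failures a strict-majority rule tolerates.
int max_tolerable_failures(int num_replicas) {
    assert(num_replicas > 0);
    const int quorum = num_replicas / 2 + 1;  // strict majority
    return num_replicas - quorum;
}

// Hypothetical helper: true when the load can still be committed and published.
bool load_succeeds(int num_replicas, int timed_out_replicas) {
    assert(timed_out_replicas >= 0 && timed_out_replicas <= num_replicas);
    return timed_out_replicas <= max_tolerable_failures(num_replicas);
}
```

With 3 replicas the quorum is 2, so one timeout is tolerated and two are not, matching the table.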
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


