walterzhaoJR opened a new issue, #3243: URL: https://github.com/apache/brpc/issues/3243
**Describe the bug**

release_tls_block() and release_tls_block_chain() in the IOBuf TLS block caching layer do not guard against a block being returned to TLS when it is already the TLS list head. This can create a self-referencing cycle (b->u.portal_next == b), so that any subsequent traversal of the TLS chain, such as remove_tls_block_chain() (registered via thread_atexit) or share_tls_block(), loops forever and hangs the thread permanently.

In src/butil/iobuf_inl.h, release_tls_block():

<img width="705" height="195" alt="Image" src="https://github.com/user-attachments/assets/98135c5d-5066-4228-8fe2-77811d25881e" />

When b is already tls_data->block_head, the assignment b->u.portal_next = tls_data->block_head becomes b->u.portal_next = b, forming a single-node cycle.

Similarly, in src/butil/iobuf.cpp, release_tls_block_chain():

<img width="699" height="147" alt="Image" src="https://github.com/user-attachments/assets/65f10488-428f-409e-ac6f-1846a99b8325" />

If the chain being returned contains blocks that overlap with the existing TLS head, last_b->u.portal_next can point back to first_b (which may be last_b itself), again forming a cycle that never terminates.

**How the Double-Return Happens**

IOBufAsZeroCopyOutputStream::BackUp() calls iobuf::release_tls_block(_cur_block) to eagerly return the block to TLS so other code can reuse it:

<img width="705" height="110" alt="Image" src="https://github.com/user-attachments/assets/0e1a9f13-113c-4942-9865-b5be8c06e63b" />

After BackUp(), the block is tls_data->block_head. If a subsequent operation (e.g., _release_block() during destruction of IOBufAsZeroCopyOutputStream, or a BackUp in IOBufAsSnappySink) calls release_tls_block() again with the same block pointer (obtained from a still-live BlockRef), the block is returned a second time, which triggers the self-loop.
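The double-return described above can be reproduced in isolation with a simplified model of the TLS list. This is only a sketch under stated assumptions: `Block`, `TlsCache`, `release_unguarded`, `release_guarded`, and `terminates_within` are hypothetical stand-ins, not brpc types or functions. The real release_tls_block() contains additional logic that is not modeled here, and the guarded variant is one possible check, not necessarily the patch shown in the screenshots.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an IOBuf block; only the intrusive link is
// modeled (the issue refers to it as b->u.portal_next).
struct Block {
    Block* portal_next = nullptr;
};

// Hypothetical stand-in for the per-thread block cache.
struct TlsCache {
    Block* block_head = nullptr;
    int num_blocks = 0;
};

// Unguarded push-front, mirroring the assignment described in the issue.
inline void release_unguarded(TlsCache& tls, Block* b) {
    b->portal_next = tls.block_head;  // if b is already the head, b now points to itself
    tls.block_head = b;
    ++tls.num_blocks;
}

// Guarded variant (an assumption, for contrast): reject a block that is
// already the list head. Returns false when the release is rejected.
inline bool release_guarded(TlsCache& tls, Block* b) {
    if (b == tls.block_head) {
        return false;  // double-return detected; leave the list untouched
    }
    b->portal_next = tls.block_head;
    tls.block_head = b;
    ++tls.num_blocks;
    return true;
}

// Walks the list for at most `limit` steps. Returns true if the end
// (nullptr) was reached, false if the walk was still going, which
// suggests a cycle.
inline bool terminates_within(const Block* head, std::size_t limit) {
    for (std::size_t i = 0; i < limit; ++i) {
        if (head == nullptr) {
            return true;
        }
        head = head->portal_next;
    }
    return head == nullptr;
}
```

Releasing the same block twice through `release_unguarded` leaves `portal_next` pointing at the block itself, so an unbounded traversal would never reach the end of the list. Note that a head-only check catches only the case where the block is still the list head. A block that was returned twice with other blocks released in between would need a different check.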
**Impact**

- Thread hangs permanently in remove_tls_block_chain() (called at thread exit via thread_atexit), or in share_tls_block() / release_tls_block_chain() during normal I/O.
- The hang is silent (no crash, no log, no error), making it extremely difficult to diagnose in production.
- Any brpc application using protobuf serialization over IOBuf (which internally uses IOBufAsZeroCopyOutputStream) is potentially affected.

**To Reproduce**

**Expected behavior**

**Versions**
OS:
Compiler:
brpc:
protobuf:

**Additional context/screenshots**

**Suggested Fix**

1. Guard release_tls_block() against double-return

<img width="709" height="466" alt="Image" src="https://github.com/user-attachments/assets/5d8b7da1-1ff2-44ae-b63e-2283fcc3038b" />

2. Guard release_tls_block_chain() against self-loop after linking

<img width="683" height="279" alt="Image" src="https://github.com/user-attachments/assets/75b1f6bc-36e1-474f-af61-84ae76636994" />

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
