[ https://issues.apache.org/jira/browse/RATIS-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz-wo Sze updated RATIS-2415:
------------------------------
Attachment: 1356_review.patch
> Fix queue corruption in NettyRpcProxy when request sending fails
> -----------------------------------------------------------------
>
> Key: RATIS-2415
> URL: https://issues.apache.org/jira/browse/RATIS-2415
> Project: Ratis
> Issue Type: Bug
> Reporter: Shilun Fan
> Assignee: Shilun Fan
> Priority: Major
> Attachments: 1356_review.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> *Summary*
> NettyRpcProxy.Connection.offer() has a bug where a CompletableFuture is
> added to the replies queue before writeAndFlush() is called. If
> writeAndFlush() throws an AlreadyClosedException (or fails asynchronously),
> the future remains in the queue, causing memory leaks and reply mismatches.
>
> *Root Cause*
> {code:java}
> synchronized ChannelFuture offer(...) {
>   replies.offer(reply);                 // Step 1: enqueue the reply future
>   return client.writeAndFlush(request); // Step 2: may throw an exception
> }
> {code}
> If Step 2 fails, Step 1 is not rolled back: the future for the failed request
> stays in the queue and no longer lines up with the requests actually sent.
>
> *Reproduction Scenario*
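One possible fix, sketched under assumptions: roll back Step 1 whenever Step 2 fails, both when writeAndFlush() throws and when the future it returns later fails. The names below (OfferRollbackSketch, Channel, Connection, pendingAfterFailure) are hypothetical simplifications, not the Ratis API; a CompletableFuture<Void> stands in for Netty's ChannelFuture, failures are modeled as unchecked exceptions for brevity, and this is not necessarily what the attached patch does.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

public class OfferRollbackSketch {
  /** Hypothetical stand-in for the Netty client; CompletableFuture<Void> models ChannelFuture. */
  interface Channel {
    CompletableFuture<Void> writeAndFlush(String request);
  }

  /** Hypothetical, simplified connection: replies are matched to requests in FIFO order. */
  static final class Connection {
    private final Queue<CompletableFuture<String>> replies = new ArrayDeque<>();
    private final Channel client;

    Connection(Channel client) {
      this.client = client;
    }

    synchronized CompletableFuture<Void> offer(String request, CompletableFuture<String> reply) {
      replies.offer(reply);                      // Step 1: enqueue
      final CompletableFuture<Void> written;
      try {
        written = client.writeAndFlush(request); // Step 2: may throw
      } catch (RuntimeException e) {
        replies.remove(reply);                   // synchronous failure: undo Step 1
        reply.completeExceptionally(e);
        throw e;
      }
      written.whenComplete((ignored, error) -> {
        if (error != null) {                     // asynchronous failure: undo Step 1
          synchronized (this) {
            replies.remove(reply);
          }
          reply.completeExceptionally(error);
        }
      });
      return written;
    }

    synchronized CompletableFuture<String> pollReply() {
      return replies.poll();                     // head of the queue = oldest outstanding request
    }

    synchronized int pending() {
      return replies.size();
    }
  }

  /** Sends request1..request3 where request2 fails; returns how many futures are still queued. */
  static int pendingAfterFailure(boolean failAsynchronously) {
    Connection conn = new Connection(request -> {
      if (!request.equals("request2")) {
        return CompletableFuture.completedFuture(null);
      }
      IllegalStateException failure = new IllegalStateException("simulated write failure");
      if (failAsynchronously) {
        return CompletableFuture.failedFuture(failure); // requires Java 9+
      }
      throw failure;
    });
    for (int i = 1; i <= 3; i++) {
      try {
        conn.offer("request" + i, new CompletableFuture<>());
      } catch (IllegalStateException e) {
        // request2's synchronous failure; its future was already removed from the queue
      }
    }
    return conn.pending();
  }

  public static void main(String[] args) {
    System.out.println("sync failure, pending = " + pendingAfterFailure(false));  // prints 2
    System.out.println("async failure, pending = " + pendingAfterFailure(true));  // prints 2
  }
}
```

In both cases only the futures for request1 and request3 remain queued. Note that the asynchronous rollback only helps if the write failure is observed before later replies are polled; a queue matched purely by arrival order may still be sensitive to that timing.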
> 1. Send request1 → success, queue=[future1], network=[request1]
> 2. Send request2 → writeAndFlush throws an exception, queue=[future1,future2], network=[request1]
> 3. Send request3 → success, queue=[future1,future2,future3], network=[request1,request3]
> 4. Server returns response1 and response3
> 5. Client receives response1 → pollReply() gets future1 ✅
> 6. Client receives response3 → pollReply() gets future2 ❌ (mismatch!)
> 7. future3 never completes and eventually times out
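The scenario above can be replayed in a few lines to confirm the off-by-one matching. This is a hypothetical, self-contained simulation (ReplyMismatchDemo, replay() and describe() are illustrative names): it models the replies queue as an ArrayDeque matched in FIFO order, an assumption based on the pollReply() behavior described in steps 5 and 6, and compares leaving request2's future queued (the reported bug) with removing it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ReplyMismatchDemo {
  /** Replays steps 1-6; returns the futures for request1..request3 after the responses arrive. */
  static List<CompletableFuture<String>> replay(boolean rollback) {
    ArrayDeque<CompletableFuture<String>> queue = new ArrayDeque<>();
    List<String> network = new ArrayList<>();
    List<CompletableFuture<String>> futures = new ArrayList<>();
    for (int i = 1; i <= 3; i++) {
      CompletableFuture<String> f = new CompletableFuture<>();
      futures.add(f);
      queue.offer(f);                          // Step 1: enqueue
      if (i == 2) {                            // request2's writeAndFlush throws
        if (rollback) {
          queue.removeLastOccurrence(f);       // hypothetical fix: undo Step 1
          f.completeExceptionally(new IllegalStateException("write failed"));
        }                                      // without rollback, future2 stays queued
      } else {
        network.add("request" + i);            // Step 2 succeeded: request is on the wire
      }
    }
    // The server answers each request that actually reached it, in order.
    for (String request : network) {
      String response = request.replace("request", "response");
      queue.poll().complete(response);         // FIFO matching, as pollReply() is assumed to do
    }
    return futures;
  }

  static String describe(CompletableFuture<String> f) {
    if (!f.isDone()) return "pending";
    if (f.isCompletedExceptionally()) return "failed";
    return f.join();
  }

  public static void main(String[] args) {
    for (boolean rollback : new boolean[] {false, true}) {
      List<CompletableFuture<String>> f = replay(rollback);
      System.out.println((rollback ? "with rollback: " : "without rollback: ")
          + "future2=" + describe(f.get(1)) + ", future3=" + describe(f.get(2)));
    }
  }
}
```

Running main() prints `without rollback: future2=response3, future3=pending` followed by `with rollback: future2=failed, future3=response3`, matching steps 6 and 7 for the buggy path.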