wwbmmm commented on issue #2395:
URL: https://github.com/apache/brpc/issues/2395#issuecomment-1770370595

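   Found the cause.
   For a Socket to enter the health-check flow, its reference count must first equal 2. The WaitAndReset method here keeps looping until the reference count equals 2:
   
   https://github.com/apache/brpc/blob/master/src/brpc/details/health_check.cpp#L190
   
   In the scenario of this issue, however, the reference count stays at 3, so the Socket never enters the health check.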
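To make the blocking condition concrete, here is a minimal C++ model of that wait. It assumes only that the loop polls the reference count until it reaches the expected value. `FakeSocket` and `WaitForRefCount` are hypothetical names, not brpc APIs. Unlike the real loop, this one gives up after `max_polls` tries so that the stuck case can be observed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a Socket: only the reference count is modeled.
struct FakeSocket {
    std::atomic<int> nref{0};
};

// Simplified model of the wait described above (not brpc's actual code):
// poll until the reference count equals `expected`. This version stops after
// `max_polls` attempts so the "stuck at 3" case returns instead of hanging.
bool WaitForRefCount(const FakeSocket& s, int expected, int max_polls) {
    for (int i = 0; i < max_polls; ++i) {
        if (s.nref.load(std::memory_order_acquire) == expected) {
            return true;  // the health check could proceed
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return false;  // still waiting: someone holds an extra reference
}
```

With `nref` at 3 the call returns false after `max_polls` polls. With `nref` at 2 it returns true on the first check.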
   The three references come from:
   1. The SocketMap's reference to the Socket. See: https://github.com/apache/brpc/blob/master/src/brpc/socket_map.cpp#L258
   2. The health-check flow's reference to the Socket. See: https://github.com/apache/brpc/blob/master/src/brpc/details/health_check.cpp#L78
   3. The Socket connect flow's reference to the Socket. See: https://github.com/apache/brpc/blob/master/src/brpc/socket.cpp#L1382
   
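   The abnormal one is the third reference, which should be released once the connection fails. According to the code, a failed connection calls back into Socket::AfterAppConnected. That method creates a SocketUniquePtr, and when AfterAppConnected returns, the SocketUniquePtr is destroyed automatically and releases the reference:
   https://github.com/apache/brpc/blob/master/src/brpc/socket.cpp#L1469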
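The release mechanism described above is ordinary RAII. The sketch below is a hypothetical model, not brpc's Socket or SocketUniquePtr. A guard adopts the reference taken by the connect step and gives it back in its destructor. As a result, the count is 3 while the callback body runs and 2 after it returns.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of a reference-counted Socket (not brpc's class).
struct RefCountedSocket {
    std::atomic<int> nref{0};
    void AddRef() { nref.fetch_add(1, std::memory_order_relaxed); }
    void Release() {
        int prev = nref.fetch_sub(1, std::memory_order_acq_rel);
        assert(prev > 0 && "released more references than were taken");
        (void)prev;
    }
};

// Hypothetical RAII guard playing the role SocketUniquePtr plays here:
// it adopts one existing reference and releases it when it leaves scope.
class ScopedRef {
public:
    explicit ScopedRef(RefCountedSocket* s) : s_(s) {}
    ~ScopedRef() { if (s_) s_->Release(); }
    ScopedRef(const ScopedRef&) = delete;
    ScopedRef& operator=(const ScopedRef&) = delete;
private:
    RefCountedSocket* s_;
};

// Hypothetical callback modeled on AfterAppConnected's failure path.
void AfterConnectFailed(RefCountedSocket* s, int* observed_inside) {
    ScopedRef guard(s);                  // adopts the connect-step reference
    *observed_inside = s->nref.load();   // still held while the body runs
}                                        // guard's destructor releases it here
```

After three `AddRef()` calls (SocketMap, health check, connect), `AfterConnectFailed` sees a count of 3 inside and leaves the count at 2.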
   
   In this issue, however, the actual call chain is:
   Socket::AfterAppConnected
   -> Socket::ReleaseAllFailedWriteRequests
   -> Socket::ReturnFailedWriteRequest
   -> bthread_id_error2
   -> Controller::HandleSocketFailed
   -> Controller::OnVersionedRPCReturned
   -> Controller::IssueRPC (retry)
   -> Controller::HandleSendFailed
   -> Controller::OnVersionedRPCReturned
   ... (keeps retrying until the retries are exhausted)
   
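   Because the Controller keeps retrying recursively, Socket::AfterAppConnected cannot return, so the reference is never released.
   Only after the retries are exhausted does Socket::AfterAppConnected return. The Socket's reference count then drops back to 2, and only then does the health check begin.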
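The effect of the recursive retries can be modeled the same way. In this hypothetical sketch, the names only loosely mirror the call chain above and are not brpc functions. Every retry runs synchronously inside the frame that holds the connect-step reference, so each attempt observes a count of 3. The count drops to 2 only after the whole chain unwinds.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical model: only the reference count is tracked.
struct RefCountedSocket {
    std::atomic<int> nref{0};
};

// Releases one adopted reference when the enclosing frame exits.
struct ReleaseOnExit {
    RefCountedSocket& s;
    ~ReleaseOnExit() { s.nref.fetch_sub(1); }
};

// Each failed attempt triggers the next retry from inside the same stack,
// recording the reference count it observes.
void IssueAttempt(RefCountedSocket& s, int retries_left, std::vector<int>& seen) {
    seen.push_back(s.nref.load());
    if (retries_left > 0) {
        IssueAttempt(s, retries_left - 1, seen);  // recursive retry
    }
}

void AfterConnectFailed(RefCountedSocket& s, int max_retry, std::vector<int>& seen) {
    ReleaseOnExit guard{s};             // adopts the connect-step reference
    IssueAttempt(s, max_retry, seen);   // all retries run inside this frame
}                                       // released only here, after every retry returns
```

With `max_retry = 3`, four attempts run. Each sees a count of 3, and the count is 2 only after `AfterConnectFailed` returns.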


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
