hzhaop opened a new pull request, #777:
URL: https://github.com/apache/skywalking-java/pull/777

   This commit addresses two issues related to gRPC connection stability and 
recovery.
   
   1.  **Half-open connections:** In unstable network environments, the agent 
could encounter half-open TCP connections where the server-side connection is 
terminated, but the client-side remains. This would cause the send-queue to 
grow indefinitely without automatic recovery. To resolve this, this change 
introduces gRPC keepalive probes. The agent will now send keepalive pings to 
the collector, ensuring that dead connections are detected and pruned in a 
timely manner. Two new configuration parameters, 
`collector.grpc_keepalive_time` and `collector.grpc_keepalive_timeout`, have 
been added to control this behavior.
   
![sw1](https://github.com/user-attachments/assets/030c2108-4178-42f5-b836-81c15f561384)
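
A minimal sketch of how a keepalive probe detects a half-open connection, assuming both parameters are durations in seconds (the units and the 120/30 values in the usage note are illustrative, not taken from this PR). `KeepaliveSketch` and its methods are hypothetical names that model the idea only. In grpc-java the equivalent knobs are `keepAliveTime` and `keepAliveTimeout` on the channel builder; whether this change maps the new settings onto them in exactly this way is an assumption.

```java
import java.time.Duration;

/** Hypothetical model of a keepalive probe; not the agent's or gRPC's actual code. */
public class KeepaliveSketch {
    private final long keepaliveTimeNanos;    // idle time before a ping is sent
    private final long keepaliveTimeoutNanos; // how long to wait for the ping ack
    private long lastInboundNanos;            // last time anything arrived from the peer
    private long pingSentNanos;               // only meaningful while a ping is outstanding
    private boolean pingOutstanding;

    public KeepaliveSketch(Duration keepaliveTime, Duration keepaliveTimeout, long nowNanos) {
        this.keepaliveTimeNanos = keepaliveTime.toNanos();
        this.keepaliveTimeoutNanos = keepaliveTimeout.toNanos();
        this.lastInboundNanos = nowNanos;
    }

    /** Any inbound data, including a ping ack, proves the peer is still reachable. */
    public void onInbound(long nowNanos) {
        lastInboundNanos = nowNanos;
        pingOutstanding = false;
    }

    /** Called periodically; returns true once the connection should be treated as dead. */
    public boolean tick(long nowNanos) {
        if (pingOutstanding) {
            // On a half-open connection the ack never arrives, so this eventually fires.
            return nowNanos - pingSentNanos >= keepaliveTimeoutNanos;
        }
        if (nowNanos - lastInboundNanos >= keepaliveTimeNanos) {
            // A real client would write an HTTP/2 PING frame at this point.
            pingOutstanding = true;
            pingSentNanos = nowNanos;
        }
        return false;
    }
}
```

With a keepalive time of 120 s and a timeout of 30 s, a silent peer is declared dead roughly 150 s after the last inbound traffic, instead of the send queue growing until something else notices.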
   
   
   2.  **Reconnect logic:** The existing reconnection logic did not immediately 
re-establish a connection if the same backend instance was selected during a 
reconnect attempt. This could lead to a delay of up to an hour before the 
connection was re-established. The logic has been updated to ensure that the 
channel is always shut down and recreated, forcing an immediate reconnection 
attempt regardless of which backend is selected.
   
![sw2](https://github.com/user-attachments/assets/3621b3a6-cba3-4318-abef-3b06a04be331)
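
The updated reconnect behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `ReconnectSketch`, its `Channel` interface, and the backend addresses in the usage note are hypothetical stand-ins for the agent's channel manager and a gRPC `ManagedChannel`.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical model of the reconnect path; names are illustrative only. */
public class ReconnectSketch {
    /** Stand-in for a gRPC channel that can be shut down. */
    public interface Channel {
        void shutdownNow();
    }

    private final List<String> backends;
    private final Function<String, Channel> channelFactory;
    private Channel channel;
    private int channelsCreated;

    public ReconnectSketch(List<String> backends, Function<String, Channel> channelFactory) {
        this.backends = backends;
        this.channelFactory = channelFactory;
    }

    /**
     * Reconnects to the backend at selectedIndex. The old channel is always
     * shut down and a new one created, even if the same backend is selected
     * again, so a broken channel is never silently reused.
     */
    public void reconnect(int selectedIndex) {
        if (channel != null) {
            channel.shutdownNow();
        }
        channel = channelFactory.apply(backends.get(selectedIndex));
        channelsCreated++;
    }

    public int channelsCreated() {
        return channelsCreated;
    }
}
```

Calling `reconnect(0)` twice against a list such as `["oap-a:11800", "oap-b:11800"]` creates two channels and shuts the first one down. The trade-off is an occasional redundant channel rebuild in exchange for never waiting on a stale one.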
   
   

