hzhaop commented on PR #777:
URL: https://github.com/apache/skywalking-java/pull/777#issuecomment-3498721552
Thanks for the feedback @wu-sheng! I've revised the implementation based on
your suggestions:
## Changes Made
### 1. Restored original reconnection logic + Added TRANSIENT_FAILURE
monitoring
**I agree with your point** - the original logic is correct: only force
reconnect when selecting a **different server**. When the **same server** is
selected, we should rely on gRPC's auto-reconnect.
**What I've added:**
- Monitor `ConnectivityState.TRANSIENT_FAILURE` state continuously
- If channel stays in TRANSIENT_FAILURE for too long, force rebuild channel
- Dual safeguard: force reconnect if either `reconnectCount` OR
`transientFailureCount` exceeds threshold
This solves the half-open connection issue without disrupting gRPC's
auto-reconnect mechanism.
**Key code:**
```java
if (index == selectedIdx) {
reconnectCount++;
boolean forceReconnect = reconnectCount > threshold
|| transientFailureCount > threshold;
if (forceReconnect) {
createNewChannel(...); // Force rebuild
} else if (managedChannel.isConnected(false)) {
markAsConnected(); // Let gRPC auto-reconnect work
}
}
```
Also added `keepAliveWithoutCalls(true)` to detect half-open connections.
### 2. Fixed race condition between reportError() and run()
**Issue from heap dump analysis:**
My production heap dump showed `reconnect = false` while listeners were in
DISCONNECT state, proving the race condition exists. **This causes SkyWalking
Agent to permanently stop uploading data.**
**Fix:**
Wrapped all state changes in `synchronized(statusLock)` to make flag updates
and notifications atomic.
```java
synchronized (statusLock) {
reconnect = true;
notify(GRPCChannelStatus.DISCONNECT);
}
```
## Additional Changes
- Adjusted keepalive default from 60s to 120s (reduces overhead)
- Added `getState()` method to GRPCChannel for monitoring
Let me know if this approach looks good!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]