It seems that the issue of the stopped message stream with the long-lived 
RPC has been resolved.

What I have changed:
- The callback handling in `OnReadDone` now runs on a separate thread as 
described in 
https://github.com/grpc/proposal/blob/master/L67-cpp-callback-api.md.
- I have added a channel argument for keep alive messaging:
  args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, 60 * 1000 /*60 sec*/);

I still see channel state changes to TRANSIENT_FAILURE every hour, but gRPC 
recovers from it without any further action on my part. I keep receiving 
messages from the long-lived RPC that was started earlier.

>From the logging (program logging and gRPC logging merged):

Program start:
[2023-10-25 10:17:56.869 Debug] RPC PayloadEventStream: [thread-id=2416]
I1025 10:17:56.872000000  2416 connectivity_state.cc:183] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current 
state: READY

After 1 hour:
[2023-10-25 11:17:56.073 Debug] RPC Authenticate: [thread-id=12240]
I1025 11:17:56.075000000 12240 connectivity_state.cc:183] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current 
state: READY

After 2 hours:
[2023-10-25 12:17:55.301 Debug] RPC Authenticate: [thread-id=12240]
I1025 12:17:55.304000000 12240 connectivity_state.cc:183] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current 
state: READY
I1025 12:17:55.316000000 13072 connectivity_state.cc:160] 
ConnectivityStateTracker client_transport[000001A8EB5DF7E8]: READY -> 
TRANSIENT_FAILURE (got_goaway, UNAVAILABLE: GOAWAY received)
I1025 12:17:55.321000000 13072 connectivity_state.cc:168] 
ConnectivityStateTracker client_transport[000001A8EB5DF7E8]: notifying 
watcher 000001A8EB596AC0: READY -> TRANSIENT_FAILURE
I1025 12:17:55.323000000 13072 connectivity_state.cc:80] watcher 
000001A8EB596AC0: delivering async notification for TRANSIENT_FAILURE 
(UNAVAILABLE: GOAWAY received)
I1025 12:17:55.328000000 13072 connectivity_state.cc:160] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: READY -> IDLE 
(helper, OK)

After 3 hours:
[2023-10-25 13:17:54.459 Debug] RPC Authenticate: [thread-id=12240]
I1025 13:17:54.461000000 12240 connectivity_state.cc:183] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current 
state: IDLE
I1025 13:17:54.464000000 12240 connectivity_state.cc:183] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current 
state: IDLE
I1025 13:17:54.559000000 12240 connectivity_state.cc:160] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: IDLE -> 
CONNECTING (helper, OK)
I1025 13:17:54.597000000 13072 connectivity_state.cc:160] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: CONNECTING -> 
READY (helper, OK)
I1025 13:17:54.602000000 13072 connectivity_state.cc:123] 
ConnectivityStateTracker client_transport[000001A8EB86EB98]: add watcher 
000001A8EB81EFD0
I1025 13:19:55.156000000 13680 connectivity_state.cc:160] 
ConnectivityStateTracker client_transport[000001A8EB86EB98]: READY -> 
TRANSIENT_FAILURE (got_goaway, UNAVAILABLE: GOAWAY received)
I1025 13:19:55.160000000 13680 connectivity_state.cc:168] 
ConnectivityStateTracker client_transport[000001A8EB86EB98]: notifying 
watcher 000001A8EB81EFD0: READY -> TRANSIENT_FAILURE
I1025 13:19:55.162000000 13680 connectivity_state.cc:80] watcher 
000001A8EB81EFD0: delivering async notification for TRANSIENT_FAILURE 
(UNAVAILABLE: GOAWAY received)
I1025 13:19:55.165000000 13680 connectivity_state.cc:160] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: READY -> IDLE 
(helper, OK)
I1025 13:19:55.167000000 13680 connectivity_state.cc:160] 
ConnectivityStateTracker client_transport[000001A8EB86EB98]: 
TRANSIENT_FAILURE -> SHUTDOWN (close_transport, OK)
I1025 13:19:55.169000000 13680 connectivity_state.cc:168] 
ConnectivityStateTracker client_transport[000001A8EB86EB98]: notifying 
watcher 000001A8EB81EFD0: TRANSIENT_FAILURE -> SHUTDOWN
I1025 13:19:55.171000000 13680 connectivity_state.cc:80] watcher 
000001A8EB81EFD0: delivering async notification for SHUTDOWN (OK)

After 4 hours:
[2023-10-25 14:17:53.612 Debug] RPC Authenticate: [thread-id=12240]
I1025 14:17:53.614000000 12240 connectivity_state.cc:183] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current 
state: IDLE
I1025 14:17:53.618000000 12240 connectivity_state.cc:183] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current 
state: IDLE
I1025 14:17:53.709000000 12240 connectivity_state.cc:160] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: IDLE -> 
CONNECTING (helper, OK)
I1025 14:17:53.745000000 13072 connectivity_state.cc:160] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: CONNECTING -> 
READY (helper, OK)
I1025 14:17:53.748000000 13072 connectivity_state.cc:123] 
ConnectivityStateTracker client_transport[000001A8EB8B9DE8]: add watcher 
000001A8EB8844A0
I1025 14:19:54.364000000  6644 connectivity_state.cc:160] 
ConnectivityStateTracker client_transport[000001A8EB8B9DE8]: READY -> 
TRANSIENT_FAILURE (got_goaway, UNAVAILABLE: GOAWAY received)
I1025 14:19:54.368000000  6644 connectivity_state.cc:168] 
ConnectivityStateTracker client_transport[000001A8EB8B9DE8]: notifying 
watcher 000001A8EB8844A0: READY -> TRANSIENT_FAILURE
I1025 14:19:54.371000000  6644 connectivity_state.cc:80] watcher 
000001A8EB8844A0: delivering async notification for TRANSIENT_FAILURE 
(UNAVAILABLE: GOAWAY received)
I1025 14:19:54.374000000  6644 connectivity_state.cc:160] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: READY -> IDLE 
(helper, OK)
I1025 14:19:54.376000000  6644 connectivity_state.cc:160] 
ConnectivityStateTracker client_transport[000001A8EB8B9DE8]: 
TRANSIENT_FAILURE -> SHUTDOWN (close_transport, OK)
I1025 14:19:54.378000000  6644 connectivity_state.cc:168] 
ConnectivityStateTracker client_transport[000001A8EB8B9DE8]: notifying 
watcher 000001A8EB8844A0: TRANSIENT_FAILURE -> SHUTDOWN
I1025 14:19:54.383000000  6644 connectivity_state.cc:80] watcher 
000001A8EB8844A0: delivering async notification for SHUTDOWN (OK)

After 5 hours:
[2023-10-25 15:17:52.843 Debug] RPC Authenticate: [thread-id=12240]
I1025 15:17:52.844000000 12240 connectivity_state.cc:183] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current 
state: IDLE
I1025 15:17:52.847000000 12240 connectivity_state.cc:183] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current 
state: IDLE
I1025 15:17:52.936000000 12240 connectivity_state.cc:160] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: IDLE -> 
CONNECTING (helper, OK)
I1025 15:17:52.975000000 13072 connectivity_state.cc:160] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: CONNECTING -> 
READY (helper, OK)
I1025 15:17:52.978000000 13072 connectivity_state.cc:123] 
ConnectivityStateTracker client_transport[000001A8EB918008]: add watcher 
000001A8E890C760
I1025 15:19:53.547000000 13072 connectivity_state.cc:160] 
ConnectivityStateTracker client_transport[000001A8EB918008]: READY -> 
TRANSIENT_FAILURE (got_goaway, UNAVAILABLE: GOAWAY received)
I1025 15:19:53.552000000 13072 connectivity_state.cc:168] 
ConnectivityStateTracker client_transport[000001A8EB918008]: notifying 
watcher 000001A8E890C760: READY -> TRANSIENT_FAILURE
I1025 15:19:53.555000000 13072 connectivity_state.cc:80] watcher 
000001A8E890C760: delivering async notification for TRANSIENT_FAILURE 
(UNAVAILABLE: GOAWAY received)
I1025 15:19:53.558000000 13072 connectivity_state.cc:160] 
ConnectivityStateTracker client_channel[000001A8EB07F0D8]: READY -> IDLE 
(helper, OK)
I1025 15:19:53.561000000 13072 connectivity_state.cc:160] 
ConnectivityStateTracker client_transport[000001A8EB918008]: 
TRANSIENT_FAILURE -> SHUTDOWN (close_transport, OK)
I1025 15:19:53.565000000 13072 connectivity_state.cc:168] 
ConnectivityStateTracker client_transport[000001A8EB918008]: notifying 
watcher 000001A8E890C760: TRANSIENT_FAILURE -> SHUTDOWN
I1025 15:19:53.569000000 13072 connectivity_state.cc:80] watcher 
000001A8E890C760: delivering async notification for SHUTDOWN (OK)

I still need to find out why that is, but at least I keep receiving 
messages from the stream
Op maandag 23 oktober 2023 om 16:18:02 UTC+2 schreef Paul van der Linden:

> My C++ gRPC client makes a long-lived RPC call at program start. The 
> messages I receive from this through the asynchronous callback API are 
> handled in a separate thread. In the main thread, I make synchronous unary 
> RPC calls with a deadline. After exactly an hour, I see a TRANSIENT_FAILURE 
> in the GRPC_TRACE log on the channel, and then the unary RPC succeeds 
> (likely due to a retry). In my application, I don't notice any issues. 
> However, there is now a problem with the long-lived RPC I had previously 
> started; I am no longer receiving messages from it.
>
> My question is how to handle this situation, as I can find little 
> information about it in the documentation. Should the gRPC library 
> automatically resolve this, or should I monitor the channel state in my 
> application and, in the case of a TRANSIENT_FAILURE, stop and restart the 
> long-lived RPC? Why does this problem always occur after exactly one hour, 
> is this some default client or server timeout?
>
> By the way, I see with 'netstat' that an additional socket has been added 
> after the TRANSIENT_FAILURE.
>

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/e7af6bc1-e158-44bd-86b9-44288c809bcen%40googlegroups.com.

Reply via email to