It seems that the issue of the stopped message stream with the long-lived RPC has been resolved.
What I have changed: - The callback handling in `OnReadDone` now runs on a separate thread as described in https://github.com/grpc/proposal/blob/master/L67-cpp-callback-api.md. - I have added a channel argument for keep alive messaging: args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, 60 * 1000 /*60 sec*/); I still see channel state changes to TRANSIENT_FAILURE every hour, but gRPC recovers from it without any further action on my part. I keep receiving messages from the long-lived RPC that was started earlier. >From the logging (program logging and gRPC logging merged): Program start: [2023-10-25 10:17:56.869 Debug] RPC PayloadEventStream: [thread-id=2416] I1025 10:17:56.872000000 2416 connectivity_state.cc:183] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current state: READY After 1 hour: [2023-10-25 11:17:56.073 Debug] RPC Authenticate: [thread-id=12240] I1025 11:17:56.075000000 12240 connectivity_state.cc:183] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current state: READY After 2 hours: [2023-10-25 12:17:55.301 Debug] RPC Authenticate: [thread-id=12240] I1025 12:17:55.304000000 12240 connectivity_state.cc:183] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current state: READY I1025 12:17:55.316000000 13072 connectivity_state.cc:160] ConnectivityStateTracker client_transport[000001A8EB5DF7E8]: READY -> TRANSIENT_FAILURE (got_goaway, UNAVAILABLE: GOAWAY received) I1025 12:17:55.321000000 13072 connectivity_state.cc:168] ConnectivityStateTracker client_transport[000001A8EB5DF7E8]: notifying watcher 000001A8EB596AC0: READY -> TRANSIENT_FAILURE I1025 12:17:55.323000000 13072 connectivity_state.cc:80] watcher 000001A8EB596AC0: delivering async notification for TRANSIENT_FAILURE (UNAVAILABLE: GOAWAY received) I1025 12:17:55.328000000 13072 connectivity_state.cc:160] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: READY -> IDLE (helper, OK) After 3 hours: [2023-10-25 13:17:54.459 Debug] RPC Authenticate: [thread-id=12240] I1025 13:17:54.461000000 12240 connectivity_state.cc:183] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current state: IDLE I1025 13:17:54.464000000 12240 connectivity_state.cc:183] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current state: IDLE I1025 13:17:54.559000000 12240 connectivity_state.cc:160] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: IDLE -> CONNECTING (helper, OK) I1025 13:17:54.597000000 13072 connectivity_state.cc:160] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: CONNECTING -> READY (helper, OK) I1025 13:17:54.602000000 13072 connectivity_state.cc:123] ConnectivityStateTracker client_transport[000001A8EB86EB98]: add watcher 000001A8EB81EFD0 I1025 13:19:55.156000000 13680 connectivity_state.cc:160] ConnectivityStateTracker client_transport[000001A8EB86EB98]: READY -> TRANSIENT_FAILURE (got_goaway, UNAVAILABLE: GOAWAY received) I1025 13:19:55.160000000 13680 connectivity_state.cc:168] ConnectivityStateTracker client_transport[000001A8EB86EB98]: notifying watcher 000001A8EB81EFD0: READY -> TRANSIENT_FAILURE I1025 13:19:55.162000000 13680 connectivity_state.cc:80] watcher 000001A8EB81EFD0: delivering async notification for TRANSIENT_FAILURE (UNAVAILABLE: GOAWAY received) I1025 13:19:55.165000000 13680 connectivity_state.cc:160] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: READY -> IDLE (helper, OK) I1025 13:19:55.167000000 13680 connectivity_state.cc:160] ConnectivityStateTracker client_transport[000001A8EB86EB98]: TRANSIENT_FAILURE -> SHUTDOWN (close_transport, OK) I1025 13:19:55.169000000 13680 connectivity_state.cc:168] ConnectivityStateTracker client_transport[000001A8EB86EB98]: notifying watcher 000001A8EB81EFD0: TRANSIENT_FAILURE -> SHUTDOWN I1025 13:19:55.171000000 13680 connectivity_state.cc:80] watcher 000001A8EB81EFD0: delivering async notification for SHUTDOWN (OK) After 4 hours: [2023-10-25 14:17:53.612 Debug] RPC Authenticate: [thread-id=12240] I1025 14:17:53.614000000 12240 connectivity_state.cc:183] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current state: IDLE I1025 14:17:53.618000000 12240 connectivity_state.cc:183] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current state: IDLE I1025 14:17:53.709000000 12240 connectivity_state.cc:160] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: IDLE -> CONNECTING (helper, OK) I1025 14:17:53.745000000 13072 connectivity_state.cc:160] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: CONNECTING -> READY (helper, OK) I1025 14:17:53.748000000 13072 connectivity_state.cc:123] ConnectivityStateTracker client_transport[000001A8EB8B9DE8]: add watcher 000001A8EB8844A0 I1025 14:19:54.364000000 6644 connectivity_state.cc:160] ConnectivityStateTracker client_transport[000001A8EB8B9DE8]: READY -> TRANSIENT_FAILURE (got_goaway, UNAVAILABLE: GOAWAY received) I1025 14:19:54.368000000 6644 connectivity_state.cc:168] ConnectivityStateTracker client_transport[000001A8EB8B9DE8]: notifying watcher 000001A8EB8844A0: READY -> TRANSIENT_FAILURE I1025 14:19:54.371000000 6644 connectivity_state.cc:80] watcher 000001A8EB8844A0: delivering async notification for TRANSIENT_FAILURE (UNAVAILABLE: GOAWAY received) I1025 14:19:54.374000000 6644 connectivity_state.cc:160] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: READY -> IDLE (helper, OK) I1025 14:19:54.376000000 6644 connectivity_state.cc:160] ConnectivityStateTracker client_transport[000001A8EB8B9DE8]: TRANSIENT_FAILURE -> SHUTDOWN (close_transport, OK) I1025 14:19:54.378000000 6644 connectivity_state.cc:168] ConnectivityStateTracker client_transport[000001A8EB8B9DE8]: notifying watcher 000001A8EB8844A0: TRANSIENT_FAILURE -> SHUTDOWN I1025 14:19:54.383000000 6644 connectivity_state.cc:80] watcher 000001A8EB8844A0: delivering async notification for SHUTDOWN (OK) After 5 hours: [2023-10-25 15:17:52.843 Debug] RPC Authenticate: [thread-id=12240] I1025 15:17:52.844000000 12240 connectivity_state.cc:183] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current state: IDLE I1025 15:17:52.847000000 12240 connectivity_state.cc:183] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: get current state: IDLE I1025 15:17:52.936000000 12240 connectivity_state.cc:160] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: IDLE -> CONNECTING (helper, OK) I1025 15:17:52.975000000 13072 connectivity_state.cc:160] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: CONNECTING -> READY (helper, OK) I1025 15:17:52.978000000 13072 connectivity_state.cc:123] ConnectivityStateTracker client_transport[000001A8EB918008]: add watcher 000001A8E890C760 I1025 15:19:53.547000000 13072 connectivity_state.cc:160] ConnectivityStateTracker client_transport[000001A8EB918008]: READY -> TRANSIENT_FAILURE (got_goaway, UNAVAILABLE: GOAWAY received) I1025 15:19:53.552000000 13072 connectivity_state.cc:168] ConnectivityStateTracker client_transport[000001A8EB918008]: notifying watcher 000001A8E890C760: READY -> TRANSIENT_FAILURE I1025 15:19:53.555000000 13072 connectivity_state.cc:80] watcher 000001A8E890C760: delivering async notification for TRANSIENT_FAILURE (UNAVAILABLE: GOAWAY received) I1025 15:19:53.558000000 13072 connectivity_state.cc:160] ConnectivityStateTracker client_channel[000001A8EB07F0D8]: READY -> IDLE (helper, OK) I1025 15:19:53.561000000 13072 connectivity_state.cc:160] ConnectivityStateTracker client_transport[000001A8EB918008]: TRANSIENT_FAILURE -> SHUTDOWN (close_transport, OK) I1025 15:19:53.565000000 13072 connectivity_state.cc:168] ConnectivityStateTracker client_transport[000001A8EB918008]: notifying watcher 000001A8E890C760: TRANSIENT_FAILURE -> SHUTDOWN I1025 15:19:53.569000000 13072 connectivity_state.cc:80] watcher 000001A8E890C760: delivering async notification for SHUTDOWN (OK) I still need to find out why that is, but at least I keep receiving messages from the stream Op maandag 23 oktober 2023 om 16:18:02 UTC+2 schreef Paul van der Linden: > My C++ gRPC client makes a long-lived RPC call at program start. The > messages I receive from this through the asynchronous callback API are > handled in a separate thread. In the main thread, I make synchronous unary > RPC calls with a deadline. After exactly an hour, I see a TRANSIENT_FAILURE > in the GRPC_TRACE log on the channel, and then the unary RPC succeeds > (likely due to a retry). In my application, I don't notice any issues. > However, there is now a problem with the long-lived RPC I had previously > started; I am no longer receiving messages from it. > > My question is how to handle this situation, as I can find little > information about it in the documentation. Should the gRPC library > automatically resolve this, or should I monitor the channel state in my > application and, in the case of a TRANSIENT_FAILURE, stop and restart the > long-lived RPC? Why does this problem always occur after exactly one hour, > is this some default client or server timeout? > > By the way, I see with 'netstat' that an additional socket has been added > after the TRANSIENT_FAILURE. > -- You received this message because you are subscribed to the Google Groups "grpc.io" group. To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/e7af6bc1-e158-44bd-86b9-44288c809bcen%40googlegroups.com.