It looks like there is a bug in the handling of flow control windows in 
either the client or the server. Based on the error log, I would presume 
that the server's flow control implementation is buggy. To dig deeper, we 
would need to look at what flow control updates are actually being sent.
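
One way to capture that from the Python client is to turn on the C-core 
tracers before grpc is imported. A minimal sketch (flowctl, http, and 
bdp_estimator are the tracer names I would start with; adjust as needed):

    # Must run before `import grpc`: the env vars are read at library init.
    import os

    os.environ["GRPC_VERBOSITY"] = "DEBUG"
    # flowctl logs window accounting; http logs the raw HTTP/2 frames
    # (SETTINGS, WINDOW_UPDATE, DATA), i.e. what each peer actually sends.
    os.environ["GRPC_TRACE"] = "flowctl,http,bdp_estimator"

    import grpc  # noqa: E402 -- intentionally imported after the env vars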

On Monday, November 22, 2021 at 10:47:08 PM UTC-8 fli...@gmail.com wrote:

> I have a gRPC service with a bidirectional streaming method.
>
>    - Client: python grpcio 1.41.1.
>    - Server: akka-grpc 2.1.0.
>
> The client is a slow consumer (the server could potentially produce 
> responses at a higher rate).
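>
> For reference, the client boils down to the following (a simplified 
> sketch; the generated module and target are placeholders, and the names 
> match the traceback below):
>
>     import time
>
>     import grpc
>     import service_pb2_grpc  # hypothetical generated stubs
>
>     channel = grpc.insecure_channel("server:50051")  # placeholder target
>     stub = service_pb2_grpc.ServiceStub(channel)
>
>     def fetch(requests):
>         # Bidirectional streaming call; iterating the returned object
>         # blocks until the next response arrives from the server.
>         responses = stub.Fetch(requests)
>         for response in responses:
>             time.sleep(0.1)  # stands in for slow per-message processing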
>
> Occasionally (with some random delay after the method call), the client 
> logs a message like the following:
> E1122 13:42:55.763763501 108048 flow_control.cc:240] Incoming frame of 
> size 317205 exceeds local window size of 0. The (un-acked, future) window 
> size would be 1708209 which is not exceeded. This would usually cause a 
> disconnection, but allowing it due to broken HTTP2 implementations in the 
> wild. See (for example) https://github.com/netty/netty/issues/6520.
>
> Sometimes this message is followed by an exception:
>
> Exception in thread Thread-2:
> Traceback (most recent call last):
>   File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
>     self.run()
>   File "/usr/lib/python3.8/threading.py", line 870, in run
>     self._target(*self._args, **self._kwargs)
>   File "[...]/client.py", line 107, in fetch
>     for response in responses:
>   File "[...]/venv/lib/python3.8/site-packages/grpc/_channel.py", line 426, in __next__
>     return self._next()
>   File "[...]/venv/lib/python3.8/site-packages/grpc/_channel.py", line 826, in _next
>     raise self
> grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
>     status = StatusCode.UNKNOWN
>     details = "Stream removed"
>     debug_error_string = "{"created":"@1637649068.837642637","description":"Error received from peer ipv4:***.***.***.***:****","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Stream removed","grpc_status":2}"
> >
>
> But sometimes the overall call succeeds with no exception.
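>
> When it does fail, the error can at least be caught and its status 
> inspected (a simplified sketch; the rendezvous object implements 
> grpc.RpcError, and process() is a placeholder for my handling code):
>
>     import grpc
>
>     try:
>         for response in responses:  # `responses` from stub.Fetch(...)
>             process(response)
>     except grpc.RpcError as err:
>         # code()/details() carry the status shown in the traceback above
>         print(err.code(), err.details())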
>
> Some research:
>
>    - Disabling BDP probing by setting the channel option 
>    grpc.http2.bdp_probe = 0 seems to resolve the problem, but I suppose 
>    that is just a side effect of the overall throughput decrease (see 
>    the sketch after this list).
>    - There is a somewhat similar issue on GitHub 
>    <https://github.com/grpc/grpc/issues/22889>, but it looks like it is 
>    about a *unary* call. In that case, the server starts to use the 
>    increased initial window size immediately after receiving the client's 
>    SETTINGS frame and before sending the SETTINGS ack (if I understood 
>    correctly). In my case, the frame ordering looks correct.
>    - Exploring captured network packets and client-side gRPC tracing logs 
>    (GRPC_VERBOSITY=DEBUG, GRPC_TRACE=flowctl) has not given me any 
>    insights.
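>
> For reference, this is how I disable BDP probing on the client side 
> (the channel-argument key is the "grpc.http2.bdp_probe" option named 
> above; the target is a placeholder):
>
>     import grpc
>
>     # Passing 0 for grpc.http2.bdp_probe turns off BDP-based dynamic
>     # window sizing on this channel.
>     channel = grpc.insecure_channel(
>         "server:50051",
>         options=[("grpc.http2.bdp_probe", 0)],
>     )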
>
> I'd greatly appreciate any ideas on how to diagnose or resolve the 
> problem.
>
