I created a PR for this: https://github.com/grpc/grpc/pull/28506
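For context on the fix: the failure mode discussed below is a synchronous WSASend that returns 0 but reports fewer bytes sent than were passed in, which the caller then treats as a completed write. Below is a minimal sketch of handling such a partial send by falling back to an overlapped write for the remainder (illustrative only, not the PR diff; the function name and error handling are mine):

    #include <winsock2.h>

    // Try a synchronous send first. If it only sends part of the data,
    // advance the buffers past the bytes already sent and queue the rest
    // as an overlapped (asynchronous) write, instead of treating the short
    // write as complete. Returns false on a hard failure.
    bool SendWithPartialHandling(SOCKET s, WSABUF* buffers, DWORD buffer_count,
                                 DWORD total_bytes, WSAOVERLAPPED* overlapped) {
      DWORD bytes_sent = 0;
      int status = WSASend(s, buffers, buffer_count, &bytes_sent, 0, NULL, NULL);
      int wsa_error = (status == 0) ? 0 : WSAGetLastError();

      if (status == 0 && bytes_sent == total_bytes) {
        return true;  // Everything went out synchronously; nothing to do.
      }
      if (status != 0 && wsa_error != WSAEWOULDBLOCK) {
        return false;  // Hard failure; surface it to the caller.
      }

      // Partial send (or WSAEWOULDBLOCK, where nothing was sent): skip the
      // buffers that were fully consumed, then trim the first partial one.
      DWORD consumed = (status == 0) ? bytes_sent : 0;
      DWORD i = 0;
      while (i < buffer_count && consumed >= buffers[i].len) {
        consumed -= buffers[i].len;
        ++i;
      }
      buffers[i].buf += consumed;
      buffers[i].len -= consumed;

      // Queue the remainder as an overlapped write; completion is reported
      // later via the I/O completion port the socket is bound to.
      status = WSASend(s, buffers + i, buffer_count - i, NULL, 0, overlapped, NULL);
      return status == 0 || WSAGetLastError() == WSA_IO_PENDING;
    }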
On Thursday, January 6, 2022 at 15:02:11 UTC+1, John Forensics wrote:

> The connection still gets closed:
>
> On Windows, gRPC handles sends using WSASend asynchronous I/O. When the
> connection becomes TCP-congested, WSASend should report WSAEWOULDBLOCK;
> under WINE, however, gRPC's WSASend calls always return 0:
>
> grep "Ret ws2_32.WSASend" /tmp/stderr | sort | uniq -c
>    1 00000d4:Ret ws2_32.WSASend() retval=00000000 ret=027f06db
>   36 00d0:Ret ws2_32.WSASend() retval=00000000 ret=027f06db
>   15 00d4:Ret ws2_32.WSASend() retval=00000000 ret=027f06db
>    1 gen00d0:Ret ws2_32.WSASend() retval=00000000 ret=027f06db
>    1 _manager00d0:Ret ws2_32.WSASend() retval=00000000 ret=027f06db
>    1 MES00d0:Ret ws2_32.WSASend() retval=00000000 ret=027f06db
>
> Also, all calls to WSASend have lpOverlapped (LPWSAOVERLAPPED) and
> lpCompletionRoutine (LPWSAOVERLAPPED_COMPLETION_ROUTINE) set to zero:
>
> Call ws2_32.WSASend(000000fc,03cbf428,00000001,03cbf40c,00000000,00000000,00000000) ret=027f06db
>
> int WSAAPI WSASend(
>   [in]  SOCKET                             s,
>   [in]  LPWSABUF                           lpBuffers,
>   [in]  DWORD                              dwBufferCount,
>   [out] LPDWORD                            lpNumberOfBytesSent,
>   [in]  DWORD                              dwFlags,
>   [in]  LPWSAOVERLAPPED                    lpOverlapped,
>   [in]  LPWSAOVERLAPPED_COMPLETION_ROUTINE lpCompletionRoutine
> );
>
> This matches the code in src/core/lib/iomgr/tcp_windows.cc:
>
>   /* First, let's try a synchronous, non-blocking write. */
>   status = WSASend(socket->socket, buffers, (DWORD)tcp->write_slices->count,
>                    &bytes_sent, 0, NULL, NULL);
>   info->wsa_error = status == 0 ? 0 : WSAGetLastError();
>
>   /* We would kind of expect to get a WSAEWOULDBLOCK here, especially on a busy
>      connection that has its send queue filled up. But if we don't, then we can
>      avoid doing an async write operation at all. */
>   if (info->wsa_error != WSAEWOULDBLOCK) {
>     grpc_error_handle error = status == 0
>                                   ? GRPC_ERROR_NONE
>                                   : GRPC_WSA_ERROR(info->wsa_error, "WSASend");
>     grpc_core::ExecCtx::Run(DEBUG_LOCATION, cb, error);
>     if (allocated) gpr_free(allocated);
>     return;
>   }
>
> So the "dispute" over this API probably comes down to: is WSASend on an
> overlapped socket allowed to send partially and report lpNumberOfBytesSent
> smaller than the number of bytes in the buffers?
>
> Does anyone have an opinion on that?
>
> On Wednesday, January 5, 2022 at 16:53:34 UTC+1, John Forensics wrote:
>
>> Could one of the developers please elaborate on the case where:
>> * The sender has been blocked (briefly) on a closed TCP window, leaving a
>> large set of gRPC chunks waiting on the HTTP/2 stream sender side.
>> * The receiver has not sent an outbound WINDOW_UPDATE to signal that it is
>> OK to send more data.
>>
>> My current impression is that the Python sender will not continue sending
>> even though the TCP congestion window is open again.
>>
>> On Wednesday, January 5, 2022 at 14:52:09 UTC+1, John Forensics wrote:
>>
>>> So the stream is stalled because the congestion window is closed. Just
>>> verified with tcpdump and tcptrace, and I can confirm that's the problem.
>>>
>>> https://github.com/grpc/grpc/blob/2e3e3ccc3d5c40010a6a5555d6f824a5024176a7/src/core/ext/transport/chttp2/transport/writing.cc#L449
>>>
>>> This explains why excessive logging, which makes my program slower,
>>> helps (e.g. enabling gRPC's own debug logging).
>>>
>>> Which changes the question: why isn't the stream unstalled when the
>>> congestion window opens again?
>>>
>>> This also seems like something that could differ at the low-level API
>>> layer between Linux, Windows, and WINE.
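>>>
>>> To make the stall condition concrete: the chttp2 writer can only flush
>>> the minimum of the transport-level and the stream-level flow-control
>>> windows. A rough sketch of that check (simplified and illustrative, not
>>> the actual gRPC code; the struct and field names are mine):
>>>
>>>   #include <algorithm>
>>>   #include <cstdint>
>>>
>>>   // Simplified HTTP/2 flow-control bookkeeping for one stream.
>>>   struct FlowControl {
>>>     int64_t transport_window;  // "t_win" in the trace below
>>>     int64_t stream_window;     // "s_win" in the trace below
>>>   };
>>>
>>>   // Bytes the writer may flush right now; 0 means the stream is parked
>>>   // on the stalled_by_transport list until a WINDOW_UPDATE arrives.
>>>   int64_t sendable_bytes(const FlowControl& fc) {
>>>     return std::max<int64_t>(
>>>         0, std::min(fc.transport_window, fc.stream_window));
>>>   }
>>>
>>> With the values from the trace below (t_win=0, s_win=288), sendable_bytes()
>>> is 0 even though the stream window still has 288 bytes left, which is
>>> exactly why the stream lands on stalled_by_transport.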
>>>
>>> On Wednesday, January 5, 2022 at 14:32:20 UTC+1, John Forensics wrote:
>>>
>>>> After sending some data (about 4 chunks of 16 KiB from a single 1 MB gRPC
>>>> message), the HTTP/2 stream seems to be put on the "stalled" list
>>>> (whatever that may be).
>>>>
>>>> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello
>>>> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello
>>>> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello
>>>> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello
>>>> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello
>>>> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello
>>>> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello
>>>> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.'
>>>>
>>>> I0105 12:30:25.218000000 216 src/core/lib/iomgr/timer_generic.cc:558] .. shard[13] popped 0
>>>> I0105 12:30:25.258000000 208 src/core/lib/surface/completion_queue.cc:1074] RETURN_EVENT[00484AA0]: QUEUE_TIMEOUT
>>>> I0105 12:30:25.267000000 212 src/core/lib/iomgr/executor.cc:294] EXECUTOR (default-executor) try to schedule 00499258 (short) to thread 0
>>>> I0105 12:30:25.267000000 216 src/core/lib/iomgr/timer_generic.cc:613] .. result --> 1, shard[13]->min_deadline 10496 --> 12695, now=11765
>>>> I0105 12:30:25.267000000 208 src/core/lib/surface/completion_queue.cc:970] grpc_completion_queue_next(cq=00484AA0, deadline=gpr_timespec { tv_sec: 1641385825, tv_nsec: 341971712, clock_type: 1 }, reserved=00000000)
>>>> I0105 12:30:25.268000000 196 src/core/lib/iomgr/executor.cc:242] EXECUTOR (default-executor) [0]: execute
>>>> I0105 12:30:25.268000000 216 src/core/lib/iomgr/timer_generic.cc:517] .. shard[14]: heap_empty=true
>>>> I0105 12:30:25.268000000 196 src/core/lib/iomgr/executor.cc:123] EXECUTOR (default-executor) run 00499258
>>>> I0105 12:30:25.268000000 216 src/core/lib/iomgr/timer_generic.cc:492] .. shard[14]->queue_deadline_cap --> 12694
>>>> I0105 12:30:25.268000000 196 src/core/ext/transport/chttp2/transport/chttp2_transport.cc:800] W:00497270 SERVER [ipv4:10.1.0.1:41200] state WRITING+MORE -> WRITING [continue writing]
>>>> I0105 12:30:25.269000000 216 src/core/lib/iomgr/timer_generic.cc:558] .. shard[14] popped 0
>>>> I0105 12:30:25.269000000 196 src/core/ext/transport/chttp2/transport/stream_lists.cc:71] 00497270[5][svr]: pop from writing
>>>> I0105 12:30:25.269000000 216 src/core/lib/iomgr/timer_generic.cc:613] .. result --> 1, shard[14]->min_deadline 10496 --> 12695, now=11765
>>>> I0105 12:30:25.269000000 196 src/core/ext/transport/chttp2/transport/stream_lists.cc:71] 00497270[5][svr]: pop from writable
>>>> I0105 12:30:25.269000000 216 src/core/lib/iomgr/timer_generic.cc:517] .. shard[7]: heap_empty=true
>>>> I0105 12:30:25.270000000 196 src/core/ext/transport/chttp2/transport/writing.cc:443] W:00497270 SERVER[5] im-(sent,send)=(1,0) announce=0
>>>> I0105 12:30:25.270000000 216 src/core/lib/iomgr/timer_generic.cc:492] .. shard[7]->queue_deadline_cap --> 12764
>>>> D0105 12:30:25.270000000 196 src/core/ext/transport/chttp2/transport/flow_control.cc:117] 00498C78[5][svr] | s updt sent | trw: 0, tlw: 4183485, taw: 4187469, srw: 288, slw: 4187521, saw: 4187521
>>>> I0105 12:30:25.270000000 216 src/core/lib/iomgr/timer_generic.cc:558] .. shard[7] popped 0
>>>> D0105 12:30:25.270000000 196 src/core/ext/transport/chttp2/transport/writing.cc:163] ipv4:10.1.0.1:41200:00497270 stream 5 moved to stalled list by transport. This is FULLY expected to happen in a healthy program that is not seeing flow control stalls. However, if you know that there are unwanted stalls, here is some helpful data: [fc:pending=656:pending-compressed=0:flowed=1048288:peer_initwin=4187516:t_win=0:s_win=288:s_delta=-1048288]
>>>> I0105 12:30:25.271000000 216 src/core/lib/iomgr/timer_generic.cc:613] .. result --> 1, shard[7]->min_deadline 10626 --> 12765, now=11765
>>>> I0105 12:30:25.271000000 196 src/core/ext/transport/chttp2/transport/stream_lists.cc:125] 00497270[5][svr]: add to stalled_by_transport
>>>> I0105 12:30:25.271000000 216 src/core/lib/iomgr/timer_generic.cc:517] .. shard[9]: heap_empty=true
>>>> D0105 12:30:25.271000000 196 src/core/ext/transport/chttp2/transport/flow_control.cc:117] 00498C78[0][svr] | t updt sent | trw: 0, tlw: 4183485, taw: 4187469, srw: , slw: , saw:
>>>> I0105 12:30:25.271000000 216 src/core/lib/iomgr/timer_generic.cc:492] .. shard[9]->queue_deadline_cap --> 12764
>>>> I0105 12:30:25.272000000 196 src/core/ext/transport/chttp2/transport/chttp2_transport.cc:800] W:00497270 SERVER [ipv4:10.1.0.1:41200] state WRITING -> IDLE [begin writing nothing]
>>>> I0105 12:30:25.272000000 216 src/core/lib/iomgr/timer_generic.cc:558] .. shard[9] popped 0
>>>>
>>>> On Wednesday, January 5, 2022 at 11:54:13 UTC+1, John Forensics wrote:
>>>>
>>>>> I managed to extract some logging; the Python side seems to send a
>>>>> frame that is too large:
>>>>>
>>>>> [id: 0x0e0792ca, L:/127.0.0.1:43718 - R:localhost/127.0.0.1:60630] Sent GOAWAY: lastStreamId '2147483647', errorCode '6', debugData 'Frame length: 2165320 exceeds maximum: 16384'. Forcing shutdown of the connection.
>>>>>
>>>>> On Tuesday, January 4, 2022 at 16:42:12 UTC+1, John Forensics wrote:
>>>>>
>>>>>> I am running identical Python code under either WINE python.exe or
>>>>>> native Linux Python.
>>>>>>
>>>>>> The behaviour is slightly different, eventually leading to loss of
>>>>>> communication with the WINE Python variant (messages/calls are not
>>>>>> received on the Java client side).
>>>>>>
>>>>>> The most visible difference I see is the number of HTTP/2 streams that
>>>>>> are created. The native Linux runs stay at a low number, and most
>>>>>> communication happens within one stream, while the Windows variant
>>>>>> seems to distribute it evenly over more streams.
>>>>>>
>>>>>> I do not understand why the gRPC layer would behave differently. I am
>>>>>> currently trying to understand the problem using Wireshark, but that's
>>>>>> not the right way to do it.
>>>>>>
>>>>>> Any tips on how to debug this would be appreciated. I do not see any
>>>>>> exceptions explaining why a message isn't received on the Java side.
>>>>>>
>>>>>> If people have some hypotheses for me to test, that would be nice as
>>>>>> well.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Ruud
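
For anyone trying to reproduce the analysis in this thread: the traces quoted above come from gRPC's built-in tracing, enabled through environment variables on the process under test, e.g. GRPC_VERBOSITY=DEBUG together with GRPC_TRACE=http,flowctl (plus timer,executor for the scheduler lines). The flowctl tracer prints the window-accounting and stalled-list messages; note that the exact set of tracer names can vary between gRPC versions.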