I created a PR for this:

https://github.com/grpc/grpc/pull/28506


On Thursday, January 6, 2022 at 15:02:11 UTC+1, John Forensics wrote:

> Still getting closed:
>
> On Windows, gRPC handles sends with WSASend using asynchronous I/O. When the 
> TCP connection becomes congested, WSASend should report *WSAEWOULDBLOCK*; 
> however, under WINE, the WSASend calls made by gRPC always return 0:
>
> grep "Ret  ws2_32.WSASend" /tmp/stderr | sort | uniq -c
>       1 00000d4:Ret  ws2_32.WSASend() retval=00000000 ret=027f06db
>      36 00d0:Ret  ws2_32.WSASend() retval=00000000 ret=027f06db
>      15 00d4:Ret  ws2_32.WSASend() retval=00000000 ret=027f06db
>       1 gen00d0:Ret  ws2_32.WSASend() retval=00000000 ret=027f06db
>       1 _manager00d0:Ret  ws2_32.WSASend() retval=00000000 ret=027f06db
>       1 MES00d0:Ret  ws2_32.WSASend() retval=00000000 ret=027f06db
>
> Also, all calls to WSASend have the LPWSAOVERLAPPED and 
> LPWSAOVERLAPPED_COMPLETION_ROUTINE arguments set to zero (NULL):
>
> Call ws2_32.WSASend(000000fc,03cbf428,00000001,03cbf40c,00000000,00000000,00000000) ret=027f06db
>
> int WSAAPI WSASend(
>   [in]  SOCKET                             s,
>   [in]  LPWSABUF                           lpBuffers,
>   [in]  DWORD                              dwBufferCount,
>   [out] LPDWORD                            lpNumberOfBytesSent,
>   [in]  DWORD                              dwFlags,
>   [in]  LPWSAOVERLAPPED                    lpOverlapped,
>   [in]  LPWSAOVERLAPPED_COMPLETION_ROUTINE lpCompletionRoutine
> );
>
> This matches the code in /src/core/lib/iomgr/tcp_windows.cc
>
>   /* First, let's try a synchronous, non-blocking write. */
>   status = WSASend(socket->socket, buffers, (DWORD)tcp->write_slices->count,
>                    &bytes_sent, 0, NULL, NULL);
>   info->wsa_error = status == 0 ? 0 : WSAGetLastError();
>
>   /* We would kind of expect to get a WSAEWOULDBLOCK here, especially on a
>      busy connection that has its send queue filled up. But if we don't,
>      then we can avoid doing an async write operation at all. */
>   if (info->wsa_error != WSAEWOULDBLOCK) {
>     grpc_error_handle error = status == 0
>                                   ? GRPC_ERROR_NONE
>                                   : GRPC_WSA_ERROR(info->wsa_error, "WSASend");
>     grpc_core::ExecCtx::Run(DEBUG_LOCATION, cb, error);
>     if (allocated) gpr_free(allocated);
>     return;
>   }
>
>
> So the "dispute" for this API probably is, is WSASend on Overlapped IO 
> allowed to send partially and report the lpNumberOfBytesSent < number of 
> bytes in the buffer.
>
> Anyone got a opinion about that?
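>
> For reference, this is the kind of minimal probe I have in mind (my own 
> sketch, not gRPC code; the 127.0.0.1:50051 peer that accepts the connection 
> but never reads is made up for the test): open a non-blocking overlapped 
> socket, call WSASend the same way gRPC does until the socket's send buffer 
> fills up, and see whether the run ends with WSAEWOULDBLOCK or with a short 
> lpNumberOfBytesSent.
>
>   /* Build with: cl probe.cc ws2_32.lib */
>   #include <winsock2.h>
>   #include <ws2tcpip.h>
>   #include <stdio.h>
>
>   int main() {
>     WSADATA wsa;
>     WSAStartup(MAKEWORD(2, 2), &wsa);
>
>     /* Same kind of socket gRPC uses: overlapped-capable, non-blocking. */
>     SOCKET s = WSASocketW(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0,
>                           WSA_FLAG_OVERLAPPED);
>     sockaddr_in addr = {};
>     addr.sin_family = AF_INET;
>     addr.sin_port = htons(50051);  /* hypothetical peer that never reads */
>     inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
>     connect(s, (sockaddr*)&addr, sizeof(addr));
>     u_long nonblocking = 1;
>     ioctlsocket(s, FIONBIO, &nonblocking);
>
>     static char payload[16384];
>     WSABUF buf = {(u_long)sizeof(payload), payload};
>     for (int i = 0; i < 100000; i++) {
>       DWORD bytes_sent = 0;
>       /* Synchronous, non-overlapped call, exactly as in tcp_windows.cc. */
>       int status = WSASend(s, &buf, 1, &bytes_sent, 0, NULL, NULL);
>       if (status != 0) {
>         /* On Windows I expect to land here with WSAEWOULDBLOCK (10035). */
>         printf("WSASend error %d after %d calls\n", WSAGetLastError(), i);
>         break;
>       }
>       if (bytes_sent < buf.len) {
>         /* The behaviour under dispute: "success" with a partial send. */
>         printf("partial send: %lu of %lu bytes\n", bytes_sent, buf.len);
>         break;
>       }
>     }
>     closesocket(s);
>     WSACleanup();
>     return 0;
>   }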
>
>
> On Wednesday, January 5, 2022 at 16:53:34 UTC+1, John Forensics wrote:
>
>>
>> Could one of the developers please elaborate on the case where:
>> * The sender has been blocked (briefly) on a closed TCP window, leaving a 
>>   large set of gRPC chunks waiting at the HTTP/2 stream sender side.
>> * The receiver has not sent an outbound WINDOW_UPDATE to signal that it is 
>>   OK to send more data.
>>
>> My current impression is that the Python (sender) side will not continue 
>> sending even though the TCP congestion window is open again.
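>>
>> To make my mental model explicit (just a toy sketch of my own, not gRPC 
>> code): the HTTP/2 send window is only refilled by WINDOW_UPDATE frames 
>> from the receiver, so whether the TCP congestion window has re-opened 
>> should not matter to it at all.
>>
>>   /* Toy model of the peer-advertised HTTP/2 send window. */
>>   struct Http2SendWindow {
>>     long long window;  /* bytes the peer currently allows us to send */
>>
>>     bool CanSend() const { return window > 0; }
>>     void OnDataSent(long long n) { window -= n; }          /* 0 => stalled */
>>     void OnWindowUpdate(long long inc) { window += inc; }  /* only unstall */
>>   };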
>>
>>
>> On Wednesday, January 5, 2022 at 14:52:09 UTC+1, John Forensics wrote:
>>
>>> So the stream is stalled as the congestion window is closed. I just 
>>> verified with tcpdump and tcptrace, and can confirm that's the problem.
>>>
>>>
>>> https://github.com/grpc/grpc/blob/2e3e3ccc3d5c40010a6a5555d6f824a5024176a7/src/core/ext/transport/chttp2/transport/writing.cc#L449
>>>
>>> This explains why excessive logging and making my program inefficient 
>>> helps (e.g. adding debug logging of gRPC itself).
>>>
>>> Which changes the question: why isn't the stream unstalled when the 
>>> congestion window opens again?
>>>
>>> This also seems like something that could differ in terms of the 
>>> low-level APIs on Linux, Windows and WINE.
>>>
>>> On Wednesday, January 5, 2022 at 14:32:20 UTC+1, John Forensics wrote:
>>>
>>>> After sending some data (about 4 chunks of 16 KB of a single 1 MB gRPC 
>>>> message), the HTTP/2 stream seems to be put on the "stalled" list 
>>>> (whatever that may be).
>>>>
>>>>
>>>>  World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello 
>>>> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello 
>>>> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello 
>>>> World!!!!.Hello W
>>>> orld!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello 
>>>> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello 
>>>> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello 
>>>> World!!!!.Hello Worl
>>>> d!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello 
>>>> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.'
>>>> I0105 12:30:25.218000000   216 src/core/lib/iomgr/timer_generic.cc:558] 
>>>>   .. shard[13] popped 0
>>>> I0105 12:30:25.258000000   208 
>>>> src/core/lib/surface/completion_queue.cc:1074] RETURN_EVENT[00484AA0]: 
>>>> QUEUE_TIMEOUT
>>>> I0105 12:30:25.267000000   212 src/core/lib/iomgr/executor.cc:294] 
>>>> EXECUTOR (default-executor) try to schedule 00499258 (short) to thread 0
>>>> I0105 12:30:25.267000000   216 src/core/lib/iomgr/timer_generic.cc:613] 
>>>>   .. result --> 1, shard[13]->min_deadline 10496 --> 12695, now=11765
>>>> I0105 12:30:25.267000000   208 
>>>> src/core/lib/surface/completion_queue.cc:970] 
>>>> grpc_completion_queue_next(cq=00484AA0, deadline=gpr_timespec { tv_sec: 
>>>> 1641385825, tv_nsec: 341971712, clock_type: 1 }, reserved=00000000)
>>>> I0105 12:30:25.268000000   196 src/core/lib/iomgr/executor.cc:242] 
>>>> EXECUTOR (default-executor) [0]: execute
>>>> I0105 12:30:25.268000000   216 src/core/lib/iomgr/timer_generic.cc:517] 
>>>>   .. shard[14]: heap_empty=true
>>>> I0105 12:30:25.268000000   196 src/core/lib/iomgr/executor.cc:123] 
>>>> EXECUTOR (default-executor) run 00499258
>>>> I0105 12:30:25.268000000   216 src/core/lib/iomgr/timer_generic.cc:492] 
>>>>   .. shard[14]->queue_deadline_cap --> 12694
>>>> I0105 12:30:25.268000000   196 
>>>> src/core/ext/transport/chttp2/transport/chttp2_transport.cc:800] 
>>>> W:00497270 
>>>> SERVER [ipv4:10.1.0.1:41200] state WRITING+MORE -> WRITING [continue 
>>>> writing]
>>>> I0105 12:30:25.269000000   216 src/core/lib/iomgr/timer_generic.cc:558] 
>>>>   .. shard[14] popped 0
>>>> I0105 12:30:25.269000000   196 
>>>> src/core/ext/transport/chttp2/transport/stream_lists.cc:71] 
>>>> 00497270[5][svr]: pop from writing
>>>> I0105 12:30:25.269000000   216 src/core/lib/iomgr/timer_generic.cc:613] 
>>>>   .. result --> 1, shard[14]->min_deadline 10496 --> 12695, now=11765
>>>> I0105 12:30:25.269000000   196 
>>>> src/core/ext/transport/chttp2/transport/stream_lists.cc:71] 
>>>> 00497270[5][svr]: pop from writable
>>>> I0105 12:30:25.269000000   216 src/core/lib/iomgr/timer_generic.cc:517] 
>>>>   .. shard[7]: heap_empty=true
>>>> I0105 12:30:25.270000000   196 
>>>> src/core/ext/transport/chttp2/transport/writing.cc:443] W:00497270 
>>>> SERVER[5] im-(sent,send)=(1,0) announce=0
>>>> I0105 12:30:25.270000000   216 src/core/lib/iomgr/timer_generic.cc:492] 
>>>>   .. shard[7]->queue_deadline_cap --> 12764
>>>> D0105 12:30:25.270000000   196 src/core/ext/transport/chttp2/transport/flow_control.cc:117] 
>>>> 00498C78[5][svr] | s updt sent | trw: 0, tlw: 4183485, taw: 4187469, 
>>>> srw: 288, slw: 4187521, saw: 4187521
>>>> I0105 12:30:25.270000000   216 src/core/lib/iomgr/timer_generic.cc:558] 
>>>>   .. shard[7] popped 0
>>>> D0105 12:30:25.270000000   196 
>>>> src/core/ext/transport/chttp2/transport/writing.cc:163] 
>>>> ipv4:10.1.0.1:41200:00497270 stream 5 moved to stalled list by transport. 
>>>> This is FULLY expected to happen in a healthy program that is not seeing 
>>>> flow control stalls. However, if you know that there are unwanted stalls, 
>>>> here is some helpful data: 
>>>> [fc:pending=656:pending-compressed=0:flowed=1048288:peer_initwin=4187516:t_win=0:s_win=288:s_delta=-1048288]
>>>> I0105 12:30:25.271000000   216 src/core/lib/iomgr/timer_generic.cc:613] 
>>>>   .. result --> 1, shard[7]->min_deadline 10626 --> 12765, now=11765
>>>> I0105 12:30:25.271000000   196 
>>>> src/core/ext/transport/chttp2/transport/stream_lists.cc:125] 
>>>> 00497270[5][svr]: add to stalled_by_transport
>>>> I0105 12:30:25.271000000   216 src/core/lib/iomgr/timer_generic.cc:517] 
>>>>   .. shard[9]: heap_empty=true
>>>> D0105 12:30:25.271000000   196 src/core/ext/transport/chttp2/transport/flow_control.cc:117] 
>>>> 00498C78[0][svr] | t updt sent | trw: 0, tlw: 4183485, taw: 4187469, 
>>>> srw: , slw: , saw:
>>>> I0105 12:30:25.271000000   216 src/core/lib/iomgr/timer_generic.cc:492] 
>>>>   .. shard[9]->queue_deadline_cap --> 12764
>>>> I0105 12:30:25.272000000   196 
>>>> src/core/ext/transport/chttp2/transport/chttp2_transport.cc:800] 
>>>> W:00497270 
>>>> SERVER [ipv4:10.1.0.1:41200] state WRITING -> IDLE [begin writing 
>>>> nothing]
>>>> I0105 12:30:25.272000000   216 src/core/lib/iomgr/timer_generic.cc:558] 
>>>>   .. shard[9] popped 0
>>>>
>>>>
>>>> On Wednesday, January 5, 2022 at 11:54:13 UTC+1, John Forensics wrote:
>>>>
>>>>> I managed to extract some logging; the Python side seems to send a 
>>>>> frame that is too large:
>>>>>
>>>>> [id: 0x0e0792ca, L:/127.0.0.1:43718 - R:localhost/127.0.0.1:60630] 
>>>>> Sent GOAWAY: lastStreamId '2147483647', errorCode '6', debugData 
>>>>> 'Frame length: 2165320 exceeds maximum: 16384'. Forcing shutdown of 
>>>>> the connection.
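>>>>>
>>>>> As far as I understand it, the 16384 in that GOAWAY is the peer's 
>>>>> SETTINGS_MAX_FRAME_SIZE, and in gRPC C-core based stacks the advertised 
>>>>> value can apparently be tuned with the "grpc.http2.max_frame_size" 
>>>>> channel argument. A C++ sketch, purely for illustration (MakeChannel and 
>>>>> the target are placeholders of mine; a conforming sender should of 
>>>>> course never exceed whatever limit the peer advertised):
>>>>>
>>>>>   #include <grpcpp/grpcpp.h>
>>>>>
>>>>>   /* GRPC_ARG_HTTP2_MAX_FRAME_SIZE expands to "grpc.http2.max_frame_size". */
>>>>>   std::shared_ptr<grpc::Channel> MakeChannel(const std::string& target) {
>>>>>     grpc::ChannelArguments args;
>>>>>     args.SetInt(GRPC_ARG_HTTP2_MAX_FRAME_SIZE, 16384);
>>>>>     return grpc::CreateCustomChannel(
>>>>>         target, grpc::InsecureChannelCredentials(), args);
>>>>>   }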
>>>>>
>>>>>
>>>>> On Tuesday, January 4, 2022 at 16:42:12 UTC+1, John Forensics wrote:
>>>>>
>>>>>> I am running identical Python code on either 
>>>>>>
>>>>>> WINE python.exe versus Linux native Python.
>>>>>>
>>>>>> The behaviour is slightly different, eventually leading to *losing 
>>>>>> communication with the WINE Python variant (messages/calls not received 
>>>>>> at the Java client side)*.
>>>>>>
>>>>>> The most visible difference I see is the number of HTTP/2 streams that 
>>>>>> are created. The Linux native runs stay at a low number, and most 
>>>>>> communication is done within one stream, while the Windows variant 
>>>>>> seems to distribute it evenly over more streams.
>>>>>>
>>>>>> I do not understand why the gRPC layer would behave differently. I am 
>>>>>> currently trying to understand the problem using Wireshark, but that is 
>>>>>> not the right way to do it.
>>>>>>
>>>>>> Any tips on how to debug this would be appreciated. I do not see any 
>>>>>> exceptions explaining why a message isn't received at the Java side.
>>>>>>
>>>>>> If people have some hypotheses for me to test, that would be nice as 
>>>>>> well.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Ruud
>>>>>>
>>>>>>
>>>>>>
