So the stream is stalled because the congestion window is closed. I just 
verified this with tcpdump and tcptrace, and I can confirm that's the problem.

https://github.com/grpc/grpc/blob/2e3e3ccc3d5c40010a6a5555d6f824a5024176a7/src/core/ext/transport/chttp2/transport/writing.cc#L449

This explains why excessive logging, or otherwise making my program less 
efficient, helps (e.g. enabling debug logging in gRPC itself).
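
For reference, the extra gRPC debug logging can be switched on from the Python 
side roughly like this (only a sketch: the GRPC_TRACE flag list below is just 
an example set, and the variables have to be set before grpc is imported 
because the C core reads them at initialization):

    import os

    # Must be set before importing grpc; the C core reads these at startup.
    os.environ["GRPC_VERBOSITY"] = "DEBUG"
    os.environ["GRPC_TRACE"] = "http,flowctl,timer,executor"  # example flag set

    import grpc  # imported only after the environment is prepared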

Which changes the question: why isn't the stream unstalled once the 
congestion window opens again?

This also seems like something that could differ at the low-level API 
between Linux, Windows and WINE.
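
One experiment that might narrow this down (only a sketch: "grpc.http2.bdp_probe" 
and "grpc.http2.lookahead_bytes" are standard C-core channel arguments, but I 
have not verified that they change anything for this particular stall) would be 
to disable the dynamic window autotuning on the Python server and pin the 
per-stream lookahead, then compare Linux vs. WINE again:

    from concurrent import futures

    import grpc

    # Experimental options: turn off BDP probing (flow-control window
    # autotuning) and ask for a fixed per-stream lookahead, to see whether
    # the autotuning is what interacts badly with the WINE socket layer.
    options = [
        ("grpc.http2.bdp_probe", 0),
        ("grpc.http2.lookahead_bytes", 1024 * 1024),
    ]

    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4),
                         options=options)
    server.add_insecure_port("[::]:50051")  # port is just a placeholder
    server.start()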

On Wednesday, January 5, 2022 at 14:32:20 UTC+1, John Forensics wrote:

> After sending some data (about 4 chunks of 16k of a single 1 MB gRPC 
> message), the HTTP/2 stream seems to be put on the "stalled" list (whatever 
> that may be). 
>
>
> World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.
> Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.
> Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.
> Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.
> Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.
> Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.
> Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.
> Hello World!!!!.Hello World!!!!.Hello World!!!!.Hello World!!!!.
> Hello World!!!!.'
> I0105 12:30:25.218000000   216 src/core/lib/iomgr/timer_generic.cc:558]   
> .. shard[13] popped 0
> I0105 12:30:25.258000000   208 
> src/core/lib/surface/completion_queue.cc:1074] RETURN_EVENT[00484AA0]: 
> QUEUE_TIMEOUT
> I0105 12:30:25.267000000   212 src/core/lib/iomgr/executor.cc:294] 
> EXECUTOR (default-executor) try to schedule 00499258 (short) to thread 0
> I0105 12:30:25.267000000   216 src/core/lib/iomgr/timer_generic.cc:613]   
> .. result --> 1, shard[13]->min_deadline 10496 --> 12695, now=11765
> I0105 12:30:25.267000000   208 
> src/core/lib/surface/completion_queue.cc:970] 
> grpc_completion_queue_next(cq=00484AA0, deadline=gpr_timespec { tv_sec: 
> 1641385825, tv_nsec: 341971712, clock_type: 1 }, reserved=00000000)
> I0105 12:30:25.268000000   196 src/core/lib/iomgr/executor.cc:242] 
> EXECUTOR (default-executor) [0]: execute
> I0105 12:30:25.268000000   216 src/core/lib/iomgr/timer_generic.cc:517]   
> .. shard[14]: heap_empty=true
> I0105 12:30:25.268000000   196 src/core/lib/iomgr/executor.cc:123] 
> EXECUTOR (default-executor) run 00499258
> I0105 12:30:25.268000000   216 src/core/lib/iomgr/timer_generic.cc:492]   
> .. shard[14]->queue_deadline_cap --> 12694
> I0105 12:30:25.268000000   196 
> src/core/ext/transport/chttp2/transport/chttp2_transport.cc:800] W:00497270 
> SERVER [ipv4:10.1.0.1:41200] state WRITING+MORE -> WRITING [continue 
> writing]
> I0105 12:30:25.269000000   216 src/core/lib/iomgr/timer_generic.cc:558]   
> .. shard[14] popped 0
> I0105 12:30:25.269000000   196 
> src/core/ext/transport/chttp2/transport/stream_lists.cc:71] 
> 00497270[5][svr]: pop from writing
> I0105 12:30:25.269000000   216 src/core/lib/iomgr/timer_generic.cc:613]   
> .. result --> 1, shard[14]->min_deadline 10496 --> 12695, now=11765
> I0105 12:30:25.269000000   196 
> src/core/ext/transport/chttp2/transport/stream_lists.cc:71] 
> 00497270[5][svr]: pop from writable
> I0105 12:30:25.269000000   216 src/core/lib/iomgr/timer_generic.cc:517]   
> .. shard[7]: heap_empty=true
> I0105 12:30:25.270000000   196 
> src/core/ext/transport/chttp2/transport/writing.cc:443] W:00497270 
> SERVER[5] im-(sent,send)=(1,0) announce=0
> I0105 12:30:25.270000000   216 src/core/lib/iomgr/timer_generic.cc:492]   
> .. shard[7]->queue_deadline_cap --> 12764
> D0105 12:30:25.270000000   196 
> src/core/ext/transport/chttp2/transport/flow_control.cc:117] 
> 00498C78[5][svr] | s updt sent | trw: 0, tlw: 4183485, taw: 4187469, 
> srw: 288, slw: 4187521, saw: 4187521
> I0105 12:30:25.270000000   216 src/core/lib/iomgr/timer_generic.cc:558]   
> .. shard[7] popped 0
> D0105 12:30:25.270000000   196 
> src/core/ext/transport/chttp2/transport/writing.cc:163] 
> ipv4:10.1.0.1:41200:00497270 stream 5 moved to stalled list by transport. 
> This is FULLY expected to happen in a healthy program that is not seeing 
> flow control stalls. However, if you know that there are unwanted stalls, 
> here is some helpful data: 
> [fc:pending=656:pending-compressed=0:flowed=1048288:peer_initwin=4187516:t_win=0:s_win=288:s_delta=-1048288]
> I0105 12:30:25.271000000   216 src/core/lib/iomgr/timer_generic.cc:613]   
> .. result --> 1, shard[7]->min_deadline 10626 --> 12765, now=11765
> I0105 12:30:25.271000000   196 
> src/core/ext/transport/chttp2/transport/stream_lists.cc:125] 
> 00497270[5][svr]: add to stalled_by_transport
> I0105 12:30:25.271000000   216 src/core/lib/iomgr/timer_generic.cc:517]   
> .. shard[9]: heap_empty=true
> D0105 12:30:25.271000000   196 
> src/core/ext/transport/chttp2/transport/flow_control.cc:117] 
> 00498C78[0][svr] | t updt sent | trw: 0, tlw: 4183485, taw: 4187469, 
> srw: , slw: , saw: 
> I0105 12:30:25.271000000   216 src/core/lib/iomgr/timer_generic.cc:492]   
> .. shard[9]->queue_deadline_cap --> 12764
> I0105 12:30:25.272000000   196 
> src/core/ext/transport/chttp2/transport/chttp2_transport.cc:800] W:00497270 
> SERVER [ipv4:10.1.0.1:41200] state WRITING -> IDLE [begin writing nothing]
> I0105 12:30:25.272000000   216 src/core/lib/iomgr/timer_generic.cc:558]   
> .. shard[9] popped 0
>
>
> On Wednesday, January 5, 2022 at 11:54:13 UTC+1, John Forensics wrote:
>
>> I managed to extract some logging; the Python side seems to send a frame 
>> that is too large.
>>
>> [id: 0x0e0792ca, L:/127.0.0.1:43718 - R:localhost/127.0.0.1:60630] Sent 
>> GOAWAY: lastStreamId '2147483647', errorCode '6', 
>> debugData 'Frame length: 2165320 exceeds maximum: 16384'. Forcing shutdown 
>> of the connection.
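
As a side note on that GOAWAY: the 9-byte HTTP/2 frame header layout is fixed 
(RFC 7540, section 4.1), so the length Netty complains about can be checked 
against the raw bytes of a Wireshark/tcpdump capture, to see whether a ~2 MB 
frame was really sent or whether the connection got out of sync. A small 
helper along these lines (purely illustrative, not an existing API) is enough:

    def parse_http2_frame_header(header9: bytes):
        """Decode a 9-byte HTTP/2 frame header (RFC 7540, section 4.1)."""
        length = int.from_bytes(header9[0:3], "big")   # 24-bit payload length
        frame_type = header9[3]                        # 0x0 = DATA, 0x7 = GOAWAY
        flags = header9[4]
        stream_id = int.from_bytes(header9[5:9], "big") & 0x7FFFFFFF
        return length, frame_type, flags, stream_id

    # A DATA frame header claiming a 2165320-byte payload on stream 5:
    print(parse_http2_frame_header(bytes.fromhex("210a48" "00" "00" "00000005")))
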
>>
>>
>> On Tuesday, January 4, 2022 at 16:42:12 UTC+1, John Forensics wrote:
>>
>>> I am running identical Python code in two environments: WINE python.exe 
>>> versus native Linux Python.
>>>
>>> The behaviour is slightly different, eventually leading to *losing the 
>>> communication with the WINE Python variant* (messages/calls are not 
>>> received at the Java client side).
>>>
>>> The most visible difference I see is the number of HTTP/2 streams that 
>>> are created. The native Linux runs stay at a low number, and most 
>>> communication is done within one stream, while the Windows (WINE) variant 
>>> seems to distribute it evenly over more streams. 
>>>
>>> I do not understand why the gRPC layer would behave differently. I am 
>>> currently trying to understand the problem using Wireshark, but that's 
>>> probably not the right way to do it.
>>>
>>> Any tips on how to debug this would be appreciated. I do not see any 
>>> exceptions explaining why a message isn't received at the Java side.
>>>
>>> If people have some hypotheses for me to test, that would be nice as 
>>> well.
>>>
>>> Best regards,
>>>
>>> Ruud
>>>
>>>
>>>
