mlibbey opened a new issue #7544:
URL: https://github.com/apache/trafficserver/issues/7544


   Issue
   
   Large file, cache miss, transfer-encoding: chunked download (2+Gb) 
consistently starts transferring data, stalls, then eventually gets timeout on 
HTTP/2 connection. When forcing the download to use HTTP/1.1 the stall stops 
reproducing.
   After the reporter disabled Transfer-Encoding: chunked, we can’t reproduce 
the issue.
   Also note that 2GB is not a magic number here -- file size has not been part 
of testing.
   
   Conditions
   
   * 8.1.x ( 8.1.1-28, 8.1.2-30 )
   * HTTP/2
   * Transfer-Encoding: chunked
   * Large File (over 2GB)
   * http.flow_control (probably)
   
   Observations
   
   Repro 1
   
   Debug Logs
   
   debug.tags = http2
   ```
   Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 
03:03:11.349] [ET_NET 23] DEBUG: <Http2Stream.cc:583 (update_write_request)> 
(http2_stream) [5331335] [1] write_vio.nbytes=9223372036854775807, 
write_vio.ndone=240686922, write_vio.write_avail=0, reader.read_avail=7944271024
   Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 
03:03:11.349] [ET_NET 23] DEBUG: <Http2ConnectionState.cc:1514 
(send_a_data_frame)> (http2_con) [5331335] [1] Not write avail
   Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 
03:03:11.371] [ET_NET 23] DEBUG: <Http2Stream.cc:583 (update_write_request)> 
(http2_stream) [5331335] [1] write_vio.nbytes=9223372036854775807, 
write_vio.ndone=240686922, write_vio.write_avail=0, reader.read_avail=7944303784
   Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 
03:03:11.371] [ET_NET 23] DEBUG: <Http2ConnectionState.cc:1514 
(send_a_data_frame)> (http2_con) [5331335] [1] Not write avail
   Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 
03:03:11.392] [ET_NET 23] DEBUG: <Http2Stream.cc:583 (update_write_request)> 
(http2_stream) [5331335] [1] write_vio.nbytes=9223372036854775807, 
write_vio.ndone=240686922, write_vio.write_avail=0, reader.read_avail=7944336544
   Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 
03:03:11.392] [ET_NET 23] DEBUG: <Http2ConnectionState.cc:1514 
(send_a_data_frame)> (http2_con) [5331335] [1] Not write avail
   Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 
03:03:11.413] [ET_NET 23] DEBUG: <Http2Stream.cc:583 (update_write_request)> 
(http2_stream) [5331335] [1] write_vio.nbytes=9223372036854775807, 
write_vio.ndone=240686922, write_vio.write_avail=0, reader.read_avail=7944369304
   Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 
03:03:11.413] [ET_NET 23] DEBUG: <Http2ConnectionState.cc:1514 
(send_a_data_frame)> (http2_con) [5331335] [1] Not write avail
   Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 
03:03:11.434] [ET_NET 23] DEBUG: <Http2Stream.cc:583 (update_write_request)> 
(http2_stream) [5331335] [1] write_vio.nbytes=9223372036854775807, 
write_vio.ndone=240686922, write_vio.write_avail=0, reader.read_avail=7944402064
   Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 
03:03:11.434] [ET_NET 23] DEBUG: <Http2ConnectionState.cc:1514 
(send_a_data_frame)> (http2_con) [5331335] [1] Not write avail
   ```
   * write_vio.ndone=240686922 and Not write avail repeats many times. 
Definitely, ATS is not sending any DATA frame. 
   * reader.read_avail is increasing. 
   
   HTTP/2 vs HTTP/1.1
   
   On HTTP/1.1 connection, user agent and cache write are setup as http tunnel 
consumer.
   OTOH, HTTP2 connection, only user agent is setup. 
   
   Questions
   
   Q1. Why the MIOBuffer becomes unwritable?
   
   ua_session->write_avail() keep returning 0. This means the 
Http2ClientSession::write_buffer is not consumed at all. When the write 
operation (SSL_write from load_buffer_and_write) succeeded, the buffer is 
consumed. 
   
   Actually, it looks like the write operation is not enabled.
   
   debug.tags = “ssl|http2” on machinename
   ```
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <Http2Stream.cc:583 (update_write_request)> 
(http2_stream) [1237] [1] write_vio.nbytes=9223372036854775807, 
write_vio.ndone=427599765, write_vio.write_avail=0, reader.read_avail=982305152
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <Http2ConnectionState.cc:1514 
(send_a_data_frame)> (http2_con) [1237] [1] Not write avail
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:236 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] b→write_avail()=32768
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:241 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] nread=8192
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:236 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] b→write_avail()=24576
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:241 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] nread=16384
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:236 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] b→write_avail()=8192
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:241 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] nread=8192
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:318 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] bytes_read=32768
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <HttpTunnel.cc:1157 (producer_handler)> 
(http_tunnel) [3616] producer_handler [http server VC_EVENT_READ_READY]
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <HttpTunnel.cc:1092 (producer_handler_chunked)> 
(http_tunnel) [3616] producer_handler_chunked [http server VC_EVENT_READ_READY]
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <HttpTunnel.cc:253 (read_chunk)> (http_chunk) 
completed read of chunk of 16384 bytes
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <HttpTunnel.cc:177 (read_size)> (http_chunk) 
read chunk size of 32768 bytes
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <HttpTunnel.cc:257 (read_chunk)> (http_chunk) 
read 25616 bytes of an 32768 chunk
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <HttpTunnel.cc:1201 (producer_handler)> 
(http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] 
event: 100, state: 0
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <Http2Stream.cc:583 (update_write_request)> 
(http2_stream) [1237] [1] write_vio.nbytes=9223372036854775807, 
write_vio.ndone=427599765, write_vio.write_avail=0, reader.read_avail=982337912
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.858] [ET_NET 7] DEBUG: <Http2ConnectionState.cc:1514 
(send_a_data_frame)> (http2_con) [1237] [1] Not write avail
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:236 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] b→write_avail()=32768
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:241 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] nread=8192
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:236 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] b→write_avail()=24576
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:241 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] nread=16384
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:236 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] b→write_avail()=8192
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:241 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] nread=8192
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:318 (ssl_read_from_net)> 
(ssl) [SSL_NetVConnection::ssl_read_from_net] bytes_read=32768
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <HttpTunnel.cc:1157 (producer_handler)> 
(http_tunnel) [3616] producer_handler [http server VC_EVENT_READ_READY]
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <HttpTunnel.cc:1092 (producer_handler_chunked)> 
(http_tunnel) [3616] producer_handler_chunked [http server VC_EVENT_READ_READY]
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <HttpTunnel.cc:253 (read_chunk)> (http_chunk) 
completed read of chunk of 32768 bytes
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <HttpTunnel.cc:177 (read_size)> (http_chunk) 
read chunk size of 32768 bytes
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <HttpTunnel.cc:257 (read_chunk)> (http_chunk) 
read 25608 bytes of an 32768 chunk
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <HttpTunnel.cc:1201 (producer_handler)> 
(http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] 
event: 100, state: 0
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <Http2Stream.cc:583 (update_write_request)> 
(http2_stream) [1237] [1] write_vio.nbytes=9223372036854775807, 
write_vio.ndone=427599765, write_vio.write_avail=0, reader.read_avail=982370672
   Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 
05:41:05.859] [ET_NET 7] DEBUG: <Http2ConnectionState.cc:1514 
(send_a_data_frame)> (http2_con) [1237] [1] Not write avail
   ```
   While a ton of Not write avail logs appear, ATS keeps reading from origin 
server and no write operation.
   Reading data from origin server is blocking writing to the client by any 
chance?
   
   Q2. Why ATS keeps receiving response from origin server? 
   
   This might be key question. To avoid buffering response body too much, it 
suppose to be that read operation is disabled if write operation is 
unavailable. Remember how the slow client problem happens.
   
   Q3. http flow control is related?
   
   This might be red herring, but some “Throttle” debug logs are observed.
   ```
   Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 
06:56:20.934] [ET_NET 8] DEBUG: <HttpTunnel.cc:1157 (producer_handler)> 
(http_tunnel) [6398] producer_handler [http server VC_EVENT_READ_READY]
   Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 
06:56:20.934] [ET_NET 8] DEBUG: <HttpTunnel.cc:1092 (producer_handler_chunked)> 
(http_tunnel) [6398] producer_handler_chunked [http server VC_EVENT_READ_READY]
   Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 
06:56:20.935] [ET_NET 8] DEBUG: <Http2Stream.cc:583 (update_write_request)> 
(http2_stream) [2178] [1] write_vio.nbytes=9223372036854775807, 
write_vio.ndone=744251821, write_vio.write_avail=154, 
reader.read_avail=1209261936
   
   Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 
06:56:20.935] [ET_NET 8] DEBUG: <HttpTunnel.cc:1363 (consumer_handler)> 
(http_tunnel) [6398] consumer_handler [user agent VC_EVENT_WRITE_READY]
   Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 
06:56:20.937] [ET_NET 8] DEBUG: <HttpTunnel.cc:1303 (consumer_reenable)> 
(http_tunnel) Throttle   0x7f902da4f198 755372784 / 1209261936
   
   Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 
06:56:20.937] [ET_NET 8] DEBUG: <HttpTunnel.cc:1157 (producer_handler)> 
(http_tunnel) [6398] producer_handler [http server VC_EVENT_READ_READY]
   Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 
06:56:20.937] [ET_NET 8] DEBUG: <HttpTunnel.cc:1092 (producer_handler_chunked)> 
(http_tunnel) [6398] producer_handler_chunked [http server VC_EVENT_READ_READY]
   Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 
06:56:20.938] [ET_NET 8] DEBUG: <Http2Stream.cc:583 (update_write_request)> 
(http2_stream) [2178] [1] write_vio.nbytes=9223372036854775807, 
write_vio.ndone=744251821, write_vio.write_avail=162, 
reader.read_avail=1209294696
   ```
   This brings similar question as Q2. - why throttling on WRITE_READY event on 
consumer_handler instead on READ_READY event on producer_handler.
   
   (and thanks to Masaori for this writeup)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to