mlibbey opened a new issue #7544: URL: https://github.com/apache/trafficserver/issues/7544
Issue Large file, cache miss, transfer-encoding: chunked download (2+Gb) consistently starts transferring data, stalls, then eventually gets timeout on HTTP/2 connection. When forcing the download to use HTTP/1.1 the stall stops reproducing. After the reporter disabled Transfer-Encoding: chunked, we can’t reproduce the issue. Also note that 2GB is not a magic number here -- file size has not been part of testing. Conditions * 8.1.x ( 8.1.1-28, 8.1.2-30 ) * HTTP/2 * Transfer-Encoding: chunked * Large File (over 2GB) * http.flow_control (probably) Observations Repro 1 Debug Logs debug.tags = http2 ``` Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 03:03:11.349] [ET_NET 23] DEBUG: <Http2Stream.cc:583 (update_write_request)> (http2_stream) [5331335] [1] write_vio.nbytes=9223372036854775807, write_vio.ndone=240686922, write_vio.write_avail=0, reader.read_avail=7944271024 Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 03:03:11.349] [ET_NET 23] DEBUG: <Http2ConnectionState.cc:1514 (send_a_data_frame)> (http2_con) [5331335] [1] Not write avail Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 03:03:11.371] [ET_NET 23] DEBUG: <Http2Stream.cc:583 (update_write_request)> (http2_stream) [5331335] [1] write_vio.nbytes=9223372036854775807, write_vio.ndone=240686922, write_vio.write_avail=0, reader.read_avail=7944303784 Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 03:03:11.371] [ET_NET 23] DEBUG: <Http2ConnectionState.cc:1514 (send_a_data_frame)> (http2_con) [5331335] [1] Not write avail Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 03:03:11.392] [ET_NET 23] DEBUG: <Http2Stream.cc:583 (update_write_request)> (http2_stream) [5331335] [1] write_vio.nbytes=9223372036854775807, write_vio.ndone=240686922, write_vio.write_avail=0, reader.read_avail=7944336544 Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 03:03:11.392] [ET_NET 23] DEBUG: <Http2ConnectionState.cc:1514 (send_a_data_frame)> (http2_con) [5331335] [1] Not write avail Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 03:03:11.413] [ET_NET 23] DEBUG: <Http2Stream.cc:583 (update_write_request)> (http2_stream) [5331335] [1] write_vio.nbytes=9223372036854775807, write_vio.ndone=240686922, write_vio.write_avail=0, reader.read_avail=7944369304 Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 03:03:11.413] [ET_NET 23] DEBUG: <Http2ConnectionState.cc:1514 (send_a_data_frame)> (http2_con) [5331335] [1] Not write avail Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 03:03:11.434] [ET_NET 23] DEBUG: <Http2Stream.cc:583 (update_write_request)> (http2_stream) [5331335] [1] write_vio.nbytes=9223372036854775807, write_vio.ndone=240686922, write_vio.write_avail=0, reader.read_avail=7944402064 Feb 19 03:03:11 s_sys@machinename traffic_manager[2318]: [Feb 19 03:03:11.434] [ET_NET 23] DEBUG: <Http2ConnectionState.cc:1514 (send_a_data_frame)> (http2_con) [5331335] [1] Not write avail ``` * write_vio.ndone=240686922 and Not write avail repeats many times. Definitely, ATS is not sending any DATA frame. * reader.read_avail is increasing. HTTP/2 vs HTTP/1.1 On HTTP/1.1 connection, user agent and cache write are setup as http tunnel consumer. OTOH, HTTP2 connection, only user agent is setup. Questions Q1. Why the MIOBuffer becomes unwritable? ua_session->write_avail() keep returning 0. This means the Http2ClientSession::write_buffer is not consumed at all. When the write operation (SSL_write from load_buffer_and_write) succeeded, the buffer is consumed. Actually, it looks like the write operation is not enabled. debug.tags = “ssl|http2” on machinename ``` Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <Http2Stream.cc:583 (update_write_request)> (http2_stream) [1237] [1] write_vio.nbytes=9223372036854775807, write_vio.ndone=427599765, write_vio.write_avail=0, reader.read_avail=982305152 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <Http2ConnectionState.cc:1514 (send_a_data_frame)> (http2_con) [1237] [1] Not write avail Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:236 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] b→write_avail()=32768 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:241 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] nread=8192 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:236 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] b→write_avail()=24576 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:241 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] nread=16384 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:236 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] b→write_avail()=8192 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:241 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] nread=8192 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:318 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] bytes_read=32768 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <HttpTunnel.cc:1157 (producer_handler)> (http_tunnel) [3616] producer_handler [http server VC_EVENT_READ_READY] Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <HttpTunnel.cc:1092 (producer_handler_chunked)> (http_tunnel) [3616] producer_handler_chunked [http server VC_EVENT_READ_READY] Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <HttpTunnel.cc:253 (read_chunk)> (http_chunk) completed read of chunk of 16384 bytes Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <HttpTunnel.cc:177 (read_size)> (http_chunk) read chunk size of 32768 bytes Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <HttpTunnel.cc:257 (read_chunk)> (http_chunk) read 25616 bytes of an 32768 chunk Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <HttpTunnel.cc:1201 (producer_handler)> (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 100, state: 0 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <Http2Stream.cc:583 (update_write_request)> (http2_stream) [1237] [1] write_vio.nbytes=9223372036854775807, write_vio.ndone=427599765, write_vio.write_avail=0, reader.read_avail=982337912 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.858] [ET_NET 7] DEBUG: <Http2ConnectionState.cc:1514 (send_a_data_frame)> (http2_con) [1237] [1] Not write avail Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:236 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] b→write_avail()=32768 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:241 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] nread=8192 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:236 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] b→write_avail()=24576 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:241 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] nread=16384 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:236 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] b→write_avail()=8192 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:241 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] nread=8192 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <SSLNetVConnection.cc:318 (ssl_read_from_net)> (ssl) [SSL_NetVConnection::ssl_read_from_net] bytes_read=32768 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <HttpTunnel.cc:1157 (producer_handler)> (http_tunnel) [3616] producer_handler [http server VC_EVENT_READ_READY] Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <HttpTunnel.cc:1092 (producer_handler_chunked)> (http_tunnel) [3616] producer_handler_chunked [http server VC_EVENT_READ_READY] Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <HttpTunnel.cc:253 (read_chunk)> (http_chunk) completed read of chunk of 32768 bytes Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <HttpTunnel.cc:177 (read_size)> (http_chunk) read chunk size of 32768 bytes Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <HttpTunnel.cc:257 (read_chunk)> (http_chunk) read 25608 bytes of an 32768 chunk Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <HttpTunnel.cc:1201 (producer_handler)> (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 100, state: 0 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <Http2Stream.cc:583 (update_write_request)> (http2_stream) [1237] [1] write_vio.nbytes=9223372036854775807, write_vio.ndone=427599765, write_vio.write_avail=0, reader.read_avail=982370672 Feb 19 05:41:05 s_sys@machinename traffic_manager[25561]: [Feb 19 05:41:05.859] [ET_NET 7] DEBUG: <Http2ConnectionState.cc:1514 (send_a_data_frame)> (http2_con) [1237] [1] Not write avail ``` While a ton of Not write avail logs appear, ATS keeps reading from origin server and no write operation. Reading data from origin server is blocking writing to the client by any chance? Q2. Why ATS keeps receiving response from origin server? This might be key question. To avoid buffering response body too much, it suppose to be that read operation is disabled if write operation is unavailable. Remember how the slow client problem happens. Q3. http flow control is related? This might be red herring, but some “Throttle” debug logs are observed. ``` Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 06:56:20.934] [ET_NET 8] DEBUG: <HttpTunnel.cc:1157 (producer_handler)> (http_tunnel) [6398] producer_handler [http server VC_EVENT_READ_READY] Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 06:56:20.934] [ET_NET 8] DEBUG: <HttpTunnel.cc:1092 (producer_handler_chunked)> (http_tunnel) [6398] producer_handler_chunked [http server VC_EVENT_READ_READY] Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 06:56:20.935] [ET_NET 8] DEBUG: <Http2Stream.cc:583 (update_write_request)> (http2_stream) [2178] [1] write_vio.nbytes=9223372036854775807, write_vio.ndone=744251821, write_vio.write_avail=154, reader.read_avail=1209261936 Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 06:56:20.935] [ET_NET 8] DEBUG: <HttpTunnel.cc:1363 (consumer_handler)> (http_tunnel) [6398] consumer_handler [user agent VC_EVENT_WRITE_READY] Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 06:56:20.937] [ET_NET 8] DEBUG: <HttpTunnel.cc:1303 (consumer_reenable)> (http_tunnel) Throttle 0x7f902da4f198 755372784 / 1209261936 Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 06:56:20.937] [ET_NET 8] DEBUG: <HttpTunnel.cc:1157 (producer_handler)> (http_tunnel) [6398] producer_handler [http server VC_EVENT_READ_READY] Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 06:56:20.937] [ET_NET 8] DEBUG: <HttpTunnel.cc:1092 (producer_handler_chunked)> (http_tunnel) [6398] producer_handler_chunked [http server VC_EVENT_READ_READY] Feb 19 06:56:20 s_sys@machinename traffic_manager[25561]: [Feb 19 06:56:20.938] [ET_NET 8] DEBUG: <Http2Stream.cc:583 (update_write_request)> (http2_stream) [2178] [1] write_vio.nbytes=9223372036854775807, write_vio.ndone=744251821, write_vio.write_avail=162, reader.read_avail=1209294696 ``` This brings similar question as Q2. - why throttling on WRITE_READY event on consumer_handler instead on READ_READY event on producer_handler. (and thanks to Masaori for this writeup) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
