bdgranger opened a new issue, #9274: URL: https://github.com/apache/trafficserver/issues/9274
We are seeing an issue with ATS 8.1.x where "sometimes", when doing a chunked transfer from origin, the read-while-writer readers stall and exit with a "ReadRead retries exhausted, bailing..." message, even though the writer of the content can be seen continuing to write fragments to cache.

One problem with this is that if the transfer exits early on a mid-tier cache, the edge cache will cache partial content, because with chunked transfer encoding the "200 OK" response is sent before the transfer completes. From that point on, the edges will only return partial content to any clients requesting that object.

Disabling read_while_writer makes the problem go away, but then many concurrent requests go to the parent/origin, which limits the usefulness of the cache.

Size of the requested object does not appear to be a factor, nor does overall load on the system. The issue does not always occur, but when it does it appears to be related to a collision in the hash bucket, where the fragment being waited on by the reader is not the first entry in the linked list.
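
For reference, these are the read-while-writer settings involved, with names and defaults as we understand them from the 8.x documentation (treat the values below as assumptions rather than confirmed defaults for our build):

```
# records.config -- read-while-writer knobs (names/defaults per our reading of the ATS 8.x docs)
# Setting this to 0 is the workaround described above, at the cost of sending
# concurrent requests for the same object to the parent/origin:
CONFIG proxy.config.cache.enable_read_while_writer INT 1
# The "ReadRead retrying(N)" / "retries exhausted" messages quoted in this report
# appear to be driven by this retry count and per-retry delay (milliseconds):
CONFIG proxy.config.cache.read_while_writer.max_retries INT 10
CONFIG proxy.config.cache.read_while_writer_retry.delay INT 50
```
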
Sample log output from a reproduction of the issue:

**The first request starts the writer:**
```
[Dec 16 06:00:38.384] [ET_NET 2] DEBUG: (http_cache_write) [1468394] writing to cache with URL <URL>
[Dec 16 06:00:38.384] [ET_NET 2] DEBUG: (cache_new) new 0x7f182c4cb840
```

**55ms later a rww reader connects:**
```
[Dec 16 06:00:38.439] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: key: 97D4357 In openReadFromWriter
[Dec 16 06:00:38.439] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: key: 97D4357 eKey: 2A76CBD3 # alts: 0, ndx: 0, # writers: 1 writer: 0x7f182c4cb840
[Dec 16 06:00:38.439] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: key: 97D4357 http passed stage 1, closed: 0, frag: 22
[Dec 16 06:00:38.439] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: key: 97D4357 closed: 0, fragment: 22, len: -1 starting first fragment
```

**The writer has already written 22 fragments in this time, so the reader reads through all those first, then it starts looking for a fragment that hasn't been written yet (93E3380A). It finds a non-matching fragment in that hash bucket but can't find the desired fragment in the bucket's linked list, so goes into ReadRead retrying:** (debug message about doc key not matching added by us...)
```
[Dec 16 06:00:38.445] [ET_NET 6] DEBUG: (cache_read) 0x7f18346b7300: Read complete on fragment 47772C6FAA43A3964E194BBD27239D7A. Length: data payload=23645 this fragment=23717 total doc=964553 prefix=72
[Dec 16 06:00:38.445] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: key: 93E3380A fKey: 97D4357 ReadDone doc key (96A343AA) doesn't match. Will probe again
[Dec 16 06:00:38.445] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: key: 93E3380A fKey: 97D4357 ReadRead retrying(0): 463831
[Dec 16 06:00:38.495] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: key: 93E3380A fKey: 97D4357 ReadRead retrying(1): 463831
```

**The writer finally writes the expected fragment:**
```
[Dec 16 06:00:38.532] [ET_NET 2] DEBUG: (cache_insert) 0x7f182c4cb840: WriteDone: 93E3380A, 97D4357, 11079
```

**But the reader still can't find it until it finally times out 300ms after the fragment was written...**
```
[Dec 16 06:00:38.541] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: key: 93E3380A fKey: 97D4357 ReadRead retrying(2): 463831
[Dec 16 06:00:38.640] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: key: 93E3380A fKey: 97D4357 ReadRead retrying(3): 463831
[Dec 16 06:00:38.744] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: key: 93E3380A fKey: 97D4357 ReadRead retrying(4): 463831
[Dec 16 06:00:38.844] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: key: 93E3380A fKey: 97D4357 ReadRead retries exhausted, bailing..: 463831
```

**Some time after the timeout, but still before the entire segment is downloaded, another rww reader connects:**
```
[Dec 16 06:00:39.574] [ET_NET 2] DEBUG: (cache_read_agg) 0x7f183c0c7a80: key: 97D4357 In openReadFromWriter
[Dec 16 06:00:39.574] [ET_NET 2] DEBUG: (cache_read_agg) 0x7f183c0c7a80: key: 97D4357 eKey: 2A76CBD3 # alts: 0, ndx: 0, # writers: 1 writer: 0x7f182c4cb840
[Dec 16 06:00:39.574] [ET_NET 2] DEBUG: (cache_read_agg) 0x7f183c0c7a80: key: 97D4357 http passed stage 1, closed: 0, frag: 50
[Dec 16 06:00:39.574] [ET_NET 2] DEBUG: (cache_read_agg) 0x7f183c0c7a80: key: 97D4357 closed: 0, fragment: 50, len: -1 starting first fragment
```

**By this point, the writer has written 50 fragments (the "stuck" fragment was 23 or 24). When the new reader gets to that previously stuck fragment, it reads it with no problem:**
```
[Dec 16 06:00:39.739] [ET_NET 2] DEBUG: (cache_read) 0x7f183c0c7a80: Read complete on fragment 6B4CA1FA0A38E3934E995AAF368CEF26. Length: data payload=11079 this fragment=11151 total doc=474910 prefix=72
[Dec 16 06:00:39.739] [ET_NET 2] DEBUG: (cache_read_agg) 0x7f183c0c7a80: key: 93E3380A fKey: 97D4357 ReadDone last_collision: 0x7f17462649a4, dir_valid: 1
```

**The writer finishes writing the object, and this second rww reader is able to completely read the object -- no other fragments get "stuck":**
```
[Dec 16 06:00:40.033] [ET_NET 2] DEBUG: (http_tunnel) [1468394] consumer_handler [cache write VC_EVENT_WRITE_COMPLETE]
[Dec 16 06:00:40.033] [ET_NET 2] DEBUG: (cache_close) do_io_close 0x7f182c4cb840 -1 1
[Dec 16 06:00:40.063] [ET_NET 2] DEBUG: (cache_read_agg) 0x7f183c0c7a80: key: D3D16139 fKey: 97D4357 ReadMain complete: 1935671
[Dec 16 06:00:40.063] [ET_NET 2] DEBUG: (http_tunnel) [1471017] producer_handler [cache read VC_EVENT_EOS]
[Dec 16 06:00:40.063] [ET_NET 2] DEBUG: (http_tunnel) [1471017] producer_handler_dechunked [cache read VC_EVENT_EOS]
[Dec 16 06:00:40.063] [ET_NET 2] DEBUG: (cache_close) do_io_close 0x7f183c0c7a80 -1 1
[Dec 16 06:00:40.063] [ET_NET 2] DEBUG: (cache_free) free 0x7f183c0c7a80
```

From this point on, the edge cache that made the request that bailed early will return partial content of 463831 bytes, while the edge from the second request returns the full 1935671 bytes to any clients that connect to it. We have also seen this happen even on smaller files of ~50k in size, when being sent via chunked transfer encoding.
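
To make the hash-bucket hypothesis above concrete, here is a minimal, self-contained toy sketch. It is deliberately not ATS code, and both the "insert at the head of the chain" and "resume probing from a saved cursor" behaviours are our assumptions about the mechanism; but it shows how a reader that resumes probing from a saved collision cursor can keep missing a fragment the writer has already inserted, while a fresh reader starting from the head of the chain finds it immediately -- which matches what the two readers in the log see:

```
// Toy illustration (NOT ATS code) of the suspected failure mode: a reader that
// resumes probing a hash-bucket chain from a saved "last collision" cursor never
// sees an entry that the writer later inserts ahead of that cursor.
#include <cstdint>
#include <cstdio>

struct DirEntry {
  uint32_t key;
  DirEntry *next;
};

// Resume probing after *cursor if set, otherwise from the head of the chain.
// Updates *cursor to the last entry examined, mimicking a last_collision-style
// resume pointer.
DirEntry *probe(DirEntry *head, uint32_t key, DirEntry **cursor) {
  for (DirEntry *e = *cursor ? (*cursor)->next : head; e; e = e->next) {
    *cursor = e;
    if (e->key == key) {
      return e;
    }
  }
  return nullptr; // not found past the cursor
}

int main() {
  // The bucket initially holds one unrelated (colliding) entry.
  DirEntry other{0x96A343AAu, nullptr};
  DirEntry *head = &other;

  DirEntry *cursor      = nullptr;
  const uint32_t wanted = 0x93E3380Au;

  // Reader probes before the writer has inserted the fragment: it walks past
  // the colliding entry and comes up empty.
  std::printf("first probe: %s\n", probe(head, wanted, &cursor) ? "hit" : "miss");

  // Writer inserts the wanted fragment at the head of the chain, i.e. before
  // the reader's saved cursor (assumption about insertion order).
  DirEntry fresh{wanted, head};
  head = &fresh;

  // The stalled reader retries from its cursor and misses every time, until
  // its retries are exhausted.
  std::printf("retry probe: %s\n", probe(head, wanted, &cursor) ? "hit" : "miss");

  // A reader that starts over from the head (like the second rww reader in the
  // log) finds the fragment immediately.
  DirEntry *fresh_cursor = nullptr;
  std::printf("fresh probe: %s\n", probe(head, wanted, &fresh_cursor) ? "hit" : "miss");
  return 0;
}
```
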
Some time later (a couple minutes), we clear the cache on the edge that got the partial content and re-request the object and it is able to retrieve the full content out of the mid cache.
