bdgranger opened a new issue, #9274:
URL: https://github.com/apache/trafficserver/issues/9274

   We are seeing an issue with ATS 8.1.x where "sometimes", during a chunked transfer from origin, the read-while-writer readers stall and exit with a "ReadRead retries exhausted, bailing..." message, even though the writer of the content can be seen to still be writing fragments to cache. One problem with this is that if the transfer exits early on a mid-tier cache, the edge cache will cache partial content, because with chunked transfer encoding the initial "200 OK" response is sent before the transfer starts. From that point on, the edges will only return partial content to any clients requesting that object.
   
   Disabling read_while_writer makes the problem go away, but then many concurrent requests go to the parent/origin, which limits the usefulness of the cache.
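   
   For reference, "disabling read_while_writer" here means turning the feature off with the usual records.config toggle; this is the only change involved in the workaround we tested:
   ```
   CONFIG proxy.config.cache.enable_read_while_writer INT 0
   ```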
   
   The size of the requested object does not appear to be a factor, nor does overall load on the system. The issue does not always occur, but when it does, it appears to be related to a collision in the hash bucket, where the fragment the reader is waiting on is not the first entry in the bucket's linked list.
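   
   To make that suspicion concrete, below is a minimal, hypothetical C++ sketch of the interaction we think we are seeing. It is NOT ATS's actual dir_probe()/dir_insert() code, just a toy model of a reader that resumes its probe of a bucket's chain from a remembered collision entry and therefore misses a fragment the writer inserts ahead of that entry.
   ```cpp
   // Toy model (NOT ATS code) of the suspected probe/insert interaction:
   // a reader that resumes its directory probe from a remembered collision
   // entry can miss a fragment the writer inserts ahead of that entry.
   #include <cstdint>
   #include <iostream>
   #include <list>

   struct DirEntry { uint32_t key; };

   // Simplified stand-in for one hash bucket's chain of directory entries.
   using Bucket = std::list<DirEntry>;

   // Probe the bucket for `key`, resuming after *last_collision if one was
   // remembered from a previous probe.
   const DirEntry *probe(const Bucket &b, uint32_t key, const DirEntry *last_collision) {
     auto it = b.begin();
     if (last_collision != nullptr) {
       while (it != b.end() && &*it != last_collision) ++it; // skip to the collision...
       if (it != b.end()) ++it;                              // ...and resume after it
     }
     for (; it != b.end(); ++it) {
       if (it->key == key) return &*it;
     }
     return nullptr; // nothing matching past the collision point
   }

   int main() {
     Bucket bucket;
     bucket.push_back({0x96A343AA}); // existing, non-matching fragment (the collision)

     // Reader probes for fragment 0x93E3380A, finds only the collision entry
     // and remembers it.
     const DirEntry *last_collision = &bucket.front();

     // Writer now inserts the wanted fragment ahead of the remembered entry.
     bucket.push_front({0x93E3380A});

     // Reader retries, but resumes after last_collision and never sees it.
     const DirEntry *hit = probe(bucket, 0x93E3380A, last_collision);
     std::cout << (hit ? "found" : "missed -> retries will be exhausted") << "\n";
     return 0;
   }
   ```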
   
   Sample log output from a reproduction of the issue:
   
   **The first request starts the writer:**
   ```
   [Dec 16 06:00:38.384] [ET_NET 2] DEBUG: (http_cache_write) [1468394] writing 
to cache with URL <URL>
   [Dec 16 06:00:38.384] [ET_NET 2] DEBUG: (cache_new) new 0x7f182c4cb840
   ```
   
   **55ms later, a read-while-writer (rww) reader connects:**
   ```
   [Dec 16 06:00:38.439] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: 
key: 97D4357 In openReadFromWriter
   [Dec 16 06:00:38.439] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: 
key: 97D4357 eKey: 2A76CBD3 # alts: 0, ndx: 0, # writers: 1 writer: 
0x7f182c4cb840
   [Dec 16 06:00:38.439] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: 
key: 97D4357 http passed stage 1, closed: 0, frag: 22
   [Dec 16 06:00:38.439] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: 
key: 97D4357 closed: 0, fragment: 22, len: -1 starting first fragment
   ```
   
   **The writer has already written 22 fragments in this time, so the reader first reads through all of those, then starts looking for a fragment that hasn't been written yet (93E3380A). It finds a non-matching fragment in that hash bucket but can't find the desired fragment in the bucket's linked list, so it goes into ReadRead retrying** (the debug message about the doc key not matching was added by us):
   ```
   [Dec 16 06:00:38.445] [ET_NET 6] DEBUG: (cache_read) 0x7f18346b7300: Read 
complete on fragment 47772C6FAA43A3964E194BBD27239D7A. Length: data 
payload=23645 this fragment=23717 total doc=964553 prefix=72
   [Dec 16 06:00:38.445] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: 
key: 93E3380A fKey: 97D4357 ReadDone doc key (96A343AA) doesn't match. Will 
probe again
   [Dec 16 06:00:38.445] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: 
key: 93E3380A fKey: 97D4357 ReadRead retrying(0): 463831
   
   [Dec 16 06:00:38.495] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: 
key: 93E3380A fKey: 97D4357 ReadRead retrying(1): 463831
   ```
   **The writer finally writes the expected fragment:**
   ```
   [Dec 16 06:00:38.532] [ET_NET 2] DEBUG: (cache_insert) 0x7f182c4cb840: WriteDone: 93E3380A, 97D4357, 11079
   ```
   
   **But the reader still can't find it, and it finally times out about 300ms after the fragment was written:**
   ```
   [Dec 16 06:00:38.541] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: 
key: 93E3380A fKey: 97D4357 ReadRead retrying(2): 463831
   [Dec 16 06:00:38.640] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: 
key: 93E3380A fKey: 97D4357 ReadRead retrying(3): 463831
   [Dec 16 06:00:38.744] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: 
key: 93E3380A fKey: 97D4357 ReadRead retrying(4): 463831
   [Dec 16 06:00:38.844] [ET_NET 6] DEBUG: (cache_read_agg) 0x7f18346b7300: 
key: 93E3380A fKey: 97D4357 ReadRead retries exhausted, bailing..: 463831
   ```
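   
   The reader-side pattern visible above is simply a small, bounded number of re-probes roughly 50-100ms apart, followed by the bail-out. A rough stand-alone model of that loop is sketched below; the attempt count and delays are read off the timestamps in the log, not taken from the ATS source.
   ```cpp
   // Rough model of the retry/bail-out behavior seen in the log above.
   #include <chrono>
   #include <iostream>
   #include <thread>

   // Stand-in for probing the directory for the wanted fragment; in the
   // failure case above the probe never succeeds.
   static bool probe_for_fragment() { return false; }

   int main() {
     constexpr int kMaxRetries = 5; // retrying(0) .. retrying(4) in the log
     for (int attempt = 0; attempt < kMaxRetries; ++attempt) {
       if (probe_for_fragment()) {
         std::cout << "ReadDone\n";
         return 0;
       }
       std::cout << "ReadRead retrying(" << attempt << ")\n";
       // ~50ms between the first retries, ~100ms between the later ones.
       std::this_thread::sleep_for(std::chrono::milliseconds(attempt < 2 ? 50 : 100));
     }
     std::cout << "ReadRead retries exhausted, bailing..\n";
     return 1;
   }
   ```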
   
   **Some time after the timeout, but still before the entire segment is 
downloaded, another rww reader connects:**
   ```
   [Dec 16 06:00:39.574] [ET_NET 2] DEBUG: (cache_read_agg) 0x7f183c0c7a80: 
key: 97D4357 In openReadFromWriter
   [Dec 16 06:00:39.574] [ET_NET 2] DEBUG: (cache_read_agg) 0x7f183c0c7a80: 
key: 97D4357 eKey: 2A76CBD3 # alts: 0, ndx: 0, # writers: 1 writer: 
0x7f182c4cb840
   [Dec 16 06:00:39.574] [ET_NET 2] DEBUG: (cache_read_agg) 0x7f183c0c7a80: 
key: 97D4357 http passed stage 1, closed: 0, frag: 50
   [Dec 16 06:00:39.574] [ET_NET 2] DEBUG: (cache_read_agg) 0x7f183c0c7a80: 
key: 97D4357 closed: 0, fragment: 50, len: -1 starting first fragment
   ```
   
   **By this point, the writer has written 50 fragments (the "stuck" fragment was number 23 or 24). When the new reader gets to that previously stuck fragment, it reads it without any problem:**
   ```
   [Dec 16 06:00:39.739] [ET_NET 2] DEBUG: (cache_read) 0x7f183c0c7a80: Read 
complete on fragment 6B4CA1FA0A38E3934E995AAF368CEF26. Length: data 
payload=11079 this fragment=11151 total doc=474910 prefix=72
   [Dec 16 06:00:39.739] [ET_NET 2] DEBUG: (cache_read_agg) 0x7f183c0c7a80: 
key: 93E3380A fKey: 97D4357 ReadDone last_collision: 0x7f17462649a4, dir_valid: 
1
   ```
   
   **The writer finishes writing the object, and this second rww reader is able 
to completely read the object -- no other fragments get "stuck":**
   ```
   [Dec 16 06:00:40.033] [ET_NET 2] DEBUG: (http_tunnel) [1468394] 
consumer_handler [cache write VC_EVENT_WRITE_COMPLETE]
   [Dec 16 06:00:40.033] [ET_NET 2] DEBUG: (cache_close) do_io_close 
0x7f182c4cb840 -1 1
   
   [Dec 16 06:00:40.063] [ET_NET 2] DEBUG: (cache_read_agg) 0x7f183c0c7a80: 
key: D3D16139 fKey: 97D4357 ReadMain complete: 1935671
   [Dec 16 06:00:40.063] [ET_NET 2] DEBUG: (http_tunnel) [1471017] 
producer_handler [cache read VC_EVENT_EOS]
   [Dec 16 06:00:40.063] [ET_NET 2] DEBUG: (http_tunnel) [1471017] 
producer_handler_dechunked [cache read VC_EVENT_EOS]
   [Dec 16 06:00:40.063] [ET_NET 2] DEBUG: (cache_close) do_io_close 
0x7f183c0c7a80 -1 1
   [Dec 16 06:00:40.063] [ET_NET 2] DEBUG: (cache_free) free 0x7f183c0c7a80
   ```
   
   From this point on, the edge cache that made the request that bailed early will return partial content of 463831 bytes, while the edge that made the second request returns the full 1935671 bytes to any clients that connect to it. We have also seen this happen even on smaller files of ~50k in size when they are sent with chunked transfer encoding.
   
   Some time later (a couple of minutes), if we clear the cache on the edge that got the partial content and re-request the object, it is able to retrieve the full content from the mid-tier cache.

