bdgranger opened a new issue, #9275:
URL: https://github.com/apache/trafficserver/issues/9275

   **Configuration**
   - ATS 8.1.x, single-tier CDN (no parent caches, just edge caches and origins)
   - proxy.config.http.parent_proxy_routing_enable set to 0 or 
proxy.config.http.parent_proxy_routing_enable set to 1 with empty parent.config
   - origin is a redirecting origin that always responds with a 307 to the 
"actual" origin
   - hosting.config uses volume 2 for host of original request, but has no 
entry for IP address of redirected host
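
To make the volume selection concrete, here is a minimal sketch of how a hosting.config style lookup behaves under this configuration. It is a toy model, not ATS code: `HostingTable`, `volume_for`, `example_table`, and the host names `origin.example.com` and `192.0.2.10` are all hypothetical and stand in for the real deployment.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for hosting.config: host name -> cache volume number.
// Hosts without an entry fall back to the generic volume (1 in this report).
struct HostingTable {
  std::map<std::string, int> host_to_volume;
  int default_volume = 1;

  int volume_for(const std::string &host) const {
    assert(!host.empty()); // a lookup always has some host
    auto it = host_to_volume.find(host);
    return it == host_to_volume.end() ? default_volume : it->second;
  }
};

// Hypothetical table: the remapped origin is pinned to volume 2 (ramdisk);
// the redirect target (an IP address) has no entry at all.
inline HostingTable example_table() {
  HostingTable t;
  t.host_to_volume["origin.example.com"] = 2;
  return t;
}
```

With this table the original request resolves to volume 2, while the redirected host falls through to volume 1, which is the split the rest of this report depends on.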
   
   **The basic gist of what is happening is:**
   - the first request for an asset on a DS comes in and is remapped to the origin, 
which is mapped to volume 2 (ramdisk) by hosting.config
   - the object is not found and results in the expected cache miss
   - ATS opens a write lock on the ramdisk for the asset and issues a request 
to the origin to fetch it
   - the origin responds with a 307
   - since ATS is configured to follow redirections, a new read request is 
issued with the new URL
   - the host of the new URL is not mapped to ramdisk by hosting.config, so ATS 
looks for the object on the "generic" volume 1
   - the object is also not found on volume 1, so ATS takes a write lock on 
that volume and issues a request to the "real origin" to fetch
   - object is fetched successfully, written to volume 1, and streamed to client
   - volume 1 cache write lock is released
   - volume 2 (ramdisk) cache write lock is never released
   - When subsequent requests for the same asset are received 
(read_while_writer enabled):
   - - ATS looks for the object in ramdisk, since that is where the remapped 
origin is supposed to store content
   - - Because of the write lock, read waits for any content to appear in the 
ramdisk cache, which will never happen
   - - read retries the configured number of times, then fails and issues its 
own write request to origin
   - - ATS fails the configured number of times to open the write lock (because 
the first request still has it), then just goes straight to origin
   - - origin returns a 307 response to the same location as before and ATS 
follows it
   - - ATS looks in volume 1 based on the host of the "real origin", finds the 
content in cache and returns a HIT
   - - but by this time the client has already given up and aborted, depending 
on how many retries were configured and how long each waited
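
The sequence above can be sketched as a toy model of per-volume write locks. This is only an illustration of the suspected leak, not ATS code: `WriteLocks`, `first_request`, the URLs, and the `release_first_on_redirect` switch are hypothetical names introduced here to contrast the observed behavior (first lock kept) with the behavior one would expect (first lock dropped when the redirect is followed).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Toy model only: one write lock per (volume, cache key).
class WriteLocks {
public:
  using Entry = std::pair<int, std::string>;

  // Returns false if another transaction already holds the lock.
  bool try_lock(int volume, const std::string &key) {
    return held_.insert(Entry(volume, key)).second;
  }
  void unlock(int volume, const std::string &key) { held_.erase(Entry(volume, key)); }
  bool is_held(int volume, const std::string &key) const {
    return held_.count(Entry(volume, key)) != 0;
  }

private:
  std::set<Entry> held_;
};

// First request: miss on volume 2, 307 from the origin, miss on volume 1,
// fetch from the "real origin". release_first_on_redirect is a hypothetical
// switch modeling whether the ramdisk lock is dropped when the 307 is followed.
inline void first_request(WriteLocks &locks, bool release_first_on_redirect) {
  const std::string original   = "http://origin.example.com/asset";
  const std::string redirected = "http://192.0.2.10/asset";

  bool got_first = locks.try_lock(2, original); // ramdisk write lock
  assert(got_first);
  (void)got_first;

  // ... origin answers 307 and the redirect is followed ...
  if (release_first_on_redirect) {
    locks.unlock(2, original);
  }

  bool got_second = locks.try_lock(1, redirected); // generic volume write lock
  assert(got_second);
  (void)got_second;

  // ... object fetched, written to volume 1, streamed to the client ...
  locks.unlock(1, redirected);
}
```

Calling `first_request(locks, false)` leaves the volume 2 entry held, so a later `try_lock(2, ...)` for the same key fails, which matches the retries seen on subsequent requests. Calling it with `true` leaves no lock behind.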
   
   We tried setting proxy.config.http.redirect_use_orig_cache_key to 1, but it 
appears to make no difference.
   
   We saw this when a customer upgraded from a system that had previously been 
using ATC 3.x, which never generated empty parent.config files.  Restoring the 
following default line to parent.config, which lists no parents, appears to 
solve the issue:
   
   `dest_domain=. parent="" round_robin=consistent_hash go_direct=false qstring=ignore`
   
   The difference appears to be that with an empty parent.config, or with 
proxy.config.http.parent_proxy_routing_enable set to 0, the 307 is handled via 
HandleCacheMiss and a new ISSUE_WRITE is performed, which opens a new CacheVC 
on the default volume.  When the default line is in parent.config, this code is 
executed instead:
   ```
   else if (s->dns_info.lookup_name[0] <= '9' && s->dns_info.lookup_name[0] >= '0' && s->parent_params->parent_table->hostMatch &&
            !s->http_config_param->no_dns_forward_to_parent) {
     // note, broken logic: ACC fudges the OR stmt to always be true,
     // 'AuthHttpAdapter' should do the rev-dns if needed, not here .
     TRANSACT_RETURN(SM_ACTION_DNS_REVERSE_LOOKUP, HttpTransact::StartAccessControl);
   }
   ```
   In this case there is a message from decideCacheLookup "will NOT do lookup" 
and ATS just reuses the CacheVC that was opened for the original lookup and 
actually writes to the ramdisk so that future requests for the object get the 
immediate cache hit in the ramdisk as expected. But this looks like it only 
worked because the redirect was to an IP address and not to another fqdn.
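
To illustrate why an IP-address redirect target matters here, the following sketch isolates the condition from the quoted branch. It only mirrors the boolean tests visible in that snippet; `starts_with_digit` and `takes_reverse_dns_branch` are hypothetical helper names, the two flags are passed as plain booleans, and any earlier branches of the real `else if` chain are not modeled.

```cpp
#include <cassert>

// Mirrors the first-character test from the quoted condition:
// lookup_name[0] <= '9' && lookup_name[0] >= '0'.
inline bool starts_with_digit(const char *lookup_name) {
  assert(lookup_name != nullptr);
  return lookup_name[0] <= '9' && lookup_name[0] >= '0';
}

// The quoted branch also requires parent_table->hostMatch to be set and
// no_dns_forward_to_parent to be unset; both are plain parameters here.
inline bool takes_reverse_dns_branch(const char *lookup_name, bool host_match,
                                     bool no_dns_forward_to_parent) {
  return starts_with_digit(lookup_name) && host_match && !no_dns_forward_to_parent;
}
```

Under these assumptions a target such as `192.0.2.10` satisfies the condition, while a target such as `real-origin.example.com` does not, so a redirect to an FQDN would presumably not reach this branch even with the default line in parent.config.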
   
   It seems either something isn't quite right with the redirect following or 
something that is supposed to free the first CacheVC got missed, leaving the 
write lock in place.


