BBlack has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/364606 )
Change subject: VCL: grace-within-TTL
......................................................................

VCL: grace-within-TTL

The normal idea is that "grace" is a time beyond the TTL during which we serve known-stale responses while asynchronously refreshing. Because we're violating the indicated TTL, there's pressure to keep the grace-time very short, and thus also a desire to raise it to a slightly-less-short time when a backend might be unhealthy or slow.

We were using a fixed 5m for healthy grace and a fixed 60m for the unhealthy case (for the few backends for which we detect health at all in this sense, which is none of the really important ones anyways). In some cases that 60m may hurt more than help, and in many others the 5m value isn't enough.

In addition to the constraining "violating the TTL" problem of this approach, the resulting short grace times on long-TTL objects considerably reduce our chances of getting async refreshes at all, which results in more stalls and user-facing latency as objects expire.

At first the answer seems to be to avoid all violations of the TTL and give ourselves broader grace windows by moving the entire grace window *inside* of the original TTL, while also making it larger (percentage-based), and slightly randomized (to swizzle away stampeding refresh effects).

However, grace-after-ttl has its purpose and shouldn't be reduced to zero while doing the above. If the TTL being advertised by the application turns out to be accurate (it really changes its content exactly when the original TTL expires, and counts down TTL on its own responses until that point), then the opportunistic attempts at asynchronous grace-within-TTL refreshes turn out to be pointless, as they refresh the same TTL information we already had. And if we have no grace-after-ttl, this will result in a latency/stall-inducing synchronous refresh of a hot object the moment after expiry.

The approach in this commit is to create larger, randomized grace-within-TTL windows to capture the cases where the TTL we have is probably inaccurate or capped, while also keeping a fixed 5 minute grace-after-TTL window in place as well for the above case.

For TTLs >= 3750s, the grace-within-TTL window is randomly set at 8%-12% of the total TTL (up to ~3h in the maximal case, and down to ~5 mins in the minimal case). For TTLs in the range 3749-600, we choose a random grace-within-TTL value of 5m +/- 15s, which approaches half the total TTL as we approach the 600 mark. For TTLs in the range 599-300, the grace-within-TTL scales down to zero (grace-within-ttl = ttl - 300), and for anything under 300 the grace-within-ttl is zero.

After this value is calculated in beresp.grace, it is moved inside the ttl via "beresp.ttl = beresp.ttl - beresp.grace", and then a grace-after-TTL of 5 minutes is added via "beresp.grace = beresp.grace + 5m".

Change-Id: Ia09d2cae2dfc3bd02195e35f274dd5b04cd1ff84
---
M modules/varnish/templates/vcl/wikimedia-common.inc.vcl.erb
1 file changed, 40 insertions(+), 17 deletions(-)

  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/06/364606/1

diff --git a/modules/varnish/templates/vcl/wikimedia-common.inc.vcl.erb b/modules/varnish/templates/vcl/wikimedia-common.inc.vcl.erb
index aa65804..7655eca 100644
--- a/modules/varnish/templates/vcl/wikimedia-common.inc.vcl.erb
+++ b/modules/varnish/templates/vcl/wikimedia-common.inc.vcl.erb
@@ -336,22 +336,12 @@
 }

 sub wm_common_hit_grace {
-    if (obj.ttl < 0s) {
-        # TTL expired
-        if (std.healthy(req.backend_hint) && obj.grace > <%= @vcl_config.fetch("grace_healthy", "5m") %>) {
-            # Backend is healthy. Limit age to max vcl_config['grace_healthy']
-            if (obj.ttl + <%= @vcl_config.fetch("grace_healthy", "5m") %> <= 0s) {
-                # No candidate for grace. Fetch a fresh object.
-                return (miss);
-            }
-        } else {
-            # Backend is sick, or object grace was < grace_healthy, so use full grace.
-            # We set beresp.grace in wm_common_backend_response.
-            if (obj.ttl + obj.grace <= 0s) {
-                # No candidate for grace.
-                return (miss);
-            }
-        }
+    // Grace is managed during backend_response, and we don't have health
+    // info for the apps that matter the most anyways
+    if (obj.ttl + obj.grace > 0s) {
+        return (deliver);
+    } else {
+        return (miss);
     }
 }

@@ -393,7 +383,40 @@
         set beresp.ttl = <%= @vcl_config.fetch("ttl_cap", "1d") %>;
     }

-    set beresp.grace = <%= @vcl_config.fetch("grace_sick", "60m") %>;
+    // Set grace-within-ttl at a randomized 8-12% of the TTL value
+    // for those with 3750s+ TTLs.
+    // This results in a minimum grace of 3750s * .08 = 300s (5m),
+    // and a maximum grace of 86400s * .12 = 10368s (~2.9h).
+    if (beresp.ttl >= 3750s) {
+        set beresp.grace = beresp.ttl * std.random(0.08, 0.12);
+    }
+
+    // Our randomized grace-within-TTL floor for objects with
+    // <3750s but >=600s is 5m +/- 5% (15s), which can be up to
+    // ~half the TTL in the shortest case:
+    elsif (beresp.ttl >= 600s) {
+        set beresp.grace = 300s * std.random(0.95, 1.05);
+    }
+
+    // In the 599-300 range, scale the above ~5 minutes
+    // grace-within-ttl down to zero with no randomization
+    elsif (beresp.ttl >= 300s) {
+        set beresp.grace = beresp.ttl - 300s;
+    }
+
+    // Below 300s, no grace-within-TTL, to avoid recursing down by
+    // halves all the way to an accurate TTL rollover from the
+    // application.
+    else {
+        set beresp.grace = 0s;
+    }
+
+    // Move the above grace value within the TTL:
+    set beresp.ttl = beresp.ttl - beresp.grace;
+
+    // Add a standard 5 minute window *after* the TTL, to catch
+    // cases where the TTL was counting down accurately:
+    set beresp.grace = beresp.grace + 5m;
 }

 // Compress compressible things if the backend didn't already

--
To view, visit https://gerrit.wikimedia.org/r/364606
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia09d2cae2dfc3bd02195e35f274dd5b04cd1ff84
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: BBlack <bbl...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
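
For anyone wanting to sanity-check the tier arithmetic from the commit message, the following is a rough Python model of the TTL/grace split, not the VCL itself. The names `split_ttl` and `GRACE_AFTER_TTL` are hypothetical, `random.uniform` stands in for `std.random`, and the input is assumed to be the already-capped TTL in seconds.

```python
import random

# Fixed grace-after-TTL window from the commit message (5 minutes), in seconds.
GRACE_AFTER_TTL = 300.0


def split_ttl(ttl, rng=random):
    """Return (effective_ttl, total_grace) in seconds for a capped TTL.

    Hypothetical helper mirroring the four tiers described in the commit
    message; rng.uniform() is an assumed stand-in for VCL's std.random().
    """
    if ttl >= 3750:
        # 8%-12% of the TTL, randomized to spread out refreshes.
        within = ttl * rng.uniform(0.08, 0.12)
    elif ttl >= 600:
        # 5m +/- 5% (15s).
        within = 300.0 * rng.uniform(0.95, 1.05)
    elif ttl >= 300:
        # Scale down to zero with no randomization.
        within = ttl - 300.0
    else:
        # No grace-within-TTL for short TTLs.
        within = 0.0

    # Move the window inside the TTL, then add the fixed window after it.
    return ttl - within, within + GRACE_AFTER_TTL


if __name__ == "__main__":
    for ttl in (86400, 3750, 600, 450, 100):
        effective_ttl, grace = split_ttl(ttl)
        print(ttl, round(effective_ttl, 1), round(grace, 1))
```

In this model, effective TTL plus total grace always equals the original TTL plus 300s, so an object is never served more than 5 minutes past the TTL the application advertised. The non-random tiers are easy to check by hand: a 450s TTL yields an effective TTL of 300s and a total grace of 450s.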