Hi, tl;dr: snapshots.debian.org reproducibly truncates around 1-2% of the downloads. While I think that's unusually high, it turns into a real problem in combination with "lazy" clients (or proxies) that do not rigorously check their response.
Longer description: I wrote a test script that tries a lot of HTTP GET requests on random resources from snapshot.debian.org, collecting statistics on the responses. Independent of the origin used [0], this reveals the following patterns: 98% - 99.5% of all requests return a good status code and the downloaded body matches in size and by sha1 hash. 0.5% - 1.5% of responses are truncated, i.e. the connection closes before all data is received < 0.5% other errors like 403, 503 or hash mismatches Interestingly, for immediate retries, the distribution changes a lot for the second attempt: 50% - 76% successes 17.6% - 50% truncated responses up to ca. 5% other errors Further retries show a roughly similar success rate, so that eventually, after enough retries, the client gets the expected response. (With deferred retries, giving varnish a chance to populate the cache, the second attempt stats tend a lot towards 100% success, therefore I conclude that the truncation failures are related to cache misses.) Manual inspection of the HTTP response headers from snapshot.debian.org reveals, that varnish is in use (and that somebody there is a Terry Pratchett fan). It seems varnish delivers cached resources with the usual "Content-Length" field in its headers, but uses "Transfer-Encoding: chunked" for cache misses. (Which sounds like an entirely reasonable thing.) But back to the facts: with an additional apt-cache-ng proxy in between, the success rate for follow-up requests drops to nearly 0%. The result distribution for the second attempt then looks more like: 97.4% hash mismatch 1.2% successes 1.2% error 503 For further attempts, these tend towards 100% hash mismatches, because apt-cache-ng stores an invalid (truncated) copy in its cache. I tested chunk transfer encoding of apt-cacher-ng a bit, but since version 0.8.0_pre2-1 of apt-cacher-ng, it seems handle chunking just fine [1]. And I tried to reproduce the error on the varnish side, but without success so far. So I'm not entirely sure where exactly the problem lies, but I am currently suspecting varnish because of two rather vague observations: a) on traffic intercepted between a apt-cacher-ng and varnish, I saw an incorrect resource length advertized by varnish (only once, by manual interception) [2]. I intend to try to reproduce this automatically to test the varnish side. b) a varnish issue that sounds related and got fixed just a couple days ago: https://github.com/varnishcache/varnish-cache/issues/2035 I very much doubt this fix has already made it to the snapshots infrastructure. It certainly didn't arrive in Debian, yet. I cannot proof either of these is really related or the cause of the hash mismatches for the combination with varnish with apt-cacher-ng. Before continuing to investigate, I simply wanted to share my findings. I'd welcome feedback as well as hints about the infrastructure, that might be relevant. I intend to continue analyzing varnish and apt-cacher-ng to be able to eventually solve this issue. Kind Regards Markus Wanner [0]: I tried from 5 different sources with highly varying bandwidth. Most of those are located in Europe, though. And during European daytime. [1]: Chunked transfer encoding seems to work since this commit: commit eebb3c93fd63ceb479b7100f71861069f5b16914 Author: Eduard Bloch <bl...@debian.org> Date: Sat Aug 30 20:33:49 2014 -0700 Fix for premature abort of chunked transfer [2]: The intercepted request on a file weighting 474 KiB: GET /archive/debian/20140818T083621Z/pool/main/libg/libgnomeui/libgnomeui-dev_2.24.5-2_kfreebsd-amd64.deb HTTP/1.1 User-Agent: Debian Apt-Cacher-NG/0.8.0 Host: snapshot.debian.org Range: bytes=122879- Accept: */* Accept-Encoding: identity Connection: keep-alive HTTP/1.1 206 Partial Content Date: Thu, 08 Dec 2016 14:46:21 GMT Server: Apache Expires: Sun, 18 Dec 2016 14:46:21 GMT Cache-Control: public, max-age=864000 ETag: "1aef9a4d688e1e1a0bd09427b894e47fd37413dc" Last-Modified: Mon, 05 Sep 2011 15:23:40 GMT X-Clacks-Overhead: GNU Terry Pratchett Content-Type: application/x-debian-package X-Varnish: 102013332 Age: 0 Via: 1.1 varnish-v4 Transfer-Encoding: chunked Connection: keep-alive Accept-Ranges: bytes Content-Range: bytes 122879-122879/122880 001 . 0 i.e. varnish answered with a chunk of one byte (which may or may not be correct) but a known incorrect value for the complete-range. I certainly wouldn't blame apt-cacher-ng for truncation of the file in that case.
signature.asc
Description: OpenPGP digital signature