Hi,

On Mon, 14 Sep 2020 18:50:44 +0200 Julian Andres Klode <j...@debian.org> wrote:
> On Mon, Sep 14, 2020 at 05:18:20PM +0100, James Addison wrote:
> > Package: snapshot.debian.org
> > Followup-For: Bug #959518
> > X-Debbugs-Cc: j...@jp-hosting.net
> >
> > The issue appears reproducible at the moment with apt 1.8.2.1 compiled
> > from source and the 'x.tar' configuration provided earlier.
> >
> >     # apt source directory, post-build
> >     $ cmdline/apt -o Dir=$PWD/x -o Dir::Bin::Methods=$PWD/methods update && \
> >       cmdline/apt -o Dir=$PWD/x -o Dir::Bin::Methods=$PWD/methods install -y openjdk-11-jdk
> >
> >     ...
> >
> >     Get:261 http://snapshot.debian.org/archive/debian-security/20200502T085134Z buster/updates/main amd64 openjdk-11-jdk-headless amd64 11.0.7+10-3~deb10u1 [215 MB]
> >     Err:261 http://snapshot.debian.org/archive/debian-security/20200502T085134Z buster/updates/main amd64 openjdk-11-jdk-headless amd64 11.0.7+10-3~deb10u1
> >       Undetermined Error [IP: 193.62.202.27 80]
> >
> > This has occurred for a couple of different server IP addresses,
> > including 185.17.185.185.
>
> We only care about unstable for this bug. There is a whole bunch of
> changes in the http code, and they won't be backported to stable releases.
>
> Also, the previous comment by Alex Thiessen indicated that this is not a
> bug in apt, but that the server seems to close the connection, which means
> there is nothing actionable here.
>
> If you can reproduce an issue with the version of apt in unstable,
> and it does not reproduce with wget or curl, please open a new bug report
> for it.
I'm very familiar with snapshot.d.o from the client perspective. Julian is
correct that it's the server closing the connection. But that doesn't mean
this can't at least be a wishlist bug or feature request against apt. Let me
explain a bit more.

For several projects (debrebuild, debbisect, buildprofile QA,
bootstrap.debian.net...) I regularly interact with snapshot.d.o. Doing this
plainly with apt is doomed to fail miserably with errors like:

    # E: Failed to fetch [...] Error reading from server. Remote end closed connection
    # E: Failed to fetch [...] Hash Sum mismatch
    # E: Failed to fetch [...] Bad header line Bad header data
    # Err:118 [...] Connection timed out

Yes, this is because of how snapshot.d.o throttles connections. For example,
without additional measures the following will fail:

    $ curl http://snapshot.debian.org/archive/debian/20200909T084102Z/pool/main/q/qtwebengine-opensource-src/qtwebengine-opensource-src_5.14.2+dfsg1.orig.tar.xz >/dev/null
    curl: (18) transfer closed with 217347024 bytes remaining to read

There are a couple of things that can be done to work around this problem
when using curl, by adding options like:

    --limit-rate=800k    # this has the biggest effect
    --retry 10
    --retry-connrefused
    --resolve snapshot.debian.org:80:193.62.202.27

But even those are not sufficient, because snapshot.d.o will also cut the
connection early enough that curl fails with "network unreachable", which
curl does not treat as a transient error, so it will not retry establishing
the connection.

The only thing that reliably worked for me with snapshot.d.o is the
pycurl-based Python code at the end of this e-mail. With that code I can
even download from snapshot.d.o for a full day without ever hitting the
exception in its last line. But as things stand, it is impossible to use
apt reliably together with snapshot.d.o.

I'm not sure how to solve this problem. One way could surely be to approach
snapshot.d.o and ask them to somehow lift their very heavy throttling
policies.
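Put together, the curl workaround amounts to wrapping an invocation with
those flags in an outer retry loop, so that even the failures curl refuses
to retry itself get another chance. A rough Python sketch of that idea (the
wrapper name and its defaults are illustrative, not part of any existing
tool; it assumes curl is on PATH):

```python
import subprocess
import time


def curl_cmd(url, outfile):
    """Build a curl invocation with the workaround flags discussed above."""
    return [
        "curl",
        "--limit-rate", "800k",  # stay below the server's throttling limit
        "--retry", "10",         # retry errors curl considers transient
        "--retry-connrefused",   # treat "connection refused" as transient too
        "-C", "-",               # resume where the previous attempt stopped
        "-o", outfile,
        url,
    ]


def download_with_outer_retries(url, outfile, maxretries=10, run=subprocess.run):
    """Outer retry loop for failures curl itself gives up on."""
    for retrynum in range(maxretries):
        if run(curl_cmd(url, outfile)).returncode == 0:
            return
        time.sleep(2 ** retrynum)  # exponential backoff between attempts
    raise RuntimeError("curl failed too often")
```

The `run` parameter is injectable only so the loop can be exercised without
network access; in real use the default `subprocess.run` applies.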
But another way to solve this problem would be to make apt more resilient
towards mirrors with heavy throttling policies. I can think of these
wishlist bugs against apt:

 - allow specifying a maximum bytes-per-second value for downloads (this
   has the largest effect, if set low enough)
 - allow setting an option that makes apt automatically retry when a
   transient error occurs
 - allow setting custom resolve addresses for domains, as done in my code
   below

I'm not saying that we shouldn't look into making snapshot.d.o throttle
less, because as things stand, it's impossible to use it together with apt.
But there are certainly also some things that apt can do, and these would
not only benefit people working with snapshot.d.o but also people who are
otherwise using a mirror or proxy with heavy throttling.

Thanks!

cheers, josch

import time
from io import BytesIO

import pycurl


def download(url):
    f = BytesIO()
    maxretries = 10
    for retrynum in range(maxretries):
        try:
            c = pycurl.Curl()
            c.setopt(c.URL, url)
            # even 100 kB/s is too much sometimes
            c.setopt(c.MAX_RECV_SPEED_LARGE, 800 * 1024)  # bytes per second
            c.setopt(c.CONNECTTIMEOUT, 30)  # the default is 300
            # sometimes, curl stalls forever and even ctrl+c doesn't work
            start = time.time()

            def progress(*data):
                # a download must not last more than 5 minutes
                # with 100 kB/s this means files cannot be larger than 31 MB
                if time.time() - start > 5 * 60:
                    print("transfer took too long")
                    return 1  # a non-zero return makes libcurl abort the transfer

            c.setopt(pycurl.NOPROGRESS, 0)
            c.setopt(pycurl.XFERINFOFUNCTION, progress)
            # $ host snapshot.debian.org
            # snapshot.debian.org has address 185.17.185.185
            # snapshot.debian.org has address 193.62.202.27
            # c.setopt(c.RESOLVE, ["snapshot.debian.org:80:185.17.185.185"])
            if f.tell() != 0:
                # resume a partial download where the last attempt stopped
                c.setopt(pycurl.RESUME_FROM, f.tell())
            c.setopt(c.WRITEDATA, f)
            c.perform()
            assert c.getinfo(c.RESPONSE_CODE) in [200, 206], c.getinfo(c.RESPONSE_CODE)
            c.close()
            return f.getvalue()
        except pycurl.error as e:
            code, message = e.args
            if code in [pycurl.E_PARTIAL_FILE, pycurl.E_COULDNT_CONNECT]:
                if retrynum == maxretries - 1:
                    break
                print("retrying...")
                time.sleep(2 ** retrynum)  # exponential backoff
                continue
            else:
                raise
    raise Exception("failed too often...")
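Of the wishlist items above, the automatic-retry one is essentially a
generic retry loop with exponential backoff, independent of pycurl. A
minimal sketch in plain Python (TransientError and flaky_fetch are
illustrative stand-ins, not apt or pycurl APIs; sleep is injectable only so
the backoff can be demonstrated without actually waiting):

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable failure (connection reset, partial file, ...)."""


def retry_with_backoff(fetch, maxretries=10, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting 2**n seconds between attempts.

    Only TransientError is retried; anything else propagates immediately,
    mirroring how the pycurl code above re-raises unexpected errors.
    """
    for retrynum in range(maxretries):
        try:
            return fetch()
        except TransientError:
            if retrynum == maxretries - 1:
                break
            sleep(2 ** retrynum)
    raise Exception("failed too often...")


# Demo: a fetch that fails twice with a transient error, then succeeds.
attempts = []
delays = []


def flaky_fetch():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("remote end closed connection")
    return b"payload"


result = retry_with_backoff(flaky_fetch, sleep=delays.append)
print(result, delays)  # b'payload' [1, 2]
```

The same shape would let apt distinguish "remote end closed connection"
from genuinely fatal errors and recover without user intervention.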