Bug#986749: apt-cacher-ng: acng still stops working with APOLL ADD failing and Bad file descriptor

2021-04-13 Thread Christian Meyer
Am Montag, dem 12.04.2021 um 21:13 +0200 schrieb Eduard Bloch:
> Hallo,
> * Christian Meyer [Sun, Apr 11 2021, 01:26:01PM]:
> > Package: apt-cacher-ng
> > Version: 3.6.3-1
> > Severity: important
> > X-Debbugs-Cc: c2h...@web.de
>
> What you describe here does, all in one, not make much sense. Too
> many symptoms coming together which are not adding up. Are you sure
> that there is no bad memory or system instability involved?

Until now I have no hint to that direction, but I will test memory and
/ or replace the machine.

> Please send the output of
> journalctl -b0 -u apt-cacher-ng

For today (and cron driven restarts on errors every 3 minutes) it is:

Apr 13 07:57:37 121-My apt-cacher-ng[202463]: [warn] Epoll ADD(1) on fd
13 failed. Old events were 0; read change was 1 (add); write change was
0 (none); close change was 0 (none): Bad file descriptor
Apr 13 08:00:02 121-My systemd[1]: Stopping Apt-Cacher NG software
download proxy...
Apr 13 08:00:02 121-My systemd[1]: apt-cacher-ng.service: Succeeded.
Apr 13 08:00:02 121-My systemd[1]: Stopped Apt-Cacher NG software
download proxy.
Apr 13 08:00:02 121-My systemd[1]: apt-cacher-ng.service: Consumed
55.182s CPU time.
Apr 13 08:00:02 121-My systemd[1]: Starting Apt-Cacher NG software
download proxy...
Apr 13 08:00:02 121-My systemd[1]: Started Apt-Cacher NG software
download proxy.
Apr 13 08:00:40 121-My apt-cacher-ng[389015]: [warn] Epoll ADD(1) on fd
13 failed. Old events were 0; read change was 1 (add); write change was
0 (none); close change was 0 (none): Bad file descriptor
Apr 13 08:03:02 121-My systemd[1]: Stopping Apt-Cacher NG software
download proxy...
Apr 13 08:03:02 121-My systemd[1]: apt-cacher-ng.service: Succeeded.
Apr 13 08:03:02 121-My systemd[1]: Stopped Apt-Cacher NG software
download proxy.
Apr 13 08:03:02 121-My systemd[1]: apt-cacher-ng.service: Consumed
2.402s CPU time.
Apr 13 08:03:02 121-My systemd[1]: Starting Apt-Cacher NG software
download proxy...
Apr 13 08:03:02 121-My systemd[1]: Started Apt-Cacher NG software
download proxy.
Apr 13 10:42:12 121-My apt-cacher-ng[391899]: [warn] Epoll ADD(1) on fd
13 failed. Old events were 0; read change was 1 (add); write change was
0 (none); close change was 0 (none): Bad file descriptor
Apr 13 10:43:25 121-My apt-cacher-ng[391899]: [warn] Epoll ADD(1) on fd
13 failed. Old events were 0; read change was 1 (add); write change was
0 (none); close change was 0 (none): Bad file descriptor
Apr 13 10:45:02 121-My systemd[1]: Stopping Apt-Cacher NG software
download proxy...
Apr 13 10:45:02 121-My systemd[1]: apt-cacher-ng.service: Succeeded.
Apr 13 10:45:02 121-My systemd[1]: Stopped Apt-Cacher NG software
download proxy.
Apr 13 10:45:02 121-My systemd[1]: apt-cacher-ng.service: Consumed
56.112s CPU time.
Apr 13 10:45:02 121-My systemd[1]: Starting Apt-Cacher NG software
download proxy...
Apr 13 10:45:02 121-My systemd[1]: Started Apt-Cacher NG software
download proxy.
Apr 13 11:36:26 121-My apt-cacher-ng[414217]: [warn] Epoll ADD(1) on fd
13 failed. Old events were 0; read change was 1 (add); write change was
0 (none); close change was 0 (none): Bad file descriptor
Apr 13 11:39:02 121-My systemd[1]: Stopping Apt-Cacher NG software
download proxy...
Apr 13 11:39:02 121-My systemd[1]: apt-cacher-ng.service: Succeeded.
Apr 13 11:39:02 121-My systemd[1]: Stopped Apt-Cacher NG software
download proxy.
Apr 13 11:39:02 121-My systemd[1]: apt-cacher-ng.service: Consumed 2min
47.837s CPU time.
Apr 13 11:39:02 121-My systemd[1]: Starting Apt-Cacher NG software
download proxy...
Apr 13 11:39:02 121-My systemd[1]: Started Apt-Cacher NG software
download proxy.


> and also please set debug=7 in acng.conf or similar conf file.

I did so an now I'm waiting for new errors.

> If that's a crash, please install systemd-coredump service and we can
> debug the corefiles later.


> Oh, that's probably our "nice" apt hiding the actual error string and
> showing this dummy instead. It should match some error in the error
> log,
> though.
>
> This doesn't make sense. ACNG does not render this message string
> explicitly.
> Is there some proxy server involved?


Yes, there is an elderly squid upstream proxy:
Proxy: http://user:pass@myProxy:8080
and I
ReuseConnections: 0

These are tho only modifications of the config-file



> > my workaround is to automatically restart the service by a cronjob
> > (and after that /usr/lib/apt-cacher-ng/expire-caller.pl) when this
> > line is seen.
>
> And now that is getting really odd. Do you have some kind of
> application
> firewall installed, like apparmor?

I didn't install (or configure) it by hand, but obviously it is pulled
in by $something in Bullseye:

At least 'dpkg -i apparmor' says something like:

ii  apparmor2.13.6-10amd64user-space parser
utility for AppArmor
un  apparmor-profiles-extra (keine
Beschreibung vorhanden)
un  apparmor-utils  (keine
Beschreibung vorhanden)


> We can have an interactive Jitsi session 

Bug#986749: apt-cacher-ng: acng still stops working with APOLL ADD failing and Bad file descriptor

2021-04-12 Thread Eduard Bloch
Hallo,
* Christian Meyer [Sun, Apr 11 2021, 01:26:01PM]:
> Package: apt-cacher-ng
> Version: 3.6.3-1
> Severity: important
> X-Debbugs-Cc: c2h...@web.de
>
> Dear Maintainer,
>
> even after the recent acng updates I still see apt-cacher-ng (3.6.3-1 amd64) 
> stop working.

What you describe here does, all in one, not make much sense. Too many
symptoms coming together which are not adding up. Are you sure that
there is no bad memory or system instability involved?

> Apt throws a bunch of errors like:
>
> E: Fehlschlag beim Holen von 
> http://deb.debian.org/debian/dists/bullseye/main/Contents-all Fehler beim 
> Lesen vom Server: Verbindung wurde durch den Se
> rver auf der anderen Seite geschlossen. [IP: 127.0.0.1 3142]

Closing connection is a potential outcome in worst situations but the
connection wouldn't stay down permanently as seen in the rest of your
log. Actually, that might be caused by periodic crashes so systemd keeps
the server down for a while. Which brings me to the suspicion of system
instability.

Please send the output of
journalctl -b0 -u apt-cacher-ng

and also please set debug=7 in acng.conf or similar conf file.

If that's a crash, please install systemd-coredump service and we can
debug the corefiles later.

> E: Einige Indexdateien konnten nicht heruntergeladen werden. Sie wurden 
> ignoriert oder alte an ihrer Stelle benutzt.
> ...
>
> W: Das Depot »http://127.0.0.1:3142/deb.debian.org/debian bullseye Release« 
> enthält keine Release-Datei.
> E: Fehlschlag beim Holen von 
> http://127.0.0.1:3142/deb.debian.org/debian/dists/bullseye/main/binary-i386/Packages
>  503  Service Unavailable [IP: 127.0.
> 0.1 3142]

Oh, that's probably our "nice" apt hiding the actual error string and
showing this dummy instead. It should match some error in the error log,
though.

This doesn't make sense. ACNG does not render this message string explicitly.
Is there some proxy server involved?

> Get:99 http://deb.debian.org/debian bullseye/main amd64 krb5-user amd64 
> 1.18.3-4 [151 kB]
> Get:100 http://deb.debian.org/debian bullseye/main amd64 ldap-utils amd64 
> 2.4.57+dfsg-2 [206 kB]
> Get:101 http://deb.debian.org/debian bullseye/main amd64 
> linux-image-5.10.0-5-amd64 amd64 5.10.26-1 [53.4 MB]
> Err:101 http://deb.debian.org/debian bullseye/main amd64 
> linux-image-5.10.0-5-amd64 amd64 5.10.26-1
>   Undetermined Error [IP: 127.0.0.1 3142]
That's apt's insufficient error message which probably indicates a short
read. Would fit to the connection refusal series.

> Err:102 http://deb.debian.org/debian bullseye/main amd64 linux-image-amd64 
> amd64 5.10.26-1
>   Could not connect to 127.0.0.1:3142 (127.0.0.1). - connect (111: Connection 
> refused) [IP: 127.0.0.1 3142]
> Err:103 http://deb.debian.org/debian bullseye/main amd64 whois amd64 5.5.8
>   Unable to connect to 127.0.0.1:3142: [IP: 127.0.0.1 3142]

> Since systemctl often tells me something about "Epoll ADD" failing:
> # systemctl status apt-cacher-ng.service
> apt-cacher-ng[646]: [warn] Epoll ADD(1) on fd 14 failed. Old events were 0; 
> read change was 1 (add); write change was 0 (none); close change was 0 
> (none): Bad file descriptor
>
> my workaround is to automatically restart the service by a cronjob (and after 
> that /usr/lib/apt-cacher-ng/expire-caller.pl) when this line is seen.

And now that is getting really odd. Do you have some kind of application
firewall installed, like apparmor?

> More information:
>
> Yes, my internet connection is very slow (around 8000 kbit/s) and that's my 
> reason to run acng).

8mbit/s isn't awful slow and it shouldn't make a difference anyway.

> # less /var/log/apt-cacher-ng/apt-cacher.err.1
> Beside of tons of "Bad file descriptor" messages:
> Fri Apr  9 11:13:05 
> 2021|debrep/pool/main/f/ffmpeg/libavutil56_4.3.2-0+deb11u1_amd64.deb storage 
> error [HTTP/1.1 503 Bad file descriptor], last errno: Bad file descriptor
> Fri Apr  9 11:13:06 
> 2021|debrep/pool/main/c/codec2/libcodec2-0.9_0.9.2-4_amd64.deb storage error 
> [HTTP/1.1 503 Bad file descriptor], last errno: Bad file descriptor
> Fri Apr  9 11:13:17 
> 2021|debrep/pool/main/p/pulseaudio/libpulsedsp_14.2-2_amd64.deb storage error 
> [HTTP/1.1 503 Bad file descriptor], last errno: Bad file descriptor

So I checked the code and it doesn't make sense. fileitem.cc:755 is the
only location reporting that and it wouldn't get there when the
filedescriptor is okay, and you cannot get that FD closed by any regular
code path.

We can have an interactive Jitsi session if you prefer to get this
solved in realtime and it can be made reproducible somehow.

Best regards,
Eduard.



Bug#986749: apt-cacher-ng: acng still stops working with APOLL ADD failing and Bad file descriptor

2021-04-11 Thread Christian Meyer
Package: apt-cacher-ng
Version: 3.6.3-1
Severity: important
X-Debbugs-Cc: c2h...@web.de

Dear Maintainer,

even after the recent acng updates I still see apt-cacher-ng (3.6.3-1 amd64) 
stop working.
Apt throws a bunch of errors like:

E: Fehlschlag beim Holen von 
http://deb.debian.org/debian/dists/bullseye/main/Contents-all Fehler beim Lesen 
vom Server: Verbindung wurde durch den Se
rver auf der anderen Seite geschlossen. [IP: 127.0.0.1 3142]
E: Einige Indexdateien konnten nicht heruntergeladen werden. Sie wurden 
ignoriert oder alte an ihrer Stelle benutzt.
...

W: Das Depot »http://127.0.0.1:3142/deb.debian.org/debian bullseye Release« 
enthält keine Release-Datei.
E: Fehlschlag beim Holen von 
http://127.0.0.1:3142/deb.debian.org/debian/dists/bullseye/main/binary-i386/Packages
 503  Service Unavailable [IP: 127.0.
0.1 3142]
E: Einige Indexdateien konnten nicht heruntergeladen werden. Sie wurden 
ignoriert oder alte an ihrer Stelle benutzt.
E: Paket adduser kann nicht gefunden werden.
E: Paket apt kann nicht gefunden werden.
...

...
Get:99 http://deb.debian.org/debian bullseye/main amd64 krb5-user amd64 
1.18.3-4 [151 kB]
Get:100 http://deb.debian.org/debian bullseye/main amd64 ldap-utils amd64 
2.4.57+dfsg-2 [206 kB]
Get:101 http://deb.debian.org/debian bullseye/main amd64 
linux-image-5.10.0-5-amd64 amd64 5.10.26-1 [53.4 MB]
Err:101 http://deb.debian.org/debian bullseye/main amd64 
linux-image-5.10.0-5-amd64 amd64 5.10.26-1
  Undetermined Error [IP: 127.0.0.1 3142]
Err:102 http://deb.debian.org/debian bullseye/main amd64 linux-image-amd64 
amd64 5.10.26-1
  Could not connect to 127.0.0.1:3142 (127.0.0.1). - connect (111: Connection 
refused) [IP: 127.0.0.1 3142]
Err:103 http://deb.debian.org/debian bullseye/main amd64 whois amd64 5.5.8
  Unable to connect to 127.0.0.1:3142: [IP: 127.0.0.1 3142]


Since systemctl often tells me something about "Epoll ADD" failing:
# systemctl status apt-cacher-ng.service
apt-cacher-ng[646]: [warn] Epoll ADD(1) on fd 14 failed. Old events were 0; 
read change was 1 (add); write change was 0 (none); close change was 0 (none): 
Bad file descriptor

my workaround is to automatically restart the service by a cronjob (and after 
that /usr/lib/apt-cacher-ng/expire-caller.pl) when this line is seen.



More information:

Yes, my internet connection is very slow (around 8000 kbit/s) and that's my 
reason to run acng).

# less /var/log/apt-cacher-ng/apt-cacher.err.1
Beside of tons of "Bad file descriptor" messages:
Fri Apr  9 11:13:05 
2021|debrep/pool/main/f/ffmpeg/libavutil56_4.3.2-0+deb11u1_amd64.deb storage 
error [HTTP/1.1 503 Bad file descriptor], last errno: Bad file descriptor
Fri Apr  9 11:13:06 
2021|debrep/pool/main/c/codec2/libcodec2-0.9_0.9.2-4_amd64.deb storage error 
[HTTP/1.1 503 Bad file descriptor], last errno: Bad file descriptor
Fri Apr  9 11:13:17 
2021|debrep/pool/main/p/pulseaudio/libpulsedsp_14.2-2_amd64.deb storage error 
[HTTP/1.1 503 Bad file descriptor], last errno: Bad file descriptor

I only see a few different ones:
Fri Apr  9 10:57:49 2021|Failed to move stale item 
debrep/dists/bullseye/main/i18n/Translation-de.bz2 out of the way: No such file 
or directory
Fri Apr  9 10:59:29 2021|Failed to move stale item 
debrep/dists/bullseye/main/i18n/Translation-en.bz2 out of the way: No such file 
or directory
Fri Apr  9 11:05:10 
2021|debrep/pool/main/f/fonts-liberation/fonts-liberation_1.07.4-11_all.deb 
storage error [HTTP/1.1 503 Broken pipe], last errno: Broken pipe
Fri Apr  9 11:06:56 
2021|debrep/pool/main/l/lm-sensors/libsensors5_3.6.0-7_i386.deb storage error 
[HTTP/1.1 503 Connection reset by peer], last errno: Connection reset by peer
Fri Apr  9 11:12:51 
2021|debrep/pool/main/libr/libreoffice/libreoffice-base-drivers_7.0.4-3_amd64.deb
 storage error [HTTP/1.1 503 Broken pipe], last errno: Broken pipe


# journalctl
only shows something like:
Apr 03 00:34:03 121-My apt-cacher-ng[2900630]: [warn] fcntl(15, F_SETFL): Bad 
file descriptor
...
Apr 09 11:12:44 121-My apt-cacher-ng[2900630]: [warn] Epoll ADD(1) on fd 75 
failed. Old events were 0; read change was 1 (add); write change was 0 (n>
Apr 09 11:12:53 121-My apt-cacher-ng[2900630]: [warn] Epoll ADD(1) on fd 75 
failed. Old events were 0; read change was 1 (add); write change was 0 (n>
Apr 09 11:12:53 121-My apt-cacher-ng[2900630]: [warn] Epoll ADD(1) on fd 73 
failed. Old events were 0; read change was 1 (add); write change was 0 

Christian Meyer



-- Package-specific info:

-- System Information:
Debian Release: bullseye/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.10.0-5-amd64 (SMP w/4 CPU threads)
Kernel taint flags: TAINT_CRAP
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages apt-cacher-ng depends on:
ii  adduser  3.118
ii  debconf