Processed: Re: Bug#951722: autopkgtest suite flaky on arm64
Processing control commands:

> tags -1 patch
Bug #951722 [src:dovecot] autopkgtest suite flaky on arm64
Bug #953576 [src:dovecot] dovecot: flaky arm64 autopkgtest: debian/tests/usage/00_setup exited with return code 75
Added tag(s) patch.
Added tag(s) patch.

-- 
951722: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=951722
953576: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=953576
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#951722: autopkgtest suite flaky on arm64
control: tags -1 patch

Michael sent me his draft patch. A slightly modified version with an added
commit message is attached. The diff is against the latest upstream git
version.

Thanks Michael!

From 89399122692823bc215cf1097b05da4ee2201e0e Mon Sep 17 00:00:00 2001
From: Nis Martensen
Date: Sun, 24 May 2020 22:05:42 +0200
Subject: [PATCH 1/2] systemd integration: notify service manager when ready

With Type=simple or Type=forking, systemd does not really know when the
service is ready to accept connections and might start depending
services too early. Use Type=notify to explicitly tell the service
manager when the service is ready.

For a real problem caused by assuming readiness too early, please see
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=951722

For the meaning of the service type and details of the readiness
protocol, see also the following links:
https://www.freedesktop.org/software/systemd/man/systemd.service.html#Type=
https://www.freedesktop.org/software/systemd/man/sd_notify.html

As discussed in the last link, more elaborate state notifications are
possible. This patch only implements the most basic part.

Original patch prepared by Michael Biebl, with slight modification.
---
 dovecot.service.in                       | 3 +--
 src/lib-master/master-service-settings.c | 2 +-
 src/master/main.c                        | 6 ++++++
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/dovecot.service.in b/dovecot.service.in
index 5c45f590b..a1df992c5 100644
--- a/dovecot.service.in
+++ b/dovecot.service.in
@@ -14,9 +14,8 @@ Documentation=http://wiki2.dovecot.org/
 After=local-fs.target network-online.target
 
 [Service]
-Type=simple
+Type=notify
 ExecStart=@sbindir@/dovecot -F
-PIDFile=@rundir@/master.pid
 ExecReload=@bindir@/doveadm reload
 ExecStop=@bindir@/doveadm stop
 PrivateTmp=true
diff --git a/src/lib-master/master-service-settings.c b/src/lib-master/master-service-settings.c
index 657ef66bc..c7b8b369c 100644
--- a/src/lib-master/master-service-settings.c
+++ b/src/lib-master/master-service-settings.c
@@ -62,7 +62,7 @@ static const struct setting_define master_service_setting_defines[] = {
 
 /* */
 #ifdef HAVE_SYSTEMD
-# define ENV_SYSTEMD " LISTEN_PID LISTEN_FDS"
+# define ENV_SYSTEMD " LISTEN_PID LISTEN_FDS NOTIFY_SOCKET"
 #else
 # define ENV_SYSTEMD ""
 #endif
diff --git a/src/master/main.c b/src/master/main.c
index 6e0e68fe7..08bea05ed 100644
--- a/src/master/main.c
+++ b/src/master/main.c
@@ -26,6 +26,9 @@
 #include "service-process.h"
 #include "service-log.h"
 #include "dovecot-version.h"
+#ifdef HAVE_SYSTEMD
+#include "sd-daemon.h"
+#endif
 
 #include
 #include
@@ -544,6 +547,9 @@ static void main_init(const struct master_settings *set)
 	master_clients_init();
 
 	services_monitor_start(services);
+#ifdef HAVE_SYSTEMD
+	sd_notify(0, "READY=1");
+#endif
 	startup_finished = TRUE;
 }
 
-- 
2.20.1
Bug#951722: autopkgtest suite flaky on arm64
On 23.05.20 at 11:14, Nis Martensen wrote:
> Thanks a lot Noah and Michael for working on this!
>
> Michael Biebl wrote:
>> The patch to add sd_notify is rather trivial. The problem is that dovecot
>> unhelpfully clears the full environment. In src/master/main.c,
>> sd_notify() should be called around/after main_init().
>> Unfortunately, at this point master_service_env_clean() has been called,
>> clearing the process environment, including NOTIFY_SOCKET, which is
>> passed from systemd to dovecot and is needed to make sd_notify work.
>>
>> I haven't found a way to instruct dovecot not to clear the
>> NOTIFY_SOCKET env var.
>
> I have no idea if this works, but did you try adding NOTIFY_SOCKET to
> line 65 of src/lib-master/master-service-settings.c?

This does the trick. Thanks, Nis.
Bug#951722: autopkgtest suite flaky on arm64
Thanks a lot Noah and Michael for working on this!

Michael Biebl wrote:
> The patch to add sd_notify is rather trivial. The problem is that dovecot
> unhelpfully clears the full environment. In src/master/main.c,
> sd_notify() should be called around/after main_init().
> Unfortunately, at this point master_service_env_clean() has been called,
> clearing the process environment, including NOTIFY_SOCKET, which is
> passed from systemd to dovecot and is needed to make sd_notify work.
>
> I haven't found a way to instruct dovecot not to clear the
> NOTIFY_SOCKET env var.

I have no idea if this works, but did you try adding NOTIFY_SOCKET to
line 65 of src/lib-master/master-service-settings.c?
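The suggestion amounts to a whitelist: dovecot wipes its environment, but variables named in ENV_SYSTEMD (LISTEN_PID, LISTEN_FDS and, with the change, NOTIFY_SOCKET) are carried over. The following is a minimal, generic C sketch of that idea, not dovecot's master_service_env_clean(), whose internals are not shown in this thread. env_clean_except() is a hypothetical name, and the sketch assumes a Linux libc because it uses the non-POSIX clearenv().

```c
#define _GNU_SOURCE /* clearenv() is a glibc/musl extension, not POSIX */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not dovecot code): wipe the process environment
 * but keep the variables named in the NULL-terminated array `keep`.
 * Returns 0 on success, -1 on failure. */
static int env_clean_except(const char *const *keep)
{
    assert(keep != NULL);
    size_t n = 0;
    while (keep[n] != NULL)
        n++;

    /* Copy the values first: pointers returned by getenv() are not
     * guaranteed to stay valid once the environment is cleared. */
    char **saved = calloc(n + 1, sizeof(*saved));
    if (saved == NULL)
        return -1;

    int rc = 0;
    for (size_t i = 0; i < n && rc == 0; i++) {
        const char *value = getenv(keep[i]);
        if (value != NULL && (saved[i] = strdup(value)) == NULL)
            rc = -1;
    }

    if (rc == 0 && clearenv() != 0)
        rc = -1;

    /* Re-export only the whitelisted variables that were actually set. */
    for (size_t i = 0; i < n && rc == 0; i++) {
        if (saved[i] != NULL && setenv(keep[i], saved[i], 1) != 0)
            rc = -1;
    }

    for (size_t i = 0; i < n; i++)
        free(saved[i]); /* free(NULL) is a no-op for unset variables */
    free(saved);
    return rc;
}
```

If NOTIFY_SOCKET is missing from the whitelist, it is gone by the time the service would call sd_notify(), which is exactly the failure described in the quoted message.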
Bug#951722: autopkgtest suite flaky on arm64
I'm convinced that applying such hacks is bad practice and should be
avoided. I also have to add that my motivation to look into this any
further has now basically dropped to zero.
Bug#951722: autopkgtest suite flaky on arm64
On Fri, May 22, 2020 at 11:51:07PM +0200, Michael Biebl wrote:
> > I will upload a new upstream version to sid containing the workaround
> > for the test failures. I will leave this bug open, but will reduce the
> > severity to 'normal'. In a subsequent upload, I will apply a patch to
> > implement sd_notify and will resolve the bug. Please feel free to send
> > a patch if you don't want to wait however long it'll take for me to get
> > around to putting one together.
>
> Please don't apply this hack. If you don't want to fix this properly to
> get a (newer) version into testing, please just disable the test for the
> time being.

If we don't test it, it can't be broken, right?

> It's great that the autopkgtest suite uncovered a real issue.
> Let's not mutilate the test suite.

I think the test suite with the workaround in place has more value than
the suite with this test completely disabled. If the service never
becomes available, the test with the workaround will still detect the
situation, which is exactly what it's there for.

noah
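Noah's argument rests on the wait being bounded: the test loop retries nc -z -U every 2 seconds and gives up after 31 failed attempts (about one minute), so a service that never comes up still fails the test. A minimal C sketch of the same bounded wait follows. wait_for_socket() and its parameters are hypothetical names chosen for illustration, and it only checks that connect() succeeds, which is roughly what nc -z checks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper mirroring the nc -z -U loop from the test:
 * try to connect to the Unix stream socket at `path` up to `max_tries`
 * times, sleeping `interval_s` seconds between attempts.
 * Returns the number of failed attempts before success, or -1 if the
 * socket never accepted a connection (the "timed out" case). */
static int wait_for_socket(const char *path, int max_tries, unsigned int interval_s)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    assert(strlen(path) < sizeof(addr.sun_path));
    strcpy(addr.sun_path, path);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        int rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
        close(fd); /* like nc -z: connect, then hang up immediately */
        if (rc == 0)
            return attempt;
        if (attempt + 1 < max_tries)
            sleep(interval_s); /* no pointless sleep after the last try */
    }
    return -1;
}
```

Calling wait_for_socket("/var/run/dovecot/auth-userdb", 31, 2) would give roughly the same one-minute budget as the shell loop. Either way, this only papers over the readiness problem rather than fixing it, which is the point of contention in this thread.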
Bug#951722: autopkgtest suite flaky on arm64
On 22.05.20 at 07:29, Noah Meyerhans wrote:
> On Sun, May 10, 2020 at 11:06:26PM +0200, Michael Biebl wrote:
>>> +echo "Waiting for the service to be available"
>>> +c=0
>>> +while ! nc -z -U /var/run/dovecot/auth-userdb; do
>>> +    c=$(($c+1))
>>> +    sleep 2
>>> +    if [ $c -gt 30 ]; then
>>> +        echo "Timed out waiting for the service to be available" >&2
>>> +        exit 1
>>> +    fi
>>> +done
>>
>> Looping until the service is ready appears to be a workaround/hack at
>> best imho.
>
> I agree, however...
>
>> The dovecot service should only signal its readiness when the
>> communication sockets are ready to accept connections. I.e. this
>> autopkgtest appears to point at a real issue that should be fixed properly.
>
> I do not believe that this is an RC issue. In order to address the
> stale upstream version and pending security updates in sid, and allow
> the package to again enter bullseye, I propose the following:

That's a policy determined by the release managers/maintainers of debci.
They recommended that I file such issues with RC severity. If you don't
agree with that policy, you should probably contact them directly.

> I will upload a new upstream version to sid containing the workaround
> for the test failures. I will leave this bug open, but will reduce the
> severity to 'normal'. In a subsequent upload, I will apply a patch to
> implement sd_notify and will resolve the bug. Please feel free to send
> a patch if you don't want to wait however long it'll take for me to get
> around to putting one together.

Please don't apply this hack. If you don't want to fix this properly to
get a (newer) version into testing, please just disable the test for the
time being.
It's great that the autopkgtest suite uncovered a real issue.
Let's not mutilate the test suite.

> Dovecot has been essentially unmaintained in Debian since August 2019,
> and there's quite a backlog of work to do. I'm going to work on getting
> it back into shape, but it will be a little while before it's where it
> should be. It won't happen all at once.
>
> noah

The patch to add sd_notify is rather trivial. The problem is that dovecot
unhelpfully clears the full environment. In src/master/main.c,
sd_notify() should be called around/after main_init().
Unfortunately, at this point master_service_env_clean() has been called,
clearing the process environment, including NOTIFY_SOCKET, which is
passed from systemd to dovecot and is needed to make sd_notify work.

I haven't found a way to instruct dovecot not to clear the
NOTIFY_SOCKET env var.

Regards,
Michael
Bug#951722: autopkgtest suite flaky on arm64
On Sun, May 10, 2020 at 11:06:26PM +0200, Michael Biebl wrote:
> > +echo "Waiting for the service to be available"
> > +c=0
> > +while ! nc -z -U /var/run/dovecot/auth-userdb; do
> > +    c=$(($c+1))
> > +    sleep 2
> > +    if [ $c -gt 30 ]; then
> > +        echo "Timed out waiting for the service to be available" >&2
> > +        exit 1
> > +    fi
> > +done
>
> Looping until the service is ready appears to be a workaround/hack at
> best imho.

I agree, however...

> The dovecot service should only signal its readiness when the
> communication sockets are ready to accept connections. I.e. this
> autopkgtest appears to point at a real issue that should be fixed properly.

I do not believe that this is an RC issue. In order to address the
stale upstream version and pending security updates in sid, and allow
the package to again enter bullseye, I propose the following:

I will upload a new upstream version to sid containing the workaround
for the test failures. I will leave this bug open, but will reduce the
severity to 'normal'. In a subsequent upload, I will apply a patch to
implement sd_notify and will resolve the bug. Please feel free to send
a patch if you don't want to wait however long it'll take for me to get
around to putting one together.

Dovecot has been essentially unmaintained in Debian since August 2019,
and there's quite a backlog of work to do. I'm going to work on getting
it back into shape, but it will be a little while before it's where it
should be. It won't happen all at once.

noah
Bug#951722: autopkgtest suite flaky on arm64
On 10.05.20 at 23:06, Michael Biebl wrote:
> On Sat, 7 Mar 2020 16:01:22 +0200 Mpampis Kostas wrote:
>> diff --git a/debian/tests/control b/debian/tests/control
>> index 7abd238c3..5bf1dc94b 100644
>> --- a/debian/tests/control
>> +++ b/debian/tests/control
>> @@ -6,5 +6,5 @@ Tests: systemd
>>  Depends: dovecot-core, systemd-sysv
>>  
>>  Test-Command: run-parts --report --exit-on-error debian/tests/usage
>> -Depends: dovecot-imapd, dovecot-pop3d, python3
>> +Depends: dovecot-imapd, dovecot-pop3d, python3, netcat-openbsd
>>  Restrictions: needs-root, breaks-testbed, allow-stderr
>> diff --git a/debian/tests/usage/00_setup b/debian/tests/usage/00_setup
>> index 2eeeb2f73..e90ca7e92 100755
>> --- a/debian/tests/usage/00_setup
>> +++ b/debian/tests/usage/00_setup
>> @@ -29,6 +29,17 @@ chown nobody:nogroup /srv/dovecot-dep8
>>  echo "Restarting the service"
>>  systemctl restart dovecot
>>  
>> +echo "Waiting for the service to be available"
>> +c=0
>> +while ! nc -z -U /var/run/dovecot/auth-userdb; do
>> +    c=$(($c+1))
>> +    sleep 2
>> +    if [ $c -gt 30 ]; then
>> +        echo "Timed out waiting for the service to be available" >&2
>> +        exit 1
>> +    fi
>> +done
>
> Looping until the service is ready appears to be a workaround/hack at
> best imho.
>
> The dovecot service should only signal its readiness when the
> communication sockets are ready to accept connections. I.e. this
> autopkgtest appears to point at a real issue that should be fixed properly.

Quickly glancing at dovecot.service, I see

cat ./dovecot.service.in
...
Type=simple

This is problematic. Type=simple means the service is considered ready
as soon as the process has been forked off. In the case of dovecot, this
does not appear to be the correct choice, as the service is marked ready
before it has had a chance to set up its communication channels.
See also https://www.lucas-nussbaum.net/blog/?p=877

My recommendation would be that dovecot implements the systemd readiness
protocol sd_notify:
https://www.freedesktop.org/software/systemd/man/sd_notify.html

If there are questions, please don't hesitate to ask.

Michael
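For context on what sd_notify does under the hood: as documented in the sd_notify man page linked above, the service sends newline-separated VARIABLE=value assignments such as "READY=1" as a datagram to the AF_UNIX socket whose address systemd passes in $NOTIFY_SOCKET (a leading '@' denotes an abstract-namespace address). This is also why NOTIFY_SOCKET must survive any environment clean-up. Below is a minimal, simplified C sketch of that protocol. notify_ready() is a hypothetical name, the sketch omits details the real libsystemd implementation handles, and real code should simply call sd_notify() from libsystemd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for sd_notify(0, state):
 * send `state` (e.g. "READY=1") as one datagram to the AF_UNIX socket
 * named in $NOTIFY_SOCKET. Returns 1 if sent, 0 if NOTIFY_SOCKET is
 * unset (no Type=notify supervisor), -1 on error. */
static int notify_ready(const char *state)
{
    assert(state != NULL);
    const char *path = getenv("NOTIFY_SOCKET");
    if (path == NULL || path[0] == '\0')
        return 0;

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    size_t len = strlen(path);
    if (len >= sizeof(addr.sun_path))
        return -1;
    memcpy(addr.sun_path, path, len);

    socklen_t addrlen;
    if (path[0] == '@') {
        /* Abstract-namespace socket (Linux): the name starts with a NUL
         * byte and the address length must not include a terminator. */
        addr.sun_path[0] = '\0';
        addrlen = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len);
    } else {
        addrlen = (socklen_t)sizeof(addr);
    }

    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    ssize_t sent = sendto(fd, state, strlen(state), 0,
                          (const struct sockaddr *)&addr, addrlen);
    close(fd);
    return sent == (ssize_t)strlen(state) ? 1 : -1;
}
```

With Type=notify, systemd only treats the unit as started once such a READY=1 message arrives, so the sending side should do this after all listening sockets are set up.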
Bug#951722: autopkgtest suite flaky on arm64
On Sat, 7 Mar 2020 16:01:22 +0200 Mpampis Kostas wrote:
> diff --git a/debian/tests/control b/debian/tests/control
> index 7abd238c3..5bf1dc94b 100644
> --- a/debian/tests/control
> +++ b/debian/tests/control
> @@ -6,5 +6,5 @@ Tests: systemd
>  Depends: dovecot-core, systemd-sysv
>  
>  Test-Command: run-parts --report --exit-on-error debian/tests/usage
> -Depends: dovecot-imapd, dovecot-pop3d, python3
> +Depends: dovecot-imapd, dovecot-pop3d, python3, netcat-openbsd
>  Restrictions: needs-root, breaks-testbed, allow-stderr
> diff --git a/debian/tests/usage/00_setup b/debian/tests/usage/00_setup
> index 2eeeb2f73..e90ca7e92 100755
> --- a/debian/tests/usage/00_setup
> +++ b/debian/tests/usage/00_setup
> @@ -29,6 +29,17 @@ chown nobody:nogroup /srv/dovecot-dep8
>  echo "Restarting the service"
>  systemctl restart dovecot
>  
> +echo "Waiting for the service to be available"
> +c=0
> +while ! nc -z -U /var/run/dovecot/auth-userdb; do
> +    c=$(($c+1))
> +    sleep 2
> +    if [ $c -gt 30 ]; then
> +        echo "Timed out waiting for the service to be available" >&2
> +        exit 1
> +    fi
> +done

Looping until the service is ready appears to be a workaround/hack at
best imho.

The dovecot service should only signal its readiness when the
communication sockets are ready to accept connections. I.e. this
autopkgtest appears to point at a real issue that should be fixed properly.

Regards,
Michael
Bug#951722:
Hello,

This doesn't seem to be arm64-related, since the same failure occurs on
ppc64el. I've been reproducing it consistently by running the autopkgtest
suite on a stressed host. I think the failure appears under similar
high-load circumstances on the Debian CI host.

dovecot-lda communicates with dovecot through the socket at
/var/run/dovecot/auth-userdb, but on a stressed host it's possible for
dovecot-lda to call connect() before dovecot has called listen() on this
socket.

With the suggested patch applied, the failure has vanished, since
dovecot-lda is called only after the socket becomes ready.

Regards,
Mpampis

diff --git a/debian/tests/control b/debian/tests/control
index 7abd238c3..5bf1dc94b 100644
--- a/debian/tests/control
+++ b/debian/tests/control
@@ -6,5 +6,5 @@ Tests: systemd
 Depends: dovecot-core, systemd-sysv
 
 Test-Command: run-parts --report --exit-on-error debian/tests/usage
-Depends: dovecot-imapd, dovecot-pop3d, python3
+Depends: dovecot-imapd, dovecot-pop3d, python3, netcat-openbsd
 Restrictions: needs-root, breaks-testbed, allow-stderr
diff --git a/debian/tests/usage/00_setup b/debian/tests/usage/00_setup
index 2eeeb2f73..e90ca7e92 100755
--- a/debian/tests/usage/00_setup
+++ b/debian/tests/usage/00_setup
@@ -29,6 +29,17 @@ chown nobody:nogroup /srv/dovecot-dep8
 echo "Restarting the service"
 systemctl restart dovecot
 
+echo "Waiting for the service to be available"
+c=0
+while ! nc -z -U /var/run/dovecot/auth-userdb; do
+    c=$(($c+1))
+    sleep 2
+    if [ $c -gt 30 ]; then
+        echo "Timed out waiting for the service to be available" >&2
+        exit 1
+    fi
+done
+
 echo "Sending a test message via the LDA"
 /usr/lib/dovecot/dovecot-lda -f "t...@example.com" -d dep8 <
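The window described here can be reproduced in isolation. The sketch below (Linux-specific, not dovecot code; try_connect() and the /tmp path used when exercising it are hypothetical) makes one connect() attempt the way a client such as dovecot-lda would, and reports which of three states it observed: the socket file does not exist yet, it exists but nobody has called listen() on it, or it is accepting connections.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Make a single connect() attempt to the Unix stream socket at `path`.
 * Returns 0 on success, otherwise the errno value: typically ENOENT
 * while the socket file does not exist yet, and ECONNREFUSED (on Linux)
 * while it is bound but listen() has not been called. */
static int try_connect(const char *path)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    assert(strlen(path) < sizeof(addr.sun_path));
    strcpy(addr.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    int err = 0;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0)
        err = errno;
    close(fd);
    return err;
}
```

With Type=simple, systemd may report the service as started while the socket is still in one of the first two states; a client that runs in that window fails, and a loaded machine makes the window wider.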
Bug#951722: autopkgtest suite flaky on arm64
Source: dovecot
Version: 1:2.3.7.2-1
Severity: serious
User: debian...@lists.debian.org
Usertags: flaky

Hi,

the autopkgtest suite appears to be flaky on arm64. Sometimes it
succeeds, sometimes command1 fails [1].
This is problematic, as it (randomly) blocks other packages from
entering testing. A recent example is systemd, which is currently
blocked because of

autopkgtest for dovecot/1:2.3.7.2-1: amd64: Pass, arm64: Regression

I asked on #debci, and they recommended filing this as an RC bug.

If you think this is an issue with the debci arm64 infrastructure,
please get in touch with debian...@lists.debian.org.

Regards,
Michael

[1] https://ci.debian.net/data/autopkgtest/unstable/arm64/d/dovecot/4294703/log.gz

-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.4.0-4-amd64 (SMP w/4 CPU cores)
Kernel taint flags: TAINT_WARN
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled