Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
Hi,

On Fri, 26 Jan 2024 11:31:37 +0100 Chris Hofstaedtler wrote:
> Paul Gevers noted that src:pdns's autopkgtests fail every so often on a
> large amd64 debci worker and on s390x workers. Apparently a similar
> problem can be seen in src:pdns-recursor's debci runs.

The issue (or at least some issue) seems to be kernel related. Due to issues with the backports kernel on arm64, we had to revert to the bookworm kernel, and now pdns fails on arm64 too. On ppc64el and riscv64 the test has passed for the last two months; both run a newer kernel (backports or even sid). However, s390x also runs a backports kernel and the issue still exists there.

Paul

By the way, if you want to use "exit 77" when conditions are not met, you also need to set the skippable restriction on those tests; otherwise the exit code is treated like any other.
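Editorial sketch of the "exit 77" approach mentioned above. This is not taken from the pdns packaging; the function name and the choice to match on the journal message are assumptions. It only has an effect if the test's stanza in debian/tests/control also carries "Restrictions: skippable", since autopkgtest otherwise treats 77 like any other failure.

```shell
#!/bin/sh
# Hypothetical helper for an autopkgtest script. The function name and
# the idea of keying on this journal message are assumptions.

# Exit status that autopkgtest reports as "skipped" for tests that
# declare "Restrictions: skippable".
SKIP_EXIT=77

# Read journal text on stdin. Return 77 if it contains the IPC
# namespace failure from this bug, 0 otherwise.
ipc_namespace_skip_status() {
    if grep -q 'Failed to set up IPC namespacing'; then
        return "$SKIP_EXIT"
    fi
    return 0
}

# Possible use inside the test, after "service pdns restart" has failed:
#   journalctl _SYSTEMD_UNIT=pdns.service -n 10 --no-pager \
#       | ipc_namespace_skip_status || exit $?
```

Keeping the check as a function that reads text from stdin means it can be tried against a saved log without a running systemd.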
Bug#1059995: Re: Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
* Paul Gevers [240126 22:25]:
> Hi zeha,
>
> On 26-01-2024 10:21, Chris Hofstaedtler wrote:
> > I see this "works", but now the tests fail after one try on the
> > problematic worker and then are never retried. Can this please be
> > fixed?
>
> What do you have in mind? I think you need to wait until issue 166 [1] is
> fixed, which I guess isn't going to happen soon.

166 seems like an option, or auto-retry on a different worker, if that's possible?

Chris
Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
Hi zeha,

On 26-01-2024 10:21, Chris Hofstaedtler wrote:
> I see this "works", but now the tests fail after one try on the
> problematic worker and then are never retried. Can this please be
> fixed?

What do you have in mind? I think you need to wait until issue 166 [1] is fixed, which I guess isn't going to happen soon.

Paul

[1] https://salsa.debian.org/ci-team/debci/-/issues/166
Processed: Bug#1059995: Re: Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
Processing commands for cont...@bugs.debian.org:

> clone 1059995 -1
Bug #1059995 {Done: Chris Hofstaedtler } [src:pdns] pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
Bug 1059995 cloned as bug 1061554
> reopen -1
Bug #1061554 {Done: Chris Hofstaedtler } [src:pdns] pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
'reopen' may be inappropriate when a bug has been closed with a version;
all fixed versions will be cleared, and you may need to re-add them.
Bug reopened
No longer marked as fixed in versions pdns/4.8.3-3.
> reassign -1 systemd
Bug #1061554 [src:pdns] pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
Bug reassigned from package 'src:pdns' to 'systemd'.
No longer marked as found in versions pdns/4.8.3-2.
Ignoring request to alter fixed versions of bug #1061554 to the same values previously set
> found -1 systemd/254.3-1
Bug #1061554 [systemd] pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
Marked as found in versions systemd/254.3-1.
> forwarded -1 https://github.com/systemd/systemd/issues/31037
Bug #1061554 [systemd] pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
Set Bug forwarded-to-address to 'https://github.com/systemd/systemd/issues/31037'.
> thanks
Stopping processing here.

Please contact me if you need assistance.
--
1059995: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059995
1061554: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1061554
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#1059995: Re: Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
clone 1059995 -1
reopen -1
reassign -1 systemd
found -1 systemd/254.3-1
forwarded -1 https://github.com/systemd/systemd/issues/31037
thanks

Dear systemd Packagers,

Paul Gevers noted that src:pdns's autopkgtests fail every so often on a large amd64 debci worker and on s390x workers. Apparently a similar problem can be seen in src:pdns-recursor's debci runs.

As there is no pdns(-recursor) code running at this point, this seems to be a problem somewhere in the space of systemd <> lxc <> apparmor <> kernel.

I've opened a bug with systemd upstream, unfortunately with very little info, as I don't know how to provide additional info from within a debci run. Help with providing additional info would be very welcome.

Thanks,
Chris
Bug#1059995: Re: Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
Hi Paul,

* Paul Gevers [240104 18:14]:
> Can you figure out decent numbers for these? Below I printed the output of
> lsipc and AFAICT SHMMAX is already pretty big ;) (and the same on all our
> hosts, which is also true for MSGMAX).
>
> On the other hand, $(ipcs -a) doesn't show anything on the host, not even if
> I let it run in a while-loop (1 second interval) while I schedule the test
> of pdns. So, could this be a bug in systemd (which you claim below should be
> handling this) or is this just not really supported in lxc and do you need
> a full VM? Because it works elsewhere, it feels more like a bug, and it would
> not be the first instance where code fails to properly handle 64 cores or
> 256GB of RAM.

Likely, but it is probably in systemd or in lxc or in apparmor or elsewhere.

> > > > I wouldn't know what to do about this, it's not really under the
> > > > control of src:pdns.
> > >
> > > Well, maybe check for it and fail gracefully?
> >
> > But how? systemd sets up the IPC namespace.
>
> exit with 77 when you detect problems and add the skippable restriction.

I see this "works", but now the tests fail after one try on the problematic worker and then are never retried. Can this please be fixed?

Thanks,
Chris
Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
On Fri, Jan 12, 2024 at 08:02:53PM +0100, Paul Gevers wrote:
> Hi,
>
> On 12-01-2024 12:36, Chris Hofstaedtler wrote:
> > can you confirm two additional things please:
> >
> > 1) this happens only on the large host?
>
> https://ci.debian.net/packages/p/pdns/testing/s390x/41650331/
>
> Seems it happens on our s390x host too (which has 10 debci workers running
> in parallel).
>
> > 2) this does not or does happen with other packages also requesting
> > the same settings from systemd, e.g. dnsdist or pdns-recursor?
>
> https://ci.debian.net/packages/d/dnsdist/ -> Page not found.
>
> pdns-recursor seems to be flaky as well on amd64 and all passing tests were
> on one of the smaller hosts. pdns-recursor passes on s390x though.

For now I've added the exit 77 hack in the pdns tests, but this is quite unsatisfying.

I've opened an issue with systemd upstream, maybe someone there has some insight:
https://github.com/systemd/systemd/issues/31037

Chris
Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
Hi,

On 12-01-2024 12:36, Chris Hofstaedtler wrote:
> can you confirm two additional things please:
>
> 1) this happens only on the large host?

https://ci.debian.net/packages/p/pdns/testing/s390x/41650331/

Seems it happens on our s390x host too (which has 10 debci workers running in parallel).

> 2) this does not or does happen with other packages also requesting
> the same settings from systemd, e.g. dnsdist or pdns-recursor?

https://ci.debian.net/packages/d/dnsdist/ -> Page not found.

pdns-recursor seems to be flaky as well on amd64 and all passing tests were on one of the smaller hosts. pdns-recursor passes on s390x though.

Paul
Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
Hi,

can you confirm two additional things please:

1) this happens only on the large host?

2) this does not or does happen with other packages also requesting the same settings from systemd, e.g. dnsdist or pdns-recursor?

Chris
Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
Hi,

On 04-01-2024 17:28, Chris Hofstaedtler wrote:
> On Thu, Jan 04, 2024 at 03:37:21PM +0100, Paul Gevers wrote:
> > Hi,
> >
> > On 04-01-2024 15:08, Chris Hofstaedtler wrote:
> > > It would seem that the host runs out of IPC space?
> >
> > What is IPC space?
>
> https://manpages.debian.org/bookworm/manpages/sysvipc.7.en.html
> https://manpages.debian.org/bookworm/manpages/ipc_namespaces.7.en.html
>
> > And when does a host run out of it? As I said, this is one of our most
> > powerful hosts, so I would expect it to run out of things last.
> >
> > > Does it run more tests in parallel than other workers, or so?
> >
> > Yes, this host (like most of our hosts, but a bit more) runs multiple
> > lxc based debci workers.
>
> My guess: the default limits are static, and if LXC doesn't do anything
> special, the limits are probably shared with the whole host.
>
> I think kernel.shmmax and kernel.msgmax are the limits (but I'm not
> entirely sure).

Can you figure out decent numbers for these? Below I printed the output of lsipc and AFAICT SHMMAX is already pretty big ;) (and the same on all our hosts, which is also true for MSGMAX).

On the other hand, $(ipcs -a) doesn't show anything on the host, not even if I let it run in a while-loop (1 second interval) while I schedule the test of pdns. So, could this be a bug in systemd (which you claim below should be handling this) or is this just not really supported in lxc and do you need a full VM? Because it works elsewhere, it feels more like a bug, and it would not be the first instance where code fails to properly handle 64 cores or 256GB of RAM.

> > > I wouldn't know what to do about this, it's not really under the
> > > control of src:pdns.
> >
> > Well, maybe check for it and fail gracefully?
>
> But how? systemd sets up the IPC namespace.

exit with 77 when you detect problems and add the skippable restriction.

> > Or, since a couple of days, if qemu VMs don't run out of IPC space, we
> > could run them in qemu always.
>
> I imagine a fully separated VM would not run out of IPC space, indeed.

I just ran the test in qemu on ci-worker13 and it PASSed.

Paul

root@ci-worker13:~# lsipc
RESOURCE DESCRIPTION                                              LIMIT USED  USE%
MSGMNI   Number of message queues                                 32000    0 0.00%
MSGMAX   Max size of message (bytes)                                 8K    -     -
MSGMNB   Default max size of queue (bytes)                          16K    -     -
SHMMNI   Shared memory segments                                    4096    0 0.00%
SHMALL   Shared memory pages                       18446744073692774399    0 0.00%
SHMMAX   Max size of shared memory segment (bytes)                  16E    -     -
SHMMIN   Min size of shared memory segment (bytes)                   1B    -     -
SEMMNI   Number of semaphore identifiers                          32000    0 0.00%
SEMMNS   Total number of semaphores                              102400    0 0.00%
SEMMSL   Max semaphores per semaphore set.                        32000    -     -
SEMOPM   Max number of operations per semop(2)                      500    -     -
SEMVMX   Semaphore max value                                      32767    -     -
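Editorial sketch: the limits shown in the lsipc output above can also be read directly from /proc/sys/kernel. According to ipc_namespaces(7), these files are distinct in each IPC namespace, so running the same snippet on the host and inside a container may be a way to compare the two views. The function name and the optional directory argument are assumptions added for illustration.

```shell
#!/bin/sh
# Print selected System V IPC limits as "name value" lines.
# The directory argument is a hypothetical convenience so the function
# can be pointed at a copy of the files; it defaults to /proc/sys/kernel.
print_ipc_limits() {
    dir=${1:-/proc/sys/kernel}
    for name in shmmax shmall shmmni msgmax msgmnb msgmni sem; do
        if [ -r "$dir/$name" ]; then
            # "sem" holds four fields: SEMMSL SEMMNS SEMOPM SEMMNI
            printf '%s %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s unavailable\n' "$name"
        fi
    done
}

# Example: print_ipc_limits
```

Reading the files rather than parsing lsipc keeps the sketch free of any dependency on util-linux output formatting.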
Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
On Thu, Jan 04, 2024 at 03:37:21PM +0100, Paul Gevers wrote:
> Hi,
>
> On 04-01-2024 15:08, Chris Hofstaedtler wrote:
> > It would seem that the host runs out of IPC space?
>
> What is IPC space?

https://manpages.debian.org/bookworm/manpages/sysvipc.7.en.html
https://manpages.debian.org/bookworm/manpages/ipc_namespaces.7.en.html

> And when does a host run out of it? As I said, this is
> one of our most powerful hosts, so I would expect it to run out of things
> last.
>
> > Does it run more tests in parallel than other workers, or so?
>
> Yes, this host (like most of our hosts, but a bit more) runs multiple lxc
> based debci workers.

My guess: the default limits are static, and if LXC doesn't do anything special, the limits are probably shared with the whole host.

I think kernel.shmmax and kernel.msgmax are the limits (but I'm not entirely sure).

> > I wouldn't know what to do about this, it's not really under the
> > control of src:pdns.
>
> Well, maybe check for it and fail gracefully?

But how? systemd sets up the IPC namespace.

> Or, since a couple of days, if
> qemu VMs don't run out of IPC space, we could run them in qemu always.

I imagine a fully separated VM would not run out of IPC space, indeed.

Chris
Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
Hi,

On 04-01-2024 15:08, Chris Hofstaedtler wrote:
> It would seem that the host runs out of IPC space?

What is IPC space? And when does a host run out of it? As I said, this is one of our most powerful hosts, so I would expect it to run out of things last.

> Does it run more tests in parallel than other workers, or so?

Yes, this host (like most of our hosts, but a bit more) runs multiple lxc based debci workers.

> I wouldn't know what to do about this, it's not really under the
> control of src:pdns.

Well, maybe check for it and fail gracefully? Or, since a couple of days, if qemu VMs don't run out of IPC space, we could run them in qemu always.

Paul
Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
On Thu, Jan 04, 2024 at 02:42:59PM +0100, Paul Gevers wrote:
> 269s Dec 25 16:13:20 ci-359-77591125 (s_server)[3766]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
> 269s Dec 25 16:13:20 ci-359-77591125 (s_server)[3766]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable

It would seem that the host runs out of IPC space?

Does it run more tests in parallel than other workers, or so?

I wouldn't know what to do about this, it's not really under the control of src:pdns.

Chris
Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
Source: pdns
Version: 4.8.3-2
Severity: serious
User: debian...@lists.debian.org
Usertags: flaky

Dear maintainer(s),

I looked at the results of the autopkgtest of your package. I noticed that it regularly fails. The failures seem related to the host that runs the test. ci-worker13 is a beefy machine [1] and tests seem to fail consistently there, while the other amd64 workers are much more moderate [2] and tests pass there.

Because the unstable-to-testing migration software now blocks on regressions in testing, flaky tests, i.e. tests that flip between passing and failing without changes to the list of installed packages, are causing people unrelated to your package to spend time on these tests.

Don't hesitate to reach out if you need help and some more information from our infrastructure.

Paul

[1] https://metal.equinix.com/product/servers/m3-large/
[2] https://aws.amazon.com/ec2/instance-types/m5/

https://ci.debian.net/packages/p/pdns/testing/amd64/

https://ci.debian.net/data/autopkgtest/testing/amd64/p/pdns/41325109/log.gz

268s + service pdns restart
269s Job for pdns.service failed because the control process exited with error code.
269s See "systemctl status pdns.service" and "journalctl -xeu pdns.service" for details.
269s + journalctl _SYSTEMD_UNIT=pdns.service -n 10 --no-pager
269s Dec 25 16:13:20 ci-359-77591125 (s_server)[3766]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:20 ci-359-77591125 (s_server)[3766]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s Dec 25 16:13:21 ci-359-77591125 (s_server)[3852]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:21 ci-359-77591125 (s_server)[3852]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s Dec 25 16:13:23 ci-359-77591125 (s_server)[3876]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:23 ci-359-77591125 (s_server)[3876]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s Dec 25 16:13:24 ci-359-77591125 (s_server)[3886]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:24 ci-359-77591125 (s_server)[3886]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s Dec 25 16:13:25 ci-359-77591125 (s_server)[3915]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:25 ci-359-77591125 (s_server)[3915]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s ++ mktemp
269s + TMPFILE=/tmp/tmp.jah1Y5TJIa
269s + trap cleanup EXIT
269s + tee /tmp/tmp.jah1Y5TJIa
269s + sdig 127.0.0.1 53 smoke.pgsql.example.org A
279s Fatal: Timeout waiting for data
279s + grep -c '127\.0\.0\.222' /tmp/tmp.jah1Y5TJIa
279s 0
279s + echo smoke.pgsql.example.org could not be resolved
279s smoke.pgsql.example.org could not be resolved
279s + exit 1
279s + cleanup
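Editorial sketch: when comparing debci logs from different workers, it can help to count how many spawn attempts failed at the NAMESPACE step. The function name below is hypothetical; the matched string is copied from the journal excerpt in the report.

```shell
#!/bin/sh
# Count failed spawn attempts in log text read from stdin.
count_namespace_failures() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so mask the exit status to keep "set -e" scripts running.
    grep -c 'Failed at step NAMESPACE' || true
}

# Possible use on a downloaded debci log:
#   zcat log.gz | count_namespace_failures
```

Note that the test itself only prints the last 10 journal lines (journalctl -n 10), so the count from a debci log is a lower bound on the number of attempts.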