Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container
I prepared a qemu-x86-64 disk image that reproduces this symptom at
https://drive.google.com/drive/folders/1ObSUu3DCF2r4tzcGrykhBCRpPSemHNiu

After logging in as root with the password root, running
systemd-nspawn -M container1 -n -b
reproduces the symptom: /bin/systemd-sysusers talks to rpcbind.socket before rpcbind.service is started, and hangs.

Ryutaroh
Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container
From: Michael Biebl
Date: Sun, 9 Feb 2020 08:53:33 +0100

> Not sure what the right solution is. One might be, that the nis NSS
> module handles this situation (rpcbind.socket running, rpcbind.service
> not running) better, or that rpcbind.socket changes its ordering to
> avoid this situation.
>
> I do think this should be fixed on the nis/rpcbind side though.

I reassigned this report to the rpcbind Debian package...

Ryutaroh
Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container
On 09.02.20 at 02:24, Ryutaroh Matsumoto wrote:
> Control: tags -1 + fixed - moreinfo
> Control: retitle -1 Fix found: systemd-sysusers hangs if nis is
> enabled in a systemd-nspawn container
>
> I found a solution (or a workaround).
>
> The problem is that
> (1) systemd-sysusers tries to use Sun RPC (TCP connection to 127.0.0.1:111)
> in a nis enabled container.

To be more specific, it's not systemd-sysusers directly, but the nis NSS module which appears to access the rpcbind socket.

> (2) at that time, rpcbind.socket is ready (by systemd), and

I guess this is actually a timing issue: you seem to hit the condition where rpcbind.socket is already started in the container, but not on the physical system.

> (3) rpcbind.service is not ready, because rpcbind.service depends on
> systemd-sysusers.service.

More specifically, rpcbind.service has
After=systemd-tmpfiles-setup.service
and systemd-tmpfiles-setup.service has
After=systemd-sysusers.service

So the dependency chain is like this:
rpcbind.service->systemd-tmpfiles-setup.service->systemd-sysusers.service

> (4) Then the systemd-sysusers communication with 127.0.0.1:111 gets stuck.
>
> When I created
>
> [Unit]
> Before=rpcbind.socket
> as /etc/systemd/system/systemd-sysusers.service.d/my.conf
>
> OR
>
> [Unit]
> After=systemd-sysusers.service
> as /etc/systemd/system/rpcbind.socket.d/my.conf
>
> the problem went away!!!
>
> Anyway, the current dependency
> rpcbind.socket -> systemd-sysusers.service -> rpcbind.service
> looks nonsensical.
>
> The remaining problem is where the above nonsensical order of
> dependencies should be fixed, rpcbind Debian package or the systemd
> upstream...

Not sure what the right solution is. One might be that the nis NSS module handles this situation (rpcbind.socket running, rpcbind.service not running) better, or that rpcbind.socket changes its ordering to avoid this situation.

I do think this should be fixed on the nis/rpcbind side though.
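The ordering chain described in this message can be checked with a small graph walk. The following is only a sketch: the `AFTER` table contains just the After= edges named in this thread (real units carry more dependencies), and `starts_after` is a hypothetical helper, not a systemd API.

```python
# After= edges as stated in the thread: unit -> units it is ordered after.
# Assumption: simplified; real unit files list further dependencies.
AFTER = {
    "rpcbind.service": ["systemd-tmpfiles-setup.service"],
    "systemd-tmpfiles-setup.service": ["systemd-sysusers.service"],
    "systemd-sysusers.service": [],
    "rpcbind.socket": [],  # no ordering relative to systemd-sysusers.service
}


def starts_after(unit, other, graph):
    """Return True if `unit` is transitively ordered after `other`."""
    stack = list(graph.get(unit, []))
    seen = set()
    while stack:
        current = stack.pop()
        if current == other:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return False


# The service waits for sysusers, but the socket does not.
print(starts_after("rpcbind.service", "systemd-sysusers.service", AFTER))
print(starts_after("rpcbind.socket", "systemd-sysusers.service", AFTER))

# Effect of the rpcbind.socket.d drop-in with After=systemd-sysusers.service.
fixed = dict(AFTER)
fixed["rpcbind.socket"] = ["systemd-sysusers.service"]
print(starts_after("rpcbind.socket", "systemd-sysusers.service", fixed))
```

With the drop-in edge added, the socket is no longer offered to clients before systemd-sysusers has finished, which is the property the workaround relies on.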
Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container
Control: tags -1 + fixed - moreinfo
Control: retitle -1 Fix found: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container

I found a solution (or a workaround).

The problem is that
(1) systemd-sysusers tries to use Sun RPC (TCP connection to 127.0.0.1:111)
    in a nis enabled container.
(2) at that time, rpcbind.socket is ready (by systemd), and
(3) rpcbind.service is not ready, because rpcbind.service depends on
    systemd-sysusers.service.
(4) Then the systemd-sysusers communication with 127.0.0.1:111 gets stuck.

When I created

[Unit]
Before=rpcbind.socket

as /etc/systemd/system/systemd-sysusers.service.d/my.conf

OR

[Unit]
After=systemd-sysusers.service

as /etc/systemd/system/rpcbind.socket.d/my.conf

the problem went away!!!

Anyway, the current dependency
rpcbind.socket -> systemd-sysusers.service -> rpcbind.service
looks nonsensical.

The remaining problem is where the above nonsensical order of dependencies should be fixed, rpcbind Debian package or the systemd upstream...

Best regards,
Ryutaroh
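Steps (2) to (4) can be imitated outside systemd. The sketch below is an illustration under stated assumptions, not the actual rpcbind or NSS code: it listens on an ephemeral loopback port instead of port 111, never accepts the connection (standing in for a socket unit whose service has not started), and uses a short client timeout only so that the example terminates. `probe_unaccepted_listener` is a hypothetical name.

```python
import socket


def probe_unaccepted_listener(timeout=0.5):
    """Connect to a listening socket that nobody accepts on and report the outcome."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # ephemeral port stands in for port 111
    listener.listen(1)               # the kernel queues connections; accept() is never called
    port = listener.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(timeout)       # assumption: only here so the sketch terminates
    try:
        client.connect(("127.0.0.1", port))  # succeeds: the kernel completes the handshake
        client.sendall(b"request")           # succeeds: the data is merely buffered
        try:
            client.recv(16)                  # nothing ever reads the request or replies
            return "answered"
        except socket.timeout:
            return "timed out waiting for a reply"
    finally:
        client.close()
        listener.close()


print(probe_unaccepted_listener())
```

The connect and the write both succeed, so the client sees no error; it simply waits for a reply that cannot arrive until something starts servicing the socket.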
Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container
On 07.02.20 at 08:14, Ryutaroh Matsumoto wrote:
>> Do you have nscd installed inside the container (looking at the strace
>> it appears you might have not).
>
> I installed "unscd" instead of "nscd", as "unscd" is said to be less buggy.
>
>> Is nscd installed outside of the container where you don't see the problem?
>
> The host running container also uses "unscd".
> Its log is as below. Interestingly, I do not find TCP communication to
> the Sun RPC port 111... Why???

The last time I used NIS was 15 years ago, so I doubt I can really help you with answering any NIS related questions.
Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container
> Do you have nscd installed inside the container (looking at the strace
> it appears you might have not).

I installed "unscd" instead of "nscd", as "unscd" is said to be less buggy.

> Is nscd installed outside of the container where you don't see the problem?

The host running the container also uses "unscd".
Its log is below. Interestingly, I do not find TCP communication to the Sun RPC port 111... Why???
Some error messages are in Japanese because the default locale is ja.

Regarding the NIS configuration, the container and the host running the container seem almost the same to me...
The host is Ubuntu 19.10, but that may not be related to this issue...

Anyway, I have to leave my office now, so I cannot provide further information until Monday morning...

[Service]
Type=oneshot
RemainAfterExit=yes
TimeoutSec=10s
Environment=SYSTEMD_LOG_LEVEL=debug
ExecStart=/usr/bin/strace -T -yy -D -f /bin/systemd-sysusers

Feb 07 16:03:39 quadrop5000 strace[418]: /usr/bin/strace: Process 418 attached
Feb 07 16:03:39 quadrop5000 strace[418]: execve("/bin/systemd-sysusers", ["/bin/systemd-sysusers"], 0x7ffe3b561190 /* 5 vars */) = 0 <0.151095>
Feb 07 16:03:39 quadrop5000 strace[418]: brk(NULL) = 0x557f4a51b000 <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: arch_prctl(0x3001 /* ARCH_??? */, 0x7fff6e4c41a0) = -1 EINVAL (無効な引数です) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: access("/etc/ld.so.preload", R_OK) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.09>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, "/lib/systemd/tls/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.08>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd/tls/haswell/x86_64", 0x7fff6e4c33f0) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, "/lib/systemd/tls/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd/tls/haswell", 0x7fff6e4c33f0) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, "/lib/systemd/tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd/tls/x86_64", 0x7fff6e4c33f0) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, "/lib/systemd/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd/tls", 0x7fff6e4c33f0) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, "/lib/systemd/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd/haswell/x86_64", 0x7fff6e4c33f0) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, "/lib/systemd/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd/haswell", 0x7fff6e4c33f0) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, "/lib/systemd/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd/x86_64", 0x7fff6e4c33f0) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, "/lib/systemd/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.07>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd", {st_mode=S_IFDIR|0755, st_size=2148, ...}) = 0 <0.06>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.06>
Feb 07 16:03:39 quadrop5000 strace[418]: fstat(3, {st_mode=S_IFREG|0644, st_size=67652, ...}) = 0 <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: mmap(NULL, 67652, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f30e04eb000 <0.06>
Feb 07 16:03:39 quadrop5000 strace[418]: close(3) = 0 <0.04>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 <0.07>
Feb 07 16:03:39 quadrop5000 strace[418]: read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360r\2\0\0\0\0\0"..., 832) = 832 <0.06>
Feb 07 16:03:39 quadrop5000 strace[418]: lseek(3, 64, SEEK_SET) = 64 <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: read(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784) = 784 <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: lseek(3, 848, SEEK_SET) = 848 <0.04>
Feb 07 16:03:39 quadrop5000 strace[418]: read(3,
Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container
The container is started as
systemd-nspawn -M bullseye --network-macvlan=eno1 -b

The option --network-macvlan=eno1 is necessary so that the container can talk with the NIS server running on a third computer (neither a container nor the host running the container).

Ryutaroh

From: Ryutaroh Matsumoto
Subject: Re: Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container
Date: Fri, 07 Feb 2020 15:54:54 +0900 (JST)

> Hi Michael,
> Thank you for paying attention to this.
Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container
Do you have nscd installed inside the container (looking at the strace it appears you might have not).
Does it help if you install nscd?
Is nscd installed outside of the container where you don't see the problem?
Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container
Hi Michael,
Thank you for paying attention to this.

> Do you have users/groups defined in
> /usr/lib/sysusers.d/ or /etc/sysusers.d which are only resolvable via NIS?

The above directories are untouched. The container was made by
mmdebstrap --variant=debootstrap bullseye.
Neither /etc/passwd nor /etc/group has been modified manually, except for changing the root password.
No new user or group has been added manually.

> Can you run
> SYSTEMD_LOG_LEVEL=debug /bin/systemd-sysusers
> inside the container
> If that hangs as well, please attach the output.
> An strace might be helpful as well.

I made the following /etc/systemd/system/systemd-sysusers.service

[Unit]
Description=Create System Users
Documentation=man:sysusers.d(5) man:systemd-sysusers.service(8)
DefaultDependencies=no
Conflicts=shutdown.target
After=systemd-remount-fs.service
Before=sysinit.target shutdown.target systemd-update-done.service
ConditionNeedsUpdate=/etc

[Service]
Type=oneshot
RemainAfterExit=yes
Environment=SYSTEMD_LOG_LEVEL=debug
ExecStart=/usr/bin/strace -T -yy -D -f /bin/systemd-sysusers
TimeoutSec=10s

/etc/nsswitch.conf is as follows:

passwd: files systemd nis
group: files systemd nis
shadow: files nis
gshadow: files nis
hosts: files dns
networks: files
protocols: db files
services: db files
ethers: db files
rpc: db files
netgroup: nis

After logging in as root, journalctl -u systemd-sysusers shows the following.
It seems that systemd-sysusers tries to write to 127.0.0.1:111 (Sun RPC) and gets stuck.
It is clear that neither rpcbind.service nor nis is running when systemd-sysusers is called by systemd.
I have no idea what is "wrong".

-- Reboot --
Feb 07 15:38:54 bullseye strace[23]: /usr/bin/strace: Process 19 attached
Feb 07 15:38:54 bullseye strace[23]: execve("/bin/systemd-sysusers", ["/bin/systemd-sysusers"], 0x7ffee6748880 /* 5 vars */) = 0 <0.000155>
Feb 07 15:38:54 bullseye strace[23]: brk(NULL) = 0x5574832f <0.08>
Feb 07 15:38:54 bullseye strace[23]: access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.10>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, "/lib/systemd/tls/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) <0.09>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/tls/haswell/x86_64", 0x7ffc6405e590) = -1 ENOENT (No such file or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, "/lib/systemd/tls/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/tls/haswell", 0x7ffc6405e590) = -1 ENOENT (No such file or directory) <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, "/lib/systemd/tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/tls/x86_64", 0x7ffc6405e590) = -1 ENOENT (No such file or directory) <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, "/lib/systemd/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/tls", 0x7ffc6405e590) = -1 ENOENT (No such file or directory) <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, "/lib/systemd/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/haswell/x86_64", 0x7ffc6405e590) = -1 ENOENT (No such file or directory) <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, "/lib/systemd/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/haswell", 0x7ffc6405e590) = -1 ENOENT (No such file or directory) <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, "/lib/systemd/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/x86_64", 0x7ffc6405e590) = -1 ENOENT (No such file or directory) <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, "/lib/systemd/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd", {st_mode=S_IFDIR|0755, st_size=1828, ...}) = 0 <0.08>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.08>
Feb 07 15:38:54 bullseye strace[23]: fstat(3, {st_mode=S_IFREG|0644, st_size=12490, ...}) = 0 <0.07>
Feb 07 15:38:54 bullseye strace[23]: mmap(NULL, 12490, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd6e7a7b000 <0.08>
Feb 07 15:38:54 bullseye strace[23]: close(3) = 0 <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD,
Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container
Control: tags -1 + moreinfo

Hi!

On 07.02.20 at 02:55, Ryutaroh Matsumoto wrote:
> * the above hang-up does not happen if "nis" is removed from
> /etc/nsswitch.conf
> * the above hang-up is NOT observed outside a container.
> * I have no idea if it is an upstream issue.
> * Both host and container run only NIS clients, and NIS server is running in
> another host.
> * NIS related Debian packages are as follows:

Can you run
SYSTEMD_LOG_LEVEL=debug /bin/systemd-sysusers
inside the container
If that hangs as well, please attach the output.
An strace might be helpful as well.
Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container
One more question:
Do you have users/groups defined in /usr/lib/sysusers.d/ or /etc/sysusers.d which are only resolvable via NIS?