Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container

2020-02-09 Thread Ryutaroh Matsumoto
I prepared a qemu-x86-64 disk image that can reproduce this symptom at
https://drive.google.com/drive/folders/1ObSUu3DCF2r4tzcGrykhBCRpPSemHNiu

After logging in as root with password root, running
systemd-nspawn -M container1 -n -b
reproduces the symptom: /bin/systemd-sysusers talks to rpcbind.socket
before rpcbind.service is started, and hangs.

Ryutaroh



Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container

2020-02-09 Thread Ryutaroh Matsumoto
From: Michael Biebl 
Date: Sun, 9 Feb 2020 08:53:33 +0100
> Not sure what the right solution is. One might be, that the nis NSS
> module handles this situation (rpcbind.socket running, rpcbind.service
> not running) better, or that rpcbind.socket changes its ordering to
> avoid this situation.
> 
> I do think this should be fixed on the nis/rpcbind side though.

I reassigned this report to the rpcbind Debian package...

Ryutaroh



Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container

2020-02-08 Thread Michael Biebl
Am 09.02.20 um 02:24 schrieb Ryutaroh Matsumoto:
> Control: tags -1 + fixed - moreinfo
> Control: retitle -1 Fix found: systemd-sysusers hangs if nis is
> enabled in a systemd-nspawn container
> 
> I found a solution (or a workaround).
> 
> The problem is that
> (1) systemd-sysusers  tries to use Sun RPC (TCP connection to 127.0.0.1:111)
>   in a nis enabled container.

To be more specific, it's not systemd-sysusers directly but the nis NSS
module that appears to access the rpcbind socket.

> (2) at that time, rpcbind.socket is ready (by systemd), and

I guess this is actually a timing issue: you seem to hit the condition
where rpcbind.socket is already started in the container, but not on
the physical system.

> (3) rpcbind.service is not ready, because rpcbind.service depends on
>   systemd-sysusers.service.

More specifically, rpcbind.service has
After=systemd-tmpfiles-setup.service
and systemd-tmpfiles-setup.service has
After=systemd-sysusers.service

So the dependency chain looks like this:
rpcbind.service->systemd-tmpfiles-setup.service->systemd-sysusers.service
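
The chain above can be reproduced with a small sketch that follows After= entries from one unit to the next. This is only an illustration under stated assumptions: the unit files written into the scratch directory are trimmed-down, hypothetical stand-ins that contain nothing but the After= lines quoted above, and the after_chain helper is likewise hypothetical. It follows only the first After= value of each unit, which real units may not be limited to.

```shell
#!/bin/sh
set -eu

# Scratch copies of the units, reduced to the ordering lines quoted above.
unit_dir=$(mktemp -d)
printf '[Unit]\nAfter=systemd-tmpfiles-setup.service\n' > "$unit_dir/rpcbind.service"
printf '[Unit]\nAfter=systemd-sysusers.service\n' > "$unit_dir/systemd-tmpfiles-setup.service"
printf '[Unit]\nDescription=Create System Users\n' > "$unit_dir/systemd-sysusers.service"

# Hypothetical helper: print the chain that starts at unit $1, following
# the first After= entry of each unit until a unit has none.
after_chain() {
    unit=$1
    chain=$unit
    while [ -f "$unit_dir/$unit" ]; do
        next=$(sed -n 's/^After=\([^ ]*\).*/\1/p' "$unit_dir/$unit" | head -n 1)
        [ -n "$next" ] || break
        chain="$chain->$next"
        unit=$next
    done
    printf '%s\n' "$chain"
}

after_chain rpcbind.service
```

On a running system, "systemctl show -p After rpcbind.service" reports the full ordering list that systemd actually computed, which is the more reliable source.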

> (4) Then systemd-sysusers communication to 127.0.0.1:111 get stuck.
> 
> When I made
> 
> [Unit]
> Before=rpcbind.socket
> as /etc/systemd/system/systemd-sysusers.service.d/my.conf
> 
> OR
> 
> [Unit]
> After=systemd-sysusers.service
> as /etc/systemd/system/rpcbind.socket.d/my.conf
> 
> Then the problem goes away!!!
> 
> Anyway, the current dependency
> rpcbind.socket -> systemd-sysusers.service -> rpcbind.service
> looks nonsense.
> 
> The remaining problem is where the above nonsense order of
> dependency should be fixed, rpcbind Debian package or the systemd
> upstream...

Not sure what the right solution is. One option might be for the nis NSS
module to handle this situation (rpcbind.socket running, rpcbind.service
not running) better, or for rpcbind.socket to change its ordering to
avoid this situation.

I do think this should be fixed on the nis/rpcbind side though.






Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container

2020-02-08 Thread Ryutaroh Matsumoto
Control: tags -1 + fixed - moreinfo
Control: retitle -1 Fix found: systemd-sysusers hangs if nis is
enabled in a systemd-nspawn container

I found a solution (or a workaround).

The problem is that
(1) systemd-sysusers tries to use Sun RPC (TCP connection to 127.0.0.1:111)
  in a NIS-enabled container,
(2) at that time, rpcbind.socket is ready (set up by systemd), and
(3) rpcbind.service is not ready, because rpcbind.service depends on
  systemd-sysusers.service.
(4) Then the systemd-sysusers communication to 127.0.0.1:111 gets stuck.

When I made

[Unit]
Before=rpcbind.socket
as /etc/systemd/system/systemd-sysusers.service.d/my.conf

OR

[Unit]
After=systemd-sysusers.service
as /etc/systemd/system/rpcbind.socket.d/my.conf

Then the problem goes away!!!
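
The second variant above (ordering rpcbind.socket after systemd-sysusers.service) could be installed roughly as follows. This is a minimal sketch, not a tested fix: the install_dropin helper and its root-directory argument are hypothetical, introduced only so the script can be tried against a scratch directory before touching a real container.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: write the drop-in below the given root directory
# ("/" inside the container, or a scratch directory for a dry run)
# and print the path of the file it wrote.
install_dropin() {
    root=${1%/}
    dir="$root/etc/systemd/system/rpcbind.socket.d"
    mkdir -p "$dir"
    printf '[Unit]\nAfter=systemd-sysusers.service\n' > "$dir/my.conf"
    printf '%s\n' "$dir/my.conf"
}

# Dry run: nothing under /etc is modified.
scratch=$(mktemp -d)
conf=$(install_dropin "$scratch")
cat "$conf"
```

Inside the container, one would call "install_dropin /" as root and then run "systemctl daemon-reload" (or reboot the container) so that systemd picks up the new ordering.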

Anyway, the current dependency
rpcbind.socket -> systemd-sysusers.service -> rpcbind.service
looks nonsensical.

The remaining question is where this nonsensical dependency order
should be fixed: in the rpcbind Debian package or in systemd
upstream...

Best regards, Ryutaroh



Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container

2020-02-07 Thread Michael Biebl
Am 07.02.20 um 08:14 schrieb Ryutaroh Matsumoto:
>> Do you have nscd installed inside the container (looking at the strace
>> it appears you might have not).
> 
> I installed "unscd" instead of "nscd", as "unscd" is said to be less buggy.
> 
>> Is nscd installed outside of the container where you don't see the problem?
> 
> The host running container also uses "unscd".
> Its log is as below. Interestingly, I do not find TCP communication to
> the Sun RPC port 111... Why???

Last time I used NIS was 15 years ago, so I doubt I can really help you
with answering any NIS related questions.






Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container

2020-02-06 Thread Ryutaroh Matsumoto
> Do you have nscd installed inside the container (looking at the strace
> it appears you might have not).

I installed "unscd" instead of "nscd", as "unscd" is said to be less buggy.

> Is nscd installed outside of the container where you don't see the problem?

The host running the container also uses "unscd".
Its log is below. Interestingly, I do not find any TCP communication to
the Sun RPC port 111... Why???
Some error messages are in Japanese because the default locale is ja:
無効な引数です means "Invalid argument", and
そのようなファイルやディレクトリはありません means "No such file or directory".

Regarding the NIS configuration, the container and the host running the
container seem almost the same to me...
The host is Ubuntu 19.10, but that may not be related to this issue...

Anyway, I have to leave my office now,
so I cannot provide further information until Monday morning...

[Service]
Type=oneshot
RemainAfterExit=yes
TimeoutSec=10s
Environment=SYSTEMD_LOG_LEVEL=debug
ExecStart=/usr/bin/strace -T -yy -D -f /bin/systemd-sysusers


Feb 07 16:03:39 quadrop5000 strace[418]: /usr/bin/strace: Process 418 attached
Feb 07 16:03:39 quadrop5000 strace[418]: execve("/bin/systemd-sysusers", 
["/bin/systemd-sysusers"], 0x7ffe3b561190 /* 5 vars */) = 0 <0.151095>
Feb 07 16:03:39 quadrop5000 strace[418]: brk(NULL)  
 = 0x557f4a51b000 <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: arch_prctl(0x3001 /* ARCH_??? */, 
0x7fff6e4c41a0) = -1 EINVAL (無効な引数です) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: access("/etc/ld.so.preload", R_OK) 
 = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.09>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, 
"/lib/systemd/tls/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT 
(そのようなファイルやディレクトリはありません) <0.08>
Feb 07 16:03:39 quadrop5000 strace[418]: 
stat("/lib/systemd/tls/haswell/x86_64", 0x7fff6e4c33f0) = -1 ENOENT 
(そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, 
"/lib/systemd/tls/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT 
(そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd/tls/haswell", 
0x7fff6e4c33f0) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, 
"/lib/systemd/tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT 
(そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd/tls/x86_64", 
0x7fff6e4c33f0) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, 
"/lib/systemd/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT 
(そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd/tls", 
0x7fff6e4c33f0) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, 
"/lib/systemd/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT 
(そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd/haswell/x86_64", 
0x7fff6e4c33f0) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, 
"/lib/systemd/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT 
(そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd/haswell", 
0x7fff6e4c33f0) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, 
"/lib/systemd/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT 
(そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd/x86_64", 
0x7fff6e4c33f0) = -1 ENOENT (そのようなファイルやディレクトリはありません) <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, 
"/lib/systemd/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT 
(そのようなファイルやディレクトリはありません) <0.07>
Feb 07 16:03:39 quadrop5000 strace[418]: stat("/lib/systemd", 
{st_mode=S_IFDIR|0755, st_size=2148, ...}) = 0 <0.06>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, "/etc/ld.so.cache", 
O_RDONLY|O_CLOEXEC) = 3 <0.06>
Feb 07 16:03:39 quadrop5000 strace[418]: fstat(3, 
{st_mode=S_IFREG|0644, st_size=67652, ...}) = 0 <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: mmap(NULL, 67652, PROT_READ, 
MAP_PRIVATE, 3, 0) = 0x7f30e04eb000 <0.06>
Feb 07 16:03:39 quadrop5000 strace[418]: close(3) 
 = 0 <0.04>
Feb 07 16:03:39 quadrop5000 strace[418]: openat(AT_FDCWD, 
"/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 
3 <0.07>
Feb 07 16:03:39 quadrop5000 strace[418]: 
read(3, 
"\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360r\2\0\0\0\0\0"..., 832) = 
832 <0.06>
Feb 07 16:03:39 quadrop5000 strace[418]: 
lseek(3, 64, SEEK_SET) = 64 <0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: 
read(3, 
"\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784) = 784 
<0.05>
Feb 07 16:03:39 quadrop5000 strace[418]: 
lseek(3, 848, SEEK_SET) = 848 <0.04>
Feb 07 16:03:39 quadrop5000 strace[418]: 
read(3, 

Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container

2020-02-06 Thread Ryutaroh Matsumoto
The container is started as
systemd-nspawn -M bullseye --network-macvlan=eno1 -b

The option --network-macvlan=eno1 is necessary so that the container
can talk to the NIS server running on a third computer (neither a
container nor the host running the container).

Ryutaroh

From: Ryutaroh Matsumoto 
Subject: Re: Bug#950822: systemd-sysusers hangs if nis is enabled in a 
systemd-nspawn container
Date: Fri, 07 Feb 2020 15:54:54 +0900 (JST)

> Hi Michael,
> Thank you for paying attention to this.



Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container

2020-02-06 Thread Michael Biebl
Do you have nscd installed inside the container? (Looking at the strace,
it appears you might not.)
Does it help if you install nscd?
Is nscd installed outside of the container where you don't see the problem?





Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container

2020-02-06 Thread Ryutaroh Matsumoto
Hi Michael,
Thank you for paying attention to this.

> Do you have users/groups defined in
> /usr/lib/sysusers.d/ or /etc/sysusers.d which are only resolvable via NIS?

The above directories are untouched. The container was made by
mmdebstrap --variant=debootstrap bullseye.
Neither /etc/passwd nor /etc/group has been modified manually, except for
changing the root password.
No new user or group has been added manually.

> Can you run
> SYSTEMD_LOG_LEVEL=debug /bin/systemd-sysusers
> inside the container
> If that hangs as well, please attach the output.
> An strace might be helpful as well.

I made the following /etc/systemd/system/systemd-sysusers.service

[Unit]
Description=Create System Users
Documentation=man:sysusers.d(5) man:systemd-sysusers.service(8)
DefaultDependencies=no
Conflicts=shutdown.target
After=systemd-remount-fs.service
Before=sysinit.target shutdown.target systemd-update-done.service
ConditionNeedsUpdate=/etc

[Service]
Type=oneshot
RemainAfterExit=yes
Environment=SYSTEMD_LOG_LEVEL=debug
ExecStart=/usr/bin/strace -T -yy -D -f /bin/systemd-sysusers
TimeoutSec=10s

/etc/nsswitch.conf is as follows:
passwd:         files systemd nis
group:          files systemd nis
shadow:         files nis
gshadow:        files nis

hosts:          files dns
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis


After logging in as root, journalctl -u systemd-sysusers shows the following.
It seems that systemd-sysusers tries to write to 127.0.0.1:111 (Sun RPC)
and gets stuck. Clearly, neither rpcbind.service nor nis is running when
systemd-sysusers is started by systemd. I have no idea what is wrong.

-- Reboot --
Feb 07 15:38:54 bullseye strace[23]: /usr/bin/strace: Process 19 attached
Feb 07 15:38:54 bullseye strace[23]: execve("/bin/systemd-sysusers", 
["/bin/systemd-sysusers"], 0x7ffee6748880 /* 5 vars */) = 0 <0.000155>
Feb 07 15:38:54 bullseye strace[23]: brk(NULL)   = 
0x5574832f <0.08>
Feb 07 15:38:54 bullseye strace[23]: access("/etc/ld.so.preload", R_OK)  = 
-1 ENOENT (No such file or directory) <0.10>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, 
"/lib/systemd/tls/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT 
(No such file or directory) <0.09>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/tls/haswell/x86_64", 
0x7ffc6405e590) = -1 ENOENT (No such file or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, 
"/lib/systemd/tls/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such 
file or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/tls/haswell", 
0x7ffc6405e590) = -1 ENOENT (No such file or directory) <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, 
"/lib/systemd/tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such 
file or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/tls/x86_64", 
0x7ffc6405e590) = -1 ENOENT (No such file or directory) <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, 
"/lib/systemd/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or 
directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/tls", 0x7ffc6405e590) = 
-1 ENOENT (No such file or directory) <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, 
"/lib/systemd/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No 
such file or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/haswell/x86_64", 
0x7ffc6405e590) = -1 ENOENT (No such file or directory) <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, 
"/lib/systemd/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file 
or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/haswell", 
0x7ffc6405e590) = -1 ENOENT (No such file or directory) <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, 
"/lib/systemd/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file 
or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd/x86_64", 
0x7ffc6405e590) = -1 ENOENT (No such file or directory) <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, "/lib/systemd/libc.so.6", 
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) <0.08>
Feb 07 15:38:54 bullseye strace[23]: stat("/lib/systemd", 
{st_mode=S_IFDIR|0755, st_size=1828, ...}) = 0 <0.08>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, "/etc/ld.so.cache", 
O_RDONLY|O_CLOEXEC) = 3 <0.08>
Feb 07 15:38:54 bullseye strace[23]: fstat(3, 
{st_mode=S_IFREG|0644, st_size=12490, ...}) = 0 <0.07>
Feb 07 15:38:54 bullseye strace[23]: mmap(NULL, 12490, PROT_READ, MAP_PRIVATE, 
3, 0) = 0x7fd6e7a7b000 <0.08>
Feb 07 15:38:54 bullseye strace[23]: close(3)  = 
0 <0.07>
Feb 07 15:38:54 bullseye strace[23]: openat(AT_FDCWD, 

Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container

2020-02-06 Thread Michael Biebl
Control: tags -1 + moreinfo

Hi!

Am 07.02.20 um 02:55 schrieb Ryutaroh Matsumoto:
> * the above hang-up does not happen if "nis" is removed from 
> /etc/nsswitch.conf
> * the above hang-up is NOT observed outside a container.
> * I have no idea if it is an upstream issue.
> * Both host and container run only NIS clients, and NIS server is running in 
> another host.
> * NIS related Debian packages are as follows:

Can you run
SYSTEMD_LOG_LEVEL=debug /bin/systemd-sysusers
inside the container?
If that hangs as well, please attach the output.
An strace might be helpful as well.






Bug#950822: systemd-sysusers hangs if nis is enabled in a systemd-nspawn container

2020-02-06 Thread Michael Biebl
One more question:
Do you have users/groups defined in
/usr/lib/sysusers.d/ or /etc/sysusers.d which are only resolvable via NIS?


