Re: [systemd-devel] systemd-nspawn with filesystem id mapping

2021-06-08 Thread Lennart Poettering
On Fr, 04.06.21 14:53, systemd-de...@notandy.de (systemd-de...@notandy.de) 
wrote:

> Hi again,
>
> after some more debugging this EOVERFLOW seems to be the result of a call to 
> may_o_create in fs/namei.c in the kernel.
> There is a check:
>
> if (!fsuidgid_has_mapping(dir->dentry->d_sb, mnt_userns))
>   return -EOVERFLOW;
>
> This seems to be what returns EOVERFLOW to nspawn and causes the container
> spawn to fail.
> My guess would be that this is a systemd bug when combining filesystem id
> mapping with --bind.
> Before I start spending more time debugging this, has anyone so far used
> --bind with --private-users=pick and --private-users-ownership=map
> successfully?
>
> As far as I understand, pull request #19438 didn't add any handling to
> the mount_bind function. Was this maybe overlooked?
> In my understanding a remount_idmap is missing in that function, and
> the touch needs to be done in the correct user namespace or with mapped
> uid/gids.
>
> I'm new to the systemd source code, could somebody confirm that I'm on the 
> right track there and not heading in the wrong direction?

Let's follow up on the PR, it's the better place for development
discussions on specific bugs or problems. I replied on it the other
day.


Lennart

--
Lennart Poettering, Berlin
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-nspawn with filesystem id mapping

2021-06-04 Thread systemd-devel
Hi again,

after some more debugging this EOVERFLOW seems to be the result of a call to 
may_o_create in fs/namei.c in the kernel.
There is a check:

if (!fsuidgid_has_mapping(dir->dentry->d_sb, mnt_userns))
  return -EOVERFLOW;

This seems to be what returns EOVERFLOW to nspawn and causes the container
spawn to fail.
My guess would be that this is a systemd bug when combining filesystem id
mapping with --bind.
Before I start spending more time debugging this, has anyone so far used --bind
with --private-users=pick and --private-users-ownership=map successfully?
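
One way to picture the check quoted above is the simplified model below. It is only a sketch, not the kernel code: it assumes the ID-mapped mount covers just the host uid range [base, base + range) that nspawn picked, and that the process doing the touch still runs with a host fsuid outside that range (for example 0). The helper names fsuid_has_mapping and model_create are made up for illustration.

```shell
#!/bin/sh
# Simplified model (assumption): a caller's host fsuid "has a mapping" on the
# ID-mapped mount only if it lies inside [base, base + range).
# fsuid_has_mapping is a hypothetical helper, not a kernel or systemd interface.
fsuid_has_mapping() {
    base=$1 range=$2 fsuid=$3
    [ "$fsuid" -ge "$base" ] && [ "$fsuid" -lt $((base + range)) ]
}

# Mirrors the shape of the quoted check: report what a file creation
# attempted with the given fsuid would get back under this model.
model_create() {
    if fsuid_has_mapping "$1" "$2" "$3"; then
        echo "ok"
    else
        echo "EOVERFLOW"
    fi
}
```

With the base and range from the first post in this thread, `model_create 552206336 65536 0` reports EOVERFLOW under this model, while an fsuid inside the picked range reports ok.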

As far as I understand, pull request #19438 didn't add any handling to the
mount_bind function. Was this maybe overlooked?
In my understanding a remount_idmap is missing in that function, and
the touch needs to be done in the correct user namespace or with mapped
uid/gids.

I'm new to the systemd source code, could somebody confirm that I'm on the 
right track there and not heading in the wrong direction?

Thanks,
nd





[systemd-devel] systemd-nspawn with filesystem id mapping

2021-05-30 Thread systemd-devel
Hi!

I was very pleased to see the pull request "nspawn: add support for kernel 5.12
ID mapping mounts #19438" and went straight to trying it out.
The following was tested on the current git HEAD of systemd running on
Arch Linux.

What I am trying to achieve, at a high level, is to emulate bubblewrap and
run Chromium under Wayland with GPU acceleration and working audio via
PipeWire.
For that I need to pass some sockets and devices to the container using
--bind-ro. I want to use --private-users=pick to make it easier to separate
multiple containers.
That means I do not know which uid the process will run as before nspawn
spawns my container, which causes problems accessing the sockets.
Until now I have used setfacl to work around this limitation and allow access
to the sockets.
I was hoping to be able to skip that with --private-users-ownership=map.

I'm passing three sockets belonging to uid 1000 on the host to a container with
--private-users=pick and trying to access them as uid 1000 (name "user") in the
container.
Everything is happening on an ext4 file system. I'd prefer btrfs, but that is
(so far) lacking ID mapping support.
The full call looks like this:

statepath="/machines/state/chromium/${profilename}"
systemd-nspawn \
    -D /machines/images/archlinux-chromium/ \
    --private-users=pick \
    --private-users-ownership=map \
    --no-new-privileges=yes \
    --as-pid2 \
    --machine "chromium-${profilename}" \
    --user user \
    --bind-ro /var/run/user/1000/pulse/native:/sockets/pulse/native \
    --bind-ro /var/run/user/1000/wayland-1:/sockets/wayland-1 \
    --bind-ro /var/run/user/1000/pipewire-0:/sockets/pipewire-0 \
    --bind "${statepath}:/home/user" \
    --bind /dev/dri/renderD128 \
    -E WAYLAND_DISPLAY=wayland-1 \
    -E XDG_RUNTIME_DIR=/sockets \
    chromium --enable-features=UseOzonePlatform --ozone-platform=wayland

This results in the following output:

Spawning container chromium-default on /machines/images/archlinux-chromium.
Press ^] three times within 1s to kill container.
Selected user namespace base 552206336 and range 65536.
Failed to create mount point /machines/images/archlinux-chromium/sockets/pipewire-0: Value too large for defined data type
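
For reference, the base and range printed above determine which host uid the in-container uid 1000 ends up as. The sketch below does that arithmetic, assuming the usual contiguous mapping where container uid N corresponds to host uid base + N. The helper name container_to_host_uid is hypothetical.

```shell
#!/bin/sh
# Translate an in-container uid to the corresponding host uid for a user
# namespace with the given base and range (assumes a contiguous mapping).
# container_to_host_uid is a hypothetical helper name, not part of systemd.
container_to_host_uid() {
    base=$1 range=$2 uid=$3
    # uids outside [0, range) have no counterpart on the host
    if [ "$uid" -lt 0 ] || [ "$uid" -ge "$range" ]; then
        echo "unmapped"
        return 1
    fi
    echo $((base + uid))
}
```

Under that assumption, `container_to_host_uid 552206336 65536 1000` prints 552207336, which is a different uid from the host uid 1000 that owns the sockets.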

I've run strace on it, this results in the following relevant output:

[pid   524] mount("/machines/state/chromium/default", "/proc/self/fd/8", NULL, MS_BIND|MS_REC, NULL) = 0
[pid   524] close(8) = 0
[pid   524] newfstatat(AT_FDCWD, "/var/run/user/1000/pipewire-0", {st_mode=S_IFSOCK|0666, st_size=0, ...}, 0) = 0
[pid   524] openat(AT_FDCWD, "/machines/images/archlinux-chromium", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 8
[pid   524] openat(8, "sockets", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 10
[pid   524] newfstatat(10, "", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_EMPTY_PATH) = 0
[pid   524] close(8) = 0
[pid   524] openat(10, "pipewire-0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)
[pid   524] close(10) = 0
[pid   524] newfstatat(AT_FDCWD, "/machines/images/archlinux-chromium/sockets", {st_mode=S_IFDIR|0700, st_size=4096, ...}, 0) = 0
[pid   524] openat(AT_FDCWD, "/machines/images/archlinux-chromium/sockets/pipewire-0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)
[pid   524] openat(AT_FDCWD, "/machines/images/archlinux-chromium/sockets/pipewire-0", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0644) = -1 EOVERFLOW (Value too large for defined data type)
[pid   524] writev(2, [{iov_base="Failed to create mount point /ma"..., iov_len=122}, {iov_base="\n", iov_len=1}], 2Failed to create mount point /machines/images/archlinux-chromium/sockets/pipewire-0: Value too large for defined data type
) = 123

This maps to the touch in nspawn-mount.c at line 754.
If I drop the --bind/--bind-ro options, the container starts fine (although
chromium of course does not work), and the same is true if I keep the binds
and remove --private-users-ownership=map.
I'm not sure how to proceed with this issue at this point.
Have I made a mistake or a wrong assumption about how this should work?
Should I open an issue on GitHub about it?

Thanks,
nd