The pve-common part of this patch set should be straightforward: minor additions to ProcFSTools and Tools, as well as the new mount API constants added to Syscall.pm.
The container part then makes use of the new mount API if the currently running kernel supports it. The hope is that we can simplify the code a bit once we can stop supporting kernels older than 5.2.

For now, the series starts with the ability to stage a mount point, and then moves on to changing the startup process to use this. Previously, startup went through the mount points in order and mounted them directly at the target location. This is prone to symlink attacks (especially when using nested shared bind mounts). When staging a mount in a fixed directory first, we can pick it up afterwards with the new `open_tree()` syscall and move it in place with the new `move_mount()` syscall, which can work relative to directory file descriptors and has flags for whether or not the paths are allowed to follow symlinks. (In the future this can be hardened even more by using `openat2()` with the container's root directory as an "implicit chroot" while looking up the target directory, and then issuing a `move_mount()` right onto the resulting path file descriptor via `MOVE_MOUNT_T_EMPTY_PATH`.)

The main advantage of the new API, however, is that we can pick up the mounts as file descriptors, then switch into the running container's mount namespace and `move_mount()` the mount point in place, without having to rely on an existing MS_SHARED mount point "hack".

Hence the final patch adds support for mount point hotplugging, but only hotplug, not unplug, since unmounting has a lot of issues: open file descriptors, unshared MS_PRIVATE mount namespaces referencing the mount (as well as those namespaces opened as file descriptors...), mounts having been moved (if they were previously hotplugged at least), ...
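To illustrate the pick-up-and-move step described above, here is a minimal C sketch. It is not the code from this series (which is written in Perl); the function names `release_at_least` and `insert_staged_mount`, their parameters, and the fallback constant definitions are assumptions made purely for illustration. The raw `syscall()` form is used because older libcs ship no wrappers for `open_tree()` and `move_mount()`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers. 428/429 are the syscall numbers used on
 * most architectures (assumption: not building for alpha). */
#ifndef SYS_open_tree
#define SYS_open_tree 428
#endif
#ifndef SYS_move_mount
#define SYS_move_mount 429
#endif
#ifndef OPEN_TREE_CLOEXEC
#define OPEN_TREE_CLOEXEC O_CLOEXEC
#endif
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
#endif

/* Hypothetical helper: returns 1 if a kernel release string such as
 * "5.3.10-1-pve" is at least major.minor, 0 otherwise. */
int release_at_least(const char *release, int major, int minor)
{
    int maj = 0, min = 0;

    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return 0;
    return maj > major || (maj == major && min >= minor);
}

/* Hypothetical helper: pick up the mount staged at `staged_path` as a
 * file descriptor and move it to `rel_target`, resolved relative to the
 * directory fd `rootfs_fd`. Returns 0 on success, -1 with errno set. */
int insert_staged_mount(const char *staged_path, int rootfs_fd,
                        const char *rel_target)
{
    long rc;
    int saved_errno;
    int mnt_fd = (int)syscall(SYS_open_tree, AT_FDCWD, staged_path,
                              OPEN_TREE_CLOEXEC);

    if (mnt_fd < 0)
        return -1;

    /* MOVE_MOUNT_F_EMPTY_PATH: the source is mnt_fd itself.
     * MOVE_MOUNT_T_SYMLINKS is deliberately not passed, so a symlink in
     * the final component of the target path is not followed. */
    rc = syscall(SYS_move_mount, mnt_fd, "", rootfs_fd, rel_target,
                 MOVE_MOUNT_F_EMPTY_PATH);

    saved_errno = errno;
    close(mnt_fd);
    errno = saved_errno;
    return rc < 0 ? -1 : 0;
}
```

A caller would presumably check `release_at_least()` against the release string reported by `uname()` before taking this path. For the hotplug case, the two syscalls would have to be split so that the switch into the container's mount namespace happens after `open_tree()` and before `move_mount()`.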
Wolfgang Bumiller (8):
  implement "staged mountpoints"
  add open_pid_fd, open_lxc_pid, open_ppid helpers
  split open_namespace out of enter_namespace
  add get_container_namespace helper
  add mount stage directory helpers
  prestart-hook: use staged mountpoints on newer kernels
  config: vmconfig_apply_pending_mountpoint helper
  implement mountpoint hotplugging

 src/PVE/LXC.pm            | 183 ++++++++++++++++++++++++++++++++++++--
 src/PVE/LXC/Config.pm     |  87 ++++++++++++------
 src/lxc-pve-prestart-hook |  79 +++++++++++++---
 3 files changed, 304 insertions(+), 45 deletions(-)

-- 
2.20.1