The pve-common part of this patch set should be straightforward:
minor additions to ProcFSTools and Tools, as well as the new mount API
constants added to Syscall.pm.

The container part then makes use of the new mount API if the
currently running kernel supports it. The hope is that we can
simplify the code a bit once we can stop supporting kernels older
than 5.2.
For now, the series starts with the ability to stage a mount point, and
then moves on to changing the startup process to use this.
Previously, the startup went through the mount points in order and
mounted them directly at the target location. This is prone to symlink
attacks (especially when using nested shared bind mounts).
When staging a mount in a fixed directory first, we can pick it up
afterwards with the new `open_tree()` syscall, and move it in place with
the new `move_mount()` syscall, which can work relative to directory
file descriptors and has flags for whether or not the paths are allowed
to follow symlinks. (In the future this can be hardened even more by
using `openat2()` with the container's root directory as an "implicit
chroot" while looking up the target directory, and then issuing a
`move_mount()` right onto the resulting path file descriptor via
`MOVE_MOUNT_T_EMPTY_PATH`.)
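
To illustrate the stage-then-move idea, here is a minimal sketch in
Python (not the Perl code from the patches). The helper names
`kernel_has_new_mount_api`, `pick_up_staged_mount` and `move_into_place`
are hypothetical, the syscall numbers are assumed to be the x86_64-style
unified ones, and the version check is only one possible way to detect
support. The syscalls need root and a kernel that provides them.

```python
import ctypes
import os
import re

# Assumption: unified syscall numbers as on x86_64/arm64; some
# architectures (e.g. alpha) use different values.
SYS_OPEN_TREE = 428
SYS_MOVE_MOUNT = 429

AT_FDCWD = -100
OPEN_TREE_CLOEXEC = os.O_CLOEXEC
MOVE_MOUNT_F_EMPTY_PATH = 0x04  # source path is empty, use the fd itself
MOVE_MOUNT_T_SYMLINKS = 0x10    # follow target symlinks (left unset below)


def kernel_has_new_mount_api(release=None):
    """Hypothetical check: True if the kernel release is at least 5.2."""
    if release is None:
        release = os.uname().release
    m = re.match(r"(\d+)\.(\d+)", release)
    return m is not None and (int(m.group(1)), int(m.group(2))) >= (5, 2)


def _syscall(nr, *args):
    """Invoke a raw syscall through libc and raise OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ret = libc.syscall(ctypes.c_long(nr), *args)
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def pick_up_staged_mount(stage_path):
    """Return a file descriptor referring to the mount at stage_path."""
    return _syscall(SYS_OPEN_TREE, ctypes.c_int(AT_FDCWD),
                    ctypes.c_char_p(os.fsencode(stage_path)),
                    ctypes.c_uint(OPEN_TREE_CLOEXEC))


def move_into_place(mount_fd, rootdir_fd, rel_target):
    """Move the mount to rel_target, resolved relative to rootdir_fd.

    MOVE_MOUNT_T_SYMLINKS is not set, so a symlink as the final
    component of the target path is not followed.
    """
    _syscall(SYS_MOVE_MOUNT, ctypes.c_int(mount_fd), ctypes.c_char_p(b""),
             ctypes.c_int(rootdir_fd),
             ctypes.c_char_p(os.fsencode(rel_target)),
             ctypes.c_uint(MOVE_MOUNT_F_EMPTY_PATH))
```

Note that only the final path component is protected this way, which is
why the `openat2()` based lookup mentioned above would be a further
improvement.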

The main advantage of the new API, however, is that we can pick up the
mounts as file descriptors, then switch into the running container's
mount namespace and `move_mount()` the mount point in place, without
having to rely on an existing MS_SHARED mount point "hack". Hence the
final patch adds support for mount point hotplugging - but only hotplug,
not unplug, since unmounting has a lot of issues: open file
descriptors, unshared MS_PRIVATE mount namespaces referencing the mount
(as well as those namespaces held open as file descriptors), mounts
having been moved (if they were previously hotplugged at least), and so
on.
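
The hotplug flow described above could look roughly like the following
Python sketch (again not the Perl code from the patches). The names
`mount_ns_path` and `hotplug_mount` are hypothetical, the syscall number
is assumed to be the x86_64-style unified one, and `mount_fd` is assumed
to refer to a detached mount, e.g. one obtained from `open_tree()` with
`OPEN_TREE_CLONE`. It needs root to actually run.

```python
import ctypes
import os

SYS_MOVE_MOUNT = 429            # assumption: x86_64-style unified number
AT_FDCWD = -100
CLONE_NEWNS = 0x00020000        # selects the mount namespace in setns()
MOVE_MOUNT_F_EMPTY_PATH = 0x04  # source path is empty, use the fd itself


def mount_ns_path(pid):
    """Path of the mount namespace link of a process."""
    return f"/proc/{int(pid)}/ns/mnt"


def hotplug_mount(mount_fd, container_pid, target):
    """Move the mount behind mount_fd to target inside the container.

    The namespace switch happens in a forked child, so the calling
    process keeps its own mount namespace.
    """
    ns_fd = os.open(mount_ns_path(container_pid),
                    os.O_RDONLY | os.O_CLOEXEC)
    try:
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                libc = ctypes.CDLL(None, use_errno=True)
                libc.syscall.restype = ctypes.c_long
                # Enter the container's mount namespace first ...
                if libc.setns(ns_fd, CLONE_NEWNS) == 0:
                    # ... then attach the mount at the target path there.
                    ret = libc.syscall(
                        ctypes.c_long(SYS_MOVE_MOUNT),
                        ctypes.c_int(mount_fd), ctypes.c_char_p(b""),
                        ctypes.c_int(AT_FDCWD),
                        ctypes.c_char_p(os.fsencode(target)),
                        ctypes.c_uint(MOVE_MOUNT_F_EMPTY_PATH))
                    status = 0 if ret == 0 else 1
            finally:
                os._exit(status)
        _, wait_status = os.waitpid(pid, 0)
    finally:
        os.close(ns_fd)
    if not os.WIFEXITED(wait_status) or os.WEXITSTATUS(wait_status) != 0:
        raise OSError(f"hotplugging mount at {target} failed")
```

Because the mount travels as a file descriptor, no shared mount point
between host and container is needed for this to work.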

Wolfgang Bumiller (8):
  implement "staged mountpoints"
  add open_pid_fd, open_lxc_pid, open_ppid helpers
  split open_namespace out of enter_namespace
  add get_container_namespace helper
  add mount stage directory helpers
  prestart-hook: use staged mountpoints on newer kernels
  config: vmconfig_apply_pending_mountpoint helper
  implement mountpoint hotplugging

 src/PVE/LXC.pm            | 183 ++++++++++++++++++++++++++++++++++++--
 src/PVE/LXC/Config.pm     |  87 ++++++++++++------
 src/lxc-pve-prestart-hook |  79 +++++++++++++---
 3 files changed, 304 insertions(+), 45 deletions(-)

-- 
2.20.1

