"Michael Kerrisk (man-pages)" <mtk.manpa...@gmail.com> writes:
> Hello Eric,
>
> A ping on my question below. Could you take a look please?
>
> Thanks,
>
> Michael
>
>>>>> The concern from our conversation at the container mini-summit was that
>>>>> there is a pathology if in your initial mount namespace all of the
>>>>> mounts are marked MS_SHARED like systemd does (and is almost necessary
>>>>> if you are going to use mount propagation), that if new_root itself
>>>>> is MS_SHARED then unmounting the old_root could propagate.
>>>>>
>>>>> So I believe the desired sequence is:
>>>>>
>>>>>>>> chdir(new_root);
>>>>> +++ mount("", ".", MS_SLAVE | MS_REC, NULL);
>>>>>>>> pivot_root(".", ".");
>>>>>>>> umount2(".", MNT_DETACH);
>>>>>
>>>>> The change to new_root could be either MS_SLAVE or MS_PRIVATE. So
>>>>> long as it is not MS_SHARED the mount won't propagate back to the
>>>>> parent mount namespace.
>>>>
>>>> Thanks. I made that change.
>>>
>>> For what it is worth. The sequence above without the change in mount
>>> attributes will fail if it is necessary to change the mount attributes
>>> as "." is both put_old as well as new_root.
>>>
>>> When I initially suggested the change I saw "." was new_root and forgot
>>> "." was also put_old. So I thought there was a silent danger without
>>> that sequence.
>>
>> So, now I am a little confused by the comments you added here. Do you
>> now mean that the
>>
>> mount("", ".", MS_SLAVE | MS_REC, NULL);
>>
>> call is not actually necessary?

Apologies for being slow getting back to you.

To my knowledge there are two cases where pivot_root is used.

- In the initial mount namespace from a ramdisk when mounting root.

  This is the original use case and somewhat historical as rootfs
  (aka an initial ramfs) may not be unmounted.

- When setting up a new mount namespace to jettison all of the mounts
  you don't need.

The sequence:

	chdir(new_root);
	pivot_root(".", ".");
	umount2(".", MNT_DETACH);

is perfect for both use cases (as nothing needs to be known about the
directory layout of the new root filesystem).

In the case when you are setting up a new mount namespace, propagating
changes in the mount layout to another mount namespace is fatal. But
that is not a concern for the pivot_root sequence above, because
pivot_root will fail deterministically if
'mount("", ".", MS_SLAVE | MS_REC, NULL)' is needed but not specified.

So I would document the above sequence of three system calls in the
man-page.

I would document that pivot_root will fail if propagation would occur.

I would document in pivot_root or under unshare(CLONE_NEWNS) that if
mount propagation is enabled (the default with systemd) you need to
call 'mount("", "/", MS_SLAVE | MS_REC, NULL);' or
'mount("", "/", MS_PRIVATE | MS_REC, NULL);' after creating a mount
namespace. Otherwise mounts will propagate backwards, which is usually
not what people want.

Creating a mount namespace in a user namespace automatically does
'mount("", "/", MS_SLAVE | MS_REC, NULL);' if the starting mount
namespace was not created in that user namespace. AKA creating a mount
namespace in a user namespace does the unshare for you.

Eric