Re: Documenting the ioctl interfaces to discover relationships between namespaces
On Sun, Dec 11, 2016 at 12:54:56PM +0100, Michael Kerrisk (man-pages) wrote:
> [was: [PATCH 0/4 v3] Add an interface to discover relationships
> between namespaces]
>
> Hello Andrei
>
> See below for my attempt to document the following.

Hi Michael,

Eric already did my work:). I have read this documentation and it looks
good to me. I have nothing to add to Eric's comments.

Thanks,
Andrei

>
> On 6 September 2016 at 09:47, Andrei Vagin wrote:
> > From: Andrey Vagin
> >
> > Each namespace has an owning user namespace and now there is no way
> > to discover these relationships.
> >
> > Pid and user namespaces are hierarchical. There is no way to discover
> > parent-child relationships either.
> >
> > Why might we want to know relationships between namespaces?
> >
> > One use would be visualization, in order to understand the running
> > system. Another would be to answer the question: what capability does
> > process X have to perform operations on a resource governed by namespace
> > Y?
> >
> > One more use-case (which is usually called abnormal) is checkpoint/restart.
> > In CRIU we are going to dump and restore nested namespaces.
> >
> > There [1] was a discussion about which interface to choose to determine
> > relationships between namespaces.
> >
> > Eric suggested to add two ioctl-s [2]:
> >> Grumble, Grumble. I think this may actually be a case for creating ioctls
> >> for these two cases. Now that random nsfs file descriptors are bind
> >> mountable the original reason for using proc files is not as pressing.
> >>
> >> One ioctl for the user namespace that owns a file descriptor.
> >> One ioctl for the parent namespace of a namespace file descriptor.
> >
> > Here is an implementation of these ioctl-s.
> >
> > $ man man7/namespaces.7
> > ...
> > Since Linux 4.X, the following ioctl(2) calls are supported for
> > namespace file descriptors. The correct syntax is:
> >
> > fd = ioctl(ns_fd, ioctl_type);
> >
> > where ioctl_type is one of the following:
> >
> > NS_GET_USERNS
> >     Returns a file descriptor that refers to an owning user names‐
> >     pace.
> >
> > NS_GET_PARENT
> >     Returns a file descriptor that refers to a parent namespace.
> >     This ioctl(2) can be used for pid and user namespaces. For
> >     user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
> >     meaning.
> >
> > In addition to generic ioctl(2) errors, the following specific ones
> > can occur:
> >
> > EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
> >
> > EPERM  The requested namespace is outside of the current namespace
> >        scope.
> >
> > [1] https://lkml.org/lkml/2016/7/6/158
> > [2] https://lkml.org/lkml/2016/7/9/101
>
> The following is the text I propose to add to the namespaces(7) page.
> Could you please review and let me know of corrections and
> improvements.
>
> Thanks,
>
> Michael
>
>
>Introspecting namespace relationships
>Since Linux 4.9, two ioctl(2) operations are provided to allow
>introspection of namespace relationships (see user_namespaces(7)
>and pid_namespaces(7)). The form of the calls is:
>
>ioctl(fd, request);
>
>In each case, fd refers to a /proc/[pid]/ns/* file.
>
>NS_GET_USERNS
>       Returns a file descriptor that refers to the owning user
>       namespace for the namespace referred to by fd.
>
>NS_GET_PARENT
>       Returns a file descriptor that refers to the parent names‐
>       pace of the namespace referred to by fd. This operation is
>       valid only for hierarchical namespaces (i.e., PID and user
>       namespaces). For user namespaces, NS_GET_PARENT is synony‐
>       mous with NS_GET_USERNS.
>
>In each case, the returned file descriptor is opened with O_RDONLY
>and O_CLOEXEC (close-on-exec).
>
>By applying fstat(2) to the returned file descriptor, one obtains
>a stat structure whose st_ino (inode number) field identifies the
>owning/parent namespace. This inode number can be matched with
>the inode number of another /proc/[pid]/ns/{pid,user} file to
>determine whether that is the owning/parent namespace.
>
>Either of these ioctl(2) operations can fail with the following
>error:
>
>EPERM  The requested namespace is outside of the caller's names‐
>       pace scope. This error can occur if, for example, the own‐
>       ing user namespace is an ancestor of the caller's current
>       user namespace. It can also occur on attempts to obtain
>       the parent of the initial user or PID namespace.
>
>Additionally, the NS_GET_PARENT operation can fail with the fol‐
>lowing error:
>
>EINVAL fd refers to a nonhierarchical namespace.
>
>See the EXAMPLE section for an example of the use of these opera‐
Re: Documenting the ioctl interfaces to discover relationships between namespaces
On 12/15/2016 01:46 AM, Andrei Vagin wrote:
> On Sun, Dec 11, 2016 at 12:54:56PM +0100, Michael Kerrisk (man-pages) wrote:
>> [was: [PATCH 0/4 v3] Add an interface to discover relationships
>> between namespaces]
>>
>> Hello Andrei
>>
>> See below for my attempt to document the following.
>
> Hi Michael,
>
> Eric already did my work:). I have read this documentation and it looks
> good to me. I have nothing to add to Eric's comments.

Thanks, Andrei!

Cheers,

Michael

>>
>> On 6 September 2016 at 09:47, Andrei Vagin wrote:
>>> From: Andrey Vagin
>>>
>>> Each namespace has an owning user namespace and now there is no way
>>> to discover these relationships.
>>>
>>> Pid and user namespaces are hierarchical. There is no way to discover
>>> parent-child relationships either.
>>>
>>> Why might we want to know relationships between namespaces?
>>>
>>> One use would be visualization, in order to understand the running
>>> system. Another would be to answer the question: what capability does
>>> process X have to perform operations on a resource governed by namespace
>>> Y?
>>>
>>> One more use-case (which is usually called abnormal) is checkpoint/restart.
>>> In CRIU we are going to dump and restore nested namespaces.
>>>
>>> There [1] was a discussion about which interface to choose to determine
>>> relationships between namespaces.
>>>
>>> Eric suggested to add two ioctl-s [2]:
>>>> Grumble, Grumble. I think this may actually be a case for creating ioctls
>>>> for these two cases. Now that random nsfs file descriptors are bind
>>>> mountable the original reason for using proc files is not as pressing.
>>>>
>>>> One ioctl for the user namespace that owns a file descriptor.
>>>> One ioctl for the parent namespace of a namespace file descriptor.
>>>
>>> Here is an implementation of these ioctl-s.
>>>
>>> $ man man7/namespaces.7
>>> ...
>>> Since Linux 4.X, the following ioctl(2) calls are supported for
>>> namespace file descriptors. The correct syntax is:
>>>
>>> fd = ioctl(ns_fd, ioctl_type);
>>>
>>> where ioctl_type is one of the following:
>>>
>>> NS_GET_USERNS
>>>     Returns a file descriptor that refers to an owning user names‐
>>>     pace.
>>>
>>> NS_GET_PARENT
>>>     Returns a file descriptor that refers to a parent namespace.
>>>     This ioctl(2) can be used for pid and user namespaces. For
>>>     user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
>>>     meaning.
>>>
>>> In addition to generic ioctl(2) errors, the following specific ones
>>> can occur:
>>>
>>> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>>>
>>> EPERM  The requested namespace is outside of the current namespace
>>>        scope.
>>>
>>> [1] https://lkml.org/lkml/2016/7/6/158
>>> [2] https://lkml.org/lkml/2016/7/9/101
>>
>> The following is the text I propose to add to the namespaces(7) page.
>> Could you please review and let me know of corrections and
>> improvements.
>>
>> Thanks,
>>
>> Michael
>>
>>
>>Introspecting namespace relationships
>>Since Linux 4.9, two ioctl(2) operations are provided to allow
>>introspection of namespace relationships (see user_namespaces(7)
>>and pid_namespaces(7)). The form of the calls is:
>>
>>ioctl(fd, request);
>>
>>In each case, fd refers to a /proc/[pid]/ns/* file.
>>
>>NS_GET_USERNS
>>       Returns a file descriptor that refers to the owning user
>>       namespace for the namespace referred to by fd.
>>
>>NS_GET_PARENT
>>       Returns a file descriptor that refers to the parent names‐
>>       pace of the namespace referred to by fd. This operation is
>>       valid only for hierarchical namespaces (i.e., PID and user
>>       namespaces). For user namespaces, NS_GET_PARENT is synony‐
>>       mous with NS_GET_USERNS.
>>
>>In each case, the returned file descriptor is opened with O_RDONLY
>>and O_CLOEXEC (close-on-exec).
>>
>>By applying fstat(2) to the returned file descriptor, one obtains
>>a stat structure whose st_ino (inode number) field identifies the
>>owning/parent namespace. This inode number can be matched with
>>the inode number of another /proc/[pid]/ns/{pid,user} file to
>>determine whether that is the owning/parent namespace.
>>
>>Either of these ioctl(2) operations can fail with the following
>>error:
>>
>>EPERM  The requested namespace is outside of the caller's names‐
>>       pace scope. This error can occur if, for example, the own‐
>>       ing user namespace is an ancestor of the caller's current
>>       user namespace. It can also occur on attempts to obtain
>>       the parent of the initial user or PID namespace.
>>
>>Additionally, the NS_GET_PARENT operation can fail with the fol‐
>>lowing error:
>>
>>EINVAL fd r
Re: Documenting the ioctl interfaces to discover relationships between namespaces
On 12/12/2016 07:18 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" writes:
>
>> On 12/11/2016 11:30 PM, Eric W. Biederman wrote:
>>> "Michael Kerrisk (man-pages)" writes:
>>>
>>>> [was: [PATCH 0/4 v3] Add an interface to discover relationships
>>>> between namespaces]
>>>
>>> One small comment below.
>>>
>>>>
>>>>Introspecting namespace relationships
>>>>Since Linux 4.9, two ioctl(2) operations are provided to allow
>>>>introspection of namespace relationships (see user_namespaces(7)
>>>>and pid_namespaces(7)). The form of the calls is:
>>>>
>>>>ioctl(fd, request);
>>>>
>>>>In each case, fd refers to a /proc/[pid]/ns/* file.
>>>>
>>>>NS_GET_USERNS
>>>>       Returns a file descriptor that refers to the owning user
>>>>       namespace for the namespace referred to by fd.
>>>>
>>>>NS_GET_PARENT
>>>>       Returns a file descriptor that refers to the parent names‐
>>>>       pace of the namespace referred to by fd. This operation is
>>>>       valid only for hierarchical namespaces (i.e., PID and user
>>>>       namespaces). For user namespaces, NS_GET_PARENT is synony‐
>>>>       mous with NS_GET_USERNS.
>>>>
>>>>In each case, the returned file descriptor is opened with O_RDONLY
>>>>and O_CLOEXEC (close-on-exec).
>>>>
>>>>By applying fstat(2) to the returned file descriptor, one obtains
>>>>a stat structure whose st_ino (inode number) field identifies the
>>>>owning/parent namespace. This inode number can be matched with
>>>>the inode number of another /proc/[pid]/ns/{pid,user} file to
>>>>determine whether that is the owning/parent namespace.
>>>
>>> Like all fstat inode comparisons to be fully accurate you need to
>>> compare both the st_ino and st_dev. I reserve the right for st_dev to
>>> be significant when comparing namespaces. Otherwise I might have to
>>> create a namespace of namespaces someday and that is ugly.
>>>
>>>>Either of these ioctl(2) operations can fail with the following
>>>>error:
>>>>
>>>>EPERM  The requested namespace is outside of the caller's names‐
>>>>       pace scope. This error can occur if, for example, the own‐
>>>>       ing user namespace is an ancestor of the caller's current
>>>>       user namespace. It can also occur on attempts to obtain
>>>>       the parent of the initial user or PID namespace.
>>>>
>>>>Additionally, the NS_GET_PARENT operation can fail with the fol‐
>>>>lowing error:
>>>>
>>>>EINVAL fd refers to a nonhierarchical namespace.
>>>>
>>>>See the EXAMPLE section for an example of the use of these opera‐
>>>>tions.
>>
>> So, after playing with this a bit, I have a question.
>>
>> I gather that in order to, for example, elaborate the tree of user
>> namespaces on the system, one would use NS_GET_PARENT on each of
>> the /proc/*/ns/user files and match up the results. Right?
>>
>> What happens if one of the parent user namespaces contains no
>> processes? That is, the parent namespace exists by virtue of being
>> pinned because a proc/PID/ns/user file is open or bind mounted.
>> (Chrome seems to do this sort of dance with user namespaces, for
>> example.) How do we find the ancestor of *that* user namespace?
>
> What is returned from NS_GET_USERNS and NS_GET_PARENT is a file
> descriptor, that you can call NS_GET_PARENT on.

Thanks, Eric. While trying to solve the small task I set myself,
and probably confused by past discussions[1], I was overlooking
the obvious.

Cheers,

Michael

[1] https://lkml.org/lkml/2016/7/28/365

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Re: Documenting the ioctl interfaces to discover relationships between namespaces
"Michael Kerrisk (man-pages)" writes: > On 12/11/2016 11:30 PM, Eric W. Biederman wrote: >> "Michael Kerrisk (man-pages)" writes: >> >>> [was: [PATCH 0/4 v3] Add an interface to discover relationships >>> between namespaces] >> >> One small comment below. >> >>> >>>Introspecting namespace relationships >>>Since Linux 4.9, two ioctl(2) operations are provided to allow >>>introspection of namespace relationships (see user_namespaces(7) >>>and pid_namespaces(7)). The form of the calls is: >>> >>>ioctl(fd, request); >>> >>>In each case, fd refers to a /proc/[pid]/ns/* file. >>> >>>NS_GET_USERNS >>> Returns a file descriptor that refers to the owning user >>> namespace for the namespace referred to by fd. >>> >>>NS_GET_PARENT >>> Returns a file descriptor that refers to the parent names‐ >>> pace of the namespace referred to by fd. This operation is >>> valid only for hierarchical namespaces (i.e., PID and user >>> namespaces). For user namespaces, NS_GET_PARENT is synony‐ >>> mous with NS_GET_USERNS. >>> >>>In each case, the returned file descriptor is opened with O_RDONLY >>>and O_CLOEXEC (close-on-exec). >>> >>>By applying fstat(2) to the returned file descriptor, one obtains >>>a stat structure whose st_ino (inode number) field identifies the >>>owning/parent namespace. This inode number can be matched with >>>the inode number of another /proc/[pid]/ns/{pid,user} file to >>>determine whether that is the owning/parent namespace. >> >> Like all fstat inode comparisons to be fully accurate you need to >> compare both the st_ino and st_dev. I reserve the right for st_dev to >> be significant when comparing namespaces. Otherwise I might have to >> create a namespace of namespaces someday and that is ugly. >> >>>Either of these ioctl(2) operations can fail with the following >>>error: >>> >>>EPERM The requested namespace is outside of the caller's names‐ >>> pace scope. 
This error can occur if, for example, the own‐ >>> ing user namespace is an ancestor of the caller's current >>> user namespace. It can also occur on attempts to obtain >>> the parent of the initial user or PID namespace. >>> >>>Additionally, the NS_GET_PARENT operation can fail with the fol‐ >>>lowing error: >>> >>>EINVAL fd refers to a nonhierarchical namespace. >>> >>>See the EXAMPLE section for an example of the use of these opera‐ >>>tions. > > So, after playing with this a bit, I have a question. > > I gather that in order to, for example, elaborate the tree of user > namespaces on the system, one would use NS_GET_PARENT on each of > the /proc/*/ns/user files and match up the results. Right? > > What happens if one of the parent user namespaces contains no > processes? That is, the parent namespace exists by virtue of being > pinned because a proc/PID/ns/user file is open or bind mounted. > (Chrome seems to do this sort of dance with user namespaces, for > example.) How do we find the ancestor of *that* user namespace? What is returned from NS_GET_USERNS and NS_GET_PARENT is a file descriptor, that you can call NS_GET_PARENT on. Eric
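Eric's point above, that the descriptor returned by NS_GET_PARENT can itself be passed to NS_GET_PARENT, could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name count_visible_ancestors is hypothetical and appears nowhere in the thread or the man page, and treating EPERM as "reached the edge of the caller's namespace scope" is taken from the draft text quoted above. No process needs to live in the intermediate namespaces, because each step works from a file descriptor rather than from a /proc/[pid] entry.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nsfs.h>     /* NS_GET_PARENT (Linux 4.9 and later) */

/* Hypothetical helper (not from the thread): walks upward from the PID or
   user namespace at ns_path by calling NS_GET_PARENT on each descriptor
   that the previous call returned.  Returns the number of ancestors that
   could be reached, or -1 if open() fails or the ioctl fails with anything
   other than EPERM (for example EINVAL for a nonhierarchical namespace). */
int count_visible_ancestors(const char *ns_path)
{
    assert(ns_path != NULL);

    int fd = open(ns_path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    int depth = 0;
    for (;;) {
        int parent_fd = ioctl(fd, NS_GET_PARENT);
        int saved_errno = errno;    /* close() below may change errno */

        close(fd);
        if (parent_fd == -1) {
            errno = saved_errno;
            /* EPERM: the parent is outside the caller's scope, or fd
               already referred to the initial namespace. */
            return saved_errno == EPERM ? depth : -1;
        }
        depth++;
        fd = parent_fd;             /* climb one level and repeat */
    }
}
```

A caller might pass "/proc/self/ns/user" or the path of a bind-mounted namespace file. To build the full tree rather than a count, one would presumably fstat(2) each descriptor before closing it and record the (st_dev, st_ino) pair.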
Re: Documenting the ioctl interfaces to discover relationships between namespaces
On 12/11/2016 11:30 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" writes:
>
>> [was: [PATCH 0/4 v3] Add an interface to discover relationships
>> between namespaces]
>
> One small comment below.
>
>>
>>Introspecting namespace relationships
>>Since Linux 4.9, two ioctl(2) operations are provided to allow
>>introspection of namespace relationships (see user_namespaces(7)
>>and pid_namespaces(7)). The form of the calls is:
>>
>>ioctl(fd, request);
>>
>>In each case, fd refers to a /proc/[pid]/ns/* file.
>>
>>NS_GET_USERNS
>>       Returns a file descriptor that refers to the owning user
>>       namespace for the namespace referred to by fd.
>>
>>NS_GET_PARENT
>>       Returns a file descriptor that refers to the parent names‐
>>       pace of the namespace referred to by fd. This operation is
>>       valid only for hierarchical namespaces (i.e., PID and user
>>       namespaces). For user namespaces, NS_GET_PARENT is synony‐
>>       mous with NS_GET_USERNS.
>>
>>In each case, the returned file descriptor is opened with O_RDONLY
>>and O_CLOEXEC (close-on-exec).
>>
>>By applying fstat(2) to the returned file descriptor, one obtains
>>a stat structure whose st_ino (inode number) field identifies the
>>owning/parent namespace. This inode number can be matched with
>>the inode number of another /proc/[pid]/ns/{pid,user} file to
>>determine whether that is the owning/parent namespace.
>
> Like all fstat inode comparisons to be fully accurate you need to
> compare both the st_ino and st_dev. I reserve the right for st_dev to
> be significant when comparing namespaces. Otherwise I might have to
> create a namespace of namespaces someday and that is ugly.
>
>>Either of these ioctl(2) operations can fail with the following
>>error:
>>
>>EPERM  The requested namespace is outside of the caller's names‐
>>       pace scope. This error can occur if, for example, the own‐
>>       ing user namespace is an ancestor of the caller's current
>>       user namespace. It can also occur on attempts to obtain
>>       the parent of the initial user or PID namespace.
>>
>>Additionally, the NS_GET_PARENT operation can fail with the fol‐
>>lowing error:
>>
>>EINVAL fd refers to a nonhierarchical namespace.
>>
>>See the EXAMPLE section for an example of the use of these opera‐
>>tions.

So, after playing with this a bit, I have a question.

I gather that in order to, for example, elaborate the tree of user
namespaces on the system, one would use NS_GET_PARENT on each of
the /proc/*/ns/user files and match up the results. Right?

What happens if one of the parent user namespaces contains no
processes? That is, the parent namespace exists by virtue of being
pinned because a proc/PID/ns/user file is open or bind mounted.
(Chrome seems to do this sort of dance with user namespaces, for
example.) How do we find the ancestor of *that* user namespace?

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Re: Documenting the ioctl interfaces to discover relationships between namespaces
[Fixing Serge's address in my original CC]

On 12/11/2016 11:30 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" writes:
>
>> [was: [PATCH 0/4 v3] Add an interface to discover relationships
>> between namespaces]
>
> One small comment below.
>
>>
>>Introspecting namespace relationships
>>Since Linux 4.9, two ioctl(2) operations are provided to allow
>>introspection of namespace relationships (see user_namespaces(7)
>>and pid_namespaces(7)). The form of the calls is:
>>
>>ioctl(fd, request);
>>
>>In each case, fd refers to a /proc/[pid]/ns/* file.
>>
>>NS_GET_USERNS
>>       Returns a file descriptor that refers to the owning user
>>       namespace for the namespace referred to by fd.
>>
>>NS_GET_PARENT
>>       Returns a file descriptor that refers to the parent names‐
>>       pace of the namespace referred to by fd. This operation is
>>       valid only for hierarchical namespaces (i.e., PID and user
>>       namespaces). For user namespaces, NS_GET_PARENT is synony‐
>>       mous with NS_GET_USERNS.
>>
>>In each case, the returned file descriptor is opened with O_RDONLY
>>and O_CLOEXEC (close-on-exec).
>>
>>By applying fstat(2) to the returned file descriptor, one obtains
>>a stat structure whose st_ino (inode number) field identifies the
>>owning/parent namespace. This inode number can be matched with
>>the inode number of another /proc/[pid]/ns/{pid,user} file to
>>determine whether that is the owning/parent namespace.
>
> Like all fstat inode comparisons to be fully accurate you need to
> compare both the st_ino and st_dev. I reserve the right for st_dev to
> be significant when comparing namespaces. Otherwise I might have to
> create a namespace of namespaces someday and that is ugly.

Ah yes. Thanks for catching that. I've adjusted the text, and
the example program.

Cheers,

Michael

>>Either of these ioctl(2) operations can fail with the following
>>error:
>>
>>EPERM  The requested namespace is outside of the caller's names‐
>>       pace scope. This error can occur if, for example, the own‐
>>       ing user namespace is an ancestor of the caller's current
>>       user namespace. It can also occur on attempts to obtain
>>       the parent of the initial user or PID namespace.
>>
>>Additionally, the NS_GET_PARENT operation can fail with the fol‐
>>lowing error:
>>
>>EINVAL fd refers to a nonhierarchical namespace.
>>
>>See the EXAMPLE section for an example of the use of these opera‐
>>tions.
>>
>>[...]
>
> Eric
>

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Re: Documenting the ioctl interfaces to discover relationships between namespaces
"Michael Kerrisk (man-pages)" writes: > [was: [PATCH 0/4 v3] Add an interface to discover relationships > between namespaces] One small comment below. > >Introspecting namespace relationships >Since Linux 4.9, two ioctl(2) operations are provided to allow >introspection of namespace relationships (see user_namespaces(7) >and pid_namespaces(7)). The form of the calls is: > >ioctl(fd, request); > >In each case, fd refers to a /proc/[pid]/ns/* file. > >NS_GET_USERNS > Returns a file descriptor that refers to the owning user > namespace for the namespace referred to by fd. > >NS_GET_PARENT > Returns a file descriptor that refers to the parent names‐ > pace of the namespace referred to by fd. This operation is > valid only for hierarchical namespaces (i.e., PID and user > namespaces). For user namespaces, NS_GET_PARENT is synony‐ > mous with NS_GET_USERNS. > >In each case, the returned file descriptor is opened with O_RDONLY >and O_CLOEXEC (close-on-exec). > >By applying fstat(2) to the returned file descriptor, one obtains >a stat structure whose st_ino (inode number) field identifies the >owning/parent namespace. This inode number can be matched with >the inode number of another /proc/[pid]/ns/{pid,user} file to >determine whether that is the owning/parent namespace. Like all fstat inode comparisons to be fully accurate you need to compare both the st_ino and st_dev. I reserve the right for st_dev to be significant when comparing namespaces. Otherwise I might have to create a namespace of namespaces someday and that is ugly. >Either of these ioctl(2) operations can fail with the following >error: > >EPERM The requested namespace is outside of the caller's names‐ > pace scope. This error can occur if, for example, the own‐ > ing user namespace is an ancestor of the caller's current > user namespace. It can also occur on attempts to obtain > the parent of the initial user or PID namespace. 
> >Additionally, the NS_GET_PARENT operation can fail with the fol‐ >lowing error: > >EINVAL fd refers to a nonhierarchical namespace. > >See the EXAMPLE section for an example of the use of these opera‐ >tions. > >[...] Eric
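Eric's comment about comparing both st_dev and st_ino can be illustrated with a small C sketch. It is not the example program from the man page; the helper names same_namespace and owned_by are hypothetical, chosen only for illustration. The sketch assumes a kernel and headers that provide NS_GET_USERNS in <linux/nsfs.h> (Linux 4.9 or later, per the draft text).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/nsfs.h>     /* NS_GET_USERNS (Linux 4.9 and later) */

/* Two stat results identify the same namespace only when both the
   device ID and the inode number match. */
bool same_namespace(const struct stat *a, const struct stat *b)
{
    assert(a != NULL && b != NULL);
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/* Hypothetical helper (not from the thread): returns 1 if the namespace
   at ns_path is owned by the user namespace at userns_path, 0 if it is
   not, and -1 if any system call fails. */
int owned_by(const char *ns_path, const char *userns_path)
{
    struct stat owner_sb, candidate_sb;

    int ns_fd = open(ns_path, O_RDONLY | O_CLOEXEC);
    if (ns_fd == -1)
        return -1;

    /* On success the ioctl returns a new file descriptor that refers to
       the owning user namespace; on failure (for example EPERM) it
       returns -1. */
    int owner_fd = ioctl(ns_fd, NS_GET_USERNS);
    close(ns_fd);
    if (owner_fd == -1)
        return -1;

    int rc = fstat(owner_fd, &owner_sb);
    close(owner_fd);

    /* stat() follows the /proc/[pid]/ns/* link to the namespace itself. */
    if (rc == -1 || stat(userns_path, &candidate_sb) == -1)
        return -1;

    return same_namespace(&owner_sb, &candidate_sb) ? 1 : 0;
}
```

For instance, owned_by("/proc/self/ns/net", "/proc/self/ns/user") asks whether the caller's network namespace is owned by the caller's own user namespace. The result depends on the system, and EPERM is possible when the owner lies outside the caller's scope.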