On 30.07.2020 17:34, Eric W. Biederman wrote:
> Kirill Tkhai <ktk...@virtuozzo.com> writes:
> 
>> Currently, there is no way to list or iterate all (or a subset) of the namespaces
>> in the system. Some namespaces are exposed in /proc/[pid]/ns/ directories,
>> but some may exist only as open files that are not attached to any process.
>> When an open namespace fd is sent over a unix socket and then closed, it is
>> impossible to know whether the namespace still exists.
>>
>> Also, even if a namespace is exposed as attached to a process or as an open
>> file, iterating over /proc/*/ns/* or /proc/*/fd/* is slow, because the cost
>> grows with the number of tasks and fds.
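For comparison, here is a minimal userspace sketch of the status-quo approach described above. It is an illustration only, not code from the patchset. It walks every /proc/[pid]/ns/ directory and deduplicates namespaces by the (st_dev, st_ino) of the nsfs inode each link resolves to. The work grows with the number of tasks multiplied by the number of namespace types, and a namespace held only by an open fd is never found. The function count_unique_namespaces(), the MAX_NS cap and the linear duplicate search are hypothetical simplifications.

```c
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAX_NS   4096 /* illustrative cap on distinct namespaces tracked */
#define PATH_BUF 512  /* illustrative path buffer size */

struct ns_key {
    dev_t dev; /* nsfs device */
    ino_t ino; /* namespace inode number */
};

/* Record (dev, ino) unless already present; return 1 if it was a duplicate. */
static int seen_before(struct ns_key *seen, size_t *n, dev_t dev, ino_t ino)
{
    for (size_t i = 0; i < *n; i++)
        if (seen[i].dev == dev && seen[i].ino == ino)
            return 1;
    if (*n < MAX_NS) {
        seen[*n].dev = dev;
        seen[*n].ino = ino;
        (*n)++;
    }
    return 0;
}

/*
 * Count distinct namespaces reachable through the ns/ directory of every
 * numeric [pid] entry under proc_root. Work is O(tasks * ns types), plus a
 * linear duplicate check per link. Namespaces pinned only by an open fd are
 * never found. Returns -1 if proc_root cannot be opened.
 */
long count_unique_namespaces(const char *proc_root)
{
    static struct ns_key seen[MAX_NS];
    size_t n = 0;
    DIR *proc = opendir(proc_root);
    if (!proc)
        return -1;

    struct dirent *pe;
    while ((pe = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)pe->d_name[0]))
            continue;                 /* skip non-pid entries */
        char nsdir[PATH_BUF];
        snprintf(nsdir, sizeof(nsdir), "%s/%s/ns", proc_root, pe->d_name);
        DIR *ns = opendir(nsdir);
        if (!ns)
            continue;                 /* task exited or access denied */
        struct dirent *ne;
        while ((ne = readdir(ns)) != NULL) {
            if (ne->d_name[0] == '.')
                continue;
            char path[PATH_BUF];
            struct stat st;
            snprintf(path, sizeof(path), "%s/%s", nsdir, ne->d_name);
            if (stat(path, &st) == 0) /* follows the link to the nsfs inode */
                seen_before(seen, &n, st.st_dev, st.st_ino);
        }
        closedir(ns);
    }
    closedir(proc);
    return (long)n;
}
```

Run from an unprivileged process, count_unique_namespaces("/proc") only counts tasks whose ns/ links the caller is allowed to stat (ptrace read access is required). Other tasks are silently skipped.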
> 
> I am very dubious about this.
> 
> I have been avoiding exactly this kind of interface because it can
> create rather fundamental problems with checkpoint restart.

restart/restore :)

> You do have some filtering and the filtering is not based on current.
> Which is good.
> 
> A view that is relative to a user namespace might be ok.  It almost
> certainly does better as its own little filesystem than as an extension
> to proc though.
> 
> The big thing we want to ensure is that if you migrate you can restore
> everything.  I don't see how you will be able to restore these files
> after migration.  Anything like this without having a complete
> checkpoint/restore story is a non-starter.

There is no difference between files in the /proc/namespaces/ directory and
files in /proc/[pid]/ns/.

CRIU can restore open files from /proc/[pid]/ns, and the same will work for
/proc/namespaces/ files. As someone who worked extensively on pid_ns and user_ns
support in CRIU, I don't see any problem here.

If you have specific concerns, let's discuss them.

CC: Pavel Tikhomirov, CRIU maintainer, who knows everything about namespace C/R.
 
> Further by not going through the processes it looks like you are
> bypassing the existing permission checks.  Which has the potential
> to allow someone to use a namespace who would not be able to otherwise.

I agree, and as I wrote to Christian, the permissions should be stricter.
This just needs to be formalized. Let's discuss it.

> So I think this goes one step too far but I am willing to be persuaded
> otherwise.
> 
> Eric
> 
>> This patchset introduces a new /proc/namespaces/ directory, which exposes
>> the subset of permitted namespaces in a flat (linear) view:
>>
>> # ls /proc/namespaces/ -l
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'cgroup:[4026531835]' -> 'cgroup:[4026531835]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'ipc:[4026531839]' -> 'ipc:[4026531839]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026531840]' -> 'mnt:[4026531840]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026531861]' -> 'mnt:[4026531861]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532133]' -> 'mnt:[4026532133]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532134]' -> 'mnt:[4026532134]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532135]' -> 'mnt:[4026532135]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532136]' -> 'mnt:[4026532136]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'net:[4026531993]' -> 'net:[4026531993]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'pid:[4026531836]' -> 'pid:[4026531836]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'time:[4026531834]' -> 'time:[4026531834]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'user:[4026531837]' -> 'user:[4026531837]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'uts:[4026531838]' -> 'uts:[4026531838]'
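Each entry name has the same "type:[inode]" form that readlink() returns for a /proc/[pid]/ns/ link, so a tool could parse both sources the same way. A minimal sketch, assuming names always follow that form; parse_ns_name() and struct ns_entry are hypothetical helpers, not part of the patchset.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parsed form of an entry such as "mnt:[4026531840]". */
struct ns_entry {
    char type[16];       /* "mnt", "net", "user", ... */
    unsigned long inum;  /* namespace inode number */
};

/* Return 0 on success, -1 if name is not exactly "type:[digits]". */
int parse_ns_name(const char *name, struct ns_entry *out)
{
    const char *colon = strchr(name, ':');

    /* type must be non-empty and fit (with NUL) into out->type */
    if (!colon || colon == name ||
        (size_t)(colon - name) >= sizeof(out->type))
        return -1;
    /* expect '[' followed by at least one digit */
    if (colon[1] != '[' || !isdigit((unsigned char)colon[2]))
        return -1;

    char *end;
    errno = 0;
    unsigned long inum = strtoul(colon + 2, &end, 10);
    /* reject overflow, a missing ']' and any trailing characters */
    if (errno == ERANGE || end[0] != ']' || end[1] != '\0')
        return -1;

    memcpy(out->type, name, (size_t)(colon - name));
    out->type[colon - name] = '\0';
    out->inum = inum;
    return 0;
}
```

readlink() does not NUL-terminate its buffer, so a caller parsing /proc/[pid]/ns/ links has to add the terminator before calling parse_ns_name().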
>>
>> A namespace ns is exposed if its user_ns is permitted from the pid_ns of
>> this /proc. That is, /proc is tied to a pid_ns, so /proc/namespaces shows
>> only those namespaces ns for which
>>
>>      in_userns(pid_ns->user_ns, ns->user_ns)
>>
>> holds, i.e. whose owning user_ns is pid_ns->user_ns or one of its
>> descendants. If ns is itself a user_ns, the condition is:
>>
>>      in_userns(pid_ns->user_ns, ns).
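The visibility rule can be modelled in userspace. The kernel's in_userns(ancestor, child) walks child up its parent chain while it is deeper than ancestor, then checks whether the walk ended on ancestor. The sketch below mirrors that walk on a stripped-down struct; struct uns, in_userns_model() and ns_visible() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct user_namespace: only the hierarchy. */
struct uns {
    const struct uns *parent; /* NULL for the initial user_ns */
    int level;                /* 0 for the initial user_ns, parent->level + 1 otherwise */
};

/* True if child == ancestor or child is nested somewhere below it. */
static bool in_userns_model(const struct uns *ancestor, const struct uns *child)
{
    const struct uns *ns = child;

    /* Levels are assumed consistent, so parent is never NULL here. */
    while (ns->level > ancestor->level)
        ns = ns->parent;
    return ns == ancestor;
}

/*
 * Would a namespace appear in this /proc/namespaces?
 * proc_owner: user_ns owning the pid_ns this /proc belongs to.
 * ns_owner:   ns->user_ns for a non-user namespace,
 *             or the user_ns itself when ns is a user_ns.
 */
bool ns_visible(const struct uns *proc_owner, const struct uns *ns_owner)
{
    return in_userns_model(proc_owner, ns_owner);
}
```

With this rule, a /proc whose pid_ns is owned by a container's user_ns A lists namespaces owned by A or its descendants, but neither the host's namespaces nor those of a sibling container.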
>>
>> The patchset consists of the following steps:
>>
>> 1) A generic counter in ns_common is introduced instead of separate
>>    counters for every ns type (net::count, uts_namespace::kref,
>>    user_namespace::count, etc.). Patches [1-8];
>> 2) Patch [9] introduces an IDR to link and iterate alive namespaces;
>> 3) Patch [10] is refactoring;
>> 4) Patch [11] actually adds the /proc/namespaces directory and fs methods;
>> 5) Patches [12-23] make every namespace use the added methods
>>    and appear in the /proc/namespaces directory.
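Steps 1 and 2 can be pictured with a small userspace model: one refcount embedded in a common header shared by every namespace type, plus a registry of live namespaces that can be walked without going through tasks or fds. This is a sketch under simplifying assumptions: C11 atomics stand in for the kernel's refcount_t, a fixed array stands in for the IDR, and there is no locking or RCU (the real series needs RCU so that a lookup racing with the final put stays safe, which is what the "Free ... one RCU grace period after final counter put" patches point at). struct ns_common_model, ns_get(), ns_put(), ns_register(), ns_for_each() and struct uts_ns_model are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_LIVE 64 /* illustrative registry size */

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Common header embedded in every namespace type. */
struct ns_common_model {
    atomic_int count;                            /* one refcount for all ns types */
    unsigned int inum;                           /* namespace inode number */
    const char *type;                            /* "net", "uts", ... */
    void (*release)(struct ns_common_model *ns); /* type-specific destructor */
};

/* Registry of live namespaces (stand-in for the IDR). */
static struct ns_common_model *live[MAX_LIVE];

/* Link a new namespace; returns its slot or -1 if the registry is full. */
int ns_register(struct ns_common_model *ns)
{
    for (int i = 0; i < MAX_LIVE; i++) {
        if (!live[i]) {
            live[i] = ns;
            return i;
        }
    }
    return -1;
}

static void ns_unregister(struct ns_common_model *ns)
{
    for (int i = 0; i < MAX_LIVE; i++)
        if (live[i] == ns)
            live[i] = NULL;
}

void ns_get(struct ns_common_model *ns)
{
    atomic_fetch_add(&ns->count, 1);
}

/* Drop a reference; the final put unlinks and releases the namespace. */
void ns_put(struct ns_common_model *ns)
{
    if (atomic_fetch_sub(&ns->count, 1) == 1) {
        ns_unregister(ns);
        if (ns->release)
            ns->release(ns);
    }
}

/* Visit every live namespace, independent of tasks and fds. */
int ns_for_each(void (*fn)(const struct ns_common_model *ns))
{
    int n = 0;
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live[i]) {
            fn(live[i]);
            n++;
        }
    }
    return n;
}

/* A concrete type embedding the common header. */
struct uts_ns_model {
    struct ns_common_model ns;
    char nodename[65];
};

static int uts_released; /* counts releases, for demonstration */

static void uts_release(struct ns_common_model *ns)
{
    /* Recover the concrete type from the embedded header. */
    struct uts_ns_model *uts = container_of(ns, struct uts_ns_model, ns);
    printf("releasing %s:[%u] (nodename %s)\n", ns->type, ns->inum, uts->nodename);
    uts_released++;
}
```

Keeping the counter in the common header is what lets generic code, such as a directory iterator, take and drop references without knowing the concrete namespace type; container_of() recovers the concrete type only where it is needed.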
>>
>> This may be useful for writing efficient debug utilities (e.g., quickly
>> building the network topology) and checkpoint/restore software.
>> ---
>>
>> Kirill Tkhai (23):
>>       ns: Add common refcount into ns_common add use it as counter for net_ns
>>       uts: Use generic ns_common::count
>>       ipc: Use generic ns_common::count
>>       pid: Use generic ns_common::count
>>       user: Use generic ns_common::count
>>       mnt: Use generic ns_common::count
>>       cgroup: Use generic ns_common::count
>>       time: Use generic ns_common::count
>>       ns: Introduce ns_idr to be able to iterate all allocated namespaces in the system
>>       fs: Rename fs/proc/namespaces.c into fs/proc/task_namespaces.c
>>       fs: Add /proc/namespaces/ directory
>>       user: Free user_ns one RCU grace period after final counter put
>>       user: Add user namespaces into ns_idr
>>       net: Add net namespaces into ns_idr
>>       pid: Eextract child_reaper check from pidns_for_children_get()
>>       proc_ns_operations: Add can_get method
>>       pid: Add pid namespaces into ns_idr
>>       uts: Free uts namespace one RCU grace period after final counter put
>>       uts: Add uts namespaces into ns_idr
>>       ipc: Add ipc namespaces into ns_idr
>>       mnt: Add mount namespaces into ns_idr
>>       cgroup: Add cgroup namespaces into ns_idr
>>       time: Add time namespaces into ns_idr
>>
>>
>>  fs/mount.h                     |    4 
>>  fs/namespace.c                 |   14 +
>>  fs/nsfs.c                      |   78 ++++++++
>>  fs/proc/Makefile               |    1 
>>  fs/proc/internal.h             |   18 +-
>>  fs/proc/namespaces.c           |  382 +++++++++++++++++++++++++++-------------
>>  fs/proc/root.c                 |   17 ++
>>  fs/proc/task_namespaces.c      |  183 +++++++++++++++++++
>>  include/linux/cgroup.h         |    6 -
>>  include/linux/ipc_namespace.h  |    3 
>>  include/linux/ns_common.h      |   11 +
>>  include/linux/pid_namespace.h  |    4 
>>  include/linux/proc_fs.h        |    1 
>>  include/linux/proc_ns.h        |   12 +
>>  include/linux/time_namespace.h |   10 +
>>  include/linux/user_namespace.h |   10 +
>>  include/linux/utsname.h        |   10 +
>>  include/net/net_namespace.h    |   11 -
>>  init/version.c                 |    2 
>>  ipc/msgutil.c                  |    2 
>>  ipc/namespace.c                |   17 +-
>>  ipc/shm.c                      |    1 
>>  kernel/cgroup/cgroup.c         |    2 
>>  kernel/cgroup/namespace.c      |   25 ++-
>>  kernel/pid.c                   |    2 
>>  kernel/pid_namespace.c         |   46 +++--
>>  kernel/time/namespace.c        |   20 +-
>>  kernel/user.c                  |    2 
>>  kernel/user_namespace.c        |   23 ++
>>  kernel/utsname.c               |   23 ++
>>  net/core/net-sysfs.c           |    6 -
>>  net/core/net_namespace.c       |   18 +-
>>  net/ipv4/inet_timewait_sock.c  |    4 
>>  net/ipv4/tcp_metrics.c         |    2 
>>  34 files changed, 746 insertions(+), 224 deletions(-)
>>  create mode 100644 fs/proc/task_namespaces.c
>>
>> --
>> Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com>
