On Sun, 2015-02-08 at 20:00 +0100, Oleg Nesterov wrote:
> On 02/05, Ian Kent wrote:
> >
> > +int umh_enter_ns(struct task_struct *tsk, struct cred *new)
> > +{
> > +	char path[NS_PATH_MAX];
> > +	struct vfsmount *mnt;
> > +	const char *name;
> > +	pid_t pid;
> > +	int err = 0;
> > +
> > +	pid = task_pid_nr(tsk);
> > +
> > +	/*
> > +	 * The user mode thread runner runs in the root init namespace
> > +	 * so it will see all system pids.
> > +	 */
> > +	mnt = task_active_pid_ns(current)->proc_mnt;
> > +
> > +	for (name = ns_names[0]; *name; name++) {
> > +		struct file *this;
> > +		int len;
> > +
> > +		len = snprintf(path,
> > +			       NS_PATH_MAX, NS_PATH_FMT,
> > +			       (unsigned long) pid, name);
> > +		if (len >= NS_PATH_MAX) {
> > +			err = -ENAMETOOLONG;
> > +			break;
> > +		}
> > +
> > +		this = file_open_root(mnt->mnt_root, mnt, path, O_RDONLY);
> > +		if (unlikely(IS_ERR(this))) {
> > +			err = PTR_ERR(this);
> > +			break;
> > +		}
> > +
> > +		err = setns_inode(file_inode(this), 0);
> > +		fput(this);
> > +		if (err)
> > +			break;
> > +	}
> > +
> > +	return err;
> > +}
>
> Yes, I need to actually read this series and setns paths, but at first glance
> there must be a simpler method to call ops->install's and
> switch_task_namespaces.
Yes, the namespaces implementation does seem a bit strange in this respect.
I mentioned that concern the first time I posted these.

But I'm still not that clear on the big picture of how namespaces are
meant to work.

It's not just access to ops->install() that's the problem. For each of
the individual namespaces we open a file handle to get access to
ops->install() for that namespace, install it, drop "all" the namespaces
and then replace them with the new set, which essentially has one
namespace changed.

>
> Sorry if this was already discussed before, but to me it looks a bit strange
> to abuse /proc/ files for this. And again, iiuc file_open_root() can fail if
> tsk has already exited (init can be multithreaded).

I'm not sure that the failure is a problem though, as long as it's
handled: if the init process of the container is gone (or will be gone
once we're done), so is the container and so is the caller.

The use of proc is largely because we can't use the caller's environment
to set up the process, as the caller could manipulate it to subvert the
system.

When not executing in a container, the thread runner runs under root
init so nothing needs to be done, but in a container we want to use the
init process of the container so that the container's namespaces are
used.

There is probably a better way to do it, suggestions welcome!

Ian
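This is a rough userspace analogue of the per-namespace /proc loop discussed above. It is not the kernel code from the patch. It assumes the glibc setns(2) wrapper and the standard /proc/<pid>/ns/<name> files. The names DEMO_NS_PATH_FMT, DEMO_NS_PATH_MAX, demo_ns_names, build_ns_path() and enter_ns_of() are hypothetical, and so is the choice and order of namespace names. It shows the same open-then-join step per namespace. The open() failing with ENOENT once the target task has exited corresponds to the file_open_root() failure case raised above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: standard userspace /proc layout, not the patch's NS_PATH_FMT. */
#define DEMO_NS_PATH_FMT "/proc/%lu/ns/%s"
#define DEMO_NS_PATH_MAX 64

/* Hypothetical set of namespaces to join, NULL-terminated. */
static const char *const demo_ns_names[] = {
	"ipc", "uts", "net", "pid", "mnt", NULL
};

/* Build "/proc/<pid>/ns/<name>" into buf; returns 0 or a negative errno. */
static int build_ns_path(char *buf, size_t size, pid_t pid, const char *name)
{
	int len;

	/* Illustrative guard: a namespace name should not contain a '/'. */
	if (strchr(name, '/') != NULL)
		return -EINVAL;

	len = snprintf(buf, size, DEMO_NS_PATH_FMT, (unsigned long)pid, name);
	if (len < 0)
		return -EIO;
	if ((size_t)len >= size)
		return -ENAMETOOLONG;	/* same truncation check as the quoted loop */
	return 0;
}

/* Join each namespace of pid in turn; returns 0 or a negative errno. */
static int enter_ns_of(pid_t pid)
{
	char path[DEMO_NS_PATH_MAX];
	const char *const *name;
	int err;

	/* Step through the array one name (string) at a time. */
	for (name = demo_ns_names; *name != NULL; name++) {
		int fd;

		err = build_ns_path(path, sizeof(path), pid, *name);
		if (err)
			return err;

		fd = open(path, O_RDONLY | O_CLOEXEC);
		if (fd < 0)
			return -errno;	/* e.g. ENOENT if pid has already exited */

		/* nstype 0: accept whichever namespace type fd refers to. */
		err = (setns(fd, 0) == 0) ? 0 : -errno;
		close(fd);
		if (err)
			return err;
	}
	return 0;
}
```

Calling enter_ns_of() needs CAP_SYS_ADMIN (and, for the mount namespace, CAP_SYS_CHROOT as well), so an unprivileged caller will get -EPERM from setns(2).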