Hi, Stéphane Graber Many thanks for your review!
> -----Original Message----- > From: Stéphane Graber [mailto:stgra...@ubuntu.com] > Sent: Friday, July 08, 2016 11:26 PM > To: Zhao Lei <zhao...@cn.fujitsu.com> > Cc: linux-kernel@vger.kernel.org; contain...@lists.linux-foundation.org; Eric > W. > Biederman <ebied...@xmission.com> > Subject: Re: [PATCH] [RFC] Limit dump_pipe program's permission to init for > container > > On Fri, Jul 08, 2016 at 07:08:10PM +0800, Zhao Lei wrote: > > Currently when we set core_pattern to a pipe, the pipe program is > > forked by kthread running with root's permission, and write dumpfile > > into host's filesystem. > > Same thing happened for container, the dumper and dumpfile are also > > in host(not in container). > > > > It have following program: > > 1: Not consistent with file_type core_pattern > > When we set core_pattern to a file, the container will write dump > > into container's filesystem instead of host. > > 2: Not safe for privileged container > > In a privileged container, user can destroy host system by following > > command: > > # # In a container > > # echo "|/bin/dd of=/boot/vmlinuz" >/proc/sys/kernel/core_pattern > > # make_dump > > > > This patch switch dumper program's permission to init task, it is to say, > > for container, dumper program have same permission with init task in > > container, which make dumper program put in container's filesystem, and > > write coredump into container's filesystem. > > The dumper's permission is also limited into subset of container's init > > process. > > > > See following discussion for detail: > > http://www.gossamer-threads.com/lists/linux/kernel/2455363 > > http://www.gossamer-threads.com/lists/linux/kernel/2397602 > > > > Todo: > > 1: Set different core_pattern to host and each container. > > 2: Keep compatibility with current design > > All above are done in previous version of this patch, > > need some restruct. > > > > Any suggestion is welcome for this patch. > > > Ok, so those two todo items are I think absolute musts before we can > consider any of this. > > Changing the default behavior that we have had ever since pipe in > core_pattern and namespaces have existed would cause serious problem for > existing users. > > > Speaking for myself, LXC does setup limitations on privileged containers > which prevent them from changing the core_pattern so that it can't be > abused for privileged containers. > The core_pattern on Ubuntu then sends all crashes to apport, as root, on > the host. Apport is then container aware and handles forwarding the > crash (through a unix socket) to the apport process inside the > container. > > Your suggested change would likely make this all fail as apport on the > host must be called as root to be able to access the container's > filesystem (through /proc/<PID>/root). > > > With namespacing in place, then we wouldn't need the apport relay trick > on the host, so that'd be fine. > Yes, keeping compatibility with existed container program is important, we can not break these tools. The compatibility problem is discussed in previous impliment of this patch before, from: http://www.gossamer-threads.com/lists/linux/kernel/2399347#2399347 to http://www.gossamer-threads.com/lists/linux/kernel/2400776#2400776 We get this way in discusstion: Use new behavior(dump into container) only if user re-set core_pattern in container. And for lxc, because core_pattern is set by host, the dump will keeping old behavior. It is already done in the previous patch, and it can be move to this patch, I'll start it. > And otherwise, making this behavior optional (different pattern or > something) would be fine too (not breaking existing users). > Yes, it is a good idea, it give us another selection for compatibility problem. Thanks Zhaolei > > > > Suggested-by: Eric W. Biederman <ebied...@xmission.com> > > Suggested-by: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com> > > > > Signed-off-by: Zhao Lei <zhao...@cn.fujitsu.com> > > --- > > fs/coredump.c | 84 > ++++++++++++++++++++++++++++++++++++++++++++++++- > > include/linux/binfmts.h | 1 + > > 2 files changed, 84 insertions(+), 1 deletion(-) > > > > diff --git a/fs/coredump.c b/fs/coredump.c > > index 281b768..92dc343 100644 > > --- a/fs/coredump.c > > +++ b/fs/coredump.c > > @@ -516,6 +516,8 @@ static int umh_pipe_setup(struct subprocess_info > *info, struct cred *new) > > { > > struct file *files[2]; > > struct coredump_params *cp = (struct coredump_params *)info->data; > > + struct task_struct *base_task; > > + > > int err = create_pipe_files(files, 0); > > if (err) > > return err; > > @@ -524,10 +526,78 @@ static int umh_pipe_setup(struct subprocess_info > *info, struct cred *new) > > > > err = replace_fd(0, files[0], 0); > > fput(files[0]); > > + if (err) > > + return err; > > + > > /* and disallow core files too */ > > current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1}; > > > > - return err; > > + base_task = cp->base_task; > > + if (base_task) { > > + const struct cred *base_cred; > > + > > + /* Set fs_root to base_task */ > > + spin_lock(&base_task->fs->lock); > > + set_fs_root(current->fs, &base_task->fs->root); > > + spin_unlock(&base_task->fs->lock); > > + > > + /* Set namespaces to base_task */ > > + switch_task_namespaces(current, base_task->nsproxy); > > + > > + /* Set cgroup to base_task */ > > + current->flags &= ~PF_NO_SETAFFINITY; > > + err = cgroup_attach_task_all(base_task, current); > > + if (err < 0) > > + return err; > > + > > + /* Set cred to base_task */ > > + base_cred = get_task_cred(base_task); > > + > > + new->uid = base_cred->uid; > > + new->gid = base_cred->gid; > > + new->suid = base_cred->suid; > > + new->sgid = base_cred->sgid; > > + new->euid = base_cred->euid; > > + new->egid = base_cred->egid; > > + new->fsuid = base_cred->fsuid; > > + new->fsgid = base_cred->fsgid; > > + > > + new->securebits = base_cred->securebits; > > + > > + new->cap_inheritable = base_cred->cap_inheritable; > > + new->cap_permitted = base_cred->cap_permitted; > > + new->cap_effective = base_cred->cap_effective; > > + new->cap_bset = base_cred->cap_bset; > > + new->cap_ambient = base_cred->cap_ambient; > > + > > + security_cred_free(new); > > +#ifdef CONFIG_SECURITY > > + new->security = NULL; > > +#endif > > + err = security_prepare_creds(new, base_cred, GFP_KERNEL); > > + if (err < 0) { > > + put_cred(base_cred); > > + return err; > > + } > > + > > + free_uid(new->user); > > + new->user = base_cred->user; > > + get_uid(new->user); > > + > > + put_user_ns(new->user_ns); > > + new->user_ns = base_cred->user_ns; > > + get_user_ns(new->user_ns); > > + > > + put_group_info(new->group_info); > > + new->group_info = base_cred->group_info; > > + get_group_info(new->group_info); > > + > > + put_cred(base_cred); > > + > > + validate_creds(new); > > + } > > + > > + return 0; > > } > > > > void do_coredump(const siginfo_t *siginfo) > > @@ -590,6 +660,7 @@ void do_coredump(const siginfo_t *siginfo) > > > > if (ispipe) { > > int dump_count; > > + struct task_struct *vinit_task; > > char **helper_argv; > > struct subprocess_info *sub_info; > > > > @@ -631,6 +702,12 @@ void do_coredump(const siginfo_t *siginfo) > > goto fail_dropcount; > > } > > > > + vinit_task = find_task_by_vpid(1); > > + if (!vinit_task) { > > + printk(KERN_WARNING "failed getting init task info, > > skipping > core dump\n"); > > + goto fail_dropcount; > > + } > > + > > helper_argv = argv_split(GFP_KERNEL, cn.corename, NULL); > > if (!helper_argv) { > > printk(KERN_WARNING "%s failed to allocate memory\n", > > @@ -638,6 +715,10 @@ void do_coredump(const siginfo_t *siginfo) > > goto fail_dropcount; > > } > > > > + get_task_struct(vinit_task); > > + > > + cprm.base_task = vinit_task; > > + > > retval = -ENOMEM; > > sub_info = call_usermodehelper_setup(helper_argv[0], > > helper_argv, NULL, GFP_KERNEL, > > @@ -646,6 +727,7 @@ void do_coredump(const siginfo_t *siginfo) > > retval = call_usermodehelper_exec(sub_info, > > UMH_WAIT_EXEC); > > > > + put_task_struct(vinit_task); > > argv_free(helper_argv); > > if (retval) { > > printk(KERN_INFO "Core dump to |%s pipe failed\n", > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > > index 314b3ca..0c9a72c 100644 > > --- a/include/linux/binfmts.h > > +++ b/include/linux/binfmts.h > > @@ -59,6 +59,7 @@ struct linux_binprm { > > > > /* Function parameter for binfmt->coredump */ > > struct coredump_params { > > + struct task_struct *base_task; > > const siginfo_t *siginfo; > > struct pt_regs *regs; > > struct file *file; > > -- > > 1.8.5.1 > > > > > > > > _______________________________________________ > > Containers mailing list > > contain...@lists.linux-foundation.org > > https://lists.linuxfoundation.org/mailman/listinfo/containers > > -- > Stéphane Graber > Ubuntu developer > http://www.ubuntu.com