On Fri, Sep 26, 2014 at 06:41:33PM -0700, Eric W. Biederman wrote:
> >> I am on the fence about what to do when a uid from the filesystem server
> >> or for other filesystems the on-disk data structures does not map, but
> >> make_bad_inode is simpler in conception.  So make_bad_inode seems like
> >> a good place to start.   For fuse especially this isn't hard because
> >> the make_bad_inode calls are already there to handle a change in i_mode.
> >
> > I agree that if we're unsure then make_bad_inode is a more logical place
> > to start, since it's easier to loosen the restrictions later than to
> > tighten them. I've got an initiail implementation that I'm in the
> > process of testing. If you want to take a look I've pushed it to:
> >
> >   git://kernel.ubuntu.com/sforshee/linux.git fuse-userns 
> 
> Thanks.  If I can scrape together enough focus I will look at it.
> 
> As a second best thing here are my prototype from when I was looking at
> performing this fuse conversion last.  Given that you had missed
> checking the from_kuid permission checks, it might be worth comparing
> and seeing if there is something else in these patches that would be
> worth including.

I think all of the from_kuid checks have been there for at least the
last two rounds of patches, but let me know if you see one I missed.

Looking at your patches, I see a lot that looks familiar. Seems like a
good sign :-)

I do see a few things I missed though. I've pointed those out below, and
in those cases I'm updating my patches. There are also some other
differences which I don't believe require any changes on my part, and
I've provided explanations for those. I also asked a few questions.

Miklos, are you satisfied with Eric's explanations about using a single
namespace and the permissions checks on uids? If so I should be able to
send updated patches early next week, otherwise let's continue trying to
resolve the points of disagreement.

> -static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
> +static int fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
>                         struct kstat *stat)
>  {
>       unsigned int blkbits;
> @@ -905,8 +905,12 @@ static void fuse_fillattr(struct inode *inode, struct 
> fuse_attr *attr,
>       stat->ino = attr->ino;
>       stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
>       stat->nlink = attr->nlink;
> -     stat->uid = make_kuid(&init_user_ns, attr->uid);
> -     stat->gid = make_kgid(&init_user_ns, attr->gid);
> +     stat->uid = make_kuid(fc->user_ns, attr->uid);
> +     if (!uid_valid(stat->uid))
> +             return -EOVERFLOW;
> +     stat->gid = make_kgid(fc->user_ns, attr->gid);
> +     if (!gid_valid(stat->gid))
> +             return -EOVERFLOW;

If we were to go with the invalid id approach these checks shouldn't be
necessary, and they'll get converted to the overflow ids for userspace.
In my make_bad_inode patches I'm doing this check when we update the
inode and returning -EINVAL if it doesn't map. Then I changed
fuse_fillattr to just copy them from the inode, since fuse always
updates the inode right before calling fuse_fillattr anyway and that
makes it clear why we don't need to check the results of the conversion.

>  static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
> @@ -973,7 +978,8 @@ static int fuse_do_getattr(struct inode *inode, struct 
> kstat *stat,
>                                              attr_timeout(&outarg),
>                                              attr_version);
>                       if (stat)
> -                             fuse_fillattr(inode, &outarg.attr, stat);
> +                             err = fuse_fillattr(inode, &outarg.attr,
> +                                                 stat);
>               }
>       }
>       return err;

So right before calling fuse_fillattr here fuse_change_attributes is
called to update the inode, so it appears your patches would have
allowed setting inode->i_iuid to INVALID_UID?

> @@ -624,6 +626,8 @@ void fuse_conn_put(struct fuse_conn *fc)
>       if (atomic_dec_and_test(&fc->count)) {
>               if (fc->destroy_req)
>                       fuse_request_free(fc->destroy_req);
> +             put_user_ns(fc->user_ns);
> +             fc->user_ns = NULL;
>               fc->release(fc);
>       }
>  }

I did this in fuse_free_con, but now looking again this is a bug for
cuse since it uses a different release method. I've updated my tree to
do it here instead.

> @@ -1033,6 +1037,7 @@ static int fuse_fill_super(struct super_block *sb, void 
> *data, int silent)
>       sb->s_maxbytes = MAX_LFS_FILESIZE;
>       sb->s_time_gran = 1;
>       sb->s_export_op = &fuse_export_operations;
> +     sb->s_user_ns = get_user_ns(current_user_ns());
>  
>       file = fget(d.fd);
>       err = -EINVAL;
> @@ -1149,6 +1154,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
>       }
>  
>       kill_anon_super(sb);
> +     put_user_ns(sb->s_user_ns);
>  }

What's the point of s_user_ns? It doesn't seem to be used anywhere.

> From 9bdaa744858d296f361a92c8940c33f878aec169 Mon Sep 17 00:00:00 2001
> From: "Eric W. Biederman" <ebied...@xmission.com>
> Date: Thu, 4 Oct 2012 13:42:34 -0700
> Subject: [PATCH 2/4] fuse: Teach fuse how to handle the pid namespace.

<snip>

> -static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
> +static int convert_fuse_file_lock(struct fuse_conn *fc,
> +                               const struct fuse_file_lock *ffl,
>                                 struct file_lock *fl)
>  {
>       switch (ffl->type) {
> @@ -2145,7 +2146,9 @@ static int convert_fuse_file_lock(const struct 
> fuse_file_lock *ffl,
>  
>               fl->fl_start = ffl->start;
>               fl->fl_end = ffl->end;
> -             fl->fl_pid = ffl->pid;
> +             rcu_read_lock();
> +             fl->fl_pid = pid_vnr(find_pid_ns(ffl->pid, fc->pid_ns));
> +             rcu_read_unlock();
>               break;
>  
>       default:
> @@ -2156,7 +2159,7 @@ static int convert_fuse_file_lock(const struct 
> fuse_file_lock *ffl,
>  }
>  
>  static void fuse_lk_fill(struct fuse_req *req, struct file *file,
> -                      const struct file_lock *fl, int opcode, pid_t pid,
> +                      const struct file_lock *fl, int opcode, struct pid 
> *pid,
>                        int flock)
>  {
>       struct inode *inode = file_inode(file);
> @@ -2169,7 +2172,7 @@ static void fuse_lk_fill(struct fuse_req *req, struct 
> file *file,
>       arg->lk.start = fl->fl_start;
>       arg->lk.end = fl->fl_end;
>       arg->lk.type = fl->fl_type;
> -     arg->lk.pid = pid;
> +     arg->lk.pid = pid_nr_ns(pid, fc->pid_ns);
>       if (flock)
>               arg->lk_flags |= FUSE_LK_FLOCK;
>       req->in.h.opcode = opcode;
> @@ -2191,7 +2194,7 @@ static int fuse_getlk(struct file *file, struct 
> file_lock *fl)
>       if (IS_ERR(req))
>               return PTR_ERR(req);
>  
> -     fuse_lk_fill(req, file, fl, FUSE_GETLK, 0, 0);
> +     fuse_lk_fill(req, file, fl, FUSE_GETLK, NULL, 0);
>       req->out.numargs = 1;
>       req->out.args[0].size = sizeof(outarg);
>       req->out.args[0].value = &outarg;
> @@ -2199,7 +2202,7 @@ static int fuse_getlk(struct file *file, struct 
> file_lock *fl)
>       err = req->out.h.error;
>       fuse_put_request(fc, req);
>       if (!err)
> -             err = convert_fuse_file_lock(&outarg.lk, fl);
> +             err = convert_fuse_file_lock(fc, &outarg.lk, fl);
>  
>       return err;
>  }
> @@ -2210,7 +2213,7 @@ static int fuse_setlk(struct file *file, struct 
> file_lock *fl, int flock)
>       struct fuse_conn *fc = get_fuse_conn(inode);
>       struct fuse_req *req;
>       int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
> -     pid_t pid = fl->fl_type != F_UNLCK ? current->tgid : 0;
> +     struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL;
>       int err;
>  
>       if (fl->fl_lmops && fl->fl_lmops->lm_grant) {

D'oh, I did totally miss this file locking stuff. Your changes look
good, so I'll pretty much take them verbatim.

> @@ -628,6 +630,8 @@ void fuse_conn_put(struct fuse_conn *fc)
>                       fuse_request_free(fc->destroy_req);
>               put_user_ns(fc->user_ns);
>               fc->user_ns = NULL;
> +             put_pid_ns(fc->pid_ns);
> +             fc->pid_ns = NULL;
>               fc->release(fc);
>       }
>  }

And I had a leak here for cuse as with the userns.

> From 55226d169826abd110d9bc60a6b079f6be3f6a46 Mon Sep 17 00:00:00 2001
> From: "Eric W. Biederman" <ebied...@xmission.com>
> Date: Fri, 5 Oct 2012 10:18:28 -0700
> Subject: [PATCH 3/4] userns: fuse unprivileged mount suport
> 
> Signed-off-by: "Eric W. Biederman" <ebied...@xmission.com>
> ---
>  fs/fuse/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 5284d7fda269..75f5326868e0 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1164,7 +1164,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
>  static struct file_system_type fuse_fs_type = {
>       .owner          = THIS_MODULE,
>       .name           = "fuse",
> -     .fs_flags       = FS_HAS_SUBTYPE,
> +     .fs_flags       = FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
>       .mount          = fuse_mount,
>       .kill_sb        = fuse_kill_sb_anon,
>  };

I have the order of my patches switched, then this one squashed in with
the one which adds userns support to fuse.

> From 6ae88ecfe4e8c8998478932ca225d1d9753b6c4b Mon Sep 17 00:00:00 2001
> From: "Eric W. Biederman" <ebied...@xmission.com>
> Date: Fri, 5 Oct 2012 14:33:36 -0700
> Subject: [PATCH 4/4] fuse: Only allow read/writing user xattrs
> 
> In the context of unprivileged mounts supporting anything except
> xattrs with the "user." prefix seems foolish.  Return -EOPNOSUPP
> for all other types of xattrs.
> 
> Cc: Miklos Szeredi <mik...@szeredi.hu>
> Signed-off-by: "Eric W. Biederman" <ebied...@xmission.com>
> ---
>  fs/fuse/dir.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index d74c75a057cd..d84f5b819fab 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -13,6 +13,7 @@
>  #include <linux/sched.h>
>  #include <linux/namei.h>
>  #include <linux/slab.h>
> +#include <linux/xattr.h>
>  
>  static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
>  {
> @@ -1868,6 +1869,9 @@ static int fuse_setxattr(struct dentry *entry, const 
> char *name,
>       if (fc->no_setxattr)
>               return -EOPNOTSUPP;
>  
> +     if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> +             return -EOPNOTSUPP;
> +
>       req = fuse_get_req_nopages(fc);
>       if (IS_ERR(req))
>               return PTR_ERR(req);
> @@ -1911,6 +1915,9 @@ static ssize_t fuse_getxattr(struct dentry *entry, 
> const char *name,
>       if (fc->no_getxattr)
>               return -EOPNOTSUPP;
>  
> +     if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> +             return -EOPNOTSUPP;
> +
>       req = fuse_get_req_nopages(fc);
>       if (IS_ERR(req))
>               return PTR_ERR(req);

This I hadn't considered, but it seems reasonable. Two questions though.

Why shouldn't we let root-owned mounts support namespaces other than
"user."?

Thinking ahead to (hopefully) allowing other filesystems to be mounted
from user namespaces, should this be enforced by the vfs instead?

Thanks,
Seth

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to