Re: [nameidata 1/2] Don't pass NULL nameidata to vfs_create
On Monday 16 April 2007 18:21, Christoph Hellwig wrote: > On Mon, Apr 16, 2007 at 06:11:30PM +0200, Andreas Gruenbacher wrote: > > On Thursday 12 April 2007 12:06, Christoph Hellwig wrote: > > > Once again very strong NACK. Every conditional passing of vfsmounts get > > > my veto. As mentioned last time if you really want this send a patch > > > series first that passed the vfsmount consistantly. > > > > I don't consider it fair to NACK this patch just because some other part > > of the kernel uses vfs_create() in the wrong way -- those are really > > independent issues. The bug is not that hard to fix though; here is a > > proposed patch on top of the other AppArmor patches. > > Note that it's not just this particular hunk, all these kinds of conditional > passing are wrong. Yes, the vfs uses or passes through a NULL nameidata or vfsmount pointer and no intent information in several places: (1) vfs_create() is called with a NULL nameidata in two places, (2) lookup_one_len() and lookup_one_len_kern() pass a NULL nameidata to permission(), d_op->d_revalidate(), and i_op->lookup(), (3) file_permission() passes a NULL nameidata to permission(). (4) permission() is directly called with NULL nameidata in some places, and In general it is incorrect to use NULL nameidata or vfsmounts because then the vfsmount->mnt_flags cannot be checked, and no intent information is available. Intents are an optional optimization for remote filesystems; we can simply ignore them for local filesystems. There are some cases where we are passing NULL which are real bugs, but there are also other cases in which the operations are not mount related: for example, filesystems like pipefs have the MS_NOUSER flag set which indicates that they cannot be mounted at all. On filesystems like sysfs, files are created and removed whether or not that filesystem is mounted. The places where we should pass a proper nameidata / vfsmount but don't should be fixed. Those are long-standing issues in the vfs though, and at least some are not easy to correct. AppArmor is affected by some of the NULL vfsmount passing, and we found a few more problems when looking into this more closely, but most of that NULL passing doesn't affect AppArmor: * In some places, vfs functions are called without nameidata / vfsmount, followed by calling lsm hooks with a vfsmount. In those cases, the vfs functions cannot check the vfsmount flags, but the lsm hooks will still do the appropriate checks. * In the AppArmor security model, directory searching is always allowed, and the access check are done once the leaf dentry + vfsmount has been looked up. For example, granting write access to /var/tmp/foo in a profile is sufficient to allow that access; search access to /var and /var/tmp does not need to be specified. (The regular inode permission checks are of course still performed.) * For filesystems like nfsd, the concept of pathname based access control doesn't make sense because of properties of the protocol: there is no reliable mapping from an nfs file handle to a particular file; aliases to the same file cannot be distinguished. (The AppArmor techdoc [1] describes this in more detail.) Not allowing to confine nfsd by AppArmor doesn't hurt much: nfsd provides its own export access control mechanisms. Also, nfsd cannot really be confined in a secure way because of how it is implemented in the kernel. Gfs2 may have similar properties; I didn't actually look into the protocol. So I propose to separate the vfs NULL nameidata / vfsmount passing discussion from the AppArmor discussion. I have no problem working on the vfs fixes -- we could some nice cleanups there -- but that really is independent of AppArmor. We almost have the revised AppArmor patches ready. We'll include all the fixes that are truly necessary, and everything beyond that will be posted separately. > But anyway, creating fake nameidata structures is not really helpful. > If there is a nameidata passed people expect it to be complete, and > if you pass them to an LSM people will e.g. try to look into lookup > intents. Splitting up nameidata helps somewhat here. In cases where we don't have intent information available we can zero out the intent flags, and so in the worst case, we won't get the intent optimizations. Thanks, Andreas - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [nameidata 1/2] Don't pass NULL nameidata to vfs_create
On Monday 16 April 2007 18:45, Christoph Hellwig wrote: > You should provide intent information, yes - which your patch didn't :) Well, the information implicitly provided is "no intent": In do_create() in ipc/mqueue.c intents would be pretty pointless because the mqueue filesystem is local. In fs/nfsd/vfs.c, intents would make slightly more sense assuming that the underlying filesystem eported via nfsd is remote. That's an optimization, and not even a very important one. > (Which btw, I expect to cause quite a few problems for apparmor or other > lsms, but I guess so far no one has tried them on NFSv4) Pathname wise, NFSv4 should look like any other filesystem on the client side. On the Server side, the concept of pathnames doesn't really fly for nfs: if a directory contains more than one link to the same file, there is no way to tell those aliases from each other from the file descriptor. In addition, computing even the somewhat ambiguous pathnames that can be computed would require subtree checking. But trying to confine nfsd is pretty pointless anyway: the deamon is privileged and can do whatever it wants. It makes more sense to export the right directories with the right permissions in the first place. Thanks, Andreas - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [nameidata 1/2] Don't pass NULL nameidata to vfs_create
On Mon, Apr 16, 2007 at 06:40:41PM +0200, Andreas Gruenbacher wrote: > On Monday 16 April 2007 18:21, Christoph Hellwig wrote: > > But anyway, creating fake nameidata structures is not really helpful. > > If there is a nameidata passed people expect it to be complete, and > > if you pass them to an LSM people will e.g. try to look into lookup > > intents. > > I don't actually agree with that: when nfsd creates a file, it still is a > file > create no matter where it originates from, and so it does make sense to > provide the appropriate intent information too. Struct nameidata contains > other crap only needed during an actual lookup too --- that's a mess, and the You should provide intent information, yes - which your patch didn't :) And yes, I didn't like doing the ugly intent in nameidata hack, it's creating loads of problems for various parties, e.g. the stackable filesystem folks. Now the basic intent in nameidata mistaken has been made even worse by passing back a struct file in it conditionally and doing lots of work in ->lookup that shouldn't be there. (Which btw, I expect to cause quite a few problems for apparmor or other lsms, but I guess so far no one has tried them on NFSv4) - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [nameidata 1/2] Don't pass NULL nameidata to vfs_create
On Monday 16 April 2007 18:21, Christoph Hellwig wrote: > But anyway, creating fake nameidata structures is not really helpful. > If there is a nameidata passed people expect it to be complete, and > if you pass them to an LSM people will e.g. try to look into lookup > intents. I don't actually agree with that: when nfsd creates a file, it still is a file create no matter where it originates from, and so it does make sense to provide the appropriate intent information too. Struct nameidata contains other crap only needed during an actual lookup too --- that's a mess, and the nameidata2 patch shows one possibility how to deal with it. (It's the cleanest approach I could think of right now without cluttering the vfs code with insane amounts of dereferences.) Thanks, Andreas - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [nameidata 1/2] Don't pass NULL nameidata to vfs_create
On Mon, Apr 16, 2007 at 06:11:30PM +0200, Andreas Gruenbacher wrote: > +static inline int > +nfsd_do_create(struct inode *dir, struct dentry *child, struct vfsmount *mnt, > +int mode) > +{ > + struct nameidata nd = { > + .dentry = child, > + .mnt = mnt, > + }; > + > + return vfs_create(dir, child, mode, &nd); > +} > + Wouldn't it normally result in fewer instructions (on most architectures ... maybe not x86) to keep the same argument order as vfs_create? ie: static inline int nfsd_do_create(struct inode *dir, struct dentry *child, int mode, struct vfsmount *mnt) - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [nameidata 1/2] Don't pass NULL nameidata to vfs_create
On Mon, Apr 16, 2007 at 06:11:30PM +0200, Andreas Gruenbacher wrote: > On Thursday 12 April 2007 12:06, Christoph Hellwig wrote: > > Once again very strong NACK. Every conditional passing of vfsmounts get my > > veto. As mentioned last time if you really want this send a patch series > > first that passed the vfsmount consistantly. > > I don't consider it fair to NACK this patch just because some other part of > the kernel uses vfs_create() in the wrong way -- those are really independent > issues. The bug is not that hard to fix though; here is a proposed patch on > top of the other AppArmor patches. Note that it's not just this particular hunk, all these kinds of conditional passing are wrong. > > The offenders are nfsd and the mqueue filesystem. Instantiate a struct > nameidata there. > > The .dentry and .mnt fields can easily be filled in in nfsd. The mqueue > filesystem is somewhat special: files that mq_open creates are on an internal > mount and the mqueue filesystem is not generally visible (it may or may not > be mounted). When passing a regular vfsmount to vfs_create the files would > appear disconnected while they actually are kernel-internal objects. Use a > NULL vfsmount there to avoid that -- passing the kernel-internal vfsmount > there wouldn't help, anyway. But anyway, creating fake nameidata structures is not really helpful. If there is a nameidata passed people expect it to be complete, and if you pass them to an LSM people will e.g. try to look into lookup intents. But similar to the per-mountpoint r/o work I suspect you're simply operating on the wrong level, but then again this might not be fixable due to the braindead apparmor design. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[nameidata 1/2] Don't pass NULL nameidata to vfs_create
On Thursday 12 April 2007 12:06, Christoph Hellwig wrote: > Once again very strong NACK. Every conditional passing of vfsmounts get my > veto. As mentioned last time if you really want this send a patch series > first that passed the vfsmount consistantly. I don't consider it fair to NACK this patch just because some other part of the kernel uses vfs_create() in the wrong way -- those are really independent issues. The bug is not that hard to fix though; here is a proposed patch on top of the other AppArmor patches. The offenders are nfsd and the mqueue filesystem. Instantiate a struct nameidata there. The .dentry and .mnt fields can easily be filled in in nfsd. The mqueue filesystem is somewhat special: files that mq_open creates are on an internal mount and the mqueue filesystem is not generally visible (it may or may not be mounted). When passing a regular vfsmount to vfs_create the files would appear disconnected while they actually are kernel-internal objects. Use a NULL vfsmount there to avoid that -- passing the kernel-internal vfsmount there wouldn't help, anyway. Signed-off-by: Andreas Gruenbacher <[EMAIL PROTECTED]> --- fs/namei.c|7 --- fs/nfsd/vfs.c | 23 +++ ipc/mqueue.c |6 +- 3 files changed, 28 insertions(+), 8 deletions(-) --- a/fs/namei.c +++ b/fs/namei.c @@ -1503,7 +1503,7 @@ int vfs_create(struct inode *dir, struct return -EACCES; /* shouldn't it be ENOSYS? */ mode &= S_IALLUGO; mode |= S_IFREG; - error = security_inode_create(dir, dentry, nd ? nd->mnt : NULL, mode); + error = security_inode_create(dir, dentry, nd->mnt, mode); if (error) return error; DQUOT_INIT(dir); --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1108,6 +1108,18 @@ nfsd_commit(struct svc_rqst *rqstp, stru } #endif /* CONFIG_NFSD_V3 */ +static inline int +nfsd_do_create(struct inode *dir, struct dentry *child, struct vfsmount *mnt, + int mode) +{ + struct nameidata nd = { + .dentry = child, + .mnt = mnt, + }; + + return vfs_create(dir, child, mode, &nd); +} + /* * Create a file (regular, directory, device, fifo); UNIX sockets * not yet implemented. @@ -1192,7 +1204,8 @@ nfsd_create(struct svc_rqst *rqstp, stru err = 0; switch (type) { case S_IFREG: - host_err = vfs_create(dirp, dchild, iap->ia_mode, NULL); + host_err = nfsd_do_create(dirp, dchild, exp->ex_mnt, + iap->ia_mode); break; case S_IFDIR: host_err = vfs_mkdir(dirp, dchild, exp->ex_mnt, iap->ia_mode); @@ -1253,6 +1266,7 @@ nfsd_create_v3(struct svc_rqst *rqstp, s int *truncp, int *created) { struct dentry *dentry, *dchild = NULL; + struct svc_export *exp; struct inode*dirp; __be32 err; int host_err; @@ -1271,6 +1285,7 @@ nfsd_create_v3(struct svc_rqst *rqstp, s goto out; dentry = fhp->fh_dentry; + exp = fhp->fh_export; dirp = dentry->d_inode; /* Get all the sanity checks out of the way before @@ -1288,7 +1303,7 @@ nfsd_create_v3(struct svc_rqst *rqstp, s if (IS_ERR(dchild)) goto out_nfserr; - err = fh_compose(resfhp, fhp->fh_export, dchild, fhp); + err = fh_compose(resfhp, exp, dchild, fhp); if (err) goto out; @@ -1334,13 +1349,13 @@ nfsd_create_v3(struct svc_rqst *rqstp, s goto out; } - host_err = vfs_create(dirp, dchild, iap->ia_mode, NULL); + host_err = nfsd_do_create(dirp, dchild, exp->ex_mnt, iap->ia_mode); if (host_err < 0) goto out_nfserr; if (created) *created = 1; - if (EX_ISSYNC(fhp->fh_export)) { + if (EX_ISSYNC(exp)) { err = nfserrno(nfsd_sync_dir(dentry)); /* setattr will sync the child (or not) */ } --- a/ipc/mqueue.c +++ b/ipc/mqueue.c @@ -604,6 +604,10 @@ static int mq_attr_ok(struct mq_attr *at static struct file *do_create(struct dentry *dir, struct dentry *dentry, int oflag, mode_t mode, struct mq_attr __user *u_attr) { + struct nameidata nd = { + .dentry = dentry, + /* Not a regular create, so set .mnt to NULL. */ + }; struct mq_attr attr; int ret; @@ -619,7 +623,7 @@ static struct file *do_create(struct den } mode &= ~current->fs->umask; - ret = vfs_create(dir->d_inode, dentry, mode, NULL); + ret = vfs_create(dir->d_inode, dentry, mode, &nd); dentry->d_fsdata = NULL; if (ret) goto out; - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger