Re: [AppArmor 00/44] AppArmor security module overview

2007-07-02 Thread Eric W. Biederman
Adrian Bunk [EMAIL PROTECTED] writes:

 On Tue, Jun 26, 2007 at 07:47:00PM -0700, Andrew Morton wrote:
 On Tue, 26 Jun 2007 19:24:03 -0700 John Johansen [EMAIL PROTECTED] wrote:
 
   
   so...  where do we stand with this?  Fundamental, irreconcilable
   differences over the use of pathname-based security?
   
  There certainly seems to be some differences of opinion over the use
  of pathname-based-security.
 
 I was refreshed to have not been cc'ed on a lkml thread for once.  I guess
 it couldn't last.
 
 Do you agree with the irreconcilable part?  I think I do.
 
 I suspect that we're at the stage of having to decide between
 
 a) set aside the technical issues and grudgingly merge this stuff as a
service to Suse and to their users (both of which entities are very
important to us) and leave it all as an object lesson in
how-not-to-develop-kernel-features.
 
Minimisation of the impact on the rest of the kernel is of course
very important here.
 
 versus
 
 b) leave it out and require that Suse wear the permanent cost and
quality impact of maintaining it out-of-tree.  It will still be an
object lesson in how-not-to-develop-kernel-features.
...

 versus

 c) if [1] AppArmor is considered to be something that wouldn't 
be merged if it wasn't already widely deployed by Suse: leave it out, 
work on an ideal solution [2], and let Suse wear the one-time cost
of migrating their users to the ideal solution

 One important point is that if AppArmor gets merged there will be much 
 more distribution support for it, and many people on !Suse will start 
 using it.

 I'm not claiming to understand the technical details, but from both 
 slightly reading over the previous discussions and the What are the 
 advantages of AppArmor over SELinux? section in the AppArmor FAQ [3] my 
 impression is that a main advantage of AppArmor are more user friendly 
 userspace tools. Therefore, if [1] AppArmor is considered technically 
 inferior to SELinux, it might still become more popular than SELinux 
 simply because it's easier to use - and although it's technically 
 inferior.


A couple of random thoughts to mix up this discussion.

From what I have been able to observer the LSM is roughly firewalls
rules for in box operations.  All it can do is increase the chances
you will get -EPERM.

Not being able to take path names into consideration is a current
deficiency of the LSM.

Being exported to modules is another deficiency of the current LSM
as it seems no serious user of the LSM can exist simply as a kernel
module.

Not being able to mix code from different projects is also a very
serious limitation of the LSM.  Currently I don't think we can
build a kernel that supports selinux and any other LSM at the same
time.  Which horribly limits what we can do with the LSM.

So it seems clear that if we are aiming at an ideal solution.  We
first need to enhance the LSM.  Then merge in the AppArmor
functionality.  Doing it all in one patch series looks to overwhelming
for a decent code review.

That said is anyone interested in making the LSM more like iptables
with a generic table based rules structure?  That way we could fix
the one true LSM problem and concentrate on simpler pieces that give
specific bits of interesting functionality.  Or at the very least
be able to compile in multiple different bits of functionality into
the kernel simultaneously.

I'm really not familiar with the security issues the LSM addresses
but I do know, it encourages huge incompatible mega solutions, and
it tends to break when I fix real security problems in the kernel.
So at this point I am convinced that the LSM is deficient, has very
limited usability, and seems to be a very fragile firewall structure
to me.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] unprivileged mounts update

2007-04-26 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

 On Apr 25 2007 11:21, Eric W. Biederman wrote:
 
  Why did we want to use fsuid, exactly?
 
 - Because ruid is completely the wrong thing we want mounts owned
   by whomever's permissions we are using to perform the mount.
 
 Think nfs. I access some nfs file as an unprivileged user. knfsd, by
 nature, would run as euid=0, uid=0, but it needs fsuid=jengelh for
 most permission logic to work as expected.

 I don't think knfsd will ever want to call mount(2).

 But yeah, I've been convinced, that using fsuid is the right thing to
 do.

Actually knfsd does call mount when it crosses a mount point on the nfs
server it generates an equivalent mount point in linux.  At least I think
that is the what it is doing.  It is very similar to our mount propagation
path.

However as a special case I don't think the permission checking is likely
to bite us there.  It is worth double checking once we have the other details
ironed out.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] unprivileged mounts update

2007-04-25 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

 From: Miklos Szeredi [EMAIL PROTECTED]
 
 - refine adding nosuid and nodev flags for unprivileged mounts:
 o add nosuid, only if mounter doesn't have CAP_SETUID capability
 o add nodev, only if mounter doesn't have CAP_MKNOD capability
 
 - allow unprivileged forced unmount, but only for FS_SAFE filesystems
 
 - allow mounting over special files, but not symlinks
 
 - for mounting and umounting check fsuid instead of ruid

 Andrew, please skip this patch, for now.

 Serge found a problem with the fsuid approach: setfsuid(nonzero) will
 remove filesystem related capabilities.  So even if root is trying to
 set the user=UID flag on a mount, access to the target (and in case
 of bind, the source) is checked with user privileges.

I do have a major problem with this patchset though.  We still have
the unnecessary concept of user mounts.  That seems only needed now
for the /proc/mounts output.

All mounts should have an owner.  Prior to the unprivileged mount work
root owns all mounts.

 Root should be able to set this flag on any mountpoint, _regardless_
 of permissions.

We don't need a flag, and thinking of it in the context of a flag
is clearly the wrong thing.  Yes if we have the proper capability we
should be able to explicitly specify  the owner of the mount

 It is possible to restore filesystem capabilities after setting fsuid,
 but the interfaces are rather horrible at all levels.  mount(8) can
 probably live with these, but I'm not sure that using fsuid over
 ruid has enough advantages to force this.

 Why did we want to use fsuid, exactly?

- Because ruid is completely the wrong thing we want mounts owned
  by whomever's permissions we are using to perform the mount.


There are two basic cases.
- Mounting a filesystem as who we are.
  This can use fsuid with no problems.  If we are suid to root to perform
  the mount by default we want root to own the mount so that is correct.

- Mounting a filesystem as another user.
  This is the tricky case rare case needed in setup.  If we aren't
  jumping through to many hoops to make it work when using fsuid it
  sounds like the right thing here as well.

  How hard is it to set fsuid to a different value?  I.e. What hoops
  does root have to jump through.

Further when using fsuid we don't need an extra flag to mount.

Plus things are a little more consistent with the rest of the
linux/unix interface.

Now I can see doing something like using a special flag and not using
fsuid for the one case where we explicitly want to mount a filesystem
as someone else.  However if only user space has to special case this
(as it does anyway) and we don't have to special case it in the
kernel.  So much the better. 

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] unprivileged mounts update

2007-04-25 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes:

 Quoting H. Peter Anvin ([EMAIL PROTECTED]):
 Miklos Szeredi wrote:
  
  Andrew, please skip this patch, for now.
  
  Serge found a problem with the fsuid approach: setfsuid(nonzero) will
  remove filesystem related capabilities.  So even if root is trying to
  set the user=UID flag on a mount, access to the target (and in case
  of bind, the source) is checked with user privileges.
  
  Root should be able to set this flag on any mountpoint, _regardless_
  of permissions.
  
 
 Right, if you're using fsuid != 0, you're not running as root 

 Sure, but what I'm not clear on is why, if I've done a
 prctl(PR_SET_KEEPCAPS, 1) before the setfsuid, I still lose the
 CAP_FS_MASK perms.  I see the special case handling in
 cap_task_post_setuid().  I'm sure there was a reason for it, but
 this is a piece of the capability implementation I don't understand
 right now.

So we drop CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH,
CAP_FOWNER, and CAP_FSETID

Since we are checking CAP_SETUID or CAP_SYS_ADMIN how is that
a problem?

Are there other permission checks that mount is doing that we
care about.


 (fsuid is
 the equivalent to euid for the filesystem.)

 If it were really the equivalent then I could keep my capabilities :)
 after changing it.

We drop all capabilities after we change the euid.

 I fail to see how ruid should have *any* impact on mount(2).  That seems
 to be a design flaw.

 May be, but just using fsuid at this point stops me from enabling user
 mounts under /share if /share is chmod 000 (which it is).

I'm dense today.  If we can't work out the details we can always use a flag.
But what is the problem with fsuid?

You are not trying to test this using a non-default security model are you?


Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] unprivileged mounts update

2007-04-25 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes:

 Quoting Eric W. Biederman ([EMAIL PROTECTED]):
 
 Are there other permission checks that mount is doing that we
 care about.

 Not mount itself, but in looking up /share/fa/root/home/fa,
 user fa doesn't have the rights to read /share, and by setting
 fsuid to fa and dropping CAP_DAC_READ_SEARCH the mount action fails.

Got it. 

I'm not certain this is actually a problem it may be a feature.
But it does fly in the face of the general principle of just
getting out of roots way so things can get done.

I think we can solve your basic problem by simply doing like:
chdir(/share); mount(.);  To simply avoid the permission problem.

The practical question is how much do we care.

 But the solution you outlined in your previous post would work around
 this perfectly.

If we are not using usual permissions which user do we use current-uid?
Or do we pass that user someplace?

  If it were really the equivalent then I could keep my capabilities :)
  after changing it.
 
 We drop all capabilities after we change the euid.

 Not if we've done prctl(PR_SET_KEEPCAPS, 1)

Ah cap_clear doesn't do the obvious thing.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 0/8] mount ownership and unprivileged mount syscall (v4)

2007-04-24 Thread Eric W. Biederman
Karel Zak [EMAIL PROTECTED] writes:

 On Fri, Apr 20, 2007 at 12:25:32PM +0200, Miklos Szeredi wrote:
 The following extra security measures are taken for unprivileged
 mounts:
 
  - usermounts are limited by a sysctl tunable
  - force nosuid,nodev mount options on the created mount

  The original userspace user= solution also implies the noexec
  option by default (you can override the default by exec option).
  
  It means the kernel based solution is not fully compatible ;-(

Why noexec?  Either it was a silly or arbitrary decision, or
our kernel design may be incomplete.

Now I can see not wanting to support executables if you are locking
down a system.  The classic don't execute a program from a CD just because
the CD was stuck in the drive problem.

So I can see how executing code from an untrusted source could prevent
exploitation of other problems, and we certainly don't want to do it
automatically.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/8] allow unprivileged umount

2007-04-22 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

 Does this mean, that containers will need this?  Or that you don't
 know yet?

The uid namespace is something we have to handle carefully and we
have not decided on the final design.

What is clear is that all permission checks will need to become
either (uid namspace, uid) tuple comparisons.  Or struct user
pointer comparisons.  To see if we are talking about the same
uid.

So the eventual uid namespace combined with the possibility
for rlimits if we use struct user *.  See to make using a struct
user a clear win.

 storing a user struct on each mount point seems sane, plus it allows
 per user mount rlimits which is major plus.  Especially since we
 seem to be doing accounting only for user mounts a per user rlimit
 seems good.

 I'm not against per-user rlimits for mounts, but I'd rather do this
 later...

Then let's add a non-discriminate limit.  Instead of a limit that
applies only to root.

 To get the user we should be user fs_uid as HPA suggested.

 fsuid is exclusively used for checking file permissions, which we
 don't do here anymore.  So while it could be argued, that mount() _is_
 a filesystem operation, it is really a different sort of filesystem
 operation than the rest.

 OTOH it wouldn't hurt to use fsuid instead of ruid...

Yes.  I may be confused but I'm pretty certain we want either
the fsuid or the euid to be the mount owner.  ruid just looks wrong.
The fsuid is a special case of the effective uid.  Which is who
we should perform operations as.  Unless I'm just confused.

 Finally I'm pretty certain the capability we should care about in
 this context is CAP_SETUID.  Instead of CAP_SYS_ADMIN.
 
 If we have CAP_SETUID we can become which ever user owns the mount,
 and the root user in a container needs this, so he can run login
 programs.  So changing the appropriate super user checks from
 CAP_SYS_ADMIN to CAP_SETUID I think is the right thing todo.

 That's a flawed logic.  If you want to mount as a specific user, and
 you have CAP_SETUID, then just use set*uid() and then mount().

Totally agreed for mount.

 Changing the capability check for mount() would break the userspace
 ABI.

Sorry I apparently wasn't clear.  CAP_SETUID should be the capability
check for umount.

Hopefully my other more detail replies helped with this.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/8] add user mounts to the kernel

2007-04-22 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

  The MNT_USER flag is not copied on any kind of mount cloning:
  namespace creation, binding or propagation.
 
 I half agree, and as an initial approximation this works.
 Ultimately we should be at the point that for mount propagation
 that we copy the owner of the from the owner of our parent mount
 at the propagation destination.

 Yes, that sounds the most sane.

 Ram, what do you think?

  +  if (mnt-mnt_flags  MNT_USER)
  +  seq_printf(m, ,user=%i, mnt-mnt_uid);
 How about making the test if (mnt-mnt_user != root_user)

 We don't want to treat root_user special.  That's what capabilities
 were invented for.

For the print statement?  What ever it is minor.

  Index: linux/include/linux/fs.h
  ===
  --- linux.orig/include/linux/fs.h  2007-04-20 11:55:02.0 +0200
  +++ linux/include/linux/fs.h   2007-04-20 11:55:05.0 +0200
  @@ -123,6 +123,7 @@ extern int dir_notify_enable;
   #define MS_SLAVE  (119) /* change to slave */
   #define MS_SHARED (120) /* change to shared */
  #define MS_RELATIME (121) /* Update atime relative to mtime/ctime. */
  +#define MS_SETUSER(122) /* set mnt_uid to current user */
 
 If we unconditionally use the fsuid I think we can get away without
 this flag.

 That coudl work if we wouldn't have to worry about breaking the user
 interface.  As it is, we cannot be sure, that existing callers of
 mount(2) don't have fsuid set to some random value.

If we can get away without an extra flag it would really be
preferable.

In the container case we have an interesting and very common
scenario struct user *our_user != root_user. our_user-uid == 0.

I.e. The root in the what is the container but not the root
of the entire system.

So I want to minimize the changes needed to existing programs.
Now if all we have to do is specify MS_SETUSER when root a
user with CAP_SETUID is setting up a mount as a user other
then himself then I don't much care.  If we have to call MS_SETUSER
as unprivileged users I will have to modify mount binaries to work
differently inside and outside of containers.

Further there is only one or two versions of mount in widespread
use on linux, and unless you do something special fsuid == euid.
So the chance of fsuid set to some random value is pretty low.
So yes I think we can be 99.9% certain that existing callers of
mount(2) don't have fsuid set to some random value just by
inspecting the code of mount(1).

   #define MNT_SHRINKABLE0x100
  +#define MNT_USER  0x200
 
 If we assign a user to all mount points and root gets to own the
 initial set of mounts then we don't need the internal MNT_USER
 flag.

 I think we do want to treat owned mounts special, rather than
 treating user=0 mounts special.

I don't think we should treat any mount special and all mounts
should be owned.

  +
  +  uid_t mnt_uid;  /* owner of the mount */
 
 Can we please make this a user struct.   That requires a bit of
 reference counting but it has uid namespace benefits as well
 as making it easy to implement per user mount rlimits.

 OK, can you ellaborate, what the uid namespace benifits are?

In the uid namespace the comparison is simpler as are the propagations
rules.  Basically if you use a struct user you will never need to
care about a uid namespace.  If you don't we will have to tear through
this code another time.

Plus like I was mentioning earlier.  If we do have a struct user there
implementing per user mount rlimits becomes trivial.


Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/8] account user mounts

2007-04-22 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

  From: Miklos Szeredi [EMAIL PROTECTED]
 
  Add sysctl variables for accounting and limiting the number of user
  mounts.
 
  The maximum number of user mounts is set to 1024 by default.  This
  won't in itself enable user mounts, setting a mount to be owned by a
  user is first needed
 
 Since each mount has a user can we just make this a per user rlimit?
 
 If we are going to implement a sysctl at this point I think it should
 be a global limit that doesn't care if who you are.  Even root can
 have recursive mounts that attempt to get out of control.

 Recursive bind mounts are done carefully enough, so they don't get out
 of control.

 Recursive mount propagations can get out of control.  But root can
 shoot itself in the foot any number of ways, and it's not for the
 kernel to police that.

Yes.  It is.

This is mostly about removing special cases.

We routinely have limits on resources that are global and apply
to root along with every one else.  Root can change them but
they still apply to root.  Things like the number of inodes
in the system or the total number of files.

Since it is perfectly possible to do a per user rlimit at this stage
in the design.  I contend that either:
- We implement a per user rlimit of mounts.
- We implement a global limit on mounts.

No other case makes sense.  The previous objections were at least in
part because the limit only applied to user mounts but the name of
the limit did not apply to user mounts.


 Also currently you are not checking the max_users.  It looks like
 you do this in a later patch but still it is a little strange to
 allow user own mounts and have accounting but to not check the
 limit at this state.

 Yeah, but at this stage user mounts are not yet allowed, so this is
 safe.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/8] allow unprivileged umount

2007-04-21 Thread Eric W. Biederman
Andrew Morton [EMAIL PROTECTED] writes:

 On Sat, 21 Apr 2007 10:09:42 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote:

   +static bool permit_umount(struct vfsmount *mnt, int flags)
   +{
  
   ...
  
   +return mnt-mnt_uid == current-uid;
   +}
  
  Yes, this seems very wrong.  I'd have thought that comparing user_struct*'s
  would get us a heck of a lot closer to being able to support aliasing of
  UIDs between different namespaces.
  
 
 OK, I'll fix this up.
 
 Actually an earlier version of this patch did use user_struct's but
 I'd changed it to uids, because it's simpler.

 OK..

  I didn't think about
 this being contrary to the id namespaces thing.

 Well I was madly assuming that when serarate UID namespaces are in use, UID
 42 in container A will have a different user_struct from UID 42 in
 container B.  I'd suggest that we provoke an opinion from Eric  co before
 you do work on this.

That is what I what I have been thinking as well, storing a user
struct on each mount point seems sane, plus it allows per user mount
rlimits which is major plus.  Especially since we seem to be doing
accounting only for user mounts a per user rlimit seems good.

To get the user we should be user fs_uid as HPA suggested.

Finally I'm pretty certain the capability we should care about in
this context is CAP_SETUID.  Instead of CAP_SYS_ADMIN.

If we have CAP_SETUID we can become which ever user owns the mount,
and the root user in a container needs this, so he can run login
programs.  So changing the appropriate super user checks from
CAP_SYS_ADMIN to CAP_SETUID I think is the right thing todo.

With the CAP_SETUID thing handled I'm not currently seeing any adverse
implications to using this in containers.

Ok.  Now that I have a reasonable approximation of the 10,000 foot
view now to see how the patches match up.

Eric


-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/8] add user mounts to the kernel

2007-04-21 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

 From: Miklos Szeredi [EMAIL PROTECTED]

 Add ownership information to mounts.

 A new mount flag, MS_SETUSER is used to make a mount owned by a user.
 If this flag is specified, then the owner will be set to the current
 real user id and the mount will be marked with the MNT_USER flag.  On
 remount don't preserve previous owner, and treat MS_SETUSER as for a
 new mount.  The MS_SETUSER flag is ignored on mount move.

 The MNT_USER flag is not copied on any kind of mount cloning:
 namespace creation, binding or propagation.

I half agree, and as an initial approximation this works.
Ultimately we should be at the point that for mount propagation
that we copy the owner of the from the owner of our parent mount
at the propagation destination.

Mount propagation semantics are a major pain.

 For bind mounts the
 cloned mount(s) are set to MNT_USER depending on the MS_SETUSER mount
 flag.  In all the other cases MNT_USER is always cleared.

 For MNT_USER mounts a user=UID option is added to /proc/PID/mounts.
 This is compatible with how mount ownership is stored in /etc/mtab.

Ok.  While I generally agree with the concept this can be simplified some
more.

Eric


 Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
 ---

 Index: linux/fs/namespace.c
 ===
 --- linux.orig/fs/namespace.c 2007-04-20 11:55:02.0 +0200
 +++ linux/fs/namespace.c  2007-04-20 11:55:05.0 +0200
 @@ -227,6 +227,13 @@ static struct vfsmount *skip_mnt_tree(st
   return p;
  }
  
 +static void set_mnt_user(struct vfsmount *mnt)
 +{
 + BUG_ON(mnt-mnt_flags  MNT_USER);
 + mnt-mnt_uid = current-uid;

This should be based on fsuid.  Unless I'm completely confused.

I think getting the mount user from current when we do mount
propagation could be a problem.  In particular I'm pretty
certain in the mount propagation case the mount should be owned
by the user who owns the destination mount that is above us.

 + mnt-mnt_flags |= MNT_USER;
 +}
 +
  static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
   int flag)
  {
 @@ -241,6 +248,11 @@ static struct vfsmount *clone_mnt(struct
   mnt-mnt_mountpoint = mnt-mnt_root;
   mnt-mnt_parent = mnt;
  
 + /* don't copy the MNT_USER flag */
 + mnt-mnt_flags = ~MNT_USER;
 + if (flag  CL_SETUSER)
 + set_mnt_user(mnt);
 +
   if (flag  CL_SLAVE) {
   list_add(mnt-mnt_slave, old-mnt_slave_list);
   mnt-mnt_master = old;
 @@ -403,6 +415,8 @@ static int show_vfsmnt(struct seq_file *
   if (mnt-mnt_flags  fs_infop-flag)
   seq_puts(m, fs_infop-str);
   }
 + if (mnt-mnt_flags  MNT_USER)
 + seq_printf(m, ,user=%i, mnt-mnt_uid);
How about making the test if (mnt-mnt_user != root_user)

   if (mnt-mnt_sb-s_op-show_options)
   err = mnt-mnt_sb-s_op-show_options(m, mnt);
   seq_puts(m,  0 0\n);
 @@ -920,8 +934,9 @@ static int do_change_type(struct nameida
  /*
   * do loopback mount.
   */
 -static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
 +static int do_loopback(struct nameidata *nd, char *old_name, int flags)
  {
 + int clone_flags;
   struct nameidata old_nd;
   struct vfsmount *mnt = NULL;
   int err = mount_is_safe(nd);
 @@ -941,11 +956,12 @@ static int do_loopback(struct nameidata 
   if (!check_mnt(nd-mnt) || !check_mnt(old_nd.mnt))
   goto out;
  
 + clone_flags = (flags  MS_SETUSER) ? CL_SETUSER : 0;
   err = -ENOMEM;
 - if (recurse)
 - mnt = copy_tree(old_nd.mnt, old_nd.dentry, 0);
 + if (flags  MS_REC)
 + mnt = copy_tree(old_nd.mnt, old_nd.dentry, clone_flags);
   else
 - mnt = clone_mnt(old_nd.mnt, old_nd.dentry, 0);
 + mnt = clone_mnt(old_nd.mnt, old_nd.dentry, clone_flags);
  
   if (!mnt)
   goto out;
 @@ -987,8 +1003,11 @@ static int do_remount(struct nameidata *
  
   down_write(sb-s_umount);
   err = do_remount_sb(sb, flags, data, 0);
 - if (!err)
 + if (!err) {
   nd-mnt-mnt_flags = mnt_flags;
 + if (flags  MS_SETUSER)
 + set_mnt_user(nd-mnt);
 + }
   up_write(sb-s_umount);
   if (!err)
   security_sb_post_remount(nd-mnt, flags, data);
 @@ -1093,10 +1112,13 @@ static int do_new_mount(struct nameidata
   if (!capable(CAP_SYS_ADMIN))
   return -EPERM;
  
 - mnt = do_kern_mount(type, flags, name, data);
 + mnt = do_kern_mount(type, flags  ~MS_SETUSER, name, data);
   if (IS_ERR(mnt))
   return PTR_ERR(mnt);
  
 + if (flags  MS_SETUSER)
 + set_mnt_user(mnt);
 +
   return do_add_mount(mnt, nd, mnt_flags, NULL);
  }
  
 @@ -1127,7 

Re: [patch 2/8] allow unprivileged umount

2007-04-21 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

 From: Miklos Szeredi [EMAIL PROTECTED]

 The owner doesn't need sysadmin capabilities to call umount().

 Similar behavior as umount(8) on mounts having user=UID option in
 /etc/mtab.  The difference is that umount also checks /etc/fstab,
 presumably to exclude another mount on the same mountpoint.

 Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
 ---

 Index: linux/fs/namespace.c
 ===
 --- linux.orig/fs/namespace.c 2007-04-20 11:55:05.0 +0200
 +++ linux/fs/namespace.c  2007-04-20 11:55:06.0 +0200
 @@ -659,6 +659,25 @@ static int do_umount(struct vfsmount *mn
  }
  
  /*
 + * umount is permitted for
 + *  - sysadmin
 + *  - mount owner, if not forced umount
 + */
 +static bool permit_umount(struct vfsmount *mnt, int flags)
 +{
 + if (capable(CAP_SYS_ADMIN))
 + return true;
 +
 + if (!(mnt-mnt_flags  MNT_USER))
 + return false;
 +
 + if (flags  MNT_FORCE)
 + return false;
 +
 + return mnt-mnt_uid == current-uid;
 +}

I think this should be:

static bool permit_umount(struct vfsmount *mnt, int flags)
{
if ((mnt-mnt_uid != current-fsuid)  !capable(CAP_SETUID))
return false;

if ((flags  MNT_FORCE)  !capable(CAP_SYS_ADMIN))
return false;

return true;
}

I.e. 
  MNT_USER gone.
  compare against fsuid.
  Only require setuid for unmounts unless force is specified.

  I suspect we can allow MNT_FORCE for non-privileged users 
  as well if we can trust the filesystem.

 +/*
   * Now umount can handle mount points as well as block devices.
   * This is important for filesystems which use unnamed block devices.
   *
 @@ -681,7 +700,7 @@ asmlinkage long sys_umount(char __user *
   goto dput_and_out;
  
   retval = -EPERM;
 - if (!capable(CAP_SYS_ADMIN))
 + if (!permit_umount(nd.mnt, flags))
   goto dput_and_out;
  
   retval = do_umount(nd.mnt, flags);

 --
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/8] account user mounts

2007-04-21 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

 From: Miklos Szeredi [EMAIL PROTECTED]

 Add sysctl variables for accounting and limiting the number of user
 mounts.

 The maximum number of user mounts is set to 1024 by default.  This
 won't in itself enable user mounts, setting a mount to be owned by a
 user is first needed

Since each mount has a user can we just make this a per user rlimit?

If we are going to implement a sysctl at this point I think it should
be a global limit that doesn't care if who you are.  Even root can
have recursive mounts that attempt to get out of control.

Also currently you are not checking the max_users.  It looks like
you do this in a later patch but still it is a little strange to
allow user own mounts and have accounting but to not check the
limit at this state.

Eric


 Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
 ---

 Index: linux/include/linux/sysctl.h
 ===
 --- linux.orig/include/linux/sysctl.h 2007-04-20 11:55:02.0 +0200
 +++ linux/include/linux/sysctl.h  2007-04-20 11:55:07.0 +0200
 @@ -818,6 +818,8 @@ enum
   FS_AIO_NR=18,   /* current system-wide number of aio requests */
   FS_AIO_MAX_NR=19, /* system-wide maximum number of aio requests */
   FS_INOTIFY=20,  /* inotify submenu */
 + FS_NR_USER_MOUNTS=21,   /* int:current number of user mounts */
 + FS_MAX_USER_MOUNTS=22,  /* int:maximum number of user mounts */
   FS_OCFS2=988,   /* ocfs2 */
  };
  
 Index: linux/kernel/sysctl.c
 ===
 --- linux.orig/kernel/sysctl.c2007-04-20 11:55:02.0 +0200
 +++ linux/kernel/sysctl.c 2007-04-20 11:55:07.0 +0200
 @@ -1063,6 +1063,22 @@ static ctl_table fs_table[] = {
  #endif   
  #endif
   {
 + .ctl_name   = FS_NR_USER_MOUNTS,
 + .procname   = nr_user_mounts,
 + .data   = nr_user_mounts,
 + .maxlen = sizeof(int),
 + .mode   = 0444,
 + .proc_handler   = proc_dointvec,
 + },
 + {
 + .ctl_name   = FS_MAX_USER_MOUNTS,
 + .procname   = max_user_mounts,
 + .data   = max_user_mounts,
 + .maxlen = sizeof(int),
 + .mode   = 0644,
 + .proc_handler   = proc_dointvec,
 + },
 + {
   .ctl_name   = KERN_SETUID_DUMPABLE,
   .procname   = suid_dumpable,
   .data   = suid_dumpable,
 Index: linux/Documentation/filesystems/proc.txt
 ===
 --- linux.orig/Documentation/filesystems/proc.txt 2007-04-20 
 11:55:02.0
 +0200
 +++ linux/Documentation/filesystems/proc.txt 2007-04-20 11:55:07.0 
 +0200
 @@ -923,6 +923,15 @@ reaches aio-max-nr then io_setup will fa
  raising aio-max-nr does not result in the pre-allocation or re-sizing
  of any kernel data structures.
  
 +nr_user_mounts and max_user_mounts
 +--
 +
 +These represent the number of user mounts and the maximum number of
 +user mounts respectively.  User mounts may be created by
 +unprivileged users.  User mounts may also be created with sysadmin
 +privileges on behalf of a user, in which case nr_user_mounts may
 +exceed max_user_mounts.
 +
  2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
  ---
  
 Index: linux/fs/namespace.c
 ===
 --- linux.orig/fs/namespace.c 2007-04-20 11:55:06.0 +0200
 +++ linux/fs/namespace.c  2007-04-20 11:55:07.0 +0200
 @@ -39,6 +39,9 @@ static int hash_mask __read_mostly, hash
  static struct kmem_cache *mnt_cache __read_mostly;
  static struct rw_semaphore namespace_sem;
  
 +int nr_user_mounts;
 +int max_user_mounts = 1024;
 +
  /* /sys/fs */
  decl_subsys(fs, NULL, NULL);
  EXPORT_SYMBOL_GPL(fs_subsys);
 @@ -227,11 +230,30 @@ static struct vfsmount *skip_mnt_tree(st
   return p;
  }
  
 +static void dec_nr_user_mounts(void)
 +{
 + spin_lock(vfsmount_lock);
 + nr_user_mounts--;
 + spin_unlock(vfsmount_lock);
 +}
 +
  static void set_mnt_user(struct vfsmount *mnt)
  {
   BUG_ON(mnt-mnt_flags  MNT_USER);
   mnt-mnt_uid = current-uid;
   mnt-mnt_flags |= MNT_USER;
 + spin_lock(vfsmount_lock);
 + nr_user_mounts++;
 + spin_unlock(vfsmount_lock);
 +}
 +
 +static void clear_mnt_user(struct vfsmount *mnt)
 +{
 + if (mnt-mnt_flags  MNT_USER) {
 + mnt-mnt_uid = 0;
 + mnt-mnt_flags = ~MNT_USER;
 + dec_nr_user_mounts();
 + }
  }
  
  static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 @@ -283,6 +305,7 @@ static inline void __mntput(struct vfsmo
  {
   struct super_block *sb = 

Re: [patch 4/8] propagate error values from clone_mnt

2007-04-21 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

 From: Miklos Szeredi [EMAIL PROTECTED]

 Allow clone_mnt() to return errors other than ENOMEM.  This will be
 used for returning a different error value when the number of user
 mounts goes over the limit.

 Fix copy_tree() to return EPERM for unbindable mounts.

 Don't propagate further from dup_mnt_ns() as that copy_tree() can only
 fail with -ENOMEM.

At a quick skim this looks ok.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 5/8] allow unprivileged bind mounts

2007-04-21 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

 From: Miklos Szeredi [EMAIL PROTECTED]

 Allow bind mounts to unprivileged users if the following conditions
 are met:

   - mountpoint is not a symlink or special file

Why?  This sounds like a left over from when we were checking permissions.

   - parent mount is owned by the user
   - the number of user mounts is below the maximum

 Unprivileged mounts imply MS_SETUSER, and will also have the nosuid
 and nodev mount flags set.

So in principle I agree, but in detail I disagree.

capable(CAP_SETUID) should be required to leave MNT_NOSUID clear.
capable(CAP_MKNOD) should be required to leave MNT_NODEV clear.

I.e.  We should not special case this as a user mount but rather
simply check to see if the user performing the mount has the appropriate
capabilities to allow the flags.

How we properly propagate MNT_NOSUID and MNT_NODEV in the context of a
user id namespace is still a puzzle to me.  Because to the user capability
should theoretically at least be namespace local.  However until we
get to the user id namespace we don't have that problem.

Eric

 Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
 ---

 Index: linux/fs/namespace.c
 ===
 --- linux.orig/fs/namespace.c 2007-04-20 11:55:09.0 +0200
 +++ linux/fs/namespace.c  2007-04-20 11:55:10.0 +0200
 @@ -237,11 +237,30 @@ static void dec_nr_user_mounts(void)
   spin_unlock(vfsmount_lock);
  }
  
 -static void set_mnt_user(struct vfsmount *mnt)
 +static int reserve_user_mount(void)
 +{
 + int err = 0;
 + spin_lock(vfsmount_lock);
 + if (nr_user_mounts = max_user_mounts  !capable(CAP_SYS_ADMIN))
 + err = -EPERM;
 + else
 + nr_user_mounts++;
 + spin_unlock(vfsmount_lock);
 + return err;
 +}
 +
 +static void __set_mnt_user(struct vfsmount *mnt)
  {
   BUG_ON(mnt-mnt_flags  MNT_USER);
   mnt-mnt_uid = current-uid;
   mnt-mnt_flags |= MNT_USER;
 + if (!capable(CAP_SYS_ADMIN))
 + mnt-mnt_flags |= MNT_NOSUID | MNT_NODEV;
 +}
 +
 +static void set_mnt_user(struct vfsmount *mnt)
 +{
 + __set_mnt_user(mnt);
   spin_lock(vfsmount_lock);
   nr_user_mounts++;
   spin_unlock(vfsmount_lock);
 @@ -260,9 +279,16 @@ static struct vfsmount *clone_mnt(struct
   int flag)
  {
   struct super_block *sb = old-mnt_sb;
 - struct vfsmount *mnt = alloc_vfsmnt(old-mnt_devname);
 + struct vfsmount *mnt;
 +
 + if (flag  CL_SETUSER) {
 + int err = reserve_user_mount();
 + if (err)
 + return ERR_PTR(err);
 + }
 + mnt = alloc_vfsmnt(old-mnt_devname);
   if (!mnt)
 - return ERR_PTR(-ENOMEM);
 + goto alloc_failed;
  
   mnt-mnt_flags = old-mnt_flags;
   atomic_inc(sb-s_active);
 @@ -274,7 +300,7 @@ static struct vfsmount *clone_mnt(struct
   /* don't copy the MNT_USER flag */
   mnt-mnt_flags = ~MNT_USER;
   if (flag  CL_SETUSER)
 - set_mnt_user(mnt);
 + __set_mnt_user(mnt);
  
   if (flag  CL_SLAVE) {
   list_add(mnt-mnt_slave, old-mnt_slave_list);
 @@ -299,6 +325,11 @@ static struct vfsmount *clone_mnt(struct
   spin_unlock(vfsmount_lock);
   }
   return mnt;
 +
 + alloc_failed:
 + if (flag  CL_SETUSER)
 + dec_nr_user_mounts();
 + return ERR_PTR(-ENOMEM);
  }
  
  static inline void __mntput(struct vfsmount *mnt)
 @@ -745,22 +776,29 @@ asmlinkage long sys_oldumount(char __use
  
  #endif
  
 -static int mount_is_safe(struct nameidata *nd)
 +/*
 + * Conditions for unprivileged mounts are:
 + * - mountpoint is not a symlink or special file
 + * - mountpoint is in a mount owned by the user
 + */
 +static bool permit_mount(struct nameidata *nd, int *flags)
  {
 + struct inode *inode = nd-dentry-d_inode;
 +
   if (capable(CAP_SYS_ADMIN))
 - return 0;
 - return -EPERM;
 -#ifdef notyet
 - if (S_ISLNK(nd-dentry-d_inode-i_mode))
 - return -EPERM;
 - if (nd-dentry-d_inode-i_mode  S_ISVTX) {
 - if (current-uid != nd-dentry-d_inode-i_uid)
 - return -EPERM;
 - }
 - if (vfs_permission(nd, MAY_WRITE))
 - return -EPERM;
 - return 0;
 -#endif
 + return true;
 +
 + if (!S_ISDIR(inode-i_mode)  !S_ISREG(inode-i_mode))
 + return false;
 +
 + if (!(nd-mnt-mnt_flags  MNT_USER))
 + return false;
 +
 + if (nd-mnt-mnt_uid != current-uid)
 + return false;
 +
 + *flags |= MS_SETUSER;
 + return true;
  }

Can't this just be:
static bool permit_mount(struct nameidata *nd, uid_t *mnt_uid)
{
*mnt_uid = current-fsuid;

if ((nd-mnt-mnt_uid != current-fsuid)  !capable(CAP_SETUID))
return false;

return true;
}

  
  static int lives_below_in_same_fs(struct dentry *d, 

Re: [patch 7/8] allow unprivileged mounts

2007-04-21 Thread Eric W. Biederman
Andrew Morton [EMAIL PROTECTED] writes:

 On Fri, 20 Apr 2007 12:25:39 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote:

 Define a new fs flag FS_SAFE, which denotes, that unprivileged
 mounting of this filesystem may not constitute a security problem.
 
 Since most filesystems haven't been designed with unprivileged
 mounting in mind, a thorough audit is needed before setting this flag.

 Practically speaking, is there any realistic likelihood that any filesystem
 apart from FUSE will ever use this?

Also potentially some of the kernel virtual filesystems.  /proc should
be safe already.  If you don't have any kind of backing store this problem
gets easier.

With unprivileged users allowed to create mounts the utility of kernel
functionality exported as filesystems goes up quite a bit.  We are not
plan9 but this is the last bottleneck in allowing the everything is
a filesystem paradigm from being really usable in linux.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 8/8] allow unprivileged fuse mounts

2007-04-21 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

 From: Miklos Szeredi [EMAIL PROTECTED]

 Use FS_SAFE for fuse fs type, but not for fuseblk.

 FUSE was designed from the beginning to be safe for unprivileged
 users.  This has also been verified in practice over many years.  In
 addition unprivileged mounts require the parent mount to be owned by
 the user, which is more strict than the current userspace policy.

 This will enable future installations to remove the suid-root
 fusermount utility.

 Don't require the user_id= and group_id= options for unprivileged
 mounts, but if they are present, verify them for sanity.

 Disallow the allow_other option for unprivileged mounts.

 Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
 ---

 Index: linux/fs/fuse/inode.c
 ===
 --- linux.orig/fs/fuse/inode.c2007-04-20 11:55:01.0 +0200
 +++ linux/fs/fuse/inode.c 2007-04-20 11:55:14.0 +0200
 @@ -311,6 +311,19 @@ static int parse_fuse_opt(char *opt, str
   d-max_read = ~0;
   d-blksize = 512;
  
 + /*
 +  * For unprivileged mounts use current uid/gid.  Still allow
 +  * user_id and group_id options for compatibility, but
 +  * only if they match these values.
 +  */
 + if (!capable(CAP_SYS_ADMIN)) {
 + d-user_id = current-uid;
 + d-user_id_present = 1;
 + d-group_id = current-gid;
 + d-group_id_present = 1;
 +
 + }

CAP_SETUID is the appropriate capability...

This is not a dimension we have not fully explored.
What is the problem with a user controlled mount having different
uid and gid values.

Yes they map into different users but how is this a problem.
The only problem that I can recall is the historic chown problem
where you could give files to other users and mess up their quotas.

Or is the problem other users writing to this user controlled
filesystem?

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 7/8] allow unprivileged mounts

2007-04-21 Thread Eric W. Biederman
Jan Engelhardt [EMAIL PROTECTED] writes:

 On Apr 21 2007 08:10, Eric W. Biederman wrote:

 Define a new fs flag FS_SAFE, which denotes, that unprivileged
 mounting of this filesystem may not constitute a security problem.
 
 Since most filesystems haven't been designed with unprivileged
 mounting in mind, a thorough audit is needed before setting this flag.

 Practically speaking, is there any realistic likelihood that any filesystem
 apart from FUSE will ever use this?

Also potentially some of the kernel virtual filesystems.  /proc should
be safe already.  If you don't have any kind of backing store this problem
gets easier.

 tmpfs!

tmpfs is a possible problem because it can consume lots of ram/swap.  Which
is why it has limits on the amount of space it can consume.  Those are set as
mount options as I recall.  Which means that we would need to do something
different with respect to limits before tmpfs could become safe for
an untrusted user to mount.

Still it's close.


Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 7/8] allow unprivileged mounts

2007-04-21 Thread Eric W. Biederman
Andi Kleen [EMAIL PROTECTED] writes:

 Andrew Morton [EMAIL PROTECTED] writes:

 On Fri, 20 Apr 2007 12:25:39 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote:
 
  Define a new fs flag FS_SAFE, which denotes, that unprivileged
  mounting of this filesystem may not constitute a security problem.
  
  Since most filesystems haven't been designed with unprivileged
  mounting in mind, a thorough audit is needed before setting this flag.
 
 Practically speaking, is there any realistic likelihood that any filesystem
 apart from FUSE will ever use this?

 If it worked for mount --bind for any fs I could see uses of this.  I haven't
 thought
 through the security implications though, so it might not work.

Binding a directory that you have access to in other was is essentially
the same thing as a symlink.  So there are no real security implications
there.  The only problem case I can think of is removal media that you
want to remove but someone has made a bind mount to.  But that is
essentially the same case as opening a file so there are no new
real issues.  Although our diagnostic tools will likely fall behind
for a bit.

We handle the security implications by assigning an owner to all mounts
and only allowing you to add additional mounts on top of a mount you
already own.

If you have the right capabilities you can create a mount owned by
another user.

For a new mount if you don't have the appropriate capabilities nodev
and nosuid will be forced.

Initial super block creation is a lot more delicate so we need the
FS_SAFE flag, to know that the kernel is prepared to deal with the
crazy things that a hostile user space is prepared to do.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 0/8] mount ownership and unprivileged mount syscall (v4)

2007-04-20 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes:

 Quoting Miklos Szeredi ([EMAIL PROTECTED]):
 This patchset has now been bared to the lowest common denominator
 that everybody can agree on.  Or at least there weren't any objections
 to this proposal.
 
 Andrew, please consider it for -mm.
 
 Thanks,
 Miklos
 
 
 v3 - v4:
 
  - simplify interface as much as possible, now only a single option
(user=UID) is used to control everything
  - no longer allow/deny mounting based on file/directory permissions,
that approach does not always make sense
 
 
 This patchset adds support for keeping mount ownership information in
 the kernel, and allow unprivileged mount(2) and umount(2) in certain
 cases.
 
 The mount owner has the following privileges:
 
   - unmount the owned mount
   - create a submount under the owned mount
 
 The sysadmin can set the owner explicitly on mount and remount.  When
 an unprivileged user creates a mount, then the owner is automatically
 set to the user.
 
 The following use cases are envisioned:
 
 1) Private namespace, with selected mounts owned by user.
E.g. /home/$USER is a good candidate for allowing unpriv mounts and
unmounts within.
 
 2) Private namespace, with all mounts owned by user and having the
nosuid flag.  User can mount and umount anywhere within the
namespace, but suid programs will not work.
 
 3) Global namespace, with a designated directory, which is a mount
owned by the user.  E.g. /mnt/users/$USER is set up so that it is
bind mounted onto itself, and set to be owned by $USER.  The user
can add/remove mounts only under this directory.
 
 The following extra security measures are taken for unprivileged
 mounts:
 
  - usermounts are limited by a sysctl tunable
  - force nosuid,nodev mount options on the created mount

 Very nice.  I like these semantics.

 I'll try to rework my laptop in the next few days to use this patchset
 as a test.

Agreed.  It appears the approach of adding owner ship information to
mount points and using that to control what may happen with them
in regards to mount/unmount is the only workable approach in the
unix environment.

Now to dig into the details and ensure that they are correct.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-18 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

  I've tried to make this unprivileged mount thing as simple as
  possible, and no simpler.  If we can make it even simpler, all the
  better.
 
 We are certainly much more complex then the code in plan9 (just
 read through it) so I think we have room for improvement.
 
 Just for reference what I saw in plan 9 was:
 - No super user checks in it's mount, unmount, or namespace creation paths.
 - A flag to deny new mounts but not new bind mounts (for administrative
 purposes
   the comment said).
 
 Our differences from plan9.
 - suid capable binaries. (SUID please go away).
 - A history of programs assuming only root could call mount/unmount.

 I hate suid as well.  _The_ motivation behind this patchset was to get
 rid of fusermount, a suid mount helper for fuse.

 But I don't think suid is going away, and definitely not overnight.

Agreed, unless it just happens that killing is equally as hard as
doing non-privileged mounts.   Which I really don't think will be
the case.  suid executables can always be replaced by a non-privileged
part that connect to a daemon on localhost.

 Also I don't think we want to require auditing userspace before
 enabling user mounts.

Given the complexity of the code to avoids audits adds, and especially
the uncertainty of how all of the pieces add up together I really
think we do need to audit user space (but only in a limited way).

This is a paradigm shift in how mounts are managed and to get
the full benefit of it we need to be prepared to deal with the
paradigm shift.

Essentially what the audit must ensure is that 
(a) non-root users are always run in a non-default namespace so
mounts and unmounts they generate will not have global effect.
(b) If we setup something like the proposed /share that administrative
tools know how to treat it properly.

That is not an especially hard audit of user space and it noticeably
reduces the complexity of what we must implement.

 If I understand correctly, your proposal is to get rid of MNT_USER and
 MNT_ALLOWUSERMNT and allow/deny unprivileged mounts and umounts based
 on a boolean sysctl flag and on a check if the target namespace is the
 initial namespace or not.  

No.  I really do not want to treat the initial namespace in a special way
from an implementation point of view.

From a distro point of view I don't think we should ever allow a user
into the initial mount namespace, and that is what we should audit.
All of the ways a user can login to a machine.

We need to audit the login paths anyway if we are going to do anything
interesting with the mount namespace.  So I see this as no real
additional overhead to make things usable.

 And maybe add some extra checks which
 prevent ugliness from happening with suid programs.  Is this correct?

Definitely add some extra permission checks to prevent ugliness from
happening with suid executables.  In the general case the problem
with suid executables is that they expect parts of the mount namespace
that a user cannot modify not to be modified.

Ensuring that our permission checks keep that promise for mount
and umount is going to be a bit challenging, because we need to
think through a lot of scenarios.  But it is not that hard.

 If so, how are we going to make sure this won't break existing
 userspace without doing a full audit of all suid programs in every
 distro that wants this feature?

The permission checks to ensure you can not modify a filesystem with
mounts in a way that you could not modify it by creating or deleting
files should be enough.

That probably means that we are restricted to mounting filesystems
with the equivalent of uid=$(id -u) gid=$(id -g).  That is if you
aren't root all files in the filesystem you cause to be mounted
must be owned by you.  That way you have permission to unmount or
delete them.

At the very least the user who causes a mount to be created should
be the owner and have complete control over the directory mount point.
So that they can at least theoretically remove all of the files and
directories at the mount point (which would be equivalent to
unmounting it).

 Also how are we going to prevent the user from creating millions of
 mounts, and using up all the kernel memory for vfsmounts?

I definitely agree that we need some kind of accounting.  I'm don't
think we can do this per user (which would be nice) and I don't
believe your code does attempts to limit mounts by user either.
A simple global limit on the number of mounts should be sufficient.
If necessary we can allow the limit to be ignored if you have
the appropriate capability.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-17 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes:
 
 Why are directory permissions not sufficient to allow/deny non-priveleged
 mounts?
 I don't understand that contention yet.

 The same scenarios laid out previously in this thread.  I.e.

 1. user hallyn does mount --bind / /home/hallyn/root
 2. (...)
 3. admin does deluser hallyn

 and deluser starts wiping out root

 Or,

 1. user hallyn does mount --bind / /home/hallyn/root
 2. backup daemon starts backing up /home/hallyn/root/home/hallyn/root/home...

 So we started down the path of forcing users to clone a new namespace
 before doing user mounts, which is what the clone flag was about.  Using
 per-mount flags also suffices as you had pointed out, which is being
 done here.  But directory permissions are inadequate.

Interesting

So far even today these things can happen, however they are sufficiently
unlikely the tools don't account for them.

Once a hostile user can cause them things are more of a problem.

 (Unless you want to tackle each problem legacy tool one at a time to
 remove problems - i.e. deluser should umount everything under
 /home/hallyn before deleting, backup should be spawned from it's own
 namespace cloned right after boot or just back up on one filesystem,
 etc.)

I don't see a way that backup and deluser won't need to be modified
to work properly in a system where non-priveleged mounts are allowed,
at least they will need to account for /share.

That said it is clearly a hazard if we enable this functionality by
default.

If we setup a pam module that triggers on login and perhaps when
cron and at jobs run to setup an additional mount namespace I think
keeping applications locked away in their own mount namespace is
sufficient to avoid hostile users from doing unexpected things to
the initial mount namespace.  So unless I am mistake it should be
relatively simple to prevent user space from encountering problems.

That still leaves the question of how we handle systems with an old
user space that is insufficiently robust to deal with mounts occurring
at unexpected locations.


  I think a simple sysctl to enable/disable of non-priveleged mounts 
  defaulting to disabled is enough.

Am I correct or will it be more difficult than just a little pam
module to ensure non-trusted users never run in the initial mount
namespace?

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-17 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

 I'm still not sure, what your problem is.

My problem right now is that I see a serious complexity escalation in
the user interface that we must support indefinitely.

I see us taking a nice powerful concept and seriously watering it down.
To some extent we have to avoid confusing suid applications.  (I would
so love to remove the SUID bit...).

I'm being contrary to ensure we have a good code review.

I have heard it said that there are two kinds of design.  Something
so simple it obviously has no deficiencies.  Something so complex it has
no obvious deficiencies.  I am very much afraid we are slipping the
mount namespace into the latter category of work.  Which is a bad
bad thing for core OS feature.

 With the v3 of the usermounts patchset, by default, user mounts are
 disabled, because the allow unpriv submounts flag is cleared on all
 mounts.

 There are several options available to sysadmins and distro builders
 to enable user mounts in a secure way:

   - pam module, which creates a private namespace, and sets allow
 unpriv submounts on the mounts within the namespace

   - pam module, which rbinds / onto /mnt/ns/$USER, and chroots into
 /mnt/ns/$USER, then sets the allow unpriv submounts on the
 mounts under /mnt/ns/$USER.

In part this really disturbs me because we now have two mechanisms for
controlling the scope of what a user can do.

A flag or a new namespace.  Two mechanisms to accomplish the same
thing sound wrong, and hard to manage.

   - sysadmin creates /mnt/usermounts writable to all users, with
 sticky bit (same as /tmp), does mount --bind /mnt/usermounts
 /mnt/usermounts and sets the allow unpriv submounts on
 /mnt/usermounts.

 All of these are perfectly safe wrt userdel and backup (assuming it
 doesn't try back up /mnt).

I also don't understand at all the user= mount flag and options.
All it seemed to be used for was adding permissions to unmount.  In user
space to deal with the lack of any form of untrusted mounts I can understand
this.  In kernel space this seems to be more of a problem.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-17 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

  I'm still not sure, what your problem is.
 
 My problem right now is that I see a serious complexity escalation in
 the user interface that we must support indefinitely.
 
 I see us taking a nice powerful concept and seriously watering it down.
 To some extent we have to avoid confusing suid applications.  (I would
 so love to remove the SUID bit...).
 
 I'm being contrary to ensure we have a good code review.

 OK.  And it's very much appreciated :)

 I have heard it said that there are two kinds of design.  Something
 so simple it obviously has no deficiencies.  Something so complex it has
 no obvious deficiencies.  I am very much afraid we are slipping the
 mount namespace into the latter category of work.  Which is a bad
 bad thing for core OS feature.

 I've tried to make this unprivileged mount thing as simple as
 possible, and no simpler.  If we can make it even simpler, all the
 better.

We are certainly much more complex then the code in plan9 (just
read through it) so I think we have room for improvement.

Just for reference what I saw in plan 9 was:
- No super user checks in it's mount, unmount, or namespace creation paths.
- A flag to deny new mounts but not new bind mounts (for administrative purposes
  the comment said).

Our differences from plan9.
- suid capable binaries. (SUID please go away).
- A history of programs assuming only root could call mount/unmount.

 In part this really disturbs me because we now have two mechanisms for
 controlling the scope of what a user can do.

 You mean rbind+chroot and clone(CLONE_NS)?  Yes, those are two
 different mechanisms achieving very similar results.  But what has
 this to do with unprivileged mounts?

The practical question is how do we limit what a user can mount and
unmount.


I would contend that at first glance stuffing a user in their own
mount namespace is sufficient, on a system with utilities aware
of the consequences of mount/unmount.

So we may not need a unprivileged mount disable except as a way
to allow an old user space to run a new kernel.

 A flag or a new namespace.  Two mechanisms to accomplish the same
 thing sound wrong, and hard to manage.

 The flag permitting the unprivileged mounts (which we now agreed to
 name allowusermnt) is used in both cases.

 Just creating a new namespace doesn't always imply that you want to
 allow user mounts inside, does it?  These are orthogonal features.

After user space has been updated we always want to allow unprivileged
mounts. 

If I get pushed I will say that we need to remove suid exec capability
from user space as well.  At which point we don't even need directory
security checks, there is enough benefit there I certainly think
it is worth considering having an entire NOSUID user space.

Removing suid is probably excessive but if it isn't much harder
then sane mount namespace support we should probably consider it.


- sysadmin creates /mnt/usermounts writable to all users, with
  sticky bit (same as /tmp), does mount --bind /mnt/usermounts
  /mnt/usermounts and sets the allow unpriv submounts on
  /mnt/usermounts.
 
  All of these are perfectly safe wrt userdel and backup (assuming it
  doesn't try back up /mnt).
 
 I also don't understand at all the user= mount flag and options.

 The user=UID or (or MNT_USER flag) serves multiple purposes:

   - help mount(8) move away from /etc/mtab
   - allow unprivileged umounts
   - account user mounts

 All it seemed to be used for was adding permissions to unmount.  In user
 space to deal with the lack of any form of untrusted mounts I can understand
 this.  In kernel space this seems to be more of a problem.

 Why is handling unprivileged mounts in kernel different from handling
 them in userspace in this respect?

Ok.  I just looked at what user space is doing.  The difference is that
what user space is doing predates mount namespaces, and was there as
far as I can tell to keep one user from causing problems for
another user.   If we choose to make mount namespaces to be
the unit of granularity we don't need this capability.

All we have to do is deny unmounts that would confuse a suid
executable.  Which mounts are those?

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 0/8] unprivileged mount syscall

2007-04-16 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

 Arn't there ways to escape chroot jails? Serge had pointed me to a URL
 which showed chroots can be escaped. And if that is true than having all
 user's private mount tree in the same namespace can be a security issue?

 No.  In fact chrooting the user into /share/$USER will actually
 _grant_ a privilege to the user, instead of taking it away.  It allows
 the user to modify it's root namespace, which it wouldn't be able to
 in the initial namespace.

 So even if the user could escape from the chroot (which I doubt), s/he
 would not be able to do any harm, since unprivileged mounting would be
 restricted to /share.  Also /share/$USER should only have read/search
 permission for $USER or no permissions at all, which would mean, that
 other users' namespaces would be safe from tampering as well.

A couple of points.
- chroot can be escaped, it is just a chdir for the root directory it
  is not a security feature.  The only security is that you have to be root to 
call chdir.
  A carefully done namespace setup won't have that issue.

- While it may not violate security as far as what a user is allowed to modify 
it may
  violate security as far as what a user is allowed to see.

There are interesting per login cases as well such as allowing a user to 
replicate
their mount tree from another machine when they log in.  When /home is on a 
network
filesystem this can be very practical and can allow propagation of mounts across
machines not just across a single login session.

Eric




-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-16 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

 That depends.  Current patches check the unprivileged submounts
 allowed under this mount flag only on the requested mount and not on
 the propagated mounts.  Do you see a problem with this?

I think privileges of this sort should propagate.  If I read what you
just said correctly if I have a private mount namespace I won't be able
to mount anything unless when it was setup the unprivileged submount
command was explicitly set.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-16 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

  That depends.  Current patches check the unprivileged submounts
  allowed under this mount flag only on the requested mount and not on
  the propagated mounts.  Do you see a problem with this?
 
 I think privileges of this sort should propagate.  If I read what you
 just said correctly if I have a private mount namespace I won't be able
 to mount anything unless when it was setup the unprivileged submount
 command was explicitly set.

 By design yes.  Why is that a problem?

It certainly doesn't match my intuition.

Why are directory permissions not sufficient to allow/deny non-priveleged 
mounts?
I don't understand that contention yet.

I should probably go back and look and see how plan9 handles mount/unmount
permissions.  Plan9 gets away with a lot more because it doesn't have
a suid bit and mount namespaces were always present, so they don't have
backwards compatibility problems.

My best guess at the moment is that plan9 treated mount/unmount as
completely unprivileged and used the mount namespaces to limit the
scope of what would be affected by a mount/unmount operation.  I think
that may be reasonable in linux as well but it will require the
presence of a mount namespace to limit the affects of what a user can
do.

So short of a more thorough audit I believe the final semantics should
be: 
- mount/unmount for non-priveleged processes should only be limited
  by the mount namespace and directory permissions.
- CLONE_NEWNS should not be a privileged operation. 

What prevents us from allowing these things?

- Unprivileged CLONE_NEWNS and unprivileged mounts needs resource
  accounting so we don't have a denial of service attack.

- Unprivileged mounts must be limited to directories that we have
  permission to modify in a way that we could get the same effect
  as the mount or unmount operation in terms of what files are visible
  otherwise we can mess up SUID executables.

- Anything else?

There are user space issues such as a reasonable pam module and how
to do backups.  However those are user space issues.

What am I missing that requires us to add MNT_USER and MNT_USERMNT?

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 02/10] allow unprivileged umount

2007-04-16 Thread Eric W. Biederman
Miklos Szeredi [EMAIL PROTECTED] writes:

 From: Miklos Szeredi [EMAIL PROTECTED]

 The owner doesn't need sysadmin capabilities to call umount().

 Similar behavior as umount(8) on mounts having user=UID option in
 /etc/mtab.  The difference is that umount also checks /etc/fstab,
 presumably to exclude another mount on the same mountpoint.


bool in the kernel?

int would be much more recognizable as this is not C++

Or do you have place to convert the rest of the kernel that is using
int to return a true/false value to bool?

 +static bool permit_umount(struct vfsmount *mnt, int flags)
 +{
 + if (capable(CAP_SYS_ADMIN))
 + return true;
 +
 + if (!(mnt-mnt_flags  MNT_USER))
 + return false;
 +
 + if (flags  MNT_FORCE)
 + return false;
 +
 + return mnt-mnt_uid == current-uid;
 +}

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-13 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes:

 Quoting Miklos Szeredi ([EMAIL PROTECTED]):
  Given the existence of shared subtrees allowing/denying this at the mount
  namespace level is silly and wrong.
  
  If we need more than just the filesystem permission checks can we
  make it a mount flag settable with mount and remount that allows
  non-privileged users the ability to create mount points under it
  in directories they have full read/write access to.
 
 OK, that makes sense.
 
  I don't like the use of clone flags for this purpose but in this
  case the shared subtress are a much more fundamental reasons for not
  doing this at the namespace level.
 
 I'll drop the clone flag, and add a mount flag instead.
 
 Thanks,
 Miklos

 Makes sense, so then on login pam has to spawn a new user namespace and
 construct a root fs with no shared subtrees and with the
 user-mounts-allowed flag specified?

I was expecting the usage in the normal case to be the Al Viro style
with shared subtrees setup for each user, with the shared subtree marked
with user-mounts-allowed.  Then on login pam would unshare the namespace
and restrict the user to their specific portion of the shared subtree.

If you don't use multiple mount namespaces all of the users have to
agree on what they want the non-privileged part of the namespace
to look like.

If you don't use shared subtrees you have to deal with all of the
joys of implementing enter, or else multiple logins from the same user
have problems.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 05/10] add permit user mounts in new namespace clone flag

2007-04-12 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes:

 Quoting Miklos Szeredi ([EMAIL PROTECTED]):
 From: Miklos Szeredi [EMAIL PROTECTED]
 
 If CLONE_NEWNS and CLONE_NEWNS_USERMNT are given to clone(2) or
 unshare(2), then allow user mounts within the new namespace.
 
 This is not flexible enough, because user mounts can't be enabled for
 the initial namespace.
 
 The remaining clone bits also getting dangerously few...
 
 Alternatives are:
 
   - prctl() flag
   - setting through the containers filesystem

 Sorry, I know I had mentioned it, but this is definately my least
 favorite approach.

 Curious whether are any other suggestions/opinions from the containers
 list?

Given the existence of shared subtrees allowing/denying this at the mount
namespace level is silly and wrong.

If we need more than just the filesystem permission checks can we
make it a mount flag settable with mount and remount that allows
non-privileged users the ability to create mount points under it
in directories they have full read/write access to.

I don't like the use of clone flags for this purpose but in this
case the shared subtress are a much more fundamental reasons for not
doing this at the namespace level.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Implement shadow directory support for device classes.

2007-01-24 Thread Eric W. Biederman

Modify the device class code so that normal manipulations work
in the presence of shadow directories.  Some of the shadow directory
support still needs to be implemented in the implementation of the
class but these modifications are sufficient to make that simple.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 drivers/base/class.c   |   34 --
 include/linux/device.h |4 
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/base/class.c b/drivers/base/class.c
index 8bf2ca2..f72ddc0 100644
--- a/drivers/base/class.c
+++ b/drivers/base/class.c
@@ -136,6 +136,18 @@ static void remove_class_attrs(struct class * cls)
}
 }
 
+static int class_setup_shadow(struct class *cls)
+{
+   if (!cls-class_device_dparent  !cls-class_follow_link)
+   return 0;
+
+   if (!cls-class_device_dparent || !cls-class_device_dparent)
+   return -EINVAL;
+
+   return sysfs_make_shadowed_dir(cls-subsys.kset.kobj,
+   cls-class_follow_link);
+}
+
 int class_register(struct class * cls)
 {
int error;
@@ -154,6 +166,11 @@ int class_register(struct class * cls)
 
error = subsystem_register(cls-subsys);
if (!error) {
+   error = class_setup_shadow(cls);
+   if (error)
+   subsystem_unregister(cls-subsys);
+   }
+   if (!error) {
error = add_class_attrs(class_get(cls));
class_put(cls);
}
@@ -582,6 +599,7 @@ int class_device_add(struct class_device *class_dev)
struct class *parent_class = NULL;
struct class_device *parent_class_dev = NULL;
struct class_interface *class_intf;
+   struct dentry *dparent;
int error = -EINVAL;
 
class_dev = class_device_get(class_dev);
@@ -610,7 +628,11 @@ int class_device_add(struct class_device *class_dev)
else
class_dev-kobj.parent = parent_class-subsys.kset.kobj;
 
-   error = kobject_add(class_dev-kobj);
+   if (parent_class-class_device_dparent)
+   dparent = parent_class-class_device_dparent(class_dev);
+   else
+   dparent = parent_class-subsys.kset.kobj.dentry;
+   error = kobject_shadow_add(class_dev-kobj, dparent);
if (error)
goto out2;
 
@@ -841,11 +863,15 @@ int class_device_rename(struct class_device *class_dev, 
char *new_name)
 {
int error = 0;
char *old_class_name = NULL, *new_class_name = NULL;
+   struct class *class;
+   struct dentry *new_parent;
 
class_dev = class_device_get(class_dev);
if (!class_dev)
return -EINVAL;
 
+   class = class_dev-class;
+
pr_debug(CLASS: renaming '%s' to '%s'\n, class_dev-class_id,
 new_name);
 
@@ -857,7 +883,11 @@ int class_device_rename(struct class_device *class_dev, 
char *new_name)
 
strlcpy(class_dev-class_id, new_name, KOBJ_NAME_LEN);
 
-   error = kobject_rename(class_dev-kobj, new_name);
+   if (class-class_device_dparent)
+   new_parent = class-class_device_dparent(class_dev);
+   else
+   new_parent = class-subsys.kset.kobj.dentry;
+   error = kobject_shadow_rename(class_dev-kobj, new_parent, new_name);
 
 #ifdef CONFIG_SYSFS_DEPRECATED
if (class_dev-dev) {
diff --git a/include/linux/device.h b/include/linux/device.h
index f44247f..162e840 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -33,6 +33,7 @@ struct device;
 struct device_driver;
 struct class;
 struct class_device;
+struct nameidata;
 
 struct bus_type {
const char  * name;
@@ -197,6 +198,9 @@ struct class {
 
int (*suspend)(struct device *, pm_message_t state);
int (*resume)(struct device *);
+
+   struct dentry *(*class_device_dparent)(struct class_device *);
+   void *(*class_follow_link)(struct dentry *dentry, struct nameidata *nd);
 };
 
 extern int __must_check class_register(struct class *);
-- 
1.4.4.1.g278f

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Implement shadow directory support for device classes.

2007-01-24 Thread Eric W. Biederman
Greg KH [EMAIL PROTECTED] writes:

 Or you can try it with any of the subsystems that have already been
 converted over to use the struct device code, like misc, mem, and many
 others.  Just look for the symlinks in /sys/class/CLASS_NAME/ instead of
 real subdirectories.

 So I'd really prefer to not apply this patch, and just do this kind of
 work for 'struct device' instead.

I don't want to get stuck waiting for cleanups so I will keep this one,
even if you don't apply it :)  

But I will go figure out the equivalent version for struct device, as
it doesn't look like you are that far off from merging that version. :)

That way my code will work whichever code base I have to merge against.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Eric W. Biederman

Matthew Wilcox [EMAIL PROTECTED] writes:

 On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
  What do you mean by problems 5 years down the road?  The real issue is that
  this 32-bit block count limit affects composite devices like MD RAID and
  LVM today, not just individual disks.  There have been several postings
  I have seen with people having a problem _today_ with a 2TB limit on
  devices.
 
 people who can afford 2TB of disc can afford to buy a 64-bit processor.

Currently that doesn't solve the problem as block_nr is held in an int.
And as gcc compiles an int to a 32bit number on a 64bit processor, the
problem still isn't solved.

That at least we need to address.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Eric W. Biederman

LA Walsh [EMAIL PROTECTED] writes:

 I vaguely remember a discussion about this a few months back.
 If I remember, the reasoning was it would unnecessarily slow
 down smaller systems that would never have block devices in
 the 4-28T range attached.  

With classic 512 byte sectors the top size is right about 2TB.

The basic thought is that 64bit numbers tend to suck, so we don't
want then in any fast paths on a 32bit system.

 However, isn't it possible there will continue to be a series
 of P-IV,V,VI,VII ...etc, addons that will be used for sometime
 to come.  I've even heard it suggested that we might see
 2 or more CPU's on a single chip as a way to increase cpu
 capacity w/o driving up clock speed.  Given the cheapness of
 .25T drives now, seeing the possibility of 4T drives doesn't seem
 that remote (maybe 5 years?).  
 
 Side question: does the 32-bit block size limit also apply to 
 RAID disks or does it use a different block-nr type?
For now yes it does.

 
 So...is it the plan, or has it been though about -- 'abstracting'
 block numbes as a typedef 'block_nr', then at compile time
 having it be selectable as to whether or not this was to
 be a 32-bit or 64 bit quantity -- that way older systems would
 lose no efficiency.  Drivers that couldn't be or hadn't been
 ported to use 'block_nr' could default to being disabled if
 64-bit blocks were selected, etc.
 
 So has this idea been tossed about and or previously thrashed?

Using a 64bit number of 32bit systems has so far been trashed.
Though this does look like a real problem that needs to be solved
at some point.  I doubt we can wait past 2.5 though if we want the
code ready when the hardware is.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]



Re: [RFC] sane access to per-fs metadata (was Re: [PATCH] Documentation/ioctl-number.txt)

2001-03-23 Thread Eric W. Biederman

Alexander Viro [EMAIL PROTECTED] writes:

 IOW, you can get normal filesystem view (meaning that you have all usual
 UNIX toolkit available) for per-fs control stuff. And keep the ability to
 do proper locking - it's the same driver that handles the main fs and you
 have access to superblock. No need to change the API - everything is already
 there...
   I'll post an example patch for ext2 (safe access to superblock,
 group descriptors, inode table and bitmaps on a live fs) after this weekend
 (== when misc shit will somewhat settle down).
   Cheers,
   Al
 
 PS: Folks[1], I hope it explains why I'm very sceptical about "let's add new
 A{B,P}I" sort of ideas - approach above can be used for almost all stuff
 I've seen proposed. You can have multiple views of the same object. And
 have all of them available via normal API.


This is a cool idea.  But I couple of places where this might fall down.
1) If a filesystem has multiple name spaces and we use different mounts
   to handle them, will this break anything?  Fat32 with it's short and long
   names, and the Novell filesystem are the examples I can think of.

2) An API is still being developed it just uses the existing infrastructure
   which is good, but we still need to standardize what is exported.  It's
   a cleaner way to build a new API but a new API is being built.

3) What is a safe way to do this so a non-root user can call mount?

4) What is appropriate way using open,read,write,close,mount to handle stat data
   and extended attributes.  The stat data is the big one because it is used
   so frequently.  Possibly a mountopenread/writecloseumount syscall is needed.
   
   I keep thinking open("/path/to/file/stat_data") but that feels excessively heavy
   at the API level.   But if we involve mount (at least semantically)
   that could work for directories as well. 

The goal here is to push your ideas to the limits so we can where
using ioctl or new a syscall is appropriate.  If indeed there is such
a case.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]



Re: [patch] O_SYNC patch 3/3, add inode dirty buffer list support to ext2

2000-11-24 Thread Eric W. Biederman

"Jeff V. Merkey" [EMAIL PROTECTED] writes:

 Cool.  ORACLE is going to **SMOKE** on EXT2 with this change.

Pessimism

Hmm I don't see how ORACLE is going to **SMOKE**.
Last I looked ORACLE would need a query optimizer that always
would find the best possible index and much less overhead to **SMOKE**.

Last I looked table reads were 10x slower than file reads.

/Pessimism

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]