Re: Re: Thoughts on credential switching
On Tuesday, April 22, 2014 15:14:33 Boaz Harrosh wrote: > On 04/22/2014 02:37 PM, Florian Weimer wrote: > > On 03/27/2014 02:33 PM, Boaz Harrosh wrote: > >> POSIX or not it just does not have any real programming mining > >> at all. > > > > What do you mean with "mining" in this context? > > Sorry I saw this mistake after I posted. I meant "meaning". > > What I'm saying is that the mess starts when you are trying > to keep patching a very wrong API. the POSIX politics aside, > in regard to user switching (and current directory and etc...) > this API is plain WRONG. I mean in the mathematical sense wrong. > > All these application mess is not the application programmers > fault. He had to do what he had to do. The mess starts when you > are trying to keep a mathematical contradiction in your proof. > > It is glibc mess for trying to maintain compatibility with > these "PROCESS WIDE OPERATIONS". And naming it holy names like > POSIX will not cover the mess that they are. As long as you try > to keep them there will be mess. If you want to honestly clean > things up is by throwing the true garbage out. Convert all legacy > code to new mathematically sound API's. > > Peace > Boaz I don't want to keep this churning thread going. I'd rather solve today's problems. So let's put this in perspective and move on. Our project has a problem to solve. I wouldn't necessarily say wrong, simply very out of date. All of these issues with POSIX and pthreads in particular come from one common source, the actual capabilities of the systems, BSD, and SysV at the time. The goofy localtime_r and friends were (and still are) hacks to work around the simplistic interfaces in the original Bell Labs code that deeply assumed a single process with a single thread, i.e. there were only 3 segments, no shared libraries or shared r/w memory and definitely no (real) TLS. The pthreads interfaces deeply assumed the same. Even worse, it had to also run on VMS which (I don't believe) ever had kernel support for multiple user space theads. ASTs are nasty enough and only worked because everybody knew that you were either in app code or the AST... So all of this stuff is based on lowest common denominator. The "Single UNIX Specification" was a peace treaty among warring computer companies out looking for lock-ins and competitive advantages while reluctantly realizing that the ISV community (Oracle) really didn't care about anything but a common base. Let's move on. Glibc has to keep some semblance of POSIX in order to cope with legacy code from that era and I commend them for their perseverance. But that should not hold us back from leveraging today's architectures and, more importantly, today's multi-client workloads. If that means extensions to POSIX fine. But if POSIX gets hung up because *BSD doesn't have kernel support for thread level creds and *bsd.org doesn't want to do it, we just do what fits today's requirements and what Linux is capable of, Jim -- Jim Lieb Linux Systems Engineer Panasas Inc. "If ease of use was the only requirement, we would all be riding tricycles" - Douglas Engelbart 1925–2013 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On 04/22/2014 02:37 PM, Florian Weimer wrote: > On 03/27/2014 02:33 PM, Boaz Harrosh wrote: >> POSIX or not it just does not have any real programming mining >> at all. > > What do you mean with "mining" in this context? > Sorry I saw this mistake after I posted. I meant "meaning". What I'm saying is that the mess starts when you are trying to keep patching a very wrong API. the POSIX politics aside, in regard to user switching (and current directory and etc...) this API is plain WRONG. I mean in the mathematical sense wrong. All these application mess is not the application programmers fault. He had to do what he had to do. The mess starts when you are trying to keep a mathematical contradiction in your proof. It is glibc mess for trying to maintain compatibility with these "PROCESS WIDE OPERATIONS". And naming it holy names like POSIX will not cover the mess that they are. As long as you try to keep them there will be mess. If you want to honestly clean things up is by throwing the true garbage out. Convert all legacy code to new mathematically sound API's. Peace Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On 03/27/2014 02:33 PM, Boaz Harrosh wrote: man setuid should be saying DEPRECATED, EMULATED and SIGNAL NOT SAFE and be done with it POSIX or no POSIX who cares? The glibc side cares, and there's also this bit: "It aims towards POSIX and Single UNIX Specification compliance.", which should be familiar. I get that POSIX doesn't do what people want here (umask and current directory handling are another story). But it hasn't really helped that some folks just said "POSIX sucks" and did their own thing, completely bypassing the system interfaces, instead of getting things fixed properly. Now we have to balance the kernel behavior, POSIX demands, glibc dirty hacks to implement the questionable POSIX behavior on top of the kernel interfaces, and more dirty hacks to work around glibc because some applications do not like it. I would really like to clean up this mess, but it's not clear to me where to start. Please show me one user of that and declare it brain dead. Safely and demonstratively relinquishing elevated privileges. POSIX or not it just does not have any real programming mining at all. What do you mean with "mining" in this context? -- Florian Weimer / Red Hat Product Security Team -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
> Do we include credfd fds sitting in Unix sockets? Ouch > Hmm. What if we had initial_creds and creds, and initial_creds never > changed unless explicitly requested. There wouldn't be any way to > revert to initial_creds, but may_ptrace would check initial_creds > *and* creds. This is a little hacky, and it might break Android (I'm > not really sure how the zygote thing works), but it could do the > trick. That mirros setfsuid and seteuid type behaviour so seems sensible enough. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Mon, Mar 31, 2014 at 1:14 PM, Trond Myklebust wrote: > > On Mar 31, 2014, at 15:26, Andy Lutomirski wrote: > >> On Mon, Mar 31, 2014 at 11:06 AM, Trond Myklebust >> wrote: >>> >>> On Mar 31, 2014, at 7:51, Jeff Layton wrote: >>> On Sun, 30 Mar 2014 09:03:29 -0400 "Theodore Ts'o" wrote: > On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote: >> I had some time to think about this last night... >> >> While using a fd to pass around credentials is convenient, the danger >> is that it's pretty opaque. You have a fd that you know has creds >> attached to it, but it's hard to be certain what is going to change. > > I don't think that's a particularly tough problem. In general, the fd > isn't something that you would want to pass around, and so the process > which generated it will know exactly what it contained. > I think there's a bit more of a use-case for passing around such an fd via socket... Part of the problem is that the traditional uid/gid switching glibc wrappers are per-process. If we're proposing doing something like: seteuid() setegid() setgroups() fd = open() (...and then revert the creds using same syscalls) ...during the time that you're doing all of that, you can't really allow any thread in the process to be doing something that requires _other_ creds until you've completed the above. >>> >>> Umm... open() isn't the only operation that you want to be able to do with >>> an assumed user identity. You want mknod(), mkdir(), link(), unlink(), ... >>> Pretty much any interaction with the underlying filesystem needs to use the >>> right identity. >>> So, I could envision a program like ganesha firing up a separate process to handle the credential switching and fd creation and then handing those back to the main process via a unix domain socket. >>> >>> How about using the keyrings interface to atomically cache and retrieve >>> these user identities? We already have support for different types of keys >>> that store/retrieve different types of structured information. How is this >>> so different? >> >> This sounds considerably more complicated than just using fds. What's >> the advantage? > > The advantage is that it's considerably _less_ complicated because it uses > interfaces that were designed to carry security related information, and to > share them across threads. > >> I guess using keys for local fs credentials fits in with using keys to >> access things like AFS, but I'm still not sure I see why this helps >> here. > > All you want to do is atomically store and retrieve a user identity (in > practice a credential) and share it between members of a thread. As I said, > that kind of storage/retrieval of structured data is what keyrings do. As far > as I know, there are already LSM interfaces in place (see > security_key_alloc/permission/free), and the generic keyring/keyctl interface > exists with appropriate process sharing/inheritance rules via the kernel > keyring interface. > > So basically, the recipe is: > > - You set up a kernel key type that takes a reference to the current process > credential on instantiation (no upcalls, etc needed). Ganesha can then use > standard libkeyutils methods to create the key and store it in its user, > process or thread keyring. > - You add a new keyctl (KEYCTL_APPLY?) that enables any thread/process that > has access to the key, via the keyring in which it is stored, to apply the > stored process credential to itself (and perhaps cache it's old credential in > a new key?). I can see pros and cons of using keys instead of fds: Pros: - You can change may_ptrace to check that you can legally ptrace any creds in the keyring. (It's debatable how useful this is.) - You could plausibly stick subsidiary uids into a user keyring and use them for uid_map, removing the need for newuidmap. Cons: - You can't send them via SCM_RIGHTS. I still think that the ability to do that is extremely useful. Neutral: What happens if the filesystem you're trying to share requires keys? I don't think that credfd or credkeys can really do this. I suppose that if the keys live in the user keyring corresponding to the user being impersonated, everything just works regardless of how this is implemented. Cue brain explosion: thread_keyring lives inside struct cred. I suspect that any sane implementation of credential switching will need to change that. Sigh. Anyway, the actual parts that seem like they need solving one way or another are: - The keyring pointers in struct cred are a problem. (Solved with credkeys, but only if credkeys can only live on the thread keyring. Also, there will be a problem with circular references.) - may_ptrace and all its users (e.g. procfs). "Solved" by real_creds, but no one has really defined the semantics of real_creds, since it's not currently
Re: Thoughts on credential switching
On Mar 31, 2014, at 15:26, Andy Lutomirski wrote: > On Mon, Mar 31, 2014 at 11:06 AM, Trond Myklebust > wrote: >> >> On Mar 31, 2014, at 7:51, Jeff Layton wrote: >> >>> On Sun, 30 Mar 2014 09:03:29 -0400 >>> "Theodore Ts'o" wrote: >>> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote: > I had some time to think about this last night... > > While using a fd to pass around credentials is convenient, the danger > is that it's pretty opaque. You have a fd that you know has creds > attached to it, but it's hard to be certain what is going to change. I don't think that's a particularly tough problem. In general, the fd isn't something that you would want to pass around, and so the process which generated it will know exactly what it contained. >>> >>> I think there's a bit more of a use-case for passing around such an fd >>> via socket... >>> >>> Part of the problem is that the traditional uid/gid switching glibc >>> wrappers are per-process. If we're proposing doing something like: >>> >>> seteuid() >>> setegid() >>> setgroups() >>> fd = open() >>> (...and then revert the creds using same syscalls) >>> >>> ...during the time that you're doing all of that, you can't really >>> allow any thread in the process to be doing something that requires >>> _other_ creds until you've completed the above. >> >> Umm... open() isn't the only operation that you want to be able to do with >> an assumed user identity. You want mknod(), mkdir(), link(), unlink(), ... >> Pretty much any interaction with the underlying filesystem needs to use the >> right identity. >> >>> So, I could envision a program like ganesha firing up a separate >>> process to handle the credential switching and fd creation and then >>> handing those back to the main process via a unix domain socket. >> >> How about using the keyrings interface to atomically cache and retrieve >> these user identities? We already have support for different types of keys >> that store/retrieve different types of structured information. How is this >> so different? > > This sounds considerably more complicated than just using fds. What's > the advantage? The advantage is that it’s considerably _less_ complicated because it uses interfaces that were designed to carry security related information, and to share them across threads. > I guess using keys for local fs credentials fits in with using keys to > access things like AFS, but I'm still not sure I see why this helps > here. All you want to do is atomically store and retrieve a user identity (in practice a credential) and share it between members of a thread. As I said, that kind of storage/retrieval of structured data is what keyrings do. As far as I know, there are already LSM interfaces in place (see security_key_alloc/permission/free), and the generic keyring/keyctl interface exists with appropriate process sharing/inheritance rules via the kernel keyring interface. So basically, the recipe is: - You set up a kernel key type that takes a reference to the current process credential on instantiation (no upcalls, etc needed). Ganesha can then use standard libkeyutils methods to create the key and store it in its user, process or thread keyring. - You add a new keyctl (KEYCTL_APPLY?) that enables any thread/process that has access to the key, via the keyring in which it is stored, to apply the stored process credential to itself (and perhaps cache it’s old credential in a new key?). _ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.mykleb...@primarydata.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Mon, Mar 31, 2014 at 11:06 AM, Trond Myklebust wrote: > > On Mar 31, 2014, at 7:51, Jeff Layton wrote: > >> On Sun, 30 Mar 2014 09:03:29 -0400 >> "Theodore Ts'o" wrote: >> >>> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote: I had some time to think about this last night... While using a fd to pass around credentials is convenient, the danger is that it's pretty opaque. You have a fd that you know has creds attached to it, but it's hard to be certain what is going to change. >>> >>> I don't think that's a particularly tough problem. In general, the fd >>> isn't something that you would want to pass around, and so the process >>> which generated it will know exactly what it contained. >>> >> >> I think there's a bit more of a use-case for passing around such an fd >> via socket... >> >> Part of the problem is that the traditional uid/gid switching glibc >> wrappers are per-process. If we're proposing doing something like: >> >> seteuid() >> setegid() >> setgroups() >> fd = open() >> (...and then revert the creds using same syscalls) >> >> ...during the time that you're doing all of that, you can't really >> allow any thread in the process to be doing something that requires >> _other_ creds until you've completed the above. > > Umm... open() isn't the only operation that you want to be able to do with an > assumed user identity. You want mknod(), mkdir(), link(), unlink(), ... > Pretty much any interaction with the underlying filesystem needs to use the > right identity. > >> So, I could envision a program like ganesha firing up a separate >> process to handle the credential switching and fd creation and then >> handing those back to the main process via a unix domain socket. > > How about using the keyrings interface to atomically cache and retrieve these > user identities? We already have support for different types of keys that > store/retrieve different types of structured information. How is this so > different? This sounds considerably more complicated than just using fds. What's the advantage? I guess using keys for local fs credentials fits in with using keys to access things like AFS, but I'm still not sure I see why this helps here. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Mon, Mar 31, 2014 at 11:44:59AM +0100, One Thousand Gnomes wrote: > On Wed, 26 Mar 2014 17:23:24 -0700 > Andy Lutomirski wrote: > > > Hi various people who care about user-space NFS servers and/or > > security-relevant APIs. > > > > I propose the following set of new syscalls: > > > > int credfd_create(unsigned int flags): returns a new credfd that > > corresponds to current's creds. > > > > int credfd_activate(int fd, unsigned int flags): Change current's > > creds to match the creds stored in fd. To be clear, this changes both > > the "subjective" and "objective" (aka real_cred and cred) because > > there aren't any real semantics for what happens when userspace code > > runs with real_cred != cred. > > What is the sematic of a simultaneous ptrace racing a credfd_activate on > another processor core ? > > What are the rules for simultaneous threads doing I/O and and credential > changes ? > > What is the rule for a faulting of an mmapped page in a multithreaded app > one thread of which has changed credentials ? > > Who owns a file created while you are changing credentials ? > > > - credfd_activate fails (-EINVAL) if dumpable. This is because we > > don't want a privileged daemon to be ptraced while impersonating > > someone else. > > That's one of the obvious problems but if you have that problem then > you've got races against signals and ptrace etc to deal with. FYI, Any process using pthreads and glibc already has to cope with these races as setresuid on glibc on Linux is not atomic. That's why Samba eventually changed to using the raw system calls on Linux due to an interesting bug with glibc aio interacting with setresuid races (receiving signal thread was uid 0, sending aio thread was non-zero - signal couldn't be delivered, glibc aio wakeup lost). Jeremy. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Mon, 31 Mar 2014 14:06:01 -0400 Trond Myklebust wrote: > > On Mar 31, 2014, at 7:51, Jeff Layton wrote: > > > On Sun, 30 Mar 2014 09:03:29 -0400 > > "Theodore Ts'o" wrote: > > > >> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote: > >>> I had some time to think about this last night... > >>> > >>> While using a fd to pass around credentials is convenient, the danger > >>> is that it's pretty opaque. You have a fd that you know has creds > >>> attached to it, but it's hard to be certain what is going to change. > >> > >> I don't think that's a particularly tough problem. In general, the fd > >> isn't something that you would want to pass around, and so the process > >> which generated it will know exactly what it contained. > >> > > > > I think there's a bit more of a use-case for passing around such an fd > > via socket... > > > > Part of the problem is that the traditional uid/gid switching glibc > > wrappers are per-process. If we're proposing doing something like: > > > > seteuid() > > setegid() > > setgroups() > > fd = open() > > (...and then revert the creds using same syscalls) > > > > ...during the time that you're doing all of that, you can't really > > allow any thread in the process to be doing something that requires > > _other_ creds until you've completed the above. > > Umm… open() isn’t the only operation that you want to be able to do with an > assumed user identity. You want mknod(), mkdir(), link(), unlink(), … Pretty > much any interaction with the underlying filesystem needs to use the right > identity. > The proposal that Al originally had was to have userland set up the credentials that it wanted to use, and then open("/dev/null", ...) to get a fd that has those creds attached. Then, we'd add a syscall that takes such an fd and switches to the creds attached to it. So the open() in the above example isn't to do an actual operation but rather to create a fd to act as a credential "handle". > > So, I could envision a program like ganesha firing up a separate > > process to handle the credential switching and fd creation and then > > handing those back to the main process via a unix domain socket. > > How about using the keyrings interface to atomically cache and retrieve these > user identities? We already have support for different types of keys that > store/retrieve different types of structured information. How is this so > different? > That's an interesting idea. -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Mar 31, 2014, at 7:51, Jeff Layton wrote: > On Sun, 30 Mar 2014 09:03:29 -0400 > "Theodore Ts'o" wrote: > >> On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote: >>> I had some time to think about this last night... >>> >>> While using a fd to pass around credentials is convenient, the danger >>> is that it's pretty opaque. You have a fd that you know has creds >>> attached to it, but it's hard to be certain what is going to change. >> >> I don't think that's a particularly tough problem. In general, the fd >> isn't something that you would want to pass around, and so the process >> which generated it will know exactly what it contained. >> > > I think there's a bit more of a use-case for passing around such an fd > via socket... > > Part of the problem is that the traditional uid/gid switching glibc > wrappers are per-process. If we're proposing doing something like: > > seteuid() > setegid() > setgroups() > fd = open() > (...and then revert the creds using same syscalls) > > ...during the time that you're doing all of that, you can't really > allow any thread in the process to be doing something that requires > _other_ creds until you've completed the above. Umm… open() isn’t the only operation that you want to be able to do with an assumed user identity. You want mknod(), mkdir(), link(), unlink(), … Pretty much any interaction with the underlying filesystem needs to use the right identity. > So, I could envision a program like ganesha firing up a separate > process to handle the credential switching and fd creation and then > handing those back to the main process via a unix domain socket. How about using the keyrings interface to atomically cache and retrieve these user identities? We already have support for different types of keys that store/retrieve different types of structured information. How is this so different? Cheers Trond _ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.mykleb...@primarydata.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Mar 31, 2014 3:45 AM, "One Thousand Gnomes" wrote: > > On Wed, 26 Mar 2014 17:23:24 -0700 > Andy Lutomirski wrote: > > > Hi various people who care about user-space NFS servers and/or > > security-relevant APIs. > > > > I propose the following set of new syscalls: > > > > int credfd_create(unsigned int flags): returns a new credfd that > > corresponds to current's creds. > > > > int credfd_activate(int fd, unsigned int flags): Change current's > > creds to match the creds stored in fd. To be clear, this changes both > > the "subjective" and "objective" (aka real_cred and cred) because > > there aren't any real semantics for what happens when userspace code > > runs with real_cred != cred. > > What is the sematic of a simultaneous ptrace racing a credfd_activate on > another processor core ? Exactly the same as for the raw syscall setresuid. That is, we require sequential consistency. If the kernel fails to provide sequential consistency, it's a bug. > > What are the rules for simultaneous threads doing I/O and and credential > changes ? None. credfd_activate affects only the calling thread. > > What is the rule for a faulting of an mmapped page in a multithreaded app > one thread of which has changed credentials ? Same as in current kernels. Note that checking credentials in readpage, etc. is generally a bug. > > Who owns a file created while you are changing credentials ? Can't happen. See above. > > > - credfd_activate fails (-EINVAL) if dumpable. This is because we > > don't want a privileged daemon to be ptraced while impersonating > > someone else. > > That's one of the obvious problems but if you have that problem then > you've got races against signals and ptrace etc to deal with. > > One way to implement it I think safely but which requires a fair bit more > work elsewhere is to apply the debug and signal type checks as > >'you may only do X if you also posess the right to do so for *all* > credentials accessible to this task' > > which I think is the correct logical check. Do we include credfd fds sitting in Unix sockets? More generally, there are plenty of cases where ptracing something might give you access to one of its fds that you shouldn't normally have access to. Even real file fds are like this, and they could come from outside a namespace, they could be a result of odd selinux policy, etc. The only real new issue is that, while using a subsidiary credential, you don't want people who couldn't ptrace you before to suddenly be able to ptrace you. This is where "subjective" and "objective" credentials came from, but the current in-kernel implementation is really half-baked. Fixing it for real will probably run into all kinds of funny corner cases involving the existing syscalls, and it would be nice to have a way to handle this that doesn't involve auditing every commit_creds caller. Hmm. What if we had initial_creds and creds, and initial_creds never changed unless explicitly requested. There wouldn't be any way to revert to initial_creds, but may_ptrace would check initial_creds *and* creds. This is a little hacky, and it might break Android (I'm not really sure how the zygote thing works), but it could do the trick. Or we can rely on dumpable, which I think can prevent unprivileged ptracers from attaching. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Sun, 30 Mar 2014 09:03:29 -0400 "Theodore Ts'o" wrote: > On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote: > > I had some time to think about this last night... > > > > While using a fd to pass around credentials is convenient, the danger > > is that it's pretty opaque. You have a fd that you know has creds > > attached to it, but it's hard to be certain what is going to change. > > I don't think that's a particularly tough problem. In general, the fd > isn't something that you would want to pass around, and so the process > which generated it will know exactly what it contained. > I think there's a bit more of a use-case for passing around such an fd via socket... Part of the problem is that the traditional uid/gid switching glibc wrappers are per-process. If we're proposing doing something like: seteuid() setegid() setgroups() fd = open() (...and then revert the creds using same syscalls) ...during the time that you're doing all of that, you can't really allow any thread in the process to be doing something that requires _other_ creds until you've completed the above. So, I could envision a program like ganesha firing up a separate process to handle the credential switching and fd creation and then handing those back to the main process via a unix domain socket. > > Perhaps we can use the flags field for that. So, assuming we have a fd > > with the creds attached, we could do something like: > > > > err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS); > > > > ...then the switch_creds syscall could be set up to fail if the new > > credentials had other fields that didn't match those in the current > > task credentials. So if (for instance) the cred->euid were > > different between the two, the above could fail with -EINVAL or > > something. > > Huh? The whole *point* is that the creds value will be different, of > course they won't match! I would think this would be over > complicating the interface. > You know that *some* of the credentials won't match, but presumably some of them (e.g. the real uid/gid) should match. Using a flags field could act as a mask of what creds are allowed to change. > > A couple of other things. What I would suggest is that we create a > few new fd flags, to join FD_CLOEXEC: > > FD_NOPROCFS disallow being able to open the inode via /proc//fd > (but in the case of a creds fd, for bonus points, the > target of the pseudo-symlink could be something like: > "uid: 15806 gid: 100: grps: 27, 50" to aid in debugging > a userspace file server). This also answers Jeff's concern > if for some reason --- I don't know how --- a process > doesn't know what the contents of the creds fd that > it created itself. > > FD_NOPASSFD disallow being able to pass the fd via a unix domain socket > > FD_LOCKFLAGS if this bit is set, disallow any further changes of FD_CLOEXEC, > FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags. > > Some of the functionality requested by the folks suggesting the "SEAL" > API would also be covered by these fd flags. > > In order to solve some potential race concerns, a credsfd must be > created with FD_CLOEXEC and FD_NOPROCFS enabled. > > Why is this important even if the anon_inode is owned by root with a > mode of 0? Because if the system is set up to use SELinux or full > Posix capabilities, merely having the a uid of 0 is not special, and > we don't want to allow a process with uid of 0 to be able modify the > mode with the /proc//fd/ and then proceed to open the inode > using open. This way, instead of adding special case code to prevent > this from happening, we can add a more general facility which can be > used to solve a few other problems. > -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Wed, 26 Mar 2014 17:23:24 -0700 Andy Lutomirski wrote: > Hi various people who care about user-space NFS servers and/or > security-relevant APIs. > > I propose the following set of new syscalls: > > int credfd_create(unsigned int flags): returns a new credfd that > corresponds to current's creds. > > int credfd_activate(int fd, unsigned int flags): Change current's > creds to match the creds stored in fd. To be clear, this changes both > the "subjective" and "objective" (aka real_cred and cred) because > there aren't any real semantics for what happens when userspace code > runs with real_cred != cred. What is the sematic of a simultaneous ptrace racing a credfd_activate on another processor core ? What are the rules for simultaneous threads doing I/O and and credential changes ? What is the rule for a faulting of an mmapped page in a multithreaded app one thread of which has changed credentials ? Who owns a file created while you are changing credentials ? > - credfd_activate fails (-EINVAL) if dumpable. This is because we > don't want a privileged daemon to be ptraced while impersonating > someone else. That's one of the obvious problems but if you have that problem then you've got races against signals and ptrace etc to deal with. One way to implement it I think safely but which requires a fair bit more work elsewhere is to apply the debug and signal type checks as 'you may only do X if you also posess the right to do so for *all* credentials accessible to this task' which I think is the correct logical check. Alan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Sun, Mar 30, 2014 at 6:03 AM, Theodore Ts'o wrote: > On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote: >> I had some time to think about this last night... >> >> While using a fd to pass around credentials is convenient, the danger >> is that it's pretty opaque. You have a fd that you know has creds >> attached to it, but it's hard to be certain what is going to change. > > I don't think that's a particularly tough problem. In general, the fd > isn't something that you would want to pass around, and so the process > which generated it will know exactly what it contained. > >> Perhaps we can use the flags field for that. So, assuming we have a fd >> with the creds attached, we could do something like: >> >> err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS); >> >> ...then the switch_creds syscall could be set up to fail if the new >> credentials had other fields that didn't match those in the current >> task credentials. So if (for instance) the cred->euid were >> different between the two, the above could fail with -EINVAL or >> something. > > Huh? The whole *point* is that the creds value will be different, of > course they won't match! I would think this would be over > complicating the interface. > > > A couple of other things. What I would suggest is that we create a > few new fd flags, to join FD_CLOEXEC: > > FD_NOPROCFS disallow being able to open the inode via /proc//fd > (but in the case of a creds fd, for bonus points, the > target of the pseudo-symlink could be something like: > "uid: 15806 gid: 100: grps: 27, 50" to aid in debugging > a userspace file server). This also answers Jeff's concern > if for some reason --- I don't know how --- a process > doesn't know what the contents of the creds fd that > it created itself. > > FD_NOPASSFD disallow being able to pass the fd via a unix domain socket > > FD_LOCKFLAGSif this bit is set, disallow any further changes of > FD_CLOEXEC, > FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags. > > Some of the functionality requested by the folks suggesting the "SEAL" > API would also be covered by these fd flags. > > In order to solve some potential race concerns, a credsfd must be > created with FD_CLOEXEC and FD_NOPROCFS enabled. It might be nice to try to coordinate this with the Capsicum people, who are doing something along these lines. They even have tentative Linux patches. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote: > I had some time to think about this last night... > > While using a fd to pass around credentials is convenient, the danger > is that it's pretty opaque. You have a fd that you know has creds > attached to it, but it's hard to be certain what is going to change. I don't think that's a particularly tough problem. In general, the fd isn't something that you would want to pass around, and so the process which generated it will know exactly what it contained. > Perhaps we can use the flags field for that. So, assuming we have a fd > with the creds attached, we could do something like: > > err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS); > > ...then the switch_creds syscall could be set up to fail if the new > credentials had other fields that didn't match those in the current > task credentials. So if (for instance) the cred->euid were > different between the two, the above could fail with -EINVAL or > something. Huh? The whole *point* is that the creds value will be different, of course they won't match! I would think this would be over complicating the interface. A couple of other things. What I would suggest is that we create a few new fd flags, to join FD_CLOEXEC: FD_NOPROCFS disallow being able to open the inode via /proc//fd (but in the case of a creds fd, for bonus points, the target of the pseudo-symlink could be something like: "uid: 15806 gid: 100: grps: 27, 50" to aid in debugging a userspace file server). This also answers Jeff's concern if for some reason --- I don't know how --- a process doesn't know what the contents of the creds fd that it created itself. FD_NOPASSFD disallow being able to pass the fd via a unix domain socket FD_LOCKFLAGSif this bit is set, disallow any further changes of FD_CLOEXEC, FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags. Some of the functionality requested by the folks suggesting the "SEAL" API would also be covered by these fd flags. In order to solve some potential race concerns, a credsfd must be created with FD_CLOEXEC and FD_NOPROCFS enabled. Why is this important even if the anon_inode is owned by root with a mode of 0? Because if the system is set up to use SELinux or full Posix capabilities, merely having the a uid of 0 is not special, and we don't want to allow a process with uid of 0 to be able modify the mode with the /proc//fd/ and then proceed to open the inode using open. This way, instead of adding special case code to prevent this from happening, we can add a more general facility which can be used to solve a few other problems. Regards, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
Jeff Layton wrote: > On Wed, 26 Mar 2014 20:25:35 -0700 > Jeff Layton wrote: > >> On Wed, 26 Mar 2014 20:05:16 -0700 >> Andy Lutomirski wrote: >> >> > On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton >> > wrote: >> > > On Wed, 26 Mar 2014 17:23:24 -0700 >> > > Andy Lutomirski wrote: >> > > >> > >> Hi various people who care about user-space NFS servers and/or >> > >> security-relevant APIs. >> > >> >> > >> I propose the following set of new syscalls: >> > >> >> > >> int credfd_create(unsigned int flags): returns a new credfd that >> > >> corresponds to current's creds. >> > >> >> > >> int credfd_activate(int fd, unsigned int flags): Change current's >> > >> creds to match the creds stored in fd. To be clear, this changes >> > >> both the "subjective" and "objective" (aka real_cred and cred) >> > >> because there aren't any real semantics for what happens when >> > >> userspace code runs with real_cred != cred. >> > >> >> > >> Rules: >> > >> >> > >> - credfd_activate fails (-EINVAL) if fd is not a credfd. >> > >> - credfd_activate fails (-EPERM) if the fd's userns doesn't >> > >> match current's userns. credfd_activate is not intended to be a >> > >> substitute for setns. >> > >> - credfd_activate will fail (-EPERM) if LSM does not allow the >> > >> switch. This probably needs to be a new selinux action -- >> > >> dyntransition is too restrictive. >> > >> >> > >> >> > >> Optional: >> > >> - credfd_create always sets cloexec, because the alternative is >> > >> silly. >> > >> - credfd_activate fails (-EINVAL) if dumpable. This is because >> > >> we don't want a privileged daemon to be ptraced while >> > >> impersonating someone else. >> > >> - optional: both credfd_create and credfd_activate fail if >> > >> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID). >> > >> >> > >> The first question: does this solve Ganesha's problem? >> > >> >> > >> The second question: is this safe? I can see two major concerns. >> > >> The bigger concern is that having these syscalls available will >> > >> allow users to exploit things that were previously secure. For >> > >> example, maybe some configuration assumes that a task running as >> > >> uid==1 can't switch to uid==2, even with uid 2's consent. >> > >> Similar issues happen with capabilities. If CAP_SYS_ADMIN is not >> > >> required, then this is no longer really true. >> > >> >> > >> Alternatively, something running as uid == 0 with heavy >> > >> capability restrictions in a mount namespace (but not a uid >> > >> namespace) could pass a credfd out of the namespace. This could >> > >> break things like Docker pretty badly. CAP_SYS_ADMIN guards >> > >> against this to some extent. But I think that Docker is already >> > >> totally screwed if a Docker root task can receive an O_DIRECTORY >> > >> or O_PATH fd out of the container, so it's not entirely clear >> > >> that the situation is any worse, even without requiring >> > >> CAP_SYS_ADMIN. >> > >> >> > >> The second concern is that it may be difficult to use this >> > >> correctly. There's a reason that real_cred and cred exist, but >> > >> it's not really well set up for being used. >> > >> >> > >> As a simple way to stay safe, Ganesha could only use credfds that >> > >> have real_uid == 0. >> > >> >> > >> --Andy >> > > >> > > >> > > I still don't quite grok why having this special credfd_create >> > > call buys you anything over simply doing what Al had originally >> > > suggested -- switch creds using all of the different syscalls and >> > > then simply caching that in a "normal" fd: >> > > >> > > fd = open("/dev/null", O_PATH...); >> > > >> > > ...it seems to me that the credfd_activate call will still need to >> > > do the same permission checking that all of the individual >> > > set*id() calls require (and all of the other stuff like changing >> > > selinux contexts, etc). >> > > >> > > IOW, this fd is just a "handle" for passing around a struct cred, >> > > but I don't see why having access to that handle would allow you >> > > to do something you couldn't already do anyway. >> > > >> > > Am I missing something obvious here? >> > >> > Not really. I think I didn't adequately explain a piece of this. >> > >> > I think that what you're suggesting is for an fd to encode a set of >> > credentials but not to grant permission to use those credentials. >> > So switch_creds(fd) is more or less the same thing as >> > switch_creds(ruid, euid, suid, rgid, egid, sgid, groups, mac >> > label, ...). switch_creds needs to verify that the caller can >> > dyntransition to the label, set all the ids, etc., but it avoids >> > allocating anything and running RCU callbacks. >> > >> > The trouble with this is that the verification needed is complicated >> > and expensive. And I think that my proposal is potentially more >> > useful. >> > >> >> Is it really though? My understanding of the problem was that it was >> the syscall (context switching) overhead + having to do a bunch of RCU >> critical
Re: Re: Re: Thoughts on credential switching
On Thu, Mar 27, 2014 at 1:47 PM, Jim Lieb wrote: > On Thursday, March 27, 2014 12:45:30 Andy Lutomirski wrote: >> On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb wrote: >> > Rather than inline, I'm responding in the context of Jeremy's comments but >> > I have to answer others as well. It is Jeremy after all who said my baby >> > was ugly ;). >> > >> > Jeremy is right about overloading "fd". Maybe I can call it something >> > else >> > but an fd (in implementation) has merit because a creds struct hangs off >> > of >> > either/or/both task_struct and file but nothing else. Coming up with a >> > creds token struct adds/intros another struct to the kernel that has to >> > justify its existence. This fd points to an anon inode so the resource >> > consumption is pretty low and all the paths for managing it are well >> > known and *work*. I'm trying to get across the swamp, not build a new >> > Golden Gate bridge... >> > >> > As for POSIX, all of the pthreads API was based on CDE threads whose >> > initial implementation was on Ultrix (BSD 4.2). Process wide was assumed >> > because utheeads was a user space hack and setuid was process wide >> > because a proc was a just that. Good grief, that kernel was UP... When >> > OSF/1 appeared, the Mach 2.5 kernel just carried that forward with its >> > proc + thread(s) model to be compatible with the old world. In other >> > words, we incrementally backed ourselves into a corner. Declaring POSIX >> > now avoids admitting that we didn't see all that far into the future. >> > Enough said. These calls are *outside* POSIX. Pthreads in 2014 is >> > broken/obsolete. >> > >> > For the interface, I changed it from what is in the cited lkml below. It >> > is> >> > now: >> >int switch_creds(struct user_creds *creds); >> >> What is struct user_creds? And why is this called switch_creds? It >> doesn't switch anything. > > Sorry, it was in the cited lkml... It is: > > struct user_creds { > uid_t fsuid; > gid_t fsgid; > int ngroups; > gid_t altgroups[]; > }; > > ngroups is limited by NGROUPS. > > It is called switch_creds because it does switch them to the contents of > creds. When you return from switch_creds "bob" your fsuid is "Bob". The > return value is a handle to those creds so the next time you need to be "Bob" > you can just: > > use_creds(bob_creds); > >> >> > Behavior is the same except the mux'd syscall idea is gone. Adding a >> > flags arg to this is a useful idea both for current use and future >> > proofing the API. But this requires a second call >> > >> >int use_creds(int fd); >> > >> > This call does the "use an fd" case but adds -1 to revert to real creds. >> > Its guts are either an override_creds(fd->file->creds) or a >> > revert_creds(). Nice and quick. Note that use_creds(fd) is used if I >> > have cached the fd/token from switch_creds(). Also nice and quick. >> > >> > Given that I've done the checking in switch_creds and I don't pass >> > anything >> > back other than the token/fd and this token/fd is/will be confined to the >> > thread group, use_creds(fd) only needs to validate that it is a >> > switch_creds one, not from an arbitrary open(). I do this. >> >> Not so fast... what if you start privileged, create a cred fd, call >> unshare, do a dyntransition, chroot, drop privileges, and call >> use_creds? I don't immediately see why this is insecure, but having >> it be secure seems to be more or less the same condition as having my >> credfd_activate be secure. >> >> And I still don't see why you need revert at all. Just create a >> second token/fd/whatever representing your initial creds and switch to >> that. > > The creds you get have also had the capabilities reduced. You can't do a > chroot because it will EPERM. Same with a lot of others. This is a > restricted file access jail in itself. We *reduce* privs+caps here. Why not just leave the capabilities as-is? The caller will have to remove CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH from the effective set first, but that's okay. > > The use_creds has a shortcut of -1 to revert to real creds. So if you do a > series of these, how do you get back to your priv'd state? You can either > keep track of that yourself or just pass in -1 to use_creds() and know you > are... int fully_privileged_cred_fd; >> >> > I have focused this down to "fsuid" because I intend this for ops that >> > file >> > perms. Other stuff is out of scope unless someone can come up with a use >> > case and add flag defs... The other variations on the theme uid, euid, >> > and that other one I don't understand the use for, are for long lasting >> > creds change, i.e. become user "bob" and go off an fork a shell... I am >> > wrapping I/O. >> Isn't there euid for that? > > Right. A crappy interface given that creds are really more than just euid or > even egid but it does that deed just fine. It is for long term as in "become > Dave and run this program..."
Re: Re: Re: Thoughts on credential switching
On Thursday, March 27, 2014 12:45:30 Andy Lutomirski wrote: > On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb wrote: > > Rather than inline, I'm responding in the context of Jeremy's comments but > > I have to answer others as well. It is Jeremy after all who said my baby > > was ugly ;). > > > > Jeremy is right about overloading "fd". Maybe I can call it something > > else > > but an fd (in implementation) has merit because a creds struct hangs off > > of > > either/or/both task_struct and file but nothing else. Coming up with a > > creds token struct adds/intros another struct to the kernel that has to > > justify its existence. This fd points to an anon inode so the resource > > consumption is pretty low and all the paths for managing it are well > > known and *work*. I'm trying to get across the swamp, not build a new > > Golden Gate bridge... > > > > As for POSIX, all of the pthreads API was based on CDE threads whose > > initial implementation was on Ultrix (BSD 4.2). Process wide was assumed > > because utheeads was a user space hack and setuid was process wide > > because a proc was a just that. Good grief, that kernel was UP... When > > OSF/1 appeared, the Mach 2.5 kernel just carried that forward with its > > proc + thread(s) model to be compatible with the old world. In other > > words, we incrementally backed ourselves into a corner. Declaring POSIX > > now avoids admitting that we didn't see all that far into the future. > > Enough said. These calls are *outside* POSIX. Pthreads in 2014 is > > broken/obsolete. > > > > For the interface, I changed it from what is in the cited lkml below. It > > is> > > now: > >int switch_creds(struct user_creds *creds); > > What is struct user_creds? And why is this called switch_creds? It > doesn't switch anything. Sorry, it was in the cited lkml... It is: struct user_creds { uid_t fsuid; gid_t fsgid; int ngroups; gid_t altgroups[]; }; ngroups is limited by NGROUPS. It is called switch_creds because it does switch them to the contents of creds. When you return from switch_creds "bob" your fsuid is "Bob". The return value is a handle to those creds so the next time you need to be "Bob" you can just: use_creds(bob_creds); > > > Behavior is the same except the mux'd syscall idea is gone. Adding a > > flags arg to this is a useful idea both for current use and future > > proofing the API. But this requires a second call > > > >int use_creds(int fd); > > > > This call does the "use an fd" case but adds -1 to revert to real creds. > > Its guts are either an override_creds(fd->file->creds) or a > > revert_creds(). Nice and quick. Note that use_creds(fd) is used if I > > have cached the fd/token from switch_creds(). Also nice and quick. > > > > Given that I've done the checking in switch_creds and I don't pass > > anything > > back other than the token/fd and this token/fd is/will be confined to the > > thread group, use_creds(fd) only needs to validate that it is a > > switch_creds one, not from an arbitrary open(). I do this. > > Not so fast... what if you start privileged, create a cred fd, call > unshare, do a dyntransition, chroot, drop privileges, and call > use_creds? I don't immediately see why this is insecure, but having > it be secure seems to be more or less the same condition as having my > credfd_activate be secure. > > And I still don't see why you need revert at all. Just create a > second token/fd/whatever representing your initial creds and switch to > that. The creds you get have also had the capabilities reduced. You can't do a chroot because it will EPERM. Same with a lot of others. This is a restricted file access jail in itself. We *reduce* privs+caps here. The use_creds has a shortcut of -1 to revert to real creds. So if you do a series of these, how do you get back to your priv'd state? You can either keep track of that yourself or just pass in -1 to use_creds() and know you are... > > > I have focused this down to "fsuid" because I intend this for ops that > > file > > perms. Other stuff is out of scope unless someone can come up with a use > > case and add flag defs... The other variations on the theme uid, euid, > > and that other one I don't understand the use for, are for long lasting > > creds change, i.e. become user "bob" and go off an fork a shell... I am > > wrapping I/O. > Isn't there euid for that? Right. A crappy interface given that creds are really more than just euid or even egid but it does that deed just fine. It is for long term as in "become Dave and run this program..." I don't change that. I'm dealing with "As Bob, do a pwrite(s)+fstat" across N cores and M ops per core... > > > I do not like the idea of spinning off a separate proc to do creds work. > > It doesn't buy anything in performance (everybody is a task to the > > kernel) but it does open a door to scary places. Jeremy and I agree that > > this
Re: Re: Thoughts on credential switching
On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb wrote: > Rather than inline, I'm responding in the context of Jeremy's comments but I > have to answer others as well. It is Jeremy after all who said my baby was > ugly ;). > > Jeremy is right about overloading "fd". Maybe I can call it something else > but an fd (in implementation) has merit because a creds struct hangs off of > either/or/both task_struct and file but nothing else. Coming up with a creds > token struct adds/intros another struct to the kernel that has to justify its > existence. This fd points to an anon inode so the resource consumption is > pretty low and all the paths for managing it are well known and *work*. I'm > trying to get across the swamp, not build a new Golden Gate bridge... > > As for POSIX, all of the pthreads API was based on CDE threads whose initial > implementation was on Ultrix (BSD 4.2). Process wide was assumed because > utheeads was a user space hack and setuid was process wide because a proc was > a just that. Good grief, that kernel was UP... When OSF/1 appeared, the Mach > 2.5 kernel just carried that forward with its proc + thread(s) model to be > compatible with the old world. In other words, we incrementally backed > ourselves into a corner. Declaring POSIX now avoids admitting that we didn't > see all that far into the future. Enough said. These calls are *outside* > POSIX. Pthreads in 2014 is broken/obsolete. > > For the interface, I changed it from what is in the cited lkml below. It is > now: > >int switch_creds(struct user_creds *creds); What is struct user_creds? And why is this called switch_creds? It doesn't switch anything. > > Behavior is the same except the mux'd syscall idea is gone. Adding a flags > arg > to this is a useful idea both for current use and future proofing the API. > But > this requires a second call > >int use_creds(int fd); > > This call does the "use an fd" case but adds -1 to revert to real creds. Its > guts are either an override_creds(fd->file->creds) or a revert_creds(). Nice > and quick. Note that use_creds(fd) is used if I have cached the fd/token from > switch_creds(). Also nice and quick. > > Given that I've done the checking in switch_creds and I don't pass anything > back other than the token/fd and this token/fd is/will be confined to the > thread group, use_creds(fd) only needs to validate that it is a switch_creds > one, not from an arbitrary open(). I do this. Not so fast... what if you start privileged, create a cred fd, call unshare, do a dyntransition, chroot, drop privileges, and call use_creds? I don't immediately see why this is insecure, but having it be secure seems to be more or less the same condition as having my credfd_activate be secure. And I still don't see why you need revert at all. Just create a second token/fd/whatever representing your initial creds and switch to that. > > I have focused this down to "fsuid" because I intend this for ops that file > perms. Other stuff is out of scope unless someone can come up with a use case > and add flag defs... The other variations on the theme uid, euid, and that > other one I don't understand the use for, are for long lasting creds change, > i.e. become user "bob" and go off an fork a shell... I am wrapping I/O. Isn't there euid for that? > > I do not like the idea of spinning off a separate proc to do creds work. It > doesn't buy anything in performance (everybody is a task to the kernel) but it > does open a door to scary places. Jeremy and I agree that this token/fd must > stay within the thread group, aka, process. I have already (in the newer > patchset) tied off inheritance by opening the anon fd with close-on-exec. I > think I tied off passing the fd thru a unix socket but if not, I will in my > next submission. This fd/token should stay within the thread group, period. Maybe I'm uniquely not scared of adding sane interfaces. setuid(2) is insane. Impersonating a token is quite sane and even has lots of prior art. > > As to the "get an fd and then do set*id", you have to do this twice because > that fd gets the creds at the time of open, not after fooling around. I am > trying to avoid multiple RCU cycles, not add more. Second, the above path > makes the creds switch atomic because I use the creds swapping infrastructure. > Popping back up to user space before that *all* happens opens all kinds of > ptrace+signal+??? holes. I assume you're planning on caching these things. So spending some cycles setting this thing up shouldn't matter much. If you want to add a totally separate syscall setresfsuidgidgroupscaps, be my guest :) It would actually be generally useful. > > Note also that I mask caps as well in the override creds including the caps to > do things like set*id. That is intentional. This whole idea is to constrain > the thread, not open a door *but* still provide a way to get back home > (safely). That is via use_creds(-1). > > Some ha
Re: Re: Thoughts on credential switching
Rather than inline, I'm responding in the context of Jeremy's comments but I have to answer others as well. It is Jeremy after all who said my baby was ugly ;). Jeremy is right about overloading "fd". Maybe I can call it something else but an fd (in implementation) has merit because a creds struct hangs off of either/or/both task_struct and file but nothing else. Coming up with a creds token struct adds/intros another struct to the kernel that has to justify its existence. This fd points to an anon inode so the resource consumption is pretty low and all the paths for managing it are well known and *work*. I'm trying to get across the swamp, not build a new Golden Gate bridge... As for POSIX, all of the pthreads API was based on CDE threads whose initial implementation was on Ultrix (BSD 4.2). Process wide was assumed because utheeads was a user space hack and setuid was process wide because a proc was a just that. Good grief, that kernel was UP... When OSF/1 appeared, the Mach 2.5 kernel just carried that forward with its proc + thread(s) model to be compatible with the old world. In other words, we incrementally backed ourselves into a corner. Declaring POSIX now avoids admitting that we didn't see all that far into the future. Enough said. These calls are *outside* POSIX. Pthreads in 2014 is broken/obsolete. For the interface, I changed it from what is in the cited lkml below. It is now: int switch_creds(struct user_creds *creds); Behavior is the same except the mux'd syscall idea is gone. Adding a flags arg to this is a useful idea both for current use and future proofing the API. But this requires a second call int use_creds(int fd); This call does the "use an fd" case but adds -1 to revert to real creds. Its guts are either an override_creds(fd->file->creds) or a revert_creds(). Nice and quick. Note that use_creds(fd) is used if I have cached the fd/token from switch_creds(). Also nice and quick. Given that I've done the checking in switch_creds and I don't pass anything back other than the token/fd and this token/fd is/will be confined to the thread group, use_creds(fd) only needs to validate that it is a switch_creds one, not from an arbitrary open(). I do this. I have focused this down to "fsuid" because I intend this for ops that file perms. Other stuff is out of scope unless someone can come up with a use case and add flag defs... The other variations on the theme uid, euid, and that other one I don't understand the use for, are for long lasting creds change, i.e. become user "bob" and go off an fork a shell... I am wrapping I/O. I do not like the idea of spinning off a separate proc to do creds work. It doesn't buy anything in performance (everybody is a task to the kernel) but it does open a door to scary places. Jeremy and I agree that this token/fd must stay within the thread group, aka, process. I have already (in the newer patchset) tied off inheritance by opening the anon fd with close-on-exec. I think I tied off passing the fd thru a unix socket but if not, I will in my next submission. This fd/token should stay within the thread group, period. As to the "get an fd and then do set*id", you have to do this twice because that fd gets the creds at the time of open, not after fooling around. I am trying to avoid multiple RCU cycles, not add more. Second, the above path makes the creds switch atomic because I use the creds swapping infrastructure. Popping back up to user space before that *all* happens opens all kinds of ptrace+signal+??? holes. Note also that I mask caps as well in the override creds including the caps to do things like set*id. That is intentional. This whole idea is to constrain the thread, not open a door *but* still provide a way to get back home (safely). That is via use_creds(-1). Some have proposed that we personalize a worker thread (the rpc op processor) to the creds of the client user right off. Ganesha only needs to do this user creds work for VFS (kernel local) filesystems. Most of our cluster filesystems have apis that allow us to pass creds directly on calls. We only need this for that local mounted namespace. The server core owns rpc and ops, the backend (FSAL) owns the shim layer. User creds are backend... Having a personalized thread complicates the core. As I mentioned at LSF, I have another set pending that needs a bit more polish to answer issues from the last cycle. I need to fix the issue of handling syscalls that would do their own creds swapping inside the switch_creds ... use_creds(-1) region. The patch causes these syscalls, e.g. setuid() to EPERM. Again, I like this because I want the client creds impersonating thread to only be able to do I/O, not escape into the wild. Jim On Thursday, March 27, 2014 11:26:17 Jeremy Allison wrote: > On Thu, Mar 27, 2014 at 07:01:26AM -0700, Jeff Layton wrote: > > On Thu, 27 Mar 2014 14:06:32 +0100 >
Re: Thoughts on credential switching
On Thu, Mar 27, 2014 at 11:56 AM, Jeremy Allison wrote: > On Thu, Mar 27, 2014 at 11:46:39AM -0700, Andy Lutomirski wrote: >> On Thu, Mar 27, 2014 at 11:26 AM, Jeremy Allison wrote: >> > >> > Amen to that :-). >> > >> > However, after talking with Jeff and Jim at CollabSummit, >> > I was 'encouraged' to make my opinions known on the list. >> > >> > To me, calling the creds handle a file descriptor just >> > feels wrong. IT *isn't* an fd, you can't read/write/poll >> > on it, and it's only done as a convenience to get the >> > close-on-exec semantics and the fact that the creds are >> > already hung off the fd's in kernel space. >> >> Windows calls these things "handles." Linux has "file descriptors," >> and there's plenty of precedent for things that aren't files. > > Sure, but there's a set of expectations around > fd's that these things don't satisfy - IO-ops. eventfd, timerfd, and signalfd barely satisfy those. namespace fds don't satisfy those expectations at all. And /proc/pid/fd is really quite useful for debugging. > >> > That way we can also make it clear this thing only has >> > meaning to a thread group, and SHOULD NOT (and indeed >> > preferably CAN NOT) be passed between processes. >> > >> >> If you want those semantics, then stick a struct pid * in there for >> the tgid of the cretor and make sure that current's tgid matches when >> you try to use it. >> >> I think they'd be more useful without that check, though. > > I'm more worried about leakage and unintended consequences > here. > >> BTW, what do you want to have happen on fork? I think they should keep >> working. > > Yeah, that's true. I want them to keep > working across fork, but not across exec > or any other method of fd-passing. > This seems like an unfortunate restriction to put in the kernel to prevent userspace from shooting itself in the foot. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Thu, Mar 27, 2014 at 11:46:39AM -0700, Andy Lutomirski wrote: > On Thu, Mar 27, 2014 at 11:26 AM, Jeremy Allison wrote: > > > > Amen to that :-). > > > > However, after talking with Jeff and Jim at CollabSummit, > > I was 'encouraged' to make my opinions known on the list. > > > > To me, calling the creds handle a file descriptor just > > feels wrong. IT *isn't* an fd, you can't read/write/poll > > on it, and it's only done as a convenience to get the > > close-on-exec semantics and the fact that the creds are > > already hung off the fd's in kernel space. > > Windows calls these things "handles." Linux has "file descriptors," > and there's plenty of precedent for things that aren't files. Sure, but there's a set of expectations around fd's that these things don't satisfy - IO-ops. > > That way we can also make it clear this thing only has > > meaning to a thread group, and SHOULD NOT (and indeed > > preferably CAN NOT) be passed between processes. > > > > If you want those semantics, then stick a struct pid * in there for > the tgid of the cretor and make sure that current's tgid matches when > you try to use it. > > I think they'd be more useful without that check, though. I'm more worried about leakage and unintended consequences here. > BTW, what do you want to have happen on fork? I think they should keep > working. Yeah, that's true. I want them to keep working across fork, but not across exec or any other method of fd-passing. Jeremy. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Thu, Mar 27, 2014 at 11:26 AM, Jeremy Allison wrote: > > Amen to that :-). > > However, after talking with Jeff and Jim at CollabSummit, > I was 'encouraged' to make my opinions known on the list. > > To me, calling the creds handle a file descriptor just > feels wrong. IT *isn't* an fd, you can't read/write/poll > on it, and it's only done as a convenience to get the > close-on-exec semantics and the fact that the creds are > already hung off the fd's in kernel space. Windows calls these things "handles." Linux has "file descriptors," and there's plenty of precedent for things that aren't files. > > I'd rather any creads call use a different type, even if > it's a typedef of 'int -> creds_handle_t', just to make > it really clear it's *NOT* an fd. > > That way we can also make it clear this thing only has > meaning to a thread group, and SHOULD NOT (and indeed > preferably CAN NOT) be passed between processes. > If you want those semantics, then stick a struct pid * in there for the tgid of the cretor and make sure that current's tgid matches when you try to use it. I think they'd be more useful without that check, though. BTW, what do you want to have happen on fork? I think they should keep working. > Cheers, > > Jeremy. -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Thu, Mar 27, 2014 at 07:01:26AM -0700, Jeff Layton wrote: > On Thu, 27 Mar 2014 14:06:32 +0100 > Florian Weimer wrote: > > > On 03/27/2014 02:02 PM, Jeff Layton wrote: > > > > >> This interface does not address the long-term lack of POSIX > > >> compliance in setuid and friends, which are required to be > > >> process-global and not thread-specific (as they are on the kernel > > >> side). > > >> > > >> glibc works around this by reserving a signal and running set*id on > > >> every thread in a special signal handler. This is just crass, and > > >> it is likely impossible to restore the original process state in > > >> case of partial failure. We really need kernel support to perform > > >> the process-wide switch in an all-or-nothing manner. > > >> > > > > > > I disagree. We're treading new ground here with this syscall. It's > > > not defined by POSIX so we're under no obligation to follow its > > > silly directives in this regard. Per-process cred switching doesn't > > > really make much sense in this day and age, IMO. Wasn't part of the > > > spec was written before threading existed > > > > Okay, then we need to add a separate set of system calls. > > > > I really, really want to get rid of that signal handler mess in > > glibc, with its partial failures. > > > > I agree, it's a hack, but hardly anyone these days really wants to > switch creds on a per-process basis. It's just that we're saddled with > a spec for those calls that was written before threads really existed. > > The kernel syscalls already do the right thing as far as I'm concerned. > What would be nice however is a blessed glibc interface to them > that didn't involve all of the signal handling stuff. Then samba et. al. > wouldn't need to call syscall() directly to get at them. Amen to that :-). However, after talking with Jeff and Jim at CollabSummit, I was 'encouraged' to make my opinions known on the list. To me, calling the creds handle a file descriptor just feels wrong. IT *isn't* an fd, you can't read/write/poll on it, and it's only done as a convenience to get the close-on-exec semantics and the fact that the creds are already hung off the fd's in kernel space. I'd rather any creads call use a different type, even if it's a typedef of 'int -> creds_handle_t', just to make it really clear it's *NOT* an fd. That way we can also make it clear this thing only has meaning to a thread group, and SHOULD NOT (and indeed preferably CAN NOT) be passed between processes. Cheers, Jeremy. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Thu, Mar 27, 2014 at 8:41 AM, Florian Weimer wrote: > On 03/27/2014 02:01 AM, Andy Lutomirski wrote: > >> Essentially, it's a performance problem. knfsd has override_creds, >> and it can cache struct cred. But userspace doing the same thing >> (i.e. impersonating a user) has to do setresuid, setresgid, and >> setgroups, which kills performance, since it results in something like >> five RCU callbacks per impersonation round-trip. > > > Do you mean setfsuid instead of setresuid? No, but I should have given context. setfsuid sucks, since its return value is garbage. It's also pointless nowadays, since setresuid can be used to sanely change only the effective uid, and doing so should be secure. See: http://article.gmane.org/gmane.linux.kernel/1540880 --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On 03/27/2014 02:01 AM, Andy Lutomirski wrote: Essentially, it's a performance problem. knfsd has override_creds, and it can cache struct cred. But userspace doing the same thing (i.e. impersonating a user) has to do setresuid, setresgid, and setgroups, which kills performance, since it results in something like five RCU callbacks per impersonation round-trip. Do you mean setfsuid instead of setresuid? -- Florian Weimer / Red Hat Product Security Team -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Wed, 26 Mar 2014 20:25:35 -0700 Jeff Layton wrote: > On Wed, 26 Mar 2014 20:05:16 -0700 > Andy Lutomirski wrote: > > > On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton > > wrote: > > > On Wed, 26 Mar 2014 17:23:24 -0700 > > > Andy Lutomirski wrote: > > > > > >> Hi various people who care about user-space NFS servers and/or > > >> security-relevant APIs. > > >> > > >> I propose the following set of new syscalls: > > >> > > >> int credfd_create(unsigned int flags): returns a new credfd that > > >> corresponds to current's creds. > > >> > > >> int credfd_activate(int fd, unsigned int flags): Change current's > > >> creds to match the creds stored in fd. To be clear, this changes > > >> both the "subjective" and "objective" (aka real_cred and cred) > > >> because there aren't any real semantics for what happens when > > >> userspace code runs with real_cred != cred. > > >> > > >> Rules: > > >> > > >> - credfd_activate fails (-EINVAL) if fd is not a credfd. > > >> - credfd_activate fails (-EPERM) if the fd's userns doesn't > > >> match current's userns. credfd_activate is not intended to be a > > >> substitute for setns. > > >> - credfd_activate will fail (-EPERM) if LSM does not allow the > > >> switch. This probably needs to be a new selinux action -- > > >> dyntransition is too restrictive. > > >> > > >> > > >> Optional: > > >> - credfd_create always sets cloexec, because the alternative is > > >> silly. > > >> - credfd_activate fails (-EINVAL) if dumpable. This is because > > >> we don't want a privileged daemon to be ptraced while > > >> impersonating someone else. > > >> - optional: both credfd_create and credfd_activate fail if > > >> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID). > > >> > > >> The first question: does this solve Ganesha's problem? > > >> > > >> The second question: is this safe? I can see two major concerns. > > >> The bigger concern is that having these syscalls available will > > >> allow users to exploit things that were previously secure. For > > >> example, maybe some configuration assumes that a task running as > > >> uid==1 can't switch to uid==2, even with uid 2's consent. > > >> Similar issues happen with capabilities. If CAP_SYS_ADMIN is not > > >> required, then this is no longer really true. > > >> > > >> Alternatively, something running as uid == 0 with heavy > > >> capability restrictions in a mount namespace (but not a uid > > >> namespace) could pass a credfd out of the namespace. This could > > >> break things like Docker pretty badly. CAP_SYS_ADMIN guards > > >> against this to some extent. But I think that Docker is already > > >> totally screwed if a Docker root task can receive an O_DIRECTORY > > >> or O_PATH fd out of the container, so it's not entirely clear > > >> that the situation is any worse, even without requiring > > >> CAP_SYS_ADMIN. > > >> > > >> The second concern is that it may be difficult to use this > > >> correctly. There's a reason that real_cred and cred exist, but > > >> it's not really well set up for being used. > > >> > > >> As a simple way to stay safe, Ganesha could only use credfds that > > >> have real_uid == 0. > > >> > > >> --Andy > > > > > > > > > I still don't quite grok why having this special credfd_create > > > call buys you anything over simply doing what Al had originally > > > suggested -- switch creds using all of the different syscalls and > > > then simply caching that in a "normal" fd: > > > > > > fd = open("/dev/null", O_PATH...); > > > > > > ...it seems to me that the credfd_activate call will still need to > > > do the same permission checking that all of the individual > > > set*id() calls require (and all of the other stuff like changing > > > selinux contexts, etc). > > > > > > IOW, this fd is just a "handle" for passing around a struct cred, > > > but I don't see why having access to that handle would allow you > > > to do something you couldn't already do anyway. > > > > > > Am I missing something obvious here? > > > > Not really. I think I didn't adequately explain a piece of this. > > > > I think that what you're suggesting is for an fd to encode a set of > > credentials but not to grant permission to use those credentials. > > So switch_creds(fd) is more or less the same thing as > > switch_creds(ruid, euid, suid, rgid, egid, sgid, groups, mac > > label, ...). switch_creds needs to verify that the caller can > > dyntransition to the label, set all the ids, etc., but it avoids > > allocating anything and running RCU callbacks. > > > > The trouble with this is that the verification needed is complicated > > and expensive. And I think that my proposal is potentially more > > useful. > > > > Is it really though? My understanding of the problem was that it was > the syscall (context switching) overhead + having to do a bunch of RCU > critical stuff that was the problem. If we can do all of this in the > context of a single RCU critical section, isn't that still a win? > > A
Re: Thoughts on credential switching
On Thu, 27 Mar 2014 14:06:32 +0100 Florian Weimer wrote: > On 03/27/2014 02:02 PM, Jeff Layton wrote: > > >> This interface does not address the long-term lack of POSIX > >> compliance in setuid and friends, which are required to be > >> process-global and not thread-specific (as they are on the kernel > >> side). > >> > >> glibc works around this by reserving a signal and running set*id on > >> every thread in a special signal handler. This is just crass, and > >> it is likely impossible to restore the original process state in > >> case of partial failure. We really need kernel support to perform > >> the process-wide switch in an all-or-nothing manner. > >> > > > > I disagree. We're treading new ground here with this syscall. It's > > not defined by POSIX so we're under no obligation to follow its > > silly directives in this regard. Per-process cred switching doesn't > > really make much sense in this day and age, IMO. Wasn't part of the > > spec was written before threading existed > > Okay, then we need to add a separate set of system calls. > > I really, really want to get rid of that signal handler mess in > glibc, with its partial failures. > I agree, it's a hack, but hardly anyone these days really wants to switch creds on a per-process basis. It's just that we're saddled with a spec for those calls that was written before threads really existed. The kernel syscalls already do the right thing as far as I'm concerned. What would be nice however is a blessed glibc interface to them that didn't involve all of the signal handling stuff. Then samba et. al. wouldn't need to call syscall() directly to get at them. > > The per-process credential switching is pretty universally a pain in > > the ass for anyone who wants to write something like a threaded file > > server. Jeremy Allison had to jump through some rather major hoops > > to work around it for Samba [1]. I think we want to strive to make > > this a per-task credential switch and ignore that part of the POSIX > > spec. > > Yeah, I get that, setfsuid/setfsgid already behaves this way. > > (Current directory and umask are equally problematic, but it's > possible to avoid most issues.) > > > That said, I think we will need to understand and document what we > > expect to occur if someone does this new switch_creds(fd) call and > > then subsequently calls something like setuid(), if only to ensure > > that we don't get blindsided by it. > > Currently, from the kernel perspective, this is not really a problem > because the credentials are always per-task. It's just that a > conforming user space needs the process-wide credentials. > -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On 03/27/2014 03:06 PM, Florian Weimer wrote: > On 03/27/2014 02:02 PM, Jeff Layton wrote: > >>> This interface does not address the long-term lack of POSIX >>> compliance in setuid and friends, which are required to be >>> process-global and not thread-specific (as they are on the kernel >>> side). >>> >>> glibc works around this by reserving a signal and running set*id on >>> every thread in a special signal handler. This is just crass, and it >>> is likely impossible to restore the original process state in case of >>> partial failure. We really need kernel support to perform the >>> process-wide switch in an all-or-nothing manner. >>> >> >> I disagree. We're treading new ground here with this syscall. It's >> not defined by POSIX so we're under no obligation to follow its silly >> directives in this regard. Per-process cred switching doesn't really >> make much sense in this day and age, IMO. Wasn't part of the spec was >> written before threading existed > > Okay, then we need to add a separate set of system calls. > separate set of system calls is exactly what we do not need. The syscalls are all per thread. What we do need is a new set of glibc APIs to follow what everyone who cares does now. Because no body uses the current code - setuid and friends - The huge majority of utilities are just single threaded and could careless. The few threaded apps that do use setuid and friends use the SYSCALL directly. Both samba and Ganesha have their own define of setthread_uid/setthread_gid/ setthread_groops. and avoid glibc as a plague. man setuid should be saying DEPRECATED, EMULATED and SIGNAL NOT SAFE and be done with it POSIX or no POSIX who cares? and a new well define set is to be defined by glibc implemented easily not only on Linux (MACOX FreeBSD are the same as linux) And that is that > I really, really want to get rid of that signal handler mess in glibc, > with its partial failures. > I would not spend one second on an API no one uses. As I said most users are single threaded and could care less. Few threaded users already do direct syscalls >> The per-process credential switching is pretty universally a pain in >> the ass for anyone who wants to write something like a threaded file >> server. Jeremy Allison had to jump through some rather major hoops to >> work around it for Samba [1]. I think we want to strive to make this a >> per-task credential switch and ignore that part of the POSIX spec. > > Yeah, I get that, setfsuid/setfsgid already behaves this way. > > (Current directory and umask are equally problematic, but it's possible > to avoid most issues.) > >> That said, I think we will need to understand and document what we >> expect to occur if someone does this new switch_creds(fd) call and then >> subsequently calls something like setuid(), if only to ensure that we >> don't get blindsided by it. > > Currently, from the kernel perspective, this is not really a problem > because the credentials are always per-task. It's just that a > conforming user space needs the process-wide credentials. > Please show me one user of that and declare it brain dead. POSIX or not it just does not have any real programming mining at all. Plain and simple. Even if you made process-wide credentials atomic it would still mean nothing. That said. I second Jeff's point of Jim's implementation doing all we really need. It is fast and simple and can give us all we need and with a single added SYSCALL. The patch is already out there. I thought it was even considered for inclusion through Al viro. Cheers Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On 03/27/2014 02:02 PM, Jeff Layton wrote: This interface does not address the long-term lack of POSIX compliance in setuid and friends, which are required to be process-global and not thread-specific (as they are on the kernel side). glibc works around this by reserving a signal and running set*id on every thread in a special signal handler. This is just crass, and it is likely impossible to restore the original process state in case of partial failure. We really need kernel support to perform the process-wide switch in an all-or-nothing manner. I disagree. We're treading new ground here with this syscall. It's not defined by POSIX so we're under no obligation to follow its silly directives in this regard. Per-process cred switching doesn't really make much sense in this day and age, IMO. Wasn't part of the spec was written before threading existed Okay, then we need to add a separate set of system calls. I really, really want to get rid of that signal handler mess in glibc, with its partial failures. The per-process credential switching is pretty universally a pain in the ass for anyone who wants to write something like a threaded file server. Jeremy Allison had to jump through some rather major hoops to work around it for Samba [1]. I think we want to strive to make this a per-task credential switch and ignore that part of the POSIX spec. Yeah, I get that, setfsuid/setfsgid already behaves this way. (Current directory and umask are equally problematic, but it's possible to avoid most issues.) That said, I think we will need to understand and document what we expect to occur if someone does this new switch_creds(fd) call and then subsequently calls something like setuid(), if only to ensure that we don't get blindsided by it. Currently, from the kernel perspective, this is not really a problem because the credentials are always per-task. It's just that a conforming user space needs the process-wide credentials. -- Florian Weimer / Red Hat Product Security Team -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Thu, 27 Mar 2014 13:46:06 +0100 Florian Weimer wrote: > On 03/27/2014 01:23 AM, Andy Lutomirski wrote: > > > I propose the following set of new syscalls: > > > > int credfd_create(unsigned int flags): returns a new credfd that > > corresponds to current's creds. > > > > int credfd_activate(int fd, unsigned int flags): Change current's > > creds to match the creds stored in fd. To be clear, this changes > > both the "subjective" and "objective" (aka real_cred and cred) > > because there aren't any real semantics for what happens when > > userspace code runs with real_cred != cred. > > This interface does not address the long-term lack of POSIX > compliance in setuid and friends, which are required to be > process-global and not thread-specific (as they are on the kernel > side). > > glibc works around this by reserving a signal and running set*id on > every thread in a special signal handler. This is just crass, and it > is likely impossible to restore the original process state in case of > partial failure. We really need kernel support to perform the > process-wide switch in an all-or-nothing manner. > I disagree. We're treading new ground here with this syscall. It's not defined by POSIX so we're under no obligation to follow its silly directives in this regard. Per-process cred switching doesn't really make much sense in this day and age, IMO. Wasn't part of the spec was written before threading existed The per-process credential switching is pretty universally a pain in the ass for anyone who wants to write something like a threaded file server. Jeremy Allison had to jump through some rather major hoops to work around it for Samba [1]. I think we want to strive to make this a per-task credential switch and ignore that part of the POSIX spec. That said, I think we will need to understand and document what we expect to occur if someone does this new switch_creds(fd) call and then subsequently calls something like setuid(), if only to ensure that we don't get blindsided by it. [1]: http://sourceware.org/ml/libc-help/2012-07/msg4.html -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On 03/27/2014 01:23 AM, Andy Lutomirski wrote: I propose the following set of new syscalls: int credfd_create(unsigned int flags): returns a new credfd that corresponds to current's creds. int credfd_activate(int fd, unsigned int flags): Change current's creds to match the creds stored in fd. To be clear, this changes both the "subjective" and "objective" (aka real_cred and cred) because there aren't any real semantics for what happens when userspace code runs with real_cred != cred. This interface does not address the long-term lack of POSIX compliance in setuid and friends, which are required to be process-global and not thread-specific (as they are on the kernel side). glibc works around this by reserving a signal and running set*id on every thread in a special signal handler. This is just crass, and it is likely impossible to restore the original process state in case of partial failure. We really need kernel support to perform the process-wide switch in an all-or-nothing manner. -- Florian Weimer / Red Hat Product Security Team -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Wed, 26 Mar 2014 20:05:16 -0700 Andy Lutomirski wrote: > On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton > wrote: > > On Wed, 26 Mar 2014 17:23:24 -0700 > > Andy Lutomirski wrote: > > > >> Hi various people who care about user-space NFS servers and/or > >> security-relevant APIs. > >> > >> I propose the following set of new syscalls: > >> > >> int credfd_create(unsigned int flags): returns a new credfd that > >> corresponds to current's creds. > >> > >> int credfd_activate(int fd, unsigned int flags): Change current's > >> creds to match the creds stored in fd. To be clear, this changes > >> both the "subjective" and "objective" (aka real_cred and cred) > >> because there aren't any real semantics for what happens when > >> userspace code runs with real_cred != cred. > >> > >> Rules: > >> > >> - credfd_activate fails (-EINVAL) if fd is not a credfd. > >> - credfd_activate fails (-EPERM) if the fd's userns doesn't match > >> current's userns. credfd_activate is not intended to be a > >> substitute for setns. > >> - credfd_activate will fail (-EPERM) if LSM does not allow the > >> switch. This probably needs to be a new selinux action -- > >> dyntransition is too restrictive. > >> > >> > >> Optional: > >> - credfd_create always sets cloexec, because the alternative is > >> silly. > >> - credfd_activate fails (-EINVAL) if dumpable. This is because we > >> don't want a privileged daemon to be ptraced while impersonating > >> someone else. > >> - optional: both credfd_create and credfd_activate fail if > >> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID). > >> > >> The first question: does this solve Ganesha's problem? > >> > >> The second question: is this safe? I can see two major concerns. > >> The bigger concern is that having these syscalls available will > >> allow users to exploit things that were previously secure. For > >> example, maybe some configuration assumes that a task running as > >> uid==1 can't switch to uid==2, even with uid 2's consent. Similar > >> issues happen with capabilities. If CAP_SYS_ADMIN is not > >> required, then this is no longer really true. > >> > >> Alternatively, something running as uid == 0 with heavy capability > >> restrictions in a mount namespace (but not a uid namespace) could > >> pass a credfd out of the namespace. This could break things like > >> Docker pretty badly. CAP_SYS_ADMIN guards against this to some > >> extent. But I think that Docker is already totally screwed if a > >> Docker root task can receive an O_DIRECTORY or O_PATH fd out of > >> the container, so it's not entirely clear that the situation is > >> any worse, even without requiring CAP_SYS_ADMIN. > >> > >> The second concern is that it may be difficult to use this > >> correctly. There's a reason that real_cred and cred exist, but > >> it's not really well set up for being used. > >> > >> As a simple way to stay safe, Ganesha could only use credfds that > >> have real_uid == 0. > >> > >> --Andy > > > > > > I still don't quite grok why having this special credfd_create call > > buys you anything over simply doing what Al had originally > > suggested -- switch creds using all of the different syscalls and > > then simply caching that in a "normal" fd: > > > > fd = open("/dev/null", O_PATH...); > > > > ...it seems to me that the credfd_activate call will still need to > > do the same permission checking that all of the individual set*id() > > calls require (and all of the other stuff like changing selinux > > contexts, etc). > > > > IOW, this fd is just a "handle" for passing around a struct cred, > > but I don't see why having access to that handle would allow you to > > do something you couldn't already do anyway. > > > > Am I missing something obvious here? > > Not really. I think I didn't adequately explain a piece of this. > > I think that what you're suggesting is for an fd to encode a set of > credentials but not to grant permission to use those credentials. So > switch_creds(fd) is more or less the same thing as switch_creds(ruid, > euid, suid, rgid, egid, sgid, groups, mac label, ...). switch_creds > needs to verify that the caller can dyntransition to the label, set > all the ids, etc., but it avoids allocating anything and running RCU > callbacks. > > The trouble with this is that the verification needed is complicated > and expensive. And I think that my proposal is potentially more > useful. > Is it really though? My understanding of the problem was that it was the syscall (context switching) overhead + having to do a bunch of RCU critical stuff that was the problem. If we can do all of this in the context of a single RCU critical section, isn't that still a win? As to the complicated part...maybe but it doesn't seem like it would have to be. We could simply return -EINVAL or something if the old struct cred doesn't have fields that match the ones we're replacing and that we don't expect to see changed. > A credfd is like a struct cred, bu
Re: Thoughts on credential switching
On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton wrote: > On Wed, 26 Mar 2014 17:23:24 -0700 > Andy Lutomirski wrote: > >> Hi various people who care about user-space NFS servers and/or >> security-relevant APIs. >> >> I propose the following set of new syscalls: >> >> int credfd_create(unsigned int flags): returns a new credfd that >> corresponds to current's creds. >> >> int credfd_activate(int fd, unsigned int flags): Change current's >> creds to match the creds stored in fd. To be clear, this changes both >> the "subjective" and "objective" (aka real_cred and cred) because >> there aren't any real semantics for what happens when userspace code >> runs with real_cred != cred. >> >> Rules: >> >> - credfd_activate fails (-EINVAL) if fd is not a credfd. >> - credfd_activate fails (-EPERM) if the fd's userns doesn't match >> current's userns. credfd_activate is not intended to be a substitute >> for setns. >> - credfd_activate will fail (-EPERM) if LSM does not allow the >> switch. This probably needs to be a new selinux action -- >> dyntransition is too restrictive. >> >> >> Optional: >> - credfd_create always sets cloexec, because the alternative is >> silly. >> - credfd_activate fails (-EINVAL) if dumpable. This is because we >> don't want a privileged daemon to be ptraced while impersonating >> someone else. >> - optional: both credfd_create and credfd_activate fail if >> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID). >> >> The first question: does this solve Ganesha's problem? >> >> The second question: is this safe? I can see two major concerns. The >> bigger concern is that having these syscalls available will allow >> users to exploit things that were previously secure. For example, >> maybe some configuration assumes that a task running as uid==1 can't >> switch to uid==2, even with uid 2's consent. Similar issues happen >> with capabilities. If CAP_SYS_ADMIN is not required, then this is no >> longer really true. >> >> Alternatively, something running as uid == 0 with heavy capability >> restrictions in a mount namespace (but not a uid namespace) could pass >> a credfd out of the namespace. This could break things like Docker >> pretty badly. CAP_SYS_ADMIN guards against this to some extent. But >> I think that Docker is already totally screwed if a Docker root task >> can receive an O_DIRECTORY or O_PATH fd out of the container, so it's >> not entirely clear that the situation is any worse, even without >> requiring CAP_SYS_ADMIN. >> >> The second concern is that it may be difficult to use this correctly. >> There's a reason that real_cred and cred exist, but it's not really >> well set up for being used. >> >> As a simple way to stay safe, Ganesha could only use credfds that have >> real_uid == 0. >> >> --Andy > > > I still don't quite grok why having this special credfd_create call > buys you anything over simply doing what Al had originally suggested -- > switch creds using all of the different syscalls and then simply caching > that in a "normal" fd: > > fd = open("/dev/null", O_PATH...); > > ...it seems to me that the credfd_activate call will still need to do > the same permission checking that all of the individual set*id() calls > require (and all of the other stuff like changing selinux contexts, > etc). > > IOW, this fd is just a "handle" for passing around a struct cred, but I > don't see why having access to that handle would allow you to do > something you couldn't already do anyway. > > Am I missing something obvious here? Not really. I think I didn't adequately explain a piece of this. I think that what you're suggesting is for an fd to encode a set of credentials but not to grant permission to use those credentials. So switch_creds(fd) is more or less the same thing as switch_creds(ruid, euid, suid, rgid, egid, sgid, groups, mac label, ...). switch_creds needs to verify that the caller can dyntransition to the label, set all the ids, etc., but it avoids allocating anything and running RCU callbacks. The trouble with this is that the verification needed is complicated and expensive. And I think that my proposal is potentially more useful. A credfd is like a struct cred, but possession of a credfd carries the permission to use those credentials. So, for example, credfd_activate to switch to a given uid might work even if setresuid to that uid would be disallowed. But, for this to be secure, the act of giving someone a credfd needs to be explicit. Programs implicitly send other programs their credentials by means of f_cred all the time, and they don't expect to allow the receiver to impersonate them. credfd has other uses. A file server, for example, could actually delegate creation of the credfds to a separate process, and that process could validate that the request is for a credfd that the file server really should be able to obtain. This would enable that process to make sure that the user in question has actually authenticated itsel
Re: Thoughts on credential switching
On Wed, 26 Mar 2014 17:23:24 -0700 Andy Lutomirski wrote: > Hi various people who care about user-space NFS servers and/or > security-relevant APIs. > > I propose the following set of new syscalls: > > int credfd_create(unsigned int flags): returns a new credfd that > corresponds to current's creds. > > int credfd_activate(int fd, unsigned int flags): Change current's > creds to match the creds stored in fd. To be clear, this changes both > the "subjective" and "objective" (aka real_cred and cred) because > there aren't any real semantics for what happens when userspace code > runs with real_cred != cred. > > Rules: > > - credfd_activate fails (-EINVAL) if fd is not a credfd. > - credfd_activate fails (-EPERM) if the fd's userns doesn't match > current's userns. credfd_activate is not intended to be a substitute > for setns. > - credfd_activate will fail (-EPERM) if LSM does not allow the > switch. This probably needs to be a new selinux action -- > dyntransition is too restrictive. > > > Optional: > - credfd_create always sets cloexec, because the alternative is > silly. > - credfd_activate fails (-EINVAL) if dumpable. This is because we > don't want a privileged daemon to be ptraced while impersonating > someone else. > - optional: both credfd_create and credfd_activate fail if > !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID). > > The first question: does this solve Ganesha's problem? > > The second question: is this safe? I can see two major concerns. The > bigger concern is that having these syscalls available will allow > users to exploit things that were previously secure. For example, > maybe some configuration assumes that a task running as uid==1 can't > switch to uid==2, even with uid 2's consent. Similar issues happen > with capabilities. If CAP_SYS_ADMIN is not required, then this is no > longer really true. > > Alternatively, something running as uid == 0 with heavy capability > restrictions in a mount namespace (but not a uid namespace) could pass > a credfd out of the namespace. This could break things like Docker > pretty badly. CAP_SYS_ADMIN guards against this to some extent. But > I think that Docker is already totally screwed if a Docker root task > can receive an O_DIRECTORY or O_PATH fd out of the container, so it's > not entirely clear that the situation is any worse, even without > requiring CAP_SYS_ADMIN. > > The second concern is that it may be difficult to use this correctly. > There's a reason that real_cred and cred exist, but it's not really > well set up for being used. > > As a simple way to stay safe, Ganesha could only use credfds that have > real_uid == 0. > > --Andy I still don't quite grok why having this special credfd_create call buys you anything over simply doing what Al had originally suggested -- switch creds using all of the different syscalls and then simply caching that in a "normal" fd: fd = open("/dev/null", O_PATH...); ...it seems to me that the credfd_activate call will still need to do the same permission checking that all of the individual set*id() calls require (and all of the other stuff like changing selinux contexts, etc). IOW, this fd is just a "handle" for passing around a struct cred, but I don't see why having access to that handle would allow you to do something you couldn't already do anyway. Am I missing something obvious here? -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Wed, Mar 26, 2014 at 5:42 PM, Serge Hallyn wrote: > Quoting Andy Lutomirski (l...@amacapital.net): >> Hi various people who care about user-space NFS servers and/or >> security-relevant APIs. >> >> I propose the following set of new syscalls: >> >> int credfd_create(unsigned int flags): returns a new credfd that >> corresponds to current's creds. >> >> int credfd_activate(int fd, unsigned int flags): Change current's >> creds to match the creds stored in fd. To be clear, this changes both >> the "subjective" and "objective" (aka real_cred and cred) because >> there aren't any real semantics for what happens when userspace code >> runs with real_cred != cred. > > Is there a URL where I can find the motivation, and why the existing > features can't be used? It was an LSF talk. There's this, though: http://thread.gmane.org/gmane.linux.file-systems/79234 Essentially, it's a performance problem. knfsd has override_creds, and it can cache struct cred. But userspace doing the same thing (i.e. impersonating a user) has to do setresuid, setresgid, and setgroups, which kills performance, since it results in something like five RCU callbacks per impersonation round-trip. Windows has something called an "impersonation token" that more or less solves this problem. > > My guess would be, uid 10 is root in a container, and you want > him to be able to send a request to a root daemon on the host, on > behalf of uid 15 in the container, over which 10 has > privilege. (Which is sort of what we need for the cgmanager proxy; > there we do it by checking checking that 10 is mapped to 0 in > the requestor's uid_map, and that 15 is mapped in that uid_map) > The credfd would be useful there, especially combined with a > credfd_access(credfd, fd, perms) call. This requires uid 15 to send 10 a credfd, right? In general, making this same probably requires a way to make it safe to call credfd_activate on an untrusted credfd. You don't want to expose yourself to ptrace or proc attacks from the credential provider. Nor do you want to suddenly get hit by rlimits, perhaps. So maybe there really does need to be a separate subjective and objective state. Ugh. Ganesha can avoid this because the caller of credfd_create is trusted. > > But I'd like to hear exactly how nfs and ganesha would use these. knfsd will presumably not use it. Ganesha will, and Jim can probably comment further. Samba might want to use it, too. > > What all would be assiciated with the credfd? Everything that is > in the kernel cred? I assume so. --Andy > >> Rules: >> >> - credfd_activate fails (-EINVAL) if fd is not a credfd. >> - credfd_activate fails (-EPERM) if the fd's userns doesn't match >> current's userns. credfd_activate is not intended to be a substitute >> for setns. >> - credfd_activate will fail (-EPERM) if LSM does not allow the >> switch. This probably needs to be a new selinux action -- >> dyntransition is too restrictive. >> >> >> Optional: >> - credfd_create always sets cloexec, because the alternative is silly. >> - credfd_activate fails (-EINVAL) if dumpable. This is because we >> don't want a privileged daemon to be ptraced while impersonating >> someone else. >> - optional: both credfd_create and credfd_activate fail if >> !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID). >> >> The first question: does this solve Ganesha's problem? >> >> The second question: is this safe? I can see two major concerns. The >> bigger concern is that having these syscalls available will allow >> users to exploit things that were previously secure. For example, >> maybe some configuration assumes that a task running as uid==1 can't >> switch to uid==2, even with uid 2's consent. Similar issues happen >> with capabilities. If CAP_SYS_ADMIN is not required, then this is no >> longer really true. >> >> Alternatively, something running as uid == 0 with heavy capability >> restrictions in a mount namespace (but not a uid namespace) could pass >> a credfd out of the namespace. This could break things like Docker >> pretty badly. CAP_SYS_ADMIN guards against this to some extent. But >> I think that Docker is already totally screwed if a Docker root task >> can receive an O_DIRECTORY or O_PATH fd out of the container, so it's >> not entirely clear that the situation is any worse, even without >> requiring CAP_SYS_ADMIN. >> >> The second concern is that it may be difficult to use this correctly. >> There's a reason that real_cred and cred exist, but it's not really >> well set up for being used. >> >> As a simple way to stay safe, Ganesha could only use credfds that have >> real_uid == 0. >> >> --Andy -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
Quoting Andy Lutomirski (l...@amacapital.net): > Hi various people who care about user-space NFS servers and/or > security-relevant APIs. > > I propose the following set of new syscalls: > > int credfd_create(unsigned int flags): returns a new credfd that > corresponds to current's creds. > > int credfd_activate(int fd, unsigned int flags): Change current's > creds to match the creds stored in fd. To be clear, this changes both > the "subjective" and "objective" (aka real_cred and cred) because > there aren't any real semantics for what happens when userspace code > runs with real_cred != cred. Is there a URL where I can find the motivation, and why the existing features can't be used? My guess would be, uid 10 is root in a container, and you want him to be able to send a request to a root daemon on the host, on behalf of uid 15 in the container, over which 10 has privilege. (Which is sort of what we need for the cgmanager proxy; there we do it by checking checking that 10 is mapped to 0 in the requestor's uid_map, and that 15 is mapped in that uid_map) The credfd would be useful there, especially combined with a credfd_access(credfd, fd, perms) call. But I'd like to hear exactly how nfs and ganesha would use these. What all would be assiciated with the credfd? Everything that is in the kernel cred? > Rules: > > - credfd_activate fails (-EINVAL) if fd is not a credfd. > - credfd_activate fails (-EPERM) if the fd's userns doesn't match > current's userns. credfd_activate is not intended to be a substitute > for setns. > - credfd_activate will fail (-EPERM) if LSM does not allow the > switch. This probably needs to be a new selinux action -- > dyntransition is too restrictive. > > > Optional: > - credfd_create always sets cloexec, because the alternative is silly. > - credfd_activate fails (-EINVAL) if dumpable. This is because we > don't want a privileged daemon to be ptraced while impersonating > someone else. > - optional: both credfd_create and credfd_activate fail if > !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID). > > The first question: does this solve Ganesha's problem? > > The second question: is this safe? I can see two major concerns. The > bigger concern is that having these syscalls available will allow > users to exploit things that were previously secure. For example, > maybe some configuration assumes that a task running as uid==1 can't > switch to uid==2, even with uid 2's consent. Similar issues happen > with capabilities. If CAP_SYS_ADMIN is not required, then this is no > longer really true. > > Alternatively, something running as uid == 0 with heavy capability > restrictions in a mount namespace (but not a uid namespace) could pass > a credfd out of the namespace. This could break things like Docker > pretty badly. CAP_SYS_ADMIN guards against this to some extent. But > I think that Docker is already totally screwed if a Docker root task > can receive an O_DIRECTORY or O_PATH fd out of the container, so it's > not entirely clear that the situation is any worse, even without > requiring CAP_SYS_ADMIN. > > The second concern is that it may be difficult to use this correctly. > There's a reason that real_cred and cred exist, but it's not really > well set up for being used. > > As a simple way to stay safe, Ganesha could only use credfds that have > real_uid == 0. > > --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Thoughts on credential switching
Hi various people who care about user-space NFS servers and/or security-relevant APIs. I propose the following set of new syscalls: int credfd_create(unsigned int flags): returns a new credfd that corresponds to current's creds. int credfd_activate(int fd, unsigned int flags): Change current's creds to match the creds stored in fd. To be clear, this changes both the "subjective" and "objective" (aka real_cred and cred) because there aren't any real semantics for what happens when userspace code runs with real_cred != cred. Rules: - credfd_activate fails (-EINVAL) if fd is not a credfd. - credfd_activate fails (-EPERM) if the fd's userns doesn't match current's userns. credfd_activate is not intended to be a substitute for setns. - credfd_activate will fail (-EPERM) if LSM does not allow the switch. This probably needs to be a new selinux action -- dyntransition is too restrictive. Optional: - credfd_create always sets cloexec, because the alternative is silly. - credfd_activate fails (-EINVAL) if dumpable. This is because we don't want a privileged daemon to be ptraced while impersonating someone else. - optional: both credfd_create and credfd_activate fail if !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID). The first question: does this solve Ganesha's problem? The second question: is this safe? I can see two major concerns. The bigger concern is that having these syscalls available will allow users to exploit things that were previously secure. For example, maybe some configuration assumes that a task running as uid==1 can't switch to uid==2, even with uid 2's consent. Similar issues happen with capabilities. If CAP_SYS_ADMIN is not required, then this is no longer really true. Alternatively, something running as uid == 0 with heavy capability restrictions in a mount namespace (but not a uid namespace) could pass a credfd out of the namespace. This could break things like Docker pretty badly. CAP_SYS_ADMIN guards against this to some extent. But I think that Docker is already totally screwed if a Docker root task can receive an O_DIRECTORY or O_PATH fd out of the container, so it's not entirely clear that the situation is any worse, even without requiring CAP_SYS_ADMIN. The second concern is that it may be difficult to use this correctly. There's a reason that real_cred and cred exist, but it's not really well set up for being used. As a simple way to stay safe, Ganesha could only use credfds that have real_uid == 0. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/