On Tue, Sep 8, 2015 at 4:07 PM, Eric W. Biederman <ebied...@xmission.com> wrote:
> Andy Lutomirski <l...@amacapital.net> writes:
>
>> On Tue, Sep 8, 2015 at 3:35 PM, Eric W. Biederman <ebied...@xmission.com>
>> wrote:
>>>
>>> I was thinking a bit about the problem of allowing another process to
>>> perform a subset of what your process can perform, and it occurred to me
>>> there might be something conceptually simple we can do.
>>>
>>> Have a system call fsyscall that takes a file descriptor, the system call
>>> number, and the parameters to that system call as arguments.  AKA
>>> long fsyscall(int fd, long number, ...);  AKA syscall with a file
>>> descriptor argument.
>>>
>>> The fd would hold a struct cred, and a filter that limits what system
>>> calls and which parameters may be passed.
>>>
>>> The implementation of fsyscall would be something like:
>>>         old = override_creds(f->f_cred);
>>>         /* Perform filtered syscall */
>>>         revert_creds(old);
>>>
>>> Then we have another system call, call it fsyscall_create(...), that takes
>>> a bpf filter and returns a file descriptor that can be used with
>>> fsyscall.
>>>
>>> I'm not certain that bpf is the best way to create such a filter but it
>>> seems plausible, and we already have the infrastructure in place, so if
>>> nothing else there would be synergy in syscall filtering.
>>>
>>> My two concerns with bpf are (a) it seems a little complex for the
>>> simplest use cases. (b) I think there are cases like inspecting the data
>>> passed into write, or send, or the structure passed into ioctl that it
>>> doesn't handle well yet.
>>>
>>> Andy, does an fsyscall system call sound like something that would
>>> not be too bad to implement?  (You have just been through all of the x86
>>> system call paths recently).
>>
>> It's not possible yet due to nasty calling convention issues.
>> (Entries in the x86 syscall table aren't actually functions callable
>> using the C ABI right now.)  My pending monster patchset will make it
>> possible to implement for 32-bit syscalls (native and compat).  I'm
>> planning on addressing 64-bit, and I want to do almost the reverse of
>> what you're proposing: have a way that one task can trap into a
>> special mode in which another process can do syscalls on its behalf.
>
> Hmm.  That seems comparatively dangerous to me.
>
>> There are some syscalls for which this simply makes no sense.
>> Setresuid, capset, and similar come to mind.  Clone and friends may
>> screw up impressively if you try this.  fsyscall should not be allowed
>> to call itself.  If you call write(2) like this and it has any
>> meaningful effect, something's wrong.
>
> If you peek into the data that is being written it can be meaningful on
> write(2).
>
> Hmm.  But yes for file descriptor based system calls this is much less
> interesting.  Having some kind of wrapper that embeds one file
> descriptor in another and does the filtering that way seems more
> interesting, for the file descriptor based methods.
>
>> keyctl(2) does really awful
>> things wrt struct cred, and I don't really want to think about what
>> happens if you try calling it like this.
>>
>> override_creds is IMO awful.  Serge and I had an old discussion on how
>> to maybe fix it.
>>
>> Honestly, I think the way to go might be to get Capsicum, or at least
>> Capsicum's fd model, merged and to add a mode in which the *at
>> operations on a specially marked fd use the passed fd's f_cred instead
>> of the caller's.  (Cc: David Drysdale -- that feature might be really
>> nice.)
>
> Perhaps I had missed it but I don't recall capsicum being able to wrap
> things like reboot(2).
>
Ah, so you want to be able to grant BPF-defined capabilities :)

Off the top of my head, I think that doing this using a nice IPC
mechanism (which barely exists in Linux, but which seL4 and binder (!)
can do very cleanly) would be simpler and more general, if less
self-contained.

(Aside: how on earth does anyone think that replacing binder with
kdbus makes any sense?  Binder can pass capabilities, and kdbus can't.
OTOH, maybe Android doesn't use the capability-passing ability.)

> Which really describes what I am trying to tackle.  How do we create an
> object that we can pass between processes that limits what we can do in
> the case of the oddball syscalls that require special privileges?
>
> At the same time I still want the caller to be able to pass in data to
> the system calls being called such as REBOOT_CMD_POWER_OFF versus
> REBOOT_CMD_HALT, while being able to filter it and say you may not pass
> REBOOT_CMD_CAD_OFF.
>

We could have a conservative whitelist of syscalls for which we allow
this usage.  I'm a bit worried that there will be very limited use
cases, given that a lot of use cases will want to follow pointers,
which has TOCTOU problems.

--Andy
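
For illustration only, the reboot filtering discussed above could be
modelled in user space roughly as follows.  This is a minimal sketch,
not the proposed kernel interface: struct arg_rule, filter_allows() and
reboot_cmd_allowed() are hypothetical names invented here, and the
thread's REBOOT_CMD_* names are assumed to correspond to the
LINUX_REBOOT_CMD_* constants from <linux/reboot.h>.  The filter is a
default-deny whitelist that compares scalar argument values only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/reboot.h>  /* LINUX_REBOOT_CMD_* and magic values */
#include <sys/syscall.h>   /* SYS_reboot */

/* Hypothetical rule: permit syscall `nr` when args[arg_index] == allowed. */
struct arg_rule {
    long nr;
    int arg_index;
    unsigned long allowed;
};

/* Default deny: a call passes only if some rule matches it exactly. */
bool filter_allows(const struct arg_rule *rules, size_t nrules,
                   long nr, const unsigned long args[6])
{
    for (size_t i = 0; i < nrules; i++) {
        if (rules[i].nr == nr &&
            args[rules[i].arg_index] == rules[i].allowed)
            return true;
    }
    return false;
}

/* In the raw reboot(2) syscall, cmd is the third argument (index 2). */
static const struct arg_rule reboot_rules[] = {
    { SYS_reboot, 2, LINUX_REBOOT_CMD_POWER_OFF },
    { SYS_reboot, 2, LINUX_REBOOT_CMD_HALT },
};

/* Hypothetical helper: would the filter let a reboot with `cmd` through? */
bool reboot_cmd_allowed(unsigned long cmd)
{
    const unsigned long args[6] = {
        LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, cmd, 0, 0, 0
    };
    return filter_allows(reboot_rules,
                         sizeof(reboot_rules) / sizeof(reboot_rules[0]),
                         SYS_reboot, args);
}

/* Small self-check of the policy described in the thread. */
void filter_selfcheck(void)
{
    assert(reboot_cmd_allowed(LINUX_REBOOT_CMD_POWER_OFF));
    assert(reboot_cmd_allowed(LINUX_REBOOT_CMD_HALT));
    assert(!reboot_cmd_allowed(LINUX_REBOOT_CMD_CAD_OFF));
}
```

Because the sketch never dereferences a pointer argument, it avoids the
TOCTOU concern raised above; a filter that had to inspect memory behind
a pointer would need the data copied and checked before use.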