On Wed, Oct 28, 2020 at 07:39:41PM +0100, Jann Horn wrote:
> On Wed, Oct 28, 2020 at 7:35 PM Rich Felker <dal...@libc.org> wrote:
> > On Wed, Oct 28, 2020 at 07:25:45PM +0100, Jann Horn wrote:
> > > On Wed, Oct 28, 2020 at 6:52 PM Rich Felker <dal...@libc.org> wrote:
> > > > On Wed, Oct 28, 2020 at 06:34:56PM +0100, Jann Horn wrote:
> > > > > On Wed, Oct 28, 2020 at 5:49 PM Rich Felker <dal...@libc.org> wrote:
> > > > > > On Wed, Oct 28, 2020 at 01:42:13PM +0100, Jann Horn wrote:
> > > > > > > On Wed, Oct 28, 2020 at 12:18 PM Camille Mougey
> > > > > > > <comm...@gmail.com> wrote:
> > > > > > > You're just focusing on execve() - I think it's important to keep in
> > > > > > > mind what happens after execve() for normal, dynamically-linked
> > > > > > > binaries: The next step is that the dynamic linker runs, and it will
> > > > > > > poke around in the file system with access() and openat() and fstat(),
> > > > > > > it will mmap() executable libraries into memory, it will mprotect()
> > > > > > > some memory regions, it will set up thread-local storage (e.g. using
> > > > > > > arch_prctl(); even if the process is single-threaded), and so on.
> > > > > > >
> > > > > > > The earlier you install the seccomp filter, the more of these steps
> > > > > > > you have to permit in the filter. And if you want the filter to take
> > > > > > > effect directly after execve(), the syscalls you'll be forced to
> > > > > > > permit are sufficient to cobble something together in userspace that
> > > > > > > effectively does almost the same thing as execve().
> > > > > > >
> > > > > > I would assume you use SECCOMP_RET_USER_NOTIF to implement policy for
> > > > > > controlling these operations and allowing only the ones that are valid
> > > > > > during dynamic linking. This also allows you to defer application of
> > > > > > the filter until after execve. So unless I'm missing some reason why
> > > > > > this doesn't work, I think the requested functionality is already
> > > > > > available.
> > > > > >
> > > > > Ah, yeah, good point.
> > > > >
> > > > > > If you really just want the "activate at exec" behavior, it might be
> > > > > > possible (depending on how SECCOMP_RET_USER_NOTIF behaves when there's
> > > > > > no notify fd open; I forget)
> > > > > >
> > > > > syscall returns -ENOSYS. Yeah, that'd probably do the job. (Even
> > > > > though it might be a bit nicer if userspace had control over the errno
> > > > > there, such that it could be EPERM instead... oh well.)
> > > > >
> > > > EPERM is a major bug in current sandbox implementations, so ENOSYS is
> > > > at least mildly better, but indeed it should be controllable, probably
> > > > by allowing a code path for the BPF to continue with a jump to a
> > > > different logic path if the notify listener is missing.
> > > >
> > > I guess we might be able to expose the listener status through a bit /
> > > a field in the struct seccomp_data, and then filters could branch on
> > > that. (And the kernel would run the filter twice if we raced with
> > > filter detachment.) I don't know whether it would look pretty, but I
> > > think it should be doable...
> > >
> > I was thinking the race wouldn't be salvagable, but indeed since the
> > filter is side-effect-free you can just re-run it if the status
> > changes between start of filter processing and the attempt at
> > notification. This sounds like it should work.
> >
> > I guess it's not possible to chain two BPF filters to do this, because
> > that only works when the first one allows? Or am I misunderstanding
> > the multiple-filters case entirely? (I've never gotten that far with
> > programming it.)
> >
> I'm not sure if I'm understanding the question correctly...
>
> At the moment you basically can't have multiple filters with notifiers.
> The rule with multiple filters is always that all the filters get run,
> and the actual action taken is the most restrictive result of all of
> them.

I probably just don't understand how multiple filters work then, which
is pretty much what I expected. But in any case it seems correct that
they're not a tool for solving the problem here.

Rich
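
For readers following the multiple-filters point quoted above: the rule
(every filter runs, the most restrictive result wins) can be modelled
roughly as below. This is a minimal Python sketch, not kernel code. The
action constants mirror <linux/seccomp.h> and the signed comparison
mirrors how the kernel orders actions; the filter functions and the
choice of x86-64 syscall numbers are assumptions made up for
illustration.

```python
import errno

# Action values as defined in <linux/seccomp.h>.
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_KILL_THREAD = 0x00000000
SECCOMP_RET_TRAP = 0x00030000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_USER_NOTIF = 0x7FC00000
SECCOMP_RET_TRACE = 0x7FF00000
SECCOMP_RET_LOG = 0x7FFC0000
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ACTION_FULL = 0xFFFF0000

# x86-64 syscall numbers, used only to make the example concrete.
SYS_EXECVE = 59
SYS_OPENAT = 257


def action_only(ret):
    """Mask off the data bits and reinterpret the action as a signed 32-bit int."""
    value = ret & SECCOMP_RET_ACTION_FULL
    return value - (1 << 32) if value & 0x80000000 else value


def run_filters(filters, syscall_nr):
    """Run every filter; keep the result whose action is numerically lowest.

    Lower signed action values are more restrictive, so KILL_PROCESS
    (negative once sign-extended) beats everything and ALLOW loses to
    everything.
    """
    result = SECCOMP_RET_ALLOW
    for flt in filters:
        current = flt(syscall_nr)
        if action_only(current) < action_only(result):
            result = current
    return result


# Hypothetical filters for illustration.
def notify_openat(nr):
    return SECCOMP_RET_USER_NOTIF if nr == SYS_OPENAT else SECCOMP_RET_ALLOW


def deny_openat(nr):
    return (SECCOMP_RET_ERRNO | errno.EPERM) if nr == SYS_OPENAT else SECCOMP_RET_ALLOW


def deny_execve(nr):
    return (SECCOMP_RET_ERRNO | errno.EPERM) if nr == SYS_EXECVE else SECCOMP_RET_ALLOW
```

In this model a second filter is consulted whether or not the first one
allows, which is why stacking filters does not give an "if the first
one fails, fall through to the second" chain: an ERRNO result from one
filter simply overrides a USER_NOTIF result from another.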