On Thu, May 26, 2016 at 2:04 PM, Kees Cook <keesc...@chromium.org> wrote:
> One problem with seccomp was that ptrace could be used to change a
> syscall after seccomp filtering had completed. This was a well documented
> limitation, and it was recommended to block ptrace when defining a filter
> to avoid this problem. This can be quite a limitation for containers or
> other places where ptrace is desired even under seccomp filters.
>
> Since seccomp filtering has been split into pre-trace and trace phases
> (phase1 and phase2 respectively), it's possible to re-run phase1 seccomp
> after ptrace. This makes that change, and updates the test suite for
> both SECCOMP_RET_TRACE and PTRACE_SYSCALL manipulation.

I like fixing the hole, but I don't like this fix.

The two-phase seccomp mechanism is messy. I wrote it because it was a
huge speedup. Since then, I've made a ton of changes to the way that
x86 syscalls work, and there are two relevant effects: the slow path is
quite fast, and the phase-1-only path isn't really a win any more.

I suggest that we fix the hole by simplifying the code instead of making
it even more complicated. Let's back out the two-phase mechanism (but
keep the ability for arch code to supply seccomp_data) and then just
reorder it so that seccomp happens after ptrace. The result should be
considerably simpler.

(We'll still have to answer the question of what happens when a
SECCOMP_RET_TRACE event changes the syscall, but maybe the answer is to
just let it through -- after all, SECCOMP_RET_TRACE might be a request
by a tracer to do its own internal filtering.)

--Andy
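
To make the ordering argument concrete, here is a toy model in Python. It is only a sketch, not kernel code: the names (seccomp_filter, run_syscall, rewriting_tracer), the string syscall names, and the one-entry deny list are all made up for illustration. It shows why filtering before the tracer runs lets a rewritten syscall slip through, and why filtering after the tracer does not.

```python
# Toy model of syscall-entry ordering. Not kernel code; every name here is
# hypothetical and chosen only to illustrate the ordering issue.
from typing import Callable, Optional

BLOCKED = {"open"}  # assumed filter policy: deny these syscalls


def seccomp_filter(nr: str) -> bool:
    """Return True if the (toy) filter allows this syscall."""
    return nr not in BLOCKED


def run_syscall(
    nr: str,
    tracer: Optional[Callable[[str], str]] = None,
    seccomp_after_ptrace: bool = False,
) -> Optional[str]:
    """Return the syscall that actually runs, or None if the filter rejects it."""
    if not seccomp_after_ptrace:
        # Old order: filter first, then the tracer may rewrite the syscall.
        if not seccomp_filter(nr):
            return None
        if tracer is not None:
            nr = tracer(nr)
        return nr
    # Reordered: the tracer runs first, then the filter sees what will really run.
    if tracer is not None:
        nr = tracer(nr)
    return nr if seccomp_filter(nr) else None


def rewriting_tracer(nr: str) -> str:
    """A tracer that swaps whatever syscall it sees for a blocked one."""
    return "open"


# Old order: the filter approves "getpid", then the tracer turns it into "open".
assert run_syscall("getpid", rewriting_tracer) == "open"
# Reordered: the filter checks the rewritten syscall and rejects it.
assert run_syscall("getpid", rewriting_tracer, seccomp_after_ptrace=True) is None
```

The model deliberately leaves out the SECCOMP_RET_TRACE case, since what should happen there is still an open question.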