On Wed, Aug 3, 2016 at 4:51 PM, Robert O'Callahan <rob...@ocallahan.org> wrote:
> I work on rr (http://rr-project.org/), a record-and-replay reverse-execution
> debugger which is a heavy user of ptrace and seccomp. The recent change to
> perform syscall-entry PTRACE_SYSCALL stops before PTRACE_EVENT_SECCOMP stops
> broke rr, which is fine because I'm fixing rr and this change actually makes
> rr faster (thanks!). However, it exposed an existing kernel bug which
> creates a problem for us, and which I'm not sure how to fix.
>
> The problem is that if a tracee task is in a PTRACE_EVENT_SECCOMP trap, or
> has been resumed after such a trap but not yet been scheduled, and another
> task in the thread-group calls exit_group(), then the tracee task exits
> without the ptracer receiving a PTRACE_EVENT_EXIT notification. Small-ish
> testcase here:
> https://gist.github.com/rocallahan/1344f7d01183c233d08a2c6b93413068.
>
> The bug happens because when __seccomp_filter() detects
> fatal_signal_pending(), it calls do_exit() without dequeuing the fatal
> signal. When do_exit() sends the PTRACE_EVENT_EXIT notification and that
> task is descheduled, __schedule() notices that there is a fatal signal
> pending and changes its state from TASK_TRACED to TASK_RUNNING. That
> prevents the ptracer's waitpid() from returning the ptrace event. A more
> detailed analysis is here:
> https://github.com/mozilla/rr/issues/1762#issuecomment-237396255.
>
> This bug has been in the kernel for a while. rr never hit it before because
> we trace all threads and mostly run only one tracee thread at a time.
> Immediately after each PTRACE_EVENT_SECCOMP notification we'd issue a
> PTRACE_SYSCALL to get that task to the syscall-entry PTRACE_SYSCALL stop, so
> there was never an opportunity for one tracee thread to call exit_group
> while another tracee was in the problematic part of __seccomp_filter().
> Unfortunately now there is no way for us to avoid that possibility.
>
> My guess is that __seccomp_filter() should dequeue the fatal signal it
> detects before calling do_exit(), to behave more like get_signal(). Is that
> correct, and if so, what would be the right way to do that?
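A tracer such as rr tells the stops discussed above apart by decoding the status that waitpid() returns. The following is a minimal sketch, assuming Linux with the PTRACE_EVENT_* constants from <sys/ptrace.h>, and a tracer that enabled PTRACE_O_TRACESYSGOOD, PTRACE_O_TRACESECCOMP and PTRACE_O_TRACEEXIT via PTRACE_SETOPTIONS. The enum stop_kind and the classify_stop() helper are hypothetical names for illustration, not rr's actual code. The decoding relies on the encoding documented in ptrace(2): for an event stop, status >> 8 == (SIGTRAP | (event << 8)).

```c
#include <signal.h>     /* SIGTRAP */
#include <sys/ptrace.h> /* PTRACE_EVENT_SECCOMP, PTRACE_EVENT_EXIT */
#include <sys/wait.h>   /* WIFSTOPPED, WSTOPSIG */

/* Hypothetical classification of a waitpid() status seen by a tracer. */
enum stop_kind {
    STOP_NOT_STOPPED, /* tracee exited, was killed, or continued */
    STOP_SYSCALL,     /* syscall-entry or -exit stop (PTRACE_O_TRACESYSGOOD) */
    STOP_SECCOMP,     /* PTRACE_EVENT_SECCOMP stop (PTRACE_O_TRACESECCOMP) */
    STOP_EXIT,        /* PTRACE_EVENT_EXIT stop (PTRACE_O_TRACEEXIT) */
    STOP_OTHER        /* signal-delivery stop or some other ptrace event */
};

/* Classify a status returned by waitpid() for a ptraced task.
 * Assumes the tracer set PTRACE_O_TRACESYSGOOD, so syscall stops
 * report SIGTRAP | 0x80 and cannot be confused with event stops. */
enum stop_kind classify_stop(int status)
{
    if (!WIFSTOPPED(status))
        return STOP_NOT_STOPPED;

    if (WSTOPSIG(status) == (SIGTRAP | 0x80))
        return STOP_SYSCALL;

    /* Event stops satisfy status >> 8 == (SIGTRAP | (event << 8)),
     * so the event number lives in bits 16 and up. */
    if (WSTOPSIG(status) == SIGTRAP) {
        switch (status >> 16) {
        case PTRACE_EVENT_SECCOMP:
            return STOP_SECCOMP;
        case PTRACE_EVENT_EXIT:
            return STOP_EXIT;
        default:
            break;
        }
    }
    return STOP_OTHER;
}
```

In a tracer loop, a STOP_SECCOMP result is the point where rr previously resumed the task at once with ptrace(PTRACE_SYSCALL, pid, 0, 0). The bug described above means that the STOP_EXIT case may never be reported for a task that was in, or had just left, a seccomp stop when another thread called exit_group().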
Thanks for the detailed analysis! I'll take a look at what can be done
here. Off the top of my head, I don't see a problem with what you're
suggesting. Let me see what I can come up with.

-Kees

>
> Thanks,
> Robert O'Callahan

-- 
Kees Cook
Brillo & Chrome OS Security