Commit-ID:  6a3351b612b72c558910c88a43e2ef6d7d68bc97
Gitweb:     http://git.kernel.org/tip/6a3351b612b72c558910c88a43e2ef6d7d68bc97
Author:     Peter Zijlstra <pet...@infradead.org>
AuthorDate: Mon, 25 Jan 2016 14:09:54 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Thu, 28 Jan 2016 20:06:36 +0100

perf: Fix race in perf_event_exit_task_context()

There is a race between perf_event_exit_task_context() and
orphans_remove_work() which results in a use-after-free.

We mark ctx->task with TASK_TOMBSTONE, under ctx->lock, to indicate
that a context is 'dead'; from that point on, event_function_call()
on any event of that context will be a NOP.
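
(Illustrative only: a minimal sketch of the tombstone check, simplified
from what event_function_call() does; the exact code in
kernel/events/core.c differs.)

	struct task_struct *task = READ_ONCE(ctx->task);

	if (task == TASK_TOMBSTONE)
		return;	/* context is dead; the call becomes a NOP */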

A concurrent orphans_remove_work() only holds ctx->mutex for the
list iteration and does not serialize against this. It is therefore
possible for orphans_remove_work()'s perf_remove_from_context() call
to fail, while we nevertheless continue to free the event, leaving
freed memory on the context's lists.
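
An illustrative interleaving of the race (time runs top to bottom;
both paths operate on the same ctx):

	exit path                         orphans_remove_work()
	-------------------------------   -------------------------------
	ctx->lock:
	  ctx->task = TASK_TOMBSTONE
	                                  mutex_lock(&ctx->mutex)
	                                  perf_remove_from_context()
	                                    /* fails: ctx is dead */
	                                  free_event()
	                                    /* freed, but still on list */
	                                  mutex_unlock(&ctx->mutex)
	mutex_lock(&ctx->mutex)
	iterate ctx->event_list
	  -> encounters the freed event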

Once perf_event_exit_task_context() gets around to acquiring
ctx->mutex, it too will iterate the event list, encounter the
already-freed event, and proceed to free it _again_. This trips
the WARN in free_event().

Plug the race by having perf_event_exit_task_context() hold
ctx::mutex over the whole tear-down, thereby 'naturally'
serializing against all other sites, including the orphan work.
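
Condensed, the resulting structure looks as follows (a sketch of the
code after this patch, with the error-path details omitted; see the
diff below for the real thing):

	mutex_lock(&child_ctx->mutex);	/* held over the whole tear-down */

	raw_spin_lock_irq(&child_ctx->lock);
	task_ctx_sched_out(__get_cpu_context(child_ctx), child_ctx);
	WRITE_ONCE(child_ctx->task, TASK_TOMBSTONE);
	clone_ctx = unclone_ctx(child_ctx);
	raw_spin_unlock_irq(&child_ctx->lock);

	list_for_each_entry_safe(child_event, next,
				 &child_ctx->event_list, event_entry)
		__perf_event_exit_task(child_event, child_ctx, child);

	mutex_unlock(&child_ctx->mutex);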

Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
Cc: Arnaldo Carvalho de Melo <a...@redhat.com>
Cc: Jiri Olsa <jo...@redhat.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Stephane Eranian <eran...@google.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Vince Weaver <vincent.wea...@maine.edu>
Cc: alexander.shish...@linux.intel.com
Cc: dsah...@gmail.com
Cc: namhy...@kernel.org
Link: http://lkml.kernel.org/r/20160125130954.gy6...@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/events/core.c | 50 +++++++++++++++++++++++++++++---------------------
 1 file changed, 29 insertions(+), 21 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6759f2a..1d243fa 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8748,14 +8748,40 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
 {
        struct perf_event_context *child_ctx, *clone_ctx = NULL;
        struct perf_event *child_event, *next;
-       unsigned long flags;
 
        WARN_ON_ONCE(child != current);
 
-       child_ctx = perf_lock_task_context(child, ctxn, &flags);
+       child_ctx = perf_pin_task_context(child, ctxn);
        if (!child_ctx)
                return;
 
+       /*
+        * In order to reduce the amount of tricky in ctx tear-down, we hold
+        * ctx::mutex over the entire thing. This serializes against almost
+        * everything that wants to access the ctx.
+        *
+        * The exception is sys_perf_event_open() /
+        * perf_event_create_kernel_count() which does find_get_context()
+        * without ctx::mutex (it cannot because of the move_group double mutex
+        * lock thing). See the comments in perf_install_in_context().
+        *
+        * We can recurse on the same lock type through:
+        *
+        *   __perf_event_exit_task()
+        *     sync_child_event()
+        *       put_event()
+        *         mutex_lock(&ctx->mutex)
+        *
+        * But since its the parent context it won't be the same instance.
+        */
+       mutex_lock(&child_ctx->mutex);
+
+       /*
+        * In a single ctx::lock section, de-schedule the events and detach the
+        * context from the task such that we cannot ever get it scheduled back
+        * in.
+        */
+       raw_spin_lock_irq(&child_ctx->lock);
        task_ctx_sched_out(__get_cpu_context(child_ctx), child_ctx);
 
        /*
@@ -8767,14 +8793,8 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
        WRITE_ONCE(child_ctx->task, TASK_TOMBSTONE);
        put_task_struct(current); /* cannot be last */
 
-       /*
-        * If this context is a clone; unclone it so it can't get
-        * swapped to another process while we're removing all
-        * the events from it.
-        */
        clone_ctx = unclone_ctx(child_ctx);
-       update_context_time(child_ctx);
-       raw_spin_unlock_irqrestore(&child_ctx->lock, flags);
+       raw_spin_unlock_irq(&child_ctx->lock);
 
        if (clone_ctx)
                put_ctx(clone_ctx);
@@ -8786,18 +8806,6 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
         */
        perf_event_task(child, child_ctx, 0);
 
-       /*
-        * We can recurse on the same lock type through:
-        *
-        *   __perf_event_exit_task()
-        *     sync_child_event()
-        *       put_event()
-        *         mutex_lock(&ctx->mutex)
-        *
-        * But since its the parent context it won't be the same instance.
-        */
-       mutex_lock(&child_ctx->mutex);
-
        list_for_each_entry_safe(child_event, next, &child_ctx->event_list, event_entry)
                __perf_event_exit_task(child_event, child_ctx, child);
 