On Mon, Mar 06, 2017 at 02:14:59PM +0100, Peter Zijlstra wrote:
> On Mon, Mar 06, 2017 at 10:57:07AM +0100, Dmitry Vyukov wrote:
> 
> > ==================================================================
> > BUG: KASAN: use-after-free in atomic_dec_and_test
> > arch/x86/include/asm/atomic.h:123 [inline] at addr ffff880079c30158
> > BUG: KASAN: use-after-free in put_task_struct
> > include/linux/sched/task.h:93 [inline] at addr ffff880079c30158
> > BUG: KASAN: use-after-free in put_ctx+0xcf/0x110
> 
> FWIW, this output is very confusing. Is this a result of your
> post-processing replicating the line for every 'inlined' part?
> 
> > kernel/events/core.c:1131 at addr ffff880079c30158
> > Write of size 4 by task syz-executor6/25698
> 
> >  atomic_dec_and_test arch/x86/include/asm/atomic.h:123 [inline]
> >  put_task_struct include/linux/sched/task.h:93 [inline]
> >  put_ctx+0xcf/0x110 kernel/events/core.c:1131
> >  perf_event_release_kernel+0x3ad/0xc90 kernel/events/core.c:4322
> >  perf_release+0x37/0x50 kernel/events/core.c:4338
> >  __fput+0x332/0x800 fs/file_table.c:209
> >  ____fput+0x15/0x20 fs/file_table.c:245
> >  task_work_run+0x197/0x260 kernel/task_work.c:116
> >  exit_task_work include/linux/task_work.h:21 [inline]
> >  do_exit+0xb38/0x29c0 kernel/exit.c:880
> >  do_group_exit+0x149/0x420 kernel/exit.c:984
> >  get_signal+0x7e0/0x1820 kernel/signal.c:2318
> >  do_signal+0xd2/0x2190 arch/x86/kernel/signal.c:808
> >  exit_to_usermode_loop+0x200/0x2a0 arch/x86/entry/common.c:157
> >  syscall_return_slowpath arch/x86/entry/common.c:191 [inline]
> >  do_syscall_64+0x6fc/0x930 arch/x86/entry/common.c:286
> >  entry_SYSCALL64_slow_path+0x25/0x25
> 
> So this is fput()..
> 
> 
> > Freed:
> > PID = 25681
> >  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
> >  set_track mm/kasan/kasan.c:525 [inline]
> >  kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:589
> >  __cache_free mm/slab.c:3514 [inline]
> >  kmem_cache_free+0x71/0x240 mm/slab.c:3774
> >  free_task_struct kernel/fork.c:158 [inline]
> >  free_task+0x151/0x1d0 kernel/fork.c:370
> >  copy_process.part.38+0x18e5/0x4aa0 kernel/fork.c:1931
> >  copy_process kernel/fork.c:1531 [inline]
> >  _do_fork+0x200/0x1010 kernel/fork.c:1994
> >  SYSC_clone kernel/fork.c:2104 [inline]
> >  SyS_clone+0x37/0x50 kernel/fork.c:2098
> >  do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
> >  return_from_SYSCALL_64+0x0/0x7a
> 
> and this is a failed fork().
> 
> 
> However, inherited events don't have a filedesc to fput(), and
> similarly, a task that fails fork() has never been visible to attach a
> perf event to, because it never hits the pid-hash.
> 
> Or so it is assumed.
> 
> I'm forever getting lost in the PID code. Oleg, is there any way
> find_task_by_vpid() can return a task that can still fail fork()?
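The assumption quoted above is that copy_process() publishes the new task into the pid hash only after every step that can fail, so the failure path frees a task that no lookup could ever have returned. Below is a minimal userspace sketch of that "publish last" ordering, assuming a fixed-size array as a stand-in for the pid hash. All names (model_fork(), model_find_task(), model_pid_table) are hypothetical and do not reproduce the real fork or PID code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: a fixed-size array standing in for the pid hash. */
#define MODEL_MAX_PIDS 16

struct model_task {
	int pid;
};

/* Static storage, so every slot starts out NULL ("not published"). */
static struct model_task *model_pid_table[MODEL_MAX_PIDS];

/* Lookup can only ever return tasks that were published into the table. */
static struct model_task *model_find_task(int pid)
{
	if (pid < 0 || pid >= MODEL_MAX_PIDS)
		return NULL;
	return model_pid_table[pid];
}

/*
 * Assumed ordering: every step that can fail runs before the task is
 * published, so the error path frees memory no lookup could have returned.
 */
static struct model_task *model_fork(int pid, bool setup_fails)
{
	struct model_task *t;

	if (pid < 0 || pid >= MODEL_MAX_PIDS)
		return NULL;
	assert(model_pid_table[pid] == NULL);	/* pid must be unused */

	t = malloc(sizeof(*t));
	if (!t)
		return NULL;
	t->pid = pid;

	if (setup_fails) {
		free(t);	/* never published: safe to free immediately */
		return NULL;
	}

	model_pid_table[pid] = t;	/* publish last */
	return t;
}
```

If the real code keeps this ordering, a stale pointer to a failed child cannot come from the pid lookup; the open question is whether it really does.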

So I _think_ find_task_by_vpid() can return an already dead task, and
we'll happily increment task->usage.

Dmitry, I have no idea how easy it is for you to reproduce this, but so
far I've not had much success. Could you perhaps stick the patch below
in?

Once we convert task_struct::usage to refcount_t, that should generate a
WARN of its own, I suppose.
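Why a refcount_t conversion would catch this: a plain atomic increment silently turns a count of 0 into 1, "resurrecting" an object that may already have been freed, whereas refcount_inc() is documented to WARN on an increment from zero. The userspace C11 sketch below contrasts the two. It is a simplified model, not the kernel's refcount_t implementation; the names are hypothetical, and leaving the count at 0 after the complaint is this sketch's own choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a reference count such as task->usage. */
struct model_ref {
	atomic_int refs;
	int warned;	/* number of times the checked increment complained */
};

/* Models put_task_struct(): returns true when the last reference is gone. */
static bool model_put(struct model_ref *r)
{
	int old = atomic_fetch_sub(&r->refs, 1);

	assert(old > 0);	/* dropping a reference nobody held */
	return old == 1;
}

/* Plain atomic increment: silently turns 0 into 1, hiding the bug. */
static void model_atomic_inc(struct model_ref *r)
{
	atomic_fetch_add(&r->refs, 1);
}

/*
 * Checked increment, loosely in the spirit of refcount_inc(): going from 0
 * to 1 means the object may already be freed, so complain and leave the
 * count at 0 (leaving it at 0 is this sketch's choice).
 */
static void model_refcount_inc(struct model_ref *r)
{
	int old = atomic_load(&r->refs);

	do {
		if (old == 0) {
			r->warned++;
			fprintf(stderr, "model_refcount_inc: increment on 0\n");
			return;
		}
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}
```

In the model the memory stays valid after the last model_put(); in the kernel it would already be back in the slab, which is exactly what KASAN reported.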

---

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 000fdb2..612d652 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -763,6 +763,7 @@ struct perf_event_context {
 #ifdef CONFIG_CGROUP_PERF
        int                             nr_cgroups;      /* cgroup evts */
 #endif
+       int                             switches;
        void                            *task_ctx_data; /* pmu specific data */
        struct rcu_head                 rcu_head;
 };
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6f41548f..6455b7a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2902,6 +2902,8 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
        if (!parent && !next_parent)
                goto unlock;
 
+       ctx->switches++;
+
        if (next_parent == ctx || next_ctx == parent || next_parent == parent) {
                /*
                 * Looks like the two contexts are clones, so we might be
@@ -3780,6 +3782,12 @@ find_lively_task_by_vpid(pid_t vpid)
                task = current;
        else
                task = find_task_by_vpid(vpid);
+
+       if (task) {
+               if (WARN_ON_ONCE(task->flags & PF_EXITING))
+                       task = NULL;
+       }
+
        if (task)
                get_task_struct(task);
        rcu_read_unlock();
@@ -10432,6 +10440,10 @@ void perf_event_free_task(struct task_struct *task)
 
                mutex_unlock(&ctx->mutex);
 
+               WARN_ON_ONCE(ctx->switches);
+               WARN_ON_ONCE(atomic_read(&ctx->refcount) != 1);
+               WARN_ON_ONCE(ctx->task != task);
+
                put_ctx(ctx);
        }
 }
