Alexander Shishkin <alexander.shish...@linux.intel.com> writes:

> Peter Zijlstra <pet...@infradead.org> writes:
>
>> Alex, any clue?
>
> Let me look into it. Definitely haven't seen anything like that in my
> tests.
>
>> On Fri, Jun 12, 2015 at 02:42:36PM -0400, Vince Weaver wrote:
>>> On Thu, 11 Jun 2015, Vince Weaver wrote:
>>>
>>> > and while I was trying to cut and paste that, the locked haswell just
>>> > took down the network switch so I can't get the rest until tomorrow.
>>>
>>> here are the full dumps if anyone is interested
>>>
>>> the warning is reproducible, the spinlock disaster is not.
>>>
>>> [36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
>>> [36298.992915]  lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4
>>> [36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G        W       4.1.0-rc7+ #155
>>> [36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
>>> [36299.020606]  ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054
>>> [36299.029199]  ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0
>>> [36299.037796]  ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0
>>> [36299.046338] Call Trace:
>>> [36299.049501]  <NMI>  [<ffffffff816d7229>] dump_stack+0x45/0x57
>>> [36299.056284]  [<ffffffff810c2895>] spin_dump+0x85/0xe0
>>> [36299.062282]  [<ffffffff810c2916>] spin_bug+0x26/0x30
>>> [36299.068111]  [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
>>> [36299.074897]  [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
>>> [36299.081276]  [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
>>> [36299.088340]  [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
>>> [36299.095182]  [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
>>> [36299.102291]  [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
>>> [36299.108987]  [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
>>> [36299.115850]  [<ffffffff8117aa47>] __free_pages+0x37/0x50
>>> [36299.121991]  [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0
>
> This one goes to free aux pages from NMI context; looks like the aux buffer
> was unmapped while the event was running, so here it dropped the last
> reference.
Ok, here's what I propose for this one.

>From b8cd18bc440c318e8b30825bf654c815b42aa1e0 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <alexander.shish...@linux.intel.com>
Date: Tue, 16 Jun 2015 14:14:04 +0300
Subject: [PATCH] perf: Free AUX area pages in rcu callback

Currently, if the user unmaps the AUX area while the corresponding event
is active, perf_aux_output_end() may be the last one to drop the aux area
refcount, and end up freeing the pages in NMI context or the scheduler's
fast path. The same can happen in the error path of
perf_aux_output_begin().

To avoid the bug, this patch moves the actual freeing code to
rb_free_rcu(), which will know whether it is called for the AUX area or
the ring buffer proper and act accordingly.

Signed-off-by: Alexander Shishkin <alexander.shish...@linux.intel.com>
Reported-by: Vince Weaver <vincent.wea...@maine.edu>
---
 kernel/events/core.c        | 31 ++++++++++++++++++++++++++++++-
 kernel/events/internal.h    |  1 +
 kernel/events/ring_buffer.c |  8 +-------
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index eddf1ed415..5f1cc5976f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4381,7 +4381,36 @@ static void rb_free_rcu(struct rcu_head *rcu_head)
 	struct ring_buffer *rb;
 
 	rb = container_of(rcu_head, struct ring_buffer, rcu_head);
-	rb_free(rb);
+
+	/*
+	 * are we called for AUX or the rb:
+	 * AUX always goes first, then if rb::refcount drops to zero,
+	 * free rb synchronously
+	 */
+	if (atomic_read(&rb->refcount)) {
+		__rb_free_aux(rb);
+
+		/* matches the increment in rb_free_aux() */
+		if (atomic_dec_and_test(&rb->refcount))
+			rb_free(rb);
+	} else {
+		rb_free(rb);
+	}
+}
+
+void rb_free_aux(struct ring_buffer *rb)
+{
+	/*
+	 * hold rb::refcount to make sure rb doesn't disappear
+	 * before aux pages are freed
+	 */
+	if (WARN_ON_ONCE(!atomic_inc_not_zero(&rb->refcount)))
+		return;
+
+	if (atomic_dec_and_test(&rb->aux_refcount))
+		call_rcu(&rb->rcu_head, rb_free_rcu);
+	else
+		ring_buffer_put(rb); /* matches the increment above */
 }
 
 struct ring_buffer *ring_buffer_get(struct perf_event *event)
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 9f6ce9ba4a..7f8242ed85 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -61,6 +61,7 @@ extern void perf_event_wakeup(struct perf_event *event);
 extern int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 			pgoff_t pgoff, int nr_pages, long watermark, int flags);
 extern void rb_free_aux(struct ring_buffer *rb);
+extern void __rb_free_aux(struct ring_buffer *rb);
 extern struct ring_buffer *ring_buffer_get(struct perf_event *event);
 extern void ring_buffer_put(struct ring_buffer *rb);
 
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 725c416085..343121e943 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -537,7 +537,7 @@ out:
 	return ret;
 }
 
-static void __rb_free_aux(struct ring_buffer *rb)
+void __rb_free_aux(struct ring_buffer *rb)
 {
 	int pg;
 
@@ -554,12 +554,6 @@ static void __rb_free_aux(struct ring_buffer *rb)
 	rb->aux_nr_pages = 0;
 }
 
-void rb_free_aux(struct ring_buffer *rb)
-{
-	if (atomic_dec_and_test(&rb->aux_refcount))
-		__rb_free_aux(rb);
-}
-
 #ifndef CONFIG_PERF_USE_VMALLOC
 
 /*
-- 
2.1.4