In perf_output_put_handle(), an IRQ/NMI can happen at the location below and
write records to the same ring buffer:

	...
	local_dec_and_test(&rb->nest)
	...                          <-- an IRQ/NMI can happen here
	rb->user_page->data_head = head;
	...

In this case, the IRQ writes a value A to data_head, and the interrupted
writer then stores a value B to data_head after the IRQ returns, where A > B.
As a result, data_head temporarily decreases from A to B, and a reader that
reads the buffer frequently enough may see data_head < data_tail, which leads
to unexpected behavior.

Fix this by moving dec(&rb->nest) to after the data_head update, which
prevents the IRQ/NMI above from updating data_head.

Signed-off-by: Yabin Cui <yab...@google.com>
---
 kernel/events/ring_buffer.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 674b35383491..0b9aefe13b04 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -54,8 +54,10 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 	 * IRQ/NMI can happen here, which means we can miss a head update.
 	 */
 
-	if (!local_dec_and_test(&rb->nest))
+	if (local_read(&rb->nest) > 1) {
+		local_dec(&rb->nest);
 		goto out;
+	}
 
 	/*
 	 * Since the mmap() consumer (userspace) can run on a different CPU:
@@ -86,6 +88,13 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 	smp_wmb(); /* B, matches C */
 	rb->user_page->data_head = head;
 
+	/*
+	 * Clear rb->nest after updating data_head. This prevents IRQ/NMI from
+	 * updating data_head before us. If that happens, we will expose a
+	 * temporarily decreased data_head.
+	 */
+	local_set(&rb->nest, 0);
+
 	/*
 	 * Now check if we missed an update -- rely on previous implied
 	 * compiler barriers to force a re-read.
-- 
2.21.0.1020.gf2820cf01a-goog