The breakpoints core deals with limited architecture resources that need
to be shared among all the breakpoint perf events. For example x86 has
4 breakpoint registers per-CPU. This means that a task can't open more
than 4 breakpoints in this architecture.

To control that we must track the number of breakpoints held for each
task. This is simply implemented with a global list of all breakpoints
that we walk and count on top of their target task.

But this wrongly assume that the target task of a perf event remains
stable after the event creation. Context tasks can get swapped on context
switches and it's fairly common for a perf event's initial target
(event->hw.target) to become irrelevantly out of date at
struct perf_event::destroy() time.

Here is an interesting scenario where it matters: TASK A and B are clones
that share 4 equivalent breakpoints. A and B contexts are swapped on
context switch. So task A now carries the breakpoints that were initialized
in B. Task A quits and release its breakpoints whose target (event->hw.target)
still point to B. Therefore the breakpoints slots are spuriously released
on B. Afterward, if task B were to create a 5th breakpoint then it would
be accepted. But on architecture breakpoint installation time, that 5th
breakpoint will be eventually rejected:

        Can't find any breakpoint slot
         WARNING: CPU: 6 PID: 2433 at arch/x86/kernel/hw_breakpoint.c:109 
arch_install_hw_breakpoint+0x176/0x180
         Modules linked in:
         CPU: 6 PID: 2433 Comm: bp5 Not tainted 5.2.0-rc6+ #20
         Hardware name: MSI MS-7850/Z87-G41 PC Mate(MS-7850), BIOS V1.3 
08/18/2013
         RIP: 0010:arch_install_hw_breakpoint+0x176/0x180
         Code: 57 ff ff ff 0f 23 c8 e9 4f ff ff ff 0f 23 d0 e9 47 ff ff ff 48 
c7 c7 d8 cd 2b 82 89 45 d0 c6 05 d4 bc 4f 01 01 e8 0a b8 09 00 <0f> 0b 8b 45 d0 
e9 ed fe ff ff 0f 1f 44 00 00 55 48 89 e5 41 57 41
         RSP: 0018:ffffc900006dfa48 EFLAGS: 00010082
         RAX: 0000000000000000 RBX: ffff88821fb17f78 RCX: 0000000000000000
         RDX: ffffffff81148b59 RSI: 0000000000000001 RDI: ffffffff81148bb5
         RBP: ffffc900006dfa78 R08: 0000000000000000 R09: 0000000000000000
         R10: ffffc900006dfaa0 R11: 0000000000000981 R12: 0000000000000003
         R13: ffff88821206b800 R14: 0000000000017f80 R15: 0000000000000004
         FS:  00007f46d1a54440(0000) GS:ffff88821fb00000(0000) 
knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         CR2: 00007f46d1590450 CR3: 000000020f1c8004 CR4: 00000000001706e0
         DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
         DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 00000000999906aa
         Call Trace:
          hw_breakpoint_add+0x44/0x50
          event_sched_in.isra.119+0xab/0x240
          group_sched_in+0x60/0x150
          pinned_sched_in+0x85/0x170
          ? group_sched_in+0x150/0x150
          visit_groups_merge+0x10a/0x180
          ctx_sched_in.isra.89+0x12b/0x150
          perf_event_sched_in.isra.91+0x2f/0x70
          ctx_resched+0x54/0x90
          __perf_install_in_context+0x144/0x170
          ? perf_duration_warn+0x40/0x40
          remote_function+0x4a/0x60
          generic_exec_single+0xad/0x110
          ? perf_duration_warn+0x40/0x40
          smp_call_function_single+0x9c/0x150
          ? perf_duration_warn+0x40/0x40
          task_function_call+0x4b/0x80
          ? task_function_call+0x4b/0x80
          ? __perf_event_enable+0x140/0x140
          perf_install_in_context+0x99/0x160
          __do_sys_perf_event_open+0xaf5/0xd80
          __x64_sys_perf_event_open+0x20/0x30
          do_syscall_64+0x4f/0x1b0
          entry_SYSCALL_64_after_hwframe+0x49/0xbe
         RIP: 0033:0x7f46d1590469
         Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff 
ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
         RSP: 002b:00007ffc86f0f488 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
         RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f46d1590469
         RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000564f0c3a0080
         RBP: 00007ffc86f0f4b0 R08: 0000000000000000 R09: 00007ffc86f0f598
         R10: 00000000ffffffff R11: 0000000000000246 R12: 0000564f0c19f7e0
         R13: 00007ffc86f0f590 R14: 0000000000000000 R15: 0000000000000000
         irq event stamp: 12252
         hardirqs last  enabled at (12251): [<ffffffff81282909>] 
kmem_cache_alloc+0xb9/0x6e0
         hardirqs last disabled at (12252): [<ffffffff81184637>] 
generic_exec_single+0xa7/0x110
         softirqs last  enabled at (12234): [<ffffffff81e003e7>] 
__do_softirq+0x3e7/0x497
         softirqs last disabled at (12227): [<ffffffff810d4558>] 
irq_exit+0xc8/0xd0

We can't easily support dynamic event context swap when breakpoints are
involved. The easiest solution to fix the issue is to pin a context as
long as it contains a breakpoint PMU.

Reported-by: [email protected]
Cc: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
 kernel/events/hw_breakpoint.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index c5cd852fe86b..514d704d60a2 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -656,6 +656,7 @@ static struct pmu perf_breakpoint = {
        .start          = hw_breakpoint_start,
        .stop           = hw_breakpoint_stop,
        .read           = hw_breakpoint_pmu_read,
+       .pin_ctx        = 1,
 };
 
 int __init init_hw_breakpoint(void)
-- 
2.21.0

Reply via email to