On Fri, Oct 30 2020 at 10:00, Peter Zijlstra wrote:
> On Fri, Oct 30, 2020 at 12:27:22AM -0400, Steven Rostedt wrote:
>> I found a bug in the recursion protection that prevented function
>> tracing from running in NMI context. Applying this fix to 5.9 worked
>> fine (tested by running perf record and function tracing at the same
>> time). But when I applied the patch to 5.10-rc1, it blew up with a
>> stack overflow:
>
> So we just blew away our NMI stack, right?

Looks like that:

>> RSP: 0018:fffffe000003c000 EFLAGS: 00010046

Clearly a page boundary.

>> RAX: 000000000000001c RBX: ffff928ada27b400 RCX: 0000000000000000
>> RDX: ffff928ada07b200 RSI: fffffe000003c028 RDI: ffff928ada27b400
>> RBP: ffff928ada27b4f0 R08: 0000000000000001 R09: 0000000000000000
>> R10: fffffe000003c440 R11: ffff928a7383cc60 R12: fffffe000003c028
>> R13: 00000000000003e8 R14: 0000000000000046 R15: 0000000000110001
>> FS: 00007f25d43cf780(0000) GS:ffff928adaa40000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: fffffe000003bff8 CR3: 00000000b52a8005 CR4: 00000000001707e0

and CR2 says it tried below.

>> I bisected it down to:
>>
>> 35d1ce6bec133679ff16325d335217f108b84871 ("perf/x86/intel/ds: Fix
>> x86_pmu_stop warning for large PEBS")
>>
>> Which looks to be storing an awful lot on the stack:
>>
>> static void __intel_pmu_pebs_event(struct perf_event *event,
>>                                    struct pt_regs *iregs,
>>                                    void *base, void *top,
>>                                    int bit, int count,
>>                                    void (*setup_sample)(struct perf_event *,
>>                                                 struct pt_regs *,
>>                                                 void *,
>>                                                 struct perf_sample_data *,
>>                                                 struct pt_regs *))
>> {
>>         struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>>         struct hw_perf_event *hwc = &event->hw;
>>         struct perf_sample_data data;
>>         struct x86_perf_regs perf_regs;
>>         struct pt_regs *regs = &perf_regs.regs;
>>         void *at = get_next_pebs_record_by_bit(base, top, bit);
>>         struct pt_regs dummy_iregs;
>
> The only thing I can come up with in a hurry is that that dummy_iregs
> thing really should be static. That's 168 bytes of stack out the window
> right there.

What's worse is perf_sample_data which is 384 bytes and is 64 bytes aligned.

> Still, this seems to suggest (barring some actual issue hidding in those
> 135 lost lines, we're very close to the limit on the NMI stack, which is
> a single 4k page IIRC.

Yes, unless KASAN is enabled

Thanks,

        tglx
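
For anyone following along, the "page boundary" and "tried below" observations can be checked with a few lines of C. This is only a sketch of the address arithmetic, not kernel code: the helper names and the NMI_STACK_PAGE_SIZE macro are made up for illustration, the 4k page size is the one assumed in the thread, and the two addresses are copied from the register dump quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4k pages, as stated in the thread. Hypothetical macro name. */
#define NMI_STACK_PAGE_SIZE 4096ULL

/* Values copied from the quoted register dump. */
static const uint64_t oops_rsp = 0xfffffe000003c000ULL;
static const uint64_t oops_cr2 = 0xfffffe000003bff8ULL;

/* True if addr sits exactly on a page boundary. */
static inline bool is_page_aligned(uint64_t addr)
{
    return (addr & (NMI_STACK_PAGE_SIZE - 1)) == 0;
}

/* Start address of the page containing addr. */
static inline uint64_t page_base(uint64_t addr)
{
    return addr & ~(NMI_STACK_PAGE_SIZE - 1);
}

/*
 * True if rsp is on a page boundary and the faulting address lies in the
 * page immediately below it, which is the pattern seen in the dump.
 */
static inline bool fault_is_just_below_stack_page(uint64_t rsp, uint64_t cr2)
{
    return is_page_aligned(rsp) && cr2 < rsp &&
           page_base(cr2) == rsp - NMI_STACK_PAGE_SIZE;
}
```

With the values from the dump, RSP is page aligned and CR2 is 8 bytes lower, so the faulting access lands in the page below the one RSP points at. Separately, the two locals sized in this thread (168 bytes for dummy_iregs and 384 bytes for perf_sample_data) add up to 552 bytes, roughly 13% of a 4k stack, before counting alignment padding or the other locals.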