On Sat, May 05, 2018 at 01:29:12PM -0500, Josh Poimboeuf wrote:
> On Sat, May 05, 2018 at 11:38:16AM -0400, Vince Weaver wrote:
> > On Fri, 4 May 2018, Josh Poimboeuf wrote:
> > >
> > > The 'nmi_restore' warning points to a bug in my patch, but the others
> > > are head scratchers. Here's a patch which combines the first two
> > > patches, plus improves the existing warnings a bit. Can you try it?
> >
> > with that updated patch I hit
> >
> > May 4 21:51:20 haswell kernel: [19245.450607] WARNING: stack recursion on
> > stack type 2
> > May 4 22:21:29 haswell kernel: [21055.268717] WARNING: can't dereference
> > registers at 000000006546ba71 for ip ret_from_intr+0x6/0x1d
> > May 4 22:36:22 haswell kernel: [21948.106762] WARNING: stack going in the
> > wrong direction? ip=native_sched_clock+0xe/0x90
> > May 4 22:36:22 haswell kernel: [21948.115377] WARNING: stack going in the
> > wrong direction? ip=native_sched_clock+0xe/0x90
> > May 4 22:36:22 haswell kernel: [21948.124086] WARNING: stack going in the
> > wrong direction? ip=native_sched_clock+0xd/0x90
> > May 4 22:36:22 haswell kernel: [21948.124088] WARNING: stack going in the
> > wrong direction? ip=intel_pmu_handle_irq+0x12/0x4a0
> > May 4 22:36:22 haswell kernel: [21948.124097] WARNING: stack going in the
> > wrong direction? ip=native_sched_clock+0xe/0x90
> > May 4 22:36:22 haswell kernel: [21948.150189] WARNING: stack going in the
> > wrong direction? ip=native_sched_clock+0xe/0x90
> > May 4 22:36:22 haswell kernel: [21948.150199] WARNING: stack going in the
> > wrong direction? ip=intel_pmu_handle_irq+0xe/0x4a0
> >
> > the last bit repeated for a few minutes (flooding the log with a few
> > thousand entries that look mostly similar)
>
> Thanks. I can recreate now, so I'll stop bugging you for a bit. This
> fuzzer is really good at finding unwinder issues.

Deja vu. Most of these are related to perf PEBS, similar to the
following issue:

  b8000586c90b ("perf/x86/intel: Cure bogus unwind from PEBS entries")

This is basically the ORC version of that. setup_pebs_sample_data() is
assembling a franken-pt_regs which ORC isn't happy about. RIP is
inconsistent with some of the other registers (like RSP and RBP).

Peter, any ideas?

--
Josh