On Tue, 12 Feb 2019 17:27:56 -0800 Kees Cook <keesc...@chromium.org> wrote:
> On Wed, May 9, 2018 at 10:18 PM Dmitry Vyukov <dvyu...@google.com> wrote: > > > > On Sat, Jan 27, 2018 at 8:58 PM, syzbot > > <syzbot+352bd10e338d9a90e...@syzkaller.appspotmail.com> wrote: > > > Hello, > > > > > > syzbot hit the following crash on bpf-next commit > > > 8223967fe0b8eb2448cca5cfe3c64a0838e6f60d (Sat Jan 27 01:06:23 2018 +0000) > > > Merge branch 'fix-lpm-map' > > > > > > So far this crash happened 2 times on bpf-next. > > > C reproducer is attached. > > > syzkaller reproducer is attached. > > > Raw console output is attached. > > > compiler: gcc (GCC) 7.1.1 20170620 > > > .config is attached. > > > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > > Reported-by: syzbot+352bd10e338d9a90e...@syzkaller.appspotmail.com > > > It will help syzbot understand when the bug is fixed. See footer for > > > details. > > > If you forward the report, please keep this part and the footer. > > > > > > This was bisected to: > > > > commit 4ee7c60de83ac01fa4c33c55937357601631e8ad > > Author: Steven Rostedt (VMware) <rost...@goodmis.org> > > Date: Fri Mar 23 10:18:03 2018 -0400 > > > > init, tracing: Add initcall trace events > > > > > > > > > WARNING: CPU: 1 PID: 5105 at kernel/events/core.c:331 > > > event_function_local.constprop.102+0x494/0x560 kernel/events/core.c:331 > > > Kernel panic - not syncing: panic_on_warn set ... > > > > > > CPU: 1 PID: 5105 Comm: syzkaller318382 Not tainted 4.15.0-rc8+ #8 > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > > > Google 01/01/2011 > > > Call Trace: > > > <IRQ> > > > __dump_stack lib/dump_stack.c:17 [inline] > > > dump_stack+0x194/0x257 lib/dump_stack.c:53 > > > panic+0x1e4/0x41c kernel/panic.c:183 > > > __warn+0x1dc/0x200 kernel/panic.c:547 > > > report_bug+0x211/0x2d0 lib/bug.c:184 > > > fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178 > > > fixup_bug arch/x86/kernel/traps.c:247 [inline] > > > do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296 > > > do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 > > > invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1096 > > > RIP: 0010:event_function_local.constprop.102+0x494/0x560 > > > kernel/events/core.c:331 > > > RSP: 0018:ffff8801db307d00 EFLAGS: 00010006 > > > RAX: ffff8801d902a180 RBX: ffff8801d60beac0 RCX: ffffffff81862a54 > > > RDX: 0000000000010000 RSI: ffff8801db3079f0 RDI: ffff8801d60bebc8 > > > RBP: ffff8801db307da0 R08: 1ffff1003b660edf R09: 000000000000000c > > > R10: ffff8801db307bb8 R11: 0000000000000044 R12: ffff8801db330e00 > > > R13: 1ffff1003b660fa3 R14: ffff8801d90841c0 R15: ffff8801d979ce40 > > > perf_event_disable_local kernel/events/core.c:1979 [inline] > > > perf_pending_event+0x1e5/0x240 kernel/events/core.c:5446 > > > irq_work_run_list+0x184/0x240 kernel/irq_work.c:155 > > > irq_work_run+0x1d/0x50 kernel/irq_work.c:170 > > > smp_irq_work_interrupt+0x103/0x640 arch/x86/kernel/irq_work.c:21 > > > irq_work_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:968 > > > </IRQ> > > I got curious why this report went silent, and bisected the "fix" to: It's still tagged as "TODO" in my INBOX, along with 1000 other emails tagged with TODO :-p > > 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls") > > This contains new tracepoints, and reverting them re-exposes the bug: > > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c > index 89d20ed1d2e8..462fab402bfa 100644 > --- a/arch/x86/kvm/hyperv.c > +++ b/arch/x86/kvm/hyperv.c > @@ -1468,15 +1468,21 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu > *current_vcpu, u64 ingpa, u64 outgpa, > all_cpus = false; > valid_bank_mask = BIT_ULL(0); > > + panic("how am I using hyperV?!"); > + /* > trace_kvm_hv_send_ipi(vector, sparse_banks[0]); > + */ > } else { > if (unlikely(kvm_read_guest(kvm, ingpa, &send_ipi_ex, > sizeof(send_ipi_ex)))) > return HV_STATUS_INVALID_HYPERCALL_INPUT; > > + panic("how am I using hyperV?! (ex)"); > + /* > trace_kvm_hv_send_ipi_ex(send_ipi_ex.vector, > send_ipi_ex.vp_set.format, > send_ipi_ex.vp_set.valid_bank_mask); > + */ > > vector = send_ipi_ex.vector; > valid_bank_mask = send_ipi_ex.vp_set.valid_bank_mask; > diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h > index 6432d08c7de7..895bd184feea 100644 > --- a/arch/x86/kvm/trace.h > +++ b/arch/x86/kvm/trace.h > @@ -1424,6 +1424,7 @@ TRACE_EVENT(kvm_hv_flush_tlb_ex, > /* > * Tracepoints for kvm_hv_send_ipi. > */ > +#if 0 > TRACE_EVENT(kvm_hv_send_ipi, > TP_PROTO(u32 vector, u64 processor_mask), > TP_ARGS(vector, processor_mask), > @@ -1462,6 +1463,7 @@ TRACE_EVENT(kvm_hv_send_ipi_ex, > __entry->vector, __entry->format, > __entry->valid_bank_mask) > ); > +#endif > #endif /* _TRACE_KVM_H */ > > #undef TRACE_INCLUDE_PATH > > Steve, what could possibly be happening here? Just adding more > tracepoints causes some kind of race where the task vs current test > trips in event_function_local(): > > if (WARN_ON_ONCE(task != current)) > goto unlock; That's perf code. How are you getting there? What's special about this run? You have perf running? -- Steve > > Is this maybe just an unlucky condition with the event loop running in > an IRQ? Should the WARN be expected, or is running under an IRQ > unexpected? > > It'd be nice to find the source of this, since it seems to have gotten > papered over instead of actually fixed. (Especially since the original > "introduction" bisect shows similar "just adding new trace events.) >