On Tue, Jan 31, 2017 at 2:27 AM, Peter Zijlstra <pet...@infradead.org> wrote: > On Mon, Jan 30, 2017 at 11:04:08PM -0800, Alexei Starovoitov wrote: >> Hi Peter, >> >> rarely I'm seeing the following crash: >> [40196.164255] BUG: unable to handle kernel paging request at >> 000000000000a11a >> [40196.179636] IP: perf_event_read+0xd3/0x1a0 >> [40196.188669] PGD 82e93a067 >> [40196.188670] PUD 7e1ddf067 >> [40196.194629] PMD 0 >> [40196.200589] >> [40196.208284] Oops: 0000 [#1] SMP >> [40196.208285] Modules linked in: >> [40196.208299] CPU: 24 PID: 4423 Comm: dynoKernelMon Not tainted >> 4.10.0-rc5-01189-gc6e0ad0ee5b0 #599 >> [40196.208300] Hardware name: Quanta Mono Lake-M.2 SATA 20F20BU0270/Mono >> Lake-M.2 SATA, BIOS F20_3A12 10/24/2016 >> [40196.208301] task: ffff8807e3b65580 task.stack: ffffc90009748000 >> [40196.208302] RIP: 0010:perf_event_read+0xd3/0x1a0 >> [40196.208303] RSP: 0018:ffffc9000974bd48 EFLAGS: 00010202 >> [40196.208304] RAX: 000000000000a040 RBX: ffff8807b79fd000 RCX: >> 0000000000000000 >> [40196.208304] RDX: 0000000000000018 RSI: 0000000000000000 RDI: >> 00000000ffffffff >> [40196.208305] RBP: ffffc9000974bd80 R08: 0000000000000000 R09: >> 0000000000000000 >> [40196.208305] R10: ffff8807cf8c7038 R11: 0000000000000000 R12: >> ffffc9000974bde8 >> [40196.208306] R13: 0000000000000000 R14: ffff8807b79fd000 R15: >> 00000000000004e0 >> [40196.208307] FS: 00007ff1b45ff700(0000) GS:ffff88085f400000(0000) >> knlGS:0000000000000000 >> [40196.208307] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [40196.208308] CR2: 000000000000a11a CR3: 0000000850298000 CR4: >> 00000000003406e0 >> [40196.208308] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >> 0000000000000000 >> [40196.208309] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: >> 0000000000000400 >> [40196.208309] Call Trace: >> [40196.208313] ? __might_sleep+0x4a/0x80 >> [40196.208316] perf_event_read_value+0x45/0x130 >> [40196.208318] perf_read+0x84/0x2d0 >> [40196.208322] __vfs_read+0x28/0x110 >> [40196.208325] ? security_file_permission+0x9b/0xc0 >> [40196.208327] vfs_read+0xa5/0x170 >> [40196.208327] SyS_read+0x46/0xa0 >> [40196.208329] do_syscall_64+0x4d/0xb0 >> [40196.208332] entry_SYSCALL64_slow_path+0x25/0x25 >> [40196.208333] RIP: 0033:0x7ff1b6f5716d >> [40196.208334] RSP: 002b:00007ff1b45fd080 EFLAGS: 00000293 ORIG_RAX: >> 0000000000000000 >> [40196.208335] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: >> 00007ff1b6f5716d >> [40196.208335] RDX: 0000000000000020 RSI: 00007ff1b45fd090 RDI: >> 0000000000000054 >> [40196.208336] RBP: 00007ff1b45fd0e0 R08: 00007ff1b65e45c0 R09: >> 00007ff1b45fc9b1 >> [40196.208336] R10: 0000000000000020 R11: 0000000000000293 R12: >> 0000000000000000 >> [40196.208337] R13: 00007ff1b5687000 R14: 00000004a817c7fe R15: >> 00000000000004e0 >> [40196.208337] Code: 60 02 00 00 74 30 48 63 cf 48 c7 c0 40 a0 00 00 48 8b >> 34 cd c0 93 d0 81 48 63 ca 48 8b 0c cd c0 93 d0 81 0f b7 8c 08 da 00 00 00 >> <66> 39 8c 30 da 00 00 00 0f 44 fa 48 8d 55 d0 b9 01 00 00 00 48 >> [40196.208353] RIP: perf_event_read+0xd3/0x1a0 RSP: ffffc9000974bd48 >> [40196.208353] CR2: 000000000000a11a >> >> The RIP points to this asm: >> 0xffffffff8115fc9b <+155>: mov %gs:0x7eeaa486(%rip),%edx # >> 0xa128 <cpu_number> >> 0xffffffff8115fca2 <+162>: testb $0x2,0x68(%rbx) >> 0xffffffff8115fca6 <+166>: mov 0x260(%rdi),%edi >> 0xffffffff8115fcac <+172>: je 0xffffffff8115fcde >> <perf_event_read+222> >> 0xffffffff8115fcae <+174>: movslq %edi,%rcx >> 0xffffffff8115fcb1 <+177>: mov $0xa040,%rax >> 0xffffffff8115fcb8 <+184>: mov -0x7e2f6c40(,%rcx,8),%rsi >> 0xffffffff8115fcc0 <+192>: movslq %edx,%rcx >> 0xffffffff8115fcc3 <+195>: mov -0x7e2f6c40(,%rcx,8),%rcx >> 0xffffffff8115fccb <+203>: movzwl 0xda(%rax,%rcx,1),%ecx >> 0xffffffff8115fcd3 <+211>: cmp %cx,0xda(%rax,%rsi,1) >> >> which is this C code: >> perf_event_read(): >> local_cpu = get_cpu(); >> cpu_to_read = find_cpu_to_read(event, local_cpu); >> put_cpu(); >> >> find_cpu_to_read(): >> event_pkg = topology_physical_package_id(event_cpu); >> local_pkg = topology_physical_package_id(local_cpu); >> if (event_pkg == local_pkg) >> >> If I read the asm correctly at the time of the crash >> event_cpu == RDI == 00000000ffffffff >> or in other words event->oncpu == -1 >> which I think is technically possible here. >> >> Any suggestions how to fix this? >> Happy to test any patches, though I don't know how to reproduce reliably. > > Right you are, that's buggered. > > Something like the below (compile tested only) ought to cure things I > think. > > --- > Subject: perf: Fix crash in perf_event_read() > > Alexei had his box explode because doing read() on a package > (rapl/uncore) event that isn't currently scheduled in ends up doing an > out-of-bounds load. > > Rework the code to more explicitly deal with event->oncpu being -1. > Tested-by: David Carrillo-Cisneros <davi...@google.com> > Reported-by: Alexei Starovoitov <alexei.starovoi...@gmail.com> > Cc: David Carrillo-Cisneros <davi...@google.com> > Cc: Stephane Eranian <eran...@google.com> > Fixes: d6a2f9035bfc ("perf/core: Introduce PMU_EV_CAP_READ_ACTIVE_PKG") > Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org> > --- > kernel/events/core.c | 23 ++++++++++++++--------- > 1 file changed, 14 insertions(+), 9 deletions(-) > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 88676ff98c0f..8d479be0c9a9 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -3538,14 +3538,15 @@ struct perf_read_data { > int ret; > }; > > -static int find_cpu_to_read(struct perf_event *event, int local_cpu) > +static int __perf_event_read_cpu(struct perf_event *event, int event_cpu) > { > - int event_cpu = event->oncpu; > u16 local_pkg, event_pkg; > > if (event->group_caps & PERF_EV_CAP_READ_ACTIVE_PKG) { > - event_pkg = topology_physical_package_id(event_cpu); > - local_pkg = topology_physical_package_id(local_cpu); > + int local_cpu = smp_processor_id(); > + > + event_pkg = topology_physical_package_id(event_cpu); > + local_pkg = topology_physical_package_id(local_cpu); > > if (event_pkg == local_pkg) > return local_cpu; > @@ -3675,7 +3676,7 @@ u64 perf_event_read_local(struct perf_event *event) > > static int perf_event_read(struct perf_event *event, bool group) > { > - int ret = 0, cpu_to_read, local_cpu; > + int event_cpu, ret = 0; > > /* > * If event is enabled and currently active on a CPU, update the > @@ -3688,9 +3689,12 @@ static int perf_event_read(struct perf_event *event, > bool group) > .ret = 0, > }; > > - local_cpu = get_cpu(); > - cpu_to_read = find_cpu_to_read(event, local_cpu); > - put_cpu(); > + event_cpu = READ_ONCE(event->oncpu); > + if ((unsigned)event_cpu >= nr_cpu_ids) > + return 0;
Yeah, I missed that one. Thanks for fixing it. > + > + preempt_disable(); > + event_cpu = __perf_event_read_cpu(event, event_cpu); > > /* > * Purposely ignore the smp_call_function_single() return > @@ -3702,7 +3706,8 @@ static int perf_event_read(struct perf_event *event, > bool group) > * Therefore, either way, we'll have an up-to-date event count > * after this. > */ > - (void)smp_call_function_single(cpu_to_read, > __perf_event_read, &data, 1); > + (void)smp_call_function_single(event_cpu, __perf_event_read, > &data, 1); > + preempt_enable(); > ret = data.ret; > } else if (event->state == PERF_EVENT_STATE_INACTIVE) { > struct perf_event_context *ctx = event->ctx;