On Tue, Feb 23, 2016 at 04:48:49PM +0100, Jiri Olsa wrote:
> On Tue, Feb 23, 2016 at 04:27:29PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 19, 2016 at 03:37:47PM +0100, Peter Zijlstra wrote:
> > > Oleg reported that enable_on_exec results in weird scale factors.
> > >
> > > The recent commit 3e349507d12d ("perf: Fix perf_enable_on_exec() event
> > > scheduling") caused this by moving task_ctx_sched_out() from before
> > > __perf_event_mask_enable() to after it.
> > >
> > > The overlooked concequence of that change is that task_ctx_sched_out()
> > > would update the ctx time fields, and now __perf_event_mask_enable()
> > > uses stale time.
> > >
> > > Fix this by adding an explicit time update.
> > >
> > > While looking at this, I also found that we need an ctx->is_active
> > > check in perf_install_in_context().
> > >
> > > XXX: does this actually fix the reported issue? I'm not sure what the
> > > reproduction case is. Also an earlier version made Jiri's machine
> > > explode -- something I've not managed to reproduce either.
> >
> > Jiri, can you have a look at this and perhaps share the reproducer?
>
> yep, I'm testing this patchset, but got stuck with 'crash' tool to get
> some reasonable output.. got stuck on unrelated sched deadlock ;-)
>
> the reproducer is described in this email:
>   http://marc.info/?l=linux-kernel&m=145568006709552&w=2

so I finally got some reasonable backtrace and figured out that crash:

 #7 [ffff8802751afcd0] general_protection at ffffffff817a69e8
    [exception RIP: special_mapping_fault+47]
    RIP: ffffffff811e40df  RSP: ffff8802751afd88  RFLAGS: 00010282
    RAX: ffff8802747e8b68  RBX: 00007fffffffe080  RCX: c4712d0070657267
    RDX: ffff8802751afd98  RSI: ffff8802742c4f00  RDI: ffff8802747e8b68
    RBP: ffff8802751afd88   R8: 0000000000000000   R9: ffff8802751afe58
    R10: 00000000000001fe  R11: 00003fffffe00000  R12: ffff8802742c4f00
    R13: ffff8802751afe58  R14: 0000000000000000  R15: ffff880273f59ff8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #8 [ffff8802751afd90] __do_fault at ffffffff811db505
 #9 [ffff8802751afdf8] handle_mm_fault at ffffffff811e0b03
#10 [ffff8802751afec8] __do_page_fault at ffffffff8106734a
#11 [ffff8802751aff20] do_page_fault at ffffffff810675df
#12 [ffff8802751aff50] page_fault at ffffffff817a6a48

it was caused by:
  - f872f5400cc0 mm: Add a vm_special_mapping.fault() method
    which added a call to vm_special_mapping::fault if it's defined
  - and the uprobes code not initializing this fault pointer properly

the attached patch fixed the issue for me. Oleg, I'm not sure this is
how you want to fix this though..

however I still see the off by 1 as Pratyush said:

  65536;;probe_exact:f_65535x;132185462;100.00

I have another patch making the ena/run times equal for software events
in the read syscall, and that obviously works.. but I'm not sure how to
fix this otherwise ATM

thanks,
jirka


---
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 0167679182c0..0c045aad28a2 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1169,7 +1169,7 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
 	uprobe_opcode_t insn = UPROBE_SWBP_INSN;
 	struct xol_area *area;
 
-	area = kmalloc(sizeof(*area), GFP_KERNEL);
+	area = kzalloc(sizeof(*area), GFP_KERNEL);
 	if (unlikely(!area))
 		goto out;
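
for anyone following along, here is a rough userspace sketch of why the kmalloc -> kzalloc switch matters once a struct gains an optional callback. It is only an illustration: struct mapping, struct area, handle_fault() and area_alloc() are made-up names, not the kernel definitions, and it assumes the allocated area embeds the mapping whose fault pointer gets checked. calloc() stands in for kzalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a mapping with an optional hook (not the
 * kernel's struct vm_special_mapping). */
struct mapping {
	const char *name;
	int (*fault)(const struct mapping *m);	/* optional, may be NULL */
};

/* Hypothetical container that embeds the mapping (assumption). */
struct area {
	struct mapping map;
	unsigned long vaddr;
};

/* Call the optional hook only if it is set, else take the default path.
 * If 'fault' held leftover heap garbage, this would jump to a bogus
 * address instead. */
int handle_fault(const struct mapping *m)
{
	assert(m != NULL);
	if (m->fault)
		return m->fault(m);
	return 0;	/* default path */
}

/* Zeroing allocation: every field the caller does not set explicitly
 * starts out as 0/NULL (on the usual platforms where a null pointer is
 * all-bits-zero), so a hook added to the struct later is safe by default. */
struct area *area_alloc(unsigned long vaddr)
{
	struct area *a = calloc(1, sizeof(*a));

	if (!a)
		return NULL;
	a->map.name = "[uprobes]";
	a->vaddr = vaddr;
	return a;
}
```

with a plain malloc() in area_alloc() the 'fault' field would be indeterminate, and handle_fault() could end up calling through a garbage pointer.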