On Tue, Feb 23, 2016 at 04:48:49PM +0100, Jiri Olsa wrote:
> On Tue, Feb 23, 2016 at 04:27:29PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 19, 2016 at 03:37:47PM +0100, Peter Zijlstra wrote:
> > > Oleg reported that enable_on_exec results in weird scale factors.
> > > 
> > > The recent commit 3e349507d12d ("perf: Fix perf_enable_on_exec() event
> > > scheduling") caused this by moving task_ctx_sched_out() from before
> > > __perf_event_mask_enable() to after it.
> > > 
> > > The overlooked concequence of that change is that task_ctx_sched_out()
> > > would update the ctx time fields, and now __perf_event_mask_enable()
> > > uses stale time.
> > > 
> > > Fix this by adding an explicit time update.
> > > 
> > > While looking at this, I also found that we need an ctx->is_active
> > > check in perf_install_in_context().
> > > 
> > > XXX: does this actually fix the reported issue? I'm not sure what the
> > > reproduction case is. Also an earlier version made Jiri's machine
> > > explode -- something I've not managed to reproduce either.
> > 
> > Jiri, can you have a look at this and perhaps share the reproducer?
> 
> yep, I'm testing this patchset, but got stuck with 'crash' tool to get
> some reasonable output.. got stuck on unrelated sched deadlock ;-)
> 
> the reproducer is described in this email:
> http://marc.info/?l=linux-kernel&m=145568006709552&w=2

so I finally got some reasonable backtrace and figured that crash finally:

 #7 [ffff8802751afcd0] general_protection at ffffffff817a69e8
    [exception RIP: special_mapping_fault+47]
    RIP: ffffffff811e40df  RSP: ffff8802751afd88  RFLAGS: 00010282
    RAX: ffff8802747e8b68  RBX: 00007fffffffe080  RCX: c4712d0070657267
    RDX: ffff8802751afd98  RSI: ffff8802742c4f00  RDI: ffff8802747e8b68
    RBP: ffff8802751afd88   R8: 0000000000000000   R9: ffff8802751afe58
    R10: 00000000000001fe  R11: 00003fffffe00000  R12: ffff8802742c4f00
    R13: ffff8802751afe58  R14: 0000000000000000  R15: ffff880273f59ff8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #8 [ffff8802751afd90] __do_fault at ffffffff811db505
 #9 [ffff8802751afdf8] handle_mm_fault at ffffffff811e0b03
#10 [ffff8802751afec8] __do_page_fault at ffffffff8106734a
#11 [ffff8802751aff20] do_page_fault at ffffffff810675df
#12 [ffff8802751aff50] page_fault at ffffffff817a6a48


it was caused by:
  - f872f5400cc0 mm: Add a vm_special_mapping.fault() method
    that added call of vm_special_mapping::fault if it's defined

  - and uprobes code not initializing this fault pointer properly,
    attached patch fixed the issue for me,
    Oleg, I'm not sure this is how you want to fix this though..


however I still see the off by 1 as Pratyush said:
  65536;;probe_exact:f_65535x;132185462;100.00

I have another patch making the ena/run times equal for software
events in read syscal, and that obviously works.. but I'm not sure
how fix this otherwise ATM

thanks,
jirka


---
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 0167679182c0..0c045aad28a2 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1169,7 +1169,7 @@ static struct xol_area *__create_xol_area(unsigned long 
vaddr)
        uprobe_opcode_t insn = UPROBE_SWBP_INSN;
        struct xol_area *area;
 
-       area = kmalloc(sizeof(*area), GFP_KERNEL);
+       area = kzalloc(sizeof(*area), GFP_KERNEL);
        if (unlikely(!area))
                goto out;
 

Reply via email to