Re: Getting empty callchain from perf_callchain_kernel()

2019-06-13 Thread Josh Poimboeuf
On Wed, Jun 12, 2019 at 09:50:08AM -0500, Josh Poimboeuf wrote:
> > Other than that, the same note as before, the 32bit JIT still seems
> > buggered, but I'm not sure you (or anybody else) cares enough about that
> > to fix it though. It seems to use ebp as its own frame pointer, which
> > completely defeats an unwinder.
> 
> I'm still trying to decide if I care about 32-bit.  It does indeed use
> ebp everywhere.  But I'm not sure if I want to poke the beehive...  Also
> factoring into the equation is the fact that I'll be on PTO next week
> :-)  If I have time in the next couple days then I may take a look.

32-bit actually looks much easier to fix than 64-bit was.  I haven't
tested it yet, but I'll be gone next week, so I'll just drop it here in
case anybody wants to try it.


diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index b29e82f190c7..8c1de7786e49 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -169,6 +169,10 @@ static const u8 bpf2ia32[][2] = {
 #define src_hi src[1]
 
 #define STACK_ALIGNMENT 8
+
+/* Size of callee-saved register space (except EBP) */
+#define CALLEE_SAVE_SIZE 12
+
 /*
  * Stack space for BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4,
  * BPF_REG_5, BPF_REG_6, BPF_REG_7, BPF_REG_8, BPF_REG_9,
@@ -176,13 +180,14 @@ static const u8 bpf2ia32[][2] = {
  */
 #define SCRATCH_SIZE 96
 
-/* Total stack size used in JITed code */
+/* Total stack size used in JITed code (except callee-saved) */
 #define _STACK_SIZE (stack_depth + SCRATCH_SIZE)
 
 #define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
 
-/* Get the offset of eBPF REGISTERs stored on scratch space. */
-#define STACK_VAR(off) (off)
+/* Offset of eBPF REGISTERs stored in scratch space, relative to EBP */
+//FIXME: rename to EBP_OFFSET
+#define STACK_VAR(off) (off - CALLEE_SAVE_SIZE - SCRATCH_SIZE)
 
 /* Encode 'dst_reg' register into IA32 opcode 'byte' */
 static u8 add_1reg(u8 byte, u32 dst_reg)
@@ -1408,7 +1413,7 @@ struct jit_context {
 #define BPF_MAX_INSN_SIZE  128
 #define BPF_INSN_SAFETY 64
 
-#define PROLOGUE_SIZE 35
+#define PROLOGUE_SIZE 32
 
 /*
  * Emit prologue code for BPF program and check it's size.
@@ -1436,8 +1441,6 @@ static void emit_prologue(u8 **pprog, u32 stack_depth)
 
/* sub esp,STACK_SIZE */
EMIT2_off32(0x81, 0xEC, STACK_SIZE);
-   /* sub ebp,SCRATCH_SIZE+12*/
-   EMIT3(0x83, add_1reg(0xE8, IA32_EBP), SCRATCH_SIZE + 12);
/* xor ebx,ebx */
EMIT2(0x31, add_2reg(0xC0, IA32_EBX, IA32_EBX));
 
@@ -1470,18 +1473,21 @@ static void emit_epilogue(u8 **pprog, u32 stack_depth)
/* mov edx,dword ptr [ebp+off]*/
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX), STACK_VAR(r0[1]));
 
-   /* add ebp,SCRATCH_SIZE+12*/
-   EMIT3(0x83, add_1reg(0xC0, IA32_EBP), SCRATCH_SIZE + 12);
+   /* add esp, STACK_SIZE */
+   EMIT2_off32(0x81, 0xC4, STACK_SIZE);
+
+   /* pop ebx */
+   EMIT1(0x5b);
+   /* pop esi */
+   EMIT1(0x5e);
+   /* pop edi */
+   EMIT1(0x5f);
+   /* pop ebp */
+   EMIT1(0x5d);
 
-   /* mov ebx,dword ptr [ebp-12]*/
-   EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EBX), -12);
-   /* mov esi,dword ptr [ebp-8]*/
-   EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ESI), -8);
-   /* mov edi,dword ptr [ebp-4]*/
-   EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDI), -4);
+   /* ret */
+   EMIT1(0xC3);
 
-   EMIT1(0xC9); /* leave */
-   EMIT1(0xC3); /* ret */
*pprog = prog;
 }
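
For reference, the unchanged start of emit_prologue() already sets up a
standard frame (push ebp; mov ebp,esp; push edi/esi/ebx), so with the
"sub ebp" gone, ebp stays a conventional frame pointer and the new pops
above mirror those pushes.  Roughly, the resulting prologue would be (a
sketch reconstructed from the epilogue; not part of the patch, untested):

	/* push ebp */
	EMIT1(0x55);
	/* mov ebp,esp */
	EMIT2(0x89, 0xE5);
	/* push edi */
	EMIT1(0x57);
	/* push esi */
	EMIT1(0x56);
	/* push ebx */
	EMIT1(0x53);
	/* sub esp,STACK_SIZE */
	EMIT2_off32(0x81, 0xEC, STACK_SIZE);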
 


Re: Getting empty callchain from perf_callchain_kernel()

2019-06-12 Thread Josh Poimboeuf
On Wed, Jun 12, 2019 at 10:54:23AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 11, 2019 at 10:05:01PM -0500, Josh Poimboeuf wrote:
> > On Fri, May 24, 2019 at 10:53:19AM +0200, Peter Zijlstra wrote:
> > > > For ORC, I'm thinking we may be able to just require that all generated
> > > > code (BPF and others) always use frame pointers.  Then when ORC doesn't
> > > > recognize a code address, it could try using the frame pointer as a
> > > > fallback.
> > > 
> > > Yes, this seems like a sensible approach. We'd also have to audit the
> > > ftrace and kprobe trampolines, IIRC they only do framepointer setup for
> > > CONFIG_FRAME_POINTER currently, which should be easy to fix (after the
> > > patches I have to fix the FP generation in the first place:
> > > 
> > >   
> > > https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=x86/wip
> > 
> > Right now, ftrace has a special hook in the ORC unwinder
> > (orc_ftrace_find).  It would be great if we could get rid of that in
> > favor of the "always use frame pointers" approach.  I'll hold off on
> > doing the kpatch/kprobe trampoline conversions in my patches since it
> > would conflict with yours.
> > 
> > Though, hm, because of pt_regs I guess ORC would need to be able to
> > decode an encoded frame pointer?  I was hoping we could leave those
> > encoded frame pointers behind in CONFIG_FRAME_POINTER-land forever...
> 
> Ah, I see.. could a similar approach work for the kprobe trampolines
> perhaps?

If you mean requiring that kprobes trampolines always use frame
pointers, I think it should work.

> > Here are my latest BPF unwinder patches in case anybody wants a sneak
> > peek:
> > 
> >   
> > https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=bpf-orc-fix
> 
> On a quick read-through, that looks good to me. A minor nit:
> 
>   /* mov dst_reg, %r11 */
>   EMIT_mov(dst_reg, AUX_REG);
> 
> The disparity between %r11 and AUX_REG is jarring. I understand the
> whole bpf register mapping thing, but it is just weird when reading
> this.

True, but there are several cases where r11 is hard-coded in the
instruction encoding itself, like:

/* mov imm32, %r11 */
EMIT3_off32(0x49, 0xC7, 0xC3, imm32);

If the code were more decoupled, like if it had helpers where you could
always just pass AUX_REG, and the code never had to know what the value
of AUX_REG is, then using "AUX_REG" in the comments would make sense.

But since there are inconsistencies, with hard-coded register mapping
knowledge in many places, I find it easier to follow what's going on
when the specific register name is always shown in the comments.
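
For illustration, a more decoupled helper might look something like the
sketch below (hypothetical, not in the tree; it just wraps the file's
existing EMIT*/add_* macros so callers could pass AUX_REG without ever
knowing it maps to r11):

	/* Hypothetical: emit "mov imm32, reg" for any mapped BPF register. */
	static void emit_mov_imm32(u8 **pprog, u32 dst_reg, u32 imm32)
	{
		u8 *prog = *pprog;
		int cnt = 0;

		/* mov imm32, dst_reg */
		EMIT3_off32(add_1mod(0x48, dst_reg), 0xC7,
			    add_1reg(0xC0, dst_reg), imm32);

		*pprog = prog;
	}

For AUX_REG this would emit exactly the hard-coded 0x49, 0xC7, 0xC3
sequence quoted above.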

> Other than that, the same note as before, the 32bit JIT still seems
> buggered, but I'm not sure you (or anybody else) cares enough about that
> to fix it though. It seems to use ebp as its own frame pointer, which
> completely defeats an unwinder.

I'm still trying to decide if I care about 32-bit.  It does indeed use
ebp everywhere.  But I'm not sure if I want to poke the beehive...  Also
factoring into the equation is the fact that I'll be on PTO next week
:-)  If I have time in the next couple days then I may take a look.

-- 
Josh


Re: Getting empty callchain from perf_callchain_kernel()

2019-06-12 Thread Josh Poimboeuf
On Wed, Jun 12, 2019 at 09:10:23AM -0400, Steven Rostedt wrote:
> On Tue, 11 Jun 2019 22:05:01 -0500
> Josh Poimboeuf  wrote:
> 
> > Right now, ftrace has a special hook in the ORC unwinder
> > (orc_ftrace_find).  It would be great if we could get rid of that in
> > favor of the "always use frame pointers" approach.  I'll hold off on
> > doing the kpatch/kprobe trampoline conversions in my patches since it
> > would conflict with yours.
> 
> Basically, IIUC, what you are saying is that the ftrace trampoline
> should always store the %sp in %rb even when CONFIG_FRAME_POINTER is not
> enabled? And this can allow you to remove the ftrace specific code from
> the orc unwinder?

Basically, yes.  Though the frame pointer encoding which Peter is adding
to the ftrace/kprobes trampolines might complicate things a bit.
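
(Background on "frame pointer encoding": with CONFIG_FRAME_POINTER, the
x86 entry code can store a pt_regs pointer in the frame-pointer register
with a tag bit set, and the FP unwinder decodes it.  A rough sketch from
memory of the 64-bit case, see decode_frame_pointer() in
arch/x86/kernel/unwind_frame.c:

	static struct pt_regs *decode_frame_pointer(unsigned long *bp)
	{
		unsigned long regs = (unsigned long)bp;

		if (!(regs & 0x1))
			return NULL;	/* ordinary frame: bp -> saved rbp */

		return (struct pt_regs *)(regs & ~0x1UL);
	}

ORC currently has no notion of this encoding, which is the complication
being referred to.)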

-- 
Josh


Re: Getting empty callchain from perf_callchain_kernel()

2019-06-12 Thread Steven Rostedt
On Tue, 11 Jun 2019 22:05:01 -0500
Josh Poimboeuf  wrote:

> Right now, ftrace has a special hook in the ORC unwinder
> (orc_ftrace_find).  It would be great if we could get rid of that in
> favor of the "always use frame pointers" approach.  I'll hold off on
> doing the kpatch/kprobe trampoline conversions in my patches since it
> would conflict with yours.

Basically, IIUC, what you are saying is that the ftrace trampoline
should always store the %sp in %rb even when CONFIG_FRAME_POINTER is not
enabled? And this can allow you to remove the ftrace specific code from
the orc unwinder?

-- Steve


> 
> Though, hm, because of pt_regs I guess ORC would need to be able to
> decode an encoded frame pointer?  I was hoping we could leave those
> encoded frame pointers behind in CONFIG_FRAME_POINTER-land forever...



Re: Getting empty callchain from perf_callchain_kernel()

2019-06-12 Thread Peter Zijlstra
On Tue, Jun 11, 2019 at 10:05:01PM -0500, Josh Poimboeuf wrote:
> On Fri, May 24, 2019 at 10:53:19AM +0200, Peter Zijlstra wrote:
> > > For ORC, I'm thinking we may be able to just require that all generated
> > > code (BPF and others) always use frame pointers.  Then when ORC doesn't
> > > recognize a code address, it could try using the frame pointer as a
> > > fallback.
> > 
> > Yes, this seems like a sensible approach. We'd also have to audit the
> > ftrace and kprobe trampolines, IIRC they only do framepointer setup for
> > CONFIG_FRAME_POINTER currently, which should be easy to fix (after the
> > patches I have to fix the FP generation in the first place:
> > 
> >   
> > https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=x86/wip
> 
> Right now, ftrace has a special hook in the ORC unwinder
> (orc_ftrace_find).  It would be great if we could get rid of that in
> favor of the "always use frame pointers" approach.  I'll hold off on
> doing the kpatch/kprobe trampoline conversions in my patches since it
> would conflict with yours.
> 
> Though, hm, because of pt_regs I guess ORC would need to be able to
> decode an encoded frame pointer?  I was hoping we could leave those
> encoded frame pointers behind in CONFIG_FRAME_POINTER-land forever...

Ah, I see.. could a similar approach work for the kprobe trampolines
perhaps?

> Here are my latest BPF unwinder patches in case anybody wants a sneak
> peek:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=bpf-orc-fix

On a quick read-through, that looks good to me. A minor nit:

/* mov dst_reg, %r11 */
EMIT_mov(dst_reg, AUX_REG);

The disparity between %r11 and AUX_REG is jarring. I understand the
whole bpf register mapping thing, but it is just weird when reading
this.

Other than that, the same note as before, the 32bit JIT still seems
buggered, but I'm not sure you (or anybody else) cares enough about that
to fix it though. It seems to use ebp as its own frame pointer, which
completely defeats an unwinder.


Re: Getting empty callchain from perf_callchain_kernel()

2019-06-11 Thread Josh Poimboeuf
On Fri, May 24, 2019 at 10:53:19AM +0200, Peter Zijlstra wrote:
> > For ORC, I'm thinking we may be able to just require that all generated
> > code (BPF and others) always use frame pointers.  Then when ORC doesn't
> > recognize a code address, it could try using the frame pointer as a
> > fallback.
> 
> Yes, this seems like a sensible approach. We'd also have to audit the
> ftrace and kprobe trampolines, IIRC they only do framepointer setup for
> CONFIG_FRAME_POINTER currently, which should be easy to fix (after the
> patches I have to fix the FP generation in the first place:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=x86/wip

Right now, ftrace has a special hook in the ORC unwinder
(orc_ftrace_find).  It would be great if we could get rid of that in
favor of the "always use frame pointers" approach.  I'll hold off on
doing the kpatch/kprobe trampoline conversions in my patches since it
would conflict with yours.

Though, hm, because of pt_regs I guess ORC would need to be able to
decode an encoded frame pointer?  I was hoping we could leave those
encoded frame pointers behind in CONFIG_FRAME_POINTER-land forever...

Here are my latest BPF unwinder patches in case anybody wants a sneak
peek:

  
https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=bpf-orc-fix

-- 
Josh


Re: Getting empty callchain from perf_callchain_kernel()

2019-06-11 Thread Josh Poimboeuf
On Thu, Jun 06, 2019 at 04:04:48PM +, Song Liu wrote:
> Hi Josh, 
> 
> Have you got luck fixing the ORC side?

Here's the ORC fix.  It's needed in addition to the bpf frame pointer
fix (the previous patch).  I'll clean the patches up and post them soon.

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 33b66b5c5aec..a2337460b1a3 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -128,6 +128,16 @@ static struct orc_entry null_orc_entry = {
.type = ORC_TYPE_CALL
 };
 
+/* Fake frame pointer entry -- used as a fallback for generated code */
+static struct orc_entry orc_fp_entry = {
+   .type   = ORC_TYPE_CALL,
+   .sp_reg = ORC_REG_BP,
+   .sp_offset  = 16,
+   .bp_reg = ORC_REG_PREV_SP,
+   .bp_offset  = -16,
+   .end= 0,
+};
+
 static struct orc_entry *orc_find(unsigned long ip)
 {
static struct orc_entry *orc;
@@ -392,8 +402,16 @@ bool unwind_next_frame(struct unwind_state *state)
 * calls and calls to noreturn functions.
 */
orc = orc_find(state->signal ? state->ip : state->ip - 1);
-   if (!orc)
-   goto err;
+   if (!orc) {
+   /*
+* As a fallback, try to assume rbp is the frame pointer.  This
+* is useful for generated code like BPF.  Since this is a
+* guess, the rest of the unwind is no longer considered
+* reliable.
+*/
+   orc = &orc_fp_entry;
+   state->error = true;
+   }
 
/* End-of-stack check for kernel threads: */
if (orc->sp_reg == ORC_REG_UNDEFINED) {
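
(A note on the fake entry above, as I read the fields; not part of the
patch: it simply describes the standard frame-pointer layout, i.e.

	/*
	 *   rbp + 16 : caller's SP after the call  (sp_reg=BP, sp_offset=16)
	 *   rbp +  8 : return address              (read at prev_sp - 8)
	 *   rbp +  0 : caller's saved rbp          (bp_reg=PREV_SP, bp_offset=-16)
	 */

so any generated code that sets up rbp like a normal C frame can be
unwound with it.)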


Re: Getting empty callchain from perf_callchain_kernel()

2019-06-06 Thread Josh Poimboeuf
On Thu, Jun 06, 2019 at 04:04:48PM +, Song Liu wrote:
> >> Here's a tentative BPF fix for the JIT frame pointer issue.  It was a
> >> bit harder than I expected.  Encoding r12 as a base register requires a
> >> SIB byte, so I had to add support for encoding that.  I also simplified
> >> the prologue to resemble a GCC prologue, which decreases the prologue
> >> size quite a bit.
> >> 
> >> Next week I can work on the corresponding ORC change.  Then I can clean
> >> all the patches up and submit them properly.
> 
> Hi Josh, 
> 
> Have you got luck fixing the ORC side?

Sorry, I've been traveling this week and I haven't had a chance to work
on it.  It's at the top of my TODO list for next week.

-- 
Josh


Re: Getting empty callchain from perf_callchain_kernel()

2019-06-06 Thread Song Liu



> On May 27, 2019, at 4:57 AM, Kairui Song  wrote:
> 
> On Sat, May 25, 2019 at 7:23 AM Josh Poimboeuf  wrote:
>> 
>> On Fri, May 24, 2019 at 10:20:52AM +0800, Kairui Song wrote:
>>> On Fri, May 24, 2019 at 1:27 AM Josh Poimboeuf  wrote:
 
 On Fri, May 24, 2019 at 12:41:59AM +0800, Kairui Song wrote:
> On Thu, May 23, 2019 at 11:24 PM Josh Poimboeuf  
> wrote:
>> 
>> On Thu, May 23, 2019 at 10:50:24PM +0800, Kairui Song wrote:
> Hi Josh, this still won't fix the problem.
> 
> Problem is not (or not only) with ___bpf_prog_run, what actually went
> wrong is with the JITed bpf code.
 
 There seem to be a bunch of issues.  My patch at least fixes the 
 failing
 selftest reported by Alexei for ORC.
 
 How can I recreate your issue?
>>> 
>>> Hmm, I used bcc's example to attach bpf to trace point, and with that
>>> fix stack trace is still invalid.
>>> 
>>> CMD I used with bcc:
>>> python3 ./tools/stackcount.py t:sched:sched_fork
>> 
>> I've had problems in the past getting bcc to build, so I was hoping it
>> was reproducible with a standalone selftest.
>> 
>>> And I just had another try applying your patch, self test is also 
>>> failing.
>> 
>> Is it the same selftest reported by Alexei?
>> 
>>  test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap err 
>> -1 errno 2
>> 
>>> I'm applying on my local master branch, a few days older than
>>> upstream, I can update and try again, am I missing anything?
>> 
>> The above patch had some issues, so with some configs you might see an
>> objtool warning for ___bpf_prog_run(), in which case the patch doesn't
>> fix the test_stacktrace_map selftest.
>> 
>> Here's the latest version which should fix it in all cases (based on
>> tip/master):
>> 
>>  
>> https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/commit/?h=bpf-orc-fix
> 
> Hmm, I still get the failure:
> test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap
> err -1 errno 2
> 
> And I didn't see how this will fix the issue. As long as ORC need to
> unwind through the JITed code it will fail. And that will happen
> before reaching ___bpf_prog_run.
 
 Ok, I was able to recreate by doing
 
  echo 1 > /proc/sys/net/core/bpf_jit_enable
 
 first.  I'm guessing you have CONFIG_BPF_JIT_ALWAYS_ON.
 
>>> 
>>> Yes, with JIT off it will be fixed. I can confirm that.
>> 
>> Here's a tentative BPF fix for the JIT frame pointer issue.  It was a
>> bit harder than I expected.  Encoding r12 as a base register requires a
>> SIB byte, so I had to add support for encoding that.  I also simplified
>> the prologue to resemble a GCC prologue, which decreases the prologue
>> size quite a bit.
>> 
>> Next week I can work on the corresponding ORC change.  Then I can clean
>> all the patches up and submit them properly.

Hi Josh, 

Have you got luck fixing the ORC side?

Thanks,
Song

>> 
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index afabf597c855..c9b4503558c9 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -104,9 +104,8 @@ static int bpf_size_to_x86_bytes(int bpf_size)
>> /*
>>  * The following table maps BPF registers to x86-64 registers.
>>  *
>> - * x86-64 register R12 is unused, since if used as base address
>> - * register in load/store instructions, it always needs an
>> - * extra byte of encoding and is callee saved.
>> + * RBP isn't used; it needs to be preserved to allow the unwinder to move
>> + * through generated code stacks.
>>  *
>>  * Also x86-64 register R9 is unused. x86-64 register R10 is
>>  * used for blinding (if enabled).
>> @@ -122,7 +121,7 @@ static const int reg2hex[] = {
>>[BPF_REG_7] = 5,  /* R13 callee saved */
>>[BPF_REG_8] = 6,  /* R14 callee saved */
>>[BPF_REG_9] = 7,  /* R15 callee saved */
>> -   [BPF_REG_FP] = 5, /* RBP readonly */
>> +   [BPF_REG_FP] = 4, /* R12 readonly */
>>[BPF_REG_AX] = 2, /* R10 temp register */
>>[AUX_REG] = 3,/* R11 temp register */
>> };
>> @@ -139,6 +138,7 @@ static bool is_ereg(u32 reg)
>> BIT(BPF_REG_7) |
>> BIT(BPF_REG_8) |
>> BIT(BPF_REG_9) |
>> +BIT(BPF_REG_FP) |
>> BIT(BPF_REG_AX));
>> }
>> 
>> @@ -147,6 +147,11 @@ static bool is_axreg(u32 reg)
>>return reg == BPF_REG_0;
>> }
>> 
>> +static bool is_sib_reg(u32 reg)
>> +{
>> +   return reg == BPF_REG_FP;
>> +}
>> +
>> /* Add modifiers if 'reg' maps to x86-64 registers R8..R15 */
>> static u8 add_1mod(u8 byte, u32 reg)
>> {
>> @@ -190,15 +195,13 @@ struct jit_context {
>> #define BPF_MAX_INSN_SIZE  128
>> #define 

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-27 Thread Kairui Song
On Sat, May 25, 2019 at 7:23 AM Josh Poimboeuf  wrote:
>
> On Fri, May 24, 2019 at 10:20:52AM +0800, Kairui Song wrote:
> > On Fri, May 24, 2019 at 1:27 AM Josh Poimboeuf  wrote:
> > >
> > > On Fri, May 24, 2019 at 12:41:59AM +0800, Kairui Song wrote:
> > > >  On Thu, May 23, 2019 at 11:24 PM Josh Poimboeuf  
> > > > wrote:
> > > > >
> > > > > On Thu, May 23, 2019 at 10:50:24PM +0800, Kairui Song wrote:
> > > > > > > > Hi Josh, this still won't fix the problem.
> > > > > > > >
> > > > > > > > Problem is not (or not only) with ___bpf_prog_run, what 
> > > > > > > > actually went
> > > > > > > > wrong is with the JITed bpf code.
> > > > > > >
> > > > > > > There seem to be a bunch of issues.  My patch at least fixes the 
> > > > > > > failing
> > > > > > > selftest reported by Alexei for ORC.
> > > > > > >
> > > > > > > How can I recreate your issue?
> > > > > >
> > > > > > Hmm, I used bcc's example to attach bpf to trace point, and with 
> > > > > > that
> > > > > > fix stack trace is still invalid.
> > > > > >
> > > > > > CMD I used with bcc:
> > > > > > python3 ./tools/stackcount.py t:sched:sched_fork
> > > > >
> > > > > I've had problems in the past getting bcc to build, so I was hoping it
> > > > > was reproducible with a standalone selftest.
> > > > >
> > > > > > And I just had another try applying your patch, self test is also 
> > > > > > failing.
> > > > >
> > > > > Is it the same selftest reported by Alexei?
> > > > >
> > > > >   test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap 
> > > > > err -1 errno 2
> > > > >
> > > > > > I'm applying on my local master branch, a few days older than
> > > > > > upstream, I can update and try again, am I missing anything?
> > > > >
> > > > > The above patch had some issues, so with some configs you might see an
> > > > > objtool warning for ___bpf_prog_run(), in which case the patch doesn't
> > > > > fix the test_stacktrace_map selftest.
> > > > >
> > > > > Here's the latest version which should fix it in all cases (based on
> > > > > tip/master):
> > > > >
> > > > >   
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/commit/?h=bpf-orc-fix
> > > >
> > > > Hmm, I still get the failure:
> > > > test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap
> > > > err -1 errno 2
> > > >
> > > > And I didn't see how this will fix the issue. As long as ORC need to
> > > > unwind through the JITed code it will fail. And that will happen
> > > > before reaching ___bpf_prog_run.
> > >
> > > Ok, I was able to recreate by doing
> > >
> > >   echo 1 > /proc/sys/net/core/bpf_jit_enable
> > >
> > > first.  I'm guessing you have CONFIG_BPF_JIT_ALWAYS_ON.
> > >
> >
> > Yes, with JIT off it will be fixed. I can confirm that.
>
> Here's a tentative BPF fix for the JIT frame pointer issue.  It was a
> bit harder than I expected.  Encoding r12 as a base register requires a
> SIB byte, so I had to add support for encoding that.  I also simplified
> the prologue to resemble a GCC prologue, which decreases the prologue
> size quite a bit.
>
> Next week I can work on the corresponding ORC change.  Then I can clean
> all the patches up and submit them properly.
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index afabf597c855..c9b4503558c9 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -104,9 +104,8 @@ static int bpf_size_to_x86_bytes(int bpf_size)
>  /*
>   * The following table maps BPF registers to x86-64 registers.
>   *
> - * x86-64 register R12 is unused, since if used as base address
> - * register in load/store instructions, it always needs an
> - * extra byte of encoding and is callee saved.
> + * RBP isn't used; it needs to be preserved to allow the unwinder to move
> + * through generated code stacks.
>   *
>   * Also x86-64 register R9 is unused. x86-64 register R10 is
>   * used for blinding (if enabled).
> @@ -122,7 +121,7 @@ static const int reg2hex[] = {
> [BPF_REG_7] = 5,  /* R13 callee saved */
> [BPF_REG_8] = 6,  /* R14 callee saved */
> [BPF_REG_9] = 7,  /* R15 callee saved */
> -   [BPF_REG_FP] = 5, /* RBP readonly */
> +   [BPF_REG_FP] = 4, /* R12 readonly */
> [BPF_REG_AX] = 2, /* R10 temp register */
> [AUX_REG] = 3,/* R11 temp register */
>  };
> @@ -139,6 +138,7 @@ static bool is_ereg(u32 reg)
>  BIT(BPF_REG_7) |
>  BIT(BPF_REG_8) |
>  BIT(BPF_REG_9) |
> +BIT(BPF_REG_FP) |
>  BIT(BPF_REG_AX));
>  }
>
> @@ -147,6 +147,11 @@ static bool is_axreg(u32 reg)
> return reg == BPF_REG_0;
>  }
>
> +static bool is_sib_reg(u32 reg)
> +{
> +   return reg == BPF_REG_FP;
> +}
> +
>  /* Add modifiers if 'reg' maps to x86-64 registers R8..R15 */
>  static u8 add_1mod(u8 byte, u32 reg)
>  {
> @@ -190,15 +195,13 @@ struct jit_context {
>  #define 

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-24 Thread Josh Poimboeuf
On Fri, May 24, 2019 at 10:20:52AM +0800, Kairui Song wrote:
> On Fri, May 24, 2019 at 1:27 AM Josh Poimboeuf  wrote:
> >
> > On Fri, May 24, 2019 at 12:41:59AM +0800, Kairui Song wrote:
> > >  On Thu, May 23, 2019 at 11:24 PM Josh Poimboeuf  
> > > wrote:
> > > >
> > > > On Thu, May 23, 2019 at 10:50:24PM +0800, Kairui Song wrote:
> > > > > > > Hi Josh, this still won't fix the problem.
> > > > > > >
> > > > > > > Problem is not (or not only) with ___bpf_prog_run, what actually 
> > > > > > > went
> > > > > > > wrong is with the JITed bpf code.
> > > > > >
> > > > > > There seem to be a bunch of issues.  My patch at least fixes the 
> > > > > > failing
> > > > > > selftest reported by Alexei for ORC.
> > > > > >
> > > > > > How can I recreate your issue?
> > > > >
> > > > > Hmm, I used bcc's example to attach bpf to trace point, and with that
> > > > > fix stack trace is still invalid.
> > > > >
> > > > > CMD I used with bcc:
> > > > > python3 ./tools/stackcount.py t:sched:sched_fork
> > > >
> > > > I've had problems in the past getting bcc to build, so I was hoping it
> > > > was reproducible with a standalone selftest.
> > > >
> > > > > And I just had another try applying your patch, self test is also 
> > > > > failing.
> > > >
> > > > Is it the same selftest reported by Alexei?
> > > >
> > > >   test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap 
> > > > err -1 errno 2
> > > >
> > > > > I'm applying on my local master branch, a few days older than
> > > > > upstream, I can update and try again, am I missing anything?
> > > >
> > > > The above patch had some issues, so with some configs you might see an
> > > > objtool warning for ___bpf_prog_run(), in which case the patch doesn't
> > > > fix the test_stacktrace_map selftest.
> > > >
> > > > Here's the latest version which should fix it in all cases (based on
> > > > tip/master):
> > > >
> > > >   
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/commit/?h=bpf-orc-fix
> > >
> > > Hmm, I still get the failure:
> > > test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap
> > > err -1 errno 2
> > >
> > > And I didn't see how this will fix the issue. As long as ORC need to
> > > unwind through the JITed code it will fail. And that will happen
> > > before reaching ___bpf_prog_run.
> >
> > Ok, I was able to recreate by doing
> >
> >   echo 1 > /proc/sys/net/core/bpf_jit_enable
> >
> > first.  I'm guessing you have CONFIG_BPF_JIT_ALWAYS_ON.
> >
> 
> Yes, with JIT off it will be fixed. I can confirm that.

Here's a tentative BPF fix for the JIT frame pointer issue.  It was a
bit harder than I expected.  Encoding r12 as a base register requires a
SIB byte, so I had to add support for encoding that.  I also simplified
the prologue to resemble a GCC prologue, which decreases the prologue
size quite a bit.
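
(Background on the SIB remark, summarizing the x86-64 encoding rules
rather than anything specific in the patch: in a ModRM byte, rm=100 does
not name a base register, it means "a SIB byte follows", and r12's low
three bits are 100.  So [r12 + disp] needs an extra SIB byte where
[rbp + disp] doesn't, e.g.:

	mov rax, [rbp - 8]   ->  48 8b 45 f8       /* ModRM only */
	mov rax, [r12 - 8]   ->  49 8b 44 24 f8    /* ModRM + SIB (0x24) */

hence the new is_sib_reg() special-casing below.)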

Next week I can work on the corresponding ORC change.  Then I can clean
all the patches up and submit them properly.

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index afabf597c855..c9b4503558c9 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -104,9 +104,8 @@ static int bpf_size_to_x86_bytes(int bpf_size)
 /*
  * The following table maps BPF registers to x86-64 registers.
  *
- * x86-64 register R12 is unused, since if used as base address
- * register in load/store instructions, it always needs an
- * extra byte of encoding and is callee saved.
+ * RBP isn't used; it needs to be preserved to allow the unwinder to move
+ * through generated code stacks.
  *
  * Also x86-64 register R9 is unused. x86-64 register R10 is
  * used for blinding (if enabled).
@@ -122,7 +121,7 @@ static const int reg2hex[] = {
[BPF_REG_7] = 5,  /* R13 callee saved */
[BPF_REG_8] = 6,  /* R14 callee saved */
[BPF_REG_9] = 7,  /* R15 callee saved */
-   [BPF_REG_FP] = 5, /* RBP readonly */
+   [BPF_REG_FP] = 4, /* R12 readonly */
[BPF_REG_AX] = 2, /* R10 temp register */
[AUX_REG] = 3,/* R11 temp register */
 };
@@ -139,6 +138,7 @@ static bool is_ereg(u32 reg)
 BIT(BPF_REG_7) |
 BIT(BPF_REG_8) |
 BIT(BPF_REG_9) |
+BIT(BPF_REG_FP) |
 BIT(BPF_REG_AX));
 }
 
@@ -147,6 +147,11 @@ static bool is_axreg(u32 reg)
return reg == BPF_REG_0;
 }
 
+static bool is_sib_reg(u32 reg)
+{
+   return reg == BPF_REG_FP;
+}
+
 /* Add modifiers if 'reg' maps to x86-64 registers R8..R15 */
 static u8 add_1mod(u8 byte, u32 reg)
 {
@@ -190,15 +195,13 @@ struct jit_context {
 #define BPF_MAX_INSN_SIZE  128
 #define BPF_INSN_SAFETY 64
 
 -#define AUX_STACK_SPACE 40 /* Space for RBX, R13, R14, R15, tailcnt */
-
-#define PROLOGUE_SIZE  37
+#define PROLOGUE_SIZE  25
 
 /*
  * Emit x86-64 prologue code for BPF program and check its size.
  * bpf_tail_call 

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-24 Thread Josh Poimboeuf
On Fri, May 24, 2019 at 10:53:19AM +0200, Peter Zijlstra wrote:
> On Thu, May 23, 2019 at 10:24:13AM -0500, Josh Poimboeuf wrote:
> 
> > Here's the latest version which should fix it in all cases (based on
> > tip/master):
> > 
> >   
> > https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/commit/?h=bpf-orc-fix
> 
> That patch suffers an inconsistency, the comment states:
> 
>   'if they have "jump_table" in the name'
> 
> while the actual code implements:
> 
>   'if the name starts with "jump_table"'
> 
> Other than that, I suppose that works just fine ;-)

The thing is, gcc converts a static local variable named "jump_table" to
an ELF symbol with a numbered suffix, something like "jump_table.12345".
But yeah I should at least clarify that in the comment.
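
A minimal illustration (hypothetical function, and the suffix number is
arbitrary):

	/* A function-local static named "jump_table", GCC computed goto. */
	static int dispatch(int op)
	{
		static const void * const jump_table[] = { &&op_add, &&op_sub };

		goto *jump_table[op];
	op_add:
		return 1;
	op_sub:
		return 2;
	}

Because the variable is function-local, gcc emits the ELF symbol with a
numbered suffix such as "jump_table.1234", which is why objtool matches
on the "jump_table" prefix instead of the exact name.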

-- 
Josh


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-24 Thread Peter Zijlstra
On Thu, May 23, 2019 at 10:24:13AM -0500, Josh Poimboeuf wrote:

> Here's the latest version which should fix it in all cases (based on
> tip/master):
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/commit/?h=bpf-orc-fix

That patch suffers an inconsistency, the comment states:

  'if they have "jump_table" in the name'

while the actual code implements:

  'if the name starts with "jump_table"'

Other than that, I suppose that works just fine ;-)

> There's no need to put special cases in the FP unwinder when we can
> instead just fix the frame pointer usage in the JIT code.
> 
> For ORC, I'm thinking we may be able to just require that all generated
> code (BPF and others) always use frame pointers.  Then when ORC doesn't
> recognize a code address, it could try using the frame pointer as a
> fallback.

Yes, this seems like a sensible approach. We'd also have to audit the
ftrace and kprobe trampolines, IIRC they only do framepointer setup for
CONFIG_FRAME_POINTER currently, which should be easy to fix (after the
patches I have to fix the FP generation in the first place:

  
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=x86/wip
)


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-23 Thread Kairui Song
On Fri, May 24, 2019 at 1:27 AM Josh Poimboeuf  wrote:
>
> On Fri, May 24, 2019 at 12:41:59AM +0800, Kairui Song wrote:
> >  On Thu, May 23, 2019 at 11:24 PM Josh Poimboeuf  
> > wrote:
> > >
> > > On Thu, May 23, 2019 at 10:50:24PM +0800, Kairui Song wrote:
> > > > > > Hi Josh, this still won't fix the problem.
> > > > > >
> > > > > > Problem is not (or not only) with ___bpf_prog_run, what actually 
> > > > > > went
> > > > > > wrong is with the JITed bpf code.
> > > > >
> > > > > There seem to be a bunch of issues.  My patch at least fixes the 
> > > > > failing
> > > > > selftest reported by Alexei for ORC.
> > > > >
> > > > > How can I recreate your issue?
> > > >
> > > > Hmm, I used bcc's example to attach bpf to trace point, and with that
> > > > fix stack trace is still invalid.
> > > >
> > > > CMD I used with bcc:
> > > > python3 ./tools/stackcount.py t:sched:sched_fork
> > >
> > > I've had problems in the past getting bcc to build, so I was hoping it
> > > was reproducible with a standalone selftest.
> > >
> > > > And I just had another try applying your patch, self test is also 
> > > > failing.
> > >
> > > Is it the same selftest reported by Alexei?
> > >
> > >   test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap err 
> > > -1 errno 2
> > >
> > > > I'm applying on my local master branch, a few days older than
> > > > upstream, I can update and try again, am I missing anything?
> > >
> > > The above patch had some issues, so with some configs you might see an
> > > objtool warning for ___bpf_prog_run(), in which case the patch doesn't
> > > fix the test_stacktrace_map selftest.
> > >
> > > Here's the latest version which should fix it in all cases (based on
> > > tip/master):
> > >
> > >   
> > > https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/commit/?h=bpf-orc-fix
> >
> > Hmm, I still get the failure:
> > test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap
> > err -1 errno 2
> >
> > And I didn't see how this will fix the issue. As long as ORC need to
> > unwind through the JITed code it will fail. And that will happen
> > before reaching ___bpf_prog_run.
>
> Ok, I was able to recreate by doing
>
>   echo 1 > /proc/sys/net/core/bpf_jit_enable
>
> first.  I'm guessing you have CONFIG_BPF_JIT_ALWAYS_ON.
>

Yes, with JIT off it will be fixed. I can confirm that.

--
Best Regards,
Kairui Song


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-23 Thread Josh Poimboeuf
On Fri, May 24, 2019 at 12:41:59AM +0800, Kairui Song wrote:
>  On Thu, May 23, 2019 at 11:24 PM Josh Poimboeuf  wrote:
> >
> > On Thu, May 23, 2019 at 10:50:24PM +0800, Kairui Song wrote:
> > > > > Hi Josh, this still won't fix the problem.
> > > > >
> > > > > Problem is not (or not only) with ___bpf_prog_run, what actually went
> > > > > wrong is with the JITed bpf code.
> > > >
> > > > There seem to be a bunch of issues.  My patch at least fixes the failing
> > > > selftest reported by Alexei for ORC.
> > > >
> > > > How can I recreate your issue?
> > >
> > > Hmm, I used bcc's example to attach bpf to trace point, and with that
> > > fix stack trace is still invalid.
> > >
> > > CMD I used with bcc:
> > > python3 ./tools/stackcount.py t:sched:sched_fork
> >
> > I've had problems in the past getting bcc to build, so I was hoping it
> > was reproducible with a standalone selftest.
> >
> > > And I just had another try applying your patch, self test is also failing.
> >
> > Is it the same selftest reported by Alexei?
> >
> >   test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap err 
> > -1 errno 2
> >
> > > I'm applying on my local master branch, a few days older than
> > > upstream, I can update and try again, am I missing anything?
> >
> > The above patch had some issues, so with some configs you might see an
> > objtool warning for ___bpf_prog_run(), in which case the patch doesn't
> > fix the test_stacktrace_map selftest.
> >
> > Here's the latest version which should fix it in all cases (based on
> > tip/master):
> >
> >   
> > https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/commit/?h=bpf-orc-fix
> 
> Hmm, I still get the failure:
> test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap
> err -1 errno 2
> 
> And I didn't see how this will fix the issue. As long as ORC need to
> unwind through the JITed code it will fail. And that will happen
> before reaching ___bpf_prog_run.

Ok, I was able to recreate by doing

  echo 1 > /proc/sys/net/core/bpf_jit_enable

first.  I'm guessing you have CONFIG_BPF_JIT_ALWAYS_ON.

-- 
Josh


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-23 Thread Kairui Song
 On Thu, May 23, 2019 at 11:24 PM Josh Poimboeuf  wrote:
>
> On Thu, May 23, 2019 at 10:50:24PM +0800, Kairui Song wrote:
> > > > Hi Josh, this still won't fix the problem.
> > > >
> > > > Problem is not (or not only) with ___bpf_prog_run, what actually went
> > > > wrong is with the JITed bpf code.
> > >
> > > There seem to be a bunch of issues.  My patch at least fixes the failing
> > > selftest reported by Alexei for ORC.
> > >
> > > How can I recreate your issue?
> >
> > Hmm, I used bcc's example to attach bpf to trace point, and with that
> > fix stack trace is still invalid.
> >
> > CMD I used with bcc:
> > python3 ./tools/stackcount.py t:sched:sched_fork
>
> I've had problems in the past getting bcc to build, so I was hoping it
> was reproducible with a standalone selftest.
>
> > And I just had another try applying your patch, self test is also failing.
>
> Is it the same selftest reported by Alexei?
>
>   test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap err -1 
> errno 2
>
> > I'm applying on my local master branch, a few days older than
> > upstream, I can update and try again, am I missing anything?
>
> The above patch had some issues, so with some configs you might see an
> objtool warning for ___bpf_prog_run(), in which case the patch doesn't
> fix the test_stacktrace_map selftest.
>
> Here's the latest version which should fix it in all cases (based on
> tip/master):
>
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/commit/?h=bpf-orc-fix

Hmm, I still get the failure:
test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap
err -1 errno 2

And I didn't see how this will fix the issue. As long as ORC need to
unwind through the JITed code it will fail. And that will happen
before reaching ___bpf_prog_run.

>
> > > > For frame pointer unwinder, it seems the JITed bpf code will have a
> > > > shifted "BP" register? (arch/x86/net/bpf_jit_comp.c:217), so if we can
> > > > unshift it properly then it will work.
> > >
> > > Yeah, that looks like a frame pointer bug in emit_prologue().
> > >
> > > > I tried below code, and problem is fixed (only for frame pointer
> > > > unwinder though). Need to find a better way to detect and do any
> > > > similar trick for bpf part, if this is a feasible way to fix it:
> > > >
> > > > diff --git a/arch/x86/kernel/unwind_frame.c 
> > > > b/arch/x86/kernel/unwind_frame.c
> > > > index 9b9fd4826e7a..2c0fa2aaa7e4 100644
> > > > --- a/arch/x86/kernel/unwind_frame.c
> > > > +++ b/arch/x86/kernel/unwind_frame.c
> > > > @@ -330,8 +330,17 @@ bool unwind_next_frame(struct unwind_state *state)
> > > > }
> > > >
> > > > /* Move to the next frame if it's safe: */
> > > > -   if (!update_stack_state(state, next_bp))
> > > > -   goto bad_address;
> > > > +   if (!update_stack_state(state, next_bp)) {
> > > > +   // Try again with shifted BP
> > > > +   state->bp += 5; // see AUX_STACK_SPACE
> > > > +   next_bp = (unsigned long
> > > > *)READ_ONCE_TASK_STACK(state->task, *state->bp);
> > > > +   // Clean and refetch stack info, it's marked as error 
> > > > outed
> > > > +   state->stack_mask = 0;
> > > > +   get_stack_info(next_bp, state->task,
> > > > &state->stack_info, &state->stack_mask);
> > > > +   if (!update_stack_state(state, next_bp)) {
> > > > +   goto bad_address;
> > > > +   }
> > > > +   }
> > > >
> > > > return true;
> > >
> > > Nack.
> > >
> > > > For ORC unwinder, I think the unwinder can't find any info about the
> > > > JITed part. Maybe if can let it just skip the JITed part and go to
> > > > kernel context, then should be good enough.
> > >
> > > If it's starting from a fake pt_regs then that's going to be a
> > > challenge.
> > >
> > > Will the JIT code always have the same stack layout?  If so then we
> > > could hard code that knowledge in ORC.  Or even better, create a generic
> > > interface for ORC to query the creator of the generated code about the
> > > stack layout.
> >
> > I think yes.
> >
> > Not sure why we have the BP shift yet, if the prolog code could be
> > tweaked to work with frame pointer unwinder it will be good to have.
> > But still not for ORC.
> >
> > Will it be a good idea to have a region reserved for the JITed code?
> > Currently it shares the region with "module mapping space". If let it
> > have a separate region, when the unwinder meet any code in that region
> > it will know it's JITed code and then can do something special about
> > it.
> >
> > This should make it much easier for both frame pointer and ORC unwinder to 
> > work.
>
> There's no need to put special cases in the FP unwinder when we can
> instead just fix the frame pointer usage in the JIT code.
>
> For ORC, I'm thinking we may be able to just require that all generated
> code (BPF and others) always use frame pointers.  Then when ORC doesn't
> recognize a code 

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-23 Thread Josh Poimboeuf
On Thu, May 23, 2019 at 10:50:24PM +0800, Kairui Song wrote:
> > > Hi Josh, this still won't fix the problem.
> > >
> > > Problem is not (or not only) with ___bpf_prog_run, what actually went
> > > wrong is with the JITed bpf code.
> >
> > There seem to be a bunch of issues.  My patch at least fixes the failing
> > selftest reported by Alexei for ORC.
> >
> > How can I recreate your issue?
> 
> Hmm, I used bcc's example to attach bpf to trace point, and with that
> fix stack trace is still invalid.
> 
> CMD I used with bcc:
> python3 ./tools/stackcount.py t:sched:sched_fork

I've had problems in the past getting bcc to build, so I was hoping it
was reproducible with a standalone selftest.

> And I just had another try applying your patch, self test is also failing.

Is it the same selftest reported by Alexei?

  test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap err -1 
errno 2

> I'm applying on my local master branch, a few days older than
> upstream, I can update and try again, am I missing anything?

The above patch had some issues, so with some configs you might see an
objtool warning for ___bpf_prog_run(), in which case the patch doesn't
fix the test_stacktrace_map selftest.

Here's the latest version which should fix it in all cases (based on
tip/master):

  
https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/commit/?h=bpf-orc-fix

> > > For frame pointer unwinder, it seems the JITed bpf code will have a
> > > shifted "BP" register? (arch/x86/net/bpf_jit_comp.c:217), so if we can
> > > unshift it properly then it will work.
> >
> > Yeah, that looks like a frame pointer bug in emit_prologue().
> >
> > > I tried below code, and problem is fixed (only for frame pointer
> > > unwinder though). Need to find a better way to detect and do any
> > > similar trick for bpf part, if this is a feasible way to fix it:
> > >
> > > diff --git a/arch/x86/kernel/unwind_frame.c 
> > > b/arch/x86/kernel/unwind_frame.c
> > > index 9b9fd4826e7a..2c0fa2aaa7e4 100644
> > > --- a/arch/x86/kernel/unwind_frame.c
> > > +++ b/arch/x86/kernel/unwind_frame.c
> > > @@ -330,8 +330,17 @@ bool unwind_next_frame(struct unwind_state *state)
> > > }
> > >
> > > /* Move to the next frame if it's safe: */
> > > -   if (!update_stack_state(state, next_bp))
> > > -   goto bad_address;
> > > +   if (!update_stack_state(state, next_bp)) {
> > > +   // Try again with shifted BP
> > > +   state->bp += 5; // see AUX_STACK_SPACE
> > > +   next_bp = (unsigned long
> > > *)READ_ONCE_TASK_STACK(state->task, *state->bp);
> > > +   // Clean and refetch stack info, it's marked as error 
> > > outed
> > > +   state->stack_mask = 0;
> > > +   get_stack_info(next_bp, state->task,
> > > > > > &state->stack_info, &state->stack_mask);
> > > +   if (!update_stack_state(state, next_bp)) {
> > > +   goto bad_address;
> > > +   }
> > > +   }
> > >
> > > return true;
> >
> > Nack.
> >
> > > For ORC unwinder, I think the unwinder can't find any info about the
> > > JITed part. Maybe if can let it just skip the JITed part and go to
> > > kernel context, then should be good enough.
> >
> > If it's starting from a fake pt_regs then that's going to be a
> > challenge.
> >
> > Will the JIT code always have the same stack layout?  If so then we
> > could hard code that knowledge in ORC.  Or even better, create a generic
> > interface for ORC to query the creator of the generated code about the
> > stack layout.
> 
> I think yes.
> 
> Not sure why we have the BP shift yet, if the prolog code could be
> tweaked to work with frame pointer unwinder it will be good to have.
> But still not for ORC.
> 
> Will it be a good idea to have a region reserved for the JITed code?
> Currently it shares the region with "module mapping space". If let it
> have a separate region, when the unwinder meet any code in that region
> it will know it's JITed code and then can do something special about
> it.
> 
> This should make it much easier for both frame pointer and ORC unwinder to 
> work.

There's no need to put special cases in the FP unwinder when we can
instead just fix the frame pointer usage in the JIT code.

For ORC, I'm thinking we may be able to just require that all generated
code (BPF and others) always use frame pointers.  Then when ORC doesn't
recognize a code address, it could try using the frame pointer as a
fallback.

Once I get a reproducer I can do the patches for all that.

-- 
Josh


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-23 Thread Kairui Song
On Thu, May 23, 2019 at 9:32 PM Josh Poimboeuf  wrote:
>
> On Thu, May 23, 2019 at 02:48:11PM +0800, Kairui Song wrote:
> > On Thu, May 23, 2019 at 7:46 AM Josh Poimboeuf  wrote:
> > >
> > > On Wed, May 22, 2019 at 12:45:17PM -0500, Josh Poimboeuf wrote:
> > > > On Wed, May 22, 2019 at 02:49:07PM +, Alexei Starovoitov wrote:
> > > > > The one that is broken is prog_tests/stacktrace_map.c
> > > > > There we attach bpf to standard tracepoint where
> > > > > kernel suppose to collect pt_regs before calling into bpf.
> > > > > And that's what bpf_get_stackid_tp() is doing.
> > > > > It passes pt_regs (that was collected before any bpf)
> > > > > into bpf_get_stackid() which calls get_perf_callchain().
> > > > > Same thing with kprobes, uprobes.
> > > >
> > > > Is it trying to unwind through ___bpf_prog_run()?
> > > >
> > > > If so, that would at least explain why ORC isn't working.  Objtool
> > > > currently ignores that function because it can't follow the jump table.
> > >
> > > Here's a tentative fix (for ORC, at least).  I'll need to make sure this
> > > doesn't break anything else.
> > >
> > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > index 242a643af82f..1d9a7cc4b836 100644
> > > --- a/kernel/bpf/core.c
> > > +++ b/kernel/bpf/core.c
> > > @@ -1562,7 +1562,6 @@ static u64 ___bpf_prog_run(u64 *regs, const struct 
> > > bpf_insn *insn, u64 *stack)
> > > BUG_ON(1);
> > > return 0;
> > >  }
> > > -STACK_FRAME_NON_STANDARD(___bpf_prog_run); /* jump table */
> > >
> > >  #define PROG_NAME(stack_size) __bpf_prog_run##stack_size
> > >  #define DEFINE_BPF_PROG_RUN(stack_size) \
> > > diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> > > index 172f99195726..2567027fce95 100644
> > > --- a/tools/objtool/check.c
> > > +++ b/tools/objtool/check.c
> > > @@ -1033,13 +1033,6 @@ static struct rela *find_switch_table(struct 
> > > objtool_file *file,
> > > if (text_rela->type == R_X86_64_PC32)
> > > table_offset += 4;
> > >
> > > -   /*
> > > -* Make sure the .rodata address isn't associated with a
> > > -* symbol.  gcc jump tables are anonymous data.
> > > -*/
> > > -   if (find_symbol_containing(rodata_sec, table_offset))
> > > -   continue;
> > > -
> > > rodata_rela = find_rela_by_dest(rodata_sec, table_offset);
> > > if (rodata_rela) {
> > > /*
> >
> > Hi Josh, this still won't fix the problem.
> >
> > Problem is not (or not only) with ___bpf_prog_run, what actually went
> > wrong is with the JITed bpf code.
>
> There seem to be a bunch of issues.  My patch at least fixes the failing
> selftest reported by Alexei for ORC.
>
> How can I recreate your issue?

Hmm, I used bcc's example to attach bpf to trace point, and with that
fix stack trace is still invalid.

CMD I used with bcc:
python3 ./tools/stackcount.py t:sched:sched_fork

And I just had another try applying your patch, self test is also failing.

I'm applying on my local master branch, a few days older than
upstream, I can update and try again, am I missing anything?

>
> > For frame pointer unwinder, it seems the JITed bpf code will have a
> > shifted "BP" register? (arch/x86/net/bpf_jit_comp.c:217), so if we can
> > unshift it properly then it will work.
>
> Yeah, that looks like a frame pointer bug in emit_prologue().
>
> > I tried below code, and problem is fixed (only for frame pointer
> > unwinder though). Need to find a better way to detect and do any
> > similar trick for bpf part, if this is a feasible way to fix it:
> >
> > diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> > index 9b9fd4826e7a..2c0fa2aaa7e4 100644
> > --- a/arch/x86/kernel/unwind_frame.c
> > +++ b/arch/x86/kernel/unwind_frame.c
> > @@ -330,8 +330,17 @@ bool unwind_next_frame(struct unwind_state *state)
> > }
> >
> > /* Move to the next frame if it's safe: */
> > -   if (!update_stack_state(state, next_bp))
> > -   goto bad_address;
> > +   if (!update_stack_state(state, next_bp)) {
> > +   // Try again with shifted BP
> > +   state->bp += 5; // see AUX_STACK_SPACE
> > +   next_bp = (unsigned long
> > *)READ_ONCE_TASK_STACK(state->task, *state->bp);
> > +   // Clean and refetch stack info, it's marked as error outed
> > +   state->stack_mask = 0;
> > +   get_stack_info(next_bp, state->task,
> > &state->stack_info, &state->stack_mask);
> > +   if (!update_stack_state(state, next_bp)) {
> > +   goto bad_address;
> > +   }
> > +   }
> >
> > return true;
>
> Nack.
>
> > For ORC unwinder, I think the unwinder can't find any info about the
> > JITed part. Maybe if can let it just skip the JITed part and go to
> > kernel context, then should be good enough.
>
> If it's 

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-23 Thread Josh Poimboeuf
On Thu, May 23, 2019 at 02:48:11PM +0800, Kairui Song wrote:
> On Thu, May 23, 2019 at 7:46 AM Josh Poimboeuf  wrote:
> >
> > On Wed, May 22, 2019 at 12:45:17PM -0500, Josh Poimboeuf wrote:
> > > On Wed, May 22, 2019 at 02:49:07PM +, Alexei Starovoitov wrote:
> > > > The one that is broken is prog_tests/stacktrace_map.c
> > > > There we attach bpf to standard tracepoint where
> > > > kernel suppose to collect pt_regs before calling into bpf.
> > > > And that's what bpf_get_stackid_tp() is doing.
> > > > It passes pt_regs (that was collected before any bpf)
> > > > into bpf_get_stackid() which calls get_perf_callchain().
> > > > Same thing with kprobes, uprobes.
> > >
> > > Is it trying to unwind through ___bpf_prog_run()?
> > >
> > > If so, that would at least explain why ORC isn't working.  Objtool
> > > currently ignores that function because it can't follow the jump table.
> >
> > Here's a tentative fix (for ORC, at least).  I'll need to make sure this
> > doesn't break anything else.
> >
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 242a643af82f..1d9a7cc4b836 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -1562,7 +1562,6 @@ static u64 ___bpf_prog_run(u64 *regs, const struct 
> > bpf_insn *insn, u64 *stack)
> > BUG_ON(1);
> > return 0;
> >  }
> > -STACK_FRAME_NON_STANDARD(___bpf_prog_run); /* jump table */
> >
> >  #define PROG_NAME(stack_size) __bpf_prog_run##stack_size
> >  #define DEFINE_BPF_PROG_RUN(stack_size) \
> > diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> > index 172f99195726..2567027fce95 100644
> > --- a/tools/objtool/check.c
> > +++ b/tools/objtool/check.c
> > @@ -1033,13 +1033,6 @@ static struct rela *find_switch_table(struct 
> > objtool_file *file,
> > if (text_rela->type == R_X86_64_PC32)
> > table_offset += 4;
> >
> > -   /*
> > -* Make sure the .rodata address isn't associated with a
> > -* symbol.  gcc jump tables are anonymous data.
> > -*/
> > -   if (find_symbol_containing(rodata_sec, table_offset))
> > -   continue;
> > -
> > rodata_rela = find_rela_by_dest(rodata_sec, table_offset);
> > if (rodata_rela) {
> > /*
> 
> Hi Josh, this still won't fix the problem.
> 
> Problem is not (or not only) with ___bpf_prog_run, what actually went
> wrong is with the JITed bpf code.

There seem to be a bunch of issues.  My patch at least fixes the failing
selftest reported by Alexei for ORC.

How can I recreate your issue?

> For frame pointer unwinder, it seems the JITed bpf code will have a
> shifted "BP" register? (arch/x86/net/bpf_jit_comp.c:217), so if we can
> unshift it properly then it will work.

Yeah, that looks like a frame pointer bug in emit_prologue().

> I tried below code, and problem is fixed (only for frame pointer
> unwinder though). Need to find a better way to detect and do any
> similar trick for bpf part, if this is a feasible way to fix it:
> 
> diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> index 9b9fd4826e7a..2c0fa2aaa7e4 100644
> --- a/arch/x86/kernel/unwind_frame.c
> +++ b/arch/x86/kernel/unwind_frame.c
> @@ -330,8 +330,17 @@ bool unwind_next_frame(struct unwind_state *state)
> }
> 
> /* Move to the next frame if it's safe: */
> -   if (!update_stack_state(state, next_bp))
> -   goto bad_address;
> +   if (!update_stack_state(state, next_bp)) {
> +   // Try again with shifted BP
> +   state->bp += 5; // see AUX_STACK_SPACE
> +   next_bp = (unsigned long
> *)READ_ONCE_TASK_STACK(state->task, *state->bp);
> +   // Clean and refetch stack info, it's marked as error outed
> +   state->stack_mask = 0;
> +   get_stack_info(next_bp, state->task,
> &state->stack_info, &state->stack_mask);
> +   if (!update_stack_state(state, next_bp)) {
> +   goto bad_address;
> +   }
> +   }
> 
> return true;

Nack.

> For ORC unwinder, I think the unwinder can't find any info about the
> JITed part. Maybe if can let it just skip the JITed part and go to
> kernel context, then should be good enough.

If it's starting from a fake pt_regs then that's going to be a
challenge.

Will the JIT code always have the same stack layout?  If so then we
could hard code that knowledge in ORC.  Or even better, create a generic
interface for ORC to query the creator of the generated code about the
stack layout.
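
Purely as a sketch of what such a generic interface could look like
(hypothetical names, nothing like this exists in the tree): each
generator of runtime code would register a hook that hands ORC a
synthetic entry for addresses it owns.

	#include <linux/list.h>
	#include <asm/orc_types.h>

	struct generated_unwind_info {
		bool (*owns_ip)(unsigned long ip);
		const struct orc_entry *(*orc_for_ip)(unsigned long ip);
		struct list_head list;
	};

	/* Hypothetical: called by the JIT when it installs new code. */
	int register_generated_unwind_info(struct generated_unwind_info *info);

orc_find() could then fall back to walking the registered hooks before
giving up on an unknown address.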

-- 
Josh


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-23 Thread Kairui Song
 On Thu, May 23, 2019 at 4:28 PM Song Liu  wrote:
>
> > On May 22, 2019, at 11:48 PM, Kairui Song  wrote:
> >
> > On Thu, May 23, 2019 at 7:46 AM Josh Poimboeuf  wrote:
> >>
> >> On Wed, May 22, 2019 at 12:45:17PM -0500, Josh Poimboeuf wrote:
> >>> On Wed, May 22, 2019 at 02:49:07PM +, Alexei Starovoitov wrote:
>  The one that is broken is prog_tests/stacktrace_map.c
>  There we attach bpf to standard tracepoint where
>  kernel suppose to collect pt_regs before calling into bpf.
>  And that's what bpf_get_stackid_tp() is doing.
>  It passes pt_regs (that was collected before any bpf)
>  into bpf_get_stackid() which calls get_perf_callchain().
>  Same thing with kprobes, uprobes.
> >>>
> >>> Is it trying to unwind through ___bpf_prog_run()?
> >>>
> >>> If so, that would at least explain why ORC isn't working.  Objtool
> >>> currently ignores that function because it can't follow the jump table.
> >>
> >> Here's a tentative fix (for ORC, at least).  I'll need to make sure this
> >> doesn't break anything else.
> >>
> >> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> >> index 242a643af82f..1d9a7cc4b836 100644
> >> --- a/kernel/bpf/core.c
> >> +++ b/kernel/bpf/core.c
> >> @@ -1562,7 +1562,6 @@ static u64 ___bpf_prog_run(u64 *regs, const struct 
> >> bpf_insn *insn, u64 *stack)
> >>BUG_ON(1);
> >>return 0;
> >> }
> >> -STACK_FRAME_NON_STANDARD(___bpf_prog_run); /* jump table */
> >>
> >> #define PROG_NAME(stack_size) __bpf_prog_run##stack_size
> >> #define DEFINE_BPF_PROG_RUN(stack_size) \
> >> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> >> index 172f99195726..2567027fce95 100644
> >> --- a/tools/objtool/check.c
> >> +++ b/tools/objtool/check.c
> >> @@ -1033,13 +1033,6 @@ static struct rela *find_switch_table(struct 
> >> objtool_file *file,
> >>if (text_rela->type == R_X86_64_PC32)
> >>table_offset += 4;
> >>
> >> -   /*
> >> -* Make sure the .rodata address isn't associated with a
> >> -* symbol.  gcc jump tables are anonymous data.
> >> -*/
> >> -   if (find_symbol_containing(rodata_sec, table_offset))
> >> -   continue;
> >> -
> >>rodata_rela = find_rela_by_dest(rodata_sec, table_offset);
> >>if (rodata_rela) {
> >>/*
> >
> > Hi Josh, this still won't fix the problem.
> >
> > Problem is not (or not only) with ___bpf_prog_run, what actually went
> > wrong is with the JITed bpf code.
> >
> > For frame pointer unwinder, it seems the JITed bpf code will have a
> > shifted "BP" register? (arch/x86/net/bpf_jit_comp.c:217), so if we can
> > unshift it properly then it will work.
> >
> > I tried below code, and problem is fixed (only for frame pointer
> > unwinder though). Need to find a better way to detect and do any
> > similar trick for bpf part, if this is a feasible way to fix it:
> >
> > diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> > index 9b9fd4826e7a..2c0fa2aaa7e4 100644
> > --- a/arch/x86/kernel/unwind_frame.c
> > +++ b/arch/x86/kernel/unwind_frame.c
> > @@ -330,8 +330,17 @@ bool unwind_next_frame(struct unwind_state *state)
> >}
> >
> >/* Move to the next frame if it's safe: */
> > -   if (!update_stack_state(state, next_bp))
> > -   goto bad_address;
> > +   if (!update_stack_state(state, next_bp)) {
> > +   // Try again with shifted BP
> > +   state->bp += 5; // see AUX_STACK_SPACE
> > +   next_bp = (unsigned long
> > *)READ_ONCE_TASK_STACK(state->task, *state->bp);
> > +   // Clean and refetch stack info, it's marked as error outed
> > +   state->stack_mask = 0;
> > +   get_stack_info(next_bp, state->task,
> > &state->stack_info, &state->stack_mask);
> > +   if (!update_stack_state(state, next_bp)) {
> > +   goto bad_address;
> > +   }
> > +   }
> >
> >return true;
> >
> > For ORC unwinder, I think the unwinder can't find any info about the
> > JITed part. Maybe if can let it just skip the JITed part and go to
> > kernel context, then should be good enough.
>
> In this case (tracepoint), the callchain bpf_get_stackid() fetches is the
> callchain at the tracepoint. So we don't need the JITed part.
>

We don't want the JITed part indeed, but the unwinder is now being
called starting directly within the BPF JITed program, so it has to
trace back through the JITed part while ignoring it.

> BPF program passes the regs at the tracepoint to perf_callchain_kernel().
> However, perf_callchain_kernel() only uses regs->sp for !perf_hw_regs()
> case. This is probably expected, as passing regs in doesn't really help.
>

ORC does not work with partially dumped regs, and without a frame
pointer it can't get a valid BP value at the tracepoint. So it only used

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-23 Thread Song Liu



> On May 22, 2019, at 11:48 PM, Kairui Song  wrote:
> 
> On Thu, May 23, 2019 at 7:46 AM Josh Poimboeuf  wrote:
>> 
>> On Wed, May 22, 2019 at 12:45:17PM -0500, Josh Poimboeuf wrote:
>>> On Wed, May 22, 2019 at 02:49:07PM +, Alexei Starovoitov wrote:
 The one that is broken is prog_tests/stacktrace_map.c
 There we attach bpf to standard tracepoint where
 kernel suppose to collect pt_regs before calling into bpf.
 And that's what bpf_get_stackid_tp() is doing.
 It passes pt_regs (that was collected before any bpf)
 into bpf_get_stackid() which calls get_perf_callchain().
 Same thing with kprobes, uprobes.
>>> 
>>> Is it trying to unwind through ___bpf_prog_run()?
>>> 
>>> If so, that would at least explain why ORC isn't working.  Objtool
>>> currently ignores that function because it can't follow the jump table.
>> 
>> Here's a tentative fix (for ORC, at least).  I'll need to make sure this
>> doesn't break anything else.
>> 
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index 242a643af82f..1d9a7cc4b836 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -1562,7 +1562,6 @@ static u64 ___bpf_prog_run(u64 *regs, const struct 
>> bpf_insn *insn, u64 *stack)
>>BUG_ON(1);
>>return 0;
>> }
>> -STACK_FRAME_NON_STANDARD(___bpf_prog_run); /* jump table */
>> 
>> #define PROG_NAME(stack_size) __bpf_prog_run##stack_size
>> #define DEFINE_BPF_PROG_RUN(stack_size) \
>> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
>> index 172f99195726..2567027fce95 100644
>> --- a/tools/objtool/check.c
>> +++ b/tools/objtool/check.c
>> @@ -1033,13 +1033,6 @@ static struct rela *find_switch_table(struct 
>> objtool_file *file,
>>if (text_rela->type == R_X86_64_PC32)
>>table_offset += 4;
>> 
>> -   /*
>> -* Make sure the .rodata address isn't associated with a
>> -* symbol.  gcc jump tables are anonymous data.
>> -*/
>> -   if (find_symbol_containing(rodata_sec, table_offset))
>> -   continue;
>> -
>>rodata_rela = find_rela_by_dest(rodata_sec, table_offset);
>>if (rodata_rela) {
>>/*
> 
> Hi Josh, this still won't fix the problem.
> 
> Problem is not (or not only) with ___bpf_prog_run, what actually went
> wrong is with the JITed bpf code.
> 
> For frame pointer unwinder, it seems the JITed bpf code will have a
> shifted "BP" register? (arch/x86/net/bpf_jit_comp.c:217), so if we can
> unshift it properly then it will work.
> 
> I tried below code, and problem is fixed (only for frame pointer
> unwinder though). Need to find a better way to detect and do any
> similar trick for bpf part, if this is a feasible way to fix it:
> 
> diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> index 9b9fd4826e7a..2c0fa2aaa7e4 100644
> --- a/arch/x86/kernel/unwind_frame.c
> +++ b/arch/x86/kernel/unwind_frame.c
> @@ -330,8 +330,17 @@ bool unwind_next_frame(struct unwind_state *state)
>}
> 
>/* Move to the next frame if it's safe: */
> -   if (!update_stack_state(state, next_bp))
> -   goto bad_address;
> +   if (!update_stack_state(state, next_bp)) {
> +   // Try again with shifted BP
> +   state->bp += 5; // see AUX_STACK_SPACE
> +   next_bp = (unsigned long
> *)READ_ONCE_TASK_STACK(state->task, *state->bp);
> +   // Clean and refetch stack info, it's marked as error outed
> +   state->stack_mask = 0;
> +   get_stack_info(next_bp, state->task,
> > &state->stack_info, &state->stack_mask);
> +   if (!update_stack_state(state, next_bp)) {
> +   goto bad_address;
> +   }
> +   }
> 
>return true;
> 
> For ORC unwinder, I think the unwinder can't find any info about the
> JITed part. Maybe if can let it just skip the JITed part and go to
> kernel context, then should be good enough.

In this case (tracepoint), the callchain bpf_get_stackid() fetches is the 
callchain at the tracepoint. So we don't need the JITed part. 

BPF program passes the regs at the tracepoint to perf_callchain_kernel(). 
However, perf_callchain_kernel() only uses regs->sp for !perf_hw_regs()
case. This is probably expected, as passing regs in doesn't really help. 

There are multiple cases in unwind_orc.c:__unwind_start(), which I don't 
understand very well. 

Does the above make sense? Did I misunderstand anything?

@Alexei, do you remember some rough time/version that ORC unwinder works
well for tracepoints? Maybe we can dig into that version to see the
difference.   

Thanks,
Song

> 
> 
> 
> 
> 
> --
> Best Regards,
> Kairui Song



Re: Getting empty callchain from perf_callchain_kernel()

2019-05-23 Thread Kairui Song
On Thu, May 23, 2019 at 7:46 AM Josh Poimboeuf  wrote:
>
> On Wed, May 22, 2019 at 12:45:17PM -0500, Josh Poimboeuf wrote:
> > On Wed, May 22, 2019 at 02:49:07PM +, Alexei Starovoitov wrote:
> > > The one that is broken is prog_tests/stacktrace_map.c
> > > There we attach bpf to standard tracepoint where
> > > kernel suppose to collect pt_regs before calling into bpf.
> > > And that's what bpf_get_stackid_tp() is doing.
> > > It passes pt_regs (that was collected before any bpf)
> > > into bpf_get_stackid() which calls get_perf_callchain().
> > > Same thing with kprobes, uprobes.
> >
> > Is it trying to unwind through ___bpf_prog_run()?
> >
> > If so, that would at least explain why ORC isn't working.  Objtool
> > currently ignores that function because it can't follow the jump table.
>
> Here's a tentative fix (for ORC, at least).  I'll need to make sure this
> doesn't break anything else.
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 242a643af82f..1d9a7cc4b836 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -1562,7 +1562,6 @@ static u64 ___bpf_prog_run(u64 *regs, const struct 
> bpf_insn *insn, u64 *stack)
> BUG_ON(1);
> return 0;
>  }
> -STACK_FRAME_NON_STANDARD(___bpf_prog_run); /* jump table */
>
>  #define PROG_NAME(stack_size) __bpf_prog_run##stack_size
>  #define DEFINE_BPF_PROG_RUN(stack_size) \
> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> index 172f99195726..2567027fce95 100644
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -1033,13 +1033,6 @@ static struct rela *find_switch_table(struct 
> objtool_file *file,
> if (text_rela->type == R_X86_64_PC32)
> table_offset += 4;
>
> -   /*
> -* Make sure the .rodata address isn't associated with a
> -* symbol.  gcc jump tables are anonymous data.
> -*/
> -   if (find_symbol_containing(rodata_sec, table_offset))
> -   continue;
> -
> rodata_rela = find_rela_by_dest(rodata_sec, table_offset);
> if (rodata_rela) {
> /*

Hi Josh, this still won't fix the problem.

Problem is not (or not only) with ___bpf_prog_run, what actually went
wrong is with the JITed bpf code.

For frame pointer unwinder, it seems the JITed bpf code will have a
shifted "BP" register? (arch/x86/net/bpf_jit_comp.c:217), so if we can
unshift it properly then it will work.

I tried below code, and problem is fixed (only for frame pointer
unwinder though). Need to find a better way to detect and do any
similar trick for bpf part, if this is a feasible way to fix it:

diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index 9b9fd4826e7a..2c0fa2aaa7e4 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -330,8 +330,17 @@ bool unwind_next_frame(struct unwind_state *state)
}

/* Move to the next frame if it's safe: */
-   if (!update_stack_state(state, next_bp))
-   goto bad_address;
+   if (!update_stack_state(state, next_bp)) {
+   // Try again with shifted BP
+   state->bp += 5; // see AUX_STACK_SPACE
+   next_bp = (unsigned long
*)READ_ONCE_TASK_STACK(state->task, *state->bp);
+   // Clean and refetch stack info, it's marked as error outed
+   state->stack_mask = 0;
+   get_stack_info(next_bp, state->task,
&state->stack_info, &state->stack_mask);
+   if (!update_stack_state(state, next_bp)) {
+   goto bad_address;
+   }
+   }

return true;

For the ORC unwinder, I think the unwinder can't find any info about the
JITed part. Maybe if we can let it just skip the JITed part and go back to
the kernel context, that should be good enough.





--
Best Regards,
Kairui Song


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-22 Thread Josh Poimboeuf
On Wed, May 22, 2019 at 12:45:17PM -0500, Josh Poimboeuf wrote:
> On Wed, May 22, 2019 at 02:49:07PM +, Alexei Starovoitov wrote:
> > The one that is broken is prog_tests/stacktrace_map.c
> > There we attach bpf to standard tracepoint where
> > kernel suppose to collect pt_regs before calling into bpf.
> > And that's what bpf_get_stackid_tp() is doing.
> > It passes pt_regs (that was collected before any bpf)
> > into bpf_get_stackid() which calls get_perf_callchain().
> > Same thing with kprobes, uprobes.
> 
> Is it trying to unwind through ___bpf_prog_run()?
> 
> If so, that would at least explain why ORC isn't working.  Objtool
> currently ignores that function because it can't follow the jump table.

Here's a tentative fix (for ORC, at least).  I'll need to make sure this
doesn't break anything else.

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 242a643af82f..1d9a7cc4b836 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1562,7 +1562,6 @@ static u64 ___bpf_prog_run(u64 *regs, const struct 
bpf_insn *insn, u64 *stack)
BUG_ON(1);
return 0;
 }
-STACK_FRAME_NON_STANDARD(___bpf_prog_run); /* jump table */
 
 #define PROG_NAME(stack_size) __bpf_prog_run##stack_size
 #define DEFINE_BPF_PROG_RUN(stack_size) \
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 172f99195726..2567027fce95 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1033,13 +1033,6 @@ static struct rela *find_switch_table(struct 
objtool_file *file,
if (text_rela->type == R_X86_64_PC32)
table_offset += 4;
 
-   /*
-* Make sure the .rodata address isn't associated with a
-* symbol.  gcc jump tables are anonymous data.
-*/
-   if (find_symbol_containing(rodata_sec, table_offset))
-   continue;
-
rodata_rela = find_rela_by_dest(rodata_sec, table_offset);
if (rodata_rela) {
/*


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-22 Thread Alexei Starovoitov
On 5/22/19 11:07 AM, Josh Poimboeuf wrote:
> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
>> On Fri, May 17, 2019 at 4:11 PM Peter Zijlstra  wrote:
>>>
>>> On Fri, May 17, 2019 at 09:46:00AM +0200, Peter Zijlstra wrote:
 On Thu, May 16, 2019 at 11:51:55PM +, Song Liu wrote:
> Hi,
>
> We found a failure with selftests/bpf/tests_prog in test_stacktrace_map 
> (on bpf/master
> branch).
>
> After digging into the code, we found that perf_callchain_kernel() is 
> giving empty
> callchain for tracepoint sched/sched_switch. And it seems related to 
> commit
>
> d15d356887e770c5f2dcf963b52c7cb510c9e42d
> ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
>
> Before this commit, perf_callchain_kernel() returns callchain with 
> regs->ip. With
> this commit, regs->ip is not sent for !perf_hw_regs(regs) case.

 So while I think the below is indeed right; we should store regs->ip
 regardless of the unwind path chosen, I still think there's something
 fishy if this results in just the 1 entry.

 The sched/sched_switch event really should have a non-trivial stack.

 Let me see if I can reproduce with just perf.
>>>
>>> $ perf record -g -e "sched:sched_switch" -- make clean
>>> $ perf report -D
>>>
>>> 12 904071759467 0x1790 [0xd0]: PERF_RECORD_SAMPLE(IP, 0x1): 7236/7236: 
>>> 0x81c29562 period: 1 addr: 0
>>> ... FP chain: nr:10
>>> .  0: ff80
>>> .  1: 81c29562
>>> .  2: 81c29933
>>> .  3: 8111f688
>>> .  4: 81120b9d
>>> .  5: 81120ce5
>>> .  6: 8100254a
>>> .  7: 81e0007d
>>> .  8: fe00
>>> .  9: 7f9b6cd9682a
>>> ... thread: sh:7236
>>> .. dso: /lib/modules/5.1.0-12177-g41bbb9129767/build/vmlinux
>>>
>>>
>>> IOW, it seems to 'work'.
>>>
>>
>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
>> some other bfp functions) is now broken, or, strating an unwind
>> directly inside a bpf program will end up strangely. It have following
>> kernel message:
>>
>> WARNING: kernel stack frame pointer at 70cad07c in
>> test_progs:1242 has bad value ffd4497e
>>
>> And in the debug message:
>>
>> [  160.460287] 6e117175: aa23a0e8
>> (get_perf_callchain+0x148/0x280)
>> [  160.460287] 02e8715f: 00c6bba0 (0xc6bba0)
>> [  160.460288] b3d07758: 9ce3f979 (0x9ce3f979)
>> [  160.460289] 55c97836: 9ce3f979 (0x9ce3f979)
>> [  160.460289] 7cbb140a: 0001007f (0x1007f)
>> [  160.460290] 7dc62ac9:  ...
>> [  160.460290] 6b41e346: 1c7da01cd70c4000 (0x1c7da01cd70c4000)
>> [  160.460291] f23d1826: d89cffc3ae80 (0xd89cffc3ae80)
>> [  160.460292] f9a16017: 007f (0x7f)
>> [  160.460292] a8e62d44:  ...
>> [  160.460293] cbc83c97: b89d00d8d000 (0xb89d00d8d000)
>> [  160.460293] 92842522: 007f (0x7f)
>> [  160.460294] dafbec89: b89d00c6bc50 (0xb89d00c6bc50)
>> [  160.460296] 0777e4cf: aa225d97 
>> (bpf_get_stackid+0x77/0x470)
>> [  160.460296] 9942ea16:  ...
>> [  160.460297] a08006b1: 0001 (0x1)
>> [  160.460298] 9f03b438: b89d00c6bc30 (0xb89d00c6bc30)
>> [  160.460299] 6caf8b32: aa074fe8 (__do_page_fault+0x58/0x90)
>> [  160.460300] 3a13d702:  ...
>> [  160.460300] e2e2496d: 9ce3 (0x9ce3)
>> [  160.460301] 8ee6b7c2: d89cffc3ae80 (0xd89cffc3ae80)
>> [  160.460301] a8cf6d55:  ...
>> [  160.460302] 59078076: d89cffc3ae80 (0xd89cffc3ae80)
>> [  160.460303] c6aac739: 9ce3f1e18eb0 (0x9ce3f1e18eb0)
>> [  160.460303] a39aff92: b89d00c6bc60 (0xb89d00c6bc60)
>> [  160.460305] 97498bf2: aa1f4791 
>> (bpf_get_stackid_tp+0x11/0x20)
>> [  160.460306] 6992de1e: b89d00c6bc78 (0xb89d00c6bc78)
>> [  160.460307] dacd0ce5: c0405676 (0xc0405676)
>> [  160.460307] a81f2714:  ...
>>
>> # Note here is the invalid frame pointer
>> [  160.460308] 70cad07c: b89d00a1d000 (0xb89d00a1d000)
>> [  160.460308] f8f194e4: 0001 (0x1)
>> [  160.460309] 2134f42a: d89cffc3ae80 (0xd89cffc3ae80)
>> [  160.460310] f9345889: 9ce3f1e18eb0 (0x9ce3f1e18eb0)
>> [  160.460310] 8ad22a42:  ...
>> [  160.460311] 73808173: b89d00c6bce0 (0xb89d00c6bce0)
>> [  160.460312] c9e4: aa1f48d1 (trace_call_bpf+0x81/0x100)
>> [  160.460313] c5d8ebd1: b89d00c6bcc0 (0xb89d00c6bcc0)
>> [  160.460315] bce0b072: 

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-22 Thread Josh Poimboeuf
On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
> On Fri, May 17, 2019 at 4:11 PM Peter Zijlstra  wrote:
> >
> > On Fri, May 17, 2019 at 09:46:00AM +0200, Peter Zijlstra wrote:
> > > On Thu, May 16, 2019 at 11:51:55PM +, Song Liu wrote:
> > > > Hi,
> > > >
> > > > We found a failure with selftests/bpf/tests_prog in test_stacktrace_map 
> > > > (on bpf/master
> > > > branch).
> > > >
> > > > After digging into the code, we found that perf_callchain_kernel() is 
> > > > giving empty
> > > > callchain for tracepoint sched/sched_switch. And it seems related to 
> > > > commit
> > > >
> > > > d15d356887e770c5f2dcf963b52c7cb510c9e42d
> > > > ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
> > > >
> > > > Before this commit, perf_callchain_kernel() returns callchain with 
> > > > regs->ip. With
> > > > this commit, regs->ip is not sent for !perf_hw_regs(regs) case.
> > >
> > > So while I think the below is indeed right; we should store regs->ip
> > > regardless of the unwind path chosen, I still think there's something
> > > fishy if this results in just the 1 entry.
> > >
> > > The sched/sched_switch event really should have a non-trivial stack.
> > >
> > > Let me see if I can reproduce with just perf.
> >
> > $ perf record -g -e "sched:sched_switch" -- make clean
> > $ perf report -D
> >
> > 12 904071759467 0x1790 [0xd0]: PERF_RECORD_SAMPLE(IP, 0x1): 7236/7236: 
> > 0x81c29562 period: 1 addr: 0
> > ... FP chain: nr:10
> > .  0: ff80
> > .  1: 81c29562
> > .  2: 81c29933
> > .  3: 8111f688
> > .  4: 81120b9d
> > .  5: 81120ce5
> > .  6: 8100254a
> > .  7: 81e0007d
> > .  8: fe00
> > .  9: 7f9b6cd9682a
> > ... thread: sh:7236
> > .. dso: /lib/modules/5.1.0-12177-g41bbb9129767/build/vmlinux
> >
> >
> > IOW, it seems to 'work'.
> >
> 
> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
> some other bfp functions) is now broken, or, strating an unwind
> directly inside a bpf program will end up strangely. It have following
> kernel message:
> 
> WARNING: kernel stack frame pointer at 70cad07c in
> test_progs:1242 has bad value ffd4497e
> 
> And in the debug message:
> 
> [  160.460287] 6e117175: aa23a0e8
> (get_perf_callchain+0x148/0x280)
> [  160.460287] 02e8715f: 00c6bba0 (0xc6bba0)
> [  160.460288] b3d07758: 9ce3f979 (0x9ce3f979)
> [  160.460289] 55c97836: 9ce3f979 (0x9ce3f979)
> [  160.460289] 7cbb140a: 0001007f (0x1007f)
> [  160.460290] 7dc62ac9:  ...
> [  160.460290] 6b41e346: 1c7da01cd70c4000 (0x1c7da01cd70c4000)
> [  160.460291] f23d1826: d89cffc3ae80 (0xd89cffc3ae80)
> [  160.460292] f9a16017: 007f (0x7f)
> [  160.460292] a8e62d44:  ...
> [  160.460293] cbc83c97: b89d00d8d000 (0xb89d00d8d000)
> [  160.460293] 92842522: 007f (0x7f)
> [  160.460294] dafbec89: b89d00c6bc50 (0xb89d00c6bc50)
> [  160.460296] 0777e4cf: aa225d97 (bpf_get_stackid+0x77/0x470)
> [  160.460296] 9942ea16:  ...
> [  160.460297] a08006b1: 0001 (0x1)
> [  160.460298] 9f03b438: b89d00c6bc30 (0xb89d00c6bc30)
> [  160.460299] 6caf8b32: aa074fe8 (__do_page_fault+0x58/0x90)
> [  160.460300] 3a13d702:  ...
> [  160.460300] e2e2496d: 9ce3 (0x9ce3)
> [  160.460301] 8ee6b7c2: d89cffc3ae80 (0xd89cffc3ae80)
> [  160.460301] a8cf6d55:  ...
> [  160.460302] 59078076: d89cffc3ae80 (0xd89cffc3ae80)
> [  160.460303] c6aac739: 9ce3f1e18eb0 (0x9ce3f1e18eb0)
> [  160.460303] a39aff92: b89d00c6bc60 (0xb89d00c6bc60)
> [  160.460305] 97498bf2: aa1f4791 
> (bpf_get_stackid_tp+0x11/0x20)
> [  160.460306] 6992de1e: b89d00c6bc78 (0xb89d00c6bc78)
> [  160.460307] dacd0ce5: c0405676 (0xc0405676)
> [  160.460307] a81f2714:  ...
> 
> # Note here is the invalid frame pointer
> [  160.460308] 70cad07c: b89d00a1d000 (0xb89d00a1d000)
> [  160.460308] f8f194e4: 0001 (0x1)
> [  160.460309] 2134f42a: d89cffc3ae80 (0xd89cffc3ae80)
> [  160.460310] f9345889: 9ce3f1e18eb0 (0x9ce3f1e18eb0)
> [  160.460310] 8ad22a42:  ...
> [  160.460311] 73808173: b89d00c6bce0 (0xb89d00c6bce0)
> [  160.460312] c9e4: aa1f48d1 (trace_call_bpf+0x81/0x100)
> [  160.460313] c5d8ebd1: b89d00c6bcc0 (0xb89d00c6bcc0)
> [  160.460315] bce0b072: ab651be0
> (event_sched_migrate_task+0xa0/0xa0)

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-22 Thread Josh Poimboeuf
On Wed, May 22, 2019 at 02:49:07PM +, Alexei Starovoitov wrote:
> The one that is broken is prog_tests/stacktrace_map.c
> There we attach bpf to standard tracepoint where
> kernel suppose to collect pt_regs before calling into bpf.
> And that's what bpf_get_stackid_tp() is doing.
> It passes pt_regs (that was collected before any bpf)
> into bpf_get_stackid() which calls get_perf_callchain().
> Same thing with kprobes, uprobes.

Is it trying to unwind through ___bpf_prog_run()?

If so, that would at least explain why ORC isn't working.  Objtool
currently ignores that function because it can't follow the jump table.
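
For reference, the objtool annotation in question is this one in
kernel/bpf/core.c (it is what the tentative fix elsewhere in this thread
removes):

/* kernel/bpf/core.c: makes objtool skip the interpreter function */
STACK_FRAME_NON_STANDARD(___bpf_prog_run); /* jump table */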

-- 
Josh


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-22 Thread Alexei Starovoitov
On 5/22/19 7:02 AM, Peter Zijlstra wrote:
> 
>> If the unwinder could trace back through the bpf func call then there
>> will be no such problem.
> 
> Why couldn't it trace back through the bpf stuff? And how can we fix
> that?

Most of the time there is no 'tracing through bpf stuff'.
The bpf infra preserves the 'pt_regs*' that was collected before
any bpf code starts executing.
The reason is that bpf can be executed via the interpreter or
directly when JITed.
In both cases the collected stack traces should be the same,
otherwise it confuses the users and they cannot compensate for such a
difference.

The only exception is raw_tracepoint, since it's the most
minimalistic way of calling bpf and the kernel side doesn't do
anything before calling into bpf.
Only in that case does the bpf side have to call perf_fetch_caller_regs().
See bpf_get_stackid_raw_tp().
But this test case is actually working!
It's covered by prog_tests/stacktrace_map_raw_tp.c and
it passes.
The one that is broken is prog_tests/stacktrace_map.c.
There we attach bpf to a standard tracepoint, where the
kernel is supposed to collect pt_regs before calling into bpf.
And that's what bpf_get_stackid_tp() is doing.
It passes the pt_regs (that were collected before any bpf ran)
into bpf_get_stackid(), which calls get_perf_callchain().
Same thing with kprobes and uprobes.
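
For reference, a condensed sketch of the two helpers being contrasted here,
based on kernel/trace/bpf_trace.c as quoted elsewhere in this thread (trimmed
to the relevant lines, so treat it as illustrative rather than exact):

/* Standard tracepoint path: the regs were collected by the tracepoint
 * infrastructure before any bpf ran; the helper only dereferences them. */
BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map,
	   u64, flags)
{
	struct pt_regs *regs = *(struct pt_regs **)tp_buff;

	return bpf_get_stackid((unsigned long) regs, (unsigned long) map,
			       flags, 0, 0);
}

/* Raw tracepoint path: nothing was collected beforehand, so the helper
 * fetches the caller's regs itself via perf_fetch_caller_regs(). */
static DEFINE_PER_CPU(struct pt_regs, bpf_raw_tp_regs);
BPF_CALL_3(bpf_get_stackid_raw_tp, struct bpf_raw_tracepoint_args *, args,
	   struct bpf_map *, map, u64, flags)
{
	struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);

	perf_fetch_caller_regs(regs);
	return bpf_get_stackid((unsigned long) regs, (unsigned long) map,
			       flags, 0, 0);
}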




Re: Getting empty callchain from perf_callchain_kernel()

2019-05-22 Thread Peter Zijlstra
On Mon, May 20, 2019 at 02:06:54AM +0800, Kairui Song wrote:
> On Fri, May 17, 2019 at 5:10 PM Peter Zijlstra  wrote:
> >
> > On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
> > > Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
> > > some other bfp functions) is now broken, or, strating an unwind
> > > directly inside a bpf program will end up strangely. It have following
> > > kernel message:
> >
> > Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
> > follow.
> 
> bpf_get_stackid_tp will just use the regs passed to it from the trace
> point. And then it will eventually call perf_get_callchain to get the
> call chain.
> With a tracepoint we have the fake regs, so unwinder will start from
> where it is called, and use the fake regs as the indicator of the
> target frame it want, and keep unwinding until reached the actually
> callsite.
> 
> But if the stack trace is started withing a bpf func call then it's broken...

I'm confused.. how is this broken? Surely we should eventually find the
original stack frame and be good again, right?

> If the unwinder could trace back through the bpf func call then there
> will be no such problem.

Why couldn't it trace back through the bpf stuff? And how can we fix
that?


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-22 Thread Peter Zijlstra
On Mon, May 20, 2019 at 05:22:12PM +, Song Liu wrote:
> I think this is still the best fix/workaround here? And only one level 
> of stack trace should be OK for tracepoint? 

No. That's still completely broken.


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-20 Thread Song Liu



> On May 19, 2019, at 11:07 AM, Kairui Song  wrote:
> 
> On Sat, May 18, 2019 at 5:48 AM Song Liu  wrote:
>> 
>> 
>> 
>>> On May 17, 2019, at 2:06 PM, Alexei Starovoitov  wrote:
>>> 
>>> On 5/17/19 11:40 AM, Song Liu wrote:
 +Alexei, Daniel, and bpf
 
> On May 17, 2019, at 2:10 AM, Peter Zijlstra  wrote:
> 
> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
>> some other bfp functions) is now broken, or, strating an unwind
>> directly inside a bpf program will end up strangely. It have following
>> kernel message:
> 
> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
> follow.
 
 I guess we need something like the following? (we should be able to
 optimize the PER_CPU stuff).
 
 Thanks,
 Song
 
 
 diff --git i/kernel/trace/bpf_trace.c w/kernel/trace/bpf_trace.c
 index f92d6ad5e080..c525149028a7 100644
 --- i/kernel/trace/bpf_trace.c
 +++ w/kernel/trace/bpf_trace.c
 @@ -696,11 +696,13 @@ static const struct bpf_func_proto 
 bpf_perf_event_output_proto_tp = {
.arg5_type  = ARG_CONST_SIZE_OR_ZERO,
 };
 
 +static DEFINE_PER_CPU(struct pt_regs, bpf_stackid_tp_regs);
 BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map,
   u64, flags)
 {
 -   struct pt_regs *regs = *(struct pt_regs **)tp_buff;
 +   struct pt_regs *regs = this_cpu_ptr(&bpf_stackid_tp_regs);
 
 +   perf_fetch_caller_regs(regs);
>>> 
>>> No. pt_regs is already passed in. It's the first argument.
>>> If we call perf_fetch_caller_regs() again the stack trace will be wrong.
>>> bpf prog should not see itself, interpreter or all the frames in between.
>> 
>> Thanks Alexei! I get it now.
>> 
>> In bpf_get_stackid_tp(), the pt_regs is get by dereferencing the first field
>> of tp_buff:
>> 
>>struct pt_regs *regs = *(struct pt_regs **)tp_buff;
>> 
>> tp_buff points to something like
>> 
>>struct sched_switch_args {
>>unsigned long long pad;
>>char prev_comm[16];
>>int prev_pid;
>>int prev_prio;
>>long long prev_state;
>>char next_comm[16];
>>int next_pid;
>>int next_prio;
>>};
>> 
>> where the first field "pad" is a pointer to pt_regs.
>> 
>> @Kairui, I think you confirmed that current code will give empty call trace
>> with ORC unwinder? If that's the case, can we add regs->ip back? (as in the
>> first email of this thread.
>> 
>> Thanks,
>> Song
>> 
> 
> Hi thanks for the suggestion, yes we can add it should be good an idea
> to always have IP when stack trace is not available.
> But stack trace is actually still broken, it will always give only one
> level of stacktrace (the IP).

I think this is still the best fix/workaround here? And only one level 
of stack trace should be OK for tracepoint? 

Thanks,
Song

> 
> -- 
> Best Regards,
> Kairui Song



Re: Getting empty callchain from perf_callchain_kernel()

2019-05-20 Thread Song Liu
Sorry for previous empty email.. Clicked send by accident. 

> On May 19, 2019, at 11:06 AM, Kairui Song  wrote:
> 
> On Fri, May 17, 2019 at 5:10 PM Peter Zijlstra  wrote:
>> 
>> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
>>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
>>> some other bfp functions) is now broken, or, strating an unwind
>>> directly inside a bpf program will end up strangely. It have following
>>> kernel message:
>> 
>> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
>> follow.
> 
> bpf_get_stackid_tp will just use the regs passed to it from the trace
> point. And then it will eventually call perf_get_callchain to get the
> call chain.
> With a tracepoint we have the fake regs, so unwinder will start from
> where it is called, and use the fake regs as the indicator of the
> target frame it want, and keep unwinding until reached the actually
> callsite.
> 
> But if the stack trace is started withing a bpf func call then it's broken...
> 
> If the unwinder could trace back through the bpf func call then there
> will be no such problem.
> 
> For frame pointer unwinder, set the indicator flag (X86_EFLAGS_FIXED)
> before bpf call, and ensure bp is also dumped could fix it (so it will
> start using the regs for bpf calls, like before the commit
> d15d356887e7). But for ORC I don't see a clear way to fix the problem.
> First though is maybe dump some callee's regs for ORC (IP, BP, SP, DI,
> DX, R10, R13, else?) in the trace point handler, then use the flag to
> indicate ORC to do one more unwind (because can't get caller's regs,
> so get callee's regs instaed) before actually giving output?
> 
> I had a try, for framepointer unwinder, mark the indicator flag before
> calling bpf functions, and dump bp as well in the trace point. Then
> with frame pointer, it works, test passed:
> 
> diff --git a/arch/x86/include/asm/perf_event.h
> b/arch/x86/include/asm/perf_event.h
> index 1392d5e6e8d6..6f1192e9776b 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -302,12 +302,25 @@ extern unsigned long perf_misc_flags(struct
> pt_regs *regs);
> 
> #include 
> 
> +#ifdef CONFIG_FRAME_POINTER
> +static inline unsigned long caller_frame_pointer(void)
> +{
> +   return (unsigned long)__builtin_frame_address(1);
> +}
> +#else
> +static inline unsigned long caller_frame_pointer(void)
> +{
> +   return 0;
> +}
> +#endif
> +
> /*
>  * We abuse bit 3 from flags to pass exact information, see perf_misc_flags
>  * and the comment with PERF_EFLAGS_EXACT.
>  */
> #define perf_arch_fetch_caller_regs(regs, __ip){   \
>(regs)->ip = (__ip);\
> +   (regs)->bp = caller_frame_pointer();\
>(regs)->sp = (unsigned long)__builtin_frame_address(0); \
>(regs)->cs = __KERNEL_CS;   \
>regs->flags = 0;\
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index abbd4b3b96c2..ca7b95ee74f0 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8549,6 +8549,7 @@ void perf_trace_run_bpf_submit(void *raw_data,
> int size, int rctx,
>   struct task_struct *task)
> {
>if (bpf_prog_array_valid(call)) {
> +   regs->flags |= X86_EFLAGS_FIXED;
>*(struct pt_regs **)raw_data = regs;
>if (!trace_call_bpf(call, raw_data) || hlist_empty(head)) {
>perf_swevent_put_recursion_context(rctx);
> @@ -8822,6 +8823,8 @@ static void bpf_overflow_handler(struct perf_event 
> *event,
>int ret = 0;
> 
>ctx.regs = perf_arch_bpf_user_pt_regs(regs);
> +   ctx.regs->flags |= X86_EFLAGS_FIXED;
> +
>preempt_disable();
>if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
>goto out;
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f92d6ad5e080..e1fa656677dc 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -497,6 +497,8 @@ u64 bpf_event_output(struct bpf_map *map, u64
> flags, void *meta, u64 meta_size,
>};
> 
>perf_fetch_caller_regs(regs);
> +   regs->flags |= X86_EFLAGS_FIXED;
> +
>perf_sample_data_init(sd, 0, 0);
>sd->raw = &raw;
> 
> @@ -831,6 +833,8 @@ BPF_CALL_5(bpf_perf_event_output_raw_tp, struct
> bpf_raw_tracepoint_args *, args,
>struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
> 
>perf_fetch_caller_regs(regs);
> +   regs->flags |= X86_EFLAGS_FIXED;
> +
>return bpf_perf_event_output(regs, map, flags, data, size);
> }
> 
> @@ -851,6 +855,8 @@ BPF_CALL_3(bpf_get_stackid_raw_tp, struct
> bpf_raw_tracepoint_args *, args,
>struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
> 
>perf_fetch_caller_regs(regs);
> +   regs->flags |= X86_EFLAGS_FIXED;
> +
>  

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-20 Thread Song Liu



> On May 19, 2019, at 11:06 AM, Kairui Song  wrote:
> 
> On Fri, May 17, 2019 at 5:10 PM Peter Zijlstra  wrote:
>> 
>> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
>>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
>>> some other bfp functions) is now broken, or, strating an unwind
>>> directly inside a bpf program will end up strangely. It have following
>>> kernel message:
>> 
>> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
>> follow.
> 
> bpf_get_stackid_tp will just use the regs passed to it from the trace
> point. And then it will eventually call perf_get_callchain to get the
> call chain.
> With a tracepoint we have the fake regs, so unwinder will start from
> where it is called, and use the fake regs as the indicator of the
> target frame it want, and keep unwinding until reached the actually
> callsite.
> 
> But if the stack trace is started withing a bpf func call then it's broken...
> 
> If the unwinder could trace back through the bpf func call then there
> will be no such problem.
> 
> For frame pointer unwinder, set the indicator flag (X86_EFLAGS_FIXED)
> before bpf call, and ensure bp is also dumped could fix it (so it will
> start using the regs for bpf calls, like before the commit
> d15d356887e7). But for ORC I don't see a clear way to fix the problem.
> First though is maybe dump some callee's regs for ORC (IP, BP, SP, DI,
> DX, R10, R13, else?) in the trace point handler, then use the flag to
> indicate ORC to do one more unwind (because can't get caller's regs,
> so get callee's regs instaed) before actually giving output?
> 
> I had a try, for framepointer unwinder, mark the indicator flag before
> calling bpf functions, and dump bp as well in the trace point. Then
> with frame pointer, it works, test passed:
> 
> diff --git a/arch/x86/include/asm/perf_event.h
> b/arch/x86/include/asm/perf_event.h
> index 1392d5e6e8d6..6f1192e9776b 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -302,12 +302,25 @@ extern unsigned long perf_misc_flags(struct
> pt_regs *regs);
> 
> #include 
> 
> +#ifdef CONFIG_FRAME_POINTER
> +static inline unsigned long caller_frame_pointer(void)
> +{
> +   return (unsigned long)__builtin_frame_address(1);
> +}
> +#else
> +static inline unsigned long caller_frame_pointer(void)
> +{
> +   return 0;
> +}
> +#endif
> +
> /*
>  * We abuse bit 3 from flags to pass exact information, see perf_misc_flags
>  * and the comment with PERF_EFLAGS_EXACT.
>  */
> #define perf_arch_fetch_caller_regs(regs, __ip){   \
>(regs)->ip = (__ip);\
> +   (regs)->bp = caller_frame_pointer();\
>(regs)->sp = (unsigned long)__builtin_frame_address(0); \
>(regs)->cs = __KERNEL_CS;   \
>regs->flags = 0;\
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index abbd4b3b96c2..ca7b95ee74f0 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8549,6 +8549,7 @@ void perf_trace_run_bpf_submit(void *raw_data,
> int size, int rctx,
>   struct task_struct *task)
> {
>if (bpf_prog_array_valid(call)) {
> +   regs->flags |= X86_EFLAGS_FIXED;
>*(struct pt_regs **)raw_data = regs;
>if (!trace_call_bpf(call, raw_data) || hlist_empty(head)) {
>perf_swevent_put_recursion_context(rctx);
> @@ -8822,6 +8823,8 @@ static void bpf_overflow_handler(struct perf_event 
> *event,
>int ret = 0;
> 
>ctx.regs = perf_arch_bpf_user_pt_regs(regs);
> +   ctx.regs->flags |= X86_EFLAGS_FIXED;
> +
>preempt_disable();
>if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
>goto out;
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f92d6ad5e080..e1fa656677dc 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -497,6 +497,8 @@ u64 bpf_event_output(struct bpf_map *map, u64
> flags, void *meta, u64 meta_size,
>};
> 
>perf_fetch_caller_regs(regs);
> +   regs->flags |= X86_EFLAGS_FIXED;
> +
>perf_sample_data_init(sd, 0, 0);
>sd->raw = &raw;
> 
> @@ -831,6 +833,8 @@ BPF_CALL_5(bpf_perf_event_output_raw_tp, struct
> bpf_raw_tracepoint_args *, args,
>struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
> 
>perf_fetch_caller_regs(regs);
> +   regs->flags |= X86_EFLAGS_FIXED;
> +
>return bpf_perf_event_output(regs, map, flags, data, size);
> }
> 
> @@ -851,6 +855,8 @@ BPF_CALL_3(bpf_get_stackid_raw_tp, struct
> bpf_raw_tracepoint_args *, args,
>struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
> 
>perf_fetch_caller_regs(regs);
> +   regs->flags |= X86_EFLAGS_FIXED;
> +
>/* similar to bpf_perf_event_output_tp, but pt_regs 

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-19 Thread Kairui Song
On Sat, May 18, 2019 at 5:48 AM Song Liu  wrote:
>
>
>
> > On May 17, 2019, at 2:06 PM, Alexei Starovoitov  wrote:
> >
> > On 5/17/19 11:40 AM, Song Liu wrote:
> >> +Alexei, Daniel, and bpf
> >>
> >>> On May 17, 2019, at 2:10 AM, Peter Zijlstra  wrote:
> >>>
> >>> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
>  Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
>  some other bfp functions) is now broken, or, strating an unwind
>  directly inside a bpf program will end up strangely. It have following
>  kernel message:
> >>>
> >>> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
> >>> follow.
> >>
> >> I guess we need something like the following? (we should be able to
> >> optimize the PER_CPU stuff).
> >>
> >> Thanks,
> >> Song
> >>
> >>
> >> diff --git i/kernel/trace/bpf_trace.c w/kernel/trace/bpf_trace.c
> >> index f92d6ad5e080..c525149028a7 100644
> >> --- i/kernel/trace/bpf_trace.c
> >> +++ w/kernel/trace/bpf_trace.c
> >> @@ -696,11 +696,13 @@ static const struct bpf_func_proto 
> >> bpf_perf_event_output_proto_tp = {
> >> .arg5_type  = ARG_CONST_SIZE_OR_ZERO,
> >>  };
> >>
> >> +static DEFINE_PER_CPU(struct pt_regs, bpf_stackid_tp_regs);
> >>  BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map,
> >>u64, flags)
> >>  {
> >> -   struct pt_regs *regs = *(struct pt_regs **)tp_buff;
> >> +   struct pt_regs *regs = this_cpu_ptr(&bpf_stackid_tp_regs);
> >>
> >> +   perf_fetch_caller_regs(regs);
> >
> > No. pt_regs is already passed in. It's the first argument.
> > If we call perf_fetch_caller_regs() again the stack trace will be wrong.
> > bpf prog should not see itself, interpreter or all the frames in between.
>
> Thanks Alexei! I get it now.
>
> In bpf_get_stackid_tp(), the pt_regs is get by dereferencing the first field
> of tp_buff:
>
> struct pt_regs *regs = *(struct pt_regs **)tp_buff;
>
> tp_buff points to something like
>
> struct sched_switch_args {
> unsigned long long pad;
> char prev_comm[16];
> int prev_pid;
> int prev_prio;
> long long prev_state;
> char next_comm[16];
> int next_pid;
> int next_prio;
> };
>
> where the first field "pad" is a pointer to pt_regs.
>
> @Kairui, I think you confirmed that current code will give empty call trace
> with ORC unwinder? If that's the case, can we add regs->ip back? (as in the
> first email of this thread.
>
> Thanks,
> Song
>

Hi, thanks for the suggestion. Yes, we can add it; it should be a good idea
to always have the IP when the stack trace is not available.
But the stack trace is actually still broken: it will always give only one
level of stack trace (the IP).
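
For context, a rough sketch of the kind of change being referred to, i.e.
storing regs->ip unconditionally in arch/x86/events/core.c's
perf_callchain_kernel() instead of only for the perf_hw_regs() case. This is
an approximation of the idea from the first mail of the thread, not the exact
patch:

void
perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
		      struct pt_regs *regs)
{
	struct unwind_state state;
	unsigned long addr;

	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
		return;

	/* Always record the IP, even for !perf_hw_regs() (fake) regs. */
	if (perf_callchain_store(entry, regs->ip))
		return;

	if (perf_hw_regs(regs))
		unwind_start(&state, current, regs, NULL);
	else
		unwind_start(&state, current, NULL, (void *)regs->sp);

	for (; !unwind_done(&state); unwind_next_frame(&state)) {
		addr = unwind_get_return_address(&state);
		if (!addr || perf_callchain_store(entry, addr))
			return;
	}
}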

-- 
Best Regards,
Kairui Song


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-19 Thread Kairui Song
On Fri, May 17, 2019 at 5:10 PM Peter Zijlstra  wrote:
>
> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
> > Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
> > some other bfp functions) is now broken, or, strating an unwind
> > directly inside a bpf program will end up strangely. It have following
> > kernel message:
>
> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
> follow.

bpf_get_stackid_tp will just use the regs passed to it from the
tracepoint. And then it will eventually call perf_get_callchain to get
the call chain.
With a tracepoint we have the fake regs, so the unwinder will start from
where it is called, use the fake regs as the indicator of the target
frame it wants, and keep unwinding until it reaches the actual callsite.

But if the stack trace is started within a bpf func call then it's broken...

If the unwinder could trace back through the bpf func call then there
would be no such problem.

For the frame pointer unwinder, setting the indicator flag (X86_EFLAGS_FIXED)
before the bpf call, and ensuring bp is also dumped, could fix it (so it will
start using the regs for bpf calls, like before commit
d15d356887e7). But for ORC I don't see a clear way to fix the problem.
A first thought is to maybe dump some callee regs for ORC (IP, BP, SP, DI,
DX, R10, R13, anything else?) in the tracepoint handler, then use the flag to
tell ORC to do one more unwind (because we can't get the caller's regs,
so get the callee's regs instead) before actually giving output.

I had a try: for the frame pointer unwinder, mark the indicator flag before
calling bpf functions, and dump bp as well in the tracepoint. Then,
with frame pointers, it works and the test passes:

diff --git a/arch/x86/include/asm/perf_event.h
b/arch/x86/include/asm/perf_event.h
index 1392d5e6e8d6..6f1192e9776b 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -302,12 +302,25 @@ extern unsigned long perf_misc_flags(struct
pt_regs *regs);

 #include 

+#ifdef CONFIG_FRAME_POINTER
+static inline unsigned long caller_frame_pointer(void)
+{
+   return (unsigned long)__builtin_frame_address(1);
+}
+#else
+static inline unsigned long caller_frame_pointer(void)
+{
+   return 0;
+}
+#endif
+
 /*
  * We abuse bit 3 from flags to pass exact information, see perf_misc_flags
  * and the comment with PERF_EFLAGS_EXACT.
  */
 #define perf_arch_fetch_caller_regs(regs, __ip){   \
(regs)->ip = (__ip);\
+   (regs)->bp = caller_frame_pointer();\
(regs)->sp = (unsigned long)__builtin_frame_address(0); \
(regs)->cs = __KERNEL_CS;   \
regs->flags = 0;\
diff --git a/kernel/events/core.c b/kernel/events/core.c
index abbd4b3b96c2..ca7b95ee74f0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8549,6 +8549,7 @@ void perf_trace_run_bpf_submit(void *raw_data,
int size, int rctx,
   struct task_struct *task)
 {
if (bpf_prog_array_valid(call)) {
+   regs->flags |= X86_EFLAGS_FIXED;
*(struct pt_regs **)raw_data = regs;
if (!trace_call_bpf(call, raw_data) || hlist_empty(head)) {
perf_swevent_put_recursion_context(rctx);
@@ -8822,6 +8823,8 @@ static void bpf_overflow_handler(struct perf_event *event,
int ret = 0;

ctx.regs = perf_arch_bpf_user_pt_regs(regs);
+   ctx.regs->flags |= X86_EFLAGS_FIXED;
+
preempt_disable();
if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
goto out;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f92d6ad5e080..e1fa656677dc 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -497,6 +497,8 @@ u64 bpf_event_output(struct bpf_map *map, u64
flags, void *meta, u64 meta_size,
};

perf_fetch_caller_regs(regs);
+   regs->flags |= X86_EFLAGS_FIXED;
+
perf_sample_data_init(sd, 0, 0);
sd->raw = &raw;

@@ -831,6 +833,8 @@ BPF_CALL_5(bpf_perf_event_output_raw_tp, struct
bpf_raw_tracepoint_args *, args,
struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);

perf_fetch_caller_regs(regs);
+   regs->flags |= X86_EFLAGS_FIXED;
+
return bpf_perf_event_output(regs, map, flags, data, size);
 }

@@ -851,6 +855,8 @@ BPF_CALL_3(bpf_get_stackid_raw_tp, struct
bpf_raw_tracepoint_args *, args,
struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);

perf_fetch_caller_regs(regs);
+   regs->flags |= X86_EFLAGS_FIXED;
+
/* similar to bpf_perf_event_output_tp, but pt_regs fetched
differently */
return bpf_get_stackid((unsigned long) regs, (unsigned long) map,
   flags, 0, 0);
@@ -871,6 +877,8 @@ BPF_CALL_4(bpf_get_stack_raw_tp, struct
bpf_raw_tracepoint_args *, args,
struct pt_regs *regs = 

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-17 Thread Song Liu



> On May 17, 2019, at 2:06 PM, Alexei Starovoitov  wrote:
> 
> On 5/17/19 11:40 AM, Song Liu wrote:
>> +Alexei, Daniel, and bpf
>> 
>>> On May 17, 2019, at 2:10 AM, Peter Zijlstra  wrote:
>>> 
>>> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
 Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
 some other bfp functions) is now broken, or, strating an unwind
 directly inside a bpf program will end up strangely. It have following
 kernel message:
>>> 
>>> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
>>> follow.
>> 
>> I guess we need something like the following? (we should be able to
>> optimize the PER_CPU stuff).
>> 
>> Thanks,
>> Song
>> 
>> 
>> diff --git i/kernel/trace/bpf_trace.c w/kernel/trace/bpf_trace.c
>> index f92d6ad5e080..c525149028a7 100644
>> --- i/kernel/trace/bpf_trace.c
>> +++ w/kernel/trace/bpf_trace.c
>> @@ -696,11 +696,13 @@ static const struct bpf_func_proto 
>> bpf_perf_event_output_proto_tp = {
>> .arg5_type  = ARG_CONST_SIZE_OR_ZERO,
>>  };
>> 
>> +static DEFINE_PER_CPU(struct pt_regs, bpf_stackid_tp_regs);
>>  BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map,
>>u64, flags)
>>  {
>> -   struct pt_regs *regs = *(struct pt_regs **)tp_buff;
>> +   struct pt_regs *regs = this_cpu_ptr(&bpf_stackid_tp_regs);
>> 
>> +   perf_fetch_caller_regs(regs);
> 
> No. pt_regs is already passed in. It's the first argument.
> If we call perf_fetch_caller_regs() again the stack trace will be wrong.
> bpf prog should not see itself, interpreter or all the frames in between.

Thanks Alexei! I get it now. 

In bpf_get_stackid_tp(), the pt_regs is obtained by dereferencing the first
field of tp_buff:

struct pt_regs *regs = *(struct pt_regs **)tp_buff;

tp_buff points to something like

struct sched_switch_args {
unsigned long long pad;
char prev_comm[16];
int prev_pid;
int prev_prio;
long long prev_state;
char next_comm[16];
int next_pid;
int next_prio;
};

where the first field "pad" is a pointer to pt_regs. 
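
For context, the pointer lands in that pad field because the tracepoint glue
stores it there before invoking the bpf program. A trimmed sketch of
kernel/events/core.c:perf_trace_run_bpf_submit(), the same function patched
in the diffs elsewhere in this thread:

void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
			       struct trace_event_call *call, u64 count,
			       struct pt_regs *regs, struct hlist_head *head,
			       struct task_struct *task)
{
	if (bpf_prog_array_valid(call)) {
		/* First field of the raw data ("pad") = pt_regs pointer. */
		*(struct pt_regs **)raw_data = regs;
		if (!trace_call_bpf(call, raw_data) || hlist_empty(head)) {
			perf_swevent_put_recursion_context(rctx);
			return;
		}
	}
	perf_tp_event(call->event.type, count, raw_data, size, regs, head,
		      rctx, task);
}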

@Kairui, I think you confirmed that the current code gives an empty call trace
with the ORC unwinder? If that's the case, can we add regs->ip back? (As in
the first email of this thread.)

Thanks,
Song









Re: Getting empty callchain from perf_callchain_kernel()

2019-05-17 Thread Alexei Starovoitov
On 5/17/19 11:40 AM, Song Liu wrote:
> +Alexei, Daniel, and bpf
> 
>> On May 17, 2019, at 2:10 AM, Peter Zijlstra  wrote:
>>
>> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
>>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
>>> some other bfp functions) is now broken, or, strating an unwind
>>> directly inside a bpf program will end up strangely. It have following
>>> kernel message:
>>
>> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
>> follow.
> 
> I guess we need something like the following? (we should be able to
> optimize the PER_CPU stuff).
> 
> Thanks,
> Song
> 
> 
> diff --git i/kernel/trace/bpf_trace.c w/kernel/trace/bpf_trace.c
> index f92d6ad5e080..c525149028a7 100644
> --- i/kernel/trace/bpf_trace.c
> +++ w/kernel/trace/bpf_trace.c
> @@ -696,11 +696,13 @@ static const struct bpf_func_proto 
> bpf_perf_event_output_proto_tp = {
>  .arg5_type  = ARG_CONST_SIZE_OR_ZERO,
>   };
> 
> +static DEFINE_PER_CPU(struct pt_regs, bpf_stackid_tp_regs);
>   BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map,
> u64, flags)
>   {
> -   struct pt_regs *regs = *(struct pt_regs **)tp_buff;
> +   struct pt_regs *regs = this_cpu_ptr(&bpf_stackid_tp_regs);
> 
> +   perf_fetch_caller_regs(regs);

No. pt_regs is already passed in. It's the first argument.
If we call perf_fetch_caller_regs() again the stack trace will be wrong.
bpf prog should not see itself, the interpreter, or all the frames in between.


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-17 Thread Song Liu
+Alexei, Daniel, and bpf

> On May 17, 2019, at 2:10 AM, Peter Zijlstra  wrote:
> 
> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
>> some other bfp functions) is now broken, or, strating an unwind
>> directly inside a bpf program will end up strangely. It have following
>> kernel message:
> 
> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
> follow.

I guess we need something like the following? (we should be able to 
optimize the PER_CPU stuff). 

Thanks,
Song


diff --git i/kernel/trace/bpf_trace.c w/kernel/trace/bpf_trace.c
index f92d6ad5e080..c525149028a7 100644
--- i/kernel/trace/bpf_trace.c
+++ w/kernel/trace/bpf_trace.c
@@ -696,11 +696,13 @@ static const struct bpf_func_proto 
bpf_perf_event_output_proto_tp = {
.arg5_type  = ARG_CONST_SIZE_OR_ZERO,
 };

+static DEFINE_PER_CPU(struct pt_regs, bpf_stackid_tp_regs);
 BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map,
   u64, flags)
 {
-   struct pt_regs *regs = *(struct pt_regs **)tp_buff;
+   struct pt_regs *regs = this_cpu_ptr(&bpf_stackid_tp_regs);

+   perf_fetch_caller_regs(regs);
/*
 * Same comment as in bpf_perf_event_output_tp(), only that this time
 * the other helper's function body cannot be inlined due to being
@@ -719,10 +721,13 @@ static const struct bpf_func_proto 
bpf_get_stackid_proto_tp = {
.arg3_type  = ARG_ANYTHING,
 };

+static DEFINE_PER_CPU(struct pt_regs, bpf_stack_tp_regs);
 BPF_CALL_4(bpf_get_stack_tp, void *, tp_buff, void *, buf, u32, size,
   u64, flags)
 {
-   struct pt_regs *regs = *(struct pt_regs **)tp_buff;
+   struct pt_regs *regs = this_cpu_ptr(&bpf_stack_tp_regs);
+
+   perf_fetch_caller_regs(regs);

return bpf_get_stack((unsigned long) regs, (unsigned long) buf,
 (unsigned long) size, flags, 0);

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-17 Thread Song Liu



> On May 17, 2019, at 1:10 AM, Peter Zijlstra  wrote:
> 
> On Fri, May 17, 2019 at 09:46:00AM +0200, Peter Zijlstra wrote:
>> On Thu, May 16, 2019 at 11:51:55PM +, Song Liu wrote:
>>> Hi, 
>>> 
>>> We found a failure with selftests/bpf/tests_prog in test_stacktrace_map (on 
>>> bpf/master
>>> branch). 
>>> 
>>> After digging into the code, we found that perf_callchain_kernel() is 
>>> giving empty
>>> callchain for tracepoint sched/sched_switch. And it seems related to commit
>>> 
>>> d15d356887e770c5f2dcf963b52c7cb510c9e42d
>>> ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
>>> 
>>> Before this commit, perf_callchain_kernel() returns callchain with 
>>> regs->ip. With
>>> this commit, regs->ip is not sent for !perf_hw_regs(regs) case. 
>> 
>> So while I think the below is indeed right; we should store regs->ip
>> regardless of the unwind path chosen, I still think there's something
>> fishy if this results in just the 1 entry.
>> 
>> The sched/sched_switch event really should have a non-trivial stack.
>> 
>> Let me see if I can reproduce with just perf.
> 
> $ perf record -g -e "sched:sched_switch" -- make clean
> $ perf report -D
> 
> 12 904071759467 0x1790 [0xd0]: PERF_RECORD_SAMPLE(IP, 0x1): 7236/7236: 
> 0x81c29562 period: 1 addr: 0
> ... FP chain: nr:10
> .  0: ff80
> .  1: 81c29562
> .  2: 81c29933
> .  3: 8111f688
> .  4: 81120b9d
> .  5: 81120ce5
> .  6: 8100254a
> .  7: 81e0007d
> .  8: fe00
> .  9: 7f9b6cd9682a
> ... thread: sh:7236
> .. dso: /lib/modules/5.1.0-12177-g41bbb9129767/build/vmlinux
> 

Hmm... I also get this FP chain even with CONFIG_UNWINDER_ORC=y. 
I guess it is actually from ORC? 

Thanks,
Song 



Re: Getting empty callchain from perf_callchain_kernel()

2019-05-17 Thread Song Liu



> On May 17, 2019, at 1:32 AM, Kairui Song  wrote:
> 
> On Fri, May 17, 2019 at 4:15 PM Kairui Song  wrote:
>> 
>> On Fri, May 17, 2019 at 4:11 PM Peter Zijlstra  wrote:
>>> 
>>> On Fri, May 17, 2019 at 09:46:00AM +0200, Peter Zijlstra wrote:
 On Thu, May 16, 2019 at 11:51:55PM +, Song Liu wrote:
> Hi,
> 
> We found a failure with selftests/bpf/tests_prog in test_stacktrace_map 
> (on bpf/master
> branch).
> 
> After digging into the code, we found that perf_callchain_kernel() is 
> giving empty
> callchain for tracepoint sched/sched_switch. And it seems related to 
> commit
> 
> d15d356887e770c5f2dcf963b52c7cb510c9e42d
> ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
> 
> Before this commit, perf_callchain_kernel() returns callchain with 
> regs->ip. With
> this commit, regs->ip is not sent for !perf_hw_regs(regs) case.
 
 So while I think the below is indeed right; we should store regs->ip
 regardless of the unwind path chosen, I still think there's something
 fishy if this results in just the 1 entry.
 
 The sched/sched_switch event really should have a non-trivial stack.
 
 Let me see if I can reproduce with just perf.
>>> 
>>> $ perf record -g -e "sched:sched_switch" -- make clean
>>> $ perf report -D
>>> 
>>> 12 904071759467 0x1790 [0xd0]: PERF_RECORD_SAMPLE(IP, 0x1): 7236/7236: 
>>> 0x81c29562 period: 1 addr: 0
>>> ... FP chain: nr:10
>>> .  0: ff80
>>> .  1: 81c29562
>>> .  2: 81c29933
>>> .  3: 8111f688
>>> .  4: 81120b9d
>>> .  5: 81120ce5
>>> .  6: 8100254a
>>> .  7: 81e0007d
>>> .  8: fe00
>>> .  9: 7f9b6cd9682a
>>> ... thread: sh:7236
>>> .. dso: /lib/modules/5.1.0-12177-g41bbb9129767/build/vmlinux
>>> 
>>> 
>>> IOW, it seems to 'work'.
>>> 
>> 
>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
>> some other bfp functions) is now broken, or, strating an unwind
>> directly inside a bpf program will end up strangely. It have following
>> kernel message:
>> 
>> WARNING: kernel stack frame pointer at 70cad07c in
>> test_progs:1242 has bad value ffd4497e
>> 
>> And in the debug message:
>> 
>> [  160.460287] 6e117175: aa23a0e8
>> (get_perf_callchain+0x148/0x280)
>> [  160.460287] 02e8715f: 00c6bba0 (0xc6bba0)
>> [  160.460288] b3d07758: 9ce3f979 (0x9ce3f979)
>> [  160.460289] 55c97836: 9ce3f979 (0x9ce3f979)
>> [  160.460289] 7cbb140a: 0001007f (0x1007f)
>> [  160.460290] 7dc62ac9:  ...
>> [  160.460290] 6b41e346: 1c7da01cd70c4000 (0x1c7da01cd70c4000)
>> [  160.460291] f23d1826: d89cffc3ae80 (0xd89cffc3ae80)
>> [  160.460292] f9a16017: 007f (0x7f)
>> [  160.460292] a8e62d44:  ...
>> [  160.460293] cbc83c97: b89d00d8d000 (0xb89d00d8d000)
>> [  160.460293] 92842522: 007f (0x7f)
>> [  160.460294] dafbec89: b89d00c6bc50 (0xb89d00c6bc50)
>> [  160.460296] 0777e4cf: aa225d97 
>> (bpf_get_stackid+0x77/0x470)
>> [  160.460296] 9942ea16:  ...
>> [  160.460297] a08006b1: 0001 (0x1)
>> [  160.460298] 9f03b438: b89d00c6bc30 (0xb89d00c6bc30)
>> [  160.460299] 6caf8b32: aa074fe8 (__do_page_fault+0x58/0x90)
>> [  160.460300] 3a13d702:  ...
>> [  160.460300] e2e2496d: 9ce3 (0x9ce3)
>> [  160.460301] 8ee6b7c2: d89cffc3ae80 (0xd89cffc3ae80)
>> [  160.460301] a8cf6d55:  ...
>> [  160.460302] 59078076: d89cffc3ae80 (0xd89cffc3ae80)
>> [  160.460303] c6aac739: 9ce3f1e18eb0 (0x9ce3f1e18eb0)
>> [  160.460303] a39aff92: b89d00c6bc60 (0xb89d00c6bc60)
>> [  160.460305] 97498bf2: aa1f4791 
>> (bpf_get_stackid_tp+0x11/0x20)
>> [  160.460306] 6992de1e: b89d00c6bc78 (0xb89d00c6bc78)
>> [  160.460307] dacd0ce5: c0405676 (0xc0405676)
>> [  160.460307] a81f2714:  ...
>> 
>> # Note here is the invalid frame pointer
>> [  160.460308] 70cad07c: b89d00a1d000 (0xb89d00a1d000)
>> [  160.460308] f8f194e4: 0001 (0x1)
>> [  160.460309] 2134f42a: d89cffc3ae80 (0xd89cffc3ae80)
>> [  160.460310] f9345889: 9ce3f1e18eb0 (0x9ce3f1e18eb0)
>> [  160.460310] 8ad22a42:  ...
>> [  160.460311] 73808173: b89d00c6bce0 (0xb89d00c6bce0)
>> [  160.460312] c9e4: aa1f48d1 (trace_call_bpf+0x81/0x100)
>> [  160.460313] c5d8ebd1: b89d00c6bcc0 (0xb89d00c6bcc0)
>> [  

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-17 Thread Peter Zijlstra
On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
> some other bfp functions) is now broken, or, strating an unwind
> directly inside a bpf program will end up strangely. It have following
> kernel message:

Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
follow.


Re: Getting empty callchain from perf_callchain_kernel()

2019-05-17 Thread Kairui Song
On Fri, May 17, 2019 at 4:15 PM Kairui Song  wrote:
>
> On Fri, May 17, 2019 at 4:11 PM Peter Zijlstra  wrote:
> >
> > On Fri, May 17, 2019 at 09:46:00AM +0200, Peter Zijlstra wrote:
> > > On Thu, May 16, 2019 at 11:51:55PM +, Song Liu wrote:
> > > > Hi,
> > > >
> > > > We found a failure with selftests/bpf/tests_prog in test_stacktrace_map 
> > > > (on bpf/master
> > > > branch).
> > > >
> > > > After digging into the code, we found that perf_callchain_kernel() is 
> > > > giving empty
> > > > callchain for tracepoint sched/sched_switch. And it seems related to 
> > > > commit
> > > >
> > > > d15d356887e770c5f2dcf963b52c7cb510c9e42d
> > > > ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
> > > >
> > > > Before this commit, perf_callchain_kernel() returns callchain with 
> > > > regs->ip. With
> > > > this commit, regs->ip is not sent for !perf_hw_regs(regs) case.
> > >
> > > So while I think the below is indeed right; we should store regs->ip
> > > regardless of the unwind path chosen, I still think there's something
> > > fishy if this results in just the 1 entry.
> > >
> > > The sched/sched_switch event really should have a non-trivial stack.
> > >
> > > Let me see if I can reproduce with just perf.
> >
> > $ perf record -g -e "sched:sched_switch" -- make clean
> > $ perf report -D
> >
> > 12 904071759467 0x1790 [0xd0]: PERF_RECORD_SAMPLE(IP, 0x1): 7236/7236: 
> > 0x81c29562 period: 1 addr: 0
> > ... FP chain: nr:10
> > .  0: ff80
> > .  1: 81c29562
> > .  2: 81c29933
> > .  3: 8111f688
> > .  4: 81120b9d
> > .  5: 81120ce5
> > .  6: 8100254a
> > .  7: 81e0007d
> > .  8: fe00
> > .  9: 7f9b6cd9682a
> > ... thread: sh:7236
> > .. dso: /lib/modules/5.1.0-12177-g41bbb9129767/build/vmlinux
> >
> >
> > IOW, it seems to 'work'.
> >
>
> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
> some other bpf functions) is now broken, or rather, starting an unwind
> directly inside a bpf program ends up strangely. It gives the following
> kernel message:
>
> WARNING: kernel stack frame pointer at 70cad07c in
> test_progs:1242 has bad value ffd4497e
>
> And in the debug message:
>
> [  160.460287] 6e117175: aa23a0e8
> (get_perf_callchain+0x148/0x280)
> [  160.460287] 02e8715f: 00c6bba0 (0xc6bba0)
> [  160.460288] b3d07758: 9ce3f979 (0x9ce3f979)
> [  160.460289] 55c97836: 9ce3f979 (0x9ce3f979)
> [  160.460289] 7cbb140a: 0001007f (0x1007f)
> [  160.460290] 7dc62ac9:  ...
> [  160.460290] 6b41e346: 1c7da01cd70c4000 (0x1c7da01cd70c4000)
> [  160.460291] f23d1826: d89cffc3ae80 (0xd89cffc3ae80)
> [  160.460292] f9a16017: 007f (0x7f)
> [  160.460292] a8e62d44:  ...
> [  160.460293] cbc83c97: b89d00d8d000 (0xb89d00d8d000)
> [  160.460293] 92842522: 007f (0x7f)
> [  160.460294] dafbec89: b89d00c6bc50 (0xb89d00c6bc50)
> [  160.460296] 0777e4cf: aa225d97 (bpf_get_stackid+0x77/0x470)
> [  160.460296] 9942ea16:  ...
> [  160.460297] a08006b1: 0001 (0x1)
> [  160.460298] 9f03b438: b89d00c6bc30 (0xb89d00c6bc30)
> [  160.460299] 6caf8b32: aa074fe8 (__do_page_fault+0x58/0x90)
> [  160.460300] 3a13d702:  ...
> [  160.460300] e2e2496d: 9ce3 (0x9ce3)
> [  160.460301] 8ee6b7c2: d89cffc3ae80 (0xd89cffc3ae80)
> [  160.460301] a8cf6d55:  ...
> [  160.460302] 59078076: d89cffc3ae80 (0xd89cffc3ae80)
> [  160.460303] c6aac739: 9ce3f1e18eb0 (0x9ce3f1e18eb0)
> [  160.460303] a39aff92: b89d00c6bc60 (0xb89d00c6bc60)
> [  160.460305] 97498bf2: aa1f4791 
> (bpf_get_stackid_tp+0x11/0x20)
> [  160.460306] 6992de1e: b89d00c6bc78 (0xb89d00c6bc78)
> [  160.460307] dacd0ce5: c0405676 (0xc0405676)
> [  160.460307] a81f2714:  ...
>
> # Note here is the invalid frame pointer
> [  160.460308] 70cad07c: b89d00a1d000 (0xb89d00a1d000)
> [  160.460308] f8f194e4: 0001 (0x1)
> [  160.460309] 2134f42a: d89cffc3ae80 (0xd89cffc3ae80)
> [  160.460310] f9345889: 9ce3f1e18eb0 (0x9ce3f1e18eb0)
> [  160.460310] 8ad22a42:  ...
> [  160.460311] 73808173: b89d00c6bce0 (0xb89d00c6bce0)
> [  160.460312] c9e4: aa1f48d1 (trace_call_bpf+0x81/0x100)
> [  160.460313] c5d8ebd1: b89d00c6bcc0 (0xb89d00c6bcc0)
> [  160.460315] bce0b072: ab651be0
> (event_sched_migrate_task+0xa0/0xa0)
> [  

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-17 Thread Kairui Song
On Fri, May 17, 2019 at 4:11 PM Peter Zijlstra  wrote:
>
> On Fri, May 17, 2019 at 09:46:00AM +0200, Peter Zijlstra wrote:
> > On Thu, May 16, 2019 at 11:51:55PM +, Song Liu wrote:
> > > Hi,
> > >
> > > We found a failure with selftests/bpf/tests_prog in test_stacktrace_map 
> > > (on bpf/master
> > > branch).
> > >
> > > After digging into the code, we found that perf_callchain_kernel() is 
> > > giving empty
> > > callchain for tracepoint sched/sched_switch. And it seems related to 
> > > commit
> > >
> > > d15d356887e770c5f2dcf963b52c7cb510c9e42d
> > > ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
> > >
> > > Before this commit, perf_callchain_kernel() returns callchain with 
> > > regs->ip. With
> > > this commit, regs->ip is not sent for !perf_hw_regs(regs) case.
> >
> > So while I think the below is indeed right; we should store regs->ip
> > regardless of the unwind path chosen, I still think there's something
> > fishy if this results in just the 1 entry.
> >
> > The sched/sched_switch event really should have a non-trivial stack.
> >
> > Let me see if I can reproduce with just perf.
>
> $ perf record -g -e "sched:sched_switch" -- make clean
> $ perf report -D
>
> 12 904071759467 0x1790 [0xd0]: PERF_RECORD_SAMPLE(IP, 0x1): 7236/7236: 
> 0x81c29562 period: 1 addr: 0
> ... FP chain: nr:10
> .  0: ff80
> .  1: 81c29562
> .  2: 81c29933
> .  3: 8111f688
> .  4: 81120b9d
> .  5: 81120ce5
> .  6: 8100254a
> .  7: 81e0007d
> .  8: fe00
> .  9: 7f9b6cd9682a
> ... thread: sh:7236
> .. dso: /lib/modules/5.1.0-12177-g41bbb9129767/build/vmlinux
>
>
> IOW, it seems to 'work'.
>

Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
some other bpf functions) is now broken, or rather, starting an unwind
directly inside a bpf program ends up strangely. It gives the following
kernel message:

WARNING: kernel stack frame pointer at 70cad07c in
test_progs:1242 has bad value ffd4497e

And in the debug message:

[  160.460287] 6e117175: aa23a0e8
(get_perf_callchain+0x148/0x280)
[  160.460287] 02e8715f: 00c6bba0 (0xc6bba0)
[  160.460288] b3d07758: 9ce3f979 (0x9ce3f979)
[  160.460289] 55c97836: 9ce3f979 (0x9ce3f979)
[  160.460289] 7cbb140a: 0001007f (0x1007f)
[  160.460290] 7dc62ac9:  ...
[  160.460290] 6b41e346: 1c7da01cd70c4000 (0x1c7da01cd70c4000)
[  160.460291] f23d1826: d89cffc3ae80 (0xd89cffc3ae80)
[  160.460292] f9a16017: 007f (0x7f)
[  160.460292] a8e62d44:  ...
[  160.460293] cbc83c97: b89d00d8d000 (0xb89d00d8d000)
[  160.460293] 92842522: 007f (0x7f)
[  160.460294] dafbec89: b89d00c6bc50 (0xb89d00c6bc50)
[  160.460296] 0777e4cf: aa225d97 (bpf_get_stackid+0x77/0x470)
[  160.460296] 9942ea16:  ...
[  160.460297] a08006b1: 0001 (0x1)
[  160.460298] 9f03b438: b89d00c6bc30 (0xb89d00c6bc30)
[  160.460299] 6caf8b32: aa074fe8 (__do_page_fault+0x58/0x90)
[  160.460300] 3a13d702:  ...
[  160.460300] e2e2496d: 9ce3 (0x9ce3)
[  160.460301] 8ee6b7c2: d89cffc3ae80 (0xd89cffc3ae80)
[  160.460301] a8cf6d55:  ...
[  160.460302] 59078076: d89cffc3ae80 (0xd89cffc3ae80)
[  160.460303] c6aac739: 9ce3f1e18eb0 (0x9ce3f1e18eb0)
[  160.460303] a39aff92: b89d00c6bc60 (0xb89d00c6bc60)
[  160.460305] 97498bf2: aa1f4791 (bpf_get_stackid_tp+0x11/0x20)
[  160.460306] 6992de1e: b89d00c6bc78 (0xb89d00c6bc78)
[  160.460307] dacd0ce5: c0405676 (0xc0405676)
[  160.460307] a81f2714:  ...

# Note here is the invalid frame pointer
[  160.460308] 70cad07c: b89d00a1d000 (0xb89d00a1d000)
[  160.460308] f8f194e4: 0001 (0x1)
[  160.460309] 2134f42a: d89cffc3ae80 (0xd89cffc3ae80)
[  160.460310] f9345889: 9ce3f1e18eb0 (0x9ce3f1e18eb0)
[  160.460310] 8ad22a42:  ...
[  160.460311] 73808173: b89d00c6bce0 (0xb89d00c6bce0)
[  160.460312] c9e4: aa1f48d1 (trace_call_bpf+0x81/0x100)
[  160.460313] c5d8ebd1: b89d00c6bcc0 (0xb89d00c6bcc0)
[  160.460315] bce0b072: ab651be0
(event_sched_migrate_task+0xa0/0xa0)
[  160.460316] 355cf319:  ...
[  160.460316] 3b67f2ad: d89cffc3ae80 (0xd89cffc3ae80)
[  160.460316] 9a77e20b: 9ce3fba25000 (0x9ce3fba25000)
[  160.460317] 32cf7376: 0001 (0x1)
[  160.460317] 

Re: Getting empty callchain from perf_callchain_kernel()

2019-05-17 Thread Peter Zijlstra
On Fri, May 17, 2019 at 09:46:00AM +0200, Peter Zijlstra wrote:
> On Thu, May 16, 2019 at 11:51:55PM +, Song Liu wrote:
> > Hi, 
> > 
> > We found a failure with selftests/bpf/tests_prog in test_stacktrace_map (on 
> > bpf/master
> > branch). 
> > 
> > After digging into the code, we found that perf_callchain_kernel() is 
> > giving empty
> > callchain for tracepoint sched/sched_switch. And it seems related to commit
> > 
> > d15d356887e770c5f2dcf963b52c7cb510c9e42d
> > ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
> > 
> > Before this commit, perf_callchain_kernel() returns callchain with 
> > regs->ip. With
> > this commit, regs->ip is not sent for !perf_hw_regs(regs) case. 
> 
> So while I think the below is indeed right; we should store regs->ip
> regardless of the unwind path chosen, I still think there's something
> fishy if this results in just the 1 entry.
> 
> The sched/sched_switch event really should have a non-trivial stack.
> 
> Let me see if I can reproduce with just perf.

$ perf record -g -e "sched:sched_switch" -- make clean
$ perf report -D

12 904071759467 0x1790 [0xd0]: PERF_RECORD_SAMPLE(IP, 0x1): 7236/7236: 
0x81c29562 period: 1 addr: 0
... FP chain: nr:10
.  0: ff80
.  1: 81c29562
.  2: 81c29933
.  3: 8111f688
.  4: 81120b9d
.  5: 81120ce5
.  6: 8100254a
.  7: 81e0007d
.  8: fe00
.  9: 7f9b6cd9682a
... thread: sh:7236
.. dso: /lib/modules/5.1.0-12177-g41bbb9129767/build/vmlinux


IOW, it seems to 'work'.



Re: Getting empty callchain from perf_callchain_kernel()

2019-05-17 Thread Peter Zijlstra
On Thu, May 16, 2019 at 11:51:55PM +, Song Liu wrote:
> Hi, 
> 
> We found a failure with selftests/bpf/tests_prog in test_stacktrace_map (on 
> bpf/master
> branch). 
> 
> After digging into the code, we found that perf_callchain_kernel() is giving 
> empty
> callchain for tracepoint sched/sched_switch. And it seems related to commit
> 
> d15d356887e770c5f2dcf963b52c7cb510c9e42d
> ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
> 
> Before this commit, perf_callchain_kernel() returns callchain with regs->ip. 
> With
> this commit, regs->ip is not sent for !perf_hw_regs(regs) case. 

So while I think the below is indeed right; we should store regs->ip
regardless of the unwind path chosen, I still think there's something
fishy if this results in just the 1 entry.

The sched/sched_switch event really should have a non-trivial stack.

Let me see if I can reproduce with just perf.
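
For reference, with the change proposed below applied, perf_callchain_kernel()
would look roughly like this (a sketch paraphrased from arch/x86/events/core.c,
not verbatim; the guest-callchain early return is elided). The point is that
regs->ip gets recorded before the unwind path is chosen, so even if the
unwinder produces nothing, the callchain is not empty:

void
perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
		      struct pt_regs *regs)
{
	struct unwind_state state;
	unsigned long addr;

	/* Record the sample IP first, whichever unwind path is taken. */
	if (perf_callchain_store(entry, regs->ip))
		return;

	if (perf_hw_regs(regs)) {
		/* Real interrupt regs: start the unwinder from them. */
		unwind_start(&state, current, regs, NULL);
	} else {
		/* Software-fetched regs: start from the saved stack pointer. */
		unwind_start(&state, current, NULL, (void *)regs->sp);
	}

	for (; !unwind_done(&state); unwind_next_frame(&state)) {
		addr = unwind_get_return_address(&state);
		if (!addr || perf_callchain_store(entry, addr))
			return;
	}
}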


Getting empty callchain from perf_callchain_kernel()

2019-05-16 Thread Song Liu
Hi, 

We found a failure with selftests/bpf/test_progs in test_stacktrace_map (on the
bpf/master branch).

After digging into the code, we found that perf_callchain_kernel() is giving an
empty callchain for the tracepoint sched/sched_switch. It seems related to commit

d15d356887e770c5f2dcf963b52c7cb510c9e42d
("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")

Before this commit, perf_callchain_kernel() returned a callchain that included
regs->ip. With this commit, regs->ip is no longer stored for the
!perf_hw_regs(regs) case.

I found that the following change fixes the selftest, but I am not very sure it
is the best solution here.

Please share comments and suggestions on this. 

Thanks in advance!

Song


diff --git i/arch/x86/events/core.c w/arch/x86/events/core.c
index f315425d8468..7b8a9eb4d5fd 100644
--- i/arch/x86/events/core.c
+++ w/arch/x86/events/core.c
@@ -2402,9 +2402,9 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs
 		return;
 	}
 
+	if (perf_callchain_store(entry, regs->ip))
+		return;
 	if (perf_hw_regs(regs)) {
-		if (perf_callchain_store(entry, regs->ip))
-			return;
 		unwind_start(&state, current, regs, NULL);
 	} else {
 		unwind_start(&state, current, NULL, (void *)regs->sp);