Re: [PATCH net-next] bpf: arm64: remove callee-save registers use for tmp registers

2016-05-16 Thread Shi, Yang

On 5/16/2016 4:45 PM, Z Lim wrote:

Hi Yang,

On Mon, May 16, 2016 at 4:09 PM, Yang Shi  wrote:

In the current implementation of the ARM64 eBPF JIT, R23 and R24 are used as
tmp registers, but they are callee-saved registers. This leads to a variable
size of the JIT prologue and epilogue. The latest blinding constant change
prefers a constant size of prologue and epilogue. The AAPCS reserves R9 ~ R15
as temp registers, which need not be saved/restored during function calls.
So, replace R23 and R24 with R10 and R11, and remove the tmp_used flag.
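
For illustration, the core of the change is the remap in the JIT's
bpf2a64[] register table; a minimal sketch of the idea, not the literal
diff:

	/* temporary registers for internal BPF JIT */
-	[TMP_REG_1] = A64_R(23),	/* callee-saved: had to be pushed/popped */
-	[TMP_REG_2] = A64_R(24),
+	[TMP_REG_1] = A64_R(10),	/* AAPCS temporaries: no save/restore */
+	[TMP_REG_2] = A64_R(11),

With the tmp registers no longer callee-saved, the conditional tmp_used
push/pop in the prologue/epilogue can be dropped entirely.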

CC: Zi Shen Lim 
CC: Daniel Borkmann 
Signed-off-by: Yang Shi 
---


A couple of suggestions, but otherwise:
Acked-by: Zi Shen Lim 

1. Update the diagram. I think it should now be:

-* BPF fp register => -80:+-----+ <= (BPF_FP)
+* BPF fp register => -64:+-----+ <= (BPF_FP)

(With tmp1/tmp2 no longer pushed, the prologue saves four register pairs,
FP/LR, r6/r7, r8/r9, x25/x26, i.e. 64 bytes instead of 80.)


Nice catch. I forgot the stack diagram.



2. Add a comment in commit log along the lines of: this is an
optimization saving 2 instructions per jited BPF program.


Sure, will address in V2.

Thanks,
Yang



Thanks :)

z


Apply on top of Daniel's blinding constant patchset.

 arch/arm64/net/bpf_jit_comp.c | 32 
 1 file changed, 4 insertions(+), 28 deletions(-)





Re: [PATCH] arm64: bpf: jit JMP_JSET_{X,K}

2016-05-13 Thread Shi, Yang

On 5/12/2016 11:37 PM, Zi Shen Lim wrote:

Original implementation commit e54bcde3d69d ("arm64: eBPF JIT compiler")
had the relevant code paths, but due to an oversight always failed JITing.

As a result, we had been falling back to the BPF interpreter whenever a BPF
program has JMP_JSET_{X,K} instructions.

With this fix, we confirm that the corresponding tests in lib/test_bpf
continue to pass, and are now also jited.

...
[2.784553] test_bpf: #30 JSET jited:1 188 192 197 PASS
[2.791373] test_bpf: #31 tcpdump port 22 jited:1 325 677 625 PASS
[2.808800] test_bpf: #32 tcpdump complex jited:1 323 731 991 PASS
...
[3.190759] test_bpf: #237 JMP_JSET_K: if (0x3 & 0x2) return 1 jited:1 110 
PASS
[3.192524] test_bpf: #238 JMP_JSET_K: if (0x3 & 0x) return 1 
jited:1 98 PASS
[3.211014] test_bpf: #249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:1 120 
PASS
[3.212973] test_bpf: #250 JMP_JSET_X: if (0x3 & 0x) return 1 
jited:1 89 PASS
...

Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler")
Signed-off-by: Zi Shen Lim 


Acked-by: Yang Shi 

Thanks,
Yang


---
 arch/arm64/net/bpf_jit_comp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 031ed08..d0d5190 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -478,6 +478,7 @@ emit_cond_jmp:
case BPF_JGE:
jmp_cond = A64_COND_CS;
break;
+   case BPF_JSET:
case BPF_JNE:
jmp_cond = A64_COND_NE;
break;
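
For context, BPF_JSET is a "test bits" conditional jump; its semantics in
pseudo-C (a sketch, not kernel code):

	/* BPF_JMP | BPF_JSET: jump when (dst & src) != 0 */
	if (dst & src)
		pc += off;

The arm64 JIT emits a TST (i.e. ANDS) rather than a CMP for JSET, so the
NE condition encodes exactly this test; that is why JSET can share
A64_COND_NE with BPF_JNE, as the one-line fix above does.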





Re: Warning triggered by lockdep checks for sock_owned_by_user on linux-next-20160420

2016-04-25 Thread Shi, Yang

On 4/22/2016 9:50 PM, Eric Dumazet wrote:

On Fri, 2016-04-22 at 21:02 -0700, Shi, Yang wrote:

Hi David,

When I ran some tests on an NFS-mounted rootfs, I got the below warning
with LOCKDEP enabled on linux-next-20160420:

WARNING: CPU: 9 PID: 0 at include/net/sock.h:1408
udp_queue_rcv_skb+0x3d0/0x660
Modules linked in:
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G  D
4.6.0-rc4-next-20160420-WR7.0.0.0_standard+ #6
Hardware name: Intel Corporation S5520HC/S5520HC, BIOS
S5500.86B.01.10.0025.030220091519 03/02/2009
    88066fd03a70 8155855f 
    88066fd03ab0 81062803 058061318ec8
   88065d1e39c0 880661318e40  880661318ec8
Call Trace:
 [] dump_stack+0x67/0x98
   [] __warn+0xd3/0xf0
   [] warn_slowpath_null+0x1d/0x20
   [] udp_queue_rcv_skb+0x3d0/0x660
   [] __udp4_lib_rcv+0x4dc/0xc00
   [] udp_rcv+0x1a/0x20
   [] ip_local_deliver_finish+0xd1/0x2e0
   [] ? ip_local_deliver_finish+0x3f/0x2e0
   [] ip_local_deliver+0xc2/0xd0
   [] ip_rcv_finish+0x1e2/0x5a0
   [] ip_rcv+0x2dc/0x410
   [] ? __pskb_pull_tail+0x82/0x400
   [] __netif_receive_skb_core+0x3a8/0xa80
   [] ? netif_receive_skb_internal+0x1b/0xf0
   [] __netif_receive_skb+0x1d/0x60
   [] netif_receive_skb_internal+0x55/0xf0
   [] ? netif_receive_skb_internal+0x1b/0xf0
   [] napi_gro_receive+0xc2/0x180
   [] igb_poll+0x5ea/0xdf0
   [] net_rx_action+0x15c/0x3d0
   [] __do_softirq+0x161/0x413
   [] irq_exit+0xd1/0x110
   [] do_IRQ+0x62/0xf0
   [] common_interrupt+0x8e/0x8e
 [] ? cpuidle_enter_state+0xc6/0x290
   [] cpuidle_enter+0x17/0x20
   [] call_cpuidle+0x33/0x50
   [] cpu_startup_entry+0x229/0x3b0
   [] start_secondary+0x144/0x150
---[ end trace ba508c424f0d52bf ]---


The warning is triggered by commit
fafc4e1ea1a4c1eb13a30c9426fb799f5efacbc3 ("sock: tigthen lockdep checks
for sock_owned_by_user"), which checks if slock is held before locking
"owned".

It looks fine for lock_sock, which just calls lock_sock_nested. But
bh_lock_sock is different: it just calls spin_lock, so it doesn't
touch the dep_map, and the check then fails even though the lock is held.
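
For reference, the two paths being compared look roughly like this (from
include/net/sock.h of that era; quoted from memory, so treat it as a
sketch):

	/* lock_sock() ends up in lock_sock_nested(), which takes
	 * sk_lock.slock and additionally marks the socket-level dep_map:
	 *	mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
	 * bh_lock_sock() is only the raw spinlock:
	 */
	#define bh_lock_sock(__sk)	spin_lock(&((__sk)->sk_lock.slock))
	#define bh_lock_sock_nested(__sk) \
		spin_lock_nested(&((__sk)->sk_lock.slock), \
				 SINGLE_DEPTH_NESTING)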


?? spin_lock() definitely is lockdep friendly.


Yes, this is what I thought too. But, I didn't figure out why the 
warning was still reported even though spin_lock is called.






So, I'm wondering what the right fix for it should be:

1. Replace bh_lock_sock with bh_lock_sock_nested in the protocol
implementations, but there are a lot of places calling it.

2. Just like lock_sock, just call bh_lock_sock_nested instead of spin_lock.

Or are both approaches wrong or not ideal?


I sent a patch yesterday, I am not sure what the status is.


Thanks for the patch. I just found your original patch and the
discussion with Valdis. I think I ran into the same problem: a kernel
BUG is triggered before the warning, but the "lockdep is off"
information is not printed out, although it really is off.


Just tried your patch, it works for me.

Thanks,
Yang



diff --git a/include/net/sock.h b/include/net/sock.h
index d997ec13a643..db8301c76d50 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
  {
struct sock *sk = (struct sock *)csk;

-   return lockdep_is_held(&sk->sk_lock) ||
+   return !debug_locks ||
+  lockdep_is_held(&sk->sk_lock) ||
   lockdep_is_held(&sk->sk_lock.slock);
  }
  #endif









Warning triggered by lockdep checks for sock_owned_by_user on linux-next-20160420

2016-04-22 Thread Shi, Yang

Hi David,

When I ran some tests on an NFS-mounted rootfs, I got the below warning
with LOCKDEP enabled on linux-next-20160420:


WARNING: CPU: 9 PID: 0 at include/net/sock.h:1408 
udp_queue_rcv_skb+0x3d0/0x660

Modules linked in:
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G  D 
4.6.0-rc4-next-20160420-WR7.0.0.0_standard+ #6
Hardware name: Intel Corporation S5520HC/S5520HC, BIOS 
S5500.86B.01.10.0025.030220091519 03/02/2009

  88066fd03a70 8155855f 
  88066fd03ab0 81062803 058061318ec8
 88065d1e39c0 880661318e40  880661318ec8
Call Trace:
   [] dump_stack+0x67/0x98
 [] __warn+0xd3/0xf0
 [] warn_slowpath_null+0x1d/0x20
 [] udp_queue_rcv_skb+0x3d0/0x660
 [] __udp4_lib_rcv+0x4dc/0xc00
 [] udp_rcv+0x1a/0x20
 [] ip_local_deliver_finish+0xd1/0x2e0
 [] ? ip_local_deliver_finish+0x3f/0x2e0
 [] ip_local_deliver+0xc2/0xd0
 [] ip_rcv_finish+0x1e2/0x5a0
 [] ip_rcv+0x2dc/0x410
 [] ? __pskb_pull_tail+0x82/0x400
 [] __netif_receive_skb_core+0x3a8/0xa80
 [] ? netif_receive_skb_internal+0x1b/0xf0
 [] __netif_receive_skb+0x1d/0x60
 [] netif_receive_skb_internal+0x55/0xf0
 [] ? netif_receive_skb_internal+0x1b/0xf0
 [] napi_gro_receive+0xc2/0x180
 [] igb_poll+0x5ea/0xdf0
 [] net_rx_action+0x15c/0x3d0
 [] __do_softirq+0x161/0x413
 [] irq_exit+0xd1/0x110
 [] do_IRQ+0x62/0xf0
 [] common_interrupt+0x8e/0x8e
   [] ? cpuidle_enter_state+0xc6/0x290
 [] cpuidle_enter+0x17/0x20
 [] call_cpuidle+0x33/0x50
 [] cpu_startup_entry+0x229/0x3b0
 [] start_secondary+0x144/0x150
---[ end trace ba508c424f0d52bf ]---


The warning is triggered by commit 
fafc4e1ea1a4c1eb13a30c9426fb799f5efacbc3 ("sock: tigthen lockdep checks 
for sock_owned_by_user"), which checks if slock is held before locking 
"owned".


It looks fine for lock_sock, which just calls lock_sock_nested. But
bh_lock_sock is different: it just calls spin_lock, so it doesn't
touch the dep_map, and the check then fails even though the lock is held.


So, I'm wondering what the right fix for it should be:

1. Replace bh_lock_sock with bh_lock_sock_nested in the protocol
implementations, but there are a lot of places calling it.

2. Just like lock_sock, just call bh_lock_sock_nested instead of spin_lock.

Or are both approaches wrong or not ideal?

Thanks,
Yang


Re: [PATCH net-next 2/5] bpf: move clearing of A/X into classic to eBPF migration prologue

2015-12-17 Thread Shi, Yang

On 12/17/2015 2:51 PM, Daniel Borkmann wrote:

Back in the days where eBPF (or back then "internal BPF" ;->) was not
exposed to user space, and only the classic BPF programs internally
translated into eBPF programs, we missed the fact that for classic BPF
A and X needed to be cleared. It was fixed back then via 83d5b7ef99c9
("net: filter: initialize A and X registers"), and thus classic BPF
specifics were added to the eBPF interpreter core to work around it.

This added some confusion for JIT developers later on that take the
eBPF interpreter code as an example for deriving their JIT. F.e. in
f75298f5c3fe ("s390/bpf: clear correct BPF accumulator register"), at
least X could leak stack memory. Furthermore, since this is only needed
for classic BPF translations and not for eBPF (verifier takes care
that read access to regs cannot be done uninitialized), more complexity
is added to JITs as they need to determine whether they deal with
migrations or native eBPF where they can just omit clearing A/X in
their prologue and thus reduce image size a bit, see f.e. cde66c2d88da
("s390/bpf: Only clear A and X for converted BPF programs"). In other
cases (x86, arm64), A and X is being cleared in the prologue also for
eBPF case, which is unnecessary.

Lets move this into the BPF migration in bpf_convert_filter() where it
actually belongs as long as the number of eBPF JITs are still few. It
can thus be done generically; allowing us to remove the quirk from
__bpf_prog_run() and to slightly reduce JIT image size in case of eBPF,
while reducing code duplication on this matter in current(/future) eBPF
JITs.
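
The generic version lands in bpf_convert_filter(); roughly like the
following sketch (the net/core/filter.c hunk itself is truncated from
this archive below, so this is the idea rather than the exact code):

	/* Classic BPF expects A and X to be cleared on entry. Emit this
	 * once, as the first insns of the migrated program, instead of in
	 * every JIT prologue and in the interpreter:
	 */
	*new_insn++ = BPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
	*new_insn++ = BPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);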

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Reviewed-by: Michael Holzheu 
Tested-by: Michael Holzheu 
Cc: Zi Shen Lim 
Cc: Yang Shi 


Acked by me on the arm64 part.

Acked-by: Yang Shi 

Thanks,
Yang


---
  arch/arm64/net/bpf_jit_comp.c |  6 --
  arch/s390/net/bpf_jit_comp.c  | 13 ++---
  arch/x86/net/bpf_jit_comp.c   | 14 +-
  kernel/bpf/core.c |  4 
  net/core/filter.c | 19 ---
  5 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index b162ad7..7658612 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -152,8 +152,6 @@ static void build_prologue(struct jit_ctx *ctx)
const u8 r8 = bpf2a64[BPF_REG_8];
const u8 r9 = bpf2a64[BPF_REG_9];
const u8 fp = bpf2a64[BPF_REG_FP];
-   const u8 ra = bpf2a64[BPF_REG_A];
-   const u8 rx = bpf2a64[BPF_REG_X];
const u8 tmp1 = bpf2a64[TMP_REG_1];
const u8 tmp2 = bpf2a64[TMP_REG_2];

@@ -200,10 +198,6 @@ static void build_prologue(struct jit_ctx *ctx)

/* Set up function call stack */
emit(A64_SUB_I(1, A64_SP, A64_SP, STACK_SIZE), ctx);
-
-   /* Clear registers A and X */
-   emit_a64_mov_i64(ra, 0, ctx);
-   emit_a64_mov_i64(rx, 0, ctx);
  }

  static void build_epilogue(struct jit_ctx *ctx)
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 9a0c4c2..3c0bfc1 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -408,7 +408,7 @@ static void emit_load_skb_data_hlen(struct bpf_jit *jit)
   * Save registers and create stack frame if necessary.
   * See stack frame layout desription in "bpf_jit.h"!
   */
-static void bpf_jit_prologue(struct bpf_jit *jit, bool is_classic)
+static void bpf_jit_prologue(struct bpf_jit *jit)
  {
if (jit->seen & SEEN_TAIL_CALL) {
/* xc STK_OFF_TCCNT(4,%r15),STK_OFF_TCCNT(%r15) */
@@ -448,15 +448,6 @@ static void bpf_jit_prologue(struct bpf_jit *jit, bool 
is_classic)
/* stg %b1,ST_OFF_SKBP(%r0,%r15) */
EMIT6_DISP_LH(0xe300, 0x0024, REG_W1, REG_0, REG_15,
  STK_OFF_SKBP);
-   /* Clear A (%b0) and X (%b7) registers for converted BPF programs */
-   if (is_classic) {
-   if (REG_SEEN(BPF_REG_A))
-   /* lghi %ba,0 */
-   EMIT4_IMM(0xa709, BPF_REG_A, 0);
-   if (REG_SEEN(BPF_REG_X))
-   /* lghi %bx,0 */
-   EMIT4_IMM(0xa709, BPF_REG_X, 0);
-   }
  }

  /*
@@ -1245,7 +1236,7 @@ static int bpf_jit_prog(struct bpf_jit *jit, struct 
bpf_prog *fp)
jit->lit = jit->lit_start;
jit->prg = 0;

-   bpf_jit_prologue(jit, bpf_prog_was_classic(fp));
+   bpf_jit_prologue(jit);
for (i = 0; i < fp->len; i += insn_count) {
insn_count = bpf_jit_insn(jit, fp, i);
if (insn_count < 0)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 7599197..c080e81 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -193,7 +193,7 @@ struct jit_context {
 32 /* space for rbx, r13, r14, r15 */ + \
 8 /* space for skb_copy_bits() buffer */)

-#define

Re: [RESEND PATCH] arm64: bpf: add 'store immediate' instruction

2015-12-01 Thread Shi, Yang

On 11/30/2015 2:24 PM, Yang Shi wrote:

aarch64 doesn't have a native store-immediate instruction; such an operation
has to be implemented by the below instruction sequence:

Load immediate to register
Store register

Signed-off-by: Yang Shi 
CC: Zi Shen Lim 


I had an email exchange offline with Zi Shen Lim, since he is traveling and
cannot send text-only mail; his reply is quoted below:


"I've given reviewed-by in response to original posting. Unless 
something has changed, feel free to add it."


Since nothing has changed, I added his reviewed-by.

Reviewed-by: Zi Shen Lim 

Thanks,
Yang


CC: Xi Wang 
---
This patch might be buried by the storm of xadd discussion; however, it is
absolutely irrelevant to xadd, so I'm resending the patch itself.

  arch/arm64/net/bpf_jit_comp.c | 20 +++-
  1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 6809647..49c1f1b 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -563,7 +563,25 @@ emit_cond_jmp:
case BPF_ST | BPF_MEM | BPF_H:
case BPF_ST | BPF_MEM | BPF_B:
case BPF_ST | BPF_MEM | BPF_DW:
-   goto notyet;
+   /* Load imm to a register then store it */
+   ctx->tmp_used = 1;
+   emit_a64_mov_i(1, tmp2, off, ctx);
+   emit_a64_mov_i(1, tmp, imm, ctx);
+   switch (BPF_SIZE(code)) {
+   case BPF_W:
+   emit(A64_STR32(tmp, dst, tmp2), ctx);
+   break;
+   case BPF_H:
+   emit(A64_STRH(tmp, dst, tmp2), ctx);
+   break;
+   case BPF_B:
+   emit(A64_STRB(tmp, dst, tmp2), ctx);
+   break;
+   case BPF_DW:
+   emit(A64_STR64(tmp, dst, tmp2), ctx);
+   break;
+   }
+   break;

/* STX: *(size *)(dst + off) = src */
case BPF_STX | BPF_MEM | BPF_W:





Re: [PATCH 1/2] arm64: bpf: add 'store immediate' instruction

2015-11-23 Thread Shi, Yang

Hi folks,

Any more comments on this patch (store immediate only)?

I need more time to add XADD (I suppose everyone agrees it is
equivalent to atomic_add). However, this one is irrelevant to XADD, so
could we apply it first?


Thanks,
Yang


On 11/12/2015 7:45 PM, Z Lim wrote:

On Thu, Nov 12, 2015 at 11:33 AM, Shi, Yang  wrote:

On 11/11/2015 4:39 AM, Will Deacon wrote:


Wait a second, we're both talking rubbish here :) The STR (immediate)
form is referring to the addressing mode, whereas this patch wants to
store an immediate value to memory, which does need moving to a register
first.



Yes, the immediate means an immediate offset for the addressing mode; it
doesn't mean storing an immediate to memory.

I don't think any load-store architecture has a store-immediate instruction.



Indeed. Sorry for the noise.

Somehow Will caught a whiff of whatever I was smoking then :)





Re: [PATCH] arm64: bpf: fix buffer pointer

2015-11-18 Thread Shi, Yang

On 11/18/2015 1:41 PM, Z Lim wrote:

On Wed, Nov 18, 2015 at 1:07 PM, Shi, Yang  wrote:

On 11/18/2015 12:56 AM, Zi Shen Lim wrote:

 emit_a64_mov_i64(r3, size, ctx);
-   emit(A64_ADD_I(1, r4, fp, MAX_BPF_STACK), ctx);
+   emit(A64_SUB_I(1, r4, fp, STACK_SIZE), ctx);



Shouldn't it sub MAX_BPF_STACK?


No, if it's at (BPF_FP - MAX_BPF_STACK), we'll be writing into the BPF
stack area, which should only be used by the BPF program.


If you sub STACK_SIZE here, the buffer pointer will point to the bottom of
the reserved area.


Yes, that's the idea. The buffer is allocated here. Right now we're
using this "reserved" space for this buffer only.


OK, I see. The buffer grows from low to high.
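
To put numbers on it (assuming MAX_BPF_STACK == 512, as in linux/bpf.h at
the time):

	/* _STACK_SIZE = 512 + 4 = 516, STACK_SIZE = STACK_ALIGN(516) = 528.
	 * r4 = BPF_FP - 528 is the bottom of the 16-byte JIT scratchpad
	 * (BPF_FP - 528 .. BPF_FP - 512), safely below the BPF prog stack;
	 * skb_copy_bits() fills its 4-byte buffer upward from there.
	 */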

Thanks for the elaboration.

Acked-by: Yang Shi 

Yang





Your stack layout change also shows this:

+*                        +-----+ <= (BPF_FP - MAX_BPF_STACK)
+*                        |RSVD | JIT scratchpad
+* current A64_SP =>      +-----+ <= (BPF_FP - STACK_SIZE)


Yes, this diagram reflects the code and intention.


Thanks for reviewing, we definitely need more of these :)





Re: [PATCH] arm64: bpf: fix buffer pointer

2015-11-18 Thread Shi, Yang

On 11/18/2015 12:56 AM, Zi Shen Lim wrote:

During code review, I noticed we were passing a bad buffer pointer
to bpf_load_pointer helper function called by jitted code.

Point to the buffer allocated by JIT, so we don't silently corrupt
other parts of the stack.

Signed-off-by: Zi Shen Lim 
---
  arch/arm64/net/bpf_jit_comp.c | 27 +--
  1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index d6a53ef..7cf032b 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -139,6 +139,12 @@ static inline int epilogue_offset(const struct jit_ctx 
*ctx)
  /* Stack must be multiples of 16B */
  #define STACK_ALIGN(sz) (((sz) + 15) & ~15)

+#define _STACK_SIZE \
+	(MAX_BPF_STACK \
+	 + 4 /* extra for skb_copy_bits buffer */)
+
+#define STACK_SIZE STACK_ALIGN(_STACK_SIZE)
+
  static void build_prologue(struct jit_ctx *ctx)
  {
const u8 r6 = bpf2a64[BPF_REG_6];
@@ -150,10 +156,6 @@ static void build_prologue(struct jit_ctx *ctx)
const u8 rx = bpf2a64[BPF_REG_X];
const u8 tmp1 = bpf2a64[TMP_REG_1];
const u8 tmp2 = bpf2a64[TMP_REG_2];
-   int stack_size = MAX_BPF_STACK;
-
-   stack_size += 4; /* extra for skb_copy_bits buffer */
-   stack_size = STACK_ALIGN(stack_size);

/*
 * BPF prog stack layout
@@ -165,12 +167,13 @@ static void build_prologue(struct jit_ctx *ctx)
 *                        | ... | callee saved registers
 *                        +-----+
 *                        |     | x25/x26
-* BPF fp register => -80:+-----+
+* BPF fp register => -80:+-----+ <= (BPF_FP)
 *                        |     |
 *                        | ... | BPF prog stack
 *                        |     |
-*                        |     |
-* current A64_SP =>      +-----+
+*                        +-----+ <= (BPF_FP - MAX_BPF_STACK)
+*                        |RSVD | JIT scratchpad
+* current A64_SP =>      +-----+ <= (BPF_FP - STACK_SIZE)
 *                        |     |
 *                        | ... | Function call stack
 *                        |     |
@@ -196,7 +199,7 @@ static void build_prologue(struct jit_ctx *ctx)
emit(A64_MOV(1, fp, A64_SP), ctx);

/* Set up function call stack */
-   emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
+   emit(A64_SUB_I(1, A64_SP, A64_SP, STACK_SIZE), ctx);

/* Clear registers A and X */
emit_a64_mov_i64(ra, 0, ctx);
@@ -213,13 +216,9 @@ static void build_epilogue(struct jit_ctx *ctx)
const u8 fp = bpf2a64[BPF_REG_FP];
const u8 tmp1 = bpf2a64[TMP_REG_1];
const u8 tmp2 = bpf2a64[TMP_REG_2];
-   int stack_size = MAX_BPF_STACK;
-
-   stack_size += 4; /* extra for skb_copy_bits buffer */
-   stack_size = STACK_ALIGN(stack_size);

/* We're done with BPF stack */
-   emit(A64_ADD_I(1, A64_SP, A64_SP, stack_size), ctx);
+   emit(A64_ADD_I(1, A64_SP, A64_SP, STACK_SIZE), ctx);

/* Restore fs (x25) and x26 */
emit(A64_POP(fp, A64_R(26), A64_SP), ctx);
@@ -658,7 +657,7 @@ emit_cond_jmp:
return -EINVAL;
}
emit_a64_mov_i64(r3, size, ctx);
-   emit(A64_ADD_I(1, r4, fp, MAX_BPF_STACK), ctx);
+   emit(A64_SUB_I(1, r4, fp, STACK_SIZE), ctx);


Shouldn't it sub MAX_BPF_STACK?

If you sub STACK_SIZE here, the buffer pointer will point to the bottom of
the reserved area.


Your stack layout change also shows this:

+*                        +-----+ <= (BPF_FP - MAX_BPF_STACK)
+*                        |RSVD | JIT scratchpad
+* current A64_SP =>      +-----+ <= (BPF_FP - STACK_SIZE)

Thanks,
Yang



emit_a64_mov_i64(r5, (unsigned long)bpf_load_pointer, ctx);
emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
emit(A64_MOV(1, A64_FP, A64_SP), ctx);





Re: [PATCH V4 2/2] arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS

2015-11-16 Thread Shi, Yang

On 11/16/2015 8:41 PM, Alexei Starovoitov wrote:

On Mon, Nov 16, 2015 at 08:37:11PM -0800, Z Lim wrote:

On Mon, Nov 16, 2015 at 2:35 PM, Yang Shi  wrote:

Save and restore FP/LR in BPF prog prologue and epilogue, save SP to FP
in prologue in order to get the correct stack backtrace.

...

CC: Zi Shen Lim 
CC: Xi Wang 
Signed-off-by: Yang Shi 


Acked-by: Zi Shen Lim 


great that it's finalized.
Yang, please resubmit both patches to netdev as a fresh submission so they
get picked up by patchwork.


OK, btw the first one has been applied by David.

Yang







Re: [PATCH V3 2/2] arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS

2015-11-16 Thread Shi, Yang

On 11/13/2015 6:39 PM, Z Lim wrote:

Yang, I noticed another thing...

On Fri, Nov 13, 2015 at 10:09 AM, Yang Shi  wrote:

Save and restore FP/LR in BPF prog prologue and epilogue, save SP to FP
in prologue in order to get the correct stack backtrace.

However, the ARM64 JIT used FP (x29) as the eBPF fp register; FP is subject
to change during function calls, so it may cause the BPF prog stack base
address to change too.

Use x25 to replace FP as the BPF stack base register (fp). Since x25 is a
callee-saved register, it will stay intact across function calls.


Can you please add save/restore for x25 also? :)


Sure. BTW, since PUSH uses the stp instruction and SP needs 16-byte
alignment, we have to save x26 together with x25. Anyway, it won't
introduce any harmful overhead, since one instruction saves two registers.
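
In emit() terms that pairing is a single stp/ldp pair, matching the
A64_POP(fp, A64_R(26), ...) quoted elsewhere in this archive:

	/* prologue: save fp (x25) and x26 as one 16-byte stp pair */
	emit(A64_PUSH(fp, A64_R(26), A64_SP), ctx);
	...
	/* epilogue: restore both with a single ldp */
	emit(A64_POP(fp, A64_R(26), A64_SP), ctx);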


Yang




It is initialized in BPF prog prologue when BPF prog is started to run
everytime. When BPF prog exits, it could be just tossed.





Re: [PATCH 2/2] arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS

2015-11-13 Thread Shi, Yang

On 11/12/2015 7:28 PM, Z Lim wrote:

On Thu, Nov 12, 2015 at 1:57 PM, Yang Shi  wrote:


Save and restore FP/LR in BPF prog prologue and epilogue, save SP to FP
in prologue in order to get the correct stack backtrace.

However, the ARM64 JIT used FP (x29) as the eBPF fp register; FP is subject
to change during function calls, so it may cause the BPF prog stack base
address to change too.

Use x25 to replace FP as the BPF stack base register (fp). Since x25 is a
callee-saved register, it will stay intact across function calls.
It is initialized in the BPF prog prologue every time the BPF prog starts
to run. When the BPF prog exits, it can just be tossed.

So, the BPF stack layout looks like:

                            high
    original A64_SP =>   0:+-----+ BPF prologue
                           |     | FP/LR and callee saved registers
    BPF fp register => -64:+-----+
                           |     |
                           | ... | BPF prog stack
                           |     |
                           |     |
    current A64_SP/FP =>   +-----+
                           |     |
                           | ... | Function call stack
                           |     |
                           +-----+
                             low



Yang, for stack unwinding to work, shouldn't it be something like the following?


Yes, thanks for catching this. v3 will be posted soon.

Yang



          | LR |
A64_FP => | FP |
          | .. |
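
That is, the prologue should establish a standard AArch64 frame record,
with FP pointing at the saved {FP, LR} pair, before carving out the BPF
stack; the sequence the later revisions use is:

	emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx); /* push {fp, lr} */
	emit(A64_MOV(1, A64_FP, A64_SP), ctx);       /* fp = sp: frame record */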





Re: [PATCH 1/2] arm64: bpf: add 'store immediate' instruction

2015-11-12 Thread Shi, Yang

On 11/11/2015 4:39 AM, Will Deacon wrote:

On Wed, Nov 11, 2015 at 12:12:56PM +, Will Deacon wrote:

On Tue, Nov 10, 2015 at 06:45:39PM -0800, Z Lim wrote:

On Tue, Nov 10, 2015 at 2:41 PM, Yang Shi  wrote:

aarch64 doesn't have native store immediate instruction, such operation


Actually, aarch64 does have "STR (immediate)". For arm64 JIT, we can
consider using it as an optimization.


Yes, I'd definitely like to see that in preference to moving via a
temporary register.


Wait a second, we're both talking rubbish here :) The STR (immediate)
form is referring to the addressing mode, whereas this patch wants to
store an immediate value to memory, which does need moving to a register
first.


Yes, the immediate means an immediate offset for the addressing mode; it
doesn't mean storing an immediate to memory.


I don't think any load-store architecture has a store-immediate instruction.

Thanks,
Yang



So the original patch is fine.

Will
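
To spell out the distinction (illustrative AArch64 output in the
comments; the C function is just a hypothetical vehicle):

	/* C source:           typical AArch64 output:
	 *   *p = 5;             mov  w8, #5      ; immediate -> register
	 *                       str  w8, [x0]    ; register  -> memory
	 *
	 * "STR (immediate)" names the [base, #imm] addressing mode, e.g.
	 * str w8, [x0, #8]; it is not a store of an immediate operand.
	 */
	void store_five(int *p) { *p = 5; }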





Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction

2015-11-10 Thread Shi, Yang

On 11/10/2015 4:08 PM, Eric Dumazet wrote:

On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:

aarch64 doesn't have native support for the XADD instruction; implement it
with the below instruction sequence:

Load (dst + off) to a register
Add src to it
Store it back to (dst + off)


Not really what is needed ?

See this BPF_XADD as an atomic_add() equivalent.


I see, thanks. The documentation doesn't say much about the "exclusive"
add. If so, it should need load-acquire/store-release.
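
In plain C terms, BPF_XADD must be an atomic read-modify-write of the
memory word, not a plain load/add/store; a sketch (on arm64 a compiler
lowers this to an ldxr/stxr exclusive loop):

	/* required semantics of BPF_STX | BPF_XADD: atomicity, not ordering */
	__atomic_fetch_add((u64 *)(dst + off), src, __ATOMIC_RELAXED);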


I will rework it.

Yang








Re: [PATCH] arm64: bpf: fix JIT stack setup

2015-11-10 Thread Shi, Yang

On 11/9/2015 12:00 PM, Z Lim wrote:

On Mon, Nov 9, 2015 at 10:08 AM, Shi, Yang  wrote:

I added it to stay aligned with the ARMv8 AAPCS and maintain the correct FP
during function calls. It gets us a correct stack backtrace.

I think we'd better keep compliant with the ARMv8 AAPCS in the BPF JIT
prologue too.

If nobody thinks it is necessary, we definitely could remove that change.


Oh no, I don't think anyone will say it's unnecessary!
I agree the A64_FP-related change is a good idea, so stack unwinding works.

How about splitting this into two patches? One for the BPF-related
bug, and another for A64 FP-handling.


I'm not sure if this is a good approach or not. IMHO, they are kind of 
atomic. Without A64 FP-handling, that fix looks incomplete and 
introduces another problem (stack backtrace).


Thanks,
Yang



Thanks again for tracking this down and improving things overall for arm64 :)



Thanks,
Yang






Re: [PATCH] arm64: bpf: fix JIT stack setup

2015-11-09 Thread Shi, Yang

On 11/8/2015 2:29 PM, Z Lim wrote:

On Sat, Nov 7, 2015 at 6:27 PM, Alexei Starovoitov
 wrote:

On Fri, Nov 06, 2015 at 09:36:17PM -0800, Yang Shi wrote:

The ARM64 JIT used FP (x29) as the eBPF fp register, but FP is subject to
change during function calls, so it may cause the BPF prog stack base
address to change too. Moreover, it pointed to the bottom of the BPF prog
stack instead of the top.

So, when copying data via bpf_probe_read, it will be copied to (SP - offset),
and may then overwrite the saved FP/LR.

Use x25 to replace FP as the BPF stack base register (fp). Since x25 is a
callee-saved register, it will stay intact across function calls.
It is initialized in the BPF prog prologue every time the BPF prog starts
to run. When the BPF prog exits, it can just be tossed.

Besides this, the BPF prog stack base needs to be set up before the
function call stack.

So, the BPF stack layout looks like:

                            high
    original A64_SP =>   0:+-----+ BPF prologue
                           |     | FP/LR and callee saved registers
    BPF fp register => +64:+-----+
                           |     |
                           | ... | BPF prog stack
                           |     |
                           |     |
    current A64_SP =>      +-----+
                           |     |
                           | ... | Function call stack
                           |     |
                           +-----+
                             low

Signed-off-by: Yang Shi 
CC: Zi Shen Lim 
CC: Xi Wang 


Thanks for tracking it down.
That looks like a fundamental bug in the arm64 JIT. I'm surprised
function calls worked at all.
Zi please review.



For function calls (BPF_JMP | BPF_CALL), we are compliant with AAPCS64
[1]. That part is okay.


bpf_probe_read accesses the BPF program stack, which is based on BPF_REG_FP.

This exposes an issue with how BPF_REG_FP was set up, as Yang pointed out.
Instead of having BPF_REG_FP point to the top of the stack, we erroneously
point it to the bottom of the stack. When there are function calls, we run
the risk of clobbering the BPF stack. Bad idea.


Yes, exactly.



Otherwise, since BPF_REG_FP is read-only, and is set up exactly once in
the prologue, it remains consistent throughout the lifetime of the BPF
program.


Yang, can you please try the following?


It should work without the below change:

+   emit(A64_MOV(1, A64_FP, A64_SP), ctx);

I added it to stay aligned with the ARMv8 AAPCS and maintain the correct
FP during function calls. It gets us a correct stack backtrace.


I think we'd better keep compliant with the ARMv8 AAPCS in the BPF JIT
prologue too.


If nobody thinks it is necessary, we definitely could remove that change.

Thanks,
Yang



8<-
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -161,12 +161,12 @@ static void build_prologue(struct jit_ctx *ctx)
 if (ctx->tmp_used)
 emit(A64_PUSH(tmp1, tmp2, A64_SP), ctx);

-   /* Set up BPF stack */
-   emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
-
 /* Set up frame pointer */
 emit(A64_MOV(1, fp, A64_SP), ctx);

+   /* Set up BPF stack */
+   emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
+
 /* Clear registers A and X */
 emit_a64_mov_i64(ra, 0, ctx);
 emit_a64_mov_i64(rx, 0, ctx);
->8

[1] 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf





Re: [PATCH] arm64: bpf: fix JIT stack setup

2015-11-06 Thread Shi, Yang

Please ignore this one; I forgot to CC the linux-arm-kernel list.

Sorry for the inconvenience.

Yang


On 11/6/2015 9:34 PM, Yang Shi wrote:

The ARM64 JIT used FP (x29) as the eBPF fp register, but FP is subject to
change during function calls, so it may cause the BPF prog stack base
address to change too. Moreover, it pointed to the bottom of the BPF prog
stack instead of the top.

So, when copying data via bpf_probe_read, it will be copied to (SP - offset),
and may then overwrite the saved FP/LR.

Use x25 to replace FP as the BPF stack base register (fp). Since x25 is a
callee-saved register, it will stay intact across function calls.
It is initialized in the BPF prog prologue every time the BPF prog starts
to run. When the BPF prog exits, it can just be tossed.

Besides this, the BPF prog stack base needs to be set up before the
function call stack.

So, the BPF stack layout looks like:

                            high
    original A64_SP =>   0:+-----+ BPF prologue
                           |     | FP/LR and callee saved registers
    BPF fp register => +64:+-----+
                           |     |
                           | ... | BPF prog stack
                           |     |
                           |     |
    current A64_SP =>      +-----+
                           |     |
                           | ... | Function call stack
                           |     |
                           +-----+
                             low

Signed-off-by: Yang Shi 
CC: Zi Shen Lim 
CC: Xi Wang 
---
  arch/arm64/net/bpf_jit_comp.c | 38 +++---
  1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index a44e529..6809647 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -50,7 +50,7 @@ static const int bpf2a64[] = {
[BPF_REG_8] = A64_R(21),
[BPF_REG_9] = A64_R(22),
/* read-only frame pointer to access stack */
-   [BPF_REG_FP] = A64_FP,
+   [BPF_REG_FP] = A64_R(25),
/* temporary register for internal BPF JIT */
[TMP_REG_1] = A64_R(23),
[TMP_REG_2] = A64_R(24),
@@ -155,18 +155,42 @@ static void build_prologue(struct jit_ctx *ctx)
stack_size += 4; /* extra for skb_copy_bits buffer */
stack_size = STACK_ALIGN(stack_size);

+   /*
+    * BPF prog stack layout
+    *
+    *                         high
+    * original A64_SP =>   0:+-----+ BPF prologue
+    *                        |     | FP/LR and callee saved registers
+    * BPF fp register => +64:+-----+
+    *                        |     |
+    *                        | ... | BPF prog stack
+    *                        |     |
+    *                        |     |
+    * current A64_SP =>      +-----+
+    *                        |     |
+    *                        | ... | Function call stack
+    *                        |     |
+    *                        +-----+
+    *                          low
+    *
+    */
+
+   /* Save FP and LR registers to stay aligned with ARM64 AAPCS */
+   emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
+
/* Save callee-saved register */
emit(A64_PUSH(r6, r7, A64_SP), ctx);
emit(A64_PUSH(r8, r9, A64_SP), ctx);
if (ctx->tmp_used)
emit(A64_PUSH(tmp1, tmp2, A64_SP), ctx);

-   /* Set up BPF stack */
-   emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
-
-   /* Set up frame pointer */
+   /* Set up BPF prog stack base register (x25) */
emit(A64_MOV(1, fp, A64_SP), ctx);

+   /* Set up function call stack */
+   emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
+   emit(A64_MOV(1, A64_FP, A64_SP), ctx);
+
/* Clear registers A and X */
emit_a64_mov_i64(ra, 0, ctx);
emit_a64_mov_i64(rx, 0, ctx);
@@ -196,8 +220,8 @@ static void build_epilogue(struct jit_ctx *ctx)
emit(A64_POP(r8, r9, A64_SP), ctx);
emit(A64_POP(r6, r7, A64_SP), ctx);

-   /* Restore frame pointer */
-   emit(A64_MOV(1, fp, A64_SP), ctx);
+   /* Restore FP/LR registers */
+   emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);

/* Set return value */
emit(A64_MOV(1, A64_R(0), r0), ctx);





Re: [PATCH] bpf: convert hashtab lock to raw lock

2015-11-02 Thread Shi, Yang

On 11/2/2015 9:24 AM, Steven Rostedt wrote:

On Mon, 02 Nov 2015 09:12:29 -0800
"Shi, Yang"  wrote:


Yes, it is common practice to convert a sleepable spin lock to a raw
spin lock in -rt to avoid the scheduling-in-atomic-context bug.


Note, in a lot of cases we don't just convert spin_locks to raw because
of atomic context. There are times we need to change the design so that
the lock is not taken in atomic context (switching preempt_disable() to a
local_lock(), for example).


Yes, definitely. Understood.

Thanks,
Yang



But bpf is much like ftrace and kprobes, where they can be taken almost
anywhere, and they do indeed need to be raw.

-- Steve





Re: [PATCH] bpf: convert hashtab lock to raw lock

2015-11-02 Thread Shi, Yang

On 10/31/2015 11:37 AM, Daniel Borkmann wrote:

On 10/31/2015 02:47 PM, Steven Rostedt wrote:

On Fri, 30 Oct 2015 17:03:58 -0700
Alexei Starovoitov  wrote:

On Fri, Oct 30, 2015 at 03:16:26PM -0700, Yang Shi wrote:

When running bpf samples on rt kernel, it reports the below warning:

BUG: sleeping function called from invalid context at
kernel/locking/rtmutex.c:917
in_atomic(): 1, irqs_disabled(): 128, pid: 477, name: ping
Preemption disabled at:[] kprobe_perf_func+0x30/0x228

...

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 83c209d..972b76b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -17,7 +17,7 @@
  struct bpf_htab {
  struct bpf_map map;
  struct hlist_head *buckets;
-spinlock_t lock;
+raw_spinlock_t lock;


How do we address such things in general?
I bet there are tons of places around the kernel that
call spin_lock from atomic.
I'd hate to lose the benefits of lockdep of non-raw spin_lock
just to make rt happy.


You won't lose any benefits of lockdep. Lockdep still checks
raw_spin_lock(). The only difference between raw_spin_lock and
spin_lock is that in -rt spin_lock turns into an rt_mutex() and
raw_spin_lock stays a spin lock.


( Btw, Yang, would have been nice if your commit description would have
   already included such info, not only that you convert it, but also why
   it's okay to do so. )


I think Thomas's document will include all the information about rt spin 
lock/raw spin lock, etc.


Alexei & Daniel,

If you think such info is necessary, I definitely could add it into the 
commit log in v2.





The error is that in -rt, you called a mutex and not a spin lock while
atomic.


You are right, I think this happens due to the preempt_disable() in the
trace_call_bpf() handler. So, I think the patch seems okay. The dep_map
is btw union'ed in the struct spinlock case to the same offset of the
dep_map from raw_spinlock.

It's a bit inconvenient, though, when we add other library code as maps
in future, f.e. things like rhashtable as they would first need to be
converted to raw_spinlock_t as well, but judging from the git log, it
looks like common practice.


Yes, it is common practice to convert a sleepable spin lock to a raw
spin lock in -rt to avoid the scheduling-in-atomic-context bug.
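
The conversion itself is mechanical; sketched against the hunk above,
assuming the irqsave variants used elsewhere in hashtab.c:

	raw_spinlock_t lock;		/* was: spinlock_t lock; */

	raw_spin_lock_irqsave(&htab->lock, flags);
	/* ... hash bucket update ... */
	raw_spin_unlock_irqrestore(&htab->lock, flags);

On mainline both variants behave the same under lockdep; only on -rt does
spin_lock become a sleeping rt_mutex while raw_spin_lock stays a real
spinlock.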


Thanks,
Yang



Thanks,
Daniel




Re: [PATCH] bpf: convert hashtab lock to raw lock

2015-11-02 Thread Shi, Yang

On 11/2/2015 12:59 AM, Thomas Gleixner wrote:

On Sun, 1 Nov 2015, Alexei Starovoitov wrote:

On Sat, Oct 31, 2015 at 09:47:36AM -0400, Steven Rostedt wrote:

On Fri, 30 Oct 2015 17:03:58 -0700
Alexei Starovoitov  wrote:


On Fri, Oct 30, 2015 at 03:16:26PM -0700, Yang Shi wrote:

When running bpf samples on rt kernel, it reports the below warning:

BUG: sleeping function called from invalid context at 
kernel/locking/rtmutex.c:917
in_atomic(): 1, irqs_disabled(): 128, pid: 477, name: ping
Preemption disabled at:[] kprobe_perf_func+0x30/0x228

...

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 83c209d..972b76b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -17,7 +17,7 @@
  struct bpf_htab {
struct bpf_map map;
struct hlist_head *buckets;
-   spinlock_t lock;
+   raw_spinlock_t lock;


How do we address such things in general?
I bet there are tons of places around the kernel that
call spin_lock from atomic.
I'd hate to lose the benefits of lockdep of non-raw spin_lock
just to make rt happy.


You won't lose any benefits of lockdep. Lockdep still checks
raw_spin_lock(). The only difference between raw_spin_lock and
spin_lock is that in -rt spin_lock turns into an rt_mutex() and
raw_spin_lock stays a spin lock.


I see. The patch makes sense then.
Would be good to document this peculiarity of spin_lock.


I'm working on a document.


Thanks Steven and Thomas for your elaboration and comment.

Yang



Thanks,

tglx


