Re: [PATCH net-next] bpf: arm64: remove callee-save registers use for tmp registers
On 5/16/2016 4:45 PM, Z Lim wrote:

Hi Yang,

On Mon, May 16, 2016 at 4:09 PM, Yang Shi wrote:

In the current implementation of the ARM64 eBPF JIT, R23 and R24 are used as tmp registers, and they are callee-saved registers. This leads to a variable-size JIT prologue and epilogue. The latest blinding constant change prefers a constant-size prologue and epilogue.

AAPCS reserves R9 ~ R15 as temporary registers, which need not be saved/restored across function calls. So, replace R23 and R24 with R10 and R11, and remove the tmp_used flag.

CC: Zi Shen Lim
CC: Daniel Borkmann
Signed-off-by: Yang Shi
---

Couple of suggestions, but otherwise:

Acked-by: Zi Shen Lim

1. Update the diagram. I think it should now be:

-* BPF fp register => -80:+-+ <= (BPF_FP)
+* BPF fp register => -64:+-+ <= (BPF_FP)

Nice catch. I forgot the stack diagram.

2. Add a comment in the commit log along the lines of: this is an optimization saving 2 instructions per jited BPF program.

Sure, will address in V2.

Thanks,
Yang

Thanks :)
z

Apply on top of Daniel's blinding constant patchset.

 arch/arm64/net/bpf_jit_comp.c | 32
 1 file changed, 4 insertions(+), 28 deletions(-)
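As a rough illustration of why the switch removes the variable-size part: with callee-saved x23/x24 as scratch, the prologue conditionally pushed them depending on tmp_used; with AAPCS temporaries x10/x11, that conditional push disappears. The counts in this toy model are illustrative only, not the JIT's real instruction counts:

```c
/* Toy model of the arm64 JIT prologue size before/after the change.
 * With caller-saved scratch registers (x10/x11), no save of the tmp
 * registers is ever emitted, so the prologue size becomes constant.
 * The base count of 4 pushes is illustrative, not the real number.
 */
static int prologue_insns(int tmp_used, int caller_saved_tmps)
{
    int n = 4;                        /* fixed pushes: fp/lr, r6-r9, ... */
    if (!caller_saved_tmps && tmp_used)
        n += 1;                       /* extra A64_PUSH(tmp1, tmp2) */
    return n;
}
```

With caller-saved tmps the result no longer depends on tmp_used, which is exactly the constant-size property the blinding patchset wants; dropping the matching pop in the epilogue is what gives the "2 instructions saved" noted above.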
Re: [PATCH] arm64: bpf: jit JMP_JSET_{X,K}
On 5/12/2016 11:37 PM, Zi Shen Lim wrote:

Original implementation commit e54bcde3d69d ("arm64: eBPF JIT compiler") had the relevant code paths, but due to an oversight always failed jiting. As a result, we had been falling back to the BPF interpreter whenever a BPF program has JMP_JSET_{X,K} instructions.

With this fix, we confirm that the corresponding tests in lib/test_bpf continue to pass, and are also jited.

...
[2.784553] test_bpf: #30 JSET jited:1 188 192 197 PASS
[2.791373] test_bpf: #31 tcpdump port 22 jited:1 325 677 625 PASS
[2.808800] test_bpf: #32 tcpdump complex jited:1 323 731 991 PASS
...
[3.190759] test_bpf: #237 JMP_JSET_K: if (0x3 & 0x2) return 1 jited:1 110 PASS
[3.192524] test_bpf: #238 JMP_JSET_K: if (0x3 & 0x) return 1 jited:1 98 PASS
[3.211014] test_bpf: #249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:1 120 PASS
[3.212973] test_bpf: #250 JMP_JSET_X: if (0x3 & 0x) return 1 jited:1 89 PASS
...

Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler")
Signed-off-by: Zi Shen Lim

Acked-by: Yang Shi

Thanks,
Yang

---
 arch/arm64/net/bpf_jit_comp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 031ed08..d0d5190 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -478,6 +478,7 @@ emit_cond_jmp:
 		case BPF_JGE:
 			jmp_cond = A64_COND_CS;
 			break;
+		case BPF_JSET:
 		case BPF_JNE:
 			jmp_cond = A64_COND_NE;
 			break;
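For context, BPF_JSET is a "test" jump: the branch is taken when (dst & src) is non-zero. The arm64 JIT emits a TST (a flag-setting AND) before the conditional branch, so the branch condition is the same NE as for BPF_JNE — which is why a one-line fallthrough is the whole fix. A minimal model of the semantics:

```c
#include <stdint.h>
#include <stdbool.h>

/* Semantics of eBPF JMP_JSET: branch taken iff (dst & imm) != 0.
 * The arm64 JIT lowers this as "tst dst, src" (an ANDS that only sets
 * flags) followed by b.ne, hence it shares A64_COND_NE with BPF_JNE.
 */
static bool jset_taken(uint64_t dst, uint64_t imm)
{
    return (dst & imm) != 0;
}
```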
Re: Warning triggered by lockdep checks for sock_owned_by_user on linux-next-20160420
On 4/22/2016 9:50 PM, Eric Dumazet wrote:

On Fri, 2016-04-22 at 21:02 -0700, Shi, Yang wrote:

Hi David,

When I ran some tests on an NFS-mounted rootfs, I got the below warning with LOCKDEP enabled on linux-next-20160420:

WARNING: CPU: 9 PID: 0 at include/net/sock.h:1408 udp_queue_rcv_skb+0x3d0/0x660
Modules linked in:
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G D 4.6.0-rc4-next-20160420-WR7.0.0.0_standard+ #6
Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009
 88066fd03a70 8155855f 88066fd03ab0 81062803
 058061318ec8 88065d1e39c0 880661318e40 880661318ec8
Call Trace:
 [] dump_stack+0x67/0x98
 [] __warn+0xd3/0xf0
 [] warn_slowpath_null+0x1d/0x20
 [] udp_queue_rcv_skb+0x3d0/0x660
 [] __udp4_lib_rcv+0x4dc/0xc00
 [] udp_rcv+0x1a/0x20
 [] ip_local_deliver_finish+0xd1/0x2e0
 [] ? ip_local_deliver_finish+0x3f/0x2e0
 [] ip_local_deliver+0xc2/0xd0
 [] ip_rcv_finish+0x1e2/0x5a0
 [] ip_rcv+0x2dc/0x410
 [] ? __pskb_pull_tail+0x82/0x400
 [] __netif_receive_skb_core+0x3a8/0xa80
 [] ? netif_receive_skb_internal+0x1b/0xf0
 [] __netif_receive_skb+0x1d/0x60
 [] netif_receive_skb_internal+0x55/0xf0
 [] ? netif_receive_skb_internal+0x1b/0xf0
 [] napi_gro_receive+0xc2/0x180
 [] igb_poll+0x5ea/0xdf0
 [] net_rx_action+0x15c/0x3d0
 [] __do_softirq+0x161/0x413
 [] irq_exit+0xd1/0x110
 [] do_IRQ+0x62/0xf0
 [] common_interrupt+0x8e/0x8e
 [] ? cpuidle_enter_state+0xc6/0x290
 [] cpuidle_enter+0x17/0x20
 [] call_cpuidle+0x33/0x50
 [] cpu_startup_entry+0x229/0x3b0
 [] start_secondary+0x144/0x150
---[ end trace ba508c424f0d52bf ]---

The warning is triggered by commit fafc4e1ea1a4c1eb13a30c9426fb799f5efacbc3 ("sock: tigthen lockdep checks for sock_owned_by_user"), which checks whether slock is held before taking "owned". That looks fine for lock_sock, which just calls lock_sock_nested. But bh_lock_sock is different: it just calls spin_lock, so it doesn't touch dep_map, and the check fails even though the lock is held.
spin_lock() definitely is lockdep friendly.

Yes, this is what I thought too. But I didn't figure out why the warning was still reported even though spin_lock is called.

So, I'm wondering what the right fix for it should be:

1. Replace bh_lock_sock with bh_lock_sock_nested in the protocol implementations, but there are a lot of places calling it.

2. Just like lock_sock, call bh_lock_sock_nested instead of spin_lock.

Or are both approaches wrong or not ideal?

I sent a patch yesterday, I am not sure what the status is.

Thanks for the patch. I just found your original patch and the discussion with Valdis. I think I ran into the same problem. A kernel BUG is triggered before the warning, but the "lockdep is off" information is not printed out, although it really is off.

Just tried your patch, it works for me.

Thanks,
Yang

diff --git a/include/net/sock.h b/include/net/sock.h
index d997ec13a643..db8301c76d50 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
 {
 	struct sock *sk = (struct sock *)csk;
 
-	return lockdep_is_held(&sk->sk_lock) ||
+	return !debug_locks ||
+	       lockdep_is_held(&sk->sk_lock) ||
 	       lockdep_is_held(&sk->sk_lock.slock);
 }
 #endif
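The shape of the fix can be modelled in isolation: once lockdep disables itself after an earlier BUG (debug_locks == 0), its dep_map state is stale, so the held-lock check must short-circuit to true rather than warn. The helper below is a userspace stand-in; held_owned and held_slock are hypothetical substitutes for the two lockdep_is_held() queries:

```c
#include <stdbool.h>

/* Model of the fixed lockdep_sock_is_held() logic: trust the check only
 * while lockdep is still live; otherwise report "held" unconditionally
 * so no spurious warning fires.
 */
static bool sock_is_held(bool debug_locks, bool held_owned, bool held_slock)
{
    return !debug_locks || held_owned || held_slock;
}
```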
Warning triggered by lockdep checks for sock_owned_by_user on linux-next-20160420
Hi David,

When I ran some tests on an NFS-mounted rootfs, I got the below warning with LOCKDEP enabled on linux-next-20160420:

WARNING: CPU: 9 PID: 0 at include/net/sock.h:1408 udp_queue_rcv_skb+0x3d0/0x660
Modules linked in:
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G D 4.6.0-rc4-next-20160420-WR7.0.0.0_standard+ #6
Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009
 88066fd03a70 8155855f 88066fd03ab0 81062803
 058061318ec8 88065d1e39c0 880661318e40 880661318ec8
Call Trace:
 [] dump_stack+0x67/0x98
 [] __warn+0xd3/0xf0
 [] warn_slowpath_null+0x1d/0x20
 [] udp_queue_rcv_skb+0x3d0/0x660
 [] __udp4_lib_rcv+0x4dc/0xc00
 [] udp_rcv+0x1a/0x20
 [] ip_local_deliver_finish+0xd1/0x2e0
 [] ? ip_local_deliver_finish+0x3f/0x2e0
 [] ip_local_deliver+0xc2/0xd0
 [] ip_rcv_finish+0x1e2/0x5a0
 [] ip_rcv+0x2dc/0x410
 [] ? __pskb_pull_tail+0x82/0x400
 [] __netif_receive_skb_core+0x3a8/0xa80
 [] ? netif_receive_skb_internal+0x1b/0xf0
 [] __netif_receive_skb+0x1d/0x60
 [] netif_receive_skb_internal+0x55/0xf0
 [] ? netif_receive_skb_internal+0x1b/0xf0
 [] napi_gro_receive+0xc2/0x180
 [] igb_poll+0x5ea/0xdf0
 [] net_rx_action+0x15c/0x3d0
 [] __do_softirq+0x161/0x413
 [] irq_exit+0xd1/0x110
 [] do_IRQ+0x62/0xf0
 [] common_interrupt+0x8e/0x8e
 [] ? cpuidle_enter_state+0xc6/0x290
 [] cpuidle_enter+0x17/0x20
 [] call_cpuidle+0x33/0x50
 [] cpu_startup_entry+0x229/0x3b0
 [] start_secondary+0x144/0x150
---[ end trace ba508c424f0d52bf ]---

The warning is triggered by commit fafc4e1ea1a4c1eb13a30c9426fb799f5efacbc3 ("sock: tigthen lockdep checks for sock_owned_by_user"), which checks whether slock is held before taking "owned". That looks fine for lock_sock, which just calls lock_sock_nested. But bh_lock_sock is different: it just calls spin_lock, so it doesn't touch dep_map, and the check fails even though the lock is held.

So, I'm wondering what the right fix for it should be:

1.
Replace bh_lock_sock with bh_lock_sock_nested in the protocol implementations, but there are a lot of places calling it.

2. Just like lock_sock, call bh_lock_sock_nested instead of spin_lock.

Or are both approaches wrong or not ideal?

Thanks,
Yang
Re: [PATCH net-next 2/5] bpf: move clearing of A/X into classic to eBPF migration prologue
On 12/17/2015 2:51 PM, Daniel Borkmann wrote:

Back in the days where eBPF (or back then "internal BPF" ;->) was not exposed to user space, and only the classic BPF programs internally translated into eBPF programs, we missed the fact that for classic BPF A and X needed to be cleared. It was fixed back then via 83d5b7ef99c9 ("net: filter: initialize A and X registers"), and thus classic BPF specifics were added to the eBPF interpreter core to work around it.

This added some confusion for JIT developers later on that take the eBPF interpreter code as an example for deriving their JIT. F.e. in f75298f5c3fe ("s390/bpf: clear correct BPF accumulator register"), at least X could leak stack memory. Furthermore, since this is only needed for classic BPF translations and not for eBPF (verifier takes care that read access to regs cannot be done uninitialized), more complexity is added to JITs as they need to determine whether they deal with migrations or native eBPF where they can just omit clearing A/X in their prologue and thus reduce image size a bit, see f.e. cde66c2d88da ("s390/bpf: Only clear A and X for converted BPF programs"). In other cases (x86, arm64), A and X is being cleared in the prologue also for eBPF case, which is unnecessary.

Lets move this into the BPF migration in bpf_convert_filter() where it actually belongs as long as the number of eBPF JITs are still few. It can thus be done generically; allowing us to remove the quirk from __bpf_prog_run() and to slightly reduce JIT image size in case of eBPF, while reducing code duplication on this matter in current(/future) eBPF JITs.

Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Reviewed-by: Michael Holzheu
Tested-by: Michael Holzheu
Cc: Zi Shen Lim
Cc: Yang Shi

Acked by me on the arm64 part.
Acked-by: Yang Shi

Thanks,
Yang

---
 arch/arm64/net/bpf_jit_comp.c | 6 --
 arch/s390/net/bpf_jit_comp.c | 13 ++---
 arch/x86/net/bpf_jit_comp.c | 14 +-
 kernel/bpf/core.c | 4
 net/core/filter.c | 19 ---
 5 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index b162ad7..7658612 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -152,8 +152,6 @@ static void build_prologue(struct jit_ctx *ctx)
 	const u8 r8 = bpf2a64[BPF_REG_8];
 	const u8 r9 = bpf2a64[BPF_REG_9];
 	const u8 fp = bpf2a64[BPF_REG_FP];
-	const u8 ra = bpf2a64[BPF_REG_A];
-	const u8 rx = bpf2a64[BPF_REG_X];
 	const u8 tmp1 = bpf2a64[TMP_REG_1];
 	const u8 tmp2 = bpf2a64[TMP_REG_2];
@@ -200,10 +198,6 @@ static void build_prologue(struct jit_ctx *ctx)
 	/* Set up function call stack */
 	emit(A64_SUB_I(1, A64_SP, A64_SP, STACK_SIZE), ctx);
-
-	/* Clear registers A and X */
-	emit_a64_mov_i64(ra, 0, ctx);
-	emit_a64_mov_i64(rx, 0, ctx);
 }
 
 static void build_epilogue(struct jit_ctx *ctx)
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 9a0c4c2..3c0bfc1 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -408,7 +408,7 @@ static void emit_load_skb_data_hlen(struct bpf_jit *jit)
  * Save registers and create stack frame if necessary.
  * See stack frame layout desription in "bpf_jit.h"!
  */
-static void bpf_jit_prologue(struct bpf_jit *jit, bool is_classic)
+static void bpf_jit_prologue(struct bpf_jit *jit)
 {
 	if (jit->seen & SEEN_TAIL_CALL) {
 		/* xc STK_OFF_TCCNT(4,%r15),STK_OFF_TCCNT(%r15) */
@@ -448,15 +448,6 @@ static void bpf_jit_prologue(struct bpf_jit *jit, bool is_classic)
 		/* stg %b1,ST_OFF_SKBP(%r0,%r15) */
 		EMIT6_DISP_LH(0xe300, 0x0024, REG_W1, REG_0, REG_15, STK_OFF_SKBP);
-	/* Clear A (%b0) and X (%b7) registers for converted BPF programs */
-	if (is_classic) {
-		if (REG_SEEN(BPF_REG_A))
-			/* lghi %ba,0 */
-			EMIT4_IMM(0xa709, BPF_REG_A, 0);
-		if (REG_SEEN(BPF_REG_X))
-			/* lghi %bx,0 */
-			EMIT4_IMM(0xa709, BPF_REG_X, 0);
-	}
 }
 
 /*
@@ -1245,7 +1236,7 @@ static int bpf_jit_prog(struct bpf_jit *jit, struct bpf_prog *fp)
 	jit->lit = jit->lit_start;
 	jit->prg = 0;
-	bpf_jit_prologue(jit, bpf_prog_was_classic(fp));
+	bpf_jit_prologue(jit);
 	for (i = 0; i < fp->len; i += insn_count) {
 		insn_count = bpf_jit_insn(jit, fp, i);
 		if (insn_count < 0)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 7599197..c080e81 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -193,7 +193,7 @@ struct jit_context {
 	32 /* space for rbx, r13, r14, r15 */ + \
 	8 /* space for skb_copy_bits() buffer */)
 
-#define
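What the migration-prologue change boils down to can be sketched with a simplified stand-in for the kernel's bpf_insn encoding (the struct layout, register numbers 0/7 for A/X, and the 0xb7 opcode are hand-written here; the kernel spells them BPF_ALU64_IMM(BPF_MOV, ...)):

```c
#include <stdint.h>

/* Sketch: when bpf_convert_filter() translates a classic filter, it now
 * prepends two mov-immediate instructions clearing A and X, so neither
 * the interpreter nor any JIT needs a per-arch clearing quirk.
 */
struct insn { uint8_t code; uint8_t dst; int32_t imm; };
enum { REG_A = 0, REG_X = 7, MOV64_IMM = 0xb7 /* BPF_ALU64|BPF_MOV|BPF_K */ };

static int emit_clear_ax(struct insn *prog)
{
    prog[0] = (struct insn){ MOV64_IMM, REG_A, 0 };  /* A = 0 */
    prog[1] = (struct insn){ MOV64_IMM, REG_X, 0 };  /* X = 0 */
    return 2;  /* prologue instructions added to the converted program */
}
```

Native eBPF programs never get these two instructions, which is where the per-program image-size saving for JITs comes from.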
Re: [RESEND PATCH] arm64: bpf: add 'store immediate' instruction
On 11/30/2015 2:24 PM, Yang Shi wrote:

aarch64 doesn't have a native store-immediate instruction; such an operation has to be implemented by the below instruction sequence:

  Load immediate to register
  Store register

Signed-off-by: Yang Shi
CC: Zi Shen Lim

Had an email exchange offline with Zi Shen Lim since he is traveling and cannot send text-only mail; quoted below is his reply:

"I've given reviewed-by in response to original posting. Unless something has changed, feel free to add it."

Since nothing has changed, added his reviewed-by.

Reviewed-by: Zi Shen Lim

Thanks,
Yang

CC: Xi Wang
---

This patch might be buried by the storm of xadd discussion; however, it is absolutely irrelevant to xadd, so resend the patch itself.

 arch/arm64/net/bpf_jit_comp.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 6809647..49c1f1b 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -563,7 +563,25 @@ emit_cond_jmp:
 	case BPF_ST | BPF_MEM | BPF_H:
 	case BPF_ST | BPF_MEM | BPF_B:
 	case BPF_ST | BPF_MEM | BPF_DW:
-		goto notyet;
+		/* Load imm to a register then store it */
+		ctx->tmp_used = 1;
+		emit_a64_mov_i(1, tmp2, off, ctx);
+		emit_a64_mov_i(1, tmp, imm, ctx);
+		switch (BPF_SIZE(code)) {
+		case BPF_W:
+			emit(A64_STR32(tmp, dst, tmp2), ctx);
+			break;
+		case BPF_H:
+			emit(A64_STRH(tmp, dst, tmp2), ctx);
+			break;
+		case BPF_B:
+			emit(A64_STRB(tmp, dst, tmp2), ctx);
+			break;
+		case BPF_DW:
+			emit(A64_STR64(tmp, dst, tmp2), ctx);
+			break;
+		}
+		break;

 	/* STX: *(size *)(dst + off) = src */
 	case BPF_STX | BPF_MEM | BPF_W:

--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
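The lowering can be pictured in C, with the emitted arm64 sequence in the comment (the scratch register numbers are whatever tmp/tmp2 map to; x10/x11 are used here purely for illustration):

```c
#include <stdint.h>

/* What the JIT emits for eBPF "ST *(u32 *)(dst + off) = imm" on arm64,
 * modelled in C.  There is no store-immediate form, so the value (and
 * the offset) goes through scratch registers:
 *
 *     mov  w10, #imm          ; emit_a64_mov_i(tmp, imm)
 *     mov  w11, #off          ; emit_a64_mov_i(tmp2, off)
 *     str  w10, [dst, x11]    ; A64_STR32(tmp, dst, tmp2)
 */
static void st_mem_w(uint8_t *dst, int32_t off, uint32_t imm)
{
    uint32_t tmp = imm;              /* load immediate into a register */
    *(uint32_t *)(dst + off) = tmp;  /* register store */
}
```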
Re: [PATCH 1/2] arm64: bpf: add 'store immediate' instruction
Hi folks,

Any more comments on this patch (store immediate only)?

I need more time to add XADD (I suppose everyone agrees it is equivalent to atomic_add). However, this one is irrelevant to XADD, so we may be able to apply it first?

Thanks,
Yang

On 11/12/2015 7:45 PM, Z Lim wrote:

On Thu, Nov 12, 2015 at 11:33 AM, Shi, Yang wrote:

On 11/11/2015 4:39 AM, Will Deacon wrote:

Wait a second, we're both talking rubbish here :) The STR (immediate) form is referring to the addressing mode, whereas this patch wants to store an immediate value to memory, which does need moving to a register first.

Yes, the immediate means an immediate offset for the addressing index. It doesn't mean storing an immediate to memory. I don't think any load-store architecture has a store-immediate instruction.

Indeed. Sorry for the noise. Somehow Will caught a whiff of whatever I was smoking then :)
Re: [PATCH] arm64: bpf: fix buffer pointer
On 11/18/2015 1:41 PM, Z Lim wrote:

On Wed, Nov 18, 2015 at 1:07 PM, Shi, Yang wrote:

On 11/18/2015 12:56 AM, Zi Shen Lim wrote:

 	emit_a64_mov_i64(r3, size, ctx);
-	emit(A64_ADD_I(1, r4, fp, MAX_BPF_STACK), ctx);
+	emit(A64_SUB_I(1, r4, fp, STACK_SIZE), ctx);

Shouldn't it sub MAX_BPF_STACK?

No. If it's at (BPF_FP - MAX_BPF_STACK), we'll be writing into the BPF stack area, which should only be used by the BPF program.

If you sub STACK_SIZE here, the buffer pointer will point to the bottom of the reserved area.

Yes, that's the idea. The buffer is allocated in here. Right now we're using this "reserved" space for this buffer only.

OK, I see. The buffer grows from low to high. Thanks for the elaboration.

Acked-by: Yang Shi

Yang

Your stack layout change also shows this:

+*+-+ <= (BPF_FP - MAX_BPF_STACK)
+*|RSVD | JIT scratchpad
+* current A64_SP => +-+ <= (BPF_FP - STACK_SIZE)

Yes, this diagram reflects the code and intention. Thanks for reviewing, we definitely need more of these :)
Re: [PATCH] arm64: bpf: fix buffer pointer
On 11/18/2015 12:56 AM, Zi Shen Lim wrote:

During code review, I noticed we were passing a bad buffer pointer to the bpf_load_pointer helper function called by jitted code. Point to the buffer allocated by the JIT, so we don't silently corrupt other parts of the stack.

Signed-off-by: Zi Shen Lim
---
 arch/arm64/net/bpf_jit_comp.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index d6a53ef..7cf032b 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -139,6 +139,12 @@ static inline int epilogue_offset(const struct jit_ctx *ctx)
 /* Stack must be multiples of 16B */
 #define STACK_ALIGN(sz) (((sz) + 15) & ~15)
 
+#define _STACK_SIZE \
+	(MAX_BPF_STACK \
+	+ 4 /* extra for skb_copy_bits buffer */)
+
+#define STACK_SIZE STACK_ALIGN(_STACK_SIZE)
+
 static void build_prologue(struct jit_ctx *ctx)
 {
 	const u8 r6 = bpf2a64[BPF_REG_6];
@@ -150,10 +156,6 @@ static void build_prologue(struct jit_ctx *ctx)
 	const u8 rx = bpf2a64[BPF_REG_X];
 	const u8 tmp1 = bpf2a64[TMP_REG_1];
 	const u8 tmp2 = bpf2a64[TMP_REG_2];
-	int stack_size = MAX_BPF_STACK;
-
-	stack_size += 4; /* extra for skb_copy_bits buffer */
-	stack_size = STACK_ALIGN(stack_size);
 
 	/*
 	 * BPF prog stack layout
@@ -165,12 +167,13 @@ static void build_prologue(struct jit_ctx *ctx)
 	*| ... | callee saved registers
 	*+-+
 	*| | x25/x26
-	* BPF fp register => -80:+-+
+	* BPF fp register => -80:+-+ <= (BPF_FP)
 	*| |
 	*| ... | BPF prog stack
 	*| |
-	*| |
-	* current A64_SP => +-+
+	*+-+ <= (BPF_FP - MAX_BPF_STACK)
+	*|RSVD | JIT scratchpad
+	* current A64_SP => +-+ <= (BPF_FP - STACK_SIZE)
 	*| |
 	*| ... | Function call stack
 	*| |
@@ -196,7 +199,7 @@ static void build_prologue(struct jit_ctx *ctx)
 	emit(A64_MOV(1, fp, A64_SP), ctx);
 
 	/* Set up function call stack */
-	emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
+	emit(A64_SUB_I(1, A64_SP, A64_SP, STACK_SIZE), ctx);
 
 	/* Clear registers A and X */
 	emit_a64_mov_i64(ra, 0, ctx);
@@ -213,13 +216,9 @@ static void build_epilogue(struct jit_ctx *ctx)
 	const u8 fp = bpf2a64[BPF_REG_FP];
 	const u8 tmp1 = bpf2a64[TMP_REG_1];
 	const u8 tmp2 = bpf2a64[TMP_REG_2];
-	int stack_size = MAX_BPF_STACK;
-
-	stack_size += 4; /* extra for skb_copy_bits buffer */
-	stack_size = STACK_ALIGN(stack_size);
 
 	/* We're done with BPF stack */
-	emit(A64_ADD_I(1, A64_SP, A64_SP, stack_size), ctx);
+	emit(A64_ADD_I(1, A64_SP, A64_SP, STACK_SIZE), ctx);
 
 	/* Restore fs (x25) and x26 */
 	emit(A64_POP(fp, A64_R(26), A64_SP), ctx);
@@ -658,7 +657,7 @@ emit_cond_jmp:
 		return -EINVAL;
 	}
 	emit_a64_mov_i64(r3, size, ctx);
-	emit(A64_ADD_I(1, r4, fp, MAX_BPF_STACK), ctx);
+	emit(A64_SUB_I(1, r4, fp, STACK_SIZE), ctx);

Shouldn't it sub MAX_BPF_STACK? If you sub STACK_SIZE here, the buffer pointer will point to the bottom of the reserved area.

Your stack layout change also shows this:

+*+-+ <= (BPF_FP - MAX_BPF_STACK)
+*|RSVD | JIT scratchpad
+* current A64_SP => +-+ <= (BPF_FP - STACK_SIZE)

Thanks,
Yang

 	emit_a64_mov_i64(r5, (unsigned long)bpf_load_pointer, ctx);
 	emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
 	emit(A64_MOV(1, A64_FP, A64_SP), ctx);
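The arithmetic behind the reserved area is easy to check in isolation (MAX_BPF_STACK is 512 in this era of the kernel; the macros mirror the patch above):

```c
/* The JIT's reserved scratch area, modelled as plain C.  STACK_ALIGN
 * rounds up to 16 bytes as AArch64 requires; the 4 extra bytes are the
 * skb_copy_bits() bounce buffer sitting below the BPF stack, at
 * (BPF_FP - STACK_SIZE).
 */
#define MAX_BPF_STACK 512
#define STACK_ALIGN(sz) (((sz) + 15) & ~15)
#define _STACK_SIZE (MAX_BPF_STACK + 4 /* skb_copy_bits buffer */)
#define STACK_SIZE STACK_ALIGN(_STACK_SIZE)
```

So the bounce buffer at (fp - STACK_SIZE) sits 16 aligned bytes below the BPF stack proper, which is exactly the "RSVD | JIT scratchpad" band in the diagram.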
Re: [PATCH V4 2/2] arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS
On 11/16/2015 8:41 PM, Alexei Starovoitov wrote:

On Mon, Nov 16, 2015 at 08:37:11PM -0800, Z Lim wrote:

On Mon, Nov 16, 2015 at 2:35 PM, Yang Shi wrote:

Save and restore FP/LR in BPF prog prologue and epilogue, and save SP to FP in the prologue in order to get the correct stack backtrace.

...

CC: Zi Shen Lim
CC: Xi Wang
Signed-off-by: Yang Shi

Acked-by: Zi Shen Lim

Great that it's finalized. Yang, please resubmit both patches to netdev as a fresh submission so it gets picked up by patchwork.

OK, btw the first one has been applied by David.

Yang
Re: [PATCH V3 2/2] arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS
On 11/13/2015 6:39 PM, Z Lim wrote:

Yang, I noticed another thing...

On Fri, Nov 13, 2015 at 10:09 AM, Yang Shi wrote:

Save and restore FP/LR in BPF prog prologue and epilogue, and save SP to FP in the prologue in order to get the correct stack backtrace. However, the ARM64 JIT used FP (x29) as the eBPF fp register. FP is subject to change during function calls, so it may cause the BPF prog stack base address to change too. Use x25 to replace FP as the BPF stack base register (fp). Since x25 is a callee-saved register, it will stay intact across function calls.

Can you please add save/restore for x25 also? :)

Sure. BTW, since PUSH uses the stp instruction and SP needs 16-byte alignment, we have to save x26 together with x25. Anyway, it won't introduce any harmful overhead, since one instruction saves two registers.

Yang

It is initialized in the BPF prog prologue every time the BPF prog starts to run. When the BPF prog exits, it can just be tossed.
Re: [PATCH 2/2] arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS
On 11/12/2015 7:28 PM, Z Lim wrote:

On Thu, Nov 12, 2015 at 1:57 PM, Yang Shi wrote:

Save and restore FP/LR in BPF prog prologue and epilogue, and save SP to FP in the prologue in order to get the correct stack backtrace. However, the ARM64 JIT used FP (x29) as the eBPF fp register. FP is subject to change during function calls, so it may cause the BPF prog stack base address to change too. Use x25 to replace FP as the BPF stack base register (fp). Since x25 is a callee-saved register, it will stay intact across function calls. It is initialized in the BPF prog prologue every time the BPF prog starts to run. When the BPF prog exits, it can just be tossed.

So, the BPF stack layout looks like:

high
original A64_SP => 0:+-+ BPF prologue
| | FP/LR and callee saved registers
BPF fp register => -64:+-+
| |
| ... | BPF prog stack
| |
| |
current A64_SP/FP => +-+
| |
| ... | Function call stack
| |
+-+
low

Yang, for stack unwinding to work, shouldn't it be something like the following?

| LR |
A64_FP => | FP |
| .. |

Yes, thanks for catching this. V3 will be posted soon.

Yang
Re: [PATCH 1/2] arm64: bpf: add 'store immediate' instruction
On 11/11/2015 4:39 AM, Will Deacon wrote:

On Wed, Nov 11, 2015 at 12:12:56PM +, Will Deacon wrote:

On Tue, Nov 10, 2015 at 06:45:39PM -0800, Z Lim wrote:

On Tue, Nov 10, 2015 at 2:41 PM, Yang Shi wrote:

aarch64 doesn't have native store immediate instruction, such operation

Actually, aarch64 does have "STR (immediate)". For the arm64 JIT, we can consider using it as an optimization.

Yes, I'd definitely like to see that in preference to moving via a temporary register.

Wait a second, we're both talking rubbish here :) The STR (immediate) form is referring to the addressing mode, whereas this patch wants to store an immediate value to memory, which does need moving to a register first.

Yes, the immediate means an immediate offset for the addressing index. It doesn't mean storing an immediate to memory. I don't think any load-store architecture has a store-immediate instruction.

Thanks,
Yang

So the original patch is fine.

Will
Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
On 11/10/2015 4:08 PM, Eric Dumazet wrote:

On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:

aarch64 doesn't have native support for the XADD instruction; implement it by the below instruction sequence:

  Load (dst + off) to a register
  Add src to it
  Store it back to (dst + off)

Not really what is needed? See this BPF_XADD as an atomic_add() equivalent.

I see. Thanks. The documentation doesn't say too much about the "exclusive" add. If so, it should need load-acquire/store-release. I will rework it.

Yang
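For reference, a correct arm64 lowering needs an exclusive load/store loop (ldxr/stxr) rather than a plain load-add-store, which can lose updates under concurrency. The C below models the required atomicity with a compiler builtin; the asm in the comment is a sketch of what a reworked JIT would emit, not the patch's actual output:

```c
#include <stdint.h>

/* BPF_XADD must behave like atomic_add().  A plain ldr/add/str sequence
 * is not atomic; an exclusive-monitor loop is:
 *
 * again:
 *     ldxr  x10, [dst]          ; load-exclusive
 *     add   x10, x10, src
 *     stxr  w11, x10, [dst]     ; store-exclusive, w11 = 0 on success
 *     cbnz  w11, again          ; retry if the monitor was lost
 */
static uint64_t xadd64(uint64_t *addr, uint64_t val)
{
    return __atomic_fetch_add(addr, val, __ATOMIC_RELAXED);
}
```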
Re: [PATCH] arm64: bpf: fix JIT stack setup
On 11/9/2015 12:00 PM, Z Lim wrote:

On Mon, Nov 9, 2015 at 10:08 AM, Shi, Yang wrote:

I added it to stay aligned with the ARMv8 AAPCS, to maintain the correct FP during function calls. It lets us get a correct stack backtrace. I think we'd better keep the BPF JIT prologue compliant with the ARMv8 AAPCS too. If nobody thinks it is necessary, we could definitely remove that change.

Oh no, I don't think anyone will say it's unnecessary! I agree the A64_FP-related change is a good idea, so stack unwinding works.

How about splitting this into two patches? One for the BPF-related bug, and another for A64 FP-handling.

I'm not sure if this is a good approach or not. IMHO, they are kind of atomic. Without the A64 FP-handling, the fix looks incomplete and introduces another problem (broken stack backtrace).

Thanks,
Yang

Thanks again for tracking this down and improving things overall for arm64 :)

Thanks,
Yang
Re: [PATCH] arm64: bpf: fix JIT stack setup
On 11/8/2015 2:29 PM, Z Lim wrote:

On Sat, Nov 7, 2015 at 6:27 PM, Alexei Starovoitov wrote:

On Fri, Nov 06, 2015 at 09:36:17PM -0800, Yang Shi wrote:

The ARM64 JIT used FP (x29) as the eBPF fp register, but FP is subject to change during function calls, so it may cause the BPF prog stack base address to change too. However, it pointed to the bottom of the BPF prog stack instead of the top. So, when copying data via bpf_probe_read, it will be copied to (SP - offset), then it may overwrite the saved FP/LR.

Use x25 to replace FP as the BPF stack base register (fp). Since x25 is a callee-saved register, it will stay intact across function calls. It is initialized in the BPF prog prologue every time the BPF prog starts to run. When the BPF prog exits, it can just be tossed. Other than this, the BPF prog stack base needs to be set up before the function call stack.

So, the BPF stack layout looks like:

high
original A64_SP => 0:+-+ BPF prologue
| | FP/LR and callee saved registers
BPF fp register => +64:+-+
| |
| ... | BPF prog stack
| |
| |
current A64_SP => +-+
| |
| ... | Function call stack
| |
+-+
low

Signed-off-by: Yang Shi
CC: Zi Shen Lim
CC: Xi Wang

Thanks for tracking it down. That looks like a fundamental bug in the arm64 jit. I'm surprised function calls worked at all. Zi, please review.

For function calls (BPF_JMP | BPF_CALL), we are compliant with AAPCS64 [1]. That part is okay.

bpf_probe_read accesses the BPF program stack, which is based on BPF_REG_FP. This exposes an issue with how BPF_REG_FP was set up, as Yang pointed out. Instead of having BPF_REG_FP point to the top of the stack, we erroneously point it to the bottom of the stack. When there are function calls, we run the risk of clobbering the BPF stack. Bad idea.

Yes, exactly.

Otherwise, since BPF_REG_FP is read-only, and is set up exactly once in the prologue, it remains consistent throughout the lifetime of the BPF program.

Yang, can you please try the following?

It should work without the below change:

+	emit(A64_MOV(1, A64_FP, A64_SP), ctx);

I added it to stay aligned with the ARMv8 AAPCS, to maintain the correct FP during function calls. It lets us get a correct stack backtrace. I think we'd better keep the BPF JIT prologue compliant with the ARMv8 AAPCS too. If nobody thinks it is necessary, we could definitely remove that change.

Thanks,
Yang

8<-
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -161,12 +161,12 @@ static void build_prologue(struct jit_ctx *ctx)
 	if (ctx->tmp_used)
 		emit(A64_PUSH(tmp1, tmp2, A64_SP), ctx);
 
-	/* Set up BPF stack */
-	emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
-
 	/* Set up frame pointer */
 	emit(A64_MOV(1, fp, A64_SP), ctx);
 
+	/* Set up BPF stack */
+	emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
+
 	/* Clear registers A and X */
 	emit_a64_mov_i64(ra, 0, ctx);
 	emit_a64_mov_i64(rx, 0, ctx);
->8

[1] http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
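The ordering in the inline patch above is the crux: BPF_REG_FP must capture SP before the BPF stack is carved out. Simulated with plain pointer arithmetic (the addresses are made up):

```c
#include <stdint.h>

/* Simulation of the prologue ordering fix: with the fixed order, fp ends
 * up at the top of the BPF stack; with the buggy order it ends up at the
 * bottom, so stores at (fp - offset) clobber memory below the BPF stack.
 */
static uint64_t fp_after_prologue(uint64_t sp, uint64_t stack_size, int fixed)
{
    uint64_t fp;

    if (fixed) {
        fp = sp;             /* mov fp, sp: fp = top of BPF stack */
        sp -= stack_size;    /* sub sp, sp, #stack_size */
    } else {
        sp -= stack_size;    /* buggy order: carve the stack first */
        fp = sp;             /* fp = bottom of BPF stack */
    }
    (void)sp;
    return fp;
}
```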
Re: [PATCH] arm64: bpf: fix JIT stack setup
Please ignore this one, I forgot to cc the linux-arm-kernel list. Sorry for the inconvenience.

Yang

On 11/6/2015 9:34 PM, Yang Shi wrote:

The ARM64 JIT used FP (x29) as the eBPF fp register, but FP is subject to change during function calls, so it may cause the BPF prog stack base address to change too. However, it pointed to the bottom of the BPF prog stack instead of the top. So, when copying data via bpf_probe_read, it will be copied to (SP - offset), then it may overwrite the saved FP/LR.

Use x25 to replace FP as the BPF stack base register (fp). Since x25 is a callee-saved register, it will stay intact across function calls. It is initialized in the BPF prog prologue every time the BPF prog starts to run. When the BPF prog exits, it can just be tossed. Other than this, the BPF prog stack base needs to be set up before the function call stack.

So, the BPF stack layout looks like:

high
original A64_SP => 0:+-+ BPF prologue
| | FP/LR and callee saved registers
BPF fp register => +64:+-+
| |
| ... | BPF prog stack
| |
| |
current A64_SP => +-+
| |
| ... | Function call stack
| |
+-+
low

Signed-off-by: Yang Shi
CC: Zi Shen Lim
CC: Xi Wang
---
 arch/arm64/net/bpf_jit_comp.c | 38 +++---
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index a44e529..6809647 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -50,7 +50,7 @@ static const int bpf2a64[] = {
 	[BPF_REG_8] = A64_R(21),
 	[BPF_REG_9] = A64_R(22),
 	/* read-only frame pointer to access stack */
-	[BPF_REG_FP] = A64_FP,
+	[BPF_REG_FP] = A64_R(25),
 	/* temporary register for internal BPF JIT */
 	[TMP_REG_1] = A64_R(23),
 	[TMP_REG_2] = A64_R(24),
@@ -155,18 +155,42 @@ static void build_prologue(struct jit_ctx *ctx)
 	stack_size += 4; /* extra for skb_copy_bits buffer */
 	stack_size = STACK_ALIGN(stack_size);
 
+	/*
+	 * BPF prog stack layout
+	 *
+	 * high
+	 * original A64_SP => 0:+-+ BPF prologue
+	 *| | FP/LR and callee saved registers
+	 * BPF fp register => +64:+-+
+	 *| |
+	 *| ... | BPF prog stack
+	 *| |
+	 *| |
+	 * current A64_SP => +-+
+	 *| |
+	 *| ... | Function call stack
+	 *| |
+	 *+-+
+	 * low
+	 *
+	 */
+
+	/* Save FP and LR registers to stay aligned with ARM64 AAPCS */
+	emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
+
 	/* Save callee-saved register */
 	emit(A64_PUSH(r6, r7, A64_SP), ctx);
 	emit(A64_PUSH(r8, r9, A64_SP), ctx);
 	if (ctx->tmp_used)
 		emit(A64_PUSH(tmp1, tmp2, A64_SP), ctx);
 
-	/* Set up BPF stack */
-	emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
-
-	/* Set up frame pointer */
+	/* Set up BPF prog stack base register (x25) */
 	emit(A64_MOV(1, fp, A64_SP), ctx);
 
+	/* Set up function call stack */
+	emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
+	emit(A64_MOV(1, A64_FP, A64_SP), ctx);
+
 	/* Clear registers A and X */
 	emit_a64_mov_i64(ra, 0, ctx);
 	emit_a64_mov_i64(rx, 0, ctx);
@@ -196,8 +220,8 @@ static void build_epilogue(struct jit_ctx *ctx)
 	emit(A64_POP(r8, r9, A64_SP), ctx);
 	emit(A64_POP(r6, r7, A64_SP), ctx);
 
-	/* Restore frame pointer */
-	emit(A64_MOV(1, fp, A64_SP), ctx);
+	/* Restore FP/LR registers */
+	emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
 
 	/* Set return value */
 	emit(A64_MOV(1, A64_R(0), r0), ctx);
Re: [PATCH] bpf: convert hashtab lock to raw lock
On 11/2/2015 9:24 AM, Steven Rostedt wrote: On Mon, 02 Nov 2015 09:12:29 -0800 "Shi, Yang" wrote:

Yes, it is common practice to convert a sleepable spin lock to a raw spin lock in -rt to avoid scheduling-in-atomic-context bugs.

Note, in a lot of cases we don't just convert spin_locks to raw because of atomic context. There are times we need to change the design so that the lock is not taken in atomic context (switching a preempt_disable() to a local_lock(), for example).

Yes, definitely. Understood.

Thanks,
Yang

But bpf is much like ftrace and kprobes, where they can be taken almost anywhere, and they do indeed need to be raw.

-- Steve
Re: [PATCH] bpf: convert hashtab lock to raw lock
On 10/31/2015 11:37 AM, Daniel Borkmann wrote: On 10/31/2015 02:47 PM, Steven Rostedt wrote: On Fri, 30 Oct 2015 17:03:58 -0700, Alexei Starovoitov wrote: On Fri, Oct 30, 2015 at 03:16:26PM -0700, Yang Shi wrote:

When running bpf samples on an rt kernel, it reports the below warning:

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 1, irqs_disabled(): 128, pid: 477, name: ping
Preemption disabled at: [] kprobe_perf_func+0x30/0x228
...

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 83c209d..972b76b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -17,7 +17,7 @@ struct bpf_htab {
 	struct bpf_map map;
 	struct hlist_head *buckets;
-	spinlock_t lock;
+	raw_spinlock_t lock;

How do we address such things in general? I bet there are tons of places around the kernel that call spin_lock from atomic context. I'd hate to lose the benefits of lockdep on non-raw spin_lock just to make rt happy.

You won't lose any benefits of lockdep. Lockdep still checks raw_spin_lock(). The only difference between raw_spin_lock and spin_lock is that in -rt spin_lock turns into an rt_mutex() and raw_spin_lock stays a spin lock.

( Btw, Yang, it would have been nice if your commit description had already included such info: not only that you convert it, but also why it's okay to do so. )

I think Thomas's document will include all the information about rt spin locks, raw spin locks, etc. Alexei & Daniel, if you think such info is necessary, I could definitely add it to the commit log in v2.

The error is that in -rt, you called a mutex and not a spin lock while atomic.

You are right; I think this happens due to the preempt_disable() in the trace_call_bpf() handler.

So, I think the patch seems okay. The dep_map is btw union'ed in the struct spinlock case at the same offset as the dep_map in raw_spinlock. It's a bit inconvenient, though, when we add other library code as maps in the future, f.e. things like rhashtable, as they would first need to be converted to raw_spinlock_t as well, but judging from the git log, it looks like common practice.

Yes, it is common practice to convert a sleepable spin lock to a raw spin lock in -rt to avoid scheduling-in-atomic-context bugs.

Thanks,
Yang

Thanks,
Daniel
Re: [PATCH] bpf: convert hashtab lock to raw lock
On 11/2/2015 12:59 AM, Thomas Gleixner wrote: On Sun, 1 Nov 2015, Alexei Starovoitov wrote: On Sat, Oct 31, 2015 at 09:47:36AM -0400, Steven Rostedt wrote: On Fri, 30 Oct 2015 17:03:58 -0700, Alexei Starovoitov wrote: On Fri, Oct 30, 2015 at 03:16:26PM -0700, Yang Shi wrote:

When running bpf samples on an rt kernel, it reports the below warning:

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 1, irqs_disabled(): 128, pid: 477, name: ping
Preemption disabled at: [] kprobe_perf_func+0x30/0x228
...

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 83c209d..972b76b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -17,7 +17,7 @@ struct bpf_htab {
 	struct bpf_map map;
 	struct hlist_head *buckets;
-	spinlock_t lock;
+	raw_spinlock_t lock;

How do we address such things in general? I bet there are tons of places around the kernel that call spin_lock from atomic context. I'd hate to lose the benefits of lockdep on non-raw spin_lock just to make rt happy.

You won't lose any benefits of lockdep. Lockdep still checks raw_spin_lock(). The only difference between raw_spin_lock and spin_lock is that in -rt spin_lock turns into an rt_mutex() and raw_spin_lock stays a spin lock.

I see. The patch makes sense then. It would be good to document this peculiarity of spin_lock.

I'm working on a document.

Thanks Steven and Thomas for your elaboration and comments.

Yang

Thanks,
tglx