Re: [PATCH 1/2] MAINTAINERS: Update email address of Naveen
On 7/17/24 11:43 PM, Masami Hiramatsu (Google) wrote:
> On Wed, 17 Jul 2024 13:58:35 +1000 Michael Ellerman wrote:
>> Masami Hiramatsu (Google) writes:
>>> On Sun, 14 Jul 2024 14:04:23 +0530 Naveen N Rao wrote:
>>>> I have switched to using my @kernel.org id for my contributions.
>>>> Update MAINTAINERS and mailmap to reflect the same.
>>>
>>> Looks good to me.
>>>
>>> Reviewed-by: Masami Hiramatsu (Google)
>>>
>>> Would the powerpc maintainer pick this?
>>
>> Yeah I can take both.
>
> Thank you for picking them up!

Looks like patchbot did not send a reply, but I already took them to the bpf tree.

Thanks,
Daniel
Re: [PATCH] arch: powerpc: net: bpf_jit_comp32.c: Fixed 'instead' typo
On 10/13/23 7:31 AM, Muhammad Muzammil wrote:
> Fixed 'instead' typo
>
> Signed-off-by: Muhammad Muzammil

Michael, I presume you'll pick it up?

Thanks,
Daniel

> ---
>  arch/powerpc/net/bpf_jit_comp32.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
> index 7f91ea064c08..bc7f92ec7f2d 100644
> --- a/arch/powerpc/net/bpf_jit_comp32.c
> +++ b/arch/powerpc/net/bpf_jit_comp32.c
> @@ -940,7 +940,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
>  		 * !fp->aux->verifier_zext. Emit NOP otherwise.
>  		 *
>  		 * Note that "li reg_h,0" is emitted for BPF_B/H/W case,
> -		 * if necessary. So, jump there insted of emitting an
> +		 * if necessary. So, jump there instead of emitting an
>  		 * additional "li reg_h,0" instruction.
>  		 */
>  		if (size == BPF_DW && !fp->aux->verifier_zext)
Re: [PATCH v6 0/5] powerpc/bpf: use BPF prog pack allocator
On 10/12/23 10:03 PM, Hari Bathini wrote:
> Most BPF programs are small, but they consume a page each. For systems with busy traffic and many BPF programs, this may also add significant pressure on the instruction TLB. High iTLB pressure usually slows down the whole system, causing visible performance degradation for production workloads.
>
> bpf_prog_pack, a customized allocator that packs multiple bpf programs into preallocated memory chunks, was proposed [1] to address it. This series extends this support to powerpc.
>
> Both bpf_arch_text_copy() & bpf_arch_text_invalidate(), the functions needed for this support, depend on instruction patching in the text area. Currently, patch_instruction() supports patching only one instruction at a time. The first patch introduces a patch_instructions() function to enable patching more than one instruction at a time. This helps in avoiding performance degradation while JITing bpf programs.
>
> Patches 2 & 3 implement the above mentioned arch-specific functions using patch_instructions(). Patch 4 fixes a misnomer in bpf JITing code. The last patch enables the use of the BPF prog pack allocator on powerpc and also ensures cleanup is handled gracefully.
>
> [1] https://lore.kernel.org/bpf/20220204185742.271030-1-s...@kernel.org/
>
> Changes in v6:
> * No changes in patches 2-5/5 except addition of Acked-by tags from Song.
> * Skipped merging code path of patch_instruction() & patch_instructions() to avoid performance overhead observed on ppc32 with that.

I presume this will be routed via Michael?

Thanks,
Daniel
Re: [bpf-next v1] bpf: drop deprecated bpf_jit_enable == 2
On 1/3/23 2:24 PM, x...@infragraf.org wrote:
> From: Tonghao Zhang
>
> The x86_64 can't dump the valid insn in this way. A test BPF prog which includes a subprog:
>
> $ llvm-objdump -d subprog.o
> Disassembly of section .text:
> :
>        0: 18 01 00 00 73 75 62 70 00 00 00 00 72 6f 67 00  r1 = 29114459903653235 ll
>        2: 7b 1a f8 ff 00 00 00 00  *(u64 *)(r10 - 8) = r1
>        3: bf a1 00 00 00 00 00 00  r1 = r10
>        4: 07 01 00 00 f8 ff ff ff  r1 += -8
>        5: b7 02 00 00 08 00 00 00  r2 = 8
>        6: 85 00 00 00 06 00 00 00  call 6
>        7: 95 00 00 00 00 00 00 00  exit
>
> Disassembly of section raw_tp/sys_enter:
> :
>        0: 85 10 00 00 ff ff ff ff  call -1
>        1: b7 00 00 00 00 00 00 00  r0 = 0
>        2: 95 00 00 00 00 00 00 00  exit
>
> kernel print message:
>
> [  580.775387] flen=8 proglen=51 pass=3 image=a000c20c from=kprobe-load pid=1643
> [  580.777236] JIT code: 0000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
> [  580.779037] JIT code: 0010: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
> [  580.780767] JIT code: 0020: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
> [  580.782568] JIT code: 0030: cc cc cc
>
> $ bpf_jit_disasm
> 51 bytes emitted from JIT compiler (pass:3, flen:8)
> a000c20c + :
>    0: int3
>    1: int3
>    2: int3
>    3: int3
>    4: int3
>    5: int3
>    ...
>
> Until bpf_jit_binary_pack_finalize is invoked, we copy rw_header to header and only then is image/insn valid. BTW, we can use "bpftool prog dump" to get the valid JITed instructions.
>
> This patch:
> * clean up the doc
> * remove the bpf_jit_disasm tool
> * set bpf_jit_enable to only 0 or 1
> Signed-off-by: Tonghao Zhang
> Suggested-by: Alexei Starovoitov
> Cc: Alexei Starovoitov
> Cc: Daniel Borkmann
> Cc: Andrii Nakryiko
> Cc: Martin KaFai Lau
> Cc: Song Liu
> Cc: Yonghong Song
> Cc: John Fastabend
> Cc: KP Singh
> Cc: Stanislav Fomichev
> Cc: Hao Luo
> Cc: Jiri Olsa
> Cc: Hou Tao
> ---
>  Documentation/admin-guide/sysctl/net.rst |   2 +-
>  Documentation/networking/filter.rst      |  98 +--
>  arch/arm/net/bpf_jit_32.c                |   4 -
>  arch/arm64/net/bpf_jit_comp.c            |   4 -
>  arch/loongarch/net/bpf_jit.c             |   4 -
>  arch/mips/net/bpf_jit_comp.c             |   3 -
>  arch/powerpc/net/bpf_jit_comp.c          |  11 -
>  arch/riscv/net/bpf_jit_core.c            |   3 -
>  arch/s390/net/bpf_jit_comp.c             |   4 -
>  arch/sparc/net/bpf_jit_comp_32.c         |   3 -
>  arch/sparc/net/bpf_jit_comp_64.c         |  13 -
>  arch/x86/net/bpf_jit_comp.c              |   3 -
>  arch/x86/net/bpf_jit_comp32.c            |   3 -
>  net/core/sysctl_net_core.c               |  14 +-
>  tools/bpf/.gitignore                     |   1 -
>  tools/bpf/Makefile                       |  10 +-
>  tools/bpf/bpf_jit_disasm.c               | 332 ---
>  17 files changed, 8 insertions(+), 504 deletions(-)
>  delete mode 100644 tools/bpf/bpf_jit_disasm.c
>
> diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
> index 6394f5dc2303..45d3d965276c 100644
> --- a/Documentation/admin-guide/sysctl/net.rst
> +++ b/Documentation/admin-guide/sysctl/net.rst
> @@ -87,7 +87,7 @@ Values:
>
>  	- 0 - disable the JIT (default value)
>  	- 1 - enable the JIT
> -	- 2 - enable the JIT and ask the compiler to emit traces on kernel log.
> +	- 2 - deprecated in linux 6.2

I'd make it more clear in the docs and reword it as follows (also, it'll be in 6.3, not 6.2):

  - 2 - enable the JIT and ask the compiler to emit traces on kernel log.
        (deprecated since v6.3, use ``bpftool prog dump jited id <prog_id>`` instead)

>  bpf_jit_harden
>  --------------
[...]
> diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> index 5b1ce656baa1..731a2eb0f68e 100644
> --- a/net/core/sysctl_net_core.c
> +++ b/net/core/sysctl_net_core.c
> @@ -275,16 +275,8 @@ static int proc_dointvec_minmax_bpf_enable(struct ctl_table *table, int write,
>  	tmp.data = &jit_enable;
>  	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
> -	if (write && !ret) {
> -		if (jit_enable < 2 ||
> -		    (jit_enable == 2 && bpf_dump_raw_ok(current_cred()))) {
> -			*(int *)table->data = jit_enable;
> -			if (jit_enable == 2)
> -				pr_warn("bpf_jit_enable = 2 was set! NEVER use this in production, only for JIT debugging!\n");
> -		} else {
> -			ret = -EPERM;
> -		}
> -	}
> +	if (write && !ret)
> +		*(int *)table->data = jit_enable;
>  	if (write && ret && min == max)
>  		pr_info_once("CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1.\n");
> @@ -395,7 +387,7 @@ static struct ctl_table net_core_table[] = {
>  		.extra2 = SYSCTL_ONE,
>  # else
Re: [PATCH 0/5] Atomics support for eBPF on powerpc
On 5/12/22 9:45 AM, Hari Bathini wrote:
> This patchset adds atomic operations to the eBPF instruction set on powerpc. The instructions that are added here can be summarised with this list of kernel operations for ppc64:
>
> * atomic[64]_[fetch_]add
> * atomic[64]_[fetch_]and
> * atomic[64]_[fetch_]or
> * atomic[64]_[fetch_]xor
> * atomic[64]_xchg
> * atomic[64]_cmpxchg
>
> and this list of kernel operations for ppc32:
>
> * atomic_[fetch_]add
> * atomic_[fetch_]and
> * atomic_[fetch_]or
> * atomic_[fetch_]xor
> * atomic_xchg
> * atomic_cmpxchg
>
> The following are left out of scope for this effort:
>
> * 64-bit operations on ppc32.
> * Explicit memory barriers, 16- and 8-bit operations on both ppc32 & ppc64.
>
> The first patch adds support for bitwise atomic operations on ppc64. The next patch adds fetch variant support for these instructions. The third patch adds support for xchg and cmpxchg atomic operations on ppc64. Patch #4 adds support for 32-bit atomic bitwise operations on ppc32. Patch #5 adds support for xchg and cmpxchg atomic operations on ppc32.

Thanks for adding these, Hari! I presume they'll get routed via Michael, right?

One thing that may be worth adding to the commit log as well is the test result from test_bpf.ko, given it has an extensive suite around atomics useful for testing corner cases in JITs.

> Hari Bathini (5):
>   bpf ppc64: add support for BPF_ATOMIC bitwise operations
>   bpf ppc64: add support for atomic fetch operations
>   bpf ppc64: Add instructions for atomic_[cmp]xchg
>   bpf ppc32: add support for BPF_ATOMIC bitwise operations
>   bpf ppc32: Add instructions for atomic_[cmp]xchg
>
>  arch/powerpc/net/bpf_jit_comp32.c | 62 +-
>  arch/powerpc/net/bpf_jit_comp64.c | 87 +--
>  2 files changed, 108 insertions(+), 41 deletions(-)
Re: [PATCH 01/13] bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
On 1/6/22 12:45 PM, Naveen N. Rao wrote:
> task_pt_regs() can return NULL on powerpc for kernel threads. This is then used in __bpf_get_stack() to check for user mode, resulting in a kernel oops. Guard against this by checking the return value of task_pt_regs() before trying to obtain the call chain.
>
> Fixes: fa28dcb82a38f8 ("bpf: Introduce helper bpf_get_task_stack()")
> Cc: sta...@vger.kernel.org # v5.9+
> Signed-off-by: Naveen N. Rao

Acked-by: Daniel Borkmann
Re: [PATCH] powerpc/bpf: fix write protecting JIT code
On 10/25/21 8:15 AM, Naveen N. Rao wrote:
> Hari Bathini wrote:
>> Running a program with bpf-to-bpf function calls results in a data access exception (0x300) with the below call trace:
>>
>>   [c0113f28] bpf_int_jit_compile+0x238/0x750 (unreliable)
>>   [c037d2f8] bpf_check+0x2008/0x2710
>>   [c0360050] bpf_prog_load+0xb00/0x13a0
>>   [c0361d94] __sys_bpf+0x6f4/0x27c0
>>   [c0363f0c] sys_bpf+0x2c/0x40
>>   [c0032434] system_call_exception+0x164/0x330
>>   [c000c1e8] system_call_vectored_common+0xe8/0x278
>>
>> as bpf_int_jit_compile() tries writing to a write-protected JIT code location during the extra pass. Fix it by holding off write protection of the JIT code until the extra pass, where branch target address fixup happens.
>>
>> Cc: sta...@vger.kernel.org
>> Fixes: 62e3d4210ac9 ("powerpc/bpf: Write protect JIT code")
>> Signed-off-by: Hari Bathini
>> ---
>>  arch/powerpc/net/bpf_jit_comp.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Thanks for the fix!
>
> Reviewed-by: Naveen N. Rao

LGTM, I presume this fix will be routed via Michael. BPF selftests have plenty of BPF-to-BPF calls in there, too bad this was caught so late. :/
Re: [PATCH v4 0/8] bpf powerpc: Add BPF_PROBE_MEM support in powerpc JIT compiler
On 10/4/21 12:49 AM, Michael Ellerman wrote:
> Daniel Borkmann writes:
>> On 9/29/21 1:18 PM, Hari Bathini wrote:
>>> Patch #1 & #2 are simple cleanup patches. Patch #3 refactors JIT compiler code with the aim to simplify adding BPF_PROBE_MEM support. Patch #4 introduces PPC_RAW_BRANCH() macro instead of open coding branch instruction. Patch #5 & #7 add BPF_PROBE_MEM support for PPC64 & PPC32 JIT compilers respectively. Patch #6 & #8 handle bad userspace pointers for PPC64 & PPC32 cases respectively.
>>
>> Michael, are you planning to pick up the series or shall we route via bpf-next?
>
> Yeah I'll plan to take it, unless you think there is a strong reason it needs to go via the bpf tree (doesn't look like it from the diffstat).

Sounds good to me; in that case, please also route the recent JIT fixes from Naveen through your tree.

Thanks,
Daniel
Re: [PATCH v4 0/8] bpf powerpc: Add BPF_PROBE_MEM support in powerpc JIT compiler
On 9/29/21 1:18 PM, Hari Bathini wrote:
> Patch #1 & #2 are simple cleanup patches. Patch #3 refactors JIT compiler code with the aim to simplify adding BPF_PROBE_MEM support. Patch #4 introduces PPC_RAW_BRANCH() macro instead of open coding branch instruction. Patch #5 & #7 add BPF_PROBE_MEM support for PPC64 & PPC32 JIT compilers respectively. Patch #6 & #8 handle bad userspace pointers for PPC64 & PPC32 cases respectively.

Michael, are you planning to pick up the series or shall we route via bpf-next?

Thanks,
Daniel

> Changes in v4:
> * Addressed all the review comments from Christophe.
>
> Hari Bathini (4):
>   bpf powerpc: refactor JIT compiler code
>   powerpc/ppc-opcode: introduce PPC_RAW_BRANCH() macro
>   bpf ppc32: Add BPF_PROBE_MEM support for JIT
>   bpf ppc32: Access only if addr is kernel address
>
> Ravi Bangoria (4):
>   bpf powerpc: Remove unused SEEN_STACK
>   bpf powerpc: Remove extra_pass from bpf_jit_build_body()
>   bpf ppc64: Add BPF_PROBE_MEM support for JIT
>   bpf ppc64: Access only if addr is kernel address
>
>  arch/powerpc/include/asm/ppc-opcode.h |   2 +
>  arch/powerpc/net/bpf_jit.h            |  19 +++--
>  arch/powerpc/net/bpf_jit_comp.c       |  72 --
>  arch/powerpc/net/bpf_jit_comp32.c     | 101 ++
>  arch/powerpc/net/bpf_jit_comp64.c     |  72 ++
>  5 files changed, 224 insertions(+), 42 deletions(-)
Re: [PATCH bpf-next v2] bpf: Change value of MAX_TAIL_CALL_CNT from 32 to 33
On 9/11/21 3:56 AM, Tiezhu Yang wrote:
> In the current code, the actual max tail call count is 33 which is greater than MAX_TAIL_CALL_CNT (defined as 32). The actual limit is not consistent with the meaning of MAX_TAIL_CALL_CNT; there is some confusion and one needs to spend some time to think about the reason at first glance.
>
> We can see the historical evolution from commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other bpf programs") and commit f9dabe016b63 ("bpf: Undo off-by-one in interpreter tail call count limit"). In order to avoid changing existing behavior, the actual limit is 33 now; this is reasonable.
>
> After commit 874be05f525e ("bpf, tests: Add tail call test suite"), we can see there exists a failed testcase.
>
> On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:
>
>   # echo 0 > /proc/sys/net/core/bpf_jit_enable
>   # modprobe test_bpf
>   # dmesg | grep -w FAIL
>   Tail call error path, max count reached jited:0 ret 34 != 33 FAIL
>
> On some archs:
>
>   # echo 1 > /proc/sys/net/core/bpf_jit_enable
>   # modprobe test_bpf
>   # dmesg | grep -w FAIL
>   Tail call error path, max count reached jited:1 ret 34 != 33 FAIL
>
> So it is necessary to change the value of MAX_TAIL_CALL_CNT from 32 to 33, then do some small changes to the related code. With this patch, it does not change the current limit of 33, MAX_TAIL_CALL_CNT can reflect the actual max tail call count, the tailcall selftests can work well, and also the above failed testcase in test_bpf can be fixed for the interpreter (all archs) and the JIT (all archs except for x86):
>
>   # uname -m
>   x86_64
>   # echo 1 > /proc/sys/net/core/bpf_jit_enable
>   # modprobe test_bpf
>   # dmesg | grep -w FAIL
>   Tail call error path, max count reached jited:1 ret 33 != 34 FAIL

Could you also state in here which archs you have tested with this change? I presume /every/ arch which has a JIT?

> Signed-off-by: Tiezhu Yang
> ---
> v2:
>   -- fix the typos in the commit message and update the commit message.
>   -- fix the failed tailcall selftests for x86 jit.
> I am not quite sure the change on x86 is proper. With this change, the tailcall selftests passed, but the tailcall limit test in test_bpf.ko failed; I do not know the reason now. I think this is another issue; maybe someone more versed in the x86 jit could take a look.

There should be a series from Johan coming today with regards to test_bpf.ko that will fix the "tail call error path, max count reached" test, which had an assumption in that R0 would always be valid for the fall-through and could be passed to the bpf_exit insn, whereas that is not guaranteed and the verifier, for example, forbids a subsequent access to R0 w/o reinit. For your testing, I would suggest rechecking once this series is out.

>  arch/arm/net/bpf_jit_32.c         | 11 ++-
>  arch/arm64/net/bpf_jit_comp.c     |  7 ---
>  arch/mips/net/ebpf_jit.c          |  4 ++--
>  arch/powerpc/net/bpf_jit_comp32.c |  4 ++--
>  arch/powerpc/net/bpf_jit_comp64.c | 12 ++--
>  arch/riscv/net/bpf_jit_comp32.c   |  4 ++--
>  arch/riscv/net/bpf_jit_comp64.c   |  4 ++--
>  arch/sparc/net/bpf_jit_comp_64.c  |  8
>  arch/x86/net/bpf_jit_comp.c       | 10 +-
>  include/linux/bpf.h               |  2 +-
>  kernel/bpf/core.c                 |  4 ++--
>  11 files changed, 36 insertions(+), 34 deletions(-)

[...]

>  	/* prog = array->ptrs[index]
>
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 41c23f4..5d6c843 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -286,14 +286,15 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
>  	emit(A64_CMP(0, r3, tmp), ctx);
>  	emit(A64_B_(A64_COND_CS, jmp_offset), ctx);
>
> -	/* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
> -	 *     goto out;
> +	/*
> +	 * tail_call_cnt++;
> +	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
> +	 *     goto out;
>  	 */
> +	emit(A64_ADD_I(1, tcc, tcc, 1), ctx);
>  	emit_a64_mov_i64(tmp, MAX_TAIL_CALL_CNT, ctx);
>  	emit(A64_CMP(1, tcc, tmp), ctx);
>  	emit(A64_B_(A64_COND_HI, jmp_offset), ctx);
> -	emit(A64_ADD_I(1, tcc, tcc, 1), ctx);
>
>  	/* prog = array->ptrs[index];
>  	 * if (prog == NULL)

[...]
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 0fe6aac..74a9e61 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -402,7 +402,7 @@ static int get_pop_bytes(bool *callee_regs_used)
>   * ... bpf_tail_call(void *ctx, struct bpf_array *array, u64 index) ...
>   * if (index >= array->map.max_entries)
>   *	goto out;
> - * if (++tail_call_cnt > MAX_TAIL_CALL_CNT)
> + * if (tail_call_cnt++ == MAX_TAIL_CALL_CNT)

Why such inconsistency to e.g. the arm64 case above, but also compared to x86 32-bit, which uses JAE? If so, we should cleanly follow the reference implementation (== interpreter) _everywhere_ and _not_ introduce additional variants/implementations across JITs.

>   *	goto out;
>   * prog =
Re: [PATCH bpf-next] bpf: Change value of MAX_TAIL_CALL_CNT from 32 to 33
On 9/9/21 7:50 AM, Andrii Nakryiko wrote: On Wed, Sep 8, 2021 at 8:33 PM Tiezhu Yang wrote: In the current code, the actual max tail call count is 33 which is greater than MAX_TAIL_CALL_CNT (defined as 32), the actual limit is not consistent with the meaning of MAX_TAIL_CALL_CNT, there is some confusion and need to spend some time to think the reason at the first glance. think *about* the reason We can see the historical evolution from commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other bpf programs") and commit f9dabe016b63 ("bpf: Undo off-by-one in interpreter tail call count limit"). In order to avoid changing existing behavior, the actual limit is 33 now, this is resonable. typo: reasonable After commit 874be05f525e ("bpf, tests: Add tail call test suite"), we can see there exists failed testcase. On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set: # echo 0 > /proc/sys/net/core/bpf_jit_enable # modprobe test_bpf # dmesg | grep -w FAIL Tail call error path, max count reached jited:0 ret 34 != 33 FAIL On some archs: # echo 1 > /proc/sys/net/core/bpf_jit_enable # modprobe test_bpf # dmesg | grep -w FAIL Tail call error path, max count reached jited:1 ret 34 != 33 FAIL So it is necessary to change the value of MAX_TAIL_CALL_CNT from 32 to 33, then do some small changes of the related code. With this patch, it does not change the current limit, MAX_TAIL_CALL_CNT can reflect the actual max tail call count, and the above failed testcase can be fixed. Signed-off-by: Tiezhu Yang --- This change breaks selftests ([0]), please fix them at the same time as you are changing the kernel behavior: The below selftests shouldn't have to change given there is no change in behavior intended (MAX_TAIL_CALL_CNT is bumped to 33 but counter inc'ed prior to the comparison). It just means that /all/ JITs must be changed and in particular properly _tested_. 
> test_tailcall_2:PASS:tailcall 128 nsec
> test_tailcall_2:PASS:tailcall 128 nsec
> test_tailcall_2:FAIL:tailcall err 0 errno 2 retval 4
> #135/2 tailcalls/tailcall_2:FAIL
> test_tailcall_3:PASS:tailcall 128 nsec
> test_tailcall_3:FAIL:tailcall count err 0 errno 2 count 34
> test_tailcall_3:PASS:tailcall 128 nsec
> #135/3 tailcalls/tailcall_3:FAIL
> #135/4 tailcalls/tailcall_4:OK
> #135/5 tailcalls/tailcall_5:OK
> #135/6 tailcalls/tailcall_bpf2bpf_1:OK
> test_tailcall_bpf2bpf_2:PASS:tailcall 128 nsec
> test_tailcall_bpf2bpf_2:FAIL:tailcall count err 0 errno 2 count 34
> test_tailcall_bpf2bpf_2:PASS:tailcall 128 nsec
> #135/7 tailcalls/tailcall_bpf2bpf_2:FAIL
> #135/8 tailcalls/tailcall_bpf2bpf_3:OK
> test_tailcall_bpf2bpf_4:PASS:tailcall 54 nsec
> test_tailcall_bpf2bpf_4:FAIL:tailcall count err 0 errno 2 count 32
> #135/9 tailcalls/tailcall_bpf2bpf_4:FAIL
> test_tailcall_bpf2bpf_4:PASS:tailcall 54 nsec
> test_tailcall_bpf2bpf_4:FAIL:tailcall count err 0 errno 2 count 32
> #135/10 tailcalls/tailcall_bpf2bpf_5:FAIL
> #135 tailcalls:FAIL
>
> [0] https://github.com/kernel-patches/bpf/pull/1747/checks?check_run_id=3552002906
>
>  arch/arm/net/bpf_jit_32.c         | 11 ++-
>  arch/arm64/net/bpf_jit_comp.c     |  7 ---
>  arch/mips/net/ebpf_jit.c          |  4 ++--
>  arch/powerpc/net/bpf_jit_comp32.c |  4 ++--
>  arch/powerpc/net/bpf_jit_comp64.c | 12 ++--
>  arch/riscv/net/bpf_jit_comp32.c   |  4 ++--
>  arch/riscv/net/bpf_jit_comp64.c   |  4 ++--
>  arch/sparc/net/bpf_jit_comp_64.c  |  8
>  include/linux/bpf.h               |  2 +-
>  kernel/bpf/core.c                 |  4 ++--
>  10 files changed, 31 insertions(+), 29 deletions(-)

[...]
Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks
On 6/4/21 6:50 AM, Paul Moore wrote: On Thu, Jun 3, 2021 at 2:53 PM Daniel Borkmann wrote: [...] I did run this entire discussion by both of the other BPF co-maintainers (Alexei, Andrii, CC'ed) and together we did further brainstorming on the matter on how we could solve this, but couldn't find a sensible & clean solution so far. Before I jump into the patch below I just want to say that I appreciate you looking into solutions on the BPF side of things. However, I voted "no" on this patch previously and since you haven't really changed it, my "no"/NACK vote remains, at least until we exhaust a few more options. Just to set the record straight, you previously did neither ACK nor NACK it. And again, as summarized earlier, this patch is _fixing_ the majority of the damage caused by 59438b46471a for at least the BPF side of things where users run into this, Ondrej the rest. Everything else can be discussed on top, and so far it seems there is no clean solution in front of us either, not even speaking of one that is small and suitable for _stable_ trees. Let me reiterate where we are: it's not that the original implementation in 9d1f8be5cf42 ("bpf: Restrict bpf when kernel lockdown is in confidentiality mode") was broken, it's that the later added _SELinux_ locked_down hook implementation that is broken, and not other LSMs. Now you're trying to retrofittingly ask us for hacks at all costs just because of /a/ broken LSM implementation. Maybe another avenue is to just swallow the pill and revert 59438b46471a since it made assumptions that don't work /generally/. And the use case for an application runtime policy change in particular in case of lockdown where users only have 3 choices is extremely tiny/rare, if it's not then something is very wrong in your deployment. Let me ask you this ... are you also planning to address *all* the other cases inside the kernel? If your answer is no, then this entire discussion is pointless. 
[PATCH] bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks

Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") added an implementation of the locked_down LSM hook to SELinux, with the aim to restrict which domains are allowed to perform operations that would breach lockdown. This is indirectly also getting the audit subsystem involved to report events. The latter is problematic, as reported by Ondrej and Serhei, since it can bring down the whole system via audit:

1) The audit events that are triggered due to calls to security_locked_down() can OOM kill a machine, see below details [0].

2) It also seems to be causing a deadlock via avc_has_perm()/slow_avc_audit() when trying to wake up kauditd, for example, when using trace_sched_switch() tracepoint, see details in [1]. Triggering this was not via some hypothetical corner case, but with existing tools like runqlat & runqslower from bcc, for example, which make use of this tracepoint. A rough call sequence goes like:

    rq_lock(rq) -> --------------+
      trace_sched_switch() ->    |
        bpf_prog_xyz() ->        +-> deadlock
          selinux_lockdown() ->  |
            audit_log_end() ->   |
      wake_up_interruptible() -> |
        try_to_wake_up() ->      |
          rq_lock(rq) -----------+

Since BPF is a bit of a chaotic nightmare in the sense that it is basically out-of-tree kernel code that can be called from anywhere and do pretty much anything; it presents quite the challenge for those of us worried about LSM access controls.

There is no need to generalize ... for those worried, the BPF subsystem has LSM access controls for the syscall since 2017 via afdb09c720b6 ("security: bpf: Add LSM hooks for bpf object related syscall").

[...]

So let's look at this from a different angle. Let's look at the two problems you mention above. If we start with the runqueue deadlock, we see the main problem is that audit_log_end() pokes the kauditd_wait waitqueue to ensure the kauditd_thread thread wakes up and processes the audit queue.
The audit_log_start() function does something similar, but it is conditional on a number of factors and isn't as likely to be hit. If we relocate these kauditd wakeup calls, we can remove the deadlock in trace_sched_switch().

In the case of CONFIG_AUDITSYSCALL=y we can probably just move the wakeup to __audit_syscall_exit(), and in the case of CONFIG_AUDITSYSCALL=n we can likely just change the wait_event_freezable() call in kauditd_thread to a wait_event_freezable_timeout() call with a HZ timeout (the audit stream will be much less on these systems anyway, so a queue overflow is much less likely).

I'm building a kernel with these changes now; I should have something to test when I wake up tomorrow morning.
Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks
run_bpf_submit+0x4d/0xc0
[  730.890579]  perf_trace_sched_switch+0x142/0x180
[  730.891485]  ? __schedule+0x6d8/0xb20
[  730.892209]  __schedule+0x6d8/0xb20
[  730.892899]  schedule+0x5b/0xc0
[  730.893522]  exit_to_user_mode_prepare+0x11d/0x240
[  730.894457]  syscall_exit_to_user_mode+0x27/0x70
[  730.895361]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[...]

Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
Reported-by: Ondrej Mosnacek
Reported-by: Jakub Hrozek
Reported-by: Serhei Makarov
Reported-by: Jiri Olsa
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Tested-by: Jiri Olsa
Cc: Paul Moore
Cc: James Morris
Cc: Jerome Marchand
Cc: Frank Eigler
Cc: Linus Torvalds
Link: https://lore.kernel.org/bpf/01135120-8bf7-df2e-cff0-1d73f1f84...@iogearbox.net
---
 kernel/bpf/helpers.c     |  7 +--
 kernel/trace/bpf_trace.c | 32
 2 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 73443498d88f..a2f1f15ce432 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include <linux/security.h>

 #include "../../lib/kstrtox.h"

@@ -1069,11 +1070,13 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_probe_read_user:
 		return &bpf_probe_read_user_proto;
 	case BPF_FUNC_probe_read_kernel:
-		return &bpf_probe_read_kernel_proto;
+		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
+		       NULL : &bpf_probe_read_kernel_proto;
 	case BPF_FUNC_probe_read_user_str:
 		return &bpf_probe_read_user_str_proto;
 	case BPF_FUNC_probe_read_kernel_str:
-		return &bpf_probe_read_kernel_str_proto;
+		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
+		       NULL : &bpf_probe_read_kernel_str_proto;
 	case BPF_FUNC_snprintf_btf:
 		return &bpf_snprintf_btf_proto;
 	case BPF_FUNC_snprintf:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d2d7cf6cfe83..7a52bc172841 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -215,16 +215,11 @@ const struct bpf_func_proto bpf_probe_read_user_str_proto = {
 static __always_inline int
 bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr)
 {
-	int ret = security_locked_down(LOCKDOWN_BPF_READ);
+	int ret;

-	if (unlikely(ret < 0))
-		goto fail;
 	ret = copy_from_kernel_nofault(dst, unsafe_ptr, size);
 	if (unlikely(ret < 0))
-		goto fail;
-	return ret;
-fail:
-	memset(dst, 0, size);
+		memset(dst, 0, size);
 	return ret;
 }

@@ -246,10 +241,7 @@ const struct bpf_func_proto bpf_probe_read_kernel_proto = {
 static __always_inline int
 bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr)
 {
-	int ret = security_locked_down(LOCKDOWN_BPF_READ);
-
-	if (unlikely(ret < 0))
-		goto fail;
+	int ret;

 	/*
 	 * The strncpy_from_kernel_nofault() call will likely not fill the
@@ -262,11 +254,7 @@ bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr)
 	 */
 	ret = strncpy_from_kernel_nofault(dst, unsafe_ptr, size);
 	if (unlikely(ret < 0))
-		goto fail;
-
-	return ret;
-fail:
-	memset(dst, 0, size);
+		memset(dst, 0, size);
 	return ret;
 }

@@ -1011,16 +999,20 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_probe_read_user:
 		return &bpf_probe_read_user_proto;
 	case BPF_FUNC_probe_read_kernel:
-		return &bpf_probe_read_kernel_proto;
+		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
+		       NULL : &bpf_probe_read_kernel_proto;
 	case BPF_FUNC_probe_read_user_str:
 		return &bpf_probe_read_user_str_proto;
 	case BPF_FUNC_probe_read_kernel_str:
-		return &bpf_probe_read_kernel_str_proto;
+		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
+		       NULL : &bpf_probe_read_kernel_str_proto;
 #ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	case BPF_FUNC_probe_read:
-		return &bpf_probe_read_compat_proto;
+		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
+		       NULL : &bpf_probe_read_compat_proto;
 	case BPF_FUNC_probe_read_str:
-		return &bpf_probe_read_compat_str_proto;
+		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
+		       NULL : &bpf_probe_read_compat_str_proto;
 #endif
 #ifdef CONFIG_CGROUPS
 	case BPF_FUNC_get_current_cgroup_id:
--
2.21.0
Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks
On 6/1/21 10:47 PM, Paul Moore wrote: On Mon, May 31, 2021 at 4:24 AM Daniel Borkmann wrote: On 5/29/21 8:48 PM, Paul Moore wrote:

[...] Daniel's patch side steps that worry by just doing the lockdown permission check when the BPF program is loaded, but that isn't a great solution if the policy changes afterward. I was hoping there might be some way to perform the permission check as needed, but the more I look the more that appears to be difficult, if not impossible (once again, corrections are welcome).

Your observation is correct, will try to clarify below a bit.

I'm now wondering if the right solution here is to make use of the LSM notifier mechanism. I'm not yet entirely sure if this would work from a BPF perspective, but I could envision the BPF subsystem registering a LSM notification callback via register_blocking_lsm_notifier(), see the Infiniband code as an example, and then when the LSM(s) policy changes the BPF subsystem would get a notification and it could revalidate the existing BPF programs and block/remove/whatever the offending BPF programs. This obviously requires a few things which I'm not sure are easily done, or even possible:

1. Somehow the BPF programs would need to be "marked" at load/verification time with respect to their lockdown requirements so that decisions can be made later. Perhaps a flag in bpf_prog_aux?

2. While it looks like it should be possible to iterate over all of the loaded BPF programs in the LSM notifier callback via idr_for_each(prog_idr, ...), it is not clear to me if it is possible to safely remove, or somehow disable, BPF programs once they have been loaded. Hopefully the BPF folks can help answer that question.

3. Disabling of BPF programs might be preferable to removing them entirely on LSM policy changes, as it would be possible to make the lockdown state less restrictive at a future point in time, allowing for the BPF program to be executed again. Once again, not sure if this is even possible.
Part of why this gets really complex/impossible is that BPF programs in the kernel are reference counted from various sides, be it that there are references from user space to them (fd from application, BPF fs, or BPF links), hooks where they are attached to as well as tail call maps where one BPF prog calls into another. There is currently also no global infra of some sort where you could piggy back to atomically keep track of all the references in a list or such. And the other thing is that BPF progs have no ownership that is tied to a specific task after they have been loaded. Meaning, once they are loaded into the kernel by an application and attached to a specific hook, they can remain there potentially until reboot of the node, so lifecycle of the user space application != lifecycle of the BPF program. I don't think the disjoint lifecycle or lack of task ownership is a deal breaker from a LSM perspective as the LSMs can stash whatever info they need in the security pointer during the program allocation hook, e.g. selinux_bpf_prog_alloc() saves the security domain which allocates/loads the BPF program. The thing I'm worried about would be the case where a LSM policy change requires that an existing BPF program be removed or disabled. I'm guessing based on the refcounting that there is not presently a clean way to remove a BPF program from the system, but is this something we could resolve? If we can't safely remove a BPF program from the system, can we replace/swap it with an empty/NULL BPF program? Removing progs would somehow mean destroying those references from an async event and then /safely/ guaranteeing that nothing is accessing them anymore. But then if policy changes once more where they would be allowed again we would need to revert back to the original state, which brings us to your replace/swap question with an empty/null prog. 
It's not feasible either, because there are different BPF program types and they can have different return code semantics that lead to subsequent actions. If we were to replace them with an empty/NULL program, then essentially this will get us into an undefined system state given it's unclear what should be a default policy for each program type, etc. Just to pick one simple example, outside of tracing, that comes to mind: say, you attached a program with tc to a given device ingress hook. That program implements firewalling functionality, and potentially deep down in that program there is functionality to record/sample packets along with some meta data. Part of what is exported to the ring buffer to the user space reader may be a struct net_device field that is otherwise not available (or at least not yet), hence it's probe-read with mentioned helpers. If you were now to change the SELinux policy for that tc loader application, and therefore replace/swap the progs in the kernel that were loaded with it (given tc's lockdown policy was recorded in their sec blob) with an empty/NU
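To make Daniel's example concrete, a tc ingress program of the kind he describes might look roughly like the following pseudocode sketch (the policy lookup, sampling decision, map, and struct names are invented for illustration). The return value is the crux of his argument: the verdict has real consequences, so there is no sane "empty" program to swap in:

```c
SEC("tc")
int ingress_fw(struct __sk_buff *skb)
{
	if (is_blocked(skb))		/* hypothetical firewall policy lookup */
		return TC_ACT_SHOT;	/* drop */

	if (should_sample(skb)) {	/* hypothetical sampling decision */
		struct sample s = {};

		/* probe-read otherwise-unavailable net_device state */
		bpf_probe_read_kernel(&s.dev_state, sizeof(s.dev_state),
				      dev_state_ptr(skb));
		bpf_perf_event_output(skb, &events, BPF_F_CURRENT_CPU,
				      &s, sizeof(s));
	}
	return TC_ACT_OK;		/* pass -- an empty/NULL replacement
					 * has no defined verdict semantics */
}
```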
Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks
On 5/29/21 8:48 PM, Paul Moore wrote: [...] Daniel's patch sidesteps that worry by just doing the lockdown permission check when the BPF program is loaded, but that isn't a great solution if the policy changes afterward. I was hoping there might be some way to perform the permission check as needed, but the more I look the more that appears to be difficult, if not impossible (once again, corrections are welcome). Your observation is correct, will try to clarify below a bit. I'm now wondering if the right solution here is to make use of the LSM notifier mechanism. I'm not yet entirely sure if this would work from a BPF perspective, but I could envision the BPF subsystem registering an LSM notification callback via register_blocking_lsm_notifier(), see the Infiniband code for an example, and then when the LSM(s) policy changes the BPF subsystem would get a notification and it could revalidate the existing BPF programs and block/remove/whatever the offending BPF programs. This obviously requires a few things which I'm not sure are easily done, or even possible: 1. Somehow the BPF programs would need to be "marked" at load/verification time with respect to their lockdown requirements so that decisions can be made later. Perhaps a flag in bpf_prog_aux? 2. While it looks like it should be possible to iterate over all of the loaded BPF programs in the LSM notifier callback via idr_for_each(prog_idr, ...), it is not clear to me if it is possible to safely remove, or somehow disable, BPF programs once they have been loaded. Hopefully the BPF folks can help answer that question. 3. Disabling of BPF programs might be preferable to removing them entirely on LSM policy changes as it would be possible to make the lockdown state less restrictive at a future point in time, allowing for the BPF program to be executed again. Once again, not sure if this is even possible. 
Part of why this gets really complex/impossible is that BPF programs in the kernel are reference counted from various sides, be it that there are references from user space to them (fd from application, BPF fs, or BPF links), hooks they are attached to, as well as tail call maps where one BPF prog calls into another. There is currently also no global infra of some sort where you could piggyback to atomically keep track of all the references in a list or such. And the other thing is that BPF progs have no ownership that is tied to a specific task after they have been loaded. Meaning, once they are loaded into the kernel by an application and attached to a specific hook, they can remain there potentially until reboot of the node, so lifecycle of the user space application != lifecycle of the BPF program. It's maybe best to compare this aspect to kernel modules in the sense that you have an application that loads it into the kernel (insmod, etc., where you could also enforce a lockdown signature check), but after that, they can be managed by other entities as well (implicitly refcounted from kernel, removed by other applications, etc). My understanding of the lockdown settings is that users have options to select/enforce a lockdown level of CONFIG_LOCK_DOWN_KERNEL_FORCE_{INTEGRITY,CONFIDENTIALITY} at compilation time, they have a lockdown={integrity|confidentiality} boot-time parameter, /sys/kernel/security/lockdown, and then more fine-grained policy via 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown"). Once you have set a global policy level, you cannot revert back to a less strict mode. So the SELinux policy is specifically tied around tasks to further restrict applications with respect to the global policy. I presume that would mean for those users that the majority of tasks have the confidentiality option set via SELinux with just a few necessary ones using the integrity global policy. 
So overall, enforcing at BPF program load time is the only really sensible option to me, given only there do we have a valid current task against which such a policy can be enforced. Best, Daniel
Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks
On 5/28/21 5:47 PM, Paul Moore wrote: On Fri, May 28, 2021 at 3:10 AM Daniel Borkmann wrote: On 5/28/21 3:37 AM, Paul Moore wrote: On Mon, May 17, 2021 at 5:22 AM Ondrej Mosnacek wrote: Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") added an implementation of the locked_down LSM hook to SELinux, with the aim to restrict which domains are allowed to perform operations that would breach lockdown. However, in several places the security_locked_down() hook is called in situations where the current task isn't doing any action that would directly breach lockdown, leading to SELinux checks that are basically bogus. Since in most of these situations converting the callers such that security_locked_down() is called in a context where the current task would be meaningful for SELinux is impossible or very non-trivial (and could lead to TOCTOU issues for the classic Lockdown LSM implementation), fix this by modifying the hook to accept a struct cred pointer as argument, where NULL will be interpreted as a request for a "global", task-independent lockdown decision only. Then modify SELinux to ignore calls with cred == NULL. I'm not overly excited about skipping the access check when cred is NULL. Based on the description and the little bit that I've dug into thus far it looks like using SECINITSID_KERNEL as the subject would be much more appropriate. *Something* (the kernel in most of the relevant cases it looks like) is requesting that a potentially sensitive disclosure be made, and ignoring it seems like the wrong thing to do. Leaving the access control intact also provides a nice avenue to audit these requests should users want to do that. 
I think the rationale/workaround for ignoring calls with cred == NULL (or the previous patch with the unimplemented hook) from Ondrej was two-fold, at least speaking for the tracing cases he has seen: i) The audit events that are triggered due to calls to security_locked_down() can OOM kill a machine, see below details [0]. ii) It seems to be causing a deadlock via slow_avc_audit() -> audit_log_end() when presumably trying to wake up kauditd [1]. How would your suggestion above solve both i) and ii)? First off, a bit of general commentary - I'm not sure if Ondrej was aware of this, but info like that is good to have in the commit description. Perhaps it was in the linked RHBZ but I try not to look at those when reviewing patches; the commit descriptions must be self-sufficient since we can't rely on the accessibility or the lifetime of external references. It's fine if people want to include external links in their commits, I would actually even encourage it in some cases, but the links shouldn't replace a proper description of the problem and why the proposed solution is The Best Solution. With that out of the way, it sounds like your issue isn't so much the access check, but rather the frequency of the access denials and the resulting audit records in your particular use case. My initial reaction is that you might want to understand why you are getting so many SELinux access denials; your loaded security policy clearly does not match your intended use :) Beyond that, if you want to basically leave things as-is but quiet the high-frequency audit records that result from these SELinux denials you might want to look into the SELinux "dontaudit" policy rule, it was created for things like this. 
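For the specific AVC quoted later in this thread (scontext == tcontext == auditd_t, class lockdown, permission confidentiality), the dontaudit rule Paul refers to could look something like the following TE-syntax fragment. This is a sketch of the idea, not a drop-in policy module; the domain should be adapted to whatever denial actually shows up:

```
# Stop auditing (but keep denying) lockdown confidentiality checks
# against auditd_t itself; adjust the domain to match your AVC output.
dontaudit auditd_t self:lockdown confidentiality;
```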
Some info can be found in The SELinux Notebook, relevant link below: * https://github.com/SELinuxProject/selinux-notebook/blob/main/src/avc_rules.md#dontaudit The deadlock issue that was previously reported remains an open case as far as I'm concerned; I'm presently occupied trying to sort out a rather serious issue with respect to io_uring and LSM/audit (plus general stuff at $DAYJOB) so I haven't had time to investigate this any further. Of course anyone else is welcome to dive into it (I always want to encourage this, especially from "performance people" who just want to shut it all off), however if the answer is basically "disable LSM and/or audit checks" you have to know that it is going to result in a high degree of skepticism from me, so heavy documentation on why it is The Best Solution would be a very good thing :) Beyond that, I think the suggestions above of "why do you have so many policy denials?" and "have you looked into dontaudit?" are solid places to look for a solution in your particular case. Since most callers will just want to pass current_cred() as the cred parameter, rename the hook to security_cred_locked_down() and provide the original security_locked_down() function as a simple wrapper around the new hook. [...] 3. kernel/trace/bpf_trace.c:bpf_probe_read_kernel{,_str}_common() Called when a BPF program calls a helper that could leak kernel memory. The task context is not relevant here, si
Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks
On 5/28/21 3:42 PM, Ondrej Mosnacek wrote: (I'm off work today and plan to reply also to Paul's comments next week, but for now let me at least share a couple quick thoughts on Daniel's patch.) On Fri, May 28, 2021 at 11:56 AM Daniel Borkmann wrote: On 5/28/21 9:09 AM, Daniel Borkmann wrote: On 5/28/21 3:37 AM, Paul Moore wrote: On Mon, May 17, 2021 at 5:22 AM Ondrej Mosnacek wrote: Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") added an implementation of the locked_down LSM hook to SELinux, with the aim to restrict which domains are allowed to perform operations that would breach lockdown. However, in several places the security_locked_down() hook is called in situations where the current task isn't doing any action that would directly breach lockdown, leading to SELinux checks that are basically bogus. Since in most of these situations converting the callers such that security_locked_down() is called in a context where the current task would be meaningful for SELinux is impossible or very non-trivial (and could lead to TOCTOU issues for the classic Lockdown LSM implementation), fix this by modifying the hook to accept a struct cred pointer as argument, where NULL will be interpreted as a request for a "global", task-independent lockdown decision only. Then modify SELinux to ignore calls with cred == NULL. I'm not overly excited about skipping the access check when cred is NULL. Based on the description and the little bit that I've dug into thus far it looks like using SECINITSID_KERNEL as the subject would be much more appropriate. *Something* (the kernel in most of the relevant cases it looks like) is requesting that a potentially sensitive disclosure be made, and ignoring it seems like the wrong thing to do. Leaving the access control intact also provides a nice avenue to audit these requests should users want to do that. 
I think the rationale/workaround for ignoring calls with cred == NULL (or the previous patch with the unimplemented hook) from Ondrej was two-fold, at least speaking for the tracing cases he has seen: i) The audit events that are triggered due to calls to security_locked_down() can OOM kill a machine, see below details [0]. ii) It seems to be causing a deadlock via slow_avc_audit() -> audit_log_end() when presumably trying to wake up kauditd [1]. Actually, I wasn't aware of the deadlock... But calling an LSM hook [that is backed by a SELinux access check] from within a BPF helper is calling for all kinds of trouble, so I'm not surprised :) Fully agree, it's just waiting to blow up in unpredictable ways.. :/ Ondrej / Paul / Jiri: at least for the BPF tracing case specifically (I haven't looked at the rest but it's also kind of independent), the attached fix should address both reported issues, please take a look & test. Thanks, I like this solution, although there are a few gotchas: 1. This patch creates a slight "regression" in that if someone flips the Lockdown LSM into lockdown mode at runtime, existing (already loaded) BPF programs will still be able to call the confidentiality-breaching helpers, while before the lockdown would apply also to them. Personally, I don't think it's a big deal (and I bet there are other existing cases where some handle kept from before lockdown could leak data), but I wanted to mention it in case someone thinks the opposite. Yes, right, though this is nothing new either in the sense that there are plenty of other cases with security_locked_down() that operate this way, e.g. take open_kcore() for /proc/kcore access or module_sig_check() for mod signatures, just to pick some random ones; same approach, where the enforcement happens at open/load time. 2. IIUC, 
when a BPF program is rejected due to lockdown/SELinux, the kernel will return -EINVAL to userspace (looking at check_helper_call() in kernel/bpf/verifier.c; didn't have time to look at other callers...). It would be nicer if the error code from the security_locked_down() call would be passed through the call chain and eventually returned to the caller. It should be relatively straightforward to convert bpf_base_func_proto() to return a PTR_ERR() instead of NULL on error, but it looks like this would result in quite a big patch updating all the callers (and callers of callers, etc.) with a not-so-small chance of missing some NULL check and introducing a bug... I guess we could live with EINVAL-on-denied in stable kernels and only have the error path refactoring in -next; I'm not sure... Right, it would return a verifier log entry reporting to the user that the prog is attempting to use an unavailable/unknown helper function. We do have similar return-NULL behavior with the bpf_capable() and perfmon_capable() checks. Potentially, we could do PTR_ERR() in future where we tell if it failed due to missing CAPs, due to lockdown, or just due to the helper not being compiled in.. 3. This
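For reference, the approach in Daniel's attached patch gates the helper at proto-lookup time. The shape of the change, as described in this thread (a sketch; exact function, constant, and case placement may differ from the merged patch), is roughly:

```c
/* Sketch: check lockdown once, at verification time, when the func
 * proto is looked up -- here current is the task loading the program.
 * Returning NULL sends the verifier down its "unknown/unavailable
 * helper" path, hence the -EINVAL Ondrej mentions above. */
static const struct bpf_func_proto *
bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
	switch (func_id) {
	case BPF_FUNC_probe_read_kernel:
		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
		       NULL : &bpf_probe_read_kernel_proto;
	/* ... other probe-read helpers handled the same way ... */
	default:
		return bpf_base_func_proto(func_id);
	}
}
```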
Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks
On 5/28/21 1:47 PM, Jiri Olsa wrote: On Fri, May 28, 2021 at 11:56:02AM +0200, Daniel Borkmann wrote: Ondrej / Paul / Jiri: at least for the BPF tracing case specifically (I haven't looked at the rest but it's also kind of independent), the attached fix should address both reported issues, please take a look & test. Thanks a lot, Daniel From 5893ad528dc0a0a68933b8f2a81b18d3f539660d Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 28 May 2021 09:16:31 + Subject: [PATCH bpf] bpf, audit, lockdown: Fix bogus SELinux lockdown permission checks Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") added an implementation of the locked_down LSM hook to SELinux, with the aim to restrict which domains are allowed to perform operations that would breach lockdown. This is indirectly also getting the audit subsystem involved in reporting events. The latter is problematic, as reported by Ondrej and Serhei, since it can bring down the whole system via audit: i) The audit events that are triggered due to calls to security_locked_down() can OOM kill a machine, see below details [0]. ii) It seems to be causing a deadlock via slow_avc_audit() -> audit_log_end() when presumably trying to wake up kauditd [1]. Fix both at the same time by taking a completely different approach, that is, move the check into the program verification phase where we actually retrieve the func proto. This also reliably gets the task (current) that is trying to install the tracing program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also fixes the OOM since we're moving this out of the BPF helpers which can be called millions of times per second. [0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says: I started seeing this with F-34. 
When I run a container that is traced with BPF to record the syscalls it is doing, auditd is flooded with messages like: type=AVC msg=audit(1619784520.593:282387): avc: denied { confidentiality } for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM" scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0 tclass=lockdown permissive=0 This seems to be leading to auditd running out of space in the backlog buffer and eventually OOMs the machine. [...] auditd running at 99% CPU presumably processing all the messages, eventually I get: Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64 Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000 Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not tainted 5.11.12-300.fc34.x86_64 #1 Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 [...] [1] https://lore.kernel.org/linux-audit/canyvdqn7h5tvp47fbycrasv4xf07eubsdwt_edchxjuj43j...@mail.gmail.com/, Serhei Makarov says: Upstream kernel 5.11.0-rc7 and later was found to deadlock during a bpf_probe_read_compat() call within a sched_switch tracepoint. The problem is reproducible with the reg_alloc3 testcase from SystemTap's BPF backend testsuite on x86_64 as well as the runqlat,runqslower tools from bcc on ppc64le. Example stack trace: [...] 
[ 730.868702] stack backtrace: [ 730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 5.12.0-0.rc2.20210309git144c79ef3353.166.fc35.x86_64 #1 [ 730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 730.873278] Call Trace: [ 730.873770] dump_stack+0x7f/0xa1 [ 730.874433] check_noncircular+0xdf/0x100 [ 730.875232] __lock_acquire+0x1202/0x1e10 [ 730.876031] ? __lock_acquire+0xfc0/0x1e10 [ 730.876844] lock_acquire+0xc2/0x3a0 [ 730.877551] ? __wake_up_common_lock+0x52/0x90 [ 730.878434] ? lock_acquire+0xc2/0x3a0 [ 730.879186] ? lock_is_held_type+0xa7/0x120 [ 730.880044] ? skb_queue_tail+0x1b/0x50 [ 730.880800] _raw_spin_lock_irqsave+0x4d/0x90 [ 730.881656] ? __wake_up_common_lock+0x52/0x90 [ 730.882532] __wake_up_common_lock+0x52/0x90 [ 730.883375] audit_log_end+0x5b/0x100 [ 730.884104] slow_avc_audit+0x69/0x90 [ 730.884836] avc_has_perm+0x8b/0xb0 [ 730.885532] selinux_lockdown+0xa5/0xd0 [ 730.886297] security_locked_down+0x20/0x40 [ 730.887133] bpf_probe_read_compat+0x66/0xd0 [
Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks
On 5/28/21 9:09 AM, Daniel Borkmann wrote: On 5/28/21 3:37 AM, Paul Moore wrote: On Mon, May 17, 2021 at 5:22 AM Ondrej Mosnacek wrote: Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") added an implementation of the locked_down LSM hook to SELinux, with the aim to restrict which domains are allowed to perform operations that would breach lockdown. However, in several places the security_locked_down() hook is called in situations where the current task isn't doing any action that would directly breach lockdown, leading to SELinux checks that are basically bogus. Since in most of these situations converting the callers such that security_locked_down() is called in a context where the current task would be meaningful for SELinux is impossible or very non-trivial (and could lead to TOCTOU issues for the classic Lockdown LSM implementation), fix this by modifying the hook to accept a struct cred pointer as argument, where NULL will be interpreted as a request for a "global", task-independent lockdown decision only. Then modify SELinux to ignore calls with cred == NULL. I'm not overly excited about skipping the access check when cred is NULL. Based on the description and the little bit that I've dug into thus far it looks like using SECINITSID_KERNEL as the subject would be much more appropriate. *Something* (the kernel in most of the relevant cases it looks like) is requesting that a potentially sensitive disclosure be made, and ignoring it seems like the wrong thing to do. Leaving the access control intact also provides a nice avenue to audit these requests should users want to do that. I think the rationale/workaround for ignoring calls with cred == NULL (or the previous patch with the unimplemented hook) from Ondrej was two-fold, at least speaking for his seen tracing cases: i) The audit events that are triggered due to calls to security_locked_down() can OOM kill a machine, see below details [0]. 
ii) It seems to be causing a deadlock via slow_avc_audit() -> audit_log_end() when presumably trying to wake up kauditd [1]. Ondrej / Paul / Jiri: at least for the BPF tracing case specifically (I haven't looked at the rest but it's also kind of independent), the attached fix should address both reported issues, please take a look & test. Thanks a lot, Daniel From 5893ad528dc0a0a68933b8f2a81b18d3f539660d Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 28 May 2021 09:16:31 + Subject: [PATCH bpf] bpf, audit, lockdown: Fix bogus SELinux lockdown permission checks Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") added an implementation of the locked_down LSM hook to SELinux, with the aim to restrict which domains are allowed to perform operations that would breach lockdown. This is indirectly also getting the audit subsystem involved in reporting events. The latter is problematic, as reported by Ondrej and Serhei, since it can bring down the whole system via audit: i) The audit events that are triggered due to calls to security_locked_down() can OOM kill a machine, see below details [0]. ii) It seems to be causing a deadlock via slow_avc_audit() -> audit_log_end() when presumably trying to wake up kauditd [1]. Fix both at the same time by taking a completely different approach, that is, move the check into the program verification phase where we actually retrieve the func proto. This also reliably gets the task (current) that is trying to install the tracing program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also fixes the OOM since we're moving this out of the BPF helpers which can be called millions of times per second. [0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says: I started seeing this with F-34. 
When I run a container that is traced with BPF to record the syscalls it is doing, auditd is flooded with messages like: type=AVC msg=audit(1619784520.593:282387): avc: denied { confidentiality } for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM" scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0 tclass=lockdown permissive=0 This seems to be leading to auditd running out of space in the backlog buffer and eventually OOMs the machine. [...] auditd running at 99% CPU presumably processing all the messages, eventually I get: Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64 Apr
Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks
On 5/28/21 3:37 AM, Paul Moore wrote: On Mon, May 17, 2021 at 5:22 AM Ondrej Mosnacek wrote: Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") added an implementation of the locked_down LSM hook to SELinux, with the aim to restrict which domains are allowed to perform operations that would breach lockdown. However, in several places the security_locked_down() hook is called in situations where the current task isn't doing any action that would directly breach lockdown, leading to SELinux checks that are basically bogus. Since in most of these situations converting the callers such that security_locked_down() is called in a context where the current task would be meaningful for SELinux is impossible or very non-trivial (and could lead to TOCTOU issues for the classic Lockdown LSM implementation), fix this by modifying the hook to accept a struct cred pointer as argument, where NULL will be interpreted as a request for a "global", task-independent lockdown decision only. Then modify SELinux to ignore calls with cred == NULL. I'm not overly excited about skipping the access check when cred is NULL. Based on the description and the little bit that I've dug into thus far it looks like using SECINITSID_KERNEL as the subject would be much more appropriate. *Something* (the kernel in most of the relevant cases it looks like) is requesting that a potentially sensitive disclosure be made, and ignoring it seems like the wrong thing to do. Leaving the access control intact also provides a nice avenue to audit these requests should users want to do that. I think the rationale/workaround for ignoring calls with cred == NULL (or the previous patch with the unimplemented hook) from Ondrej was two-fold, at least speaking for his seen tracing cases: i) The audit events that are triggered due to calls to security_locked_down() can OOM kill a machine, see below details [0]. 
ii) It seems to be causing a deadlock via slow_avc_audit() -> audit_log_end() when presumably trying to wake up kauditd [1]. How would your suggestion above solve both i) and ii)? [0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585 : I started seeing this with F-34. When I run a container that is traced with eBPF to record the syscalls it is doing, auditd is flooded with messages like: type=AVC msg=audit(1619784520.593:282387): avc: denied { confidentiality } for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM" scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0 tclass=lockdown permissive=0 This seems to be leading to auditd running out of space in the backlog buffer and eventually OOMs the machine. auditd running at 99% CPU presumably processing all the messages, eventually I get: Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64 Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000 Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not tainted 5.11.12-300.fc34.x86_64 #1 Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 [1] https://lore.kernel.org/linux-audit/canyvdqn7h5tvp47fbycrasv4xf07eubsdwt_edchxjuj43j...@mail.gmail.com/ : Upstream kernel 5.11.0-rc7 and later was found to deadlock during a bpf_probe_read_compat() call within a sched_switch tracepoint. 
The problem is reproducible with the reg_alloc3 testcase from SystemTap's BPF backend testsuite on x86_64 as well as the runqlat,runqslower tools from bcc on ppc64le. Example stack trace from [1]: [ 730.868702] stack backtrace: [ 730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 5.12.0-0.rc2.20210309git144c79ef3353.166.fc35.x86_64 #1 [ 730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 730.873278] Call Trace: [ 730.873770] dump_stack+0x7f/0xa1 [ 730.874433] check_noncircular+0xdf/0x100 [ 730.875232] __lock_acquire+0x1202/0x1e10 [ 730.876031] ? __lock_acquire+0xfc0/0x1e10 [ 730.876844] lock_acquire+0xc2/0x3a0 [ 730.877551] ? __wake_up_common_lock+0x52/0x90 [ 730.878434] ? lock_acquire+0xc2/0x3a0 [ 730.879186] ? lock_is_held_type+0xa7/0x120 [ 730.880044] ? skb_queue_tail+0x1b/0x50 [ 730.880800] _raw_spin_lock_irqsave+0x4d/0x90 [ 730.881656] ? __wake_up_common_lock+0x52/0x90 [ 730.882532] __wake_up_common_lock+0x52/0x90 [
Re: [PATCH bpf-next 1/2] bpf: Remove bpf_jit_enable=2 debugging mode
On 4/15/21 11:32 AM, Jianlin Lv wrote: For debugging JITs, dumping the JITed image to the kernel log is discouraged; "bpftool prog dump jited" is a much better way to examine JITed dumps. This patch gets rid of the code related to the bpf_jit_enable=2 mode, updates the proc handler of bpf_jit_enable, and adds auxiliary information to explain how to use the bpf_jit_disasm tool after this change. Signed-off-by: Jianlin Lv [...] diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c index 0a7a2870f111..8d36b4658076 100644 --- a/arch/x86/net/bpf_jit_comp32.c +++ b/arch/x86/net/bpf_jit_comp32.c @@ -2566,9 +2566,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) cond_resched(); } - if (bpf_jit_enable > 1) - bpf_jit_dump(prog->len, proglen, pass + 1, image); - if (image) { bpf_jit_binary_lock_ro(header); prog->bpf_func = (void *)image; diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index c8496c1142c9..990b1720c7a4 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -273,16 +273,8 @@ static int proc_dointvec_minmax_bpf_enable(struct ctl_table *table, int write, tmp.data = &jit_enable; ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); - if (write && !ret) { - if (jit_enable < 2 || - (jit_enable == 2 && bpf_dump_raw_ok(current_cred()))) { - *(int *)table->data = jit_enable; - if (jit_enable == 2) - pr_warn("bpf_jit_enable = 2 was set! 
NEVER use this in production, only for JIT debugging!\n"); - } else { - ret = -EPERM; - } - } + if (write && !ret) + *(int *)table->data = jit_enable; return ret; } @@ -389,7 +381,7 @@ static struct ctl_table net_core_table[] = { .extra2 = SYSCTL_ONE, # else .extra1 = SYSCTL_ZERO, - .extra2 = &two, + .extra2 = SYSCTL_ONE, # endif }, # ifdef CONFIG_HAVE_EBPF_JIT diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c index c8ae95804728..efa4b17ae016 100644 --- a/tools/bpf/bpf_jit_disasm.c +++ b/tools/bpf/bpf_jit_disasm.c @@ -7,7 +7,7 @@ * * To get the disassembly of the JIT code, do the following: * - * 1) `echo 2 > /proc/sys/net/core/bpf_jit_enable` + * 1) Insert bpf_jit_dump() and recompile the kernel to output the JITed image into the log Hmm, if we remove bpf_jit_dump(), the next drive-by cleanup patch will be thrown at bpf@vger stating that bpf_jit_dump() has no in-tree users and should be removed. Maybe we should be removing bpf_jit_disasm.c along with it as well as bpf_jit_dump() itself ... I guess if it's ever needed in those rare occasions for JIT debugging we can resurrect it from old kernels just locally. But yeah, bpftool's jit dump should suffice for the vast majority of use cases. There was a recent set for ppc32 jit which was merged into the ppc tree which will create a merge conflict with this one [0]. So we would need a rebase and maybe take it during the merge window once the ppc32 set has landed.. [0] https://lore.kernel.org/bpf/cover.1616430991.git.christophe.le...@csgroup.eu/ * 2) Load a BPF filter (e.g. `tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24`) * 3) Run e.g. 
`bpf_jit_disasm -o` to read out the last JIT code * diff --git a/tools/bpf/bpftool/feature.c b/tools/bpf/bpftool/feature.c index 40a88df275f9..98c7eec2923f 100644 --- a/tools/bpf/bpftool/feature.c +++ b/tools/bpf/bpftool/feature.c @@ -203,9 +203,6 @@ static void probe_jit_enable(void) case 1: printf("JIT compiler is enabled\n"); break; - case 2: - printf("JIT compiler is enabled with debugging traces in kernel logs\n"); - break; This would still need to be there for older kernels ... case -1: printf("Unable to retrieve JIT-compiler status\n"); break;
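The sysctl change in the patch above narrows the accepted range of bpf_jit_enable from 0..2 down to 0..1. A minimal sketch of the before/after write validation, in Python for illustration (the function name and flags are illustrative, not the kernel's):

```python
def valid_jit_enable(val, jit_always_on=False, removed_mode2=True):
    # Before the patch, writers could set 2 to get JIT debug dumps in the
    # kernel log; after it, only 0 (off) and 1 (on) are accepted.
    # With CONFIG_BPF_JIT_ALWAYS_ON, the JIT cannot be turned off at all.
    lo = 1 if jit_always_on else 0
    hi = 1 if removed_mode2 else 2
    return lo <= val <= hi

assert valid_jit_enable(1)                          # still fine
assert not valid_jit_enable(2)                      # rejected after the patch
assert valid_jit_enable(2, removed_mode2=False)     # old behavior
```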
Re: [PATCH] powerpc: net: bpf_jit_comp: Fix misuse of fallthrough
On 9/28/20 11:00 AM, zhe...@windriver.com wrote: From: He Zhe A user-defined label following "fallthrough" is not considered by GCC and causes a build failure:

kernel-source/include/linux/compiler_attributes.h:208:41: error: attribute 'fallthrough' not preceding a case label or default label [-Werror]
  208 | # define fallthrough __attribute__((__fallthrough__))
      |                      ^

Signed-off-by: He Zhe

Applied, thanks! I've also added a Fixes tag with df561f6688fe ("treewide: Use fallthrough pseudo-keyword"), which added the bug.
Re: [PATCH] powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc again
On 5/28/20 2:23 PM, Michael Ellerman wrote: Petr Mladek writes: On Thu 2020-05-28 11:03:43, Michael Ellerman wrote: Petr Mladek writes: The commit 0ebeea8ca8a4d1d453a ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work") meant that the bpf_probe_read{, str}() functions were no longer available on architectures where the same logical address might have different content in the kernel and user memory mappings. These architectures should use the probe_read_{user,kernel}_str helpers. For backward compatibility, the problematic functions are still available on architectures where the user and kernel address spaces do not overlap. This is defined by CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE. At the moment, these backward-compatible functions are enabled only on x86_64, arm, and arm64. Let's do it also on powerpc, which has a non-overlapping address space as well. Signed-off-by: Petr Mladek This seems like it should have a Fixes: tag and go into v5.7? Good point: Fixes: 0ebeea8ca8a4d1d4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work") And yes, it should ideally go into v5.7, either directly or via stable. Should I resend the patch with Fixes and Cc: sta...@vger.kernel.org # v5.7 lines, please? If it goes into v5.7 then it doesn't need a Cc: stable, and I guess a Fixes: tag is nice to have but not so important as it already mentions the commit that caused the problem. So a resend probably isn't necessary. Acked-by: Michael Ellerman Daniel, can you pick this up, or should I? Yeah, I'll take it into the bpf tree for v5.7. Thanks everyone, Daniel
Re: [PATCH v2 1/4] powerpc/64s: implement probe_kernel_read/write without touching AMR
Hey Nicholas, On 4/7/20 6:01 AM, Nicholas Piggin wrote: Nicholas Piggin's on April 3, 2020 9:05 pm: Christophe Leroy's on April 3, 2020 8:31 pm: Le 03/04/2020 à 11:35, Nicholas Piggin a écrit : There is no need to allow user accesses when probing kernel addresses. I just discovered the following commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=75a1a607bb7e6d918be3aca11ec2214a275392f4 This commit adds probe_kernel_read_strict() and probe_kernel_write_strict(). When reading the commit log, I understand that probe_kernel_read() may be used to access some user memory. Which will not work anymore with your patch. Hmm, I looked at _strict but obviously not hard enough. Good catch. I don't think probe_kernel_read() should ever access user memory, the comment certainly says it doesn't, but that patch sort of implies that they do. I think it's wrong. The non-_strict maybe could return userspace data to you if you did pass a user address? I don't see why that shouldn't just be disallowed always though. And if the _strict version is required to be safe, then it seems like a bug or security issue to just allow everyone that doesn't explicitly override it to use the default implementation. Also, the way the weak linkage is done in that patch, means parisc and um archs that were previously overriding probe_kernel_read() now get the default probe_kernel_read_strict(), which would be wrong for them. The changelog in commit 75a1a607bb7 makes it a bit clearer. If the non-_strict variant is used on non-kernel addresses, then it might not return -EFAULT or it might cause a kernel warning. The _strict variant is supposed to be usable with any address and it will return -EFAULT if it was not a valid and mapped kernel address. The non-_strict variant can not portably access user memory because it uses KERNEL_DS, and its documentation says its only for kernel pointers. So powerpc should be fine to run that under KUAP AFAIKS. 
I don't know why the _strict behaviour is not just made default, but the implementation of it does seem to be broken on the archs that override the non-_strict variant. Yeah, we should make it default and only add a "opt out" for the old legacy cases; there was also same discussion started over here just recently [0]. Thanks, Daniel [0] https://lore.kernel.org/lkml/20200403133533.ga3...@infradead.org/T/
Re: [PATCH] libbpf: fix readelf output parsing on powerpc with recent binutils
On Mon, Dec 02, 2019 at 04:53:26PM +1100, Michael Ellerman wrote: > Aurelien Jarno writes: > > On powerpc with recent versions of binutils, readelf outputs an extra > > field when dumping the symbols of an object file. For example: > > > > 35: 0000000000000838 96 FUNC LOCAL DEFAULT [<localentry>: 8] 1 btf_is_struct > > > > The extra "[<localentry>: 8]" field prevents the GLOBAL_SYM_COUNT variable > > from being computed correctly and causes the checkabi target to fail. > > > > Fix that by looking for the symbol name in the last field instead of the > > 8th one. This way it should also cope with future extra fields. > > > > Signed-off-by: Aurelien Jarno > > --- > > tools/lib/bpf/Makefile | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > Thanks for fixing that, it's been on my very long list of test failures > for a while. > > Tested-by: Michael Ellerman Looks good & also continues to work on x86. Applied, thanks!
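The idea behind the fix, taking the last whitespace-separated field instead of a fixed column, can be illustrated outside the Makefile. A small sketch (sample readelf lines are abbreviated and hypothetical):

```python
def symbol_name(readelf_line):
    # The symbol name is always the last whitespace-separated field,
    # regardless of optional extras such as the ppc64 "[<localentry>: 8]"
    # annotation that newer binutils inserts before it.
    return readelf_line.split()[-1]

old_fmt = "35: 0000000000000838 96 FUNC LOCAL DEFAULT 1 btf_is_struct"
new_fmt = "35: 0000000000000838 96 FUNC LOCAL DEFAULT [<localentry>: 8] 1 btf_is_struct"

# A fixed 8th column would break on new_fmt; the last field works for both.
assert symbol_name(old_fmt) == symbol_name(new_fmt) == "btf_is_struct"
```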
Re: [PATCH] bpf: handle 32-bit zext during constant blinding
On 8/21/19 9:23 PM, Naveen N. Rao wrote: Since BPF constant blinding is performed after the verifier pass, the ALU32 instructions inserted for doubleword immediate loads don't have a corresponding zext instruction. This is causing a kernel oops on powerpc and can be reproduced by running 'test_cgroup_storage' with bpf_jit_harden=2. Fix this by emitting BPF_ZEXT during constant blinding if prog->aux->verifier_zext is set. Fixes: a4b1d3c1ddf6cb ("bpf: verifier: insert zero extension according to analysis result") Reported-by: Michael Ellerman Signed-off-by: Naveen N. Rao LGTM, applied to bpf, thanks!
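The underlying issue in the patch above can be modeled with plain integers: a 32-bit subregister write must clear the upper half when the verifier expects zero extension, but the ALU32 instructions inserted by constant blinding lacked that zext. A rough model of the register semantics (not the JIT code itself):

```python
MASK32 = (1 << 32) - 1

def alu32_mov(reg64, imm32, emit_zext):
    # Model a 32-bit subregister write. On architectures without implicit
    # zero extension, the upper 32 bits keep whatever stale value was there
    # unless an explicit zext instruction is emitted after the ALU32 op.
    low = imm32 & MASK32
    if emit_zext:
        return low
    return (reg64 & ~MASK32) | low

stale = 0xDEAD_0000_0000_0000          # leftover upper bits in the register
assert alu32_mov(stale, 0x1234, emit_zext=True) == 0x1234   # correct
assert alu32_mov(stale, 0x1234, emit_zext=False) != 0x1234  # the bug class
```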
Re: [PATCH 0/2] powerpc/bpf: DIV64 instruction fix
On 06/12/2019 08:51 PM, Naveen N. Rao wrote: > The first patch updates DIV64 overflow tests to properly detect error > conditions. The second patch fixes powerpc64 JIT to generate the proper > unsigned division instruction for BPF_ALU64. > > - Naveen > > Naveen N. Rao (2): > bpf: fix div64 overflow tests to properly detect errors > powerpc/bpf: use unsigned division instruction for 64-bit operations > > arch/powerpc/include/asm/ppc-opcode.h | 1 + > arch/powerpc/net/bpf_jit.h | 2 +- > arch/powerpc/net/bpf_jit_comp64.c | 8 > .../testing/selftests/bpf/verifier/div_overflow.c | 14 ++ > 4 files changed, 16 insertions(+), 9 deletions(-) > LGTM, applied to bpf, thanks!
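Why the instruction choice in the second patch matters: BPF_ALU64 BPF_DIV is defined as an unsigned division, so a signed divide instruction produces a different quotient whenever the dividend has its top bit set. Modeled with Python integers:

```python
U64 = 1 << 64

def to_signed(v):
    # Reinterpret a 64-bit pattern as a two's-complement signed value.
    return v - U64 if v >> 63 else v

x = U64 - 2                    # 0xFFFF...FFFE; reads as -2 when signed
unsigned_q = x // 2            # what BPF_ALU64 BPF_DIV must produce
signed_q = to_signed(x) // 2   # what a signed divide would produce

assert unsigned_q == (1 << 63) - 1   # a huge positive quotient
assert signed_q == -1                # a completely different result
```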
Re: [PATCH] powerpc: bpf: Fix generation of load/store DW instructions
On 03/15/2019 03:51 PM, Naveen N. Rao wrote: > Yauheni Kaliuta pointed out that PTR_TO_STACK store/load verifier test > was failing on powerpc64 BE, and rightfully indicated that the PPC_LD() > macro is not masking away the last two bits of the offset per the ISA, > resulting in the generation of 'lwa' instruction instead of the intended > 'ld' instruction. > > Segher also pointed out that we can't simply mask away the last two bits > as that will result in loading/storing from/to a memory location that > was not intended. > > This patch addresses this by using ldx/stdx if the offset is not > word-aligned. We load the offset into a temporary register (TMP_REG_2) > and use that as the index register in a subsequent ldx/stdx. We fix > PPC_LD() macro to mask off the last two bits, but enhance PPC_BPF_LL() > and PPC_BPF_STL() to factor in the offset value and generate the proper > instruction sequence. We also convert all existing users of PPC_LD() and > PPC_STD() to use these macros. All existing uses of these macros have > been audited to ensure that TMP_REG_2 can be clobbered. > > Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended > BPF") > Cc: sta...@vger.kernel.org # v4.9+ > > Reported-by: Yauheni Kaliuta > Signed-off-by: Naveen N. Rao Applied, thanks!
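The selection logic described above can be sketched abstractly: the DS-form `ld`/`std` instructions can only encode offsets whose low two bits are zero, so a non-word-aligned offset has to go through a temporary register and the indexed form. Mnemonics below are illustrative shorthand, not the actual emitter:

```python
def emit_load_dw(off):
    # DS-form 'ld' encodes a 14-bit offset shifted left by 2, i.e. the
    # low two bits must be zero; otherwise, load the offset into a temp
    # register and use indexed 'ldx' instead of silently masking bits.
    if off & 3 == 0:
        return [f"ld rD, {off}(rA)"]
    return [f"li rTMP, {off}", "ldx rD, rA, rTMP"]

assert len(emit_load_dw(16)) == 1   # word-aligned offset: plain ld
assert len(emit_load_dw(18)) == 2   # unaligned offset: li + ldx sequence
```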
Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K
On 12/10/2018 06:27 PM, Michael Roth wrote: > Quoting Daniel Borkmann (2018-12-10 08:26:31) >> On 12/07/2018 04:36 PM, Michael Roth wrote: >>> Quoting Michael Ellerman (2018-12-07 06:31:13) >>>> Michael Roth writes: >>>> >>>>> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF >>>>> JIT allocations. At compile time it defaults to PAGE_SIZE * 40000, >>>>> and is adjusted again at init time if MODULES_VADDR is defined. >>>>> >>>>> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with >>>> >>>> But maybe it should be, I don't know why we don't define it. >>>> >>>>> the compile-time default at boot-time, which is 0x9c400000 when >>>>> using 64K page size. This overflows the signed 32-bit bpf_jit_limit >>>>> value: >>>>> >>>>> root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit >>>>> -1673527296 >>>>> >>>>> and can cause various unexpected failures throughout the network >>>>> stack. In one case `strace dhclient eth0` reported: >>>>> >>>>> setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, >>>>> filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524) >>>>> >>>>> and similar failures can be seen with tools like tcpdump. This doesn't >>>>> always reproduce however, and I'm not sure why. The more consistent >>>>> failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host >>>>> would time out on systemd/netplan configuring a virtio-net NIC with no >>>>> noticeable errors in the logs. >>>>> >>>>> Fix this by limiting the compile-time default for bpf_jit_limit to >>>>> INT_MAX. >>>> >>>> INT_MAX is a lot more than (4k * 40000), so I guess I'm not clear on >>>> whether we should be using PAGE_SIZE here at all. I guess each BPF >>>> program uses at least one page is the thinking? >>> >>> That seems to be the case, at least, the max number of minimum-sized >>> allocations would be less on ppc64 since the allocations are always at >>> least PAGE_SIZE in size. 
The init-time default also limits to INT_MAX, >>> so it seemed consistent to do that here too. >>> >>>> >>>> Thanks for tracking this down. For some reason none of my ~10 test boxes >>>> have hit this, perhaps I don't have new enough userspace? >>> >>> I'm not too sure, I would've thought things like the dhclient case in >>> the commit log would fail every time, but sometimes I need to reboot the >>> guest before I start seeing the behavior. Maybe there's something special >>> about when JIT allocations are actually done that can affect >>> reproducibility? >>> >>> In my case at least the virtio-net networking timeout was consistent >>> enough for a bisect, but maybe it depends on the specific network >>> configuration (single NIC, basic DHCP through netplan/systemd in my case). >>> >>>> >>>> You don't mention why you needed to add BPF_MIN(), I assume because the >>>> kernel version of min() has gotten too complicated to work here? >>> >>> I wasn't sure if it was safe here or not, so I tried looking at other >>> users and came across: >>> >>> mm/vmalloc.c:777:#define VMAP_MIN(x, y) ((x) < (y) ? (x) : >>> (y)) /* can't use min() */ >>> >>> I'm not sure what the reasoning was (or whether it still applies), but I >>> figured it was safer to do the same here. Maybe Nick still recalls? >>> >>>> >>>> Daniel I assume you'll merge this via your tree? >>>> >>>> cheers >>>> >>>>> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv >>>>> allocations") >>>>> Cc: linuxppc-...@ozlabs.org >>>>> Cc: Daniel Borkmann >>>>> Cc: Sandipan Das >>>>> Cc: Alexei Starovoitov >>>>> Signed-off-by: Michael Roth >> >> Thanks for the reports / fixes and sorry for my late reply (bit too >> swamped last week), some more thoughts below. >> >>>>> kernel/bpf/core.c | 3 ++- >>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c >>>>> index b1a3545d0ec8..5
Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K
On 12/07/2018 04:36 PM, Michael Roth wrote: > Quoting Michael Ellerman (2018-12-07 06:31:13) >> Michael Roth writes: >> >>> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF >>> JIT allocations. At compile time it defaults to PAGE_SIZE * 40000, >>> and is adjusted again at init time if MODULES_VADDR is defined. >>> >>> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with >> >> But maybe it should be, I don't know why we don't define it. >> >>> the compile-time default at boot-time, which is 0x9c400000 when >>> using 64K page size. This overflows the signed 32-bit bpf_jit_limit >>> value: >>> >>> root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit >>> -1673527296 >>> >>> and can cause various unexpected failures throughout the network >>> stack. In one case `strace dhclient eth0` reported: >>> >>> setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, >>> 16) = -1 ENOTSUPP (Unknown error 524) >>> >>> and similar failures can be seen with tools like tcpdump. This doesn't >>> always reproduce however, and I'm not sure why. The more consistent >>> failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host >>> would time out on systemd/netplan configuring a virtio-net NIC with no >>> noticeable errors in the logs. >>> >>> Fix this by limiting the compile-time default for bpf_jit_limit to >>> INT_MAX. >> >> INT_MAX is a lot more than (4k * 40000), so I guess I'm not clear on >> whether we should be using PAGE_SIZE here at all. I guess each BPF >> program uses at least one page is the thinking? > > That seems to be the case, at least, the max number of minimum-sized > allocations would be less on ppc64 since the allocations are always at > least PAGE_SIZE in size. The init-time default also limits to INT_MAX, > so it seemed consistent to do that here too. > >> >> Thanks for tracking this down. For some reason none of my ~10 test boxes >> have hit this, perhaps I don't have new enough userspace? 
> > I'm not too sure, I would've thought things like the dhclient case in > the commit log would fail every time, but sometimes I need to reboot the > guest before I start seeing the behavior. Maybe there's something special > about when JIT allocations are actually done that can affect > reproducibility? > > In my case at least the virtio-net networking timeout was consistent > enough for a bisect, but maybe it depends on the specific network > configuration (single NIC, basic DHCP through netplan/systemd in my case). > >> >> You don't mention why you needed to add BPF_MIN(), I assume because the >> kernel version of min() has gotten too complicated to work here? > > I wasn't sure if it was safe here or not, so I tried looking at other > users and came across: > > mm/vmalloc.c:777:#define VMAP_MIN(x, y) ((x) < (y) ? (x) : (y)) > /* can't use min() */ > > I'm not sure what the reasoning was (or whether it still applies), but I > figured it was safer to do the same here. Maybe Nick still recalls? > >> >> Daniel I assume you'll merge this via your tree? >> >> cheers >> >>> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv >>> allocations") >>> Cc: linuxppc-...@ozlabs.org >>> Cc: Daniel Borkmann >>> Cc: Sandipan Das >>> Cc: Alexei Starovoitov >>> Signed-off-by: Michael Roth Thanks for the reports / fixes and sorry for my late reply (bit too swamped last week), some more thoughts below. >>> kernel/bpf/core.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c >>> index b1a3545d0ec8..55de4746cdfd 100644 >>> --- a/kernel/bpf/core.c >>> +++ b/kernel/bpf/core.c >>> @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp) >>> } >>> >>> #ifdef CONFIG_BPF_JIT >>> -# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 40000) >>> +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y)) >>> +# define BPF_JIT_LIMIT_DEFAULT BPF_MIN((PAGE_SIZE * 40000), INT_MAX) >>> >>> /* All BPF JIT sysctl knobs here. 
*/ >>> int bpf_jit_enable __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON); I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT define also given for 4.21 arm64 will have its own dedicated area for JIT allocations where neither the above limit nor the MOD
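The overflow discussed in this thread is easy to reproduce arithmetically. A quick check, assuming the PAGE_SIZE * 40000 compile-time default from commit ede95a63b5e8:

```python
PAGE_SIZE = 64 * 1024                 # 64K pages, the ppc64 configuration
INT_MAX = 2**31 - 1

limit = PAGE_SIZE * 40000             # the compile-time default
assert limit == 0x9C400000            # doesn't fit in a signed 32-bit int

# Reinterpret the low 32 bits as a signed value, as storing it in an
# 'int' would do.
as_int32 = limit - 2**32 if limit > INT_MAX else limit
assert as_int32 == -1673527296        # the value seen in /proc in the report

# The proposed fix: clamp the default to INT_MAX at compile time.
assert min(limit, INT_MAX) == INT_MAX
```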
Re: [PATCH bpf] bpf: powerpc64: optimize JIT passes for bpf function calls
On 12/03/2018 02:26 PM, Sandipan Das wrote: > Hi Daniel, > > On 03/12/18 6:18 PM, Daniel Borkmann wrote: >> >> Thanks for the patch, just to clarify, it's targeted at bpf-next and >> not bpf, correct? > > This patch is targeted at the bpf tree. Ok, thanks for clarifying, applied to bpf!
Re: [PATCH bpf] bpf: powerpc64: optimize JIT passes for bpf function calls
Hi Sandipan, On 12/03/2018 01:21 PM, Sandipan Das wrote: > Once the JITed images for each function in a multi-function program > are generated after the first three JIT passes, we only need to fix > the target address for the branch instruction corresponding to each > bpf-to-bpf function call. > > This introduces the following optimizations for reducing the work > done by the JIT compiler when handling multi-function programs: > > [1] Instead of doing two extra passes to fix the bpf function calls, > do just one as that would be sufficient. > > [2] During the extra pass, only overwrite the instruction sequences > for the bpf-to-bpf function calls as everything else would still > remain exactly the same. This also reduces the number of writes > to the JITed image. > > [3] Do not regenerate the prologue and the epilogue during the extra > pass as that would be redundant. > > Signed-off-by: Sandipan Das Thanks for the patch, just to clarify, it's targeted at bpf-next and not bpf, correct? Thanks, Daniel
Re: [PATCH net-next 2/6] net/bpf: split VLAN_PRESENT bit handling from VLAN_TCI
On 11/10/2018 07:58 PM, Michał Mirosław wrote: > Signed-off-by: Michał Mirosław Why you have empty commit messages for non-trivial changes like this in 4 out of 6 of your patches ... How was it tested on the JITs you were changing? Did you test on both, big and little endian machines? > --- > net/core/filter.c | 40 +--- > 1 file changed, 21 insertions(+), 19 deletions(-) > > diff --git a/net/core/filter.c b/net/core/filter.c > index e521c5ebc7d1..c151b906df53 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -296,22 +296,21 @@ static u32 convert_skb_access(int skb_field, int > dst_reg, int src_reg, > break; > > case SKF_AD_VLAN_TAG: > - case SKF_AD_VLAN_TAG_PRESENT: > BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_tci) != 2); > - BUILD_BUG_ON(VLAN_TAG_PRESENT != 0x1000); > > /* dst_reg = *(u16 *) (src_reg + offsetof(vlan_tci)) */ > *insn++ = BPF_LDX_MEM(BPF_H, dst_reg, src_reg, > offsetof(struct sk_buff, vlan_tci)); > - if (skb_field == SKF_AD_VLAN_TAG) { > - *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg, > - ~VLAN_TAG_PRESENT); > - } else { > - /* dst_reg >>= 12 */ > - *insn++ = BPF_ALU32_IMM(BPF_RSH, dst_reg, 12); > - /* dst_reg &= 1 */ > +#ifdef VLAN_TAG_PRESENT > + *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg, ~VLAN_TAG_PRESENT); > +#endif > + break; > + case SKF_AD_VLAN_TAG_PRESENT: > + *insn++ = BPF_LDX_MEM(BPF_B, dst_reg, src_reg, > PKT_VLAN_PRESENT_OFFSET()); > + if (PKT_VLAN_PRESENT_BIT) > + *insn++ = BPF_ALU32_IMM(BPF_RSH, dst_reg, > PKT_VLAN_PRESENT_BIT); > + if (PKT_VLAN_PRESENT_BIT < 7) > *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg, 1); > - } > break; > } > > @@ -6140,19 +6139,22 @@ static u32 bpf_convert_ctx_access(enum > bpf_access_type type, > break; > > case offsetof(struct __sk_buff, vlan_present): > + *target_size = 1; > + *insn++ = BPF_LDX_MEM(BPF_B, si->dst_reg, si->src_reg, > + PKT_VLAN_PRESENT_OFFSET()); > + if (PKT_VLAN_PRESENT_BIT) > + *insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, > PKT_VLAN_PRESENT_BIT); > + if (PKT_VLAN_PRESENT_BIT < 7) 
> + *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, 1); > + break; > + > case offsetof(struct __sk_buff, vlan_tci): > - BUILD_BUG_ON(VLAN_TAG_PRESENT != 0x1000); > - > *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg, > bpf_target_off(struct sk_buff, vlan_tci, > 2, >target_size)); > - if (si->off == offsetof(struct __sk_buff, vlan_tci)) { > - *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, > - ~VLAN_TAG_PRESENT); > - } else { > - *insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, 12); > - *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, 1); > - } > +#ifdef VLAN_TAG_PRESENT > + *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, > ~VLAN_TAG_PRESENT); > +#endif > break; > > case offsetof(struct __sk_buff, cb[0]) ... >
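The conditional emission in the patch above generalizes cleanly: after loading the byte that holds the flag, shift only when the flag isn't bit 0, and mask only when it isn't the top bit of the byte. A model of the emitted sequence (instruction names are illustrative shorthand, not real BPF macros):

```python
def emit_vlan_present(bit):
    # bit plays the role of PKT_VLAN_PRESENT_BIT: the flag's position
    # within the byte at PKT_VLAN_PRESENT_OFFSET.
    insns = ["ldx_b dst, src, PKT_VLAN_PRESENT_OFFSET"]
    if bit:                      # no shift needed when the flag is bit 0
        insns.append(f"rsh32 dst, {bit}")
    if bit < 7:                  # no mask needed when the flag is the MSB
        insns.append("and32 dst, 1")
    return insns

assert len(emit_vlan_present(0)) == 2   # load + mask
assert len(emit_vlan_present(4)) == 3   # load + shift + mask
assert len(emit_vlan_present(7)) == 2   # load + shift; MSB needs no mask
```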
Re: [PATCH net-next 0/6] Remove VLAN.CFI overload
On 11/10/2018 10:47 PM, David Miller wrote: > From: Michał Mirosław > Date: Sat, 10 Nov 2018 19:58:29 +0100 > >> Fix BPF code/JITs to allow for separate VLAN_PRESENT flag >> storage and finally move the flag to separate storage in skbuff. >> >> This is the final step to make VLAN.CFI transparent to the core Linux >> networking stack. >> >> An #ifdef is introduced temporarily to mark fragments masking >> VLAN_TAG_PRESENT. This is removed altogether in the final patch. > > Daniel and Alexei, please review. Sorry, was completely swamped due to Plumbers, just getting to it now.
Re: [PATCH 1/4] bpf: account for freed JIT allocations in arch code
On 11/17/2018 07:57 PM, Ard Biesheuvel wrote: > Commit ede95a63b5e84 ("bpf: add bpf_jit_limit knob to restrict unpriv > allocations") added a call to bpf_jit_uncharge_modmem() to the routine > bpf_jit_binary_free() which is called from the __weak bpf_jit_free(). > This function is overridden by arches, some of which do not call > bpf_jit_binary_free() to release the memory, and so the released > memory is not accounted for, potentially leading to spurious allocation > failures. > > So replace the direct calls to module_memfree() in the arch code with > calls to bpf_jit_binary_free(). Sorry but this patch is completely buggy, and above description on the accounting incorrect as well. Looks like this patch was not tested at all. The below cBPF JITs that use module_memfree() which you replace with bpf_jit_binary_free() are using module_alloc() internally to get the JIT image buffer ... > Signed-off-by: Ard Biesheuvel > --- > arch/mips/net/bpf_jit.c | 2 +- > arch/powerpc/net/bpf_jit_comp.c | 2 +- > arch/powerpc/net/bpf_jit_comp64.c | 5 + > arch/sparc/net/bpf_jit_comp_32.c | 2 +- > 4 files changed, 4 insertions(+), 7 deletions(-) > > diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c > index 4d8cb9bb8365..1b69897274a1 100644 > --- a/arch/mips/net/bpf_jit.c > +++ b/arch/mips/net/bpf_jit.c > @@ -1264,7 +1264,7 @@ void bpf_jit_compile(struct bpf_prog *fp) > void bpf_jit_free(struct bpf_prog *fp) > { > if (fp->jited) > - module_memfree(fp->bpf_func); > + bpf_jit_binary_free(bpf_jit_binary_hdr(fp)); > > bpf_prog_unlock_free(fp); > } > diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c > index d5bfe24bb3b5..a1ea1ea6b40d 100644 > --- a/arch/powerpc/net/bpf_jit_comp.c > +++ b/arch/powerpc/net/bpf_jit_comp.c > @@ -683,7 +683,7 @@ void bpf_jit_compile(struct bpf_prog *fp) > void bpf_jit_free(struct bpf_prog *fp) > { > if (fp->jited) > - module_memfree(fp->bpf_func); > + bpf_jit_binary_free(bpf_jit_binary_hdr(fp)); > > 
bpf_prog_unlock_free(fp); > } > diff --git a/arch/powerpc/net/bpf_jit_comp64.c > b/arch/powerpc/net/bpf_jit_comp64.c > index 50b129785aee..84c8f013a6c6 100644 > --- a/arch/powerpc/net/bpf_jit_comp64.c > +++ b/arch/powerpc/net/bpf_jit_comp64.c > @@ -1024,11 +1024,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog > *fp) > /* Overriding bpf_jit_free() as we don't set images read-only. */ > void bpf_jit_free(struct bpf_prog *fp) > { > - unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK; > - struct bpf_binary_header *bpf_hdr = (void *)addr; > - > if (fp->jited) > - bpf_jit_binary_free(bpf_hdr); > + bpf_jit_binary_free(bpf_jit_binary_hdr(fp)); > > bpf_prog_unlock_free(fp); > } > diff --git a/arch/sparc/net/bpf_jit_comp_32.c > b/arch/sparc/net/bpf_jit_comp_32.c > index a5ff88643d5c..01bda6bc9e7f 100644 > --- a/arch/sparc/net/bpf_jit_comp_32.c > +++ b/arch/sparc/net/bpf_jit_comp_32.c > @@ -759,7 +759,7 @@ cond_branch: f_offset = addrs[i + > filter[i].jf]; > void bpf_jit_free(struct bpf_prog *fp) > { > if (fp->jited) > - module_memfree(fp->bpf_func); > + bpf_jit_binary_free(bpf_jit_binary_hdr(fp)); > > bpf_prog_unlock_free(fp); > } >
Re: [PATCH bpf-next v4 02/10] bpf: powerpc64: pad function address loads with NOPs
On 05/24/2018 10:25 AM, Sandipan Das wrote: > On 05/24/2018 01:04 PM, Daniel Borkmann wrote: >> On 05/24/2018 08:56 AM, Sandipan Das wrote: >>> For multi-function programs, loading the address of a callee >>> function to a register requires emitting instructions whose >>> count varies from one to five depending on the nature of the >>> address. >>> >>> Since we come to know of the callee's address only before the >>> extra pass, the number of instructions required to load this >>> address may vary from what was previously generated. This can >>> make the JITed image grow or shrink. >>> >>> To avoid this, we should generate a constant five-instruction >>> when loading function addresses by padding the optimized load >>> sequence with NOPs. >>> >>> Signed-off-by: Sandipan Das <sandi...@linux.vnet.ibm.com> >>> --- >>> arch/powerpc/net/bpf_jit_comp64.c | 34 +++--- >>> 1 file changed, 23 insertions(+), 11 deletions(-) >>> >>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c >>> b/arch/powerpc/net/bpf_jit_comp64.c >>> index 1bdb1aff0619..e4582744a31d 100644 >>> --- a/arch/powerpc/net/bpf_jit_comp64.c >>> +++ b/arch/powerpc/net/bpf_jit_comp64.c >>> @@ -167,25 +167,37 @@ static void bpf_jit_build_epilogue(u32 *image, struct >>> codegen_context *ctx) >>> >>> static void bpf_jit_emit_func_call(u32 *image, struct codegen_context >>> *ctx, u64 func) >>> { >>> + unsigned int i, ctx_idx = ctx->idx; >>> + >>> + /* Load function address into r12 */ >>> + PPC_LI64(12, func); >>> + >>> + /* For bpf-to-bpf function calls, the callee's address is unknown >>> +* until the last extra pass. As seen above, we use PPC_LI64() to >>> +* load the callee's address, but this may optimize the number of >>> +* instructions required based on the nature of the address. 
>>> +* >>> +* Since we don't want the number of instructions emitted to change, >>> +* we pad the optimized PPC_LI64() call with NOPs to guarantee that >>> +* we always have a five-instruction sequence, which is the maximum >>> +* that PPC_LI64() can emit. >>> +*/ >>> + for (i = ctx->idx - ctx_idx; i < 5; i++) >>> + PPC_NOP(); >> >> By the way, I think you can still optimize this. The nops are not really >> needed in case of insn->src_reg != BPF_PSEUDO_CALL since the address of >> a normal BPF helper call will always be at a fixed location and known a >> priori. > > Ah, true. Thanks for pointing this out. There are a few other things that > we are planning to do for the ppc64 JIT compiler. Will put out a patch for > this with that series. Awesome, thanks Sandipan!
Re: [PATCH bpf-next v4 02/10] bpf: powerpc64: pad function address loads with NOPs
On 05/24/2018 08:56 AM, Sandipan Das wrote: > For multi-function programs, loading the address of a callee > function to a register requires emitting instructions whose > count varies from one to five depending on the nature of the > address. > > Since we come to know of the callee's address only before the > extra pass, the number of instructions required to load this > address may vary from what was previously generated. This can > make the JITed image grow or shrink. > > To avoid this, we should generate a constant five-instruction > sequence when loading function addresses by padding the optimized > load sequence with NOPs. > > Signed-off-by: Sandipan Das > --- > arch/powerpc/net/bpf_jit_comp64.c | 34 +++--- > 1 file changed, 23 insertions(+), 11 deletions(-) > > diff --git a/arch/powerpc/net/bpf_jit_comp64.c > b/arch/powerpc/net/bpf_jit_comp64.c > index 1bdb1aff0619..e4582744a31d 100644 > --- a/arch/powerpc/net/bpf_jit_comp64.c > +++ b/arch/powerpc/net/bpf_jit_comp64.c > @@ -167,25 +167,37 @@ static void bpf_jit_build_epilogue(u32 *image, struct > codegen_context *ctx) > > static void bpf_jit_emit_func_call(u32 *image, struct codegen_context *ctx, > u64 func) > { > + unsigned int i, ctx_idx = ctx->idx; > + > + /* Load function address into r12 */ > + PPC_LI64(12, func); > + > + /* For bpf-to-bpf function calls, the callee's address is unknown > + * until the last extra pass. As seen above, we use PPC_LI64() to > + * load the callee's address, but this may optimize the number of > + * instructions required based on the nature of the address. > + * > + * Since we don't want the number of instructions emitted to change, > + * we pad the optimized PPC_LI64() call with NOPs to guarantee that > + * we always have a five-instruction sequence, which is the maximum > + * that PPC_LI64() can emit. > + */ > + for (i = ctx->idx - ctx_idx; i < 5; i++) > + PPC_NOP(); By the way, I think you can still optimize this. 
The nops are not really needed in case of insn->src_reg != BPF_PSEUDO_CALL since the address of a normal BPF helper call will always be at a fixed location and known a priori. > #ifdef PPC64_ELF_ABI_v1 > - /* func points to the function descriptor */ > - PPC_LI64(b2p[TMP_REG_2], func); > - /* Load actual entry point from function descriptor */ > - PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_2], 0); > - /* ... and move it to LR */ > - PPC_MTLR(b2p[TMP_REG_1]); > /* >* Load TOC from function descriptor at offset 8. >* We can clobber r2 since we get called through a >* function pointer (so caller will save/restore r2) >* and since we don't use a TOC ourself. >*/ > - PPC_BPF_LL(2, b2p[TMP_REG_2], 8); > -#else > - /* We can clobber r12 */ > - PPC_FUNC_ADDR(12, func); > - PPC_MTLR(12); > + PPC_BPF_LL(2, 12, 8); > + /* Load actual entry point from function descriptor */ > + PPC_BPF_LL(12, 12, 0); > #endif > + > + PPC_MTLR(12); > PPC_BLRL(); > } > >
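The padding scheme under discussion can be sketched abstractly: however few instructions the optimized immediate load needs, pad to the worst case so that a later address fix-up never changes the image size. This is an illustrative model, not the ppc64 emitter; the chunk-counting heuristic is a crude stand-in for PPC_LI64()'s real lis/ori/sldi/oris/ori selection:

```python
MAX_LI64_INSNS = 5   # worst case for materializing a 64-bit immediate

def emit_li64(imm):
    # Crude model: roughly one instruction per non-zero 16-bit chunk,
    # and at least one instruction even for zero.
    chunks = [(imm >> s) & 0xFFFF for s in (48, 32, 16, 0)]
    n = sum(1 for c in chunks if c) or 1
    return [f"insn{i}" for i in range(n)]

def emit_func_addr(imm):
    # Pad the optimized sequence with NOPs so the emitted size is
    # constant regardless of the address's shape.
    insns = emit_li64(imm)
    insns += ["nop"] * (MAX_LI64_INSNS - len(insns))
    return insns

for imm in (0, 0x1234, 0xDEADBEEF, 0xC000_0000_0012_3456):
    assert len(emit_func_addr(imm)) == MAX_LI64_INSNS
```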
Re: [PATCH bpf-next v4 00/10] bpf: enhancements for multi-function programs
On 05/24/2018 08:56 AM, Sandipan Das wrote: > [1] Support for bpf-to-bpf function calls in the powerpc64 JIT compiler. > > [2] Provide a way for resolving function calls because of the way JITed > images are allocated in powerpc64. > > [3] Fix to get JITed instruction dumps for multi-function programs from > the bpf system call. > > [4] Fix for bpftool to show delimited multi-function JITed image dumps. > > v4: > - Incorporate review comments from Jakub. > - Fix JSON output for bpftool. > > v3: > - Change base tree tag to bpf-next. > - Incorporate review comments from Alexei, Daniel and Jakub. > - Make sure that the JITed image does not grow or shrink after >the last pass due to the way the instruction sequence used >to load a callee's address maybe optimized. > - Make additional changes to the bpf system call and bpftool to >make multi-function JITed dumps easier to correlate. > > v2: > - Incorporate review comments from Jakub. > > Sandipan Das (10): > bpf: support 64-bit offsets for bpf function calls > bpf: powerpc64: pad function address loads with NOPs > bpf: powerpc64: add JIT support for multi-function programs > bpf: get kernel symbol addresses via syscall > tools: bpf: sync bpf uapi header > tools: bpftool: resolve calls without using imm field > bpf: fix multi-function JITed dump obtained via syscall > bpf: get JITed image lengths of functions via syscall > tools: bpf: sync bpf uapi header > tools: bpftool: add delimiters to multi-function JITed dumps > > arch/powerpc/net/bpf_jit_comp64.c | 110 > ++ > include/uapi/linux/bpf.h | 4 ++ > kernel/bpf/syscall.c | 82 ++-- > kernel/bpf/verifier.c | 22 +--- > tools/bpf/bpftool/prog.c | 97 - > tools/bpf/bpftool/xlated_dumper.c | 14 +++-- > tools/bpf/bpftool/xlated_dumper.h | 3 ++ > tools/include/uapi/linux/bpf.h| 4 ++ > 8 files changed, 301 insertions(+), 35 deletions(-) Applied to bpf-next, thanks a lot Sandipan!
Re: [PATCH bpf-next v3 10/10] tools: bpftool: add delimiters to multi-function JITed dumps
On 05/23/2018 12:37 PM, Sandipan Das wrote: [...] > Other than that, for powerpc64, there is a problem with the way the > binutils disassembler code (in "opcodes/ppc-dis.c") passes arguments > to the callback fprintf_json(). > > In fprintf_json(), we always expect the va_list elements to resolve > to strings (char *). But for powerpc64, the register or immediate > operands are always passed as integers. So, when the code attempts > to resolve these operands using va_arg(ap, char *), bpftool crashes. > For now, I am using a workaround based on vsnprintf() but this does > not get the semantics correct for memory operands. You can probably > see that for the store instructions in the JSON dump above this. > > Daniel, > > Would it be okay if I send out a fix for this in a different series? I'm fine either way with regards to the fix. Feels like a portability bug in the binutils disassembler? We could probably have a feature test like in test-disassembler-four-args and select a workaround in bpftool based on that outcome. Thanks Sandipan! [1] tools/build/feature/test-disassembler-four-args.c
Re: [PATCH bpf-next v3 10/10] tools: bpftool: add delimiters to multi-function JITed dumps
On 05/22/2018 09:55 PM, Jakub Kicinski wrote: > On Tue, 22 May 2018 22:46:13 +0530, Sandipan Das wrote: >> +if (info.nr_jited_func_lens && info.jited_func_lens) { >> +struct kernel_sym *sym = NULL; >> +unsigned char *img = buf; >> +__u64 *ksyms = NULL; >> +__u32 *lens; >> +__u32 i; >> + >> +if (info.nr_jited_ksyms) { >> +kernel_syms_load(); >> +ksyms = (__u64 *) info.jited_ksyms; >> +} >> + >> +lens = (__u32 *) info.jited_func_lens; >> +for (i = 0; i < info.nr_jited_func_lens; i++) { >> +if (ksyms) { >> +sym = kernel_syms_search(, ksyms[i]); >> +if (sym) >> +printf("%s:\n", sym->name); >> +else >> +printf("%016llx:\n", ksyms[i]); >> +} >> + >> +disasm_print_insn(img, lens[i], opcodes, name); >> +img += lens[i]; >> +printf("\n"); >> +} >> +} else { > > The output doesn't seem to be JSON-compatible :( We try to make sure > all bpftool command can produce valid JSON when run with -j (or -p) > switch. > > Would it be possible to make each function a separate JSON object with > "name" and "insn" array? Would that work? Sandipan, could you take a look at this? Given there's json output today we should definitely try not to break it; presumably this would be one final respin of your series with this fixed. Thanks, Daniel
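One possible shape for the per-function JSON output Jakub asks for is sketched below; the field names are hypothetical, not the schema bpftool eventually settled on.

```json
[
  {
    "name": "bpf_prog_b811aab41a39ad3d_foo",
    "insns": [
      { "pc": "0x0", "operation": "nop" },
      { "pc": "0x8", "operation": "mflr", "operands": ["r0"] }
    ]
  }
]
```

Each subprogram becomes its own object, so a JSON consumer can correlate a function's name with its instruction array without parsing delimiters out of a flat dump.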
Re: [PATCH bpf v2 6/6] bpf: fix JITed dump for multi-function programs via syscall
On 05/21/2018 09:42 PM, Sandipan Das wrote: > On 05/18/2018 09:21 PM, Daniel Borkmann wrote: >> On 05/18/2018 02:50 PM, Sandipan Das wrote: >>> Currently, for multi-function programs, we cannot get the JITed >>> instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD >>> command. Because of this, userspace tools such as bpftool fail >>> to identify a multi-function program as being JITed or not. >>> >>> With the JIT enabled and the test program running, this can be >>> verified as follows: >>> >>> # cat /proc/sys/net/core/bpf_jit_enable >>> 1 >>> >>> Before applying this patch: >>> >>> # bpftool prog list >>> 1: kprobe name foo tag b811aab41a39ad3d gpl >>> loaded_at 2018-05-16T11:43:38+0530 uid 0 >>> xlated 216B not jited memlock 65536B >>> ... >>> >>> # bpftool prog dump jited id 1 >>> no instructions returned >>> >>> After applying this patch: >>> >>> # bpftool prog list >>> 1: kprobe name foo tag b811aab41a39ad3d gpl >>> loaded_at 2018-05-16T12:13:01+0530 uid 0 >>> xlated 216B jited 308B memlock 65536B >>> ... >> >> That's really nice! One comment inline below: >> >>> # bpftool prog dump jited id 1 >>> 0: nop >>> 4: nop >>> 8: mflrr0 >>> c: std r0,16(r1) >>> 10: stdur1,-112(r1) >>> 14: std r31,104(r1) >>> 18: addir31,r1,48 >>> 1c: li r3,10 >>> ... 
>>> >>> Signed-off-by: Sandipan Das <sandi...@linux.vnet.ibm.com> >>> --- >>> kernel/bpf/syscall.c | 38 -- >>> 1 file changed, 32 insertions(+), 6 deletions(-) >>> >>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c >>> index 54a72fafe57c..2430d159078c 100644 >>> --- a/kernel/bpf/syscall.c >>> +++ b/kernel/bpf/syscall.c >>> @@ -1896,7 +1896,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog >>> *prog, >>> struct bpf_prog_info info = {}; >>> u32 info_len = attr->info.info_len; >>> char __user *uinsns; >>> - u32 ulen; >>> + u32 ulen, i; >>> int err; >>> >>> err = check_uarg_tail_zero(uinfo, sizeof(info), info_len); >>> @@ -1922,7 +1922,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog >>> *prog, >>> ulen = min_t(u32, info.nr_map_ids, ulen); >>> if (ulen) { >>> u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids); >>> - u32 i; >>> >>> for (i = 0; i < ulen; i++) >>> if (put_user(prog->aux->used_maps[i]->id, >>> @@ -1970,13 +1969,41 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog >>> *prog, >>> * for offload. 
>>> */ >>> ulen = info.jited_prog_len; >>> - info.jited_prog_len = prog->jited_len; >>> + if (prog->aux->func_cnt) { >>> + info.jited_prog_len = 0; >>> + for (i = 0; i < prog->aux->func_cnt; i++) >>> + info.jited_prog_len += prog->aux->func[i]->jited_len; >>> + } else { >>> + info.jited_prog_len = prog->jited_len; >>> + } >>> + >>> if (info.jited_prog_len && ulen) { >>> if (bpf_dump_raw_ok()) { >>> uinsns = u64_to_user_ptr(info.jited_prog_insns); >>> ulen = min_t(u32, info.jited_prog_len, ulen); >>> - if (copy_to_user(uinsns, prog->bpf_func, ulen)) >>> - return -EFAULT; >>> + >>> + /* for multi-function programs, copy the JITed >>> +* instructions for all the functions >>> +*/ >>> + if (prog->aux->func_cnt) { >>> + u32 len, free; >>> + u8 *img; >>> + >>> + free = ulen; >>> + for (i = 0; i < prog->aux->func_cnt; i++) { >>> + len = prog->aux->func[i]->jited_len; >>> + img = (u8 *) >>> prog->aux->func[i]->bpf_func; >>> + if (len > free) >>> + break; >>> + if (copy_to_user(uinsns, img, len)) >>> + return -EFAULT; >>> + uinsns += len; >>> + free -= len; >> >> Is there any way we can introduce a delimiter between the different >> images such that they could be more easily correlated with the call >> from the main (or other sub-)program instead of having one contiguous >> dump blob? > > Can we have another member in bpf_prog_info that points to a list of the > lengths of the > JITed images for each subprogram? We can use this information to split up the > dump. Seems okay to me. Thanks, Daniel
Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
On 05/18/2018 06:05 PM, Naveen N. Rao wrote: > Daniel Borkmann wrote: >> On 05/18/2018 02:50 PM, Sandipan Das wrote: >>> This adds support for bpf-to-bpf function calls in the powerpc64 >>> JIT compiler. The JIT compiler converts the bpf call instructions >>> to native branch instructions. After a round of the usual passes, >>> the start addresses of the JITed images for the callee functions >>> are known. Finally, to fixup the branch target addresses, we need >>> to perform an extra pass. >>> >>> Because of the address range in which JITed images are allocated >>> on powerpc64, the offsets of the start addresses of these images >>> from __bpf_call_base are as large as 64 bits. So, for a function >>> call, we cannot use the imm field of the instruction to determine >>> the callee's address. Instead, we use the alternative method of >>> getting it from the list of function addresses in the auxillary >>> data of the caller by using the off field as an index. >>> >>> Signed-off-by: Sandipan Das <sandi...@linux.vnet.ibm.com> >>> --- >>> arch/powerpc/net/bpf_jit_comp64.c | 79 >>> ++- >>> 1 file changed, 69 insertions(+), 10 deletions(-) >>> >>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c >>> b/arch/powerpc/net/bpf_jit_comp64.c >>> index 1bdb1aff0619..25939892d8f7 100644 >>> --- a/arch/powerpc/net/bpf_jit_comp64.c >>> +++ b/arch/powerpc/net/bpf_jit_comp64.c >>> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct >>> codegen_context *ctx, u32 >>> /* Assemble the body code between the prologue & epilogue */ >>> static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, >>> struct codegen_context *ctx, >>> - u32 *addrs) >>> + u32 *addrs, bool extra_pass) >>> { >>> const struct bpf_insn *insn = fp->insnsi; >>> int flen = fp->len; >>> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, >>> u32 *image, >>> break; >>> >>> /* >>> - * Call kernel helper >>> + * Call kernel helper or bpf function >>> */ >>> case BPF_JMP | BPF_CALL: 
>>> ctx->seen |= SEEN_FUNC; >>> - func = (u8 *) __bpf_call_base + imm; >>> + >>> + /* bpf function call */ >>> + if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass) >> >> Perhaps it might make sense here for !extra_pass to set func to some dummy >> address as otherwise the 'kernel helper call' branch used for this is a bit >> misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call() >> optimizes the immediate addr, I presume the JIT can handle situations where >> in the final extra_pass the image needs to grow/shrink again (due to >> different >> final address for the call)? > > That's a good catch. We don't handle that -- we expect to get the size right > on first pass. We could probably have PPC_FUNC_ADDR() pad the result with > nops to make it a constant 5-instruction sequence. Yeah, arm64 does something similar by not optimizing the imm in order to always emit 4 insns for it. Thanks, Daniel
Re: [PATCH bpf v2 6/6] bpf: fix JITed dump for multi-function programs via syscall
On 05/18/2018 02:50 PM, Sandipan Das wrote: > Currently, for multi-function programs, we cannot get the JITed > instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD > command. Because of this, userspace tools such as bpftool fail > to identify a multi-function program as being JITed or not. > > With the JIT enabled and the test program running, this can be > verified as follows: > > # cat /proc/sys/net/core/bpf_jit_enable > 1 > > Before applying this patch: > > # bpftool prog list > 1: kprobe name foo tag b811aab41a39ad3d gpl > loaded_at 2018-05-16T11:43:38+0530 uid 0 > xlated 216B not jited memlock 65536B > ... > > # bpftool prog dump jited id 1 > no instructions returned > > After applying this patch: > > # bpftool prog list > 1: kprobe name foo tag b811aab41a39ad3d gpl > loaded_at 2018-05-16T12:13:01+0530 uid 0 > xlated 216B jited 308B memlock 65536B > ... That's really nice! One comment inline below: > # bpftool prog dump jited id 1 > 0: nop > 4: nop > 8: mflrr0 > c: std r0,16(r1) > 10: stdur1,-112(r1) > 14: std r31,104(r1) > 18: addir31,r1,48 > 1c: li r3,10 > ... 
> > Signed-off-by: Sandipan Das> --- > kernel/bpf/syscall.c | 38 -- > 1 file changed, 32 insertions(+), 6 deletions(-) > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 54a72fafe57c..2430d159078c 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -1896,7 +1896,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog > *prog, > struct bpf_prog_info info = {}; > u32 info_len = attr->info.info_len; > char __user *uinsns; > - u32 ulen; > + u32 ulen, i; > int err; > > err = check_uarg_tail_zero(uinfo, sizeof(info), info_len); > @@ -1922,7 +1922,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog > *prog, > ulen = min_t(u32, info.nr_map_ids, ulen); > if (ulen) { > u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids); > - u32 i; > > for (i = 0; i < ulen; i++) > if (put_user(prog->aux->used_maps[i]->id, > @@ -1970,13 +1969,41 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog > *prog, >* for offload. >*/ > ulen = info.jited_prog_len; > - info.jited_prog_len = prog->jited_len; > + if (prog->aux->func_cnt) { > + info.jited_prog_len = 0; > + for (i = 0; i < prog->aux->func_cnt; i++) > + info.jited_prog_len += prog->aux->func[i]->jited_len; > + } else { > + info.jited_prog_len = prog->jited_len; > + } > + > if (info.jited_prog_len && ulen) { > if (bpf_dump_raw_ok()) { > uinsns = u64_to_user_ptr(info.jited_prog_insns); > ulen = min_t(u32, info.jited_prog_len, ulen); > - if (copy_to_user(uinsns, prog->bpf_func, ulen)) > - return -EFAULT; > + > + /* for multi-function programs, copy the JITed > + * instructions for all the functions > + */ > + if (prog->aux->func_cnt) { > + u32 len, free; > + u8 *img; > + > + free = ulen; > + for (i = 0; i < prog->aux->func_cnt; i++) { > + len = prog->aux->func[i]->jited_len; > + img = (u8 *) > prog->aux->func[i]->bpf_func; > + if (len > free) > + break; > + if (copy_to_user(uinsns, img, len)) > + return -EFAULT; > + uinsns += len; > + free -= len; Is there any way we can introduce a delimiter 
between the different images such that they could be more easily correlated with the call from the main (or other sub-)program instead of having one contiguous dump blob? > + } > + } else { > + if (copy_to_user(uinsns, prog->bpf_func, ulen)) > + return -EFAULT; > + } > } else { > info.jited_prog_insns = 0; > } > @@ -1987,7 +2014,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog > *prog, > if (info.nr_jited_ksyms && ulen) { > u64 __user *user_jited_ksyms = > u64_to_user_ptr(info.jited_ksyms); > ulong ksym_addr; > - u32 i; > > /* copy the address of the kernel symbol corresponding to >
Re: [PATCH bpf v2 3/6] bpf: get kernel symbol addresses via syscall
On 05/18/2018 02:50 PM, Sandipan Das wrote: > This adds two new fields to struct bpf_prog_info. For > multi-function programs, these fields can be used to pass > a list of kernel symbol addresses for all functions in a > given program to userspace using the bpf system call > with the BPF_OBJ_GET_INFO_BY_FD command. > > When bpf_jit_kallsyms is enabled, we can get the address > of the corresponding kernel symbol for a callee function > and resolve the symbol's name. The address is determined > by adding the value of the call instruction's imm field > to __bpf_call_base. This offset gets assigned to the imm > field by the verifier. > > For some architectures, such as powerpc64, the imm field > is not large enough to hold this offset. > > We resolve this by: > > [1] Assigning the subprog id to the imm field of a call > instruction in the verifier instead of the offset of > the callee's symbol's address from __bpf_call_base. > > [2] Determining the address of a callee's corresponding > symbol by using the imm field as an index for the > list of kernel symbol addresses now available from > the program info. 
> > Suggested-by: Daniel Borkmann <dan...@iogearbox.net> > Signed-off-by: Sandipan Das <sandi...@linux.vnet.ibm.com> > --- > include/uapi/linux/bpf.h | 2 ++ > kernel/bpf/syscall.c | 20 > kernel/bpf/verifier.c| 7 +-- > 3 files changed, 23 insertions(+), 6 deletions(-) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index d94d333a8225..040c9cac7303 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -2188,6 +2188,8 @@ struct bpf_prog_info { > __u32 xlated_prog_len; > __aligned_u64 jited_prog_insns; > __aligned_u64 xlated_prog_insns; > + __aligned_u64 jited_ksyms; > + __u32 nr_jited_ksyms; > __u64 load_time;/* ns since boottime */ > __u32 created_by_uid; > __u32 nr_map_ids; > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index bfcde949c7f8..54a72fafe57c 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -1933,6 +1933,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog > *prog, > if (!capable(CAP_SYS_ADMIN)) { > info.jited_prog_len = 0; > info.xlated_prog_len = 0; > + info.nr_jited_ksyms = 0; > goto done; > } > > @@ -1981,6 +1982,25 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog > *prog, > } > } > > + ulen = info.nr_jited_ksyms; > + info.nr_jited_ksyms = prog->aux->func_cnt; > + if (info.nr_jited_ksyms && ulen) { Since this exposes addresses (though masked one which is correct), this definitely needs to be guarded with bpf_dump_raw_ok() like we do in other places here (see JIT dump for example). 
> + u64 __user *user_jited_ksyms = > u64_to_user_ptr(info.jited_ksyms); > + ulong ksym_addr; > + u32 i; > + > + /* copy the address of the kernel symbol corresponding to > + * each function > + */ > + ulen = min_t(u32, info.nr_jited_ksyms, ulen); > + for (i = 0; i < ulen; i++) { > + ksym_addr = (ulong) prog->aux->func[i]->bpf_func; > + ksym_addr &= PAGE_MASK; > + if (put_user((u64) ksym_addr, _jited_ksyms[i])) > + return -EFAULT; > + } > + } > + > done: > if (copy_to_user(uinfo, , info_len) || > put_user(info_len, >info.info_len)) > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 6c56cce9c4e3..e826c396aba2 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -5426,17 +5426,12 @@ static int jit_subprogs(struct bpf_verifier_env *env) >* later look the same as if they were interpreted only. >*/ > for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) { > - unsigned long addr; > - > if (insn->code != (BPF_JMP | BPF_CALL) || > insn->src_reg != BPF_PSEUDO_CALL) > continue; > insn->off = env->insn_aux_data[i].call_imm; > subprog = find_subprog(env, i + insn->off + 1); > - addr = (unsigned long)func[subprog]->bpf_func; Hmm, in current bpf tree this says 'subprog + 1' here, so this is not rebased against bpf tree but bpf-next (unlike what subject says)? https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/kernel/bpf/verifier.c#n5351 > - addr &= PAGE_MASK; > - insn->imm = (u64 (*)(u64, u64, u64, u64, u64)) > - addr - __bpf_call_base; > + insn->imm = subprog; > } > > prog->jited = 1; >
Re: [PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls
On 05/18/2018 02:50 PM, Sandipan Das wrote: > The imm field of a bpf instruction is a signed 32-bit integer. > For JIT bpf-to-bpf function calls, it stores the offset of the > start address of the callee's JITed image from __bpf_call_base. > > For some architectures, such as powerpc64, this offset may be > as large as 64 bits and cannot be accomodated in the imm field > without truncation. > > We resolve this by: > > [1] Additionally using the auxillary data of each function to > keep a list of start addresses of the JITed images for all > functions determined by the verifier. > > [2] Retaining the subprog id inside the off field of the call > instructions and using it to index into the list mentioned > above and lookup the callee's address. > > To make sure that the existing JIT compilers continue to work > without requiring changes, we keep the imm field as it is. > > Signed-off-by: Sandipan Das> --- > kernel/bpf/verifier.c | 15 ++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index a9e4b1372da6..6c56cce9c4e3 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env) > insn->src_reg != BPF_PSEUDO_CALL) > continue; > subprog = insn->off; > - insn->off = 0; > insn->imm = (u64 (*)(u64, u64, u64, u64, u64)) > func[subprog]->bpf_func - > __bpf_call_base; > } > + > + /* we use the aux data to keep a list of the start addresses > + * of the JITed images for each function in the program > + * > + * for some architectures, such as powerpc64, the imm field > + * might not be large enough to hold the offset of the start > + * address of the callee's JITed image from __bpf_call_base > + * > + * in such cases, we can lookup the start address of a callee > + * by using its subprog id, available from the off field of > + * the call instruction, as an index for this list > + */ > + func[i]->aux->func = func; > + 
func[i]->aux->func_cnt = env->subprog_cnt + 1; The target tree you have here is infact bpf, since in bpf-next there was a cleanup where the + 1 is removed. Just for the record that we need to keep this in mind for bpf into bpf-next merge since this would otherwise subtly break. > } > for (i = 0; i < env->subprog_cnt; i++) { > old_bpf_func = func[i]->bpf_func; >
Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
On 05/18/2018 02:50 PM, Sandipan Das wrote: > This adds support for bpf-to-bpf function calls in the powerpc64 > JIT compiler. The JIT compiler converts the bpf call instructions > to native branch instructions. After a round of the usual passes, > the start addresses of the JITed images for the callee functions > are known. Finally, to fixup the branch target addresses, we need > to perform an extra pass. > > Because of the address range in which JITed images are allocated > on powerpc64, the offsets of the start addresses of these images > from __bpf_call_base are as large as 64 bits. So, for a function > call, we cannot use the imm field of the instruction to determine > the callee's address. Instead, we use the alternative method of > getting it from the list of function addresses in the auxillary > data of the caller by using the off field as an index. > > Signed-off-by: Sandipan Das> --- > arch/powerpc/net/bpf_jit_comp64.c | 79 > ++- > 1 file changed, 69 insertions(+), 10 deletions(-) > > diff --git a/arch/powerpc/net/bpf_jit_comp64.c > b/arch/powerpc/net/bpf_jit_comp64.c > index 1bdb1aff0619..25939892d8f7 100644 > --- a/arch/powerpc/net/bpf_jit_comp64.c > +++ b/arch/powerpc/net/bpf_jit_comp64.c > @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct > codegen_context *ctx, u32 > /* Assemble the body code between the prologue & epilogue */ > static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, > struct codegen_context *ctx, > - u32 *addrs) > + u32 *addrs, bool extra_pass) > { > const struct bpf_insn *insn = fp->insnsi; > int flen = fp->len; > @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 > *image, > break; > > /* > - * Call kernel helper > + * Call kernel helper or bpf function >*/ > case BPF_JMP | BPF_CALL: > ctx->seen |= SEEN_FUNC; > - func = (u8 *) __bpf_call_base + imm; > + > + /* bpf function call */ > + if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass) Perhaps it might make sense here for 
!extra_pass to set func to some dummy address as otherwise the 'kernel helper call' branch used for this is a bit misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call() optimizes the immediate addr, I presume the JIT can handle situations where in the final extra_pass the image needs to grow/shrink again (due to different final address for the call)? > + if (fp->aux->func && off < fp->aux->func_cnt) > + /* use the subprog id from the off > + * field to lookup the callee address > + */ > + func = (u8 *) > fp->aux->func[off]->bpf_func; > + else > + return -EINVAL; > + /* kernel helper call */ > + else > + func = (u8 *) __bpf_call_base + imm; > > bpf_jit_emit_func_call(image, ctx, (u64)func); > > @@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 > *image, > return 0; > } > > +struct powerpc64_jit_data { > + struct bpf_binary_header *header; > + u32 *addrs; > + u8 *image; > + u32 proglen; > + struct codegen_context ctx; > +}; > + > struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) > { > u32 proglen; > @@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) > u8 *image = NULL; > u32 *code_base; > u32 *addrs; > + struct powerpc64_jit_data *jit_data; > struct codegen_context cgctx; > int pass; > int flen; > @@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) > struct bpf_prog *org_fp = fp; > struct bpf_prog *tmp_fp; > bool bpf_blinded = false; > + bool extra_pass = false; > > if (!fp->jit_requested) > return org_fp; > @@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) > fp = tmp_fp; > } > > + jit_data = fp->aux->jit_data; > + if (!jit_data) { > + jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL); > + if (!jit_data) { > + fp = org_fp; > + goto out; > + } > + fp->aux->jit_data = jit_data; > + } > + > flen = fp->len; > + addrs = jit_data->addrs; > + if (addrs) { > + cgctx = jit_data->ctx; > + image = jit_data->image; > + bpf_hdr = jit_data->header; 
> + proglen
Re: [RFC][PATCH bpf] tools: bpftool: Fix tags for bpf-to-bpf calls
On 02/27/2018 01:13 PM, Sandipan Das wrote: > "Naveen N. Rao" wrote: >> I'm wondering if we can instead encode the bpf prog id in >> imm32. That way, we should be able to indicate the BPF >> function being called into. Daniel, is that something we >> can consider? > > Since each subprog does not get a separate id, we cannot fetch > the fd and therefore the tag of a subprog. Instead we can use > the tag of the complete program as shown below. > > "Daniel Borkmann" wrote: >> I think one limitation that would still need to be addressed later >> with such approach would be regarding the xlated prog dump in bpftool, >> see 'BPF calls via JIT' in 7105e828c087 ("bpf: allow for correlation >> of maps and helpers in dump"). Any ideas for this (potentially if we >> could use off + imm for calls, we'd get to 48 bits, but that seems >> still not be enough as you say)? > > As an alternative, this is what I am thinking of: > > Currently, for bpf-to-bpf calls, if bpf_jit_kallsyms is enabled, > bpftool looks up the name of the corresponding symbol for the > JIT-ed subprogram and shows it in the xlated dump alongside the > actual call instruction. However, the lookup is based on the > target address which is calculated using the imm field of the > instruction. So, once again, if imm is truncated, we will end > up with the wrong address. Also, the subprog aux data (which > has been proposed as a mitigation for this) is not accessible > from this tool. > > We can still access the tag for the complete bpf program and use > this with the correct offset in an objdump-like notation as an > alternative for the name of the subprog that is the target of a > bpf-to-bpf call instruction. > > Currently, an xlated dump looks like this: >0: (85) call pc+2#bpf_prog_5f76847930402518_F >1: (b7) r0 = 1 >2: (95) exit >3: (b7) r0 = 2 >4: (95) exit > > With this patch, it will look like this: >0: (85) call pc+2#bpf_prog_8f85936f29a7790a+3 (Note the +2 is the insn->off already.) 
>1: (b7) r0 = 1 >2: (95) exit >3: (b7) r0 = 2 >4: (95) exit > > where 8f85936f29a7790a is the tag of the bpf program and 3 is > the offset to the start of the subprog from the start of the > program. The problem with this approach would be that right now the name is something like bpf_prog_5f76847930402518_F where the subprog tag is just a placeholder so in future, this may well adapt to e.g. the actual function name from the elf file. Note that when kallsyms is enabled then a name like bpf_prog_5f76847930402518_F will also appear in stack traces, perf records, etc, so for correlation/debugging it would really help to have them the same everywhere. Worst case if there's nothing better, potentially what one could do in bpf_prog_get_info_by_fd() is to dump an array of full addresses and have the imm part as the index pointing to one of them, just unfortunate that it's likely only needed in ppc64.
Re: [PATCH] bpf, powerpc: fix jit for seccomp_data access
On 02/21/2018 01:33 AM, Michael Ellerman wrote: > Mark Lord writes: > >> I am using SECCOMP to filter syscalls on a ppc32 platform, >> and noticed that the JIT compiler was failing on the BPF >> even though the interpreter was working fine. >> >> The issue was that the compiler was missing one of the instructions >> used by SECCOMP, so here is a patch to enable JIT for that instruction. >> >> Signed-Off-By: Mark Lord > > Thanks. > > What is the failure mode without this patch? I assume when you have the > JIT enabled the seccomp filter fails to load entirely? Or do we have > logic to fall back to the interpreter? The logic for all JITs is that once a BPF insn cannot be JITed then it falls back to BPF interpreter. Here, since ppc32 is cBPF the path is: cBPF insn -> cBPF JIT -> fail -> migrate cBPF to eBPF -> run in eBPF interpreter. In the case where interpreter is compiled out (CONFIG_BPF_JIT_ALWAYS_ON), then the filter is rejected. > Either way we should probably back port to stable. I assume this has > been broken since we originally added 32-bit support, so: Note that arm32 before it was converted to be an eBPF JIT (eBPF JITs do handle seccomp BPF in any case) was the only cBPF JIT that had it 'JIT-able', so currently, other cBPF ones like sparc32 or mips32 don't translate it either. Meaning, it would be nice to have as a feature; I wouldn't necessarily frame it as a bug though (or at least a stable-urgent one, imho, but I leave that to you, of course). 
> Fixes: eb84bab0fb38 ("ppc: Kconfig: Enable BPF JIT on ppc32") > Cc: sta...@vger.kernel.org # v4.1+ > > cheers > > >> --- old/arch/powerpc/net/bpf_jit_comp.c 2018-02-16 14:07:01.0 -0500 >> +++ linux/arch/powerpc/net/bpf_jit_comp.c 2018-02-20 >> 14:41:20.805227494 -0500 >> @@ -329,6 +329,9 @@ static int bpf_jit_build_body(struct bpf >> BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, len) != 4); >> PPC_LWZ_OFFS(r_A, r_skb, offsetof(struct sk_buff, >> len)); >> break; >> + case BPF_LDX | BPF_W | BPF_ABS: /* A = *((u32 >> *)(seccomp_data + K)); */ >> + PPC_LWZ_OFFS(r_A, r_skb, K); >> + break; >> case BPF_LDX | BPF_W | BPF_LEN: /* X = skb->len; */ >> PPC_LWZ_OFFS(r_X, r_skb, offsetof(struct sk_buff, >> len)); >> break; >> -- >> Mark Lord >> Real-Time Remedies Inc. >> ml...@pobox.com
Re: [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls
On 02/15/2018 05:25 PM, Daniel Borkmann wrote: > On 02/13/2018 05:05 AM, Sandipan Das wrote: >> The imm field of a bpf_insn is a signed 32-bit integer. For >> JIT-ed bpf-to-bpf function calls, it stores the offset from >> __bpf_call_base to the start of the callee function. >> >> For some architectures, such as powerpc64, it was found that >> this offset may be as large as 64 bits because of which this >> cannot be accommodated in the imm field without truncation. >> >> To resolve this, we additionally make aux->func within each >> bpf_prog associated with the functions to point to the list >> of all function addresses determined by the verifier. >> >> We keep the value assigned to the off field of the bpf_insn >> as a way to index into aux->func and also set aux->func_cnt >> so that this can be used for performing basic upper bound >> checks for the off field. >> >> Signed-off-by: Sandipan Das <sandi...@linux.vnet.ibm.com> >> --- >> v2: Make aux->func point to the list of functions determined >> by the verifier rather than allocating a separate callee >> list for each function. > > Approach looks good to me; do you know whether s390x JIT would > have similar requirement? I think one limitation that would still > need to be addressed later with such approach would be regarding the > xlated prog dump in bpftool, see 'BPF calls via JIT' in 7105e828c087 > ("bpf: allow for correlation of maps and helpers in dump"). Any > ideas for this (potentially if we could use off + imm for calls, > we'd get to 48 bits, but that seems still not be enough as you say)? One other random thought, although I'm not sure how feasible this is for ppc64 JIT to realize ... 
but idea would be to have something like the below: diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 29ca920..daa7258 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -512,6 +512,11 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type, return ret; } +void * __weak bpf_jit_image_alloc(unsigned long size) +{ + return module_alloc(size); +} + struct bpf_binary_header * bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr, unsigned int alignment, @@ -525,7 +530,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr, * random section of illegal instructions. */ size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE); - hdr = module_alloc(size); + hdr = bpf_jit_image_alloc(size); if (hdr == NULL) return NULL; And ppc64 JIT could override bpf_jit_image_alloc() in a similar way like some archs would override the module_alloc() helper through a custom implementation, usually via __vmalloc_node_range(), so we could perhaps fit the range for BPF JITed images in a way that they could use the 32bit imm in the end? There are not that many progs loaded typically, so the range could be a bit narrower in such case anyway. (Not sure if this would work out though, but I thought to bring it up.) Thanks, Daniel
Re: [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls
On 02/13/2018 05:05 AM, Sandipan Das wrote: > The imm field of a bpf_insn is a signed 32-bit integer. For > JIT-ed bpf-to-bpf function calls, it stores the offset from > __bpf_call_base to the start of the callee function. > > For some architectures, such as powerpc64, it was found that > this offset may be as large as 64 bits because of which this > cannot be accomodated in the imm field without truncation. > > To resolve this, we additionally make aux->func within each > bpf_prog associated with the functions to point to the list > of all function addresses determined by the verifier. > > We keep the value assigned to the off field of the bpf_insn > as a way to index into aux->func and also set aux->func_cnt > so that this can be used for performing basic upper bound > checks for the off field. > > Signed-off-by: Sandipan Das> --- > v2: Make aux->func point to the list of functions determined > by the verifier rather than allocating a separate callee > list for each function. Approach looks good to me; do you know whether s390x JIT would have similar requirement? I think one limitation that would still need to be addressed later with such approach would be regarding the xlated prog dump in bpftool, see 'BPF calls via JIT' in 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump"). Any ideas for this (potentially if we could use off + imm for calls, we'd get to 48 bits, but that seems still not be enough as you say)? Thanks, Daniel
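The scheme under review — keep the full 64-bit callee addresses in aux->func and use the insn's off field only as a bounds-checked index into that table — can be modeled in plain C (the function and table names here are illustrative stand-ins, not the kernel API):

```c
#include <assert.h>

typedef long (*callee_t)(long);

static long twice(long x)  { return 2 * x; }
static long square(long x) { return x * x; }

/* stand-in for aux->func / aux->func_cnt as filled in by the verifier */
static callee_t func_tab[] = { twice, square };
static const int func_cnt = 2;

/* dispatch via the off field instead of a (possibly truncated) imm offset */
static long call_by_off(int off, long arg)
{
	if (off < 0 || off >= func_cnt)	/* basic upper-bound check on off */
		return -1;
	return func_tab[off](arg);
}
```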
Re: [PATCH 1/1] bpf: take advantage of stack_depth tracking in powerpc JIT
On 09/01/2017 08:53 PM, Sandipan Das wrote: Take advantage of stack_depth tracking, originally introduced for x64, in powerpc JIT as well. Round up allocated stack by 16 bytes to make sure it stays aligned for functions called from JITed bpf program. Signed-off-by: Sandipan Das Awesome, thanks for following up! :)
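The 16-byte rounding mentioned in the commit message is the usual power-of-two mask trick; a quick sketch of the arithmetic (not the actual JIT code):

```c
#include <assert.h>
#include <stdint.h>

/* round the BPF stack_depth up to the next multiple of 16 so the
 * stack stays aligned for functions called from the JITed program */
static uint32_t round_up_stack(uint32_t depth)
{
	return (depth + 15u) & ~15u;
}
```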
Re: [PATCH 2/3] powerpc: bpf: flush the entire JIT buffer
On 01/13/2017 06:10 PM, Naveen N. Rao wrote: With bpf_jit_binary_alloc(), we allocate at a page granularity and fill the rest of the space with illegal instructions to mitigate BPF spraying attacks, while having the actual JIT'ed BPF program at a random location within the allocated space. Under this scenario, it would be better to flush the entire allocated buffer rather than just the part containing the actual program. We already flush the buffer from start to the end of the BPF program. Extend this to include the illegal instructions after the BPF program. Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com> Acked-by: Daniel Borkmann <dan...@iogearbox.net>
Re: [PATCH 2/3] bpf powerpc: implement support for tail calls
On 09/26/2016 10:56 AM, Naveen N. Rao wrote: On 2016/09/24 03:30AM, Alexei Starovoitov wrote: On Sat, Sep 24, 2016 at 12:33:54AM +0200, Daniel Borkmann wrote: On 09/23/2016 10:35 PM, Naveen N. Rao wrote: Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF programs. This can be achieved either by: (1) retaining the stack setup by the first eBPF program and having all subsequent eBPF programs re-using it, or, (2) by unwinding/tearing down the stack and having each eBPF program deal with its own stack as it sees fit. To ensure that this does not create loops, there is a limit to how many tail calls can be done (currently 32). This requires the JIT'ed code to maintain a count of the number of tail calls done so far. Approach (1) is simple, but requires every eBPF program to have (almost) the same prologue/epilogue, regardless of whether they need it. This is inefficient for small eBPF programs which may not sometimes need a prologue at all. As such, to minimize impact of tail call implementation, we use approach (2) here which needs each eBPF program in the chain to use its own prologue/epilogue. This is not ideal when many tail calls are involved and when all the eBPF programs in the chain have similar prologue/epilogue. However, the impact is restricted to programs that do tail calls. Individual eBPF programs are not affected. We maintain the tail call count in a fixed location on the stack and updated tail call count values are passed in through this. The very first eBPF program in a chain sets this up to 0 (the first 2 instructions). Subsequent tail calls skip the first two eBPF JIT instructions to maintain the count. For programs that don't do tail calls themselves, the first two instructions are NOPs. Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com> Thanks for adding support, Naveen, that's really great! 
I think 2) seems fine as well in this context as prologue size can vary quite a bit here, and depending on program types likelihood of tail call usage as well (but I wouldn't expect deep nesting). Thanks a lot! Great stuff. In these circumstances approach 2 makes sense to me as well. Alexei, Daniel, Thanks for the quick review! The patches would go via Michael's tree (same way as with the JIT itself in the past), right?
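The tail-call accounting described in the patch — a count held in a fixed stack slot, zeroed by the very first program, bumped on every tail call, and bounded at 32 — can be modeled as a plain loop (a toy model of the control flow, not the emitted ppc code):

```c
#include <assert.h>

#define MAX_TAIL_CALL_CNT 32

/* each loop iteration stands for one eBPF program entering; the
 * guard mirrors the check the JIT emits before taking a tail call */
static int run_chain(void)
{
	int tail_call_cnt = 0;	/* first program's prologue zeroes the slot */
	int entered = 0;

	for (;;) {
		entered++;
		if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
			break;		/* tail call refused, fall through */
		tail_call_cnt++;	/* updated count passed to the callee */
	}
	return entered;
}
```

With a limit of 32 tail calls, 33 programs run in total: the initial one plus 32 callees.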
Re: [PATCH 2/3] bpf powerpc: implement support for tail calls
On 09/23/2016 10:35 PM, Naveen N. Rao wrote: Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF programs. This can be achieved either by: (1) retaining the stack setup by the first eBPF program and having all subsequent eBPF programs re-using it, or, (2) by unwinding/tearing down the stack and having each eBPF program deal with its own stack as it sees fit. To ensure that this does not create loops, there is a limit to how many tail calls can be done (currently 32). This requires the JIT'ed code to maintain a count of the number of tail calls done so far. Approach (1) is simple, but requires every eBPF program to have (almost) the same prologue/epilogue, regardless of whether they need it. This is inefficient for small eBPF programs which may not sometimes need a prologue at all. As such, to minimize impact of tail call implementation, we use approach (2) here which needs each eBPF program in the chain to use its own prologue/epilogue. This is not ideal when many tail calls are involved and when all the eBPF programs in the chain have similar prologue/epilogue. However, the impact is restricted to programs that do tail calls. Individual eBPF programs are not affected. We maintain the tail call count in a fixed location on the stack and updated tail call count values are passed in through this. The very first eBPF program in a chain sets this up to 0 (the first 2 instructions). Subsequent tail calls skip the first two eBPF JIT instructions to maintain the count. For programs that don't do tail calls themselves, the first two instructions are NOPs. Signed-off-by: Naveen N. Rao Thanks for adding support, Naveen, that's really great! I think 2) seems fine as well in this context as prologue size can vary quite a bit here, and depending on program types likelihood of tail call usage as well (but I wouldn't expect deep nesting). Thanks a lot!
Re: [PATCH 3/3] bpf powerpc: add support for bpf constant blinding
On 09/23/2016 10:35 PM, Naveen N. Rao wrote: In line with similar support for other architectures by Daniel Borkmann. 'MOD Default X' from test_bpf without constant blinding: 84 bytes emitted from JIT compiler (pass:3, flen:7) d58a4688 + : 0: nop 4: nop 8: std r27,-40(r1) c: std r28,-32(r1) 10: xor r8,r8,r8 14: xor r28,r28,r28 18: mr r27,r3 1c: li r8,66 20: cmpwi r28,0 24: bne 0x0030 28: li r8,0 2c: b 0x0044 30: divwu r9,r8,r28 34: mullw r9,r28,r9 38: subf r8,r9,r8 3c: rotlwi r8,r8,0 40: li r8,66 44: ld r27,-40(r1) 48: ld r28,-32(r1) 4c: mr r3,r8 50: blr ... and with constant blinding: 140 bytes emitted from JIT compiler (pass:3, flen:11) dbd6ab24 + : 0: nop 4: nop 8: std r27,-40(r1) c: std r28,-32(r1) 10: xor r8,r8,r8 14: xor r28,r28,r28 18: mr r27,r3 1c: lis r2,-22834 20: ori r2,r2,36083 24: rotlwi r2,r2,0 28: xori r2,r2,36017 2c: xoris r2,r2,42702 30: rotlwi r2,r2,0 34: mr r8,r2 38: rotlwi r8,r8,0 3c: cmpwi r28,0 40: bne 0x004c 44: li r8,0 48: b 0x007c 4c: divwu r9,r8,r28 50: mullw r9,r28,r9 54: subf r8,r9,r8 58: rotlwi r8,r8,0 5c: lis r2,-17137 60: ori r2,r2,39065 64: rotlwi r2,r2,0 68: xori r2,r2,39131 6c: xoris r2,r2,48399 70: rotlwi r2,r2,0 74: mr r8,r2 78: rotlwi r8,r8,0 7c: ld r27,-40(r1) 80: ld r28,-32(r1) 84: mr r3,r8 88: blr Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com> Acked-by: Daniel Borkmann <dan...@iogearbox.net>
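The blinding pattern visible in the disassembly loads imm XOR key via lis/ori and then undoes the key with an xori/xoris pair at runtime, so the real constant never appears literally in the image. A userspace sketch of that round trip (the key value in the test is arbitrary):

```c
#include <assert.h>
#include <stdint.h>

static uint32_t blinded_load(uint32_t imm, uint32_t key)
{
	uint32_t v = imm ^ key;	/* what the JIT embeds via lis/ori */

	v ^= key & 0x0000ffffu;	/* xori:  undo the low 16 key bits  */
	v ^= key & 0xffff0000u;	/* xoris: undo the high 16 key bits */
	return v;		/* original constant recovered */
}
```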
Re: [PATCH net 4/4] lib/test_bpf: Add additional BPF_ADD tests
On 04/05/2016 12:02 PM, Naveen N. Rao wrote: Some of these tests proved useful with the powerpc eBPF JIT port due to sign-extended 16-bit immediate loads. Though some of these aspects get covered in other tests, it is better to have explicit tests so as to quickly tag the precise problem. Cc: Alexei Starovoitov <a...@fb.com> Cc: Daniel Borkmann <dan...@iogearbox.net> Cc: "David S. Miller" <da...@davemloft.net> Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: Paul Mackerras <pau...@samba.org> Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com> Thanks for adding these! Acked-by: Daniel Borkmann <dan...@iogearbox.net> ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH net 3/4] lib/test_bpf: Add test to check for result of 32-bit add that overflows
On 04/05/2016 12:02 PM, Naveen N. Rao wrote: BPF_ALU32 and BPF_ALU64 tests for adding two 32-bit values that results in 32-bit overflow. Cc: Alexei Starovoitov <a...@fb.com> Cc: Daniel Borkmann <dan...@iogearbox.net> Cc: "David S. Miller" <da...@davemloft.net> Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: Paul Mackerras <pau...@samba.org> Signed-off-by: Naveen N. Rao <naveen.n....@linux.vnet.ibm.com> Acked-by: Daniel Borkmann <dan...@iogearbox.net>
Re: [PATCH net 2/4] lib/test_bpf: Add tests for unsigned BPF_JGT
On 04/05/2016 12:02 PM, Naveen N. Rao wrote: Unsigned Jump-if-Greater-Than. Cc: Alexei Starovoitov <a...@fb.com> Cc: Daniel Borkmann <dan...@iogearbox.net> Cc: "David S. Miller" <da...@davemloft.net> Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: Paul Mackerras <pau...@samba.org> Signed-off-by: Naveen N. Rao <naveen.n....@linux.vnet.ibm.com> Acked-by: Daniel Borkmann <dan...@iogearbox.net>
Re: [PATCH net 1/4] lib/test_bpf: Fix JMP_JSET tests
On 04/05/2016 12:02 PM, Naveen N. Rao wrote: JMP_JSET tests incorrectly used BPF_JNE. Fix the same. Cc: Alexei Starovoitov <a...@fb.com> Cc: Daniel Borkmann <dan...@iogearbox.net> Cc: "David S. Miller" <da...@davemloft.net> Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com> Cc: Michael Ellerman <m...@ellerman.id.au> Cc: Paul Mackerras <pau...@samba.org> Signed-off-by: Naveen N. Rao <naveen.n....@linux.vnet.ibm.com> Acked-by: Daniel Borkmann <dan...@iogearbox.net> --- lib/test_bpf.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/lib/test_bpf.c b/lib/test_bpf.c index 27a7a26..e76fa4d 100644 --- a/lib/test_bpf.c +++ b/lib/test_bpf.c @@ -4303,7 +4303,7 @@ static struct bpf_test tests[] = { .u.insns_int = { BPF_ALU32_IMM(BPF_MOV, R0, 0), BPF_LD_IMM64(R1, 3), - BPF_JMP_IMM(BPF_JNE, R1, 2, 1), + BPF_JMP_IMM(BPF_JSET, R1, 2, 1), BPF_EXIT_INSN(), BPF_ALU32_IMM(BPF_MOV, R0, 1), BPF_EXIT_INSN(), @@ -4317,7 +4317,7 @@ static struct bpf_test tests[] = { .u.insns_int = { BPF_ALU32_IMM(BPF_MOV, R0, 0), BPF_LD_IMM64(R1, 3), - BPF_JMP_IMM(BPF_JNE, R1, 0x, 1), + BPF_JMP_IMM(BPF_JSET, R1, 0x, 1), BPF_EXIT_INSN(), BPF_ALU32_IMM(BPF_MOV, R0, 1), BPF_EXIT_INSN(), @@ -4474,7 +4474,7 @@ static struct bpf_test tests[] = { BPF_ALU32_IMM(BPF_MOV, R0, 0), BPF_LD_IMM64(R1, 3), BPF_LD_IMM64(R2, 2), - BPF_JMP_REG(BPF_JNE, R1, R2, 1), + BPF_JMP_REG(BPF_JSET, R1, R2, 1), BPF_EXIT_INSN(), BPF_ALU32_IMM(BPF_MOV, R0, 1), BPF_EXIT_INSN(), @@ -4489,7 +4489,7 @@ static struct bpf_test tests[] = { BPF_ALU32_IMM(BPF_MOV, R0, 0), BPF_LD_IMM64(R1, 3), BPF_LD_IMM64(R2, 0x), - BPF_JMP_REG(BPF_JNE, R1, R2, 1), + BPF_JMP_REG(BPF_JSET, R1, R2, 1), BPF_EXIT_INSN(), BPF_ALU32_IMM(BPF_MOV, R0, 1), BPF_EXIT_INSN(), ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
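The one-line fix matters because JNE and JSET only happen to agree on the operands the tests used: JNE jumps on inequality while JSET jumps when any masked bit is set. A small model of the two branch conditions, with a value pair where they diverge:

```c
#include <assert.h>
#include <stdint.h>

/* branch-taken predicates for the two BPF jump opcodes */
static int jset_taken(uint64_t reg, uint64_t imm) { return (reg & imm) != 0; }
static int jne_taken(uint64_t reg, uint64_t imm)  { return reg != imm; }
```

With R1 = 3 and imm = 2 both branches are taken, which is why the wrong opcode still passed; R1 = 2, imm = 2 separates them.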
Re: [RFC PATCH 6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF
On 04/01/2016 08:10 PM, Alexei Starovoitov wrote: On 4/1/16 2:58 AM, Naveen N. Rao wrote: PPC64 eBPF JIT compiler. Works for both ABIv1 and ABIv2. Enable with: echo 1 > /proc/sys/net/core/bpf_jit_enable or echo 2 > /proc/sys/net/core/bpf_jit_enable ... to see the generated JIT code. This can further be processed with tools/net/bpf_jit_disasm. With CONFIG_TEST_BPF=m and 'modprobe test_bpf': test_bpf: Summary: 291 PASSED, 0 FAILED, [234/283 JIT'ed] ... on both ppc64 BE and LE. The details of the approach are documented through various comments in the code, as are the TODOs. Some of the prominent TODOs include implementing BPF tail calls and skb loads. Cc: Matt EvansCc: Michael Ellerman Cc: Paul Mackerras Cc: Alexei Starovoitov Cc: "David S. Miller" Cc: Ananth N Mavinakayanahalli Signed-off-by: Naveen N. Rao --- arch/powerpc/include/asm/ppc-opcode.h | 19 +- arch/powerpc/net/Makefile | 4 + arch/powerpc/net/bpf_jit.h| 66 ++- arch/powerpc/net/bpf_jit64.h | 58 +++ arch/powerpc/net/bpf_jit_comp64.c | 828 ++ 5 files changed, 973 insertions(+), 2 deletions(-) create mode 100644 arch/powerpc/net/bpf_jit64.h create mode 100644 arch/powerpc/net/bpf_jit_comp64.c ... -#ifdef CONFIG_PPC64 +#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2) impressive stuff! +1, awesome to see another one! Everything nicely documented. Could you add few words for the above condition as well ? Or may be a new macro, since it occurs many times? What are these _CALL_ELF == 2 and != 2 conditions mean? ppc ABIs ? Will there ever be v3 ? Minor TODO would also be to convert to use bpf_jit_binary_alloc() and bpf_jit_binary_free() API for the image, which is done by other eBPF jits, too. So far most of the bpf jits were going via net-next tree, but if in this case no changes to the core is necessary then I guess it's fine to do it via powerpc tree. What's your plan? ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2/4] samples/bpf: Use llc in PATH, rather than a hardcoded value
On 03/31/2016 07:46 PM, Alexei Starovoitov wrote: On 3/31/16 4:25 AM, Naveen N. Rao wrote: While at it, fix some typos in the comment. Cc: Alexei StarovoitovCc: David S. Miller Cc: Ananth N Mavinakayanahalli Cc: Michael Ellerman Signed-off-by: Naveen N. Rao --- samples/bpf/Makefile | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index 502c9fc..88bc5a0 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -76,16 +76,13 @@ HOSTLOADLIBES_offwaketime += -lelf HOSTLOADLIBES_spintest += -lelf HOSTLOADLIBES_map_perf_test += -lelf -lrt -# point this to your LLVM backend with bpf support -LLC=$(srctree)/tools/bpf/llvm/bld/Debug+Asserts/bin/llc - -# asm/sysreg.h inline assmbly used by it is incompatible with llvm. -# But, ehere is not easy way to fix it, so just exclude it since it is +# asm/sysreg.h - inline assembly used by it is incompatible with llvm. +# But, there is no easy way to fix it, so just exclude it since it is # useless for BPF samples. $(obj)/%.o: $(src)/%.c clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \ -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \ --O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@ +-O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=obj -o $@ clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \ -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \ --O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=asm -o $@.s +-O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=asm -o $@.s that was a workaround when clang/llvm didn't have bpf support. Now clang 3.7 and 3.8 have bpf built-in, so make sense to remove manual calls to llc completely. Just use 'clang -target bpf -O2 -D... -c $< -o $@' +1, the clang part in that Makefile should also more correctly be called with '-target bpf' as it turns out (despite llc with '-march=bpf' ...). Better to use clang directly as suggested by Alexei. 
Re: [PATCH] net: filter: make JITs zero A for SKF_AD_ALU_XOR_X
On 01/05/2016 05:03 PM, Rabin Vincent wrote: On Tue, Jan 05, 2016 at 08:00:45AM -0800, Eric Dumazet wrote: On Tue, 2016-01-05 at 16:23 +0100, Rabin Vincent wrote: The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data instructions since it XORs A with X while all the others replace A with some loaded value. All the BPF JITs fail to clear A if this is used as the first instruction in a filter. Is x86_64 part of this 'All' subset ? ;) No, because it's an eBPF JIT. Correct, filter conversion to eBPF clears it already.
Re: [PATCH] net: filter: make JITs zero A for SKF_AD_ALU_XOR_X
On 01/05/2016 04:23 PM, Rabin Vincent wrote: The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data instructions since it XORs A with X while all the others replace A with some loaded value. All the BPF JITs fail to clear A if this is used as the first instruction in a filter. This was found using american fuzzy lop. Add a helper to determine if A needs to be cleared given the first instruction in a filter, and use this in the JITs. Except for ARM, the rest have only been compile-tested. Fixes: 3480593131e0 ("net: filter: get rid of BPF_S_* enum") Signed-off-by: Rabin Vincent <ra...@rab.in> Excellent catch, thanks a lot! The fix looks good to me and should go to -net tree. Acked-by: Daniel Borkmann <dan...@iogearbox.net> If you're interested, feel free to add a small test case for the SKF_AD_ALU_XOR_X issue to lib/test_bpf.c for -net-next tree. Thanks!
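The bug is easiest to see in a tiny interpreter model: A ^= X as the first instruction only gives a defined result if A was zeroed beforehand, whereas every other ancillary load overwrites A anyway. A sketch of the fixed behavior (the scratch argument stands for whatever garbage an unfixed JIT would have left in A):

```c
#include <assert.h>
#include <stdint.h>

/* model of a filter whose first insn is the SKF_AD_ALU_XOR_X ancillary */
static uint32_t first_insn_alu_xor_x(uint32_t scratch, uint32_t x)
{
	uint32_t a = 0;	/* the fix: clear A before its first use */

	(void)scratch;	/* a buggy JIT effectively starts with a = scratch */
	a ^= x;		/* SKF_AD_ALU_XOR_X: A = A ^ X */
	return a;
}
```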
Re: [PATCH net-next 0/6] bpf: Enable BPF JIT on ppc32
On 02/16/2015 08:13 AM, Denis Kirjanov wrote: ... Well, I don't see significant challenges to enable eBPF on ppc64 in the future. I'll start working on it after I get this merged. Cool, awesome!
Re: [PATCH net-next 0/6] bpf: Enable BPF JIT on ppc32
On 02/15/2015 07:06 PM, Denis Kirjanov wrote: This patch series enables BPF JIT on ppc32. There are relatively few changes in the code to make it work. All test_bpf tests passed both on 7447a and P2041-based machines. I'm just wondering, next to the feedback that has already been provided, would opening this up for ppc32 make it significantly more difficult in future to migrate from classic BPF JIT to eBPF JIT eventually (which is what we want long-term)? Being curious, is there any ongoing effort from ppc people?
ppc64 and Documentation/mic/mpssd/mpssd.c issues
Hi Caz, I think commit ... commit 8d49751580db804a02caf6a5b7cebe2ff26c0d7e Author: Caz Yokoyama caz.yokoy...@intel.com Date: Thu Sep 5 16:42:39 2013 -0700 Sample Implementation of Intel MIC User Space Daemon. ... actually triggers a build error on my default config ppc64 (I'm using net-next tree): HOSTCC Documentation/mic/mpssd/mpssd.o Documentation/mic/mpssd/mpssd.c:93: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:96: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:113: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:116: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:119: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:146: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:149: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:151: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:152: error: braced-group within expression allowed only inside a function make[3]: *** [Documentation/mic/mpssd/mpssd.o] Error 1 make[2]: *** [Documentation/mic/mpssd] Error 2 make[1]: *** [Documentation/mic] Error 2 make: *** [vmlinux] Error 2 gcc --version gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11) Cheers, Daniel ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: ppc64 and Documentation/mic/mpssd/mpssd.c issues
On 12/04/2014 05:57 PM, Sudeep Dutt wrote: On Thu, 2014-12-04 at 08:48 -0800, Yokoyama, Caz wrote: Thank you for the report. Have you tried to compile on x86_64? Which rhel do you use? Rhel6.X? MIC card is expected to work with x86_64 host, not with ppc64. We have never compiled on ppc host while our primary development platform was rhel6.X. Yep, I was using RHEL6 on ppc64 by chance to build an upstream kernel. You might want to fix the Kconfig entry then to limit this to x86_64 only. ;) Thanks, Daniel I am adding Peter Foley since he enabled the mpssd build by default recently with commit adb19fb. Peter, is there a way to enable the mpssd build only for x86_64? Thanks, Sudeep Dutt Line 92 is the first use of htole16 and MIC_VRING_ENTRIES. Is there any change on them? Thank you. -caz, caz.yokoy...@intel.com, yokoy...@member.fsf.org -Original Message- From: Daniel Borkmann [mailto:dbork...@redhat.com] Sent: Thursday, December 04, 2014 5:42 AM To: Yokoyama, Caz Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org Subject: ppc64 and Documentation/mic/mpssd/mpssd.c issues Hi Caz, I think commit ... commit 8d49751580db804a02caf6a5b7cebe2ff26c0d7e Author: Caz Yokoyama caz.yokoy...@intel.com Date: Thu Sep 5 16:42:39 2013 -0700 Sample Implementation of Intel MIC User Space Daemon. ... 
actually triggers a build error on my default config ppc64 (I'm using net-next tree): HOSTCC Documentation/mic/mpssd/mpssd.o Documentation/mic/mpssd/mpssd.c:93: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:96: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:113: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:116: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:119: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:146: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:149: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:151: error: braced-group within expression allowed only inside a function Documentation/mic/mpssd/mpssd.c:152: error: braced-group within expression allowed only inside a function make[3]: *** [Documentation/mic/mpssd/mpssd.o] Error 1 make[2]: *** [Documentation/mic/mpssd] Error 2 make[1]: *** [Documentation/mic] Error 2 make: *** [vmlinux] Error 2 gcc --version gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11) Cheers, Daniel ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction
On 11/01/2014 07:00 PM, David Miller wrote: From: Denis Kirjanov k...@linux-powerpc.org Date: Sat, 1 Nov 2014 21:49:27 +0400 David, you need a feedback from other guys to apply this patch, right? Alexei wanted some output before/after the patch. Michael Ellerman wanted the explanation what a BPF_ANC | SKF_AD_PKTTYPE means. The BPF_ANC | SKF_AD_PKTTYPE case means that this is an ancillary operation aka BPF extension which loads the value of skb->pkt_type into the accumulator. A similar transformation, that is, from BPF into eBPF insns can be found in convert_bpf_extensions() in the SKF_AD_PKTTYPE case, or commit 709f6c58d4dc (sparc: bpf_jit: add SKF_AD_PKTTYPE support to JIT) that recently enabled the same in sparc. So I'm waiting the ack/nack from them... I don't really think performance metrics are necessary just for adding SKF_AD_PKTTYPE support, that's sort of an over the top requirement if you ask me. Right, lib/test_bpf.c actually brings the quoted output w/ numbers for free. I think the important point was that the 'After:' case with ``echo 1 > /proc/sys/net/core/bpf_jit_enable'' runs through for that test case, which has been shown here. It's pretty obvious that we should support as many operations as possible to each JIT, because all a program has to do is use that unsupported opcode and then we have none of that program being JIT'd.
Re: [PATCH v2] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction
On 10/31/2014 07:09 AM, Denis Kirjanov wrote: On 10/30/14, Alexei Starovoitov alexei.starovoi...@gmail.com wrote: On Wed, Oct 29, 2014 at 11:12 PM, Denis Kirjanov k...@linux-powerpc.org wrote: Add BPF extension SKF_AD_PKTTYPE to ppc JIT to load skb->pkt_type field. Before: [ 88.262622] test_bpf: #11 LD_IND_NET 86 97 99 PASS [ 88.265740] test_bpf: #12 LD_PKTTYPE 109 107 PASS After: [ 80.605964] test_bpf: #11 LD_IND_NET 44 40 39 PASS [ 80.607370] test_bpf: #12 LD_PKTTYPE 9 9 PASS if you'd only quoted #12, it would all make sense ;) but #11 test is not using PKTTYPE. So your patch shouldn't make a difference. Are these numbers with JIT on and off? Right. Ok. Please mention this in future log messages, as it was not quite clear that before was actually with JIT off, and after was with JIT on. One could have read it that actually both cases were with JIT on, and thus the inconsistent result for LD_IND_NET is a bit confusing since you've quoted it here as well.
Re: [PATCH v2 2/2] powerpc: bpf: Fix the broken LD_VLAN_TAG_PRESENT test
On 06/27/2014 07:08 AM, Michael Ellerman wrote: On Thu, 2014-06-26 at 10:36 +0200, Daniel Borkmann wrote: On 06/26/2014 10:30 AM, Michael Ellerman wrote: On Wed, 2014-06-25 at 21:34 +0400, Denis Kirjanov wrote: We have to return the boolean here if the tag presents or not, not just ANDing the TCI with the mask which results to: [ 709.412097] test_bpf: #18 LD_VLAN_TAG_PRESENT [ 709.412245] ret 4096 != 1 [ 709.412332] ret 4096 != 1 [ 709.412333] FAIL (2 times) Hi Denis, Is this the same version of test_bpf that is in Linus' tree? I don't see any fails with that version, am I missing something? [ 187.690640] test_bpf: #18 LD_VLAN_TAG_PRESENT 46 50 PASS Did you try to modprobe test_bpf after enabling JIT, i.e.: echo 1 > /proc/sys/net/core/bpf_jit_enable Last time I tested with ppc64, I saw the interpreter passing while JIT failed, which the fix above should address. Right, I thought CONFIG_BPF_JIT=y was sufficient. But that just makes it available, I need to *also* enable it at runtime. Yes, it's an extra runtime knob which still needs to be enabled by the admin for security reasons. This knob also allows you to enable the debugging interface `echo 2 > /proc/sys/net/core/bpf_jit_enable` where the disassembly can be read out via: tools/net/bpf_jit_disasm.c
Re: [PATCH v2 2/2] powerpc: bpf: Fix the broken LD_VLAN_TAG_PRESENT test
On 06/26/2014 10:30 AM, Michael Ellerman wrote: On Wed, 2014-06-25 at 21:34 +0400, Denis Kirjanov wrote: We have to return the boolean here if the tag presents or not, not just ANDing the TCI with the mask which results to: [ 709.412097] test_bpf: #18 LD_VLAN_TAG_PRESENT [ 709.412245] ret 4096 != 1 [ 709.412332] ret 4096 != 1 [ 709.412333] FAIL (2 times) Hi Denis, Is this the same version of test_bpf that is in Linus' tree? I don't see any fails with that version, am I missing something? [ 187.690640] test_bpf: #18 LD_VLAN_TAG_PRESENT 46 50 PASS Did you try to modprobe test_bpf after enabling JIT, i.e.: echo 1 > /proc/sys/net/core/bpf_jit_enable Last time I tested with ppc64, I saw the interpreter passing while JIT failed, which the fix above should address. Cheers, Daniel
Re: [PATCH] ppc: bpf_jit: support MOD operation
On 09/03/2013 10:52 PM, Daniel Borkmann wrote: On 09/03/2013 09:58 PM, Vladimir Murzin wrote: [...] Do you have a test case/suite by any chance ? Ben. Hi Ben! Thanks for your feedback. This patch is only compile tested. I have no real hardware, but I'll probably bring up qemu ppc64 till end of the week... Meanwhile, I've made simple how-to for testing. You can use it if you wish. It is mainly based on the [1] and rechecked on x86-64. Please also cc netdev on BPF related changes. Actually, your test plan can be further simplified ... For retrieving and disassembling the JIT image, we have bpf_jit_disasm [1]. 1) echo 2 /proc/sys/net/core/bpf_jit_enable 2) ... attach filter ... 3) bpf_jit_disasm -o For generating a simple stupid test filter, you can use bpfc [2] (also see its man page). E.g. ... # cat blub ldi #10 mod #8 ret a # bpfc blub { 0x0, 0, 0, 0x000a }, { 0x94, 0, 0, 0x0008 }, { 0x16, 0, 0, 0x }, Plus something like ... ldxi #0 mod x ret a For longer-term testing, also trinity has BPF support. ;) And load this array e.g. either into a small C program that attaches this as BPF filter, or simply do bpfc blub blub2 and run netsniff-ng -f blub2\ -s -i eth0, that should also do it. Then, when attached, the kernel should truncate incoming frames for pf_packet into max length of 2, just as an example. [1] kernel tree, tools/net/bpf_jit_disasm.c [2] git clone git://github.com/borkmann/netsniff-ng.git ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] ppc: bpf_jit: support MOD operation
On 09/03/2013 09:58 PM, Vladimir Murzin wrote:
On Tue, Sep 03, 2013 at 06:45:50AM +1000, Benjamin Herrenschmidt wrote:
On Mon, 2013-09-02 at 19:48 +0200, Vladimir Murzin wrote:
Ping
On Wed, Aug 28, 2013 at 02:49:52AM +0400, Vladimir Murzin wrote:
commit b6069a9570 (filter: add MOD operation) added generic support for the modulus operation in BPF.

Sorry, nobody got a chance to review that yet. Unfortunately Matt doesn't work for us anymore and none of us has experience with the BPF code, so somebody (possibly me) will need to spend a bit of time figuring it out before verifying that it is correct. Do you have a test case/suite by any chance?
Ben.

Hi Ben!

Thanks for your feedback. This patch is only compile-tested. I have no real hardware, but I'll probably bring up qemu ppc64 by the end of the week... Meanwhile, I've made a simple how-to for testing. You can use it if you wish. It is mainly based on [1] and rechecked on x86-64.

Please also cc netdev on BPF-related changes. Actually, your test plan can be further simplified ... For retrieving and disassembling the JIT image, we have bpf_jit_disasm [1].

1) echo 2 > /proc/sys/net/core/bpf_jit_enable
2) ... attach filter ...
3) bpf_jit_disasm -o

For generating a simple stupid test filter, you can use bpfc [2] (also see its man page). E.g. ...

# cat blub
ldi #10
mod #8
ret a
# bpfc blub
{ 0x0, 0, 0, 0x000a },
{ 0x94, 0, 0, 0x0008 },
{ 0x16, 0, 0, 0x },

And load this array e.g. either into a small C program that attaches this as a BPF filter, or simply do `bpfc blub > blub2` and run `netsniff-ng -f blub2 -s -i eth0`, that should also do it. Then, when attached, the kernel should truncate incoming frames for pf_packet into a max length of 2, just as an example.

[1] kernel tree, tools/net/bpf_jit_disasm.c
[2] git clone git://github.com/borkmann/netsniff-ng.git

1. get the tcpdump utility (git clone git://bpf.tcpdump.org/tcpdump)
2. get the libpcap library (git clone git://bpf.tcpdump.org/libpcap)
2.1. apply the patch for libpcap [2] (against the libpcap-1.3 branch)
2.2. build libpcap (./configure && make && ln -s libpcap.so.1.3.0 libpcap.so)
3. build tcpdump (LDFLAGS=-L/path/to/libpcap ./configure && make)
4. run
# ./tcpdump -d '(ip[2:2] - 20) % 5 != 0 && ip[6] & 0x20 = 0x20'
(000) ldh [14]
(001) jeq #0x800 jt 2 jf 10
(002) ldh [18]
(003) sub #20
(004) mod #5
(005) jeq #0x0 jt 10 jf 6
(006) ldb [22]
(007) and #0x20
(008) jeq #0x20 jt 9 jf 10
(009) ret #65535
(010) ret #0
to get the pseudo code (we are interested the most in line #4)
5. enable the bpf jit compiler
# echo 2 > /proc/sys/net/core/bpf_jit_enable
6. run
# ./tcpdump -nv '(ip[2:2] - 20) % 5 != 0 && ip[6] & 0x20 = 0x20'
7. check dmesg for the JIT output lines (output for x86-64 is provided as an example)
[ 3768.329253] flen=11 proglen=99 pass=3 image=a003c000
[ 3768.329254] JIT code: a003c000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60
[ 3768.329255] JIT code: a003c010: 44 2b 4f 64 4c 8b 87 c0 00 00 00 0f b7 47 76 86
[ 3768.329256] JIT code: a003c020: c4 3d 00 08 00 00 75 37 be 02 00 00 00 e8 9f 3e
[ 3768.329257] JIT code: a003c030: 02 e1 83 e8 14 31 d2 b9 05 00 00 00 f7 f1 89 d0
[ 3768.329258] JIT code: a003c040: 85 c0 74 1b be 06 00 00 00 e8 9f 3e 02 e1 25 20
[ 3768.329259] JIT code: a003c050: 00 00 00 83 f8 20 75 07 b8 ff ff 00 00 eb 02 31
[ 3768.329259] JIT code: a003c060: c0 c9 c3
8. make sure the generated opcodes (JIT code) implement the pseudo code from step 4.

Reference
[1] http://comments.gmane.org/gmane.linux.network/242456
[2] http://permalink.gmane.org/gmane.network.tcpdump.devel/5973

P.S. I hope net people will correct me if I'm wrong there

Cheers
Vladimir Murzin

This patch brings JIT support for PPC64
Signed-off-by: Vladimir Murzin murzi...@gmail.com
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc: fix ics_rtas_init and start_secondary section mismatch
It seems we're fine with just annotating the two functions. Thus, this fixes the following build warnings on ppc64:

WARNING: arch/powerpc/sysdev/xics/built-in.o(.text+0x1664): The function .ics_rtas_init() references the function __init .xics_register_ics(). This is often because .ics_rtas_init lacks a __init annotation or the annotation of .xics_register_ics is wrong.

WARNING: arch/powerpc/sysdev/built-in.o(.text+0x6044): The function .ics_rtas_init() references the function __init .xics_register_ics(). This is often because .ics_rtas_init lacks a __init annotation or the annotation of .xics_register_ics is wrong.

WARNING: arch/powerpc/kernel/built-in.o(.text+0x2db30): The function .start_secondary() references the function __cpuinit .vdso_getcpu_init(). This is often because .start_secondary lacks a __cpuinit annotation or the annotation of .vdso_getcpu_init is wrong.

Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Signed-off-by: Daniel Borkmann dbork...@redhat.com
---
Note: compile-tested only!

 arch/powerpc/kernel/smp.c           | 2 +-
 arch/powerpc/sysdev/xics/ics-rtas.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 793401e..76bd9da 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -610,7 +610,7 @@ static struct device_node *cpu_to_l2cache(int cpu)
 }
 
 /* Activate a secondary processor.
  */
-void start_secondary(void *unused)
+__cpuinit void start_secondary(void *unused)
 {
 	unsigned int cpu = smp_processor_id();
 	struct device_node *l2_cache;
diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-rtas.c
index c782f85..936575d 100644
--- a/arch/powerpc/sysdev/xics/ics-rtas.c
+++ b/arch/powerpc/sysdev/xics/ics-rtas.c
@@ -213,7 +213,7 @@ static int ics_rtas_host_match(struct ics *ics, struct device_node *node)
 	return !of_device_is_compatible(node, "chrp,iic");
 }
 
-int ics_rtas_init(void)
+__init int ics_rtas_init(void)
 {
 	ibm_get_xive = rtas_token("ibm,get-xive");
 	ibm_set_xive = rtas_token("ibm,set-xive");
-- 
1.7.1