Re: [PATCH 1/2] MAINTAINERS: Update email address of Naveen

2024-07-17 Thread Daniel Borkmann

On 7/17/24 11:43 PM, Masami Hiramatsu (Google) wrote:

On Wed, 17 Jul 2024 13:58:35 +1000
Michael Ellerman  wrote:

Masami Hiramatsu (Google)  writes:

On Sun, 14 Jul 2024 14:04:23 +0530
Naveen N Rao  wrote:


I have switched to using my @kernel.org id for my contributions. Update
MAINTAINERS and mailmap to reflect the same.


Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) 

Would the powerpc maintainer pick this up?


Yeah I can take both.


Thank you for picking them up!


Looks like patchbot did not send a reply, but I already took them
to the bpf tree.

Thanks,
Daniel


Re: [PATCH] arch: powerpc: net: bpf_jit_comp32.c: Fixed 'instead' typo

2023-10-16 Thread Daniel Borkmann

On 10/13/23 7:31 AM, Muhammad Muzammil wrote:

Fixed 'instead' typo

Signed-off-by: Muhammad Muzammil 


Michael, I presume you'll pick it up?

Thanks,
Daniel


---
  arch/powerpc/net/bpf_jit_comp32.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 7f91ea064c08..bc7f92ec7f2d 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -940,7 +940,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
				 * !fp->aux->verifier_zext. Emit NOP otherwise.
				 *
				 * Note that "li reg_h,0" is emitted for BPF_B/H/W case,
-				 * if necessary. So, jump there insted of emitting an
+				 * if necessary. So, jump there instead of emitting an
				 * additional "li reg_h,0" instruction.
				 */
				if (size == BPF_DW && !fp->aux->verifier_zext)





Re: [PATCH v6 0/5] powerpc/bpf: use BPF prog pack allocator

2023-10-16 Thread Daniel Borkmann

On 10/12/23 10:03 PM, Hari Bathini wrote:

Most BPF programs are small, but they consume a page each. For systems
with busy traffic and many BPF programs, this may also add significant
pressure on instruction TLB. High iTLB pressure usually slows down the
whole system causing visible performance degradation for production
workloads.

bpf_prog_pack, a customized allocator that packs multiple bpf programs
into preallocated memory chunks, was proposed [1] to address it. This
series extends this support on powerpc.

Both the bpf_arch_text_copy() & bpf_arch_text_invalidate() functions needed
for this support depend on instruction patching in the text area.
Currently, patch_instruction() supports patching only one instruction
at a time. The first patch introduces patch_instructions() function
to enable patching more than one instruction at a time. This helps in
avoiding performance degradation while JITing bpf programs.
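
As a rough sketch of how bpf_arch_text_copy() can sit on top of such a
multi-instruction patching primitive (not code from this series; the
patch_instructions() prototype below is an assumption about the interface
the first patch introduces):

/* Sketch only: simplified error handling, assumed patch_instructions() prototype. */
#include <linux/err.h>
#include <linux/types.h>

int patch_instructions(u32 *addr, u32 *code, size_t len, bool repeat_instr);

void *bpf_arch_text_copy(void *dst, void *src, size_t len)
{
	int err;

	/* Patch the whole JIT image in one call instead of looping over
	 * patch_instruction() one 32-bit word at a time.
	 */
	err = patch_instructions(dst, src, len, false);

	return err ? ERR_PTR(err) : dst;
}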

Patches 2 & 3 implement the above mentioned arch specific functions
using patch_instructions(). Patch 4 fixes a misnomer in bpf JITing
code. The last patch enables the use of BPF prog pack allocator on
powerpc and also, ensures cleanup is handled gracefully.

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-s...@kernel.org/

Changes in v6:
* No changes in patches 2-5/5 except addition of Acked-by tags from Song.
* Skipped merging the code paths of patch_instruction() & patch_instructions()
  to avoid the performance overhead observed on ppc32 with that.


I presume this will be routed via Michael?

Thanks,
Daniel


Re: [bpf-next v1] bpf: drop deprecated bpf_jit_enable == 2

2023-01-03 Thread Daniel Borkmann

On 1/3/23 2:24 PM, x...@infragraf.org wrote:

From: Tonghao Zhang 

x86_64 can't dump valid insns this way. A test BPF prog
which includes a subprog:

$ llvm-objdump -d subprog.o
Disassembly of section .text:
 :
0:  18 01 00 00 73 75 62 70 00 00 00 00 72 6f 67 00 r1 = 
29114459903653235 ll
2:  7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1
3:  bf a1 00 00 00 00 00 00 r1 = r10
4:  07 01 00 00 f8 ff ff ff r1 += -8
5:  b7 02 00 00 08 00 00 00 r2 = 8
6:  85 00 00 00 06 00 00 00 call 6
7:  95 00 00 00 00 00 00 00 exit
Disassembly of section raw_tp/sys_enter:
 :
0:  85 10 00 00 ff ff ff ff call -1
1:  b7 00 00 00 00 00 00 00 r0 = 0
2:  95 00 00 00 00 00 00 00 exit

kernel print message:
[  580.775387] flen=8 proglen=51 pass=3 image=a000c20c from=kprobe-load pid=1643
[  580.777236] JIT code: : cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[  580.779037] JIT code: 0010: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[  580.780767] JIT code: 0020: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[  580.782568] JIT code: 0030: cc cc cc

$ bpf_jit_disasm
51 bytes emitted from JIT compiler (pass:3, flen:8)
a000c20c + :
0:  int3
1:  int3
2:  int3
3:  int3
4:  int3
5:  int3
...

Not until bpf_jit_binary_pack_finalize() is invoked do we copy rw_header to
header, and only then are the image/insns valid. BTW, we can use "bpftool prog
dump" to get the JITed instructions.

* clean up the doc
* remove the bpf_jit_disasm tool
* allow setting bpf_jit_enable to 0 or 1 only

Signed-off-by: Tonghao Zhang 
Suggested-by: Alexei Starovoitov 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Andrii Nakryiko 
Cc: Martin KaFai Lau 
Cc: Song Liu 
Cc: Yonghong Song 
Cc: John Fastabend 
Cc: KP Singh 
Cc: Stanislav Fomichev 
Cc: Hao Luo 
Cc: Jiri Olsa 
Cc: Hou Tao 
---
  Documentation/admin-guide/sysctl/net.rst |   2 +-
  Documentation/networking/filter.rst  |  98 +--
  arch/arm/net/bpf_jit_32.c|   4 -
  arch/arm64/net/bpf_jit_comp.c|   4 -
  arch/loongarch/net/bpf_jit.c |   4 -
  arch/mips/net/bpf_jit_comp.c |   3 -
  arch/powerpc/net/bpf_jit_comp.c  |  11 -
  arch/riscv/net/bpf_jit_core.c|   3 -
  arch/s390/net/bpf_jit_comp.c |   4 -
  arch/sparc/net/bpf_jit_comp_32.c |   3 -
  arch/sparc/net/bpf_jit_comp_64.c |  13 -
  arch/x86/net/bpf_jit_comp.c  |   3 -
  arch/x86/net/bpf_jit_comp32.c|   3 -
  net/core/sysctl_net_core.c   |  14 +-
  tools/bpf/.gitignore |   1 -
  tools/bpf/Makefile   |  10 +-
  tools/bpf/bpf_jit_disasm.c   | 332 ---
  17 files changed, 8 insertions(+), 504 deletions(-)
  delete mode 100644 tools/bpf/bpf_jit_disasm.c

diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index 6394f5dc2303..45d3d965276c 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -87,7 +87,7 @@ Values:

	- 0 - disable the JIT (default value)
	- 1 - enable the JIT
-	- 2 - enable the JIT and ask the compiler to emit traces on kernel log.
+	- 2 - deprecated in linux 6.2


I'd make it more clear in the docs and reword it as follows (also, it'll be in
6.3, not 6.2):

	- 2 - enable the JIT and ask the compiler to emit traces on kernel log.
	  (deprecated since v6.3, use ``bpftool prog dump jited id <prog_id>`` instead)

  
  bpf_jit_harden

  --

[...]

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 5b1ce656baa1..731a2eb0f68e 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -275,16 +275,8 @@ static int proc_dointvec_minmax_bpf_enable(struct ctl_table *table, int write,

 	tmp.data = &jit_enable;
 	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
-	if (write && !ret) {
-		if (jit_enable < 2 ||
-		    (jit_enable == 2 && bpf_dump_raw_ok(current_cred()))) {
-			*(int *)table->data = jit_enable;
-			if (jit_enable == 2)
-				pr_warn("bpf_jit_enable = 2 was set! NEVER use this in production, only for JIT debugging!\n");
-		} else {
-			ret = -EPERM;
-		}
-	}
+	if (write && !ret)
+		*(int *)table->data = jit_enable;

 	if (write && ret && min == max)
		pr_info_once("CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1.\n");
@@ -395,7 +387,7 @@ static struct ctl_table net_core_table[] = {
.extra2 = SYSCTL_ONE,
  # else
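
For context, the per-arch hunks being removed are JIT dump calls of roughly
this shape, gated on the deprecated mode (a sketch only: the wrapper name is
invented and the exact conditions differ per architecture; bpf_jit_enable and
bpf_jit_dump() come from linux/filter.h):

#include <linux/filter.h>

static void maybe_dump_jit(struct bpf_prog *fp, unsigned int proglen,
			   u32 pass, void *image)
{
	/* bpf_jit_enable == 2 used to ask the JIT to dump the image to the kernel log. */
	if (bpf_jit_enable > 1)
		bpf_jit_dump(fp->len, proglen, pass, image);
}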

Re: [PATCH 0/5] Atomics support for eBPF on powerpc

2022-05-12 Thread Daniel Borkmann

On 5/12/22 9:45 AM, Hari Bathini wrote:

This patchset adds atomic operations to the eBPF instruction set on
powerpc. The instructions that are added here can be summarised with
this list of kernel operations for ppc64:

* atomic[64]_[fetch_]add
* atomic[64]_[fetch_]and
* atomic[64]_[fetch_]or
* atomic[64]_[fetch_]xor
* atomic[64]_xchg
* atomic[64]_cmpxchg

and this list of kernel operations for ppc32:

* atomic_[fetch_]add
* atomic_[fetch_]and
* atomic_[fetch_]or
* atomic_[fetch_]xor
* atomic_xchg
* atomic_cmpxchg

The following are left out of scope for this effort:

* 64 bit operations on ppc32.
* Explicit memory barriers, 16 and 8 bit operations on both ppc32
   & ppc64.

The first patch adds support for bitwise atomic operations on ppc64. The
next patch adds fetch variant support for these instructions. The third
patch adds support for xchg and cmpxchg atomic operations on ppc64.
Patch #4 adds support for 32-bit atomic bitwise operations on ppc32.
Patch #5 adds support for xchg and cmpxchg atomic operations on ppc32.


Thanks for adding these, Hari! I presume they'll get routed via Michael,
right? One thing that may be worth adding to the commit log as well is
the test result from test_bpf.ko given it has an extensive suite around
atomics useful for testing corner cases in JITs.


Hari Bathini (5):
   bpf ppc64: add support for BPF_ATOMIC bitwise operations
   bpf ppc64: add support for atomic fetch operations
   bpf ppc64: Add instructions for atomic_[cmp]xchg
   bpf ppc32: add support for BPF_ATOMIC bitwise operations
   bpf ppc32: Add instructions for atomic_[cmp]xchg

  arch/powerpc/net/bpf_jit_comp32.c | 62 +-
  arch/powerpc/net/bpf_jit_comp64.c | 87 +--
  2 files changed, 108 insertions(+), 41 deletions(-)
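
For illustration (not part of the series), this is roughly the kind of BPF C
source whose atomic built-ins clang lowers to the BPF_ATOMIC instructions
these JIT patches handle; the names are invented and the usual SEC()/license
boilerplate is omitted:

/* Illustrative only; the comments give the commonly expected clang lowering. */
static long counter;

int atomics_example(void *ctx)
{
	long old;

	__sync_fetch_and_add(&counter, 1);	/* BPF_ATOMIC | BPF_ADD (| BPF_FETCH) */
	__sync_fetch_and_and(&counter, 0xff);	/* BPF_ATOMIC | BPF_AND | BPF_FETCH */
	old = __sync_val_compare_and_swap(&counter, 0, 42);	/* BPF_CMPXCHG */

	return old == 0;
}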





Re: [PATCH 01/13] bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()

2022-01-07 Thread Daniel Borkmann

On 1/6/22 12:45 PM, Naveen N. Rao wrote:

task_pt_regs() can return NULL on powerpc for kernel threads. This is
then used in __bpf_get_stack() to check for user mode, resulting in a
kernel oops. Guard against this by checking return value of
task_pt_regs() before trying to obtain the call chain.

Fixes: fa28dcb82a38f8 ("bpf: Introduce helper bpf_get_task_stack()")
Cc: sta...@vger.kernel.org # v5.9+
Signed-off-by: Naveen N. Rao 


Acked-by: Daniel Borkmann 
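
A minimal sketch of the guard described above (not the exact hunk from the
patch; the wrapper name is invented and __bpf_get_stack()/task_pt_regs() are
kernel-internal):

#include <linux/errno.h>
#include <linux/sched/task_stack.h>

static long get_task_stack_guarded(struct task_struct *task, void *buf,
				   u32 size, u64 flags)
{
	struct pt_regs *regs = task_pt_regs(task);

	/* powerpc kernel threads may have no pt_regs; bail out instead of
	 * letting __bpf_get_stack() dereference a NULL pointer.
	 */
	if (!regs)
		return -EFAULT;

	return __bpf_get_stack(regs, task, NULL, buf, size, flags);
}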


Re: [PATCH] powerpc/bpf: fix write protecting JIT code

2021-10-25 Thread Daniel Borkmann

On 10/25/21 8:15 AM, Naveen N. Rao wrote:

Hari Bathini wrote:

Running a program with bpf-to-bpf function calls results in a data access
exception (0x300) with the below call trace:

    [c0113f28] bpf_int_jit_compile+0x238/0x750 (unreliable)
    [c037d2f8] bpf_check+0x2008/0x2710
    [c0360050] bpf_prog_load+0xb00/0x13a0
    [c0361d94] __sys_bpf+0x6f4/0x27c0
    [c0363f0c] sys_bpf+0x2c/0x40
    [c0032434] system_call_exception+0x164/0x330
    [c000c1e8] system_call_vectored_common+0xe8/0x278

as bpf_int_jit_compile() tries writing to write protected JIT code
location during the extra pass.

Fix it by holding off write protection of JIT code until the extra
pass, where branch target addresses fixup happens.

Cc: sta...@vger.kernel.org
Fixes: 62e3d4210ac9 ("powerpc/bpf: Write protect JIT code")
Signed-off-by: Hari Bathini 
---
 arch/powerpc/net/bpf_jit_comp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


Thanks for the fix!

Reviewed-by: Naveen N. Rao 


LGTM, I presume this fix will be routed via Michael.

BPF selftests have plenty of BPF-to-BPF calls in there, too bad this was
caught so late. :/


Re: [PATCH v4 0/8] bpf powerpc: Add BPF_PROBE_MEM support in powerpc JIT compiler

2021-10-04 Thread Daniel Borkmann

On 10/4/21 12:49 AM, Michael Ellerman wrote:

Daniel Borkmann  writes:

On 9/29/21 1:18 PM, Hari Bathini wrote:

Patch #1 & #2 are simple cleanup patches. Patch #3 refactors JIT
compiler code with the aim to simplify adding BPF_PROBE_MEM support.
Patch #4 introduces PPC_RAW_BRANCH() macro instead of open coding
branch instruction. Patch #5 & #7 add BPF_PROBE_MEM support for PPC64
& PPC32 JIT compilers respectively. Patch #6 & #8 handle bad userspace
pointers for PPC64 & PPC32 cases respectively.


Michael, are you planning to pick up the series or shall we route via bpf-next?


Yeah I'll plan to take it, unless you think there is a strong reason it
needs to go via the bpf tree (doesn't look like it from the diffstat).


Sounds good to me, in that case, please also route the recent JIT fixes from
Naveen through your tree.

Thanks,
Daniel


Re: [PATCH v4 0/8] bpf powerpc: Add BPF_PROBE_MEM support in powerpc JIT compiler

2021-09-30 Thread Daniel Borkmann

On 9/29/21 1:18 PM, Hari Bathini wrote:

Patch #1 & #2 are simple cleanup patches. Patch #3 refactors JIT
compiler code with the aim to simplify adding BPF_PROBE_MEM support.
Patch #4 introduces PPC_RAW_BRANCH() macro instead of open coding
branch instruction. Patch #5 & #7 add BPF_PROBE_MEM support for PPC64
& PPC32 JIT compilers respectively. Patch #6 & #8 handle bad userspace
pointers for PPC64 & PPC32 cases respectively.


Michael, are you planning to pick up the series or shall we route via bpf-next?

Thanks,
Daniel


Changes in v4:
* Addressed all the review comments from Christophe.


Hari Bathini (4):
   bpf powerpc: refactor JIT compiler code
   powerpc/ppc-opcode: introduce PPC_RAW_BRANCH() macro
   bpf ppc32: Add BPF_PROBE_MEM support for JIT
   bpf ppc32: Access only if addr is kernel address

Ravi Bangoria (4):
   bpf powerpc: Remove unused SEEN_STACK
   bpf powerpc: Remove extra_pass from bpf_jit_build_body()
   bpf ppc64: Add BPF_PROBE_MEM support for JIT
   bpf ppc64: Access only if addr is kernel address

  arch/powerpc/include/asm/ppc-opcode.h |   2 +
  arch/powerpc/net/bpf_jit.h|  19 +++--
  arch/powerpc/net/bpf_jit_comp.c   |  72 --
  arch/powerpc/net/bpf_jit_comp32.c | 101 ++
  arch/powerpc/net/bpf_jit_comp64.c |  72 ++
  5 files changed, 224 insertions(+), 42 deletions(-)





Re: [PATCH bpf-next v2] bpf: Change value of MAX_TAIL_CALL_CNT from 32 to 33

2021-09-14 Thread Daniel Borkmann

On 9/11/21 3:56 AM, Tiezhu Yang wrote:

In the current code, the actual max tail call count is 33, which is greater
than MAX_TAIL_CALL_CNT (defined as 32). The actual limit is not consistent
with the meaning of MAX_TAIL_CALL_CNT, which is confusing and takes some
time to understand at first glance.

We can see the historical evolution from commit 04fd61ab36ec ("bpf: allow
bpf programs to tail-call other bpf programs") and commit f9dabe016b63
("bpf: Undo off-by-one in interpreter tail call count limit").

In order to avoid changing existing behavior, the actual limit remains 33,
which is reasonable.

After commit 874be05f525e ("bpf, tests: Add tail call test suite"), we can
see that there is a failing testcase.

On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:
  # echo 0 > /proc/sys/net/core/bpf_jit_enable
  # modprobe test_bpf
  # dmesg | grep -w FAIL
  Tail call error path, max count reached jited:0 ret 34 != 33 FAIL

On some archs:
  # echo 1 > /proc/sys/net/core/bpf_jit_enable
  # modprobe test_bpf
  # dmesg | grep -w FAIL
  Tail call error path, max count reached jited:1 ret 34 != 33 FAIL

So it is necessary to change the value of MAX_TAIL_CALL_CNT from 32 to 33,
then do some small changes of the related code.

This patch does not change the current limit of 33. MAX_TAIL_CALL_CNT now
reflects the actual max tail call count, the tailcall selftests keep working,
and the above failing testcase in test_bpf is fixed for the interpreter
(all archs) and the JIT (all archs except x86).

  # uname -m
  x86_64
  # echo 1 > /proc/sys/net/core/bpf_jit_enable
  # modprobe test_bpf
  # dmesg | grep -w FAIL
  Tail call error path, max count reached jited:1 ret 33 != 34 FAIL


Could you also state in here which archs you have tested with this change? I
presume /every/ arch which has a JIT?


Signed-off-by: Tiezhu Yang 
---

v2:
   -- fix the typos in the commit message and update the commit message.
   -- fix the failed tailcall selftests for x86 jit.
  I am not quite sure the change on x86 is proper: with this change,
  the tailcall selftests pass, but the tail call limit test in test_bpf.ko
  fails. I do not know the reason yet; I think this is another issue,
  maybe someone more versed in the x86 JIT could take a look.


There should be a series from Johan coming today with regards to test_bpf.ko
that will fix the "tail call error path, max count reached" test which had an
assumption in that R0 would always be valid for the fall-through and could be
passed to the bpf_exit insn whereas it is not guaranteed and verifier, for
example, forbids a subsequent access to R0 w/o reinit. For your testing, I
would suggest rechecking once this series is out.


  arch/arm/net/bpf_jit_32.c | 11 ++-
  arch/arm64/net/bpf_jit_comp.c |  7 ---
  arch/mips/net/ebpf_jit.c  |  4 ++--
  arch/powerpc/net/bpf_jit_comp32.c |  4 ++--
  arch/powerpc/net/bpf_jit_comp64.c | 12 ++--
  arch/riscv/net/bpf_jit_comp32.c   |  4 ++--
  arch/riscv/net/bpf_jit_comp64.c   |  4 ++--
  arch/sparc/net/bpf_jit_comp_64.c  |  8 
  arch/x86/net/bpf_jit_comp.c   | 10 +-
  include/linux/bpf.h   |  2 +-
  kernel/bpf/core.c |  4 ++--
  11 files changed, 36 insertions(+), 34 deletions(-)

[...]

/* prog = array->ptrs[index]
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 41c23f4..5d6c843 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -286,14 +286,15 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
emit(A64_CMP(0, r3, tmp), ctx);
emit(A64_B_(A64_COND_CS, jmp_offset), ctx);
  
-	/* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
-	 *     goto out;
+	/*
 	 * tail_call_cnt++;
+	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+	 *     goto out;
 	 */
+	emit(A64_ADD_I(1, tcc, tcc, 1), ctx);
 	emit_a64_mov_i64(tmp, MAX_TAIL_CALL_CNT, ctx);
 	emit(A64_CMP(1, tcc, tmp), ctx);
 	emit(A64_B_(A64_COND_HI, jmp_offset), ctx);
-	emit(A64_ADD_I(1, tcc, tcc, 1), ctx);
  
  	/* prog = array->ptrs[index];

 * if (prog == NULL)

[...]

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 0fe6aac..74a9e61 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -402,7 +402,7 @@ static int get_pop_bytes(bool *callee_regs_used)
   * ... bpf_tail_call(void *ctx, struct bpf_array *array, u64 index) ...
   *   if (index >= array->map.max_entries)
   * goto out;
- *   if (++tail_call_cnt > MAX_TAIL_CALL_CNT)
+ *   if (tail_call_cnt++ == MAX_TAIL_CALL_CNT)


Why such inconsistency compared to e.g. the arm64 case above, but also compared
to x86 32-bit which uses JAE? If so, we should cleanly follow the reference
implementation (== interpreter) _everywhere_ and _not_ introduce additional
variants/implementations across JITs.


   * goto out;
   *   prog = 
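
As a plain-C illustration of the reference (interpreter) semantics being asked
for here (not the interpreter's literal code): the counter is checked against
the limit and only then incremented, so with MAX_TAIL_CALL_CNT raised to 33 at
most 33 tail calls are performed and the 34th attempt falls through:

#include <stdbool.h>

#define MAX_TAIL_CALL_CNT 33

/* Illustration only. */
static bool tail_call_allowed(unsigned int *tail_call_cnt)
{
	if (*tail_call_cnt >= MAX_TAIL_CALL_CNT)
		return false;		/* "goto out" in the interpreter/JITs */
	(*tail_call_cnt)++;
	return true;
}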

Re: [PATCH bpf-next] bpf: Change value of MAX_TAIL_CALL_CNT from 32 to 33

2021-09-09 Thread Daniel Borkmann

On 9/9/21 7:50 AM, Andrii Nakryiko wrote:

On Wed, Sep 8, 2021 at 8:33 PM Tiezhu Yang  wrote:


In the current code, the actual max tail call count is 33 which is greater
than MAX_TAIL_CALL_CNT (defined as 32), the actual limit is not consistent
with the meaning of MAX_TAIL_CALL_CNT, there is some confusion and need to
spend some time to think the reason at the first glance.


think *about* the reason


We can see the historical evolution from commit 04fd61ab36ec ("bpf: allow
bpf programs to tail-call other bpf programs") and commit f9dabe016b63
("bpf: Undo off-by-one in interpreter tail call count limit").

In order to avoid changing existing behavior, the actual limit is 33 now,
this is resonable.


typo: reasonable


After commit 874be05f525e ("bpf, tests: Add tail call test suite"), we can
see there exists failed testcase.

On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:
  # echo 0 > /proc/sys/net/core/bpf_jit_enable
  # modprobe test_bpf
  # dmesg | grep -w FAIL
  Tail call error path, max count reached jited:0 ret 34 != 33 FAIL

On some archs:
  # echo 1 > /proc/sys/net/core/bpf_jit_enable
  # modprobe test_bpf
  # dmesg | grep -w FAIL
  Tail call error path, max count reached jited:1 ret 34 != 33 FAIL

So it is necessary to change the value of MAX_TAIL_CALL_CNT from 32 to 33,
then do some small changes of the related code.

With this patch, it does not change the current limit, MAX_TAIL_CALL_CNT
can reflect the actual max tail call count, and the above failed testcase
can be fixed.

Signed-off-by: Tiezhu Yang 
---


This change breaks selftests ([0]), please fix them at the same time
as you are changing the kernel behavior:


The below selftests shouldn't have to change given there is no change in
behavior intended (MAX_TAIL_CALL_CNT is bumped to 33 but counter inc'ed
prior to the comparison). It just means that /all/ JITs must be changed
and in particular properly _tested_.


   test_tailcall_2:PASS:tailcall 128 nsec
   test_tailcall_2:PASS:tailcall 128 nsec
   test_tailcall_2:FAIL:tailcall err 0 errno 2 retval 4
   #135/2 tailcalls/tailcall_2:FAIL
   test_tailcall_3:PASS:tailcall 128 nsec
   test_tailcall_3:FAIL:tailcall count err 0 errno 2 count 34
   test_tailcall_3:PASS:tailcall 128 nsec
   #135/3 tailcalls/tailcall_3:FAIL
   #135/4 tailcalls/tailcall_4:OK
   #135/5 tailcalls/tailcall_5:OK
   #135/6 tailcalls/tailcall_bpf2bpf_1:OK
   test_tailcall_bpf2bpf_2:PASS:tailcall 128 nsec
   test_tailcall_bpf2bpf_2:FAIL:tailcall count err 0 errno 2 count 34
   test_tailcall_bpf2bpf_2:PASS:tailcall 128 nsec
   #135/7 tailcalls/tailcall_bpf2bpf_2:FAIL
   #135/8 tailcalls/tailcall_bpf2bpf_3:OK
   test_tailcall_bpf2bpf_4:PASS:tailcall 54 nsec
   test_tailcall_bpf2bpf_4:FAIL:tailcall count err 0 errno 2 count 32
   #135/9 tailcalls/tailcall_bpf2bpf_4:FAIL
   test_tailcall_bpf2bpf_4:PASS:tailcall 54 nsec
   test_tailcall_bpf2bpf_4:FAIL:tailcall count err 0 errno 2 count 32
   #135/10 tailcalls/tailcall_bpf2bpf_5:FAIL
   #135 tailcalls:FAIL

   [0] 
https://github.com/kernel-patches/bpf/pull/1747/checks?check_run_id=3552002906


  arch/arm/net/bpf_jit_32.c | 11 ++-
  arch/arm64/net/bpf_jit_comp.c |  7 ---
  arch/mips/net/ebpf_jit.c  |  4 ++--
  arch/powerpc/net/bpf_jit_comp32.c |  4 ++--
  arch/powerpc/net/bpf_jit_comp64.c | 12 ++--
  arch/riscv/net/bpf_jit_comp32.c   |  4 ++--
  arch/riscv/net/bpf_jit_comp64.c   |  4 ++--
  arch/sparc/net/bpf_jit_comp_64.c  |  8 
  include/linux/bpf.h   |  2 +-
  kernel/bpf/core.c |  4 ++--
  10 files changed, 31 insertions(+), 29 deletions(-)



[...]





Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks

2021-06-04 Thread Daniel Borkmann

On 6/4/21 6:50 AM, Paul Moore wrote:

On Thu, Jun 3, 2021 at 2:53 PM Daniel Borkmann  wrote:

[...]

I did run this entire discussion by both of the other BPF co-maintainers
(Alexei, Andrii, CC'ed) and together we did further brainstorming on the
matter on how we could solve this, but couldn't find a sensible & clean
solution so far.


Before I jump into the patch below I just want to say that I
appreciate you looking into solutions on the BPF side of things.
However, I voted "no" on this patch previously and since you haven't
really changed it, my "no"/NACK vote remains, at least until we
exhaust a few more options.


Just to set the record straight, you previously neither ACKed nor NACKed it. And
again, as summarized earlier, this patch is _fixing_ the majority of the damage
caused by 59438b46471a for at least the BPF side of things where users run into 
this,
Ondrej the rest. Everything else can be discussed on top, and so far it seems 
there
is no clean solution in front of us either, not even speaking of one that is 
small
and suitable for _stable_ trees. Let me reiterate where we are: it's not that 
the
original implementation in 9d1f8be5cf42 ("bpf: Restrict bpf when kernel 
lockdown is
in confidentiality mode") was broken, it's that the later added _SELinux_ 
locked_down
hook implementation that is broken, and not other LSMs. Now you're trying to 
retrofittingly
ask us for hacks at all costs just because of /a/ broken LSM implementation. 
Maybe
another avenue is to just swallow the pill and revert 59438b46471a since it made
assumptions that don't work /generally/. And the use case for an application 
runtime
policy change in particular in case of lockdown where users only have 3 choices 
is
extremely tiny/rare, if it's not then something is very wrong in your 
deployment.
Let me ask you this ... are you also planning to address *all* the other cases 
inside
the kernel? If your answer is no, then this entire discussion is pointless.


[PATCH] bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks

Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
added an implementation of the locked_down LSM hook to SELinux, with the aim
to restrict which domains are allowed to perform operations that would breach
lockdown. This is indirectly also getting audit subsystem involved to report
events. The latter is problematic, as reported by Ondrej and Serhei, since it
can bring down the whole system via audit:

1) The audit events that are triggered due to calls to 
security_locked_down()
   can OOM kill a machine, see below details [0].

2) It also seems to be causing a deadlock via 
avc_has_perm()/slow_avc_audit()
   when trying to wake up kauditd, for example, when using 
trace_sched_switch()
   tracepoint, see details in [1]. Triggering this was not via some 
hypothetical
   corner case, but with existing tools like runqlat & runqslower from bcc, 
for
   example, which make use of this tracepoint. Rough call sequence goes 
like:

   rq_lock(rq) -> ----------------------+
     trace_sched_switch() ->            |
       bpf_prog_xyz() ->                +-> deadlock
         selinux_lockdown() ->          |
           audit_log_end() ->           |
             wake_up_interruptible() -> |
               try_to_wake_up() ->      |
                 rq_lock(rq) -----------+


Since BPF is a bit of a chaotic nightmare in the sense that it is basically
out-of-tree kernel code that can be called from anywhere and do pretty
much anything, it presents quite the challenge for those of us worried
about LSM access controls.


There is no need to generalize ... for those worried, the BPF subsystem has had
LSM access controls for the syscall since 2017 via afdb09c720b6 ("security: bpf:
Add LSM hooks for bpf object related syscall").

[...]

So let's look at this from a different angle.  Let's look at the two
problems you mention above.

If we start with the runqueue deadlock we see the main problem is that
audit_log_end() pokes the kauditd_wait waitqueue to ensure the
kauditd_thread thread wakes up and processes the audit queue.  The
audit_log_start() function does something similar, but it is
conditional on a number of factors and isn't as likely to be hit.  If
we relocate these kauditd wakeup calls we can remove the deadlock in
trace_sched_switch().  In the case of CONFIG_AUDITSYSCALL=y we can
probably just move the wakeup to __audit_syscall_exit() and in the
case of CONFIG_AUDITSYSCALL=n we can likely just change the
wait_event_freezable() call in kauditd_thread to a
wait_event_freezable_timeout() call with a HZ timeout (the audit
stream will be much less on these systems anyway so a queue overflow
is much less likely).  I'm building a kernel with these changes now, I
should have something to test when I wake up tomorrow mornin

Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks

2021-06-03 Thread Daniel Borkmann
run_bpf_submit+0x4d/0xc0
  [  730.890579]  perf_trace_sched_switch+0x142/0x180
  [  730.891485]  ? __schedule+0x6d8/0xb20
  [  730.892209]  __schedule+0x6d8/0xb20
  [  730.892899]  schedule+0x5b/0xc0
  [  730.893522]  exit_to_user_mode_prepare+0x11d/0x240
  [  730.894457]  syscall_exit_to_user_mode+0x27/0x70
  [  730.895361]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [...]

Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
Reported-by: Ondrej Mosnacek 
Reported-by: Jakub Hrozek 
Reported-by: Serhei Makarov 
Reported-by: Jiri Olsa 
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Tested-by: Jiri Olsa 
Cc: Paul Moore 
Cc: James Morris 
Cc: Jerome Marchand 
Cc: Frank Eigler 
Cc: Linus Torvalds 
Link: 
https://lore.kernel.org/bpf/01135120-8bf7-df2e-cff0-1d73f1f84...@iogearbox.net
---
 kernel/bpf/helpers.c |  7 +--
 kernel/trace/bpf_trace.c | 32 
 2 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 73443498d88f..a2f1f15ce432 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -14,6 +14,7 @@
 #include <linux/jiffies.h>
 #include <linux/pid_namespace.h>
 #include <linux/proc_ns.h>
+#include <linux/security.h>

 #include "../../lib/kstrtox.h"

@@ -1069,11 +1070,13 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_probe_read_user:
 		return &bpf_probe_read_user_proto;
 	case BPF_FUNC_probe_read_kernel:
-		return &bpf_probe_read_kernel_proto;
+		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
+		       NULL : &bpf_probe_read_kernel_proto;
 	case BPF_FUNC_probe_read_user_str:
 		return &bpf_probe_read_user_str_proto;
 	case BPF_FUNC_probe_read_kernel_str:
-		return &bpf_probe_read_kernel_str_proto;
+		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
+		       NULL : &bpf_probe_read_kernel_str_proto;
 	case BPF_FUNC_snprintf_btf:
 		return &bpf_snprintf_btf_proto;
 	case BPF_FUNC_snprintf:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d2d7cf6cfe83..7a52bc172841 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -215,16 +215,11 @@ const struct bpf_func_proto bpf_probe_read_user_str_proto = {
 static __always_inline int
 bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr)
 {
-   int ret = security_locked_down(LOCKDOWN_BPF_READ);
+   int ret;

-   if (unlikely(ret < 0))
-   goto fail;
ret = copy_from_kernel_nofault(dst, unsafe_ptr, size);
if (unlikely(ret < 0))
-   goto fail;
-   return ret;
-fail:
-   memset(dst, 0, size);
+   memset(dst, 0, size);
return ret;
 }

@@ -246,10 +241,7 @@ const struct bpf_func_proto bpf_probe_read_kernel_proto = {
 static __always_inline int
 bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr)
 {
-   int ret = security_locked_down(LOCKDOWN_BPF_READ);
-
-   if (unlikely(ret < 0))
-   goto fail;
+   int ret;

/*
 * The strncpy_from_kernel_nofault() call will likely not fill the
@@ -262,11 +254,7 @@ bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr)
 */
ret = strncpy_from_kernel_nofault(dst, unsafe_ptr, size);
if (unlikely(ret < 0))
-   goto fail;
-
-   return ret;
-fail:
-   memset(dst, 0, size);
+   memset(dst, 0, size);
return ret;
 }

@@ -1011,16 +999,20 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_probe_read_user:
 		return &bpf_probe_read_user_proto;
 	case BPF_FUNC_probe_read_kernel:
-		return &bpf_probe_read_kernel_proto;
+		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
+		       NULL : &bpf_probe_read_kernel_proto;
 	case BPF_FUNC_probe_read_user_str:
 		return &bpf_probe_read_user_str_proto;
 	case BPF_FUNC_probe_read_kernel_str:
-		return &bpf_probe_read_kernel_str_proto;
+		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
+		       NULL : &bpf_probe_read_kernel_str_proto;
 #ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	case BPF_FUNC_probe_read:
-		return &bpf_probe_read_compat_proto;
+		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
+		       NULL : &bpf_probe_read_compat_proto;
 	case BPF_FUNC_probe_read_str:
-		return &bpf_probe_read_compat_str_proto;
+		return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
+		       NULL : &bpf_probe_read_compat_str_proto;
 #endif
 #ifdef CONFIG_CGROUPS
case BPF_FUNC_get_current_cgroup_id:
--
2.21.0



Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks

2021-06-02 Thread Daniel Borkmann

On 6/1/21 10:47 PM, Paul Moore wrote:

On Mon, May 31, 2021 at 4:24 AM Daniel Borkmann  wrote:

On 5/29/21 8:48 PM, Paul Moore wrote:
[...]

Daniel's patch side steps that worry by just doing the lockdown
permission check when the BPF program is loaded, but that isn't a
great solution if the policy changes afterward.  I was hoping there
might be some way to perform the permission check as needed, but the
more I look the more that appears to be difficult, if not impossible
(once again, corrections are welcome).


Your observation is correct, will try to clarify below a bit.


I'm now wondering if the right solution here is to make use of the LSM
notifier mechanism.  I'm not yet entirely sure if this would work from
a BPF perspective, but I could envision the BPF subsystem registering
a LSM notification callback via register_blocking_lsm_notifier(), see
the Infiniband code as an example, and then when the LSM(s) policy
changes the BPF subsystem would get a notification and it could
revalidate the existing BPF programs and take block/remove/whatever
the offending BPF programs.  This obviously requires a few things
which I'm not sure are easily done, or even possible:

1. Somehow the BPF programs would need to be "marked" at
load/verification time with respect to their lockdown requirements so
that decisions can be made later.  Perhaps a flag in bpf_prog_aux?

2. While it looks like it should be possible to iterate over all of
the loaded BPF programs in the LSM notifier callback via
idr_for_each(prog_idr, ...), it is not clear to me if it is possible
to safely remove, or somehow disable, BPF programs once they have been
loaded.  Hopefully the BPF folks can help answer that question.

3. Disabling of BPF programs might be preferable to removing them
entirely on LSM policy changes as it would be possible to make the
lockdown state less restrictive at a future point in time, allowing
for the BPF program to be executed again.  Once again, not sure if
this is even possible.
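
A minimal sketch of the notifier registration referred to above, modeled on the
Infiniband usage of register_blocking_lsm_notifier(); the BPF-side callback body
is purely hypothetical and not existing kernel code:

#include <linux/init.h>
#include <linux/notifier.h>
#include <linux/security.h>

static int bpf_lsm_policy_notify(struct notifier_block *nb,
				 unsigned long event, void *data)
{
	if (event != LSM_POLICY_CHANGE)
		return NOTIFY_DONE;

	/* Hypothetical: walk prog_idr here and revalidate or disable programs
	 * whose recorded lockdown requirements are no longer permitted.
	 */
	return NOTIFY_OK;
}

static struct notifier_block bpf_lsm_nb = {
	.notifier_call = bpf_lsm_policy_notify,
};

static int __init bpf_lsm_notifier_init(void)
{
	return register_blocking_lsm_notifier(&bpf_lsm_nb);
}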


Part of why this gets really complex/impossible is that BPF programs in
the kernel are reference counted from various sides, be it that there
are references from user space to them (fd from application, BPF fs, or
BPF links), hooks where they are attached to as well as tail call maps
where one BPF prog calls into another. There is currently also no global
infra of some sort where you could piggy back to atomically keep track of
all the references in a list or such. And the other thing is that BPF progs
have no ownership that is tied to a specific task after they have been
loaded. Meaning, once they are loaded into the kernel by an application
and attached to a specific hook, they can remain there potentially until
reboot of the node, so lifecycle of the user space application != lifecycle
of the BPF program.


I don't think the disjoint lifecycle or lack of task ownership is a
deal breaker from a LSM perspective as the LSMs can stash whatever
info they need in the security pointer during the program allocation
hook, e.g. selinux_bpf_prog_alloc() saves the security domain which
allocates/loads the BPF program.

The thing I'm worried about would be the case where a LSM policy
change requires that an existing BPF program be removed or disabled.
I'm guessing based on the refcounting that there is not presently a
clean way to remove a BPF program from the system, but is this
something we could resolve?  If we can't safely remove a BPF program
from the system, can we replace/swap it with an empty/NULL BPF
program?


Removing progs would somehow mean destroying those references from an
async event and then /safely/ guaranteeing that nothing is accessing
them anymore. But then if policy changes once more where they would
be allowed again we would need to revert back to the original state,
which brings us to your replace/swap question with an empty/null prog.
It's not feasible either, because there are different BPF program types
and they can have different return code semantics that lead to subsequent
actions. If we were to replace them with an empty/NULL program, then
essentially this will get us into an undefined system state given it's
unclear what should be a default policy for each program type, etc.
Just to pick one simple example, outside of tracing, that comes to mind:
say, you attached a program with tc to a given device ingress hook. That
program implements firewalling functionality, and potentially deep down
in that program there is functionality to record/sample packets along
with some meta data. Part of what is exported to the ring buffer to the
user space reader may be a struct net_device field that is otherwise not
available (or at least not yet), hence it's probe-read with mentioned
helpers. If you were now to change the SELinux policy for that tc loader
application, and therefore replace/swap the progs in the kernel that were
loaded with it (given tc's lockdown policy was recorded in their sec blob)
with an empty/NU

Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks

2021-05-31 Thread Daniel Borkmann

On 5/29/21 8:48 PM, Paul Moore wrote:
[...]

Daniel's patch side steps that worry by just doing the lockdown
permission check when the BPF program is loaded, but that isn't a
great solution if the policy changes afterward.  I was hoping there
might be some way to perform the permission check as needed, but the
more I look the more that appears to be difficult, if not impossible
(once again, corrections are welcome).


Your observation is correct, will try to clarify below a bit.


I'm now wondering if the right solution here is to make use of the LSM
notifier mechanism.  I'm not yet entirely sure if this would work from
a BPF perspective, but I could envision the BPF subsystem registering
a LSM notification callback via register_blocking_lsm_notifier(), see
the Infiniband code as an example, and then when the LSM(s) policy
changes the BPF subsystem would get a notification and it could
revalidate the existing BPF programs and take block/remove/whatever
the offending BPF programs.  This obviously requires a few things
which I'm not sure are easily done, or even possible:

1. Somehow the BPF programs would need to be "marked" at
load/verification time with respect to their lockdown requirements so
that decisions can be made later.  Perhaps a flag in bpf_prog_aux?

2. While it looks like it should be possible to iterate over all of
the loaded BPF programs in the LSM notifier callback via
idr_for_each(prog_idr, ...), it is not clear to me if it is possible
to safely remove, or somehow disable, BPF programs once they have been
loaded.  Hopefully the BPF folks can help answer that question.

3. Disabling of BPF programs might be preferable to removing them
entirely on LSM policy changes as it would be possible to make the
lockdown state less restrictive at a future point in time, allowing
for the BPF program to be executed again.  Once again, not sure if
this is even possible.


Part of why this gets really complex/impossible is that BPF programs in
the kernel are reference counted from various sides, be it that there
are references from user space to them (fd from application, BPF fs, or
BPF links), hooks where they are attached to as well as tail call maps
where one BPF prog calls into another. There is currently also no global
infra of some sort where you could piggy back to atomically keep track of
all the references in a list or such. And the other thing is that BPF progs
have no ownership that is tied to a specific task after they have been
loaded. Meaning, once they are loaded into the kernel by an application
and attached to a specific hook, they can remain there potentially until
reboot of the node, so lifecycle of the user space application != lifecycle
of the BPF program.

It's maybe best to compare this aspect to kernel modules in the sense that
you have an application that loads it into the kernel (insmod, etc, where
you could also enforce lockdown signature check), but after that, they can
be managed by other entities as well (implicitly refcounted from kernel,
removed by other applications, etc).

My understanding of the lockdown settings is that users have options
to select/enforce a lockdown level of CONFIG_LOCK_DOWN_KERNEL_FORCE_{INTEGRITY,
CONFIDENTIALITY} at compilation time, they have a lockdown={integrity|
confidentiality} boot-time parameter, /sys/kernel/security/lockdown,
and then more fine-grained policy via 59438b46471a ("security,lockdown,selinux:
implement SELinux lockdown"). Once you have set a global policy level,
you cannot revert back to a less strict mode. So the SELinux policy is
specifically tied around tasks to further restrict applications in respect
to the global policy. I presume that would mean for those users that the
majority of tasks have the confidentiality option set via SELinux, with just
a few necessary ones using the integrity global policy. So overall, enforcing
when the BPF program is loaded is the only really sensible option to me,
given that only there do we have a valid current task against which such a
policy can be enforced.

Best,
Daniel


Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks

2021-05-28 Thread Daniel Borkmann

On 5/28/21 5:47 PM, Paul Moore wrote:

On Fri, May 28, 2021 at 3:10 AM Daniel Borkmann  wrote:

On 5/28/21 3:37 AM, Paul Moore wrote:

On Mon, May 17, 2021 at 5:22 AM Ondrej Mosnacek  wrote:


Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
lockdown") added an implementation of the locked_down LSM hook to
SELinux, with the aim to restrict which domains are allowed to perform
operations that would breach lockdown.

However, in several places the security_locked_down() hook is called in
situations where the current task isn't doing any action that would
directly breach lockdown, leading to SELinux checks that are basically
bogus.

Since in most of these situations converting the callers such that
security_locked_down() is called in a context where the current task
would be meaningful for SELinux is impossible or very non-trivial (and
could lead to TOCTOU issues for the classic Lockdown LSM
implementation), fix this by modifying the hook to accept a struct cred
pointer as argument, where NULL will be interpreted as a request for a
"global", task-independent lockdown decision only. Then modify SELinux
to ignore calls with cred == NULL.


I'm not overly excited about skipping the access check when cred is
NULL.  Based on the description and the little bit that I've dug into
thus far it looks like using SECINITSID_KERNEL as the subject would be
much more appropriate.  *Something* (the kernel in most of the
relevant cases it looks like) is requesting that a potentially
sensitive disclosure be made, and ignoring it seems like the wrong
thing to do.  Leaving the access control intact also provides a nice
avenue to audit these requests should users want to do that.


I think the rationale/workaround for ignoring calls with cred == NULL (or the 
previous
patch with the unimplemented hook) from Ondrej was two-fold, at least speaking 
for his
seen tracing cases:

i) The audit events that are triggered due to calls to 
security_locked_down()
   can OOM kill a machine, see below details [0].

   ii) It seems to be causing a deadlock via slow_avc_audit() -> audit_log_end()
    when presumably trying to wake up kauditd [1].

How would your suggestion above solve both i) and ii)?


First off, a bit of general commentary - I'm not sure if Ondrej was
aware of this, but info like that is good to have in the commit
description.  Perhaps it was in the linked RHBZ but I try not to look
at those when reviewing patches; the commit descriptions must be
self-sufficient since we can't rely on the accessibility or the
lifetime of external references.  It's fine if people want to include
external links in their commits, I would actually even encourage it in
some cases, but the links shouldn't replace a proper description of
the problem and why the proposed solution is The Best Solution.

With that out of the way, it sounds like your issue isn't so much the
access check, but rather the frequency of the access denials and the
resulting audit records in your particular use case.  My initial
reaction is that you might want to understand why you are getting so
many SELinux access denials, your loaded security policy clearly does
not match with your intended use :)  Beyond that, if you want to
basically leave things as-is but quiet the high frequency audit
records that result from these SELinux denials you might want to look
into the SELinux "dontaudit" policy rule, it was created for things
like this.  Some info can be found in The SELinux Notebook, relevant
link below:

* 
https://github.com/SELinuxProject/selinux-notebook/blob/main/src/avc_rules.md#dontaudit

The deadlock issue that was previously reported remains an open case
as far as I'm concerned; I'm presently occupied trying to sort out a
rather serious issue with respect to io_uring and LSM/audit (plus
general stuff at $DAYJOB) so I haven't had time to investigate this
any further.  Of course anyone else is welcome to dive into it (I
always want to encourage this, especially from "performance people"
who just want to shut it all off), however if the answer is basically
"disable LSM and/or audit checks" you have to know that it is going to
result in a high degree of skepticism from me, so heavy documentation
on why it is The Best Solution would be a very good thing :)  Beyond
that, I think the suggestions above of "why do you have so many policy
denials?" and "have you looked into dontaudit?" are solid places to
look for a solution in your particular case.


Since most callers will just want to pass current_cred() as the cred
parameter, rename the hook to security_cred_locked_down() and provide
the original security_locked_down() function as a simple wrapper around
the new hook.


[...]



3. kernel/trace/bpf_trace.c:bpf_probe_read_kernel{,_str}_common()
   Called when a BPF program calls a helper that could leak kernel
   memory. The task context is not relevant here, si

Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks

2021-05-28 Thread Daniel Borkmann

On 5/28/21 3:42 PM, Ondrej Mosnacek wrote:

(I'm off work today and plan to reply also to Paul's comments next
week, but for now let me at least share a couple quick thoughts on
Daniel's patch.)

On Fri, May 28, 2021 at 11:56 AM Daniel Borkmann  wrote:

On 5/28/21 9:09 AM, Daniel Borkmann wrote:

On 5/28/21 3:37 AM, Paul Moore wrote:

On Mon, May 17, 2021 at 5:22 AM Ondrej Mosnacek  wrote:


Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
lockdown") added an implementation of the locked_down LSM hook to
SELinux, with the aim to restrict which domains are allowed to perform
operations that would breach lockdown.

However, in several places the security_locked_down() hook is called in
situations where the current task isn't doing any action that would
directly breach lockdown, leading to SELinux checks that are basically
bogus.

Since in most of these situations converting the callers such that
security_locked_down() is called in a context where the current task
would be meaningful for SELinux is impossible or very non-trivial (and
could lead to TOCTOU issues for the classic Lockdown LSM
implementation), fix this by modifying the hook to accept a struct cred
pointer as argument, where NULL will be interpreted as a request for a
"global", task-independent lockdown decision only. Then modify SELinux
to ignore calls with cred == NULL.


I'm not overly excited about skipping the access check when cred is
NULL.  Based on the description and the little bit that I've dug into
thus far it looks like using SECINITSID_KERNEL as the subject would be
much more appropriate.  *Something* (the kernel in most of the
relevant cases it looks like) is requesting that a potentially
sensitive disclosure be made, and ignoring it seems like the wrong
thing to do.  Leaving the access control intact also provides a nice
avenue to audit these requests should users want to do that.


I think the rationale/workaround for ignoring calls with cred == NULL (or the 
previous
patch with the unimplemented hook) from Ondrej was two-fold, at least speaking 
for his
seen tracing cases:

i) The audit events that are triggered due to calls to 
security_locked_down()
   can OOM kill a machine, see below details [0].

   ii) It seems to be causing a deadlock via slow_avc_audit() -> audit_log_end()
    when presumably trying to wake up kauditd [1].


Actually, I wasn't aware of the deadlock... But calling an LSM hook
[that is backed by a SELinux access check] from within a BPF helper is
calling for all kinds of trouble, so I'm not surprised :)


Fully agree, it's just waiting to blow up in unpredictable ways.. :/


Ondrej / Paul / Jiri: at least for the BPF tracing case specifically (I haven't 
looked
at the rest but it's also kind of independent), the attached fix should address 
both
reported issues, please take a look & test.


Thanks, I like this solution, although there are a few gotchas:

1. This patch creates a slight "regression" in that if someone flips
the Lockdown LSM into lockdown mode at runtime, existing (already
loaded) BPF programs will still be able to call the
confidentiality-breaching helpers, while before the lockdown would
apply also to them. Personally, I don't think it's a big deal (and I
bet there are other existing cases where some handle kept from before
lockdown could leak data), but I wanted to mention it in case someone
thinks the opposite.


Yes, right, though this is nothing new either in the sense that there are
plenty of other cases with security_locked_down() that operate this way
e.g. take the open_kcore() for /proc/kcore access or the module_sig_check()
for mod signatures, just to pick some random ones; it's the same approach
where the enforcement happens at open/load time.


2. IIUC, when a BPF program is rejected due to lockdown/SELinux, the
kernel will return -EINVAL to userspace (looking at
check_helper_call() in kernel/bpf/verifier.c; didn't have time to look
at other callers...). It would be nicer if the error code from the
security_locked_down() call would be passed through the call chain and
eventually returned to the caller. It should be relatively
straightforward to convert bpf_base_func_proto() to return a PTR_ERR()
instead of NULL on error, but it looks like this would result in quite
a big patch updating all the callers (and callers of callers, etc.)
with a not-so-small chance of missing some NULL check and introducing
a bug... I guess we could live with EINVAL-on-denied in stable kernels
and only have the error path refactoring in -next; I'm not sure...


Right, it would return a verifier log entry reporting to the user that
the prog is attempting to use an unavailable/unknown helper function. We do
have similar return NULL with bpf_capable() and perfmon_capable() checks.
Potentially, we could do PTR_ERR() in future where we tell if it failed due
to missing CAPs, due to lockdown or just due to helper not compiled in..


3. This 

Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks

2021-05-28 Thread Daniel Borkmann

On 5/28/21 1:47 PM, Jiri Olsa wrote:

On Fri, May 28, 2021 at 11:56:02AM +0200, Daniel Borkmann wrote:


Ondrej / Paul / Jiri: at least for the BPF tracing case specifically (I haven't 
looked
at the rest but it's also kind of independent), the attached fix should address 
both
reported issues, please take a look & test.

Thanks a lot,
Daniel



 From 5893ad528dc0a0a68933b8f2a81b18d3f539660d Mon Sep 17 00:00:00 2001
From: Daniel Borkmann 
Date: Fri, 28 May 2021 09:16:31 +
Subject: [PATCH bpf] bpf, audit, lockdown: Fix bogus SELinux lockdown 
permission checks

Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
added an implementation of the locked_down LSM hook to SELinux, with the aim
to restrict which domains are allowed to perform operations that would breach
lockdown. This is indirectly also getting audit subsystem involved to report
events. The latter is problematic, as reported by Ondrej and Serhei, since it
can bring down the whole system via audit:

   i) The audit events that are triggered due to calls to security_locked_down()
  can OOM kill a machine, see below details [0].

  ii) It seems to be causing a deadlock via slow_avc_audit() -> audit_log_end()
   when presumably trying to wake up kauditd [1].

Fix both at the same time by taking a completely different approach, that is,
move the check into the program verification phase where we actually retrieve
the func proto. This also reliably gets the task (current) that is trying to
install the tracing program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also
fixes the OOM since we're moving this out of the BPF helpers which can be called
millions of times per second.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says:

   I starting seeing this with F-34. When I run a container that is traced with
   BPF to record the syscalls it is doing, auditd is flooded with messages like:

   type=AVC msg=audit(1619784520.593:282387): avc:  denied  { confidentiality }
 for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM"
   scontext=system_u:system_r:auditd_t:s0 
tcontext=system_u:system_r:auditd_t:s0
 tclass=lockdown permissive=0

   This seems to be leading to auditd running out of space in the backlog buffer
   and eventually OOMs the machine.

   [...]
   auditd running at 99% CPU presumably processing all the messages, eventually 
I get:
   Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
   Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
   Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > 
audit_backlog_limit=64
   Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > 
audit_backlog_limit=64
   Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > 
audit_backlog_limit=64
   Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 
audit_backlog_limit=64
   Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: 
gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000
   Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not 
tainted 5.11.12-300.fc34.x86_64 #1
   Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + 
PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
   [...]

[1] 
https://lore.kernel.org/linux-audit/canyvdqn7h5tvp47fbycrasv4xf07eubsdwt_edchxjuj43j...@mail.gmail.com/,
 Serhei Makarov says:

   Upstream kernel 5.11.0-rc7 and later was found to deadlock during a
   bpf_probe_read_compat() call within a sched_switch tracepoint. The problem
   is reproducible with the reg_alloc3 testcase from SystemTap's BPF backend
   testsuite on x86_64 as well as the runqlat,runqslower tools from bcc on
   ppc64le. Example stack trace:

   [...]
   [  730.868702] stack backtrace:
   [  730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 
5.12.0-0.rc2.20210309git144c79ef3353.166.fc35.x86_64 #1
   [  730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.13.0-2.fc32 04/01/2014
   [  730.873278] Call Trace:
   [  730.873770]  dump_stack+0x7f/0xa1
   [  730.874433]  check_noncircular+0xdf/0x100
   [  730.875232]  __lock_acquire+0x1202/0x1e10
   [  730.876031]  ? __lock_acquire+0xfc0/0x1e10
   [  730.876844]  lock_acquire+0xc2/0x3a0
   [  730.877551]  ? __wake_up_common_lock+0x52/0x90
   [  730.878434]  ? lock_acquire+0xc2/0x3a0
   [  730.879186]  ? lock_is_held_type+0xa7/0x120
   [  730.880044]  ? skb_queue_tail+0x1b/0x50
   [  730.880800]  _raw_spin_lock_irqsave+0x4d/0x90
   [  730.881656]  ? __wake_up_common_lock+0x52/0x90
   [  730.882532]  __wake_up_common_lock+0x52/0x90
   [  730.883375]  audit_log_end+0x5b/0x100
   [  730.884104]  slow_avc_audit+0x69/0x90
   [  730.884836]  avc_has_perm+0x8b/0xb0
   [  730.885532]  selinux_lockdown+0xa5/0xd0
   [  730.886297]  security_locked_down+0x20/0x40
   [  730.887133]  bpf_probe_read_compat+0x66/0xd0
   [

Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks

2021-05-28 Thread Daniel Borkmann

On 5/28/21 9:09 AM, Daniel Borkmann wrote:

On 5/28/21 3:37 AM, Paul Moore wrote:

On Mon, May 17, 2021 at 5:22 AM Ondrej Mosnacek  wrote:


Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
lockdown") added an implementation of the locked_down LSM hook to
SELinux, with the aim to restrict which domains are allowed to perform
operations that would breach lockdown.

However, in several places the security_locked_down() hook is called in
situations where the current task isn't doing any action that would
directly breach lockdown, leading to SELinux checks that are basically
bogus.

Since in most of these situations converting the callers such that
security_locked_down() is called in a context where the current task
would be meaningful for SELinux is impossible or very non-trivial (and
could lead to TOCTOU issues for the classic Lockdown LSM
implementation), fix this by modifying the hook to accept a struct cred
pointer as argument, where NULL will be interpreted as a request for a
"global", task-independent lockdown decision only. Then modify SELinux
to ignore calls with cred == NULL.


I'm not overly excited about skipping the access check when cred is
NULL.  Based on the description and the little bit that I've dug into
thus far it looks like using SECINITSID_KERNEL as the subject would be
much more appropriate.  *Something* (the kernel in most of the
relevant cases it looks like) is requesting that a potentially
sensitive disclosure be made, and ignoring it seems like the wrong
thing to do.  Leaving the access control intact also provides a nice
avenue to audit these requests should users want to do that.


I think the rationale/workaround for ignoring calls with cred == NULL (or the
previous patch with the unimplemented hook) from Ondrej was two-fold, at least
for the tracing cases he has seen:

   i) The audit events that are triggered due to calls to security_locked_down()
  can OOM kill a machine, see below details [0].

  ii) It seems to be causing a deadlock via slow_avc_audit() -> audit_log_end()
  when presumably trying to wake up kauditd [1].


Ondrej / Paul / Jiri: at least for the BPF tracing case specifically (I haven't 
looked
at the rest but it's also kind of independent), the attached fix should address 
both
reported issues, please take a look & test.

Thanks a lot,
Daniel
From 5893ad528dc0a0a68933b8f2a81b18d3f539660d Mon Sep 17 00:00:00 2001
From: Daniel Borkmann 
Date: Fri, 28 May 2021 09:16:31 +0000
Subject: [PATCH bpf] bpf, audit, lockdown: Fix bogus SELinux lockdown permission checks

Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
added an implementation of the locked_down LSM hook to SELinux, with the aim
to restrict which domains are allowed to perform operations that would breach
lockdown. This also indirectly gets the audit subsystem involved to report
events. The latter is problematic, as reported by Ondrej and Serhei, since it
can bring down the whole system via audit:

  i) The audit events that are triggered due to calls to security_locked_down()
 can OOM kill a machine, see below details [0].

 ii) It seems to be causing a deadlock via slow_avc_audit() -> audit_log_end()
 when presumably trying to wake up kauditd [1].

Fix both at the same time by taking a completely different approach, that is,
move the check into the program verification phase where we actually retrieve
the func proto. This also reliably gets the task (current) that is trying to
install the tracing program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also
fixes the OOM since we're moving this out of the BPF helpers which can be called
millions of times per second.
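
For illustration only, the "check at func proto retrieval time" approach can be
sketched as below (the function and proto names here are placeholders for the
example, not a quote of the final patch): the LSM hook runs once, at program
load time, with current being the task that loads the program.

    /* Sketch: called from the verifier while resolving helpers, so the
     * check happens once per program load instead of on every helper
     * invocation, and 'current' is the loading task.
     */
    static const struct bpf_func_proto *
    tracing_func_proto_sketch(enum bpf_func_id func_id,
                              const struct bpf_prog *prog)
    {
            switch (func_id) {
            case BPF_FUNC_probe_read_kernel:
                    return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
                           NULL : &bpf_probe_read_kernel_proto;
            default:
                    return bpf_base_func_proto(func_id);
            }
    }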

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says:

  I starting seeing this with F-34. When I run a container that is traced with
  BPF to record the syscalls it is doing, auditd is flooded with messages like:

  type=AVC msg=audit(1619784520.593:282387): avc:  denied  { confidentiality }
for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM"
  scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0
tclass=lockdown permissive=0

  This seems to be leading to auditd running out of space in the backlog buffer
  and eventually OOMs the machine.

  [...]
  auditd running at 99% CPU presumably processing all the messages, eventually I get:
  Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
  Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
  Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64
  Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64
  Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64
  Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64
  Apr

Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks

2021-05-28 Thread Daniel Borkmann

On 5/28/21 3:37 AM, Paul Moore wrote:

On Mon, May 17, 2021 at 5:22 AM Ondrej Mosnacek  wrote:


Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
lockdown") added an implementation of the locked_down LSM hook to
SELinux, with the aim to restrict which domains are allowed to perform
operations that would breach lockdown.

However, in several places the security_locked_down() hook is called in
situations where the current task isn't doing any action that would
directly breach lockdown, leading to SELinux checks that are basically
bogus.

Since in most of these situations converting the callers such that
security_locked_down() is called in a context where the current task
would be meaningful for SELinux is impossible or very non-trivial (and
could lead to TOCTOU issues for the classic Lockdown LSM
implementation), fix this by modifying the hook to accept a struct cred
pointer as argument, where NULL will be interpreted as a request for a
"global", task-independent lockdown decision only. Then modify SELinux
to ignore calls with cred == NULL.


I'm not overly excited about skipping the access check when cred is
NULL.  Based on the description and the little bit that I've dug into
thus far it looks like using SECINITSID_KERNEL as the subject would be
much more appropriate.  *Something* (the kernel in most of the
relevant cases it looks like) is requesting that a potentially
sensitive disclosure be made, and ignoring it seems like the wrong
thing to do.  Leaving the access control intact also provides a nice
avenue to audit these requests should users want to do that.


I think the rationale/workaround for ignoring calls with cred == NULL (or the
previous patch with the unimplemented hook) from Ondrej was two-fold, at least
for the tracing cases he has seen:

  i) The audit events that are triggered due to calls to security_locked_down()
 can OOM kill a machine, see below details [0].

 ii) It seems to be causing a deadlock via slow_avc_audit() -> audit_log_end()
  when presumably trying to wake up kauditd [1].

How would your suggestion above solve both i) and ii)?

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585 :

  I starting seeing this with F-34. When I run a container that is traced with 
eBPF
  to record the syscalls it is doing, auditd is flooded with messages like:

  type=AVC msg=audit(1619784520.593:282387): avc:  denied  { confidentiality } 
for
   pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM"
scontext=system_u:system_r:auditd_t:s0 
tcontext=system_u:system_r:auditd_t:s0 tclass=lockdown permissive=0

  This seems to be leading to auditd running out of space in the backlog buffer 
and
  eventually OOMs the machine.

  auditd running at 99% CPU presumably processing all the messages, eventually 
I get:
  Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
  Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
  Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > 
audit_backlog_limit=64
  Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > 
audit_backlog_limit=64
  Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > 
audit_backlog_limit=64
  Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 
audit_backlog_limit=64
  Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: 
gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000
  Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not 
tainted 5.11.12-300.fc34.x86_64 #1
  Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + 
PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014

[1] 
https://lore.kernel.org/linux-audit/canyvdqn7h5tvp47fbycrasv4xf07eubsdwt_edchxjuj43j...@mail.gmail.com/
 :

  Upstream kernel 5.11.0-rc7 and later was found to deadlock during a 
bpf_probe_read_compat()
  call within a sched_switch tracepoint. The problem is reproducible with the 
reg_alloc3
  testcase from SystemTap's BPF backend testsuite on x86_64 as well as the 
runqlat,runqslower
  tools from bcc on ppc64le. Example stack trace from [1]:

  [  730.868702] stack backtrace:
  [  730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 
5.12.0-0.rc2.20210309git144c79ef3353.166.fc35.x86_64 #1
  [  730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.13.0-2.fc32 04/01/2014
  [  730.873278] Call Trace:
  [  730.873770]  dump_stack+0x7f/0xa1
  [  730.874433]  check_noncircular+0xdf/0x100
  [  730.875232]  __lock_acquire+0x1202/0x1e10
  [  730.876031]  ? __lock_acquire+0xfc0/0x1e10
  [  730.876844]  lock_acquire+0xc2/0x3a0
  [  730.877551]  ? __wake_up_common_lock+0x52/0x90
  [  730.878434]  ? lock_acquire+0xc2/0x3a0
  [  730.879186]  ? lock_is_held_type+0xa7/0x120
  [  730.880044]  ? skb_queue_tail+0x1b/0x50
  [  730.880800]  _raw_spin_lock_irqsave+0x4d/0x90
  [  730.881656]  ? __wake_up_common_lock+0x52/0x90
  [  730.882532]  __wake_up_common_lock+0x52/0x90
  [  

Re: [PATCH bpf-next 1/2] bpf: Remove bpf_jit_enable=2 debugging mode

2021-04-15 Thread Daniel Borkmann

On 4/15/21 11:32 AM, Jianlin Lv wrote:

For debugging JITs, dumping the JITed image to the kernel log is discouraged;
"bpftool prog dump jited" is a much better way to examine JITed dumps.
This patch gets rid of the code related to the bpf_jit_enable=2 mode, updates
the proc handler of bpf_jit_enable, and adds auxiliary information to explain
how to use the bpf_jit_disasm tool after this change.

Signed-off-by: Jianlin Lv 

[...]

diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index 0a7a2870f111..8d36b4658076 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -2566,9 +2566,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
*prog)
cond_resched();
}
  
-	if (bpf_jit_enable > 1)

-   bpf_jit_dump(prog->len, proglen, pass + 1, image);
-
if (image) {
bpf_jit_binary_lock_ro(header);
prog->bpf_func = (void *)image;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index c8496c1142c9..990b1720c7a4 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -273,16 +273,8 @@ static int proc_dointvec_minmax_bpf_enable(struct 
ctl_table *table, int write,
  
  	tmp.data = &jit_enable;

	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
-   if (write && !ret) {
-   if (jit_enable < 2 ||
-   (jit_enable == 2 && bpf_dump_raw_ok(current_cred()))) {
-   *(int *)table->data = jit_enable;
-   if (jit_enable == 2)
-   pr_warn("bpf_jit_enable = 2 was set! NEVER use this 
in production, only for JIT debugging!\n");
-   } else {
-   ret = -EPERM;
-   }
-   }
+   if (write && !ret)
+   *(int *)table->data = jit_enable;
return ret;
  }
  
@@ -389,7 +381,7 @@ static struct ctl_table net_core_table[] = {

.extra2 = SYSCTL_ONE,
  # else
.extra1 = SYSCTL_ZERO,
-   .extra2 = &two,
+   .extra2 = SYSCTL_ONE,
  # endif
},
  # ifdef CONFIG_HAVE_EBPF_JIT
diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
index c8ae95804728..efa4b17ae016 100644
--- a/tools/bpf/bpf_jit_disasm.c
+++ b/tools/bpf/bpf_jit_disasm.c
@@ -7,7 +7,7 @@
   *
   * To get the disassembly of the JIT code, do the following:
   *
- *  1) `echo 2 > /proc/sys/net/core/bpf_jit_enable`
+ *  1) Insert bpf_jit_dump() and recompile the kernel to output JITed image 
into log


Hmm, if we remove bpf_jit_dump(), the next drive-by cleanup patch will be thrown
at bpf@vger stating that bpf_jit_dump() has no in-tree users and should be 
removed.
Maybe we should be removing bpf_jit_disasm.c along with it, as well as
bpf_jit_dump() itself ... I guess if it's ever needed on those rare occasions
for JIT debugging, we can resurrect it from old kernels just locally. But yeah,
bpftool's jit dump should suffice for the vast majority of use cases.

There was a recent set for the ppc32 JIT which was merged into the ppc tree and
which will create a merge conflict with this one [0]. So we would need a rebase,
and maybe take it during the merge window once the ppc32 work has landed.

  [0] 
https://lore.kernel.org/bpf/cover.1616430991.git.christophe.le...@csgroup.eu/


   *  2) Load a BPF filter (e.g. `tcpdump -p -n -s 0 -i eth1 host 
192.168.20.0/24`)
   *  3) Run e.g. `bpf_jit_disasm -o` to read out the last JIT code
   *
diff --git a/tools/bpf/bpftool/feature.c b/tools/bpf/bpftool/feature.c
index 40a88df275f9..98c7eec2923f 100644
--- a/tools/bpf/bpftool/feature.c
+++ b/tools/bpf/bpftool/feature.c
@@ -203,9 +203,6 @@ static void probe_jit_enable(void)
case 1:
printf("JIT compiler is enabled\n");
break;
-   case 2:
-   printf("JIT compiler is enabled with debugging traces in 
kernel logs\n");
-   break;


This would still need to be there for older kernels ...


case -1:
printf("Unable to retrieve JIT-compiler status\n");
break;





Re: [PATCH] powerpc: net: bpf_jit_comp: Fix misuse of fallthrough

2020-09-29 Thread Daniel Borkmann

On 9/28/20 11:00 AM, zhe...@windriver.com wrote:

From: He Zhe 

The user-defined label following "fallthrough" is not considered by GCC
and causes a build failure.

kernel-source/include/linux/compiler_attributes.h:208:41: error: attribute
'fallthrough' not preceding a case label or default label [-Werror]
  208 | # define fallthrough __attribute__((__fallthrough__))
   ^

Signed-off-by: He Zhe 


Applied, thanks! I've also added a Fixes tag with df561f6688fe ("treewide: Use
fallthrough pseudo-keyword"), which added the bug.
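
For context, a minimal (hypothetical) reproducer of the constraint: GCC's
fallthrough attribute must be directly followed by a case or default label, so
an ordinary user-defined label in between is rejected under -Werror. The names
below are made up for the example and are not the actual powerpc JIT code; the
fallthrough keyword is the kernel's macro from compiler_attributes.h.

    static int fallthrough_demo(int op, int x)
    {
            switch (op) {
            case 0:
                    x++;
                    fallthrough;    /* error: what follows is the goto label
                                     * "common", not a case/default label */
            common:
            case 1:
                    x--;
                    break;
            default:
                    goto common;
            }
            return x;
    }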


Re: [PATCH] powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc again

2020-05-28 Thread Daniel Borkmann

On 5/28/20 2:23 PM, Michael Ellerman wrote:

Petr Mladek  writes:

On Thu 2020-05-28 11:03:43, Michael Ellerman wrote:

Petr Mladek  writes:

The commit 0ebeea8ca8a4d1d453a ("bpf: Restrict bpf_probe_read{, str}() only
to archs where they work") meant that the bpf_probe_read{, str}() functions
were no longer available on architectures where the same logical address
might have different content in the kernel and user memory mappings. These
architectures should use the probe_read_{user,kernel}_str helpers instead.

For backward compatibility, the problematic functions are still available
on architectures where the user and kernel address spaces do not overlap.
This is indicated by CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.

At the moment, these backward compatible functions are enabled only
on x86_64, arm, and arm64. Let's also enable them on powerpc, which has
a non-overlapping address space as well.

Signed-off-by: Petr Mladek 


This seems like it should have a Fixes: tag and go into v5.7?


Good point:

Fixes: commit 0ebeea8ca8a4d1d4 ("bpf: Restrict bpf_probe_read{, str}() only to archs 
where they work")

And yes, it should ideally go into v5.7 either directly or via stable.

Should I resend the patch with the Fixes and
Cc: sta...@vger.kernel.org # v5.7 lines, please?


If it goes into v5.7 then it doesn't need a Cc: stable, and I guess a
Fixes: tag is nice to have but not so important as it already mentions
the commit that caused the problem. So a resend probably isn't
necessary.

Acked-by: Michael Ellerman 

Daniel can you pick this up, or should I?


Yeah I'll take it into bpf tree for v5.7.

Thanks everyone,
Daniel


Re: [PATCH v2 1/4] powerpc/64s: implement probe_kernel_read/write without touching AMR

2020-04-07 Thread Daniel Borkmann

Hey Nicholas,

On 4/7/20 6:01 AM, Nicholas Piggin wrote:

Nicholas Piggin's on April 3, 2020 9:05 pm:

Christophe Leroy's on April 3, 2020 8:31 pm:

Le 03/04/2020 à 11:35, Nicholas Piggin a écrit :

There is no need to allow user accesses when probing kernel addresses.


I just discovered the following commit
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=75a1a607bb7e6d918be3aca11ec2214a275392f4

This commit adds probe_kernel_read_strict() and probe_kernel_write_strict().

When reading the commit log, I understand that probe_kernel_read() may
be used to access some user memory. Which will not work anymore with
your patch.


Hmm, I looked at _strict but obviously not hard enough. Good catch.

I don't think probe_kernel_read() should ever access user memory,
the comment certainly says it doesn't, but that patch sort of implies
that they do.

I think it's wrong. The non-_strict maybe could return userspace data to
you if you did pass a user address? I don't see why that shouldn't just
be disallowed always though.

And if the _strict version is required to be safe, then it seems like a
bug or security issue to just allow everyone that doesn't explicitly
override it to use the default implementation.

Also, the way the weak linkage is done in that patch, means parisc and
um archs that were previously overriding probe_kernel_read() now get
the default probe_kernel_read_strict(), which would be wrong for them.


The changelog in commit 75a1a607bb7 makes it a bit clearer. If the
non-_strict variant is used on non-kernel addresses, then it might not
return -EFAULT or it might cause a kernel warning. The _strict variant
is supposed to be usable with any address and it will return -EFAULT if
it was not a valid and mapped kernel address.

The non-_strict variant can not portably access user memory because it
uses KERNEL_DS, and its documentation says its only for kernel pointers.
So powerpc should be fine to run that under KUAP AFAIKS.

I don't know why the _strict behaviour is not just made default, but
the implementation of it does seem to be broken on the archs that
override the non-_strict variant.


Yeah, we should make it the default and only add an "opt out" for the old legacy
cases; the same discussion also started over here just recently [0].

Thanks,
Daniel

  [0] https://lore.kernel.org/lkml/20200403133533.ga3...@infradead.org/T/
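
For reference, the non-_strict probe_kernel_read() being discussed looked
roughly like the sketch below (simplified from the old mm/maccess.c, not an
exact quote): it flips to KERNEL_DS around a pagefault-disabled user-access
primitive, which is why it is documented as kernel-pointer-only and why running
it with user access locked out (KUAP) on powerpc is fine.

    long probe_kernel_read(void *dst, const void *src, size_t size)
    {
            long ret;
            mm_segment_t old_fs = get_fs();

            /* Treat the kernel pointer as if it were a user pointer under
             * KERNEL_DS; faults are caught instead of oopsing.
             */
            set_fs(KERNEL_DS);
            pagefault_disable();
            ret = __copy_from_user_inatomic(dst,
                            (__force const void __user *)src, size);
            pagefault_enable();
            set_fs(old_fs);

            return ret ? -EFAULT : 0;
    }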


Re: [PATCH] libbpf: fix readelf output parsing on powerpc with recent binutils

2019-12-02 Thread Daniel Borkmann
On Mon, Dec 02, 2019 at 04:53:26PM +1100, Michael Ellerman wrote:
> Aurelien Jarno  writes:
> > On powerpc with recent versions of binutils, readelf outputs an extra
> > field when dumping the symbols of an object file. For example:
> >
> >     35: 083896 FUNC    LOCAL  DEFAULT [<localentry>: 8]     1 btf_is_struct
> >
> > The extra "[<localentry>: 8]" field prevents the GLOBAL_SYM_COUNT variable
> > from being computed correctly and causes the checkabi target to fail.
> >
> > Fix that by looking for the symbol name in the last field instead of the
> > 8th one. This way it should also cope with future extra fields.
> >
> > Signed-off-by: Aurelien Jarno 
> > ---
> >  tools/lib/bpf/Makefile | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Thanks for fixing that, it's been on my very long list of test failures
> for a while.
> 
> Tested-by: Michael Ellerman 

Looks good & also continues to work on x86. Applied, thanks!


Re: [PATCH] bpf: handle 32-bit zext during constant blinding

2019-08-26 Thread Daniel Borkmann

On 8/21/19 9:23 PM, Naveen N. Rao wrote:

Since BPF constant blinding is performed after the verifier pass, the
ALU32 instructions inserted for doubleword immediate loads don't have a
corresponding zext instruction. This is causing a kernel oops on powerpc
and can be reproduced by running 'test_cgroup_storage' with
bpf_jit_harden=2.

Fix this by emitting BPF_ZEXT during constant blinding if
prog->aux->verifier_zext is set.

Fixes: a4b1d3c1ddf6cb ("bpf: verifier: insert zero extension according to analysis 
result")
Reported-by: Michael Ellerman 
Signed-off-by: Naveen N. Rao 


LGTM, applied to bpf, thanks!
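
The shape of the fix, roughly (a simplified sketch of the constant-blinding
rewrite in kernel/bpf/core.c, with emit_zext standing in for
prog->aux->verifier_zext): when the upper half of a BPF_LD | BPF_IMM | BPF_DW
load is blinded through BPF_REG_AX, re-emit the zero extension that the
verifier's analysis expected.

    case 0: /* Part 2 of BPF_LD | BPF_IMM | BPF_DW. */
            *to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ aux[0].imm);
            *to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
            if (emit_zext)
                    /* restore the zext the inserted ALU32 insns lack */
                    *to++ = BPF_ZEXT_REG(BPF_REG_AX);
            *to++ = BPF_ALU64_REG(BPF_OR, aux[0].dst_reg, BPF_REG_AX);
            break;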


Re: [PATCH 0/2] powerpc/bpf: DIV64 instruction fix

2019-06-13 Thread Daniel Borkmann
On 06/12/2019 08:51 PM, Naveen N. Rao wrote:
> The first patch updates DIV64 overflow tests to properly detect error 
> conditions. The second patch fixes powerpc64 JIT to generate the proper 
> unsigned division instruction for BPF_ALU64.
> 
> - Naveen
> 
> Naveen N. Rao (2):
>   bpf: fix div64 overflow tests to properly detect errors
>   powerpc/bpf: use unsigned division instruction for 64-bit operations
> 
>  arch/powerpc/include/asm/ppc-opcode.h  |  1 +
>  arch/powerpc/net/bpf_jit.h |  2 +-
>  arch/powerpc/net/bpf_jit_comp64.c  |  8 
>  .../testing/selftests/bpf/verifier/div_overflow.c  | 14 ++
>  4 files changed, 16 insertions(+), 9 deletions(-)
> 

LGTM, applied to bpf, thanks!
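
The gist of the powerpc64 change, sketched (the PPC_DIVDU() emitter name is an
assumption here; the point is that eBPF division is unsigned, so the 64-bit case
must use divdu rather than the signed divd):

    case BPF_ALU64 | BPF_DIV | BPF_X: /* dst /= src */
            /* divd is a signed divide; BPF_DIV/BPF_MOD are unsigned, so
             * operands with the top bit set would be mishandled. Emit the
             * unsigned form instead.
             */
            PPC_DIVDU(dst_reg, dst_reg, src_reg);
            break;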


Re: [PATCH] powerpc: bpf: Fix generation of load/store DW instructions

2019-03-15 Thread Daniel Borkmann
On 03/15/2019 03:51 PM, Naveen N. Rao wrote:
> Yauheni Kaliuta pointed out that PTR_TO_STACK store/load verifier test
> was failing on powerpc64 BE, and rightfully indicated that the PPC_LD()
> macro is not masking away the last two bits of the offset per the ISA,
> resulting in the generation of 'lwa' instruction instead of the intended
> 'ld' instruction.
> 
> Segher also pointed out that we can't simply mask away the last two bits
> as that will result in loading/storing from/to a memory location that
> was not intended.
> 
> This patch addresses this by using ldx/stdx if the offset is not
> word-aligned. We load the offset into a temporary register (TMP_REG_2)
> and use that as the index register in a subsequent ldx/stdx. We fix
> PPC_LD() macro to mask off the last two bits, but enhance PPC_BPF_LL()
> and PPC_BPF_STL() to factor in the offset value and generate the proper
> instruction sequence. We also convert all existing users of PPC_LD() and
> PPC_STD() to use these macros. All existing uses of these macros have
> been audited to ensure that TMP_REG_2 can be clobbered.
> 
> Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended 
> BPF")
> Cc: sta...@vger.kernel.org # v4.9+
> 
> Reported-by: Yauheni Kaliuta 
> Signed-off-by: Naveen N. Rao 

Applied, thanks!
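
The resulting macro shape, sketched from the description above (simplified; the
store side and the offset masking inside PPC_LD() itself are handled along the
same lines): a non-word-aligned offset is moved into TMP_REG_2 and an indexed
ldx is emitted instead of the DS-form ld.

    /* DS-form ld requires the displacement to be a multiple of 4; for other
     * offsets, load the offset into TMP_REG_2 and use indexed ldx instead.
     */
    #define PPC_BPF_LL(r, base, i) do {                               \
                    if ((i) % 4) {                                    \
                            PPC_LI(b2p[TMP_REG_2], (i));              \
                            PPC_LDX(r, base, b2p[TMP_REG_2]);         \
                    } else                                            \
                            PPC_LD(r, base, i);                       \
                    } while (0)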


Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

2018-12-10 Thread Daniel Borkmann
On 12/10/2018 06:27 PM, Michael Roth wrote:
> Quoting Daniel Borkmann (2018-12-10 08:26:31)
>> On 12/07/2018 04:36 PM, Michael Roth wrote:
>>> Quoting Michael Ellerman (2018-12-07 06:31:13)
>>>> Michael Roth  writes:
>>>>
>>>>> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
>>>>> JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
>>>>> and is adjusted again at init time if MODULES_VADDR is defined.
>>>>>
>>>>> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
>>>>
>>>> But maybe it should be, I don't know why we don't define it.
>>>>
>>>>> the compile-time default at boot-time, which is 0x9c400000 when
>>>>> using 64K page size. This overflows the signed 32-bit bpf_jit_limit
>>>>> value:
>>>>>
>>>>>   root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
>>>>>   -1673527296
>>>>>
>>>>> and can cause various unexpected failures throughout the network
>>>>> stack. In one case `strace dhclient eth0` reported:
>>>>>
>>>>>   setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, 
>>>>> filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524)
>>>>>
>>>>> and similar failures can be seen with tools like tcpdump. This doesn't
>>>>> always reproduce however, and I'm not sure why. The more consistent
>>>>> failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host
>>>>> would time out on systemd/netplan configuring a virtio-net NIC with no
>>>>> noticeable errors in the logs.
>>>>>
>>>>> Fix this by limiting the compile-time default for bpf_jit_limit to
>>>>> INT_MAX.
>>>>
>>>> INT_MAX is a lot more than (4k * 40000), so I guess I'm not clear on
>>>> whether we should be using PAGE_SIZE here at all. I guess each BPF
>>>> program uses at least one page is the thinking?
>>>
>>> That seems to be the case, at least, the max number of minimum-sized
>>> allocations would be less on ppc64 since the allocations are always at
>>> least PAGE_SIZE in size. The init-time default also limits to INT_MAX,
>>> so it seemed consistent to do that here too.
>>>
>>>>
>>>> Thanks for tracking this down. For some reason none of my ~10 test boxes
>>>> have hit this, perhaps I don't have new enough userspace?
>>>
>>> I'm not too sure, I would've thought things like the dhclient case in
>>> the commit log would fail every time, but sometimes I need to reboot the
>>> guest before I start seeing the behavior. Maybe there's something special
>>> about when JIT allocations are actually done that can affect
>>> reproducibility?
>>>
>>> In my case at least the virtio-net networking timeout was consistent
>>> enough for a bisect, but maybe it depends on the specific network
>>> configuration (single NIC, basic DHCP through netplan/systemd in my case).
>>>
>>>>
>>>> You don't mention why you needed to add BPF_MIN(), I assume because the
>>>> kernel version of min() has gotten too complicated to work here?
>>>
>>> I wasn't sure if it was safe here or not, so I tried looking at other
>>> users and came across:
>>>
>>> mm/vmalloc.c:777:#define VMAP_MIN(x, y)   ((x) < (y) ? (x) : 
>>> (y)) /* can't use min() */
>>>
>>> I'm not sure what the reasoning was (or whether it still applies), but I
>>> figured it was safer to do the same here. Maybe Nick still recalls?
>>>
>>>>
>>>> Daniel I assume you'll merge this via your tree?
>>>>
>>>> cheers
>>>>
>>>>> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
>>>>> allocations")
>>>>> Cc: linuxppc-...@ozlabs.org
>>>>> Cc: Daniel Borkmann 
>>>>> Cc: Sandipan Das 
>>>>> Cc: Alexei Starovoitov 
>>>>> Signed-off-by: Michael Roth 
>>
>> Thanks for the reports / fixes and sorry for my late reply (bit too
>> swamped last week), some more thoughts below.
>>
>>>>>  kernel/bpf/core.c | 3 ++-
>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>>>>> index b1a3545d0ec8..5

Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

2018-12-10 Thread Daniel Borkmann
On 12/07/2018 04:36 PM, Michael Roth wrote:
> Quoting Michael Ellerman (2018-12-07 06:31:13)
>> Michael Roth  writes:
>>
>>> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
>>> JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
>>> and is adjusted again at init time if MODULES_VADDR is defined.
>>>
>>> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
>>
>> But maybe it should be, I don't know why we don't define it.
>>
>>> the compile-time default at boot-time, which is 0x9c400000 when
>>> using 64K page size. This overflows the signed 32-bit bpf_jit_limit
>>> value:
>>>
>>>   root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
>>>   -1673527296
>>>
>>> and can cause various unexpected failures throughout the network
>>> stack. In one case `strace dhclient eth0` reported:
>>>
>>>   setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 
>>> 16) = -1 ENOTSUPP (Unknown error 524)
>>>
>>> and similar failures can be seen with tools like tcpdump. This doesn't
>>> always reproduce however, and I'm not sure why. The more consistent
>>> failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host
>>> would time out on systemd/netplan configuring a virtio-net NIC with no
>>> noticeable errors in the logs.
>>>
>>> Fix this by limiting the compile-time default for bpf_jit_limit to
>>> INT_MAX.
>>
>> INT_MAX is a lot more than (4k * 40000), so I guess I'm not clear on
>> whether we should be using PAGE_SIZE here at all. I guess each BPF
>> program uses at least one page is the thinking?
> 
> That seems to be the case, at least, the max number of minimum-sized
> allocations would be less on ppc64 since the allocations are always at
> least PAGE_SIZE in size. The init-time default also limits to INT_MAX,
> so it seemed consistent to do that here too.
> 
>>
>> Thanks for tracking this down. For some reason none of my ~10 test boxes
>> have hit this, perhaps I don't have new enough userspace?
> 
> I'm not too sure, I would've thought things like the dhclient case in
> the commit log would fail every time, but sometimes I need to reboot the
> guest before I start seeing the behavior. Maybe there's something special
> about when JIT allocations are actually done that can affect
> reproducibility?
> 
> In my case at least the virtio-net networking timeout was consistent
> enough for a bisect, but maybe it depends on the specific network
> configuration (single NIC, basic DHCP through netplan/systemd in my case).
> 
>>
>> You don't mention why you needed to add BPF_MIN(), I assume because the
>> kernel version of min() has gotten too complicated to work here?
> 
> I wasn't sure if it was safe here or not, so I tried looking at other
> users and came across:
> 
> mm/vmalloc.c:777:#define VMAP_MIN(x, y)   ((x) < (y) ? (x) : (y)) 
> /* can't use min() */
> 
> I'm not sure what the reasoning was (or whether it still applies), but I
> figured it was safer to do the same here. Maybe Nick still recalls?
> 
>>
>> Daniel I assume you'll merge this via your tree?
>>
>> cheers
>>
>>> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
>>> allocations")
>>> Cc: linuxppc-...@ozlabs.org
>>> Cc: Daniel Borkmann 
>>> Cc: Sandipan Das 
>>> Cc: Alexei Starovoitov 
>>> Signed-off-by: Michael Roth 

Thanks for the reports / fixes and sorry for my late reply (bit too
swamped last week), some more thoughts below.

>>>  kernel/bpf/core.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>>> index b1a3545d0ec8..55de4746cdfd 100644
>>> --- a/kernel/bpf/core.c
>>> +++ b/kernel/bpf/core.c
>>> @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
>>>  }
>>>  
>>>  #ifdef CONFIG_BPF_JIT
>>> -# define BPF_JIT_LIMIT_DEFAULT   (PAGE_SIZE * 40000)
>>> +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y))
>>> +# define BPF_JIT_LIMIT_DEFAULT   BPF_MIN((PAGE_SIZE * 40000), INT_MAX)
>>>  
>>>  /* All BPF JIT sysctl knobs here. */
>>>  int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);

I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT
define also given for 4.21 arm64 will have its own dedicated area for
JIT allocations where neither the above limit nor the MOD
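
As an aside on the numbers in this thread, the overflow itself is plain 32-bit
truncation; a tiny standalone sketch (assuming the pre-fix default of
PAGE_SIZE * 40000) reproduces the reported sysctl value:

    #include <stdio.h>

    int main(void)
    {
            long long limit = 64 * 1024LL * 40000; /* 64K pages: 0x9c400000 */
            int bpf_jit_limit = (int)limit;        /* knob is a signed int */

            /* Prints -1673527296, matching /proc/sys/net/core/bpf_jit_limit */
            printf("%d\n", bpf_jit_limit);
            return 0;
    }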

Re: [PATCH bpf] bpf: powerpc64: optimize JIT passes for bpf function calls

2018-12-03 Thread Daniel Borkmann
On 12/03/2018 02:26 PM, Sandipan Das wrote:
> Hi Daniel,
> 
> On 03/12/18 6:18 PM, Daniel Borkmann wrote:
>>
>> Thanks for the patch, just to clarify, it's targeted at bpf-next and
>> not bpf, correct?
> 
> This patch is targeted at the bpf tree.

Ok, thanks for clarifying, applied to bpf!


Re: [PATCH bpf] bpf: powerpc64: optimize JIT passes for bpf function calls

2018-12-03 Thread Daniel Borkmann
Hi Sandipan,

On 12/03/2018 01:21 PM, Sandipan Das wrote:
> Once the JITed images for each function in a multi-function program
> are generated after the first three JIT passes, we only need to fix
> the target address for the branch instruction corresponding to each
> bpf-to-bpf function call.
> 
> This introduces the following optimizations for reducing the work
> done by the JIT compiler when handling multi-function programs:
> 
>   [1] Instead of doing two extra passes to fix the bpf function calls,
>   do just one as that would be sufficient.
> 
>   [2] During the extra pass, only overwrite the instruction sequences
>   for the bpf-to-bpf function calls as everything else would still
>   remain exactly the same. This also reduces the number of writes
>   to the JITed image.
> 
>   [3] Do not regenerate the prologue and the epilogue during the extra
>   pass as that would be redundant.
> 
> Signed-off-by: Sandipan Das 

Thanks for the patch, just to clarify, it's targeted at bpf-next and
not bpf, correct?

Thanks,
Daniel


Re: [PATCH net-next 2/6] net/bpf: split VLAN_PRESENT bit handling from VLAN_TCI

2018-11-19 Thread Daniel Borkmann
On 11/10/2018 07:58 PM, Michał Mirosław wrote:
> Signed-off-by: Michał Mirosław 

Why do you have empty commit messages for non-trivial changes like this in
4 out of 6 of your patches ...

How was it tested on the JITs you were changing? Did you test on both,
big and little endian machines?

> ---
>  net/core/filter.c | 40 +---
>  1 file changed, 21 insertions(+), 19 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index e521c5ebc7d1..c151b906df53 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -296,22 +296,21 @@ static u32 convert_skb_access(int skb_field, int 
> dst_reg, int src_reg,
>   break;
>  
>   case SKF_AD_VLAN_TAG:
> - case SKF_AD_VLAN_TAG_PRESENT:
>   BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_tci) != 2);
> - BUILD_BUG_ON(VLAN_TAG_PRESENT != 0x1000);
>  
>   /* dst_reg = *(u16 *) (src_reg + offsetof(vlan_tci)) */
>   *insn++ = BPF_LDX_MEM(BPF_H, dst_reg, src_reg,
> offsetof(struct sk_buff, vlan_tci));
> - if (skb_field == SKF_AD_VLAN_TAG) {
> - *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg,
> - ~VLAN_TAG_PRESENT);
> - } else {
> - /* dst_reg >>= 12 */
> - *insn++ = BPF_ALU32_IMM(BPF_RSH, dst_reg, 12);
> - /* dst_reg &= 1 */
> +#ifdef VLAN_TAG_PRESENT
> + *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg, ~VLAN_TAG_PRESENT);
> +#endif
> + break;
> + case SKF_AD_VLAN_TAG_PRESENT:
> + *insn++ = BPF_LDX_MEM(BPF_B, dst_reg, src_reg, 
> PKT_VLAN_PRESENT_OFFSET());
> + if (PKT_VLAN_PRESENT_BIT)
> + *insn++ = BPF_ALU32_IMM(BPF_RSH, dst_reg, 
> PKT_VLAN_PRESENT_BIT);
> + if (PKT_VLAN_PRESENT_BIT < 7)
>   *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg, 1);
> - }
>   break;
>   }
>  
> @@ -6140,19 +6139,22 @@ static u32 bpf_convert_ctx_access(enum 
> bpf_access_type type,
>   break;
>  
>   case offsetof(struct __sk_buff, vlan_present):
> + *target_size = 1;
> + *insn++ = BPF_LDX_MEM(BPF_B, si->dst_reg, si->src_reg,
> +   PKT_VLAN_PRESENT_OFFSET());
> + if (PKT_VLAN_PRESENT_BIT)
> + *insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, 
> PKT_VLAN_PRESENT_BIT);
> + if (PKT_VLAN_PRESENT_BIT < 7)
> + *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, 1);
> + break;
> +
>   case offsetof(struct __sk_buff, vlan_tci):
> - BUILD_BUG_ON(VLAN_TAG_PRESENT != 0x1000);
> -
>   *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
> bpf_target_off(struct sk_buff, vlan_tci, 
> 2,
>target_size));
> - if (si->off == offsetof(struct __sk_buff, vlan_tci)) {
> - *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg,
> - ~VLAN_TAG_PRESENT);
> - } else {
> - *insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, 12);
> - *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, 1);
> - }
> +#ifdef VLAN_TAG_PRESENT
> + *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, 
> ~VLAN_TAG_PRESENT);
> +#endif
>   break;
>  
>   case offsetof(struct __sk_buff, cb[0]) ...
> 



Re: [PATCH net-next 0/6] Remove VLAN.CFI overload

2018-11-19 Thread Daniel Borkmann
On 11/10/2018 10:47 PM, David Miller wrote:
> From: Michał Mirosław 
> Date: Sat, 10 Nov 2018 19:58:29 +0100
> 
>> Fix BPF code/JITs to allow for separate VLAN_PRESENT flag
>> storage and finally move the flag to separate storage in skbuff.
>>
>> This is the final step to make VLAN.CFI transparent to the core Linux
>> networking stack.
>>
>> An #ifdef is introduced temporarily to mark fragments masking
>> VLAN_TAG_PRESENT. This is removed altogether in the final patch.
> 
> Daniel and Alexei, please review.

Sorry, was completely swamped due to plumbers, just getting to it now.


Re: [PATCH 1/4] bpf: account for freed JIT allocations in arch code

2018-11-19 Thread Daniel Borkmann
On 11/17/2018 07:57 PM, Ard Biesheuvel wrote:
> Commit ede95a63b5e84 ("bpf: add bpf_jit_limit knob to restrict unpriv
> allocations") added a call to bpf_jit_uncharge_modmem() to the routine
> bpf_jit_binary_free() which is called from the __weak bpf_jit_free().
> This function is overridden by arches, some of which do not call
> bpf_jit_binary_free() to release the memory, and so the released
> memory is not accounted for, potentially leading to spurious allocation
> failures.
> 
> So replace the direct calls to module_memfree() in the arch code with
> calls to bpf_jit_binary_free().

Sorry, but this patch is completely buggy, and the above description of the
accounting is incorrect as well. It looks like this patch was not tested at all.

The cBPF JITs below, whose module_memfree() you replace with
bpf_jit_binary_free(), use module_alloc() internally to get the JIT
image buffer ...
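
For reference, the reason this cannot work is visible from the helper itself
(a sketch of bpf_jit_binary_hdr() as it appears in include/linux/filter.h,
reproduced approximately): bpf_jit_binary_free() derives a struct
bpf_binary_header from the image address, and that header only exists when the
image came from bpf_jit_binary_alloc(), not from a bare module_alloc().

    static inline struct bpf_binary_header *
    bpf_jit_binary_hdr(const struct bpf_prog *fp)
    {
            unsigned long real_start = (unsigned long)fp->bpf_func;
            unsigned long addr = real_start & PAGE_MASK;

            /* Assumes bpf_jit_binary_alloc() placed the header at the start
             * of the page(s); a module_alloc()'ed cBPF image has no such
             * header, so "freeing" it this way corrupts memory.
             */
            return (void *)addr;
    }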

> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/mips/net/bpf_jit.c   | 2 +-
>  arch/powerpc/net/bpf_jit_comp.c   | 2 +-
>  arch/powerpc/net/bpf_jit_comp64.c | 5 +
>  arch/sparc/net/bpf_jit_comp_32.c  | 2 +-
>  4 files changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> index 4d8cb9bb8365..1b69897274a1 100644
> --- a/arch/mips/net/bpf_jit.c
> +++ b/arch/mips/net/bpf_jit.c
> @@ -1264,7 +1264,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
>  void bpf_jit_free(struct bpf_prog *fp)
>  {
>   if (fp->jited)
> - module_memfree(fp->bpf_func);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>  
>   bpf_prog_unlock_free(fp);
>  }
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index d5bfe24bb3b5..a1ea1ea6b40d 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -683,7 +683,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
>  void bpf_jit_free(struct bpf_prog *fp)
>  {
>   if (fp->jited)
> - module_memfree(fp->bpf_func);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>  
>   bpf_prog_unlock_free(fp);
>  }
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 50b129785aee..84c8f013a6c6 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -1024,11 +1024,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
> *fp)
>  /* Overriding bpf_jit_free() as we don't set images read-only. */
>  void bpf_jit_free(struct bpf_prog *fp)
>  {
> - unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
> - struct bpf_binary_header *bpf_hdr = (void *)addr;
> -
>   if (fp->jited)
> - bpf_jit_binary_free(bpf_hdr);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>  
>   bpf_prog_unlock_free(fp);
>  }
> diff --git a/arch/sparc/net/bpf_jit_comp_32.c 
> b/arch/sparc/net/bpf_jit_comp_32.c
> index a5ff88643d5c..01bda6bc9e7f 100644
> --- a/arch/sparc/net/bpf_jit_comp_32.c
> +++ b/arch/sparc/net/bpf_jit_comp_32.c
> @@ -759,7 +759,7 @@ cond_branch:  f_offset = addrs[i + 
> filter[i].jf];
>  void bpf_jit_free(struct bpf_prog *fp)
>  {
>   if (fp->jited)
> - module_memfree(fp->bpf_func);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>  
>   bpf_prog_unlock_free(fp);
>  }
> 



Re: [PATCH bpf-next v4 02/10] bpf: powerpc64: pad function address loads with NOPs

2018-05-24 Thread Daniel Borkmann
On 05/24/2018 10:25 AM, Sandipan Das wrote:
> On 05/24/2018 01:04 PM, Daniel Borkmann wrote:
>> On 05/24/2018 08:56 AM, Sandipan Das wrote:
>>> For multi-function programs, loading the address of a callee
>>> function to a register requires emitting instructions whose
>>> count varies from one to five depending on the nature of the
>>> address.
>>>
>>> Since we come to know of the callee's address only before the
>>> extra pass, the number of instructions required to load this
>>> address may vary from what was previously generated. This can
>>> make the JITed image grow or shrink.
>>>
>>> To avoid this, we should generate a constant five-instruction sequence
>>> when loading function addresses by padding the optimized load
>>> sequence with NOPs.
>>>
>>> Signed-off-by: Sandipan Das <sandi...@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/net/bpf_jit_comp64.c | 34 +++---
>>>  1 file changed, 23 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
>>> b/arch/powerpc/net/bpf_jit_comp64.c
>>> index 1bdb1aff0619..e4582744a31d 100644
>>> --- a/arch/powerpc/net/bpf_jit_comp64.c
>>> +++ b/arch/powerpc/net/bpf_jit_comp64.c
>>> @@ -167,25 +167,37 @@ static void bpf_jit_build_epilogue(u32 *image, struct 
>>> codegen_context *ctx)
>>>  
>>>  static void bpf_jit_emit_func_call(u32 *image, struct codegen_context 
>>> *ctx, u64 func)
>>>  {
>>> +   unsigned int i, ctx_idx = ctx->idx;
>>> +
>>> +   /* Load function address into r12 */
>>> +   PPC_LI64(12, func);
>>> +
>>> +   /* For bpf-to-bpf function calls, the callee's address is unknown
>>> +* until the last extra pass. As seen above, we use PPC_LI64() to
>>> +* load the callee's address, but this may optimize the number of
>>> +* instructions required based on the nature of the address.
>>> +*
>>> +* Since we don't want the number of instructions emitted to change,
>>> +* we pad the optimized PPC_LI64() call with NOPs to guarantee that
>>> +* we always have a five-instruction sequence, which is the maximum
>>> +* that PPC_LI64() can emit.
>>> +*/
>>> +   for (i = ctx->idx - ctx_idx; i < 5; i++)
>>> +   PPC_NOP();
>>
>> By the way, I think you can still optimize this. The nops are not really
>> needed in case of insn->src_reg != BPF_PSEUDO_CALL since the address of
>> a normal BPF helper call will always be at a fixed location and known a
>> priori.
> 
> Ah, true. Thanks for pointing this out. There are a few other things that
> we are planning to do for the ppc64 JIT compiler. Will put out a patch for
> this with that series.

Awesome, thanks Sandipan!
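
A sketch of that follow-up optimization (illustrative only; the extra
"pseudo_call" parameter is an assumption about how the caller would pass the
information down): helper addresses are fixed and known on the first pass, so
the NOP padding is only needed for bpf-to-bpf calls.

    static void bpf_jit_emit_func_call(u32 *image, struct codegen_context *ctx,
                                       u64 func, bool pseudo_call)
    {
            unsigned int i, ctx_idx = ctx->idx;

            /* Load function address into r12 */
            PPC_LI64(12, func);

            /* Only BPF_PSEUDO_CALL targets can change between JIT passes,
             * so only they need padding up to the maximum five instructions
             * that PPC_LI64() may emit.
             */
            if (pseudo_call)
                    for (i = ctx->idx - ctx_idx; i < 5; i++)
                            PPC_NOP();

            /* ... rest as in the quoted patch ... */
    }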


Re: [PATCH bpf-next v4 02/10] bpf: powerpc64: pad function address loads with NOPs

2018-05-24 Thread Daniel Borkmann
On 05/24/2018 08:56 AM, Sandipan Das wrote:
> For multi-function programs, loading the address of a callee
> function to a register requires emitting instructions whose
> count varies from one to five depending on the nature of the
> address.
> 
> Since we come to know of the callee's address only before the
> extra pass, the number of instructions required to load this
> address may vary from what was previously generated. This can
> make the JITed image grow or shrink.
> 
> To avoid this, we should generate a constant five-instruction sequence
> when loading function addresses by padding the optimized load
> sequence with NOPs.
> 
> Signed-off-by: Sandipan Das 
> ---
>  arch/powerpc/net/bpf_jit_comp64.c | 34 +++---
>  1 file changed, 23 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 1bdb1aff0619..e4582744a31d 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -167,25 +167,37 @@ static void bpf_jit_build_epilogue(u32 *image, struct 
> codegen_context *ctx)
>  
>  static void bpf_jit_emit_func_call(u32 *image, struct codegen_context *ctx, 
> u64 func)
>  {
> + unsigned int i, ctx_idx = ctx->idx;
> +
> + /* Load function address into r12 */
> + PPC_LI64(12, func);
> +
> + /* For bpf-to-bpf function calls, the callee's address is unknown
> +  * until the last extra pass. As seen above, we use PPC_LI64() to
> +  * load the callee's address, but this may optimize the number of
> +  * instructions required based on the nature of the address.
> +  *
> +  * Since we don't want the number of instructions emitted to change,
> +  * we pad the optimized PPC_LI64() call with NOPs to guarantee that
> +  * we always have a five-instruction sequence, which is the maximum
> +  * that PPC_LI64() can emit.
> +  */
> + for (i = ctx->idx - ctx_idx; i < 5; i++)
> + PPC_NOP();

By the way, I think you can still optimize this. The nops are not really
needed in case of insn->src_reg != BPF_PSEUDO_CALL since the address of
a normal BPF helper call will always be at a fixed location and known a
priori.

>  #ifdef PPC64_ELF_ABI_v1
> - /* func points to the function descriptor */
> - PPC_LI64(b2p[TMP_REG_2], func);
> - /* Load actual entry point from function descriptor */
> - PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_2], 0);
> - /* ... and move it to LR */
> - PPC_MTLR(b2p[TMP_REG_1]);
>   /*
>* Load TOC from function descriptor at offset 8.
>* We can clobber r2 since we get called through a
>* function pointer (so caller will save/restore r2)
>* and since we don't use a TOC ourself.
>*/
> - PPC_BPF_LL(2, b2p[TMP_REG_2], 8);
> -#else
> - /* We can clobber r12 */
> - PPC_FUNC_ADDR(12, func);
> - PPC_MTLR(12);
> + PPC_BPF_LL(2, 12, 8);
> + /* Load actual entry point from function descriptor */
> + PPC_BPF_LL(12, 12, 0);
>  #endif
> +
> + PPC_MTLR(12);
>   PPC_BLRL();
>  }
>  
> 



Re: [PATCH bpf-next v4 00/10] bpf: enhancements for multi-function programs

2018-05-24 Thread Daniel Borkmann
On 05/24/2018 08:56 AM, Sandipan Das wrote:
> [1] Support for bpf-to-bpf function calls in the powerpc64 JIT compiler.
> 
> [2] Provide a way for resolving function calls because of the way JITed
> images are allocated in powerpc64.
> 
> [3] Fix to get JITed instruction dumps for multi-function programs from
> the bpf system call.
> 
> [4] Fix for bpftool to show delimited multi-function JITed image dumps.
> 
> v4:
>  - Incorporate review comments from Jakub.
>  - Fix JSON output for bpftool.
> 
> v3:
>  - Change base tree tag to bpf-next.
>  - Incorporate review comments from Alexei, Daniel and Jakub.
>  - Make sure that the JITed image does not grow or shrink after
>the last pass due to the way the instruction sequence used
>to load a callee's address maybe optimized.
>  - Make additional changes to the bpf system call and bpftool to
>make multi-function JITed dumps easier to correlate.
> 
> v2:
>  - Incorporate review comments from Jakub.
> 
> Sandipan Das (10):
>   bpf: support 64-bit offsets for bpf function calls
>   bpf: powerpc64: pad function address loads with NOPs
>   bpf: powerpc64: add JIT support for multi-function programs
>   bpf: get kernel symbol addresses via syscall
>   tools: bpf: sync bpf uapi header
>   tools: bpftool: resolve calls without using imm field
>   bpf: fix multi-function JITed dump obtained via syscall
>   bpf: get JITed image lengths of functions via syscall
>   tools: bpf: sync bpf uapi header
>   tools: bpftool: add delimiters to multi-function JITed dumps
> 
>  arch/powerpc/net/bpf_jit_comp64.c | 110 
> ++
>  include/uapi/linux/bpf.h  |   4 ++
>  kernel/bpf/syscall.c  |  82 ++--
>  kernel/bpf/verifier.c |  22 +---
>  tools/bpf/bpftool/prog.c  |  97 -
>  tools/bpf/bpftool/xlated_dumper.c |  14 +++--
>  tools/bpf/bpftool/xlated_dumper.h |   3 ++
>  tools/include/uapi/linux/bpf.h|   4 ++
>  8 files changed, 301 insertions(+), 35 deletions(-)

Applied to bpf-next, thanks a lot Sandipan!


Re: [PATCH bpf-next v3 10/10] tools: bpftool: add delimiters to multi-function JITed dumps

2018-05-23 Thread Daniel Borkmann
On 05/23/2018 12:37 PM, Sandipan Das wrote:
[...]
> Other than that, for powerpc64, there is a problem with the way the
> binutils disassembler code (in "opcodes/ppc-dis.c") passes arguments
> to the callback fprintf_json().
> 
> In fprintf_json(), we always expect the va_list elements to resolve
> to strings (char *). But for powerpc64, the register or immediate
> operands are always passed as integers. So, when the code attempts
> to resolve these operands using va_arg(ap, char *), bpftool crashes.
> For now, I am using a workaround based on vsnprintf() but this does
> not get the semantics correct for memory operands. You can probably
> see that for the store instructions in the JSON dump above this.
> 
> Daniel,
> 
> Would it be okay if I send out a fix for this in a different series?

I'm fine either way with regards to the fix. Feels like a portability bug
in the binutils disassembler?

We could probably have a feature test like in test-disassembler-four-args
and select a workaround in bpftool based on that outcome.

Thanks Sandipan!

  [1] tools/build/feature/test-disassembler-four-args.c
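
For what it's worth, the vsnprintf()-based workaround mentioned above would look
roughly like the sketch below (JSON plumbing elided): let vsnprintf() interpret
the format string so that integer register/immediate operands, as passed by
opcodes/ppc-dis.c, are not misread as char * by a hand-rolled va_arg() loop.

    static int fprintf_json(void *out, const char *fmt, ...)
    {
            char buf[64];
            va_list ap;

            va_start(ap, fmt);
            vsnprintf(buf, sizeof(buf), fmt, ap);
            va_end(ap);

            /* ... append "buf" as the next operand in the JSON output ... */
            return 0;
    }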


Re: [PATCH bpf-next v3 10/10] tools: bpftool: add delimiters to multi-function JITed dumps

2018-05-23 Thread Daniel Borkmann
On 05/22/2018 09:55 PM, Jakub Kicinski wrote:
> On Tue, 22 May 2018 22:46:13 +0530, Sandipan Das wrote:
>> +if (info.nr_jited_func_lens && info.jited_func_lens) {
>> +struct kernel_sym *sym = NULL;
>> +unsigned char *img = buf;
>> +__u64 *ksyms = NULL;
>> +__u32 *lens;
>> +__u32 i;
>> +
>> +if (info.nr_jited_ksyms) {
>> +kernel_syms_load();
>> +ksyms = (__u64 *) info.jited_ksyms;
>> +}
>> +
>> +lens = (__u32 *) info.jited_func_lens;
>> +for (i = 0; i < info.nr_jited_func_lens; i++) {
>> +if (ksyms) {
>> +sym = kernel_syms_search(, ksyms[i]);
>> +if (sym)
>> +printf("%s:\n", sym->name);
>> +else
>> +printf("%016llx:\n", ksyms[i]);
>> +}
>> +
>> +disasm_print_insn(img, lens[i], opcodes, name);
>> +img += lens[i];
>> +printf("\n");
>> +}
>> +} else {
> 
> The output doesn't seem to be JSON-compatible :(  We try to make sure
> all bpftool commands can produce valid JSON when run with the -j (or -p)
> switch.
> 
> Would it be possible to make each function a separate JSON object with
> "name" and "insn" array?  Would that work?

Sandipan, could you take a look at this? Given there's json output today we
should definitely try not to break it; presumably this would be one final
respin of your series with this fixed.

Thanks,
Daniel


Re: [PATCH bpf v2 6/6] bpf: fix JITed dump for multi-function programs via syscall

2018-05-22 Thread Daniel Borkmann
On 05/21/2018 09:42 PM, Sandipan Das wrote:
> On 05/18/2018 09:21 PM, Daniel Borkmann wrote:
>> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>>> Currently, for multi-function programs, we cannot get the JITed
>>> instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD
>>> command. Because of this, userspace tools such as bpftool fail
>>> to identify a multi-function program as being JITed or not.
>>>
>>> With the JIT enabled and the test program running, this can be
>>> verified as follows:
>>>
>>>   # cat /proc/sys/net/core/bpf_jit_enable
>>>   1
>>>
>>> Before applying this patch:
>>>
>>>   # bpftool prog list
>>>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>>>   loaded_at 2018-05-16T11:43:38+0530  uid 0
>>>   xlated 216B  not jited  memlock 65536B
>>>   ...
>>>
>>>   # bpftool prog dump jited id 1
>>>   no instructions returned
>>>
>>> After applying this patch:
>>>
>>>   # bpftool prog list
>>>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>>>   loaded_at 2018-05-16T12:13:01+0530  uid 0
>>>   xlated 216B  jited 308B  memlock 65536B
>>>   ...
>>
>> That's really nice! One comment inline below:
>>
>>>   # bpftool prog dump jited id 1
>>>  0:   nop
>>>  4:   nop
>>>  8:   mflrr0
>>>  c:   std r0,16(r1)
>>> 10:   stdur1,-112(r1)
>>> 14:   std r31,104(r1)
>>> 18:   addir31,r1,48
>>> 1c:   li  r3,10
>>>   ...
>>>
>>> Signed-off-by: Sandipan Das <sandi...@linux.vnet.ibm.com>
>>> ---
>>>  kernel/bpf/syscall.c | 38 --
>>>  1 file changed, 32 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>> index 54a72fafe57c..2430d159078c 100644
>>> --- a/kernel/bpf/syscall.c
>>> +++ b/kernel/bpf/syscall.c
>>> @@ -1896,7 +1896,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
>>> *prog,
>>> struct bpf_prog_info info = {};
>>> u32 info_len = attr->info.info_len;
>>> char __user *uinsns;
>>> -   u32 ulen;
>>> +   u32 ulen, i;
>>> int err;
>>>  
>>> err = check_uarg_tail_zero(uinfo, sizeof(info), info_len);
>>> @@ -1922,7 +1922,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
>>> *prog,
>>> ulen = min_t(u32, info.nr_map_ids, ulen);
>>> if (ulen) {
>>> u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids);
>>> -   u32 i;
>>>  
>>> for (i = 0; i < ulen; i++)
>>> if (put_user(prog->aux->used_maps[i]->id,
>>> @@ -1970,13 +1969,41 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
>>> *prog,
>>>  * for offload.
>>>  */
>>> ulen = info.jited_prog_len;
>>> -   info.jited_prog_len = prog->jited_len;
>>> +   if (prog->aux->func_cnt) {
>>> +   info.jited_prog_len = 0;
>>> +   for (i = 0; i < prog->aux->func_cnt; i++)
>>> +   info.jited_prog_len += prog->aux->func[i]->jited_len;
>>> +   } else {
>>> +   info.jited_prog_len = prog->jited_len;
>>> +   }
>>> +
>>> if (info.jited_prog_len && ulen) {
>>> if (bpf_dump_raw_ok()) {
>>> uinsns = u64_to_user_ptr(info.jited_prog_insns);
>>> ulen = min_t(u32, info.jited_prog_len, ulen);
>>> -   if (copy_to_user(uinsns, prog->bpf_func, ulen))
>>> -   return -EFAULT;
>>> +
>>> +   /* for multi-function programs, copy the JITed
>>> +* instructions for all the functions
>>> +*/
>>> +   if (prog->aux->func_cnt) {
>>> +   u32 len, free;
>>> +   u8 *img;
>>> +
>>> +   free = ulen;
>>> +   for (i = 0; i < prog->aux->func_cnt; i++) {
>>> +   len = prog->aux->func[i]->jited_len;
>>> +   img = (u8 *) 
>>> prog->aux->func[i]->bpf_func;
>>> +   if (len > free)
>>> +   break;
>>> +   if (copy_to_user(uinsns, img, len))
>>> +   return -EFAULT;
>>> +   uinsns += len;
>>> +   free -= len;
>>
>> Is there any way we can introduce a delimiter between the different
>> images such that they could be more easily correlated with the call
>> from the main (or other sub-)program instead of having one contiguous
>> dump blob?
> 
> Can we have another member in bpf_prog_info that points to a list of the 
> lengths of the
> JITed images for each subprogram? We can use this information to split up the 
> dump.

Seems okay to me.

Thanks,
Daniel
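
For reference, the member discussed here ends up, roughly, as a pair of fields
in the uapi struct bpf_prog_info; userspace walks the per-function lengths to
split the contiguous JITed dump, which is exactly what the bpftool hunk quoted
earlier in this thread does:

    struct bpf_prog_info {
            /* ... existing fields ... */
            __u32 nr_jited_func_lens;       /* number of sub-program lengths */
            __aligned_u64 jited_func_lens;  /* user pointer to a __u32 array */
            /* ... */
    };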


Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs

2018-05-18 Thread Daniel Borkmann
On 05/18/2018 06:05 PM, Naveen N. Rao wrote:
> Daniel Borkmann wrote:
>> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>>> This adds support for bpf-to-bpf function calls in the powerpc64
>>> JIT compiler. The JIT compiler converts the bpf call instructions
>>> to native branch instructions. After a round of the usual passes,
>>> the start addresses of the JITed images for the callee functions
>>> are known. Finally, to fixup the branch target addresses, we need
>>> to perform an extra pass.
>>>
>>> Because of the address range in which JITed images are allocated
>>> on powerpc64, the offsets of the start addresses of these images
>>> from __bpf_call_base are as large as 64 bits. So, for a function
>>> call, we cannot use the imm field of the instruction to determine
>>> the callee's address. Instead, we use the alternative method of
>>> getting it from the list of function addresses in the auxiliary
>>> data of the caller by using the off field as an index.
>>>
>>> Signed-off-by: Sandipan Das <sandi...@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/net/bpf_jit_comp64.c | 79 
>>> ++-
>>>  1 file changed, 69 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
>>> b/arch/powerpc/net/bpf_jit_comp64.c
>>> index 1bdb1aff0619..25939892d8f7 100644
>>> --- a/arch/powerpc/net/bpf_jit_comp64.c
>>> +++ b/arch/powerpc/net/bpf_jit_comp64.c
>>> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct 
>>> codegen_context *ctx, u32
>>>  /* Assemble the body code between the prologue & epilogue */
>>>  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>>    struct codegen_context *ctx,
>>> -  u32 *addrs)
>>> +  u32 *addrs, bool extra_pass)
>>>  {
>>>  const struct bpf_insn *insn = fp->insnsi;
>>>  int flen = fp->len;
>>> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, 
>>> u32 *image,
>>>  break;
>>>  
>>>  /*
>>> - * Call kernel helper
>>> + * Call kernel helper or bpf function
>>>   */
>>>  case BPF_JMP | BPF_CALL:
>>>  ctx->seen |= SEEN_FUNC;
>>> -    func = (u8 *) __bpf_call_base + imm;
>>> +
>>> +    /* bpf function call */
>>> +    if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)
>>
>> Perhaps it might make sense here for !extra_pass to set func to some dummy
>> address as otherwise the 'kernel helper call' branch used for this is a bit
>> misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call()
>> optimizes the immediate addr, I presume the JIT can handle situations where
>> in the final extra_pass the image needs to grow/shrink again (due to 
>> different
>> final address for the call)?
> 
> That's a good catch. We don't handle that -- we expect to get the size right 
> on first pass. We could probably have PPC_FUNC_ADDR() pad the result with 
> nops to make it a constant 5-instruction sequence.

Yeah, arm64 does something similar by not optimizing the imm in order to always
emit 4 insns for it.

Thanks,
Daniel


Re: [PATCH bpf v2 6/6] bpf: fix JITed dump for multi-function programs via syscall

2018-05-18 Thread Daniel Borkmann
On 05/18/2018 02:50 PM, Sandipan Das wrote:
> Currently, for multi-function programs, we cannot get the JITed
> instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD
> command. Because of this, userspace tools such as bpftool fail
> to identify a multi-function program as being JITed or not.
> 
> With the JIT enabled and the test program running, this can be
> verified as follows:
> 
>   # cat /proc/sys/net/core/bpf_jit_enable
>   1
> 
> Before applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>   loaded_at 2018-05-16T11:43:38+0530  uid 0
>   xlated 216B  not jited  memlock 65536B
>   ...
> 
>   # bpftool prog dump jited id 1
>   no instructions returned
> 
> After applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>   loaded_at 2018-05-16T12:13:01+0530  uid 0
>   xlated 216B  jited 308B  memlock 65536B
>   ...

That's really nice! One comment inline below:

>   # bpftool prog dump jited id 1
>  0:   nop
>  4:   nop
>  8:   mflrr0
>  c:   std r0,16(r1)
> 10:   stdur1,-112(r1)
> 14:   std r31,104(r1)
> 18:   addir31,r1,48
> 1c:   li  r3,10
>   ...
> 
> Signed-off-by: Sandipan Das 
> ---
>  kernel/bpf/syscall.c | 38 --
>  1 file changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 54a72fafe57c..2430d159078c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1896,7 +1896,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>   struct bpf_prog_info info = {};
>   u32 info_len = attr->info.info_len;
>   char __user *uinsns;
> - u32 ulen;
> + u32 ulen, i;
>   int err;
>  
>   err = check_uarg_tail_zero(uinfo, sizeof(info), info_len);
> @@ -1922,7 +1922,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>   ulen = min_t(u32, info.nr_map_ids, ulen);
>   if (ulen) {
>   u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids);
> - u32 i;
>  
>   for (i = 0; i < ulen; i++)
>   if (put_user(prog->aux->used_maps[i]->id,
> @@ -1970,13 +1969,41 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>* for offload.
>*/
>   ulen = info.jited_prog_len;
> - info.jited_prog_len = prog->jited_len;
> + if (prog->aux->func_cnt) {
> + info.jited_prog_len = 0;
> + for (i = 0; i < prog->aux->func_cnt; i++)
> + info.jited_prog_len += prog->aux->func[i]->jited_len;
> + } else {
> + info.jited_prog_len = prog->jited_len;
> + }
> +
>   if (info.jited_prog_len && ulen) {
>   if (bpf_dump_raw_ok()) {
>   uinsns = u64_to_user_ptr(info.jited_prog_insns);
>   ulen = min_t(u32, info.jited_prog_len, ulen);
> - if (copy_to_user(uinsns, prog->bpf_func, ulen))
> - return -EFAULT;
> +
> + /* for multi-function programs, copy the JITed
> +  * instructions for all the functions
> +  */
> + if (prog->aux->func_cnt) {
> + u32 len, free;
> + u8 *img;
> +
> + free = ulen;
> + for (i = 0; i < prog->aux->func_cnt; i++) {
> + len = prog->aux->func[i]->jited_len;
> + img = (u8 *) 
> prog->aux->func[i]->bpf_func;
> + if (len > free)
> + break;
> + if (copy_to_user(uinsns, img, len))
> + return -EFAULT;
> + uinsns += len;
> + free -= len;

Is there any way we can introduce a delimiter between the different
images such that they could be more easily correlated with the call
from the main (or other sub-)program instead of having one contiguous
dump blob?
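
One possible shape for such a delimiter, sketched here with a hypothetical
jited_func_lens field in bpf_prog_info (purely illustrative, not an existing
field at this point):

    u32 __user *user_lens = u64_to_user_ptr(info.jited_func_lens);

    /* a parallel array of per-function image sizes would let userspace cut
     * the contiguous jited_prog_insns blob back into per-function images */
    for (i = 0; i < prog->aux->func_cnt; i++) {
        if (put_user(prog->aux->func[i]->jited_len, &user_lens[i]))
            return -EFAULT;
    }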

> + }
> + } else {
> + if (copy_to_user(uinsns, prog->bpf_func, ulen))
> + return -EFAULT;
> + }
>   } else {
>   info.jited_prog_insns = 0;
>   }
> @@ -1987,7 +2014,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>   if (info.nr_jited_ksyms && ulen) {
>   u64 __user *user_jited_ksyms = 
> u64_to_user_ptr(info.jited_ksyms);
>   ulong ksym_addr;
> - u32 i;
>  
>   /* copy the address of the kernel symbol corresponding to
>

Re: [PATCH bpf v2 3/6] bpf: get kernel symbol addresses via syscall

2018-05-18 Thread Daniel Borkmann
On 05/18/2018 02:50 PM, Sandipan Das wrote:
> This adds two new fields to struct bpf_prog_info. For
> multi-function programs, these fields can be used to pass
> a list of kernel symbol addresses for all functions in a
> given program and to userspace using the bpf system call
> with the BPF_OBJ_GET_INFO_BY_FD command.
> 
> When bpf_jit_kallsyms is enabled, we can get the address
> of the corresponding kernel symbol for a callee function
> and resolve the symbol's name. The address is determined
> by adding the value of the call instruction's imm field
> to __bpf_call_base. This offset gets assigned to the imm
> field by the verifier.
> 
> For some architectures, such as powerpc64, the imm field
> is not large enough to hold this offset.
> 
> We resolve this by:
> 
> [1] Assigning the subprog id to the imm field of a call
> instruction in the verifier instead of the offset of
> the callee's symbol's address from __bpf_call_base.
> 
> [2] Determining the address of a callee's corresponding
> symbol by using the imm field as an index for the
> list of kernel symbol addresses now available from
> the program info.
> 
> Suggested-by: Daniel Borkmann <dan...@iogearbox.net>
> Signed-off-by: Sandipan Das <sandi...@linux.vnet.ibm.com>
> ---
>  include/uapi/linux/bpf.h |  2 ++
>  kernel/bpf/syscall.c | 20 
>  kernel/bpf/verifier.c|  7 +--
>  3 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d94d333a8225..040c9cac7303 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2188,6 +2188,8 @@ struct bpf_prog_info {
>   __u32 xlated_prog_len;
>   __aligned_u64 jited_prog_insns;
>   __aligned_u64 xlated_prog_insns;
> + __aligned_u64 jited_ksyms;
> + __u32 nr_jited_ksyms;
>   __u64 load_time;/* ns since boottime */
>   __u32 created_by_uid;
>   __u32 nr_map_ids;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index bfcde949c7f8..54a72fafe57c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1933,6 +1933,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>   if (!capable(CAP_SYS_ADMIN)) {
>   info.jited_prog_len = 0;
>   info.xlated_prog_len = 0;
> + info.nr_jited_ksyms = 0;
>   goto done;
>   }
>  
> @@ -1981,6 +1982,25 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>   }
>   }
>  
> + ulen = info.nr_jited_ksyms;
> + info.nr_jited_ksyms = prog->aux->func_cnt;
> + if (info.nr_jited_ksyms && ulen) {

Since this exposes addresses (though masked ones, which is correct), this
definitely needs to be guarded with bpf_dump_raw_ok() like we do in other
places here (see JIT dump for example).
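
Something along these lines, i.e. mirroring the structure of the JIT dump
branch (sketch only):

    ulen = info.nr_jited_ksyms;
    info.nr_jited_ksyms = prog->aux->func_cnt;
    if (info.nr_jited_ksyms && ulen) {
        if (bpf_dump_raw_ok()) {
            /* ... copy the page-masked addresses as below ... */
        } else {
            info.jited_ksyms = 0;
        }
    }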

> + u64 __user *user_jited_ksyms = 
> u64_to_user_ptr(info.jited_ksyms);
> + ulong ksym_addr;
> + u32 i;
> +
> + /* copy the address of the kernel symbol corresponding to
> +  * each function
> +  */
> + ulen = min_t(u32, info.nr_jited_ksyms, ulen);
> + for (i = 0; i < ulen; i++) {
> + ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
> + ksym_addr &= PAGE_MASK;
> + if (put_user((u64) ksym_addr, _jited_ksyms[i]))
> + return -EFAULT;
> + }
> + }
> +
>  done:
>   if (copy_to_user(uinfo, , info_len) ||
>   put_user(info_len, >info.info_len))
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 6c56cce9c4e3..e826c396aba2 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5426,17 +5426,12 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>* later look the same as if they were interpreted only.
>*/
>   for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
> - unsigned long addr;
> -
>   if (insn->code != (BPF_JMP | BPF_CALL) ||
>   insn->src_reg != BPF_PSEUDO_CALL)
>   continue;
>   insn->off = env->insn_aux_data[i].call_imm;
>   subprog = find_subprog(env, i + insn->off + 1);
> - addr  = (unsigned long)func[subprog]->bpf_func;

Hmm, in current bpf tree this says 'subprog + 1' here, so this is not
rebased against bpf tree but bpf-next (unlike what subject says)?

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/kernel/bpf/verifier.c#n5351

> - addr &= PAGE_MASK;
> - insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
> - addr - __bpf_call_base;
> + insn->imm = subprog;
>   }
>  
>   prog->jited = 1;
> 



Re: [PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls

2018-05-18 Thread Daniel Borkmann
On 05/18/2018 02:50 PM, Sandipan Das wrote:
> The imm field of a bpf instruction is a signed 32-bit integer.
> For JIT bpf-to-bpf function calls, it stores the offset of the
> start address of the callee's JITed image from __bpf_call_base.
> 
> For some architectures, such as powerpc64, this offset may be
> as large as 64 bits and cannot be accommodated in the imm field
> without truncation.
> 
> We resolve this by:
> 
> [1] Additionally using the auxiliary data of each function to
> keep a list of start addresses of the JITed images for all
> functions determined by the verifier.
> 
> [2] Retaining the subprog id inside the off field of the call
> instructions and using it to index into the list mentioned
> above and lookup the callee's address.
> 
> To make sure that the existing JIT compilers continue to work
> without requiring changes, we keep the imm field as it is.
> 
> Signed-off-by: Sandipan Das 
> ---
>  kernel/bpf/verifier.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a9e4b1372da6..6c56cce9c4e3 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>   insn->src_reg != BPF_PSEUDO_CALL)
>   continue;
>   subprog = insn->off;
> - insn->off = 0;
>   insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
>   func[subprog]->bpf_func -
>   __bpf_call_base;
>   }
> +
> + /* we use the aux data to keep a list of the start addresses
> +  * of the JITed images for each function in the program
> +  *
> +  * for some architectures, such as powerpc64, the imm field
> +  * might not be large enough to hold the offset of the start
> +  * address of the callee's JITed image from __bpf_call_base
> +  *
> +  * in such cases, we can lookup the start address of a callee
> +  * by using its subprog id, available from the off field of
> +  * the call instruction, as an index for this list
> +  */
> + func[i]->aux->func = func;
> + func[i]->aux->func_cnt = env->subprog_cnt + 1;

The target tree you have here is in fact bpf, since in bpf-next there was a
cleanup where the + 1 is removed. Just for the record: we need to keep this
in mind for the bpf into bpf-next merge, since it would otherwise subtly
break.

>   }
>   for (i = 0; i < env->subprog_cnt; i++) {
>   old_bpf_func = func[i]->bpf_func;
> 



Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs

2018-05-18 Thread Daniel Borkmann
On 05/18/2018 02:50 PM, Sandipan Das wrote:
> This adds support for bpf-to-bpf function calls in the powerpc64
> JIT compiler. The JIT compiler converts the bpf call instructions
> to native branch instructions. After a round of the usual passes,
> the start addresses of the JITed images for the callee functions
> are known. Finally, to fixup the branch target addresses, we need
> to perform an extra pass.
> 
> Because of the address range in which JITed images are allocated
> on powerpc64, the offsets of the start addresses of these images
> from __bpf_call_base are as large as 64 bits. So, for a function
> call, we cannot use the imm field of the instruction to determine
> the callee's address. Instead, we use the alternative method of
> getting it from the list of function addresses in the auxiliary
> data of the caller by using the off field as an index.
> 
> Signed-off-by: Sandipan Das 
> ---
>  arch/powerpc/net/bpf_jit_comp64.c | 79 
> ++-
>  1 file changed, 69 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 1bdb1aff0619..25939892d8f7 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct 
> codegen_context *ctx, u32
>  /* Assemble the body code between the prologue & epilogue */
>  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
> struct codegen_context *ctx,
> -   u32 *addrs)
> +   u32 *addrs, bool extra_pass)
>  {
>   const struct bpf_insn *insn = fp->insnsi;
>   int flen = fp->len;
> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
> *image,
>   break;
>  
>   /*
> -  * Call kernel helper
> +  * Call kernel helper or bpf function
>*/
>   case BPF_JMP | BPF_CALL:
>   ctx->seen |= SEEN_FUNC;
> - func = (u8 *) __bpf_call_base + imm;
> +
> + /* bpf function call */
> + if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)

Perhaps it might make sense here for !extra_pass to set func to some dummy
address as otherwise the 'kernel helper call' branch used for this is a bit
misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call()
optimizes the immediate addr, I presume the JIT can handle situations where
in the final extra_pass the image needs to grow/shrink again (due to different
final address for the call)?
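
A sketch of that !extra_pass handling (the dummy value is arbitrary; the point
is only that the emitted sequence keeps the same length as the final one):

    case BPF_JMP | BPF_CALL:
        ctx->seen |= SEEN_FUNC;

        if (insn[i].src_reg == BPF_PSEUDO_CALL) {
            if (!extra_pass)
                /* placeholder, resolved in the extra pass */
                func = (u8 *)~0UL;
            else if (fp->aux->func && off < fp->aux->func_cnt)
                func = (u8 *)fp->aux->func[off]->bpf_func;
            else
                return -EINVAL;
        } else {
            /* kernel helper call */
            func = (u8 *)__bpf_call_base + imm;
        }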

> + if (fp->aux->func && off < fp->aux->func_cnt)
> + /* use the subprog id from the off
> +  * field to lookup the callee address
> +  */
> + func = (u8 *) 
> fp->aux->func[off]->bpf_func;
> + else
> + return -EINVAL;
> + /* kernel helper call */
> + else
> + func = (u8 *) __bpf_call_base + imm;
>  
>   bpf_jit_emit_func_call(image, ctx, (u64)func);
>  
> @@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
> *image,
>   return 0;
>  }
>  
> +struct powerpc64_jit_data {
> + struct bpf_binary_header *header;
> + u32 *addrs;
> + u8 *image;
> + u32 proglen;
> + struct codegen_context ctx;
> +};
> +
>  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  {
>   u32 proglen;
> @@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   u8 *image = NULL;
>   u32 *code_base;
>   u32 *addrs;
> + struct powerpc64_jit_data *jit_data;
>   struct codegen_context cgctx;
>   int pass;
>   int flen;
> @@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   struct bpf_prog *org_fp = fp;
>   struct bpf_prog *tmp_fp;
>   bool bpf_blinded = false;
> + bool extra_pass = false;
>  
>   if (!fp->jit_requested)
>   return org_fp;
> @@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   fp = tmp_fp;
>   }
>  
> + jit_data = fp->aux->jit_data;
> + if (!jit_data) {
> + jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
> + if (!jit_data) {
> + fp = org_fp;
> + goto out;
> + }
> + fp->aux->jit_data = jit_data;
> + }
> +
>   flen = fp->len;
> + addrs = jit_data->addrs;
> + if (addrs) {
> + cgctx = jit_data->ctx;
> + image = jit_data->image;
> + bpf_hdr = jit_data->header;
> + proglen 

Re: [RFC][PATCH bpf] tools: bpftool: Fix tags for bpf-to-bpf calls

2018-02-27 Thread Daniel Borkmann
On 02/27/2018 01:13 PM, Sandipan Das wrote:
> "Naveen N. Rao" wrote:
>> I'm wondering if we can instead encode the bpf prog id in
>> imm32. That way, we should be able to indicate the BPF
>> function being called into.  Daniel, is that something we
>> can consider?
> 
> Since each subprog does not get a separate id, we cannot fetch
> the fd and therefore the tag of a subprog. Instead we can use
> the tag of the complete program as shown below.
> 
> "Daniel Borkmann" wrote:
>> I think one limitation that would still need to be addressed later
>> with such approach would be regarding the xlated prog dump in bpftool,
>> see 'BPF calls via JIT' in 7105e828c087 ("bpf: allow for correlation
>> of maps and helpers in dump"). Any ideas for this (potentially if we
>> could use off + imm for calls, we'd get to 48 bits, but that seems
>> still not be enough as you say)?
> 
> As an alternative, this is what I am thinking of:
> 
> Currently, for bpf-to-bpf calls, if bpf_jit_kallsyms is enabled,
> bpftool looks up the name of the corresponding symbol for the
> JIT-ed subprogram and shows it in the xlated dump alongside the
> actual call instruction. However, the lookup is based on the
> target address which is calculated using the imm field of the
> instruction. So, once again, if imm is truncated, we will end
> up with the wrong address. Also, the subprog aux data (which
> has been proposed as a mitigation for this) is not accessible
> from this tool.
> 
> We can still access the tag for the complete bpf program and use
> this with the correct offset in an objdump-like notation as an
> alternative for the name of the subprog that is the target of a
> bpf-to-bpf call instruction.
> 
> Currently, an xlated dump looks like this:
>0: (85) call pc+2#bpf_prog_5f76847930402518_F
>1: (b7) r0 = 1
>2: (95) exit
>3: (b7) r0 = 2
>4: (95) exit
> 
> With this patch, it will look like this:
>0: (85) call pc+2#bpf_prog_8f85936f29a7790a+3

(Note the +2 is the insn->off already.)

>1: (b7) r0 = 1
>2: (95) exit
>3: (b7) r0 = 2
>4: (95) exit
> 
> where 8f85936f29a7790a is the tag of the bpf program and 3 is
> the offset to the start of the subprog from the start of the
> program.

The problem with this approach would be that right now the name is
something like bpf_prog_5f76847930402518_F where the subprog tag is
just a placeholder so in future, this may well adapt to e.g. the actual
function name from the elf file. Note that when kallsyms is enabled
then a name like bpf_prog_5f76847930402518_F will also appear in stack
traces, perf records, etc, so for correlation/debugging it would really
help to have them the same everywhere.

Worst case if there's nothing better, potentially what one could do in
bpf_prog_get_info_by_fd() is to dump an array of full addresses and
have the imm part as the index pointing to one of them, just unfortunate
that it's likely only needed in ppc64.
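
On the tool side, the lookup would then become an index instead of an offset,
roughly (userspace sketch; array/field names are illustrative only):

static unsigned long resolve_pseudo_call(const struct bpf_insn *insn,
                                         const __u64 *jited_ksyms,
                                         __u32 nr_jited_ksyms)
{
    if (insn->src_reg != BPF_PSEUDO_CALL ||
        (__u32)insn->imm >= nr_jited_ksyms)
        return 0;
    return jited_ksyms[insn->imm];
}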


Re: [PATCH] bpf, powerpc: fix jit for seccomp_data access

2018-02-20 Thread Daniel Borkmann
On 02/21/2018 01:33 AM, Michael Ellerman wrote:
> Mark Lord  writes:
> 
>> I am using SECCOMP to filter syscalls on a ppc32 platform,
>> and noticed that the JIT compiler was failing on the BPF
>> even though the interpreter was working fine.
>>
>> The issue was that the compiler was missing one of the instructions
>> used by SECCOMP, so here is a patch to enable JIT for that instruction.
>>
>> Signed-Off-By:  Mark Lord 
> 
> Thanks.
> 
> What is the failure mode without this patch? I assume when you have the
> JIT enabled the seccomp filter fails to load entirely? Or do we have
> logic to fall back to the interpreter?

The logic for all JITs is that once a BPF insn cannot be JITed then it falls
back to BPF interpreter. Here, since ppc32 is cBPF the path is: cBPF insn ->
cBPF JIT -> fail -> migrate cBPF to eBPF -> run in eBPF interpreter. In the
case where interpreter is compiled out (CONFIG_BPF_JIT_ALWAYS_ON), then the
filter is rejected.

> Either way we should probably back port to stable. I assume this has
> been broken since we originally added 32-bit support, so:

Note that arm32 before it was converted to be an eBPF JIT (eBPF JITs do handle
seccomp BPF in any case) was the only cBPF JIT that had it 'JIT-able', so
currently, other cBPF ones like sparc32 or mips32 don't translate it either.
Meaning, it would be nice to have as a feature; I wouldn't necessarily frame
it as a bug though (or at least a stable-urgent one, imho, but I leave that
to you, of course).
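
For context, a minimal sketch of the kind of filter that exercises this opcode;
userspace writes a plain BPF_LD | BPF_W | BPF_ABS load of a seccomp_data field,
which -- as far as I can tell -- seccomp_check_filter() rewrites into the
BPF_LDX | BPF_W | BPF_ABS form the ppc32 JIT was missing:

#include <stddef.h>
#include <errno.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

struct sock_filter seccomp_filter[] = {
    /* A = seccomp_data->nr */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* deny getpid(), allow everything else -- example policy only */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpid, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};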

> Fixes: eb84bab0fb38 ("ppc: Kconfig: Enable BPF JIT on ppc32")
> Cc: sta...@vger.kernel.org # v4.1+
> 
> cheers
> 
> 
>> --- old/arch/powerpc/net/bpf_jit_comp.c 2018-02-16 14:07:01.0 -0500
>> +++ linux/arch/powerpc/net/bpf_jit_comp.c   2018-02-20 
>> 14:41:20.805227494 -0500
>> @@ -329,6 +329,9 @@ static int bpf_jit_build_body(struct bpf
>> BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, len) != 4);
>> PPC_LWZ_OFFS(r_A, r_skb, offsetof(struct sk_buff, 
>> len));
>> break;
>> +   case BPF_LDX | BPF_W | BPF_ABS: /* A = *((u32 
>> *)(seccomp_data + K)); */
>> +   PPC_LWZ_OFFS(r_A, r_skb, K);
>> +   break;
>> case BPF_LDX | BPF_W | BPF_LEN: /* X = skb->len; */
>> PPC_LWZ_OFFS(r_X, r_skb, offsetof(struct sk_buff, 
>> len));
>> break;
>> -- 
>> Mark Lord
>> Real-Time Remedies Inc.
>> ml...@pobox.com



Re: [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls

2018-02-15 Thread Daniel Borkmann
On 02/15/2018 05:25 PM, Daniel Borkmann wrote:
> On 02/13/2018 05:05 AM, Sandipan Das wrote:
>> The imm field of a bpf_insn is a signed 32-bit integer. For
>> JIT-ed bpf-to-bpf function calls, it stores the offset from
>> __bpf_call_base to the start of the callee function.
>>
>> For some architectures, such as powerpc64, it was found that
>> this offset may be as large as 64 bits because of which this
>> cannot be accommodated in the imm field without truncation.
>>
>> To resolve this, we additionally make aux->func within each
>> bpf_prog associated with the functions to point to the list
>> of all function addresses determined by the verifier.
>>
>> We keep the value assigned to the off field of the bpf_insn
>> as a way to index into aux->func and also set aux->func_cnt
>> so that this can be used for performing basic upper bound
>> checks for the off field.
>>
>> Signed-off-by: Sandipan Das <sandi...@linux.vnet.ibm.com>
>> ---
>> v2: Make aux->func point to the list of functions determined
>> by the verifier rather than allocating a separate callee
>> list for each function.
> 
> Approach looks good to me; do you know whether s390x JIT would
> have similar requirement? I think one limitation that would still
> need to be addressed later with such approach would be regarding the
> xlated prog dump in bpftool, see 'BPF calls via JIT' in 7105e828c087
> ("bpf: allow for correlation of maps and helpers in dump"). Any
> ideas for this (potentially if we could use off + imm for calls,
> we'd get to 48 bits, but that seems still not be enough as you say)?

One other random thought, although I'm not sure how feasible this
is for ppc64 JIT to realize ... but idea would be to have something
like the below:

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 29ca920..daa7258 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -512,6 +512,11 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long 
*value, char *type,
return ret;
 }

+void * __weak bpf_jit_image_alloc(unsigned long size)
+{
+   return module_alloc(size);
+}
+
 struct bpf_binary_header *
 bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
 unsigned int alignment,
@@ -525,7 +530,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
 * random section of illegal instructions.
 */
size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE);
-   hdr = module_alloc(size);
+   hdr = bpf_jit_image_alloc(size);
if (hdr == NULL)
return NULL;

And ppc64 JIT could override bpf_jit_image_alloc() in a similar way
like some archs would override the module_alloc() helper through a
custom implementation, usually via __vmalloc_node_range(), so we
could perhaps fit the range for BPF JITed images in a way that they
could use the 32bit imm in the end? There are not that many progs
loaded typically, so the range could be a bit narrower in such case
anyway. (Not sure if this would work out though, but I thought to
bring it up.)
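
A powerpc override could then look roughly like this (sketch only;
MODULES_VADDR/MODULES_END are placeholders for whatever narrower range would
keep the JIT images within a 32-bit offset of __bpf_call_base):

void *bpf_jit_image_alloc(unsigned long size)
{
    return __vmalloc_node_range(size, PAGE_SIZE,
                                MODULES_VADDR, MODULES_END,
                                GFP_KERNEL, PAGE_KERNEL_EXEC,
                                0, NUMA_NO_NODE,
                                __builtin_return_address(0));
}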

Thanks,
Daniel


Re: [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls

2018-02-15 Thread Daniel Borkmann
On 02/13/2018 05:05 AM, Sandipan Das wrote:
> The imm field of a bpf_insn is a signed 32-bit integer. For
> JIT-ed bpf-to-bpf function calls, it stores the offset from
> __bpf_call_base to the start of the callee function.
> 
> For some architectures, such as powerpc64, it was found that
> this offset may be as large as 64 bits because of which this
> cannot be accommodated in the imm field without truncation.
> 
> To resolve this, we additionally make aux->func within each
> bpf_prog associated with the functions to point to the list
> of all function addresses determined by the verifier.
> 
> We keep the value assigned to the off field of the bpf_insn
> as a way to index into aux->func and also set aux->func_cnt
> so that this can be used for performing basic upper bound
> checks for the off field.
> 
> Signed-off-by: Sandipan Das 
> ---
> v2: Make aux->func point to the list of functions determined
> by the verifier rather than allocating a separate callee
> list for each function.

Approach looks good to me; do you know whether s390x JIT would
have similar requirement? I think one limitation that would still
need to be addressed later with such approach would be regarding the
xlated prog dump in bpftool, see 'BPF calls via JIT' in 7105e828c087
("bpf: allow for correlation of maps and helpers in dump"). Any
ideas for this (potentially if we could use off + imm for calls,
we'd get to 48 bits, but that seems still not be enough as you say)?

Thanks,
Daniel


Re: [PATCH 1/1] bpf: take advantage of stack_depth tracking in powerpc JIT

2017-09-01 Thread Daniel Borkmann

On 09/01/2017 08:53 PM, Sandipan Das wrote:

Take advantage of stack_depth tracking, originally introduced for
x64, in powerpc JIT as well. Round up allocated stack by 16 bytes
to make sure it stays aligned for functions called from JITed bpf
program.

Signed-off-by: Sandipan Das 


Awesome, thanks for following up! :)
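
The core of it, for anyone wondering what using stack_depth means in practice,
is roughly this (illustrative helper, not the actual patch):

static inline u32 bpf_used_stack_size(const struct bpf_prog *fp)
{
    /* verifier-computed worst-case depth instead of the fixed MAX_BPF_STACK,
     * rounded up so the frame stays 16-byte aligned for callees */
    return round_up(fp->aux->stack_depth, 16);
}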


Re: [PATCH 2/3] powerpc: bpf: flush the entire JIT buffer

2017-01-13 Thread Daniel Borkmann

On 01/13/2017 06:10 PM, Naveen N. Rao wrote:

With bpf_jit_binary_alloc(), we allocate at a page granularity and fill
the rest of the space with illegal instructions to mitigate BPF spraying
attacks, while having the actual JIT'ed BPF program at a random location
within the allocated space. Under this scenario, it would be better to
flush the entire allocated buffer rather than just the part containing
the actual program. We already flush the buffer from start to the end of
the BPF program. Extend this to include the illegal instructions after
the BPF program.

Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>
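
For reference, the end result is to flush the whole allocation rather than just
the program bytes, i.e. roughly (header/field names quoted from memory, treat
as illustrative):

    /* previously only [code_base, code_base + proglen) was flushed */
    bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));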


Re: [PATCH 2/3] bpf powerpc: implement support for tail calls

2016-09-26 Thread Daniel Borkmann

On 09/26/2016 10:56 AM, Naveen N. Rao wrote:

On 2016/09/24 03:30AM, Alexei Starovoitov wrote:

On Sat, Sep 24, 2016 at 12:33:54AM +0200, Daniel Borkmann wrote:

On 09/23/2016 10:35 PM, Naveen N. Rao wrote:

Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
programs. This can be achieved either by:
(1) retaining the stack setup by the first eBPF program and having all
subsequent eBPF programs re-using it, or,
(2) by unwinding/tearing down the stack and having each eBPF program
deal with its own stack as it sees fit.

To ensure that this does not create loops, there is a limit to how many
tail calls can be done (currently 32). This requires the JIT'ed code to
maintain a count of the number of tail calls done so far.

Approach (1) is simple, but requires every eBPF program to have (almost)
the same prologue/epilogue, regardless of whether they need it. This is
inefficient for small eBPF programs which may not sometimes need a
prologue at all. As such, to minimize impact of tail call
implementation, we use approach (2) here which needs each eBPF program
in the chain to use its own prologue/epilogue. This is not ideal when
many tail calls are involved and when all the eBPF programs in the chain
have similar prologue/epilogue. However, the impact is restricted to
programs that do tail calls. Individual eBPF programs are not affected.

We maintain the tail call count in a fixed location on the stack and
updated tail call count values are passed in through this. The very
first eBPF program in a chain sets this up to 0 (the first 2
instructions). Subsequent tail calls skip the first two eBPF JIT
instructions to maintain the count. For programs that don't do tail
calls themselves, the first two instructions are NOPs.

Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>


Thanks for adding support, Naveen, that's really great! I think 2) seems
fine as well in this context as prologue size can vary quite a bit here,
and depending on program types likelihood of tail call usage as well (but
I wouldn't expect deep nesting). Thanks a lot!


Great stuff. In these circumstances approach 2 makes sense to me as well.


Alexei, Daniel,
Thanks for the quick review!


The patches would go via Michael's tree (same way as with the JIT itself
in the past), right?


Re: [PATCH 2/3] bpf powerpc: implement support for tail calls

2016-09-23 Thread Daniel Borkmann

On 09/23/2016 10:35 PM, Naveen N. Rao wrote:

Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
programs. This can be achieved either by:
(1) retaining the stack setup by the first eBPF program and having all
subsequent eBPF programs re-using it, or,
(2) by unwinding/tearing down the stack and having each eBPF program
deal with its own stack as it sees fit.

To ensure that this does not create loops, there is a limit to how many
tail calls can be done (currently 32). This requires the JIT'ed code to
maintain a count of the number of tail calls done so far.

Approach (1) is simple, but requires every eBPF program to have (almost)
the same prologue/epilogue, regardless of whether they need it. This is
inefficient for small eBPF programs which may not sometimes need a
prologue at all. As such, to minimize impact of tail call
implementation, we use approach (2) here which needs each eBPF program
in the chain to use its own prologue/epilogue. This is not ideal when
many tail calls are involved and when all the eBPF programs in the chain
have similar prologue/epilogue. However, the impact is restricted to
programs that do tail calls. Individual eBPF programs are not affected.

We maintain the tail call count in a fixed location on the stack and
updated tail call count values are passed in through this. The very
first eBPF program in a chain sets this up to 0 (the first 2
instructions). Subsequent tail calls skip the first two eBPF JIT
instructions to maintain the count. For programs that don't do tail
calls themselves, the first two instructions are NOPs.

Signed-off-by: Naveen N. Rao 


Thanks for adding support, Naveen, that's really great! I think 2) seems
fine as well in this context as prologue size can vary quite a bit here,
and depending on program types likelihood of tail call usage as well (but
I wouldn't expect deep nesting). Thanks a lot!
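
For readers following along, the scheme described above comes down to something
like this in the prologue (macro and helper names as in the ppc64 JIT of that
time, from memory; treat it as a sketch):

    if (ctx->seen & SEEN_TAILCALL) {
        /* very first program in a chain: tail call count = 0 */
        PPC_LI(b2p[TMP_REG_1], 0);
        /* kept at a fixed stack location; tail calls branch past these
         * two instructions so the count survives across programs */
        PPC_BPF_STL(b2p[TMP_REG_1], 1, bpf_jit_stack_tailcallcnt(ctx));
    } else {
        PPC_NOP();
        PPC_NOP();
    }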


Re: [PATCH 3/3] bpf powerpc: add support for bpf constant blinding

2016-09-23 Thread Daniel Borkmann

On 09/23/2016 10:35 PM, Naveen N. Rao wrote:

In line with similar support for other architectures by Daniel Borkmann.

'MOD Default X' from test_bpf without constant blinding:
84 bytes emitted from JIT compiler (pass:3, flen:7)
d58a4688 + :
0:  nop
4:  nop
8:  std r27,-40(r1)
c:  std r28,-32(r1)
   10:  xor r8,r8,r8
   14:  xor r28,r28,r28
   18:  mr  r27,r3
   1c:  li  r8,66
   20:  cmpwi   r28,0
   24:  bne 0x0030
   28:  li  r8,0
   2c:  b   0x0044
   30:  divwu   r9,r8,r28
   34:  mullw   r9,r28,r9
   38:  subfr8,r9,r8
   3c:  rotlwi  r8,r8,0
   40:  li  r8,66
   44:  ld  r27,-40(r1)
   48:  ld  r28,-32(r1)
   4c:  mr  r3,r8
   50:  blr

... and with constant blinding:
140 bytes emitted from JIT compiler (pass:3, flen:11)
dbd6ab24 + :
0:  nop
4:  nop
8:  std r27,-40(r1)
c:  std r28,-32(r1)
   10:  xor r8,r8,r8
   14:  xor r28,r28,r28
   18:  mr  r27,r3
   1c:  lis r2,-22834
   20:  ori r2,r2,36083
   24:  rotlwi  r2,r2,0
   28:  xorir2,r2,36017
   2c:  xoris   r2,r2,42702
   30:  rotlwi  r2,r2,0
   34:  mr  r8,r2
   38:  rotlwi  r8,r8,0
   3c:  cmpwi   r28,0
   40:  bne 0x004c
   44:  li  r8,0
   48:  b   0x007c
   4c:  divwu   r9,r8,r28
   50:  mullw   r9,r28,r9
   54:  subfr8,r9,r8
   58:  rotlwi  r8,r8,0
   5c:  lis r2,-17137
   60:  ori r2,r2,39065
   64:  rotlwi  r2,r2,0
   68:  xorir2,r2,39131
   6c:  xoris   r2,r2,48399
   70:  rotlwi  r2,r2,0
   74:  mr  r8,r2
   78:  rotlwi  r8,r8,0
   7c:  ld  r27,-40(r1)
   80:  ld  r28,-32(r1)
   84:  mr  r3,r8
   88:  blr

Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>


Re: [PATCH net 4/4] lib/test_bpf: Add additional BPF_ADD tests

2016-04-05 Thread Daniel Borkmann

On 04/05/2016 12:02 PM, Naveen N. Rao wrote:

Some of these tests proved useful with the powerpc eBPF JIT port due to
sign-extended 16-bit immediate loads. Though some of these aspects get
covered in other tests, it is better to have explicit tests so as to
quickly tag the precise problem.

Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Paul Mackerras <pau...@samba.org>
Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>


Thanks for adding these!

Acked-by: Daniel Borkmann <dan...@iogearbox.net>
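
As an illustration of the sign-extension pitfall these cover: ppc64 addi/li
take a signed 16-bit immediate, so a constant such as 32768 must not be emitted
as a single sign-extending instruction. A hypothetical test in the same style
(not one of the vectors added by this patch):

    {
        "ALU64_ADD_K: 3 + 32768 = 32771",
        .u.insns_int = {
            BPF_LD_IMM64(R0, 3),
            BPF_ALU64_IMM(BPF_ADD, R0, 32768),
            BPF_EXIT_INSN(),
        },
        INTERNAL,
        { },
        { { 0, 32771 } },
    },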

Re: [PATCH net 3/4] lib/test_bpf: Add test to check for result of 32-bit add that overflows

2016-04-05 Thread Daniel Borkmann

On 04/05/2016 12:02 PM, Naveen N. Rao wrote:

BPF_ALU32 and BPF_ALU64 tests for adding two 32-bit values that results in
32-bit overflow.

Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Paul Mackerras <pau...@samba.org>
Signed-off-by: Naveen N. Rao <naveen.n....@linux.vnet.ibm.com>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>

Re: [PATCH net 2/4] lib/test_bpf: Add tests for unsigned BPF_JGT

2016-04-05 Thread Daniel Borkmann

On 04/05/2016 12:02 PM, Naveen N. Rao wrote:

Unsigned Jump-if-Greater-Than.

Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Paul Mackerras <pau...@samba.org>
Signed-off-by: Naveen N. Rao <naveen.n....@linux.vnet.ibm.com>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>

Re: [PATCH net 1/4] lib/test_bpf: Fix JMP_JSET tests

2016-04-05 Thread Daniel Borkmann

On 04/05/2016 12:02 PM, Naveen N. Rao wrote:

JMP_JSET tests incorrectly used BPF_JNE. Fix the same.

Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Paul Mackerras <pau...@samba.org>
Signed-off-by: Naveen N. Rao <naveen.n....@linux.vnet.ibm.com>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>


---
  lib/test_bpf.c | 8 ++++----
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 27a7a26..e76fa4d 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -4303,7 +4303,7 @@ static struct bpf_test tests[] = {
.u.insns_int = {
BPF_ALU32_IMM(BPF_MOV, R0, 0),
BPF_LD_IMM64(R1, 3),
-   BPF_JMP_IMM(BPF_JNE, R1, 2, 1),
+   BPF_JMP_IMM(BPF_JSET, R1, 2, 1),
BPF_EXIT_INSN(),
BPF_ALU32_IMM(BPF_MOV, R0, 1),
BPF_EXIT_INSN(),
@@ -4317,7 +4317,7 @@ static struct bpf_test tests[] = {
.u.insns_int = {
BPF_ALU32_IMM(BPF_MOV, R0, 0),
BPF_LD_IMM64(R1, 3),
-   BPF_JMP_IMM(BPF_JNE, R1, 0xffffffff, 1),
+   BPF_JMP_IMM(BPF_JSET, R1, 0xffffffff, 1),
BPF_EXIT_INSN(),
BPF_ALU32_IMM(BPF_MOV, R0, 1),
BPF_EXIT_INSN(),
@@ -4474,7 +4474,7 @@ static struct bpf_test tests[] = {
BPF_ALU32_IMM(BPF_MOV, R0, 0),
BPF_LD_IMM64(R1, 3),
BPF_LD_IMM64(R2, 2),
-   BPF_JMP_REG(BPF_JNE, R1, R2, 1),
+   BPF_JMP_REG(BPF_JSET, R1, R2, 1),
BPF_EXIT_INSN(),
BPF_ALU32_IMM(BPF_MOV, R0, 1),
BPF_EXIT_INSN(),
@@ -4489,7 +4489,7 @@ static struct bpf_test tests[] = {
BPF_ALU32_IMM(BPF_MOV, R0, 0),
BPF_LD_IMM64(R1, 3),
   BPF_LD_IMM64(R2, 0xffffffff),
-   BPF_JMP_REG(BPF_JNE, R1, R2, 1),
+   BPF_JMP_REG(BPF_JSET, R1, R2, 1),
BPF_EXIT_INSN(),
BPF_ALU32_IMM(BPF_MOV, R0, 1),
BPF_EXIT_INSN(),




Re: [RFC PATCH 6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF

2016-04-01 Thread Daniel Borkmann

On 04/01/2016 08:10 PM, Alexei Starovoitov wrote:

On 4/1/16 2:58 AM, Naveen N. Rao wrote:

PPC64 eBPF JIT compiler. Works for both ABIv1 and ABIv2.

Enable with:
echo 1 > /proc/sys/net/core/bpf_jit_enable
or
echo 2 > /proc/sys/net/core/bpf_jit_enable

... to see the generated JIT code. This can further be processed with
tools/net/bpf_jit_disasm.

With CONFIG_TEST_BPF=m and 'modprobe test_bpf':
test_bpf: Summary: 291 PASSED, 0 FAILED, [234/283 JIT'ed]

... on both ppc64 BE and LE.

The details of the approach are documented through various comments in
the code, as are the TODOs. Some of the prominent TODOs include
implementing BPF tail calls and skb loads.

Cc: Matt Evans 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Alexei Starovoitov 
Cc: "David S. Miller" 
Cc: Ananth N Mavinakayanahalli 
Signed-off-by: Naveen N. Rao 
---
  arch/powerpc/include/asm/ppc-opcode.h |  19 +-
  arch/powerpc/net/Makefile |   4 +
  arch/powerpc/net/bpf_jit.h|  66 ++-
  arch/powerpc/net/bpf_jit64.h  |  58 +++
  arch/powerpc/net/bpf_jit_comp64.c | 828 ++
  5 files changed, 973 insertions(+), 2 deletions(-)
  create mode 100644 arch/powerpc/net/bpf_jit64.h
  create mode 100644 arch/powerpc/net/bpf_jit_comp64.c

...

-#ifdef CONFIG_PPC64
+#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2)


impressive stuff!


+1, awesome to see another one!


Everything nicely documented. Could you add few words for the above
condition as well ?
Or may be a new macro, since it occurs many times?
What are these _CALL_ELF == 2 and != 2 conditions mean? ppc ABIs ?
Will there ever be v3 ?


Minor TODO would also be to convert to use bpf_jit_binary_alloc() and
bpf_jit_binary_free() API for the image, which is done by other eBPF
jits, too.


So far most of the bpf jits were going via net-next tree, but if
in this case no changes to the core is necessary then I guess it's fine
to do it via powerpc tree. What's your plan?




Re: [PATCH 2/4] samples/bpf: Use llc in PATH, rather than a hardcoded value

2016-03-31 Thread Daniel Borkmann

On 03/31/2016 07:46 PM, Alexei Starovoitov wrote:

On 3/31/16 4:25 AM, Naveen N. Rao wrote:

While at it, fix some typos in the comment.

Cc: Alexei Starovoitov 
Cc: David S. Miller 
Cc: Ananth N Mavinakayanahalli 
Cc: Michael Ellerman 
Signed-off-by: Naveen N. Rao 
---
  samples/bpf/Makefile | 11 ---
  1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 502c9fc..88bc5a0 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -76,16 +76,13 @@ HOSTLOADLIBES_offwaketime += -lelf
  HOSTLOADLIBES_spintest += -lelf
  HOSTLOADLIBES_map_perf_test += -lelf -lrt

-# point this to your LLVM backend with bpf support
-LLC=$(srctree)/tools/bpf/llvm/bld/Debug+Asserts/bin/llc
-
-# asm/sysreg.h inline assmbly used by it is incompatible with llvm.
-# But, ehere is not easy way to fix it, so just exclude it since it is
+# asm/sysreg.h - inline assembly used by it is incompatible with llvm.
+# But, there is no easy way to fix it, so just exclude it since it is
  # useless for BPF samples.
  $(obj)/%.o: $(src)/%.c
  clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
--O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
+-O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=obj -o $@
  clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
--O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=asm -o $@.s
+-O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=asm -o $@.s


that was a workaround when clang/llvm didn't have bpf support.
Now clang 3.7 and 3.8 have bpf built-in, so make sense to remove
manual calls to llc completely.
Just use 'clang -target bpf -O2 -D... -c $< -o $@'


+1, the clang part in that Makefile should also more correctly be called
with '-target bpf' as it turns out (despite llc with '-march=bpf' ...).
Better to use clang directly as suggested by Alexei.

Re: [PATCH] net: filter: make JITs zero A for SKF_AD_ALU_XOR_X

2016-01-05 Thread Daniel Borkmann

On 01/05/2016 05:03 PM, Rabin Vincent wrote:

On Tue, Jan 05, 2016 at 08:00:45AM -0800, Eric Dumazet wrote:

On Tue, 2016-01-05 at 16:23 +0100, Rabin Vincent wrote:

The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data
instructions since it XORs A with X while all the others replace A with
some loaded value.  All the BPF JITs fail to clear A if this is used as
the first instruction in a filter.


Is x86_64 part of this 'All' subset ? ;)


No, because it's an eBPF JIT.


Correct, filter conversion to eBPF clears it already.

Re: [PATCH] net: filter: make JITs zero A for SKF_AD_ALU_XOR_X

2016-01-05 Thread Daniel Borkmann

On 01/05/2016 04:23 PM, Rabin Vincent wrote:

The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data
instructions since it XORs A with X while all the others replace A with
some loaded value.  All the BPF JITs fail to clear A if this is used as
the first instruction in a filter.  This was found using american fuzzy
lop.

Add a helper to determine if A needs to be cleared given the first
instruction in a filter, and use this in the JITs.  Except for ARM, the
rest have only been compile-tested.

Fixes: 3480593131e0 ("net: filter: get rid of BPF_S_* enum")
Signed-off-by: Rabin Vincent <ra...@rab.in>


Excellent catch, thanks a lot! The fix looks good to me and should
go to -net tree.

Acked-by: Daniel Borkmann <dan...@iogearbox.net>

If you're interested, feel free to add a small test case for the
SKF_AD_ALU_XOR_X issue to lib/test_bpf.c for -net-next tree. Thanks!

Re: [PATCH net-next 0/6] bpf: Enable BPF JIT on ppc32

2015-02-16 Thread Daniel Borkmann

On 02/16/2015 08:13 AM, Denis Kirjanov wrote:
...

Well, I don't see significant challenges to enable eBPF on ppc64 in the future.
I'll start working on it after I get this merged


Cool, awesome!

Re: [PATCH net-next 0/6] bpf: Enable BPF JIT on ppc32

2015-02-15 Thread Daniel Borkmann

On 02/15/2015 07:06 PM, Denis Kirjanov wrote:

This patch series enables BPF JIT on ppc32. There are relatively
few changes in the code to make it work.

All test_bpf tests passed both on 7447a and P2041-based machines.


I'm just wondering, next to the feedback that has already been
provided, would opening this up for ppc32 make it significantly
more difficult in future to migrate from classic BPF JIT to eBPF
JIT eventually (which is what we want long-term)? Being curious,
is there any ongoing effort from ppc people?

ppc64 and Documentation/mic/mpssd/mpssd.c issues

2014-12-04 Thread Daniel Borkmann

Hi Caz,

I think commit ...

commit 8d49751580db804a02caf6a5b7cebe2ff26c0d7e
Author: Caz Yokoyama caz.yokoy...@intel.com
Date:   Thu Sep 5 16:42:39 2013 -0700

Sample Implementation of Intel MIC User Space Daemon.

... actually triggers a build error on my default config
ppc64 (I'm using net-next tree):

  HOSTCC  Documentation/mic/mpssd/mpssd.o
Documentation/mic/mpssd/mpssd.c:93: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:96: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:113: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:116: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:119: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:146: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:149: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:151: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:152: error: braced-group within expression 
allowed only inside a function
make[3]: *** [Documentation/mic/mpssd/mpssd.o] Error 1
make[2]: *** [Documentation/mic/mpssd] Error 2
make[1]: *** [Documentation/mic] Error 2
make: *** [vmlinux] Error 2

gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)

Cheers,
Daniel

Re: ppc64 and Documentation/mic/mpssd/mpssd.c issues

2014-12-04 Thread Daniel Borkmann

On 12/04/2014 05:57 PM, Sudeep Dutt wrote:

On Thu, 2014-12-04 at 08:48 -0800, Yokoyama, Caz wrote:

Thank you for the report.
Have you tried to compile on x86_64? Which rhel do you use? Rhel6.X?
MIC card is expected to work with x86_64 host, not with ppc64. We have never 
compiled on ppc host while our primary development platform was rhel6.X.


Yep, I was using RHEL6 on ppc64 by chance to build an upstream kernel.

You might want to fix the Kconfig entry then to limit this to x86_64 only. ;)

Thanks,
Daniel


I am adding Peter Foley since he enabled the mpssd build by default
recently with commit adb19fb. Peter, is there a way to enable the mpssd
build only for x86_64?

Thanks,
Sudeep Dutt


Line 92 is the first use of htole16 and MIC_VRING_ENTRIES. Is there any change 
on them?

Thank you.
-caz, caz.yokoy...@intel.com, yokoy...@member.fsf.org


-Original Message-
From: Daniel Borkmann [mailto:dbork...@redhat.com]
Sent: Thursday, December 04, 2014 5:42 AM
To: Yokoyama, Caz
Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org
Subject: ppc64 and Documentation/mic/mpssd/mpssd.c issues

Hi Caz,

I think commit ...

commit 8d49751580db804a02caf6a5b7cebe2ff26c0d7e
Author: Caz Yokoyama caz.yokoy...@intel.com
Date:   Thu Sep 5 16:42:39 2013 -0700

  Sample Implementation of Intel MIC User Space Daemon.

... actually triggers a build error on my default config
ppc64 (I'm using net-next tree):

HOSTCC  Documentation/mic/mpssd/mpssd.o
Documentation/mic/mpssd/mpssd.c:93: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:96: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:113: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:116: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:119: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:146: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:149: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:151: error: braced-group within expression 
allowed only inside a function
Documentation/mic/mpssd/mpssd.c:152: error: braced-group within expression 
allowed only inside a function
make[3]: *** [Documentation/mic/mpssd/mpssd.o] Error 1
make[2]: *** [Documentation/mic/mpssd] Error 2
make[1]: *** [Documentation/mic] Error 2
make: *** [vmlinux] Error 2

gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)

Cheers,
Daniel





Re: [PATCH v2] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction

2014-11-01 Thread Daniel Borkmann

On 11/01/2014 07:00 PM, David Miller wrote:

From: Denis Kirjanov k...@linux-powerpc.org
Date: Sat, 1 Nov 2014 21:49:27 +0400


David, you need a feedback from other guys to apply this patch, right?

Alexei wanted some output before/after the patch.
Michael Ellerman wanted an explanation of what BPF_ANC | SKF_AD_PKTTYPE means.


The BPF_ANC | SKF_AD_PKTTYPE case means that this is an ancillary
operation aka BPF extension which loads the value of skb->pkt_type
into the accumulator.

A similar transformation, that is, from BPF into eBPF insns can be
found in convert_bpf_extensions() in the SKF_AD_PKTTYPE case, or
commit 709f6c58d4dc ("sparc: bpf_jit: add SKF_AD_PKTTYPE support
to JIT") that recently enabled the same in sparc.
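
For a concrete user of the extension, a minimal classic-BPF filter sketch that
keeps only PACKET_HOST frames (illustrative only):

#include <linux/filter.h>
#include <linux/if_packet.h>

struct sock_filter pkt_type_filter[] = {
    /* A = skb->pkt_type via the ancillary load */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, PACKET_HOST, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, 0xffff),  /* keep */
    BPF_STMT(BPF_RET | BPF_K, 0),       /* drop */
};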


So I'm waiting  the ack/nack from them...


I don't really think performance metrics are necessary just for adding
SKF_AD_PKTTYPE support, that's sort of an over the top requirement
if you ask me.


Right, lib/test_bpf.c actually brings the quoted output w/ numbers
for free. I think the important point was that the 'After:' case
with ``echo 1 > /proc/sys/net/core/bpf_jit_enable'' runs through for
that test case, which has been shown here.


It's pretty obvious that we should support as many operations as
possible to each JIT, because all of program has to do is use that
unsupported opcode and then we have none of that program being JIT'd.


Re: [PATCH v2] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction

2014-11-01 Thread Daniel Borkmann

On 10/31/2014 07:09 AM, Denis Kirjanov wrote:

On 10/30/14, Alexei Starovoitov alexei.starovoi...@gmail.com wrote:

On Wed, Oct 29, 2014 at 11:12 PM, Denis Kirjanov k...@linux-powerpc.org
wrote:

Add BPF extension SKF_AD_PKTTYPE to ppc JIT to load
skb->pkt_type field.

Before:
[   88.262622] test_bpf: #11 LD_IND_NET 86 97 99 PASS
[   88.265740] test_bpf: #12 LD_PKTTYPE 109 107 PASS

After:
[   80.605964] test_bpf: #11 LD_IND_NET 44 40 39 PASS
[   80.607370] test_bpf: #12 LD_PKTTYPE 9 9 PASS


if you'd only quoted #12, it would all make sense ;)
but #11 test is not using PKTTYPE. So your patch shouldn't
make a difference. Are these numbers with JIT on and off?


Right.


Ok.

Please mention this in future log messages, as it was not quite
clear that before was actually with JIT off, and after was
with JIT on.

One could have read it that actually both cases were with JIT on,
and thus the inconsistent result for LD_IND_NET is a bit confusing
since you've quoted it here as well.

Re: [PATCH v2 2/2] powerpc: bpf: Fix the broken LD_VLAN_TAG_PRESENT test

2014-06-27 Thread Daniel Borkmann

On 06/27/2014 07:08 AM, Michael Ellerman wrote:

On Thu, 2014-06-26 at 10:36 +0200, Daniel Borkmann wrote:

On 06/26/2014 10:30 AM, Michael Ellerman wrote:

On Wed, 2014-06-25 at 21:34 +0400, Denis Kirjanov wrote:

We have to return a boolean here indicating whether the tag is present
or not, not just AND the TCI with the mask, which results in:

[  709.412097] test_bpf: #18 LD_VLAN_TAG_PRESENT
[  709.412245] ret 4096 != 1
[  709.412332] ret 4096 != 1
[  709.412333] FAIL (2 times)


Hi Denis,

Is this the same version of test_bpf that is in Linus' tree?

I don't see any fails with that version, am I missing something?

[  187.690640] test_bpf: #18 LD_VLAN_TAG_PRESENT 46 50 PASS


Did you try to modprobe test_bpf after enabling JIT, i.e. :

echo 1 > /proc/sys/net/core/bpf_jit_enable

Last time I tested with ppc64, I saw the interpreter passing
while JIT failed, which the fix above should address.


Right, I thought CONFIG_BPF_JIT=y was sufficient.

But that just makes it available, I need to *also* enable it at runtime.


Yes, it's an extra runtime knob which still needs to be enabled by
the admin for security reasons. This knob also allows you to enable
the debugging interface `echo 2 > /proc/sys/net/core/bpf_jit_enable`
where the disassembly can be read out via: tools/net/bpf_jit_disasm.c

Re: [PATCH v2 2/2] powerpc: bpf: Fix the broken LD_VLAN_TAG_PRESENT test

2014-06-26 Thread Daniel Borkmann

On 06/26/2014 10:30 AM, Michael Ellerman wrote:

On Wed, 2014-06-25 at 21:34 +0400, Denis Kirjanov wrote:

We have to return a boolean here indicating whether the tag is present
or not, not just AND the TCI with the mask, which results in:

[  709.412097] test_bpf: #18 LD_VLAN_TAG_PRESENT
[  709.412245] ret 4096 != 1
[  709.412332] ret 4096 != 1
[  709.412333] FAIL (2 times)


Hi Denis,

Is this the same version of test_bpf that is in Linus' tree?

I don't see any fails with that version, am I missing something?

   [  187.690640] test_bpf: #18 LD_VLAN_TAG_PRESENT 46 50 PASS


Did you try to modprobe test_bpf after enabling JIT, i.e. :

  echo 1 > /proc/sys/net/core/bpf_jit_enable

Last time I tested with ppc64, I saw the interpreter passing
while JIT failed, which the fix above should address.

Cheers,

Daniel

Re: [PATCH] ppc: bpf_jit: support MOD operation

2013-09-04 Thread Daniel Borkmann

On 09/03/2013 10:52 PM, Daniel Borkmann wrote:

On 09/03/2013 09:58 PM, Vladimir Murzin wrote:

[...]

Do you have a test case/suite by any chance ?

Ben.



Hi Ben!

Thanks for your feedback.

This patch is only compile tested. I have no real hardware, but I'll
probably bring up qemu ppc64 till end of the week...
Meanwhile, I've made simple how-to for testing. You can use it if you wish.
It is mainly based on the [1] and rechecked on x86-64.


Please also cc netdev on BPF related changes.

Actually, your test plan can be further simplified ...

For retrieving and disassembling the JIT image, we have bpf_jit_disasm [1].

  1) echo 2 > /proc/sys/net/core/bpf_jit_enable
  2) ... attach filter ...
  3) bpf_jit_disasm -o

For generating a simple stupid test filter, you can use bpfc [2] (also
see its man page). E.g. ...

   # cat blub
   ldi #10
   mod #8
   ret a
   # bpfc blub
   { 0x0, 0, 0, 0x000a },
   { 0x94, 0, 0, 0x0008 },
   { 0x16, 0, 0, 0x },


Plus something like ...

ldxi #0
mod x
ret a

For longer-term testing, also trinity has BPF support. ;)


And load this array e.g. either into a small C program that attaches this
as BPF filter, or simply do bpfc blub > blub2 and run netsniff-ng -f blub2
-s -i eth0, that should also do it.

Then, when attached, the kernel should truncate incoming frames for pf_packet
into max length of 2, just as an example.
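
In case it helps, a minimal sketch of that small C program (error handling
trimmed; the include list may need minor adjustment depending on libc):

#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/filter.h>

int main(void)
{
    struct sock_filter code[] = {
        { 0x00, 0, 0, 10 },  /* ldi #10 */
        { 0x94, 0, 0, 8 },   /* mod #8  */
        { 0x16, 0, 0, 0 },   /* ret a -> 10 % 8 == 2 byte snaplen */
    };
    struct sock_fprog prog = {
        .len    = sizeof(code) / sizeof(code[0]),
        .filter = code,
    };
    int sock = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    if (sock < 0 ||
        setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)))
        perror("attach");
    /* ... recvfrom() loop here; accepted frames are truncated to 2 bytes ... */
    return 0;
}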

   [1] kernel tree, tools/net/bpf_jit_disasm.c
   [2] git clone git://github.com/borkmann/netsniff-ng.git



Re: [PATCH] ppc: bpf_jit: support MOD operation

2013-09-03 Thread Daniel Borkmann

On 09/03/2013 09:58 PM, Vladimir Murzin wrote:

On Tue, Sep 03, 2013 at 06:45:50AM +1000, Benjamin Herrenschmidt wrote:

On Mon, 2013-09-02 at 19:48 +0200, Vladimir Murzin wrote:

Ping

On Wed, Aug 28, 2013 at 02:49:52AM +0400, Vladimir Murzin wrote:

commit b6069a9570 (filter: add MOD operation) added generic
support for modulus operation in BPF.


Sorry, nobody got a chance to review that yet. Unfortunately Matt
doesn't work for us anymore and none of us has experience with the
BPF code, so somebody (possibly me) will need to spend a bit of time
figuring it out before verifying that is correct.

Do you have a test case/suite by any chance ?

Ben.



Hi Ben!

Thanks for your feedback.

This patch is only compile tested. I have no real hardware, but I'll
probably bring up qemu ppc64 till end of the week...
Meanwhile, I've made simple how-to for testing. You can use it if you wish.
It is mainly based on the [1] and rechecked on x86-64.


Please also cc netdev on BPF related changes.

Actually, your test plan can be further simplified ...

For retrieving and disassembling the JIT image, we have bpf_jit_disasm [1].

 1) echo 2 > /proc/sys/net/core/bpf_jit_enable
 2) ... attach filter ...
 3) bpf_jit_disasm -o

For generating a simple stupid test filter, you can use bpfc [2] (also
see its man page). E.g. ...

  # cat blub
  ldi #10
  mod #8
  ret a
  # bpfc blub
  { 0x0, 0, 0, 0x000a },
  { 0x94, 0, 0, 0x0008 },
  { 0x16, 0, 0, 0x },

And load this array e.g. either into a small C program that attaches this
as BPF filter, or simply do bpfc blub > blub2 and run netsniff-ng -f blub2
-s -i eth0, that should also do it.

Then, when attached, the kernel should truncate incoming frames for pf_packet
into max length of 2, just as an example.

  [1] kernel tree, tools/net/bpf_jit_disasm.c
  [2] git clone git://github.com/borkmann/netsniff-ng.git


1. get the tcpdump utility (git clone git://bpf.tcpdump.org/tcpdump)
2. get the libcap library (git clone git://bpf.tcpdump.org/libpcap)
2.1. apply patch for libcap [2] (against libcap-1.3 branch)
2.2. build libcap (./configure  make  ln -s libcap.so.1.3.0 libcap.so)
3. build tcpdump (LDFLAGS=-L/path/to/libcap ./configure  make)
4. run

# ./tcpdump -d (ip[2:2] - 20) % 5 != 0 && ip[6] & 0x20 = 0x20
(000) ldh [14]
(001) jeq #0x800 jt 2 jf 10
(002) ldh [18]
(003) sub #20
(004) mod #5
(005) jeq #0x0 jt 10 jf 6
(006) ldb [22]
(007) and #0x20
(008) jeq #0x20 jt 9 jf 10
(009) ret #65535
(010) ret #0

to get the pseudo code (we are mostly interested in line #4)
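
If it helps to cross-check this against the bpfc-style arrays earlier in the
thread, the same listing in loadable form is roughly what tcpdump -dd would
print (sketch for illustration; jt/jf are encoded relative to the next
instruction, which is why they differ from the absolute targets above):

  #include <linux/filter.h>

  /* The -d listing above as a classic BPF program; line (004) is the MOD
   * instruction the JIT has to handle, 0x94 == BPF_ALU | BPF_MOD | BPF_K.
   */
  static struct sock_filter mod_test_filter[] = {
          { 0x28, 0, 0, 0x0000000e }, /* (000) ldh [14]                */
          { 0x15, 0, 8, 0x00000800 }, /* (001) jeq #0x800  jt 2  jf 10 */
          { 0x28, 0, 0, 0x00000012 }, /* (002) ldh [18]                */
          { 0x14, 0, 0, 0x00000014 }, /* (003) sub #20                 */
          { 0x94, 0, 0, 0x00000005 }, /* (004) mod #5                  */
          { 0x15, 4, 0, 0x00000000 }, /* (005) jeq #0x0    jt 10 jf 6  */
          { 0x30, 0, 0, 0x00000016 }, /* (006) ldb [22]                */
          { 0x54, 0, 0, 0x00000020 }, /* (007) and #0x20               */
          { 0x15, 0, 1, 0x00000020 }, /* (008) jeq #0x20   jt 9  jf 10 */
          { 0x06, 0, 0, 0x0000ffff }, /* (009) ret #65535              */
          { 0x06, 0, 0, 0x00000000 }, /* (010) ret #0                  */
  };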

5. enable bpf jit compiler

# echo 2 > /proc/sys/net/core/bpf_jit_enable

6. run

./tcpdump -nv "(ip[2:2] - 20) % 5 != 0 && ip[6] & 0x20 = 0x20"

7. check dmesg for lines starting with "flen=" (output for x86-64 is provided
as an example)

[ 3768.329253] flen=11 proglen=99 pass=3 image=a003c000
[ 3768.329254] JIT code: a003c000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60
[ 3768.329255] JIT code: a003c010: 44 2b 4f 64 4c 8b 87 c0 00 00 00 0f b7 47 76 86
[ 3768.329256] JIT code: a003c020: c4 3d 00 08 00 00 75 37 be 02 00 00 00 e8 9f 3e
[ 3768.329257] JIT code: a003c030: 02 e1 83 e8 14 31 d2 b9 05 00 00 00 f7 f1 89 d0
[ 3768.329258] JIT code: a003c040: 85 c0 74 1b be 06 00 00 00 e8 9f 3e 02 e1 25 20
[ 3768.329259] JIT code: a003c050: 00 00 00 83 f8 20 75 07 b8 ff ff 00 00 eb 02 31
[ 3768.329259] JIT code: a003c060: c0 c9 c3

8. make sure the generated opcodes (JIT code) implement the pseudo code from step 4.

Reference
[1] http://comments.gmane.org/gmane.linux.network/242456
[2] http://permalink.gmane.org/gmane.network.tcpdump.devel/5973

P.S.
I hope net people will correct me if I'm wrong there

Cheers
Vladimir Murzin


This patch brings JIT support for PPC64

Signed-off-by: Vladimir Murzin murzi...@gmail.com
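
As background for the JIT side: pre-ISA-3.0 PowerPC has no modulo instruction,
so a JIT typically lowers A % B into a divide/multiply/subtract sequence. A
sketch of that generic lowering in plain C (illustration only, not necessarily
the exact instruction selection of this patch; classic BPF treats a zero
divisor as "return 0"):

  /* Unsigned modulus without a hardware mod instruction:
   *   q = a / b;         e.g. a divide such as divwu on ppc
   *   r = a - q * b;     e.g. mullw + subf
   * The zero-divisor check mirrors what the in-kernel classic BPF
   * interpreter does for DIV_X/MOD_X: abort the filter, returning 0.
   */
  static unsigned int bpf_mod_lowering(unsigned int a, unsigned int b)
  {
          unsigned int q;

          if (b == 0)
                  return 0;
          q = a / b;
          return a - q * b;
  }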



[PATCH] powerpc: fix ics_rtas_init and start_secondary section mismatch

2013-02-05 Thread Daniel Borkmann
It seems we're fine with just annotating the two functions.
Thus, this fixes the following build warnings on ppc64:

WARNING: arch/powerpc/sysdev/xics/built-in.o(.text+0x1664):
The function .ics_rtas_init() references
the function __init .xics_register_ics().
This is often because .ics_rtas_init lacks a __init
annotation or the annotation of .xics_register_ics is wrong.

WARNING: arch/powerpc/sysdev/built-in.o(.text+0x6044):
The function .ics_rtas_init() references
the function __init .xics_register_ics().
This is often because .ics_rtas_init lacks a __init
annotation or the annotation of .xics_register_ics is wrong.

WARNING: arch/powerpc/kernel/built-in.o(.text+0x2db30):
The function .start_secondary() references
the function __cpuinit .vdso_getcpu_init().
This is often because .start_secondary lacks a __cpuinit
annotation or the annotation of .vdso_getcpu_init is wrong.
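
For readers unfamiliar with these warnings, a minimal sketch of the pattern
modpost complains about (hypothetical function names, not taken from this
patch):

  #include <linux/init.h>

  /* A function placed in .init.text ... */
  static int __init xyz_register(void)
  {
          return 0;
  }

  /* ... referenced from plain .text. Init memory is freed after boot, so
   * such a call could jump into released memory; modpost flags the
   * reference. The fix is to give the caller the appropriate section
   * annotation, which is what the patch below does for start_secondary()
   * (__cpuinit) and ics_rtas_init() (__init).
   */
  int xyz_probe(void)
  {
          return xyz_register();
  }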

Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Signed-off-by: Daniel Borkmann dbork...@redhat.com
---
 Note: compile-tested only!

 arch/powerpc/kernel/smp.c   |2 +-
 arch/powerpc/sysdev/xics/ics-rtas.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 793401e..76bd9da 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -610,7 +610,7 @@ static struct device_node *cpu_to_l2cache(int cpu)
 }
 
 /* Activate a secondary processor. */
-void start_secondary(void *unused)
+__cpuinit void start_secondary(void *unused)
 {
unsigned int cpu = smp_processor_id();
struct device_node *l2_cache;
diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-rtas.c
index c782f85..936575d 100644
--- a/arch/powerpc/sysdev/xics/ics-rtas.c
+++ b/arch/powerpc/sysdev/xics/ics-rtas.c
@@ -213,7 +213,7 @@ static int ics_rtas_host_match(struct ics *ics, struct device_node *node)
	return !of_device_is_compatible(node, "chrp,iic");
 }
 
-int ics_rtas_init(void)
+__init int ics_rtas_init(void)
 {
	ibm_get_xive = rtas_token("ibm,get-xive");
	ibm_set_xive = rtas_token("ibm,set-xive");
-- 
1.7.1
