Re: [PATCH v4 net-next] arm: eBPF JIT compiler

2017-08-22 Thread Shubham Bansal
Hi David,

On Tue, Aug 22, 2017 at 10:02 PM, David Miller  wrote:
>
> You posted this 4 times. :-(
>
> I hope I applied the right one.

All 4 of these are the same patch. I mistakenly sent it 4 times. My
apologies for that.
>
> Go check net-next and please send me any necessary fix up patches.
I just checked. It's the correct patch.

Thanks a lot David. :)


Re: [PATCH v4 net-next] arm: eBPF JIT compiler

2017-08-22 Thread David Miller

You posted this 4 times. :-(

I hope I applied the right one.

Go check net-next and please send me any necessary fix up patches.

Thanks.


Re: [PATCH v4 net-next] arm: eBPF JIT compiler

2017-08-22 Thread Daniel Borkmann

On 08/22/2017 05:08 PM, Daniel Borkmann wrote:
> On 08/22/2017 08:36 AM, Shubham Bansal wrote:
> [...]
>> +
>> +static int out_offset = -1; /* initialized on the first pass of build_body() */
>
> Hm, why is this a global var actually? There can be
> multiple parallel calls to bpf_int_jit_compile(), we
> don't take a global lock on this. Unless I'm missing
> something this should really reside in jit_ctx, no?

Hm, okay, it's for generating the out jmp offsets in
tail call emission which are supposed to always be the
same relative offsets; should be fine then.

> Given this is on emit_bpf_tail_call(), did you get
> tail calls working the way I suggested to test?
>
>> +static int emit_bpf_tail_call(struct jit_ctx *ctx)
>>  {
> [...]
>> +	const int idx0 = ctx->idx;
>> +#define cur_offset (ctx->idx - idx0)
>> +#define jmp_offset (out_offset - (cur_offset))
> [...]
>> +
>> +	/* out: */
>> +	if (out_offset == -1)
>> +		out_offset = cur_offset;
>> +	if (cur_offset != out_offset) {
>> +		pr_err_once("tail_call out_offset = %d, expected %d!\n",
>> +			    cur_offset, out_offset);
>> +		return -1;
>> +	}
>> +	return 0;
>> +#undef cur_offset
>> +#undef jmp_offset
>>  }
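Daniel's follow-up rests on the out: label always landing at the same distance, counted in instructions from idx0, inside every emitted tail-call sequence; the jmp_offset macro (out_offset - cur_offset) lets branches emitted before the label compute their distance to it. The sketch below is a hypothetical user-space model of that check, not code from the patch: `struct sketch_ctx`, `sketch_emit_tail_call()` and its `n_insns` parameter are invented for illustration, and instructions are only counted, never emitted.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct jit_ctx: only the instruction
 * index is modelled here. */
struct sketch_ctx {
	int idx;	/* index of the next instruction to emit */
};

/* Shared by every compilation, like the patch's global out_offset. */
static int sketch_out_offset = -1;

/*
 * Model of the tail-call out: check: "emit" n_insns instructions
 * starting at ctx->idx, then verify that the out: label sits at the
 * same offset from idx0 as on every earlier call. Returns 0 on success,
 * -1 if the relative offset changed.
 */
static int sketch_emit_tail_call(struct sketch_ctx *ctx, int n_insns)
{
	const int idx0 = ctx->idx;
	int cur_offset;

	assert(n_insns >= 0);
	ctx->idx += n_insns;	/* stands in for the emitted instructions */

	/* out: */
	cur_offset = ctx->idx - idx0;
	if (sketch_out_offset == -1)
		sketch_out_offset = cur_offset;
	if (cur_offset != sketch_out_offset) {
		fprintf(stderr, "tail_call out_offset = %d, expected %d!\n",
			cur_offset, sketch_out_offset);
		return -1;
	}
	return 0;
}
```

Two programs whose tail calls start at very different instruction indexes still record the same relative offset, so in practice concurrent compilations can only ever store an identical value; the error path fires only if the length of the emitted sequence changes.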




Re: [PATCH v4 net-next] arm: eBPF JIT compiler

2017-08-22 Thread Daniel Borkmann

On 08/22/2017 08:36 AM, Shubham Bansal wrote:
[...]
> +
> +static int out_offset = -1; /* initialized on the first pass of build_body() */

Hm, why is this a global var actually? There can be
multiple parallel calls to bpf_int_jit_compile(), we
don't take a global lock on this. Unless I'm missing
something this should really reside in jit_ctx, no?

Given this is on emit_bpf_tail_call(), did you get
tail calls working the way I suggested to test?

> +static int emit_bpf_tail_call(struct jit_ctx *ctx)
>  {
[...]
> +	const int idx0 = ctx->idx;
> +#define cur_offset (ctx->idx - idx0)
> +#define jmp_offset (out_offset - (cur_offset))
[...]
> +
> +	/* out: */
> +	if (out_offset == -1)
> +		out_offset = cur_offset;
> +	if (cur_offset != out_offset) {
> +		pr_err_once("tail_call out_offset = %d, expected %d!\n",
> +			    cur_offset, out_offset);
> +		return -1;
> +	}
> +	return 0;
> +#undef cur_offset
> +#undef jmp_offset
>  }
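Daniel's first suggestion was to keep this value in jit_ctx rather than in a file-scope variable. A minimal sketch of that alternative follows, assuming a trimmed-down context; `struct sketch_jit_ctx`, `sketch_ctx_init()` and `sketch_check_out_offset()` are hypothetical names, not the patch's actual `struct jit_ctx` or its helpers.

```c
#include <assert.h>

/* Hypothetical, trimmed-down JIT context; the patch's struct jit_ctx
 * has more fields than the two modelled here. */
struct sketch_jit_ctx {
	int idx;		/* index of the next instruction to emit */
	int out_offset;		/* tail-call out: offset, -1 until known */
};

static void sketch_ctx_init(struct sketch_jit_ctx *ctx)
{
	ctx->idx = 0;
	ctx->out_offset = -1;
}

/*
 * Same bookkeeping as the patch's out: check, but the remembered offset
 * lives in the per-compilation context, so two compilations running in
 * parallel never touch the same variable. Returns 0 if the offset
 * matches the one recorded earlier for this context, -1 otherwise.
 */
static int sketch_check_out_offset(struct sketch_jit_ctx *ctx, int idx0)
{
	const int cur_offset = ctx->idx - idx0;

	assert(idx0 <= ctx->idx);
	if (ctx->out_offset == -1)
		ctx->out_offset = cur_offset;
	return cur_offset == ctx->out_offset ? 0 : -1;
}
```

The cost of this layout is one extra field per context, and each compilation rediscovers the offset on its own first pass; the shared global avoids that field but is safe only because the offset is the same for every program.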


Re: [PATCH v4 net-next] arm: eBPF JIT compiler

2017-08-22 Thread Shubham Bansal
Please ignore this mail. Sent it by mistake.
Sent the correct patch later on.


Re: [PATCH v4 net-next] arm: eBPF JIT compiler

2017-08-22 Thread Shubham Bansal
Russell, David, Alexei, Daniel and Kees: please check this patch and
let's finish it.

Thanks.


[PATCH v4 net-next] arm: eBPF JIT compiler

2017-08-22 Thread Shubham Bansal
The JIT compiler emits 32-bit ARM instructions. It currently supports
eBPF only; classic BPF is still supported because the BPF core converts
classic BPF programs to eBPF.

This patch essentially changes the current JIT compiler for the Berkeley
Packet Filter from classic BPF to internal BPF (eBPF). Almost all
instructions of the eBPF ISA are supported, except the following:
BPF_ALU64 | BPF_DIV | BPF_K
BPF_ALU64 | BPF_DIV | BPF_X
BPF_ALU64 | BPF_MOD | BPF_K
BPF_ALU64 | BPF_MOD | BPF_X
BPF_STX | BPF_XADD | BPF_W
BPF_STX | BPF_XADD | BPF_DW
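Each excluded opcode above is built by OR-ing an instruction class with an operation (or memory mode) and a source (or size) field. As a hedged illustration, the sketch below re-declares the relevant constants with the values used by the kernel UAPI headers (linux/bpf_common.h and linux/bpf.h) and tests an opcode against this patch's exclusion list; `sketch_arm32_jit_unsupported()` is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcode field accessors and values, as in the kernel UAPI headers. */
#define BPF_CLASS(code)	((code) & 0x07)
#define BPF_SIZE(code)	((code) & 0x18)
#define BPF_MODE(code)	((code) & 0xe0)
#define BPF_OP(code)	((code) & 0xf0)

#define BPF_STX		0x03	/* class: store from register */
#define BPF_ALU64	0x07	/* class: 64-bit ALU */
#define BPF_W		0x00	/* size: 32-bit word */
#define BPF_DW		0x18	/* size: 64-bit double word */
#define BPF_XADD	0xc0	/* mode: atomic add */
#define BPF_DIV		0x30	/* op: divide */
#define BPF_MOD		0x90	/* op: modulo */
#define BPF_K		0x00	/* source: immediate */
#define BPF_X		0x08	/* source: register */

static_assert((BPF_ALU64 | BPF_DIV | BPF_K) == 0x37, "opcode encoding");

/* Hypothetical helper: true for the opcodes the patch lists as not
 * handled by the ARM32 JIT (64-bit DIV/MOD in both the BPF_K and
 * BPF_X forms, and 32/64-bit XADD). */
static bool sketch_arm32_jit_unsupported(uint8_t code)
{
	if (BPF_CLASS(code) == BPF_ALU64)
		return BPF_OP(code) == BPF_DIV || BPF_OP(code) == BPF_MOD;
	if (BPF_CLASS(code) == BPF_STX && BPF_MODE(code) == BPF_XADD)
		return BPF_SIZE(code) == BPF_W || BPF_SIZE(code) == BPF_DW;
	return false;
}
```

Only the class, operation/mode and size fields matter for this list, which is why the BPF_K and BPF_X variants of each 64-bit DIV/MOD fall out of the same test.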

The implementation uses stack scratch space to emulate the 64-bit eBPF ISA
on 32-bit ARM, because ARM has too few general-purpose registers to hold
every eBPF register. Currently, only little-endian machines are supported
by this eBPF JIT compiler.
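Because each 64-bit eBPF register is held as two 32-bit halves (ARM registers or stack slots), even a plain 64-bit addition has to be split into two 32-bit additions linked by a carry. The sketch below models this in portable C. It assumes the {hi, lo} ordering that the patch's bpf2a32[] table appears to use (BPF_REG_0 maps to {ARM_R1, ARM_R0}, and the ARM procedure call standard returns the low word of a 64-bit value in r0). The struct and function names are hypothetical. On ARM the two steps would typically become an ADDS/ADC pair.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit eBPF register split into two 32-bit halves, in {hi, lo}
 * order. */
struct sketch_reg64 {
	uint32_t hi;
	uint32_t lo;
};

static_assert(sizeof(struct sketch_reg64) == 8, "two 32-bit halves");

/* dst += src as a 32-bit CPU must do it: add the low words, then add
 * the high words plus the carry out of the low-word addition. */
static void sketch_add64(struct sketch_reg64 *dst,
			 const struct sketch_reg64 *src)
{
	uint32_t lo = dst->lo + src->lo;	/* wraps modulo 2^32 */
	uint32_t carry = lo < dst->lo;		/* 1 if the low add wrapped */

	dst->hi = dst->hi + src->hi + carry;
	dst->lo = lo;
}

/* Rebuild the full 64-bit value from its two halves. */
static uint64_t sketch_join(const struct sketch_reg64 *r)
{
	return ((uint64_t)r->hi << 32) | r->lo;
}
```

For example, adding 1 to 0x1ffffffff wraps the low word to 0 and carries into the high word, giving 0x200000000.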

Tested on ARMv7 with QEMU by me (Shubham Bansal).

Testing results on ARMv7:

1) test_bpf: Summary: 341 PASSED, 0 FAILED, [312/333 JIT'ed]
2) test_tag: OK (40945 tests)
3) test_progs: Summary: 30 PASSED, 0 FAILED
4) test_lpm: OK
5) test_lru_map: OK

All of the above tests were run separately under each of the following configurations:

1) bpf_jit_enable=1
a) CONFIG_FRAME_POINTER enabled
b) CONFIG_FRAME_POINTER disabled
2) bpf_jit_enable=1 and bpf_jit_harden=2
a) CONFIG_FRAME_POINTER enabled
b) CONFIG_FRAME_POINTER disabled

See Documentation/networking/filter.txt for more information.

Signed-off-by: Shubham Bansal 
---
v2
- Solved many bugs for ARMv5 and ARMv6 support.

v3
- Implemented BPF_JMP | BPF_CALL.
- Changed tail call opcode to BPF_JMP | BPF_TAIL_CALL.
- Filled prog->jited_len value.
- Implemented BPF_JLT, BPF_JLE, BPF_JSLT, BPF_JSLE.
- Solved many bugs noticed by bpf selftests.

v4
- Proper endianness handling in Kconfig.
- Changed alignment from 16 bytes to 4 bytes.
- Set up ctx->stack_size for optimization.
- Removed unnecessary bpf_jit_free from the JIT compiler.
---
 arch/arm/Kconfig  |2 +-
 arch/arm/net/bpf_jit_32.c | 2448 ++---
 arch/arm/net/bpf_jit_32.h |  108 +-
 3 files changed, 1747 insertions(+), 811 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 61a0cb1..f1b3f1d 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -50,7 +50,7 @@ config ARM
select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
select HAVE_ARCH_TRACEHOOK
select HAVE_ARM_SMCCC if CPU_V7
-   select HAVE_CBPF_JIT
+   select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
select HAVE_CC_STACKPROTECTOR
select HAVE_CONTEXT_TRACKING
select HAVE_C_RECORDMCOUNT
diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index d5b9fa1..c10 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -1,6 +1,7 @@
 /*
- * Just-In-Time compiler for BPF filters on 32bit ARM
+ * Just-In-Time compiler for eBPF filters on 32bit ARM
  *
+ * Copyright (c) 2017 Shubham Bansal 
  * Copyright (c) 2011 Mircea Gherzan 
  *
  * This program is free software; you can redistribute it and/or modify it
@@ -8,6 +9,7 @@
  * Free Software Foundation; version 2 of the License.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -18,54 +20,101 @@
 #include 
 
 #include 
-#include 
 #include 
 #include 
 
 #include "bpf_jit_32.h"
 
+int bpf_jit_enable __read_mostly;
+
+#define STACK_OFFSET(k)	(k)
+#define TMP_REG_1  (MAX_BPF_JIT_REG + 0)   /* TEMP Register 1 */
+#define TMP_REG_2  (MAX_BPF_JIT_REG + 1)   /* TEMP Register 2 */
+#define TCALL_CNT  (MAX_BPF_JIT_REG + 2)   /* Tail Call Count */
+
+/* Flags used for JIT optimization */
+#define SEEN_CALL  (1 << 0)
+
+#define FLAG_IMM_OVERFLOW  (1 << 0)
+
 /*
- * ABI:
+ * Map eBPF registers to ARM 32bit registers or stack scratch space.
+ *
+ * 1. First argument is passed using the arm 32bit registers and rest of the
+ * arguments are passed on stack scratch space.
+ * 2. First callee-saved argument is mapped to arm 32 bit registers and the
+ * rest of the arguments are mapped to scratch space on stack.
+ * 3. We need two 64 bit temp registers to do complex operations on eBPF
+ * registers.
+ *
+ * As the eBPF registers are all 64 bit registers and arm has only 32 bit
+ * registers, we have to map each eBPF register to two arm 32 bit regs or
+ * scratch memory space and we have to build eBPF 64 bit register from those.
  *
- * r0  scratch register
- * r4  BPF register A
- * r5  BPF register X
- * r6  pointer to the skb
- * r7  skb->data
- * r8  skb_headlen(skb)
  */
+static const u8 bpf2a32[][2] = {
+   /* return value from in-kernel function, and exit value from eBPF */
+   [BPF_REG_0] = {ARM_R1, ARM_R0},
+   /* arguments from eBPF program to in-kernel function */
+   [BPF_REG_1] = {ARM_R3, ARM_R2},
+   /* Stored on stack scratch space */
+   [BPF_REG_2] = {STACK_OFFSET(0), STACK_OFFSET(4)},
+   [BPF_REG_3] = {STACK_OFFSET(8), STACK_OFFSET(12)},
+   [BPF_REG_4] = {STACK_OFFSET(16), STACK_OFFSET(20)},
+   [BPF_REG_5] = 
