Re: [PATCH 3/3] powerpc/bpf: Reallocate BPF registers to volatile registers when possible on PPC64

2022-03-02 Thread Naveen N. Rao

Christophe Leroy wrote:



On 27/07/2021 at 08:55, Jordan Niethe wrote:

Implement commit 40272035e1d0 ("powerpc/bpf: Reallocate BPF registers to
volatile registers when possible on PPC32") for PPC64.

[... quoted commit message body and before/after disassembly snipped; see the original patch at the end of the thread ...]


If this series is still applicable, it needs to be rebased on top of Naveen's 
series: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=286000


Thanks for bringing this up. My apologies - I missed copying you and 
Jordan on the new series.


I have included the first patch and a variant of the second patch from 
this series in the new series I posted. For patch 3/3, it might be 
simpler not to track temp register usage on ppc64.



Thanks,
Naveen



Re: [PATCH 3/3] powerpc/bpf: Reallocate BPF registers to volatile registers when possible on PPC64

2022-02-22 Thread Christophe Leroy




On 27/07/2021 at 08:55, Jordan Niethe wrote:

Implement commit 40272035e1d0 ("powerpc/bpf: Reallocate BPF registers to
volatile registers when possible on PPC32") for PPC64.

[... quoted commit message body and before/after disassembly snipped; see the original patch at the end of the thread ...]


If this series is still applicable, it needs to be rebased on top of Naveen's 
series: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=286000


Christophe



Re: [PATCH 3/3] powerpc/bpf: Reallocate BPF registers to volatile registers when possible on PPC64

2022-01-07 Thread Naveen N. Rao

Christophe Leroy wrote:



On 27/07/2021 at 08:55, Jordan Niethe wrote:

Implement commit 40272035e1d0 ("powerpc/bpf: Reallocate BPF registers to
volatile registers when possible on PPC32") for PPC64.

When the BPF routine doesn't call any function, the non-volatile
registers can be reallocated to volatile registers in order to avoid
having to save/restore them on the stack. To keep track of which
registers can be reallocated, make sure registers are marked as seen
when they are used.


Maybe you could do as on PPC32 and try to use r0 as much as possible 
instead of the TMP regs.
r0 needs to be used carefully because for some instructions (e.g. addi, lwz) r0 means the value 0 
instead of register 0, but it would help free one more register in several cases.


Yes, but I think the utility of register re-mapping is debatable on 
ppc64 since we are using NVRs only for BPF NVRs. Unlike the savings seen 
with the test case shown in the commit description (and with other test 
programs in test_bpf), most real-world BPF programs will be generated by 
LLVM, which will only use the NVRs if necessary. I also suspect that most 
BPF programs will end up making at least one helper call.


On ppc32 though, there is value in re-mapping registers, especially 
BPF_REG_AX and TMP_REG, and to a lesser extent, BPF_REG_5, since those 
are volatile BPF registers and can be remapped regardless of a helper 
call.



- Naveen



Re: [PATCH 3/3] powerpc/bpf: Reallocate BPF registers to volatile registers when possible on PPC64

2021-08-05 Thread Christophe Leroy




On 27/07/2021 at 08:55, Jordan Niethe wrote:

Implement commit 40272035e1d0 ("powerpc/bpf: Reallocate BPF registers to
volatile registers when possible on PPC32") for PPC64.

When the BPF routine doesn't call any function, the non-volatile
registers can be reallocated to volatile registers in order to avoid
having to save/restore them on the stack. To keep track of which
registers can be reallocated, make sure registers are marked as seen
when they are used.


Maybe you could do as on PPC32 and try to use r0 as much as possible 
instead of the TMP regs.
r0 needs to be used carefully because for some instructions (e.g. addi, lwz) r0 means the value 0 
instead of register 0, but it would help free one more register in several cases.
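
(For concreteness, here is a minimal standalone sketch of the r0 quirk.
The encoding macros below only mimic the style of the kernel's
ppc-opcode.h and are redefined so the example compiles on its own; this
is an illustration, not kernel code.)

#include <stdio.h>
#include <stdint.h>

/* D-form encoding: opcode | RT | RA | 16-bit immediate */
#define PPC_RT(r)		((uint32_t)(r) << 21)
#define PPC_RA(r)		((uint32_t)(r) << 16)
#define IMM_L(i)		((uint32_t)(i) & 0xffff)
#define PPC_ADDI(d, a, i)	(0x38000000u | PPC_RT(d) | PPC_RA(a) | IMM_L(i))

int main(void)
{
	/*
	 * "li r8,66" is really "addi r8,r0,66": with RA == 0 the CPU adds
	 * the literal value 0, not the contents of r0.  So r0 can only
	 * stand in for a temp register in instructions that never place
	 * it in an RA slot.
	 */
	printf("li   r8,66    -> 0x%08x\n", PPC_ADDI(8, 0, 66));
	/* with RA = r1, the register really is read */
	printf("addi r8,r1,66 -> 0x%08x\n", PPC_ADDI(8, 1, 66));
	return 0;
}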




[... before/after disassembly snipped; see the original patch at the end of the thread ...]
---
  arch/powerpc/net/bpf_jit64.h  |  2 ++
  arch/powerpc/net/bpf_jit_comp64.c | 60 +--
  2 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
index 89b625d9342b..e20521bf77bf 100644
--- a/arch/powerpc/net/bpf_jit64.h
+++ b/arch/powerpc/net/bpf_jit64.h
@@ -70,6 +70,7 @@ const int b2p[MAX_BPF_JIT_REG + 2] = {
   */
  #define PPC_BPF_LL(ctx, r, base, i) do {				      \
		if ((i) % 4) {						      \
+			bpf_set_seen_register(ctx, bpf_to_ppc(ctx, TMP_REG_2));\
			EMIT(PPC_RAW_LI(bpf_to_ppc(ctx, TMP_REG_2), (i)));   \
			EMIT(PPC_RAW_LDX(r, base,			      \
					 bpf_to_ppc(ctx, TMP_REG_2)));	      \
@@ -78,6 +79,7 @@ const int b2p[MAX_BPF_JIT_REG + 2] = {
	} while(0)
  #define PPC_BPF_STL(ctx, r, base, i) do {				      \
		if ((i) % 4) {						      \
+			bpf_set_seen_register(ctx, bpf_to_ppc(ctx, TMP_REG_2));\
			EMIT(PPC_RAW_LI(bpf_to_ppc(ctx, TMP_REG_2), (i)));   \
			EMIT(PPC_RAW_STDX(r, base,			      \
					  bpf_to_ppc(ctx, TMP_REG_2)));	      \
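
(An aside on the "(i) % 4" test above: ld and std are DS-form
instructions, whose 16-bit displacement field has its low two bits
reserved, so only 4-byte-aligned offsets can be encoded directly.
Anything else must be materialized in TMP_REG_2 and use the X-form
ldx/stdx, which is why the hunk has to mark TMP_REG_2 as seen.  A
trivial standalone sketch of that dispatch, not kernel code:)

#include <stdio.h>

int main(void)
{
	const int offsets[] = { -40, -32, -30, 8, 10 };
	unsigned int i;

	for (i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++) {
		int off = offsets[i];

		if (off % 4)	/* not encodable as a DS-form displacement */
			printf("%4d: li TMP_REG_2,%d; ldx/stdx (X-form)\n", off, off);
		else
			printf("%4d: ld/std r,%d(base)  (DS-form)\n", off, off);
	}
	return 0;
}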
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index f7a668c1e364..287e0322bbf3 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -66,6 +66,24 @@ static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
  
  void bpf_jit_realloc_regs(struct codegen_context *ctx)
  {
+   if (ctx->seen & SEEN_FUNC)
+   return;
+
+   while (ctx->seen & SEEN_NVREG_MASK &&
+  (ctx->seen & SEEN_VREG_MASK) != SEEN_VREG_MASK) {
+   int old = 32 - fls(ctx->seen & SEEN_NVREG_MASK);
+   int new = 32 - fls(~ctx->seen & SEEN_VREG_MASK);
+   int i;
+
+   for (i = BPF_REG_0; i <= TMP_REG_2; i++) {
+   if (ctx->b2p[i] != old)
+   continue;
+   ctx->b2p[i] = new;
+   bpf_set_seen_register(ctx, new);
+   bpf_clear_seen_register(ctx, old);
+   break;
+   }
+   }


This function is not very different from the one for PPC32. Maybe we could 
cook up a common function.
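
(For anyone puzzling over the "32 - fls()" idiom in the hunk above, a
minimal standalone sketch.  Assumptions: bit (31 - r) of ctx->seen
tracks GPR r, as bpf_set_seen_register() does, and the ppc32 mask
values are borrowed for concreteness; fls() is open-coded since the
kernel's <linux/bitops.h> is not available here.)

#include <stdio.h>

#define SEEN_BIT(r)	(1u << (31 - (r)))
#define SEEN_VREG_MASK	0x1ff80000u	/* volatile GPRs r3-r12 (ppc32 value) */
#define SEEN_NVREG_MASK	0x0003ffffu	/* non-volatile GPRs r14-r31 (ppc32 value) */

/* 1-based index of the most significant set bit, 0 if none */
static int fls(unsigned int x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

int main(void)
{
	/* r3 and r8 in use as volatile regs; r27 and r28 in use as NVRs */
	unsigned int seen = SEEN_BIT(3) | SEEN_BIT(8) | SEEN_BIT(27) | SEEN_BIT(28);

	/* highest set bit in the NV mask = lowest-numbered seen NVR */
	int old = 32 - fls(seen & SEEN_NVREG_MASK);
	/* highest clear bit in the V mask = lowest-numbered free volatile reg */
	int new = 32 - fls(~seen & SEEN_VREG_MASK);

	printf("remap r%d -> r%d\n", old, new);	/* prints: remap r27 -> r4 */
	return 0;
}

Repeating the loop then picks r28 -> r5, which matches the before/after
listings in the commit message.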



  }
  
  void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)

@@ -106,10 +124,9 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 * If we haven't created our own stack frame, we save these
 * in the protected zone below the previous stack frame
 */
-   for (i = BPF_REG_6; i <= BPF_REG_10; i++)
-   if (bpf_is_seen_register(ctx, bpf_to_ppc(ctx, i)))
-   PPC_BPF_STL(ctx, bpf_to_ppc(ctx, i), 1,
-			    bpf_jit_stack_offsetof(ctx, bpf_to_ppc(ctx, i)));
+   for (i = BPF_PPC_NVR_MIN; i <= 31; i++)
+   if (bpf_is_seen_register(ctx, i))
+			PPC_BPF_STL(ctx, i, 1, bpf_jit_stack_offsetof(ctx, i));

[PATCH 3/3] powerpc/bpf: Reallocate BPF registers to volatile registers when possible on PPC64

2021-07-27 Thread Jordan Niethe
Implement commit 40272035e1d0 ("powerpc/bpf: Reallocate BPF registers to
volatile registers when possible on PPC32") for PPC64.

When the BPF routine doesn't call any function, the non-volatile
registers can be reallocated to volatile registers in order to avoid
having to save/restore them on the stack. To keep track of which
registers can be reallocated, make sure registers are marked as seen
when they are used.

Before this patch, the test #359 ADD default X is:
   0:   nop
   4:   nop
   8:   std r27,-40(r1)
   c:   std r28,-32(r1)
  10:   xor r8,r8,r8
  14:   rotlwi  r8,r8,0
  18:   xor r28,r28,r28
  1c:   rotlwi  r28,r28,0
  20:   mr  r27,r3
  24:   li  r8,66
  28:   add r8,r8,r28
  2c:   rotlwi  r8,r8,0
  30:   ld  r27,-40(r1)
  34:   ld  r28,-32(r1)
  38:   mr  r3,r8
  3c:   blr

After this patch, the same test has become:
   0:   nop
   4:   nop
   8:   xor r8,r8,r8
   c:   rotlwi  r8,r8,0
  10:   xor r5,r5,r5
  14:   rotlwi  r5,r5,0
  18:   mr  r4,r3
  1c:   li  r8,66
  20:   add r8,r8,r5
  24:   rotlwi  r8,r8,0
  28:   mr  r3,r8
  2c:   blr

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/net/bpf_jit64.h  |  2 ++
 arch/powerpc/net/bpf_jit_comp64.c | 60 +--
 2 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
index 89b625d9342b..e20521bf77bf 100644
--- a/arch/powerpc/net/bpf_jit64.h
+++ b/arch/powerpc/net/bpf_jit64.h
@@ -70,6 +70,7 @@ const int b2p[MAX_BPF_JIT_REG + 2] = {
  */
 #define PPC_BPF_LL(ctx, r, base, i) do {				      \
		if ((i) % 4) {						      \
+			bpf_set_seen_register(ctx, bpf_to_ppc(ctx, TMP_REG_2));\
			EMIT(PPC_RAW_LI(bpf_to_ppc(ctx, TMP_REG_2), (i)));   \
			EMIT(PPC_RAW_LDX(r, base,			      \
					 bpf_to_ppc(ctx, TMP_REG_2)));	      \
@@ -78,6 +79,7 @@ const int b2p[MAX_BPF_JIT_REG + 2] = {
	} while(0)
 #define PPC_BPF_STL(ctx, r, base, i) do {				      \
		if ((i) % 4) {						      \
+			bpf_set_seen_register(ctx, bpf_to_ppc(ctx, TMP_REG_2));\
			EMIT(PPC_RAW_LI(bpf_to_ppc(ctx, TMP_REG_2), (i)));   \
			EMIT(PPC_RAW_STDX(r, base,			      \
					  bpf_to_ppc(ctx, TMP_REG_2)));	      \
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index f7a668c1e364..287e0322bbf3 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -66,6 +66,24 @@ static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
 
 void bpf_jit_realloc_regs(struct codegen_context *ctx)
 {
+   if (ctx->seen & SEEN_FUNC)
+   return;
+
+   while (ctx->seen & SEEN_NVREG_MASK &&
+  (ctx->seen & SEEN_VREG_MASK) != SEEN_VREG_MASK) {
+   int old = 32 - fls(ctx->seen & SEEN_NVREG_MASK);
+   int new = 32 - fls(~ctx->seen & SEEN_VREG_MASK);
+   int i;
+
+   for (i = BPF_REG_0; i <= TMP_REG_2; i++) {
+   if (ctx->b2p[i] != old)
+   continue;
+   ctx->b2p[i] = new;
+   bpf_set_seen_register(ctx, new);
+   bpf_clear_seen_register(ctx, old);
+   break;
+   }
+   }
 }
 
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
@@ -106,10 +124,9 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 * If we haven't created our own stack frame, we save these
 * in the protected zone below the previous stack frame
 */
-   for (i = BPF_REG_6; i <= BPF_REG_10; i++)
-   if (bpf_is_seen_register(ctx, bpf_to_ppc(ctx, i)))
-   PPC_BPF_STL(ctx, bpf_to_ppc(ctx, i), 1,
-			    bpf_jit_stack_offsetof(ctx, bpf_to_ppc(ctx, i)));
+   for (i = BPF_PPC_NVR_MIN; i <= 31; i++)
+   if (bpf_is_seen_register(ctx, i))
+   PPC_BPF_STL(ctx, i, 1, bpf_jit_stack_offsetof(ctx, i));
 
/* Setup frame pointer to point to the bpf stack area */
if (bpf_is_seen_register(ctx, bpf_to_ppc(ctx, BPF_REG_FP)))
@@ -122,10 +139,9 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx
int i;
 
/* Restore NVRs */
-   for (i = BPF_REG_6; i <= BPF_REG_10; i++)
-   if (bpf_is_seen_register(ctx, bpf_to_ppc(ctx, i)))