eddyz87 added a comment.

After retesting the kernel build with LLVM=1 and a libbpf patch
<https://github.com/eddyz87/bpf/tree/llvm-d133361-ctx> to reconstruct
btf_decl_tag [1], the statistics for BPF selftests look as follows:

- out of 653 object files, 13 differ between builds with and without this change;
- for 2 programs there is a small instruction count increase (+2 insn total);
- for 5 programs there is a small instruction count decrease (-6 insn total);
- 6 programs differ slightly, but the number of instructions is the same.

(The differences are insignificant; the rest of the comment can be skipped, as
it is probably of interest to no one but me.)

---

Differences for the first 5 programs were already described in this
<https://reviews.llvm.org/D133361#4467348> comment. The remaining differences
are described below.

netns_cookie_prog.bpf.o
-----------------------

Without ctx: 46 instructions
With ctx: 46 instructions

Instruction reordering:

   <get_netns_cookie_sk_msg>:
        r6 = r1
  -     r2 = *(u64 *)(r6 + 0x48)
        r1 = *(u32 *)(r6 + 0x10)
  -     if w1 != 0xa goto +0xb <LBB1_4>
  +     if w1 != 0xa goto +0xc <LBB1_4>
  +     r2 = *(u64 *)(r6 + 0x48)
        if r2 == 0x0 goto +0xa <LBB1_4>
        r1 = 0x0 ll
        r3 = 0x0

The difference is introduced by the "Machine code sinking" transformation.
Before the transformation, both the 0x48 and 0x10 loads reside in the same basic
block:

  ;; Old:
  bb.0.entry:
    ...
    %0:gpr = CORE_LD64 345, %2:gpr, @"llvm.sk_msg_md:0:72$0:10:0"
    %9:gpr32 = CORE_LD32 350, %2:gpr, @"llvm.sk_msg_md:0:16$0:2"
    JNE_ri_32 killed %9:gpr32, 10, %bb.3
  
  ;; New:
  bb.0.entry:
    ...
    %0:gpr = LDD %2:gpr, 72
    %3:gpr32 = LDW32 %2:gpr, 16
    JNE_ri_32 killed %3:gpr32, 10, %bb.3

Note: CORE pseudo-instructions are replaced by regular loads because
btf_decl_tag("ctx") takes priority over the preserve_access_index attribute.
The "Machine code sinking" transformation (MachineSink.cpp) can move `LDD` and
`LDW32` instructions, but can't move `CORE_LD*`, because `CORE_LD*` instructions
are marked as `MCID::UnmodeledSideEffects` in `BPFGenInstrInfo.inc` (this may be
worth adjusting):

  // called from MachineSinking::SinkInstruction
  bool MachineInstr::isSafeToMove(AAResults *AA, bool &SawStore) const {
    if (... hasUnmodeledSideEffects())
      return false;
    ...
  }



sock_destroy_prog.bpf.o
-----------------------

Without ctx: 102 instructions
With ctx: 101 instructions

In the following code fragment:

        if (ctx->protocol == IPPROTO_TCP)
                bpf_map_update_elem(&tcp_conn_sockets, &key, &sock_cookie, 0);
        else if (ctx->protocol == IPPROTO_UDP)
                bpf_map_update_elem(&udp_conn_sockets, &keyc, &sock_cookie, 0);
        else
                return 1;

The version w/o btf_decl_tag("ctx") keeps two loads of `ctx->protocol` because
of the llvm.bpf.passthrough call. The version with btf_decl_tag("ctx")
eliminates the second load via a combination of EarlyCSEPass, InstCombinePass
and SimplifyCFGPass.

socket_cookie_prog.bpf.o
------------------------

Without ctx: 66 instructions
With ctx: 66 instructions

For the following C code fragment:

  SEC("sockops")
  int update_cookie_sockops(struct bpf_sock_ops *ctx)
  {
        struct bpf_sock *sk = ctx->sk;
        struct socket_cookie *p;
  
        if (ctx->family != AF_INET6)
                return 1;
  
        if (ctx->op != BPF_SOCK_OPS_TCP_CONNECT_CB)
                return 1;
  
        if (!sk)
                return 1;
      ...
  }

Code with decl_tag("ctx") reorders the ctx->sk load relative to the
ctx->family and ctx->op loads:

  //  old                                       new
   <update_cookie_sockops>:                  <update_cookie_sockops>:
      r6 = r1                                   r6 = r1
  -   r2 = *(u64 *)(r6 + 0xb8)
      r1 = *(u32 *)(r6 + 0x14)                  r1 = *(u32 *)(r6 + 0x14)
      if w1 != 0xa goto +0x13 <LBB1_6>          if w1 != 0xa goto +0x14 <LBB1_6>
      r1 = *(u32 *)(r6 + 0x0)                   r1 = *(u32 *)(r6 + 0x0)
      if w1 != 0x3 goto +0x11 <LBB1_6>          if w1 != 0x3 goto +0x12 <LBB1_6>
                                            +   r2 = *(u64 *)(r6 + 0xb8)
      if r2 == 0x0 goto +0x10 <LBB1_6>          if r2 == 0x0 goto +0x10 <LBB1_6>
      r1 = 0x0 ll                               r1 = 0x0 ll
      r3 = 0x0                                  r3 = 0x0

Code w/o decl_tag("ctx") uses `CORE_LD*` instructions for these loads and does
not reorder them, for the same reason as in netns_cookie_prog.bpf.o.

test_lwt_reroute.bpf.o
----------------------

Without ctx: 18 instructions
With ctx: 17 instructions

The difference boils down to EarlyCSEPass being able to remove the last load in
the store/load pair:

  ; Before EarlyCSEPass
    store i32 %and, ptr %mark24, align 8
    %mark25 = getelementptr inbounds %struct.__sk_buff, ptr %skb, i32 0, i32 2
    %19 = load i32, ptr %mark25, align 8
    %cmp26 = icmp eq i32 %19, 0
  ; After EarlyCSEPass
    %and = and i32 %cond, 255
    store i32 %and, ptr %mark, align 8
    %cmp26 = icmp eq i32 %and, 0

It cannot do so when the get.element.and.{store,load} intrinsics are used,
which leads to slight codegen differences downstream.

test_sockmap_invalid_update.bpf.o
---------------------------------

Without ctx: 13 instructions
With ctx: 12 instructions

In the following C fragment:

        if (skops->sk)
                bpf_map_update_elem(&map, &key, skops->sk, 0);

Code with decl_tag("ctx") loads skops->sk only once. Code w/o decl_tag("ctx")
uses CO-RE relocations and loads it twice. As with sock_destroy_prog.bpf.o,
EarlyCSEPass does not consolidate identical `%x = call llvm.bpf.passthrough;
load %x` pairs.

type_cast.bpf.o
---------------

Without ctx: 96 instructions
With ctx: 96 instructions

`__builtin_memcpy(name, dev->name, IFNAMSIZ)` is unrolled in a different order. 
No idea why.

core_kern.bpf.o, test_verif_scale2.bpf.o
----------------------------------------

For both programs the number of instructions is unchanged (11249 and 12286,
respectively). Some instructions are ordered differently after DAG->DAG Pattern
Instruction Selection; instruction selection produces slightly different
results for CO-RE and non-CO-RE loads.


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D133361

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
