Help with the BPF verifier
tl;dr: I seem to be trying to defeat clang optimizations so that the verifier accepts my proggie.

Hi,

So I'm moving to use raw_syscalls:sys_exit to collect pointer contents, using maps to tell the BPF program what to copy, how many bytes, filters, etc. I'm at the start of it; at this point I need to use an index to get to the right syscall arg that is a filename, starting just with "open" and "openat", which have the filename in different args. To get this first part working I'm doing it directly in the BPF restricted-C program; later this will come from maps, etc.

If I set the index as a constant, just for testing, it works. Look at the "open" and "openat" calls below; later we'll see why "openat" fails to augment its "filename" arg while "open" works:

[root@seventh perf]# trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
        ? (         ): sleep/10152  ... [continued]: execve()) = 0
    0.045 ( 0.004 ms): sleep/10152 brk() = 0x55ccff356000
    0.074 ( 0.007 ms): sleep/10152 access(filename: , mode: R) = -1 ENOENT No such file or directory
    0.089 ( 0.006 ms): sleep/10152 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
    0.097 ( 0.003 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7ffecdd283f0) = 0
    0.103 ( 0.006 ms): sleep/10152 mmap(len: 103334, prot: READ, flags: PRIVATE, fd: 3) = 0x7f8ffee9c000
    0.111 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
    0.135 ( 0.007 ms): sleep/10152 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
    0.144 ( 0.003 ms): sleep/10152 read(fd: 3, buf: 0x7ffecdd285b8, count: 832) = 832
    0.150 ( 0.002 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7ffecdd28450) = 0
    0.155 ( 0.005 ms): sleep/10152 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7f8ffee9a000
    0.166 ( 0.007 ms): sleep/10152 mmap(len: 3889792, prot: EXEC|READ, flags: PRIVATE|DENYWRITE, fd: 3) = 0x7f8ffe8dc000
    0.175 ( 0.010 ms): sleep/10152 mprotect(start: 0x7f8ffea89000, len: 2093056) = 0
    0.188 ( 0.010 ms): sleep/10152 mmap(addr: 0x7f8ffec88000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 1753088) = 0x7f8ffec88000
    0.204 ( 0.005 ms): sleep/10152 mmap(addr: 0x7f8ffec8e000, len: 14976, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7f8ffec8e000
    0.218 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
    0.239 ( 0.002 ms): sleep/10152 arch_prctl(option: 4098, arg2: 140256433779968) = 0
    0.312 ( 0.009 ms): sleep/10152 mprotect(start: 0x7f8ffec88000, len: 16384, prot: READ) = 0
    0.343 ( 0.005 ms): sleep/10152 mprotect(start: 0x55ccff1c6000, len: 4096, prot: READ) = 0
    0.354 ( 0.006 ms): sleep/10152 mprotect(start: 0x7f8ffeeb6000, len: 4096, prot: READ) = 0
    0.362 ( 0.019 ms): sleep/10152 munmap(addr: 0x7f8ffee9c000, len: 103334) = 0
    0.476 ( 0.002 ms): sleep/10152 brk() = 0x55ccff356000
    0.480 ( 0.004 ms): sleep/10152 brk(brk: 0x55ccff377000) = 0x55ccff377000
    0.487 ( 0.002 ms): sleep/10152 brk() = 0x55ccff377000
    0.497 ( 0.008 ms): sleep/10152 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
    0.507 ( 0.002 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7f8ffec8daa0) = 0
    0.511 ( 0.006 ms): sleep/10152 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7f8ff7d0d000
    0.524 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
    0.574 (1000.140 ms): sleep/10152 nanosleep(rqtp: 0x7ffecdd29130) = 0
 1000.753 ( 0.007 ms): sleep/10152 close(fd: 1) = 0
 1000.767 ( 0.004 ms): sleep/10152 close(fd: 2) = 0
 1000.781 (         ): sleep/10152 exit_group()
[root@seventh perf]#

 1	// SPDX-License-Identifier: GPL-2.0
 2	/*
 3	 * Augment the raw_syscalls tracepoints with the contents of the pointer arguments.
 4	 *
 5	 * Test it with:
 6	 *
 7	 * perf trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c cat /etc/passwd > /dev/null
 8	 *
 9	 * This exactly matches what is marshalled into the raw_syscall:sys_enter
10	 * payload expected by the 'perf trace' beautifiers.
11	 *
12	 * For now it just uses the existing tracepoint augmentation code in 'perf
13	 * trace', in the next csets we'll hook up these with the sys_enter/sys_exit
14	 * code that will combine entry/exit in a strace like way.
15	 */
16	#include
17	#include
18	/* bpf-output associated map */
19	struct bpf_map SEC("maps") __augmented_syscalls__ = {
20		.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
21		.key_size = sizeof(int),
22		.value_size = sizeof(u32),
23		.max_entries = __NR_CPUS__,
24	};
25	struct syscall_enter_args {
26		unsigned long long common_tp_fields;
27		long		   syscall_nr;
28		unsigned long	   args[6];
29	};
30	struct syscall_exit_args {
31		unsigned long long common_tp_fields;
32
Re: Help with the BPF verifier
Em Thu, Nov 01, 2018 at 08:05:07PM +0000, Edward Cree escreveu:
> On 01/11/18 18:52, Arnaldo Carvalho de Melo wrote:
> > R0=inv(id=0) R1=inv6 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 15: (b7) r2 = 0
> > 16: (63) *(u32 *)(r10 -260) = r2
> > 17: (67) r1 <<= 32
> > 18: (77) r1 >>= 32
> > 19: (67) r1 <<= 3
> > 20: (bf) r2 = r6
> > 21: (0f) r2 += r1
> > 22: (79) r3 = *(u64 *)(r2 +16)
> > R2 invalid mem access 'inv'
> I wonder if you could run this with verifier log level 2?  (I'm not sure how
> you would go about plumbing that through the perf tooling.)  It seems very
> odd that it ends up with R2=inv, and I'm wondering whether R1 becomes unknown
> during the shifts or whether the addition in insn 21 somehow produces the
> unknown-ness.  (I know we used to have a thing[1] where doing ptr += K and
> then also having an offset in the LDX produced an error about
> ptr+const+const, but that seems to have been fixed at some point.)
>
> Note however that even if we get past this, R1 at this point holds 6, so it
> looks like the verifier is walking the impossible path where we're inside the
> 'if' even though filename_arg = 6.  This is a (slightly annoying) verifier
> limitation, that it walks paths with impossible combinations of constraints
> (we've previously had cases where assertions in the verifier would blow up
> because of this, e.g. registers with max_val less than min_val).  So if the
> check_ctx_access() is going to worry about whether you're off the end of the
> array (I'm not sure what your program type is and thus which is_valid_access
> callback is involved), then it'll complain about this.
> If filename_arg came from some external source you'd have a different
> problem, because then it would have a totally unknown value, that on entering
> the 'if' becomes "unknown but < 6", which is still too variable to have as
> the offset of a ctx access.  Those have to be at a known constant offset, so
> that we can determine the type of the returned value.
>
> As a way to fix this, how about [UNTESTED!]:
>
> 	const void *filename_arg = NULL;
> 	/* ... */
> 	switch (augmented_args.args.syscall_nr) {
> 	case SYS_OPEN:   filename_arg = args->args[0]; break;
> 	case SYS_OPENAT: filename_arg = args->args[1]; break;
> 	}
> 	/* ... */
> 	if (filename_arg) {
> 		/* stuff */
> 		blah = probe_read_str(/* ... */, filename_arg);
> 	} else {
> 		/* the other stuff */
> 	}
>
> That way, you're only ever dealing in constant pointers (although judging by
> an old thread I found[1] about ptr+const+const, the compiler might decide to
> make some optimisations that end up looking like your existing code).

Yeah, didn't work as well:

SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
	struct {
		struct syscall_enter_args args;
		struct augmented_filename filename;
	} augmented_args;
	unsigned int len = sizeof(augmented_args);
	const void *filename_arg = NULL;

	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);

	switch (augmented_args.args.syscall_nr) {
	case SYS_OPEN:	 filename_arg = (const void *)args->args[0]; break;
	case SYS_OPENAT: filename_arg = (const void *)args->args[1]; break;
	}

	if (filename_arg != NULL) {
		augmented_args.filename.reserved = 0;
		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
							      sizeof(augmented_args.filename.value),
							      filename_arg);
		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
			len &= sizeof(augmented_args.filename.value) - 1;
		}
	} else {
		len = sizeof(augmented_args.args);
	}

	perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
	return 0;
}

And the -vv in 'perf trace' didn't seem to map to further details in the output of the verifier debug:

# trace -vv -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /hom
Re: Help with the BPF verifier
On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> Yeah, didn't work as well:

> And the -vv in 'perf trace' didn't seem to map to further details in the
> output of the verifier debug:
Yeah for log_level 2 you probably need to make source-level changes to either
perf or libbpf (I think the latter).  It's annoying that essentially no tools
plumb through an option for that, someone should fix them ;-)

> libbpf: -- BEGIN DUMP LOG ---
> libbpf:
> 0: (bf) r6 = r1
> 1: (bf) r1 = r10
> 2: (07) r1 += -328
> 3: (b7) r7 = 64
> 4: (b7) r2 = 64
> 5: (bf) r3 = r6
> 6: (85) call bpf_probe_read#4
> 7: (79) r1 = *(u64 *)(r10 -320)
> 8: (15) if r1 == 0x101 goto pc+4
> R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> 9: (55) if r1 != 0x2 goto pc+22
> R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> 10: (bf) r1 = r6
> 11: (07) r1 += 16
> 12: (05) goto pc+2
> 15: (79) r3 = *(u64 *)(r1 +0)
> dereference of modified ctx ptr R1 off=16 disallowed
Aha, we at least got a different error message this time.
And indeed llvm has done that optimisation, rather than the more obvious

	11: r3 = *(u64 *)(r1 +16)

because it wants to have lots of reads share a single insn.  You may be able
to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
with llvm knowledge can figure out how to stop it (ideally, llvm would know
when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯
Alternatively, your prog looks short enough that maybe you could kick the C
habit and write it directly in eBPF asm, that way no-one is optimising things
behind your back.  (I realise this option won't appeal to everyone ;-)
The reason the verifier disallows this, iirc, is because it needs to be able
to rewrite the offsets on ctx accesses (see convert_ctx_accesses()) in case
underlying kernel struct doesn't match the layout of the ctx ABI.  To do this
it needs the ctx offset to live entirely in the insn doing the access,
otherwise different paths could lead to the same insn accessing different ctx
offsets with different fixups needed -- can't be done.
-Ed
Re: Help with the BPF verifier
On 11/2/18 8:42 AM, Edward Cree wrote:
> On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
>> Yeah, didn't work as well:
>
>> And the -vv in 'perf trace' didn't seem to map to further details in the
>> output of the verifier debug:
> Yeah for log_level 2 you probably need to make source-level changes to either
> perf or libbpf (I think the latter).  It's annoying that essentially no tools
> plumb through an option for that, someone should fix them ;-)
>
>> libbpf: -- BEGIN DUMP LOG ---
>> libbpf:
>> 0: (bf) r6 = r1
>> 1: (bf) r1 = r10
>> 2: (07) r1 += -328
>> 3: (b7) r7 = 64
>> 4: (b7) r2 = 64
>> 5: (bf) r3 = r6
>> 6: (85) call bpf_probe_read#4
>> 7: (79) r1 = *(u64 *)(r10 -320)
>> 8: (15) if r1 == 0x101 goto pc+4
>> R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
>> 9: (55) if r1 != 0x2 goto pc+22
>> R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
>> 10: (bf) r1 = r6
>> 11: (07) r1 += 16
>> 12: (05) goto pc+2
>> 15: (79) r3 = *(u64 *)(r1 +0)
>> dereference of modified ctx ptr R1 off=16 disallowed
> Aha, we at least got a different error message this time.
> And indeed llvm has done that optimisation, rather than the more obvious
> 	11: r3 = *(u64 *)(r1 +16)
> because it wants to have lots of reads share a single insn.  You may be able
> to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
> with llvm knowledge can figure out how to stop it (ideally, llvm would know
> when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯

The optimization mostly looks like below:

	br1:
	  ...
	  r1 += 16
	  goto merge
	br2:
	  ...
	  r1 += 20
	  goto merge
	merge:
	  *(u64 *)(r1 + 0)

The compiler tries to merge common loads. There is no easy way to
stop this compiler optimization without turning off a lot of other
optimizations. The easiest way is to add barriers

	__asm__ __volatile__("": : :"memory")

after the ctx memory access to prevent their down stream merging.
> Alternatively, your prog looks short enough that maybe you could kick the C
> habit and write it directly in eBPF asm, that way no-one is optimising things
> behind your back.  (I realise this option won't appeal to everyone ;-)

The LLVM supports BPF inline assembly as well. Some examples here:

	https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/BPF/inline_asm.ll

You may try it for selective ctx access to work around some compiler
optimizations. I personally have not used it yet and actually not sure
whether it actually works or not :-)

> The reason the verifier disallows this, iirc, is because it needs to be able
> to rewrite the offsets on ctx accesses (see convert_ctx_accesses()) in case
> underlying kernel struct doesn't match the layout of the ctx ABI.  To do this
> it needs the ctx offset to live entirely in the insn doing the access,
> otherwise different paths could lead to the same insn accessing different ctx
> offsets with different fixups needed -- can't be done.
>
> -Ed
Re: Help with the BPF verifier
Em Fri, Nov 02, 2018 at 03:42:49PM +0000, Edward Cree escreveu:
> On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> > Yeah, didn't work as well:
>
> > And the -vv in 'perf trace' didn't seem to map to further details in the
> > output of the verifier debug:
> Yeah for log_level 2 you probably need to make source-level changes to either
> perf or libbpf (I think the latter).  It's annoying that essentially no tools
> plumb through an option for that, someone should fix them ;-)
>
> > libbpf: -- BEGIN DUMP LOG ---
> > libbpf:
> > 0: (bf) r6 = r1
> > 1: (bf) r1 = r10
> > 2: (07) r1 += -328
> > 3: (b7) r7 = 64
> > 4: (b7) r2 = 64
> > 5: (bf) r3 = r6
> > 6: (85) call bpf_probe_read#4
> > 7: (79) r1 = *(u64 *)(r10 -320)
> > 8: (15) if r1 == 0x101 goto pc+4
> > R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 9: (55) if r1 != 0x2 goto pc+22
> > R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 10: (bf) r1 = r6
> > 11: (07) r1 += 16
> > 12: (05) goto pc+2
> > 15: (79) r3 = *(u64 *)(r1 +0)
> > dereference of modified ctx ptr R1 off=16 disallowed
> Aha, we at least got a different error message this time.
> And indeed llvm has done that optimisation, rather than the more obvious
> 	11: r3 = *(u64 *)(r1 +16)
> because it wants to have lots of reads share a single insn.  You may be able
> to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
> with llvm knowledge can figure out how to stop it (ideally, llvm would know
> when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯

set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300 -I/home/acme/lib/perf/include/bpf -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf -O2 -o -

So it is using -O2, let's try with -O0...

So I added this to my ~/.perfconfig, i.e. the default clang command line template with -O2 replaced by -O0:

[root@seventh perf]# cat ~/.perfconfig
[llvm]
	clang-bpf-cmd-template = "$CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O0 -o - $LLVM_OPTIONS_PIPE"
#	dump-obj = true
[root@seventh perf]#

And got an explosion:

# trace -vv -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/g
Re: Help with the BPF verifier
Em Sat, Nov 03, 2018 at 08:29:34AM -0300, Arnaldo Carvalho de Melo escreveu:
> PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
> Preprocessed source(s) and associated run script(s) are located at:
> clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.c
> clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.sh
> clang-7: note: diagnostic msg:
>
> ERROR:	unable to compile tools/perf/examples/bpf/augmented_raw_syscalls.c
> Hint:	Check error message shown above.
> Hint:	You can also pre-compile it into .o using:
> 		clang -target bpf -O2 -c tools/perf/examples/bpf/augmented_raw_syscalls.c
> 	with proper -I and -D options.
> event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
>                      \___ Failed to load tools/perf/examples/bpf/augmented_raw_syscalls.c from source: Error when compiling BPF scriptlet
>
> (add -v to see detail)
> Run 'perf list' for a list of valid events
>
>  Usage: perf trace [<options>] [<command>]
>     or: perf trace [<options>] -- <command> [<options>]
>     or: perf trace record [<options>] [<command>]
>     or: perf trace record [<options>] -- <command> [<options>]
>
>     -e, --event <event>	event/syscall selector. use 'perf list' to list available events
> [root@seventh perf]#
>
> Trying with -O1...

-O1 doesn't get clang confused, it's just the verifier that doesn't like the result, i.e. we're back to that optimization, which isn't disabled with -O1:

llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300 -I/home/acme/lib/perf/include/bpf -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf -O1 -o -
libbpf: loading object 'tools/perf/examples/bpf/augmented_raw_syscalls.c' from buffer
libbpf: section(1) .strtab, size 168, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) raw_syscalls:sys_enter, size 344, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_enter
libbpf: section(4) .relraw_syscalls:sys_enter, size 16, link 10, flags 0, type=9
libbpf: section(5) raw_syscalls:sys_exit, size 16, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_exit
libbpf: section(6) maps, size 56, link 0, flags 3, type=1
libbpf: section(7) license, size 4, link 0, flags 3, type=1
libbpf: license of tools/perf/examples/bpf/augmented_raw_syscalls.c is GPL
libbpf: section(8) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of tools/perf/examples/bpf/augmented_raw_syscalls.c is 41300
libbpf: section(9) .llvm_addrsig, size 6, link 10, flags 8000, type=1879002115
libbpf: skip section(9) .llvm_addrsig
libbpf: section(10) .symtab, size 240, link 1, flags 0, type=2
libbpf: maps in tools/perf/examples/bpf/augmented_raw_syscalls.c: 2 maps in 56 bytes
libbpf: map 0 is "__augmented_syscalls__"
libbpf: map 1 is "__bpf_stdout__"
libbpf: collecting relocating info for: 'raw_syscalls:sys_enter'
libbpf: relo for 4 value 28 name 124
libbpf: relocation: insn_idx=35
libbpf: relocation: find map 1 (__augmented_syscalls__) for insn 35
Added extra kernel map __entry_SYSCALL_64_trampoline fe006000-fe007000
Added extra kernel map __entry_SYSCALL_64_trampoline fe032000-fe033000
Added extra kernel map __entry_SYSCALL_64_trampoline fe05e000-fe05f000
Added extra kernel map __entry_SYSCALL_64_trampoline fe08a000-fe08b000
bpf: config program 'raw_syscalls:sys_enter'
bpf: config program 'raw_syscalls:sys_exit'
libbpf: create map __bpf_stdout__: fd=3
libbpf: create map __augmented_syscalls__: fd=4
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -328
3: (b7) r7 = 64
4: (b7) r2 = 64
5: (bf) r3 = r6
6: (85) call bpf_probe_read#4
7: (79) r1 = *(u64 *)(r10 -320)
8: (15) if r1 == 0x101 goto pc+4
R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
9: (55) if r1 != 0x2 goto pc+22
R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
10: (bf) r1 = r6
11: (07) r1 += 16
12: (05) goto pc+2
15: (79) r3 = *(u64 *)(r1 +0)
dereference of modified ctx ptr R1 off=16 disallowed
libbpf: -- END LOG --
libbpf: failed to load program 'raw_syscalls:sys_enter'
libbpf: failed to load object 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
bpf: load objects failed: err=-4007: (Kernel verifier blocks program l
Re: Help with the BPF verifier
Em Fri, Nov 02, 2018 at 09:27:52PM +0000, Yonghong Song escreveu:
> On 11/2/18 8:42 AM, Edward Cree wrote:
> > On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> >> Yeah, didn't work as well:
> >
> >> And the -vv in 'perf trace' didn't seem to map to further details in the
> >> output of the verifier debug:
> > Yeah for log_level 2 you probably need to make source-level changes to either
> > perf or libbpf (I think the latter).  It's annoying that essentially no tools
> > plumb through an option for that, someone should fix them ;-)
> >
> >> libbpf: -- BEGIN DUMP LOG ---
> >> libbpf:
> >> 0: (bf) r6 = r1
> >> 1: (bf) r1 = r10
> >> 2: (07) r1 += -328
> >> 3: (b7) r7 = 64
> >> 4: (b7) r2 = 64
> >> 5: (bf) r3 = r6
> >> 6: (85) call bpf_probe_read#4
> >> 7: (79) r1 = *(u64 *)(r10 -320)
> >> 8: (15) if r1 == 0x101 goto pc+4
> >> R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> >> 9: (55) if r1 != 0x2 goto pc+22
> >> R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> >> 10: (bf) r1 = r6
> >> 11: (07) r1 += 16
> >> 12: (05) goto pc+2
> >> 15: (79) r3 = *(u64 *)(r1 +0)
> >> dereference of modified ctx ptr R1 off=16 disallowed
> > Aha, we at least got a different error message this time.
> > And indeed llvm has done that optimisation, rather than the more obvious
> > 	11: r3 = *(u64 *)(r1 +16)
> > because it wants to have lots of reads share a single insn.  You may be able
> > to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
> > with llvm knowledge can figure out how to stop it (ideally, llvm would know
> > when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯
>
> The optimization mostly looks like below:
> 	br1:
> 	  ...
> 	  r1 += 16
> 	  goto merge
> 	br2:
> 	  ...
> 	  r1 += 20
> 	  goto merge
> 	merge:
> 	  *(u64 *)(r1 + 0)
>
> The compiler tries to merge common loads. There is no easy way to
> stop this compiler optimization without turning off a lot of other
> optimizations.
> The easiest way is to add barriers
> 	__asm__ __volatile__("": : :"memory")
> after the ctx memory access to prevent their down stream merging.

Great, this made it work:

cat /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c

#include
#include

/* bpf-output associated map */
struct bpf_map SEC("maps") __augmented_syscalls__ = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(u32),
	.max_entries = __NR_CPUS__,
};

struct syscall_enter_args {
	unsigned long long common_tp_fields;
	long		   syscall_nr;
	unsigned long	   args[6];
};

struct syscall_exit_args {
	unsigned long long common_tp_fields;
	long		   syscall_nr;
	long		   ret;
};

struct augmented_filename {
	unsigned int	size;
	int		reserved;
	char		value[256];
};

#define SYS_OPEN   2
#define SYS_OPENAT 257

SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
	struct {
		struct syscall_enter_args args;
		struct augmented_filename filename;
	} augmented_args;
	unsigned int len = sizeof(augmented_args);
	const void *filename_arg = NULL;

	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);

	switch (augmented_args.args.syscall_nr) {
	case SYS_OPEN:
		filename_arg = (const void *)args->args[0];
		__asm__ __volatile__("": : :"memory");
		break;
	case SYS_OPENAT:
		filename_arg = (const void *)args->args[1];
		break;
	default:
		return 0;
	}

	if (filename_arg != NULL) {
		augmented_args.filename.reserved = 0;
		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
							      sizeof(augmented_args.filename.value),
							      filename_arg);
		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
			len &= sizeof(augmented_args.filename.value) - 1;
		}
	} else {
		len = sizeof(augmented_args.args);
	}

	perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
	return 0;
}

SEC("raw_syscalls:sys_exit")
int sys_exit(struct syscall_exit_args *args)
{
	struct syscall_exit_args augmented_args;

	probe_read(&augmented_args, sizeof(augmented_args), args);
	return augmented_args.syscall_nr == SYS_OPEN ||
	       augmented_args.syscall_nr == SYS_OPENAT;
}

license(GPL);
Re: Help with the BPF verifier
From: Arnaldo Carvalho de Melo
Date: Thu, 1 Nov 2018 15:52:17 -0300

> 50	unsigned int filename_arg = 6;
 ...
> --- /wb/augmented_raw_syscalls.c.old	2018-11-01 15:43:55.000394234 -0300
> +++ /wb/augmented_raw_syscalls.c	2018-11-01 15:44:15.102367838 -0300
> @@ -67,7 +67,7 @@
>  		augmented_args.filename.reserved = 0;
>  		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
>  							      sizeof(augmented_args.filename.value),
> -							      (const void *)args->args[0]);
> +							      (const void *)args->args[filename_arg]);

args[] is sized to '6', therefore the last valid index is '5', yet you're
using '6' here which is one entry past the end of the declared array.
Re: Help with the BPF verifier
Em Thu, Nov 01, 2018 at 12:10:39PM -0700, David Miller escreveu:
> From: Arnaldo Carvalho de Melo
> Date: Thu, 1 Nov 2018 15:52:17 -0300
>
> > 50	unsigned int filename_arg = 6;
> ...
> > --- /wb/augmented_raw_syscalls.c.old	2018-11-01 15:43:55.000394234 -0300
> > +++ /wb/augmented_raw_syscalls.c	2018-11-01 15:44:15.102367838 -0300
> > @@ -67,7 +67,7 @@
> >  		augmented_args.filename.reserved = 0;
> >  		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
> >  							      sizeof(augmented_args.filename.value),
> > -							      (const void *)args->args[0]);
> > +							      (const void *)args->args[filename_arg]);
>
> args[] is sized to '6', therefore the last valid index is '5', yet you're
> using '6' here which is one entry past the end of the declared array.

Nope... this is inside an if:

	if (filename_arg <= 5) {
		augmented_args.filename.reserved = 0;
		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
							      sizeof(augmented_args.filename.value),
							      (const void *)args->args[filename_arg]);
		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
			len &= sizeof(augmented_args.filename.value) - 1;
		}
	} else {

I use 6 to mean "hey, this syscall doesn't have any string argument, don't
bother with it".

- Arnaldo
Re: Help with the BPF verifier
From: Arnaldo Carvalho de Melo
Date: Thu, 1 Nov 2018 16:13:10 -0300

> Nope... this is inside an if:
>
> 	if (filename_arg <= 5) {
> 		augmented_args.filename.reserved = 0;
> 		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
> 							      sizeof(augmented_args.filename.value),
> 							      (const void *)args->args[filename_arg]);
> 		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
> 			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
> 			len &= sizeof(augmented_args.filename.value) - 1;
> 		}
> 	} else {
>
> I use 6 to mean "hey, this syscall doesn't have any string argument, don't
> bother with it".

Really weird. And it's unsigned so I can't imagine it wants you to check
that it's >= 0...

Maybe Daniel or someone else can figure it out.
Re: Help with the BPF verifier
On 01/11/18 18:52, Arnaldo Carvalho de Melo wrote:
> R0=inv(id=0) R1=inv6 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> 15: (b7) r2 = 0
> 16: (63) *(u32 *)(r10 -260) = r2
> 17: (67) r1 <<= 32
> 18: (77) r1 >>= 32
> 19: (67) r1 <<= 3
> 20: (bf) r2 = r6
> 21: (0f) r2 += r1
> 22: (79) r3 = *(u64 *)(r2 +16)
> R2 invalid mem access 'inv'

I wonder if you could run this with verifier log level 2?  (I'm not sure how
you would go about plumbing that through the perf tooling.)  It seems very
odd that it ends up with R2=inv, and I'm wondering whether R1 becomes unknown
during the shifts or whether the addition in insn 21 somehow produces the
unknown-ness.  (I know we used to have a thing[1] where doing ptr += K and
then also having an offset in the LDX produced an error about
ptr+const+const, but that seems to have been fixed at some point.)

Note however that even if we get past this, R1 at this point holds 6, so it
looks like the verifier is walking the impossible path where we're inside the
'if' even though filename_arg = 6.  This is a (slightly annoying) verifier
limitation, that it walks paths with impossible combinations of constraints
(we've previously had cases where assertions in the verifier would blow up
because of this, e.g. registers with max_val less than min_val).  So if the
check_ctx_access() is going to worry about whether you're off the end of the
array (I'm not sure what your program type is and thus which is_valid_access
callback is involved), then it'll complain about this.

If filename_arg came from some external source you'd have a different
problem, because then it would have a totally unknown value, that on entering
the 'if' becomes "unknown but < 6", which is still too variable to have as
the offset of a ctx access.  Those have to be at a known constant offset, so
that we can determine the type of the returned value.

As a way to fix this, how about [UNTESTED!]:

	const void *filename_arg = NULL;
	/* ... */
	switch (augmented_args.args.syscall_nr) {
	case SYS_OPEN:	 filename_arg = args->args[0]; break;
	case SYS_OPENAT: filename_arg = args->args[1]; break;
	}
	/* ... */
	if (filename_arg) {
		/* stuff */
		blah = probe_read_str(/* ... */, filename_arg);
	} else {
		/* the other stuff */
	}

That way, you're only ever dealing in constant pointers (although judging by
an old thread I found[1] about ptr+const+const, the compiler might decide to
make some optimisations that end up looking like your existing code).

As for what you want to do with the index coming from userspace, the verifier
will not like that at all, as mentioned above, so I think you'll need to do
something like:

	switch (filename_arg_from_userspace) {
	case 0: filename_arg = args->args[0]; break;
	case 1: filename_arg = args->args[1]; break;
	/* etc */
	default: filename_arg = NULL;
	}

thus ensuring that you only ever have ctx pointers with constant offsets.

-Ed

[1]: https://lists.iovisor.org/g/iovisor-dev/topic/21386327#1302