Help with the BPF verifier

2018-11-01 Thread Arnaldo Carvalho de Melo
tl;dr: I seem to be fighting clang optimizations that keep the
   verifier from accepting my proggie.

Hi,

So I'm moving to use raw_syscalls:sys_exit to collect pointer
contents, using maps to tell the bpf program what to copy, how many
bytes, filters, etc.

I'm at the start of it; at this point I need to use an index to
get to the right syscall arg that is a filename, starting just with
"open" and "openat", which have the filename in different args. To get
this first part working I'm doing it directly in the bpf restricted C
program; later this will come from maps, etc. If I set the index as a
constant, just for testing, it works: look at the "open" and "openat"
calls below (and at the sketch right after the trace output), later we'll
see why openat is failing to augment its "filename" arg while "open" works:

[root@seventh perf]# trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
 ? ( ): sleep/10152  ... [continued]: execve()) = 0
 0.045 ( 0.004 ms): sleep/10152 brk() = 0x55ccff356000
 0.074 ( 0.007 ms): sleep/10152 access(filename: , mode: R) = -1 ENOENT No such file or directory
 0.089 ( 0.006 ms): sleep/10152 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
 0.097 ( 0.003 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7ffecdd283f0) = 0
 0.103 ( 0.006 ms): sleep/10152 mmap(len: 103334, prot: READ, flags: PRIVATE, fd: 3) = 0x7f8ffee9c000
 0.111 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
 0.135 ( 0.007 ms): sleep/10152 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
 0.144 ( 0.003 ms): sleep/10152 read(fd: 3, buf: 0x7ffecdd285b8, count: 832) = 832
 0.150 ( 0.002 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7ffecdd28450) = 0
 0.155 ( 0.005 ms): sleep/10152 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7f8ffee9a000
 0.166 ( 0.007 ms): sleep/10152 mmap(len: 3889792, prot: EXEC|READ, flags: PRIVATE|DENYWRITE, fd: 3) = 0x7f8ffe8dc000
 0.175 ( 0.010 ms): sleep/10152 mprotect(start: 0x7f8ffea89000, len: 2093056) = 0
 0.188 ( 0.010 ms): sleep/10152 mmap(addr: 0x7f8ffec88000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 1753088) = 0x7f8ffec88000
 0.204 ( 0.005 ms): sleep/10152 mmap(addr: 0x7f8ffec8e000, len: 14976, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7f8ffec8e000
 0.218 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
 0.239 ( 0.002 ms): sleep/10152 arch_prctl(option: 4098, arg2: 140256433779968) = 0
 0.312 ( 0.009 ms): sleep/10152 mprotect(start: 0x7f8ffec88000, len: 16384, prot: READ) = 0
 0.343 ( 0.005 ms): sleep/10152 mprotect(start: 0x55ccff1c6000, len: 4096, prot: READ) = 0
 0.354 ( 0.006 ms): sleep/10152 mprotect(start: 0x7f8ffeeb6000, len: 4096, prot: READ) = 0
 0.362 ( 0.019 ms): sleep/10152 munmap(addr: 0x7f8ffee9c000, len: 103334) = 0
 0.476 ( 0.002 ms): sleep/10152 brk() = 0x55ccff356000
 0.480 ( 0.004 ms): sleep/10152 brk(brk: 0x55ccff377000) = 0x55ccff377000
 0.487 ( 0.002 ms): sleep/10152 brk() = 0x55ccff377000
 0.497 ( 0.008 ms): sleep/10152 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
 0.507 ( 0.002 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7f8ffec8daa0) = 0
 0.511 ( 0.006 ms): sleep/10152 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7f8ff7d0d000
 0.524 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
 0.574 (1000.140 ms): sleep/10152 nanosleep(rqtp: 0x7ffecdd29130) = 0
  1000.753 ( 0.007 ms): sleep/10152 close(fd: 1) = 0
  1000.767 ( 0.004 ms): sleep/10152 close(fd: 2) = 0
  1000.781 ( ): sleep/10152 exit_group()
[root@seventh perf]# 
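
In a nutshell, the difference is this (a minimal sketch, not the real
program: struct layout and SEC() as in the listing below, and the
works/fails comments are the point):

struct syscall_enter_args {
	unsigned long long common_tp_fields;
	long		   syscall_nr;
	unsigned long	   args[6];
};

SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
	const void *filename = NULL;

	/* works: constant index, so the ctx offset is known at verification time */
	filename = (const void *)args->args[0];

	/* fails: variable index (eventually to come from a map), i.e. a ctx
	 * access at a non-constant offset; 6 is my "this syscall has no
	 * string arg" value, see downthread */
	unsigned int filename_arg = 6;

	if (filename_arg <= 5)
		filename = (const void *)args->args[filename_arg];

	return filename != NULL;
}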

 1  // SPDX-License-Identifier: GPL-2.0
 2  /*
 3   * Augment the raw_syscalls tracepoints with the contents of the pointer arguments.
 4   *
 5   * Test it with:
 6   *
 7   * perf trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c cat /etc/passwd > /dev/null
 8   *
 9   * This exactly matches what is marshalled into the raw_syscall:sys_enter
10   * payload expected by the 'perf trace' beautifiers.
11   *
12   * For now it just uses the existing tracepoint augmentation code in 'perf
13   * trace', in the next csets we'll hook up these with the sys_enter/sys_exit
14   * code that will combine entry/exit in a strace like way.
15   */

16  #include 
17  #include 

18  /* bpf-output associated map */
19  struct bpf_map SEC("maps") __augmented_syscalls__ = {
20  	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
21  	.key_size = sizeof(int),
22  	.value_size = sizeof(u32),
23  	.max_entries = __NR_CPUS__,
24  };

25  struct syscall_enter_args {
26  	unsigned long long common_tp_fields;
27  	long		   syscall_nr;
28  	unsigned long	   args[6];
29  };

30  struct syscall_exit_args {
31  	unsigned long long common_tp_fields;
32  

Re: Help with the BPF verifier

2018-11-02 Thread Arnaldo Carvalho de Melo
Em Thu, Nov 01, 2018 at 08:05:07PM +, Edward Cree escreveu:
> On 01/11/18 18:52, Arnaldo Carvalho de Melo wrote:
> >  R0=inv(id=0) R1=inv6 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 15: (b7) r2 = 0
> > 16: (63) *(u32 *)(r10 -260) = r2
> > 17: (67) r1 <<= 32
> > 18: (77) r1 >>= 32
> > 19: (67) r1 <<= 3
> > 20: (bf) r2 = r6
> > 21: (0f) r2 += r1
> > 22: (79) r3 = *(u64 *)(r2 +16)
> > R2 invalid mem access 'inv'
> I wonder if you could run this with verifier log level 2?  (I'm not sure how
>  you would go about plumbing that through the perf tooling.)  It seems very
>  odd that it ends up with R2=inv, and I'm wondering whether R1 becomes unknown
>  during the shifts or whether the addition in insn 21 somehow produces the
>  unknown-ness.  (I know we used to have a thing[1] where doing ptr += K and
>  then also having an offset in the LDX produced an error about
>  ptr+const+const, but that seems to have been fixed at some point.)
> 
> Note however that even if we get past this, R1 at this point holds 6, so it
>  looks like the verifier is walking the impossible path where we're inside the
>  'if' even though filename_arg = 6.  This is a (slightly annoying) verifier
>  limitation, that it walks paths with impossible combinations of constraints
>  (we've previously had cases where assertions in the verifier would blow up
>  because of this, e.g. registers with max_val less than min_val).  So if the
>  check_ctx_access() is going to worry about whether you're off the end of the
>  array (I'm not sure what your program type is and thus which is_valid_access
>  callback is involved), then it'll complain about this.
> If filename_arg came from some external source you'd have a different
>  problem, because then it would have a totally unknown value, that on entering
>  the 'if' becomes "unknown but < 6", which is still too variable to have as
>  the offset of a ctx access.  Those have to be at a known constant offset, so
>  that we can determine the type of the returned value.
> 
> As a way to fix this, how about [UNTESTED!]:
>     const void *filename_arg = NULL;
>     /* ... */
>     switch (augmented_args.args.syscall_nr) {
>         case SYS_OPEN: filename_arg = args->args[0]; break;
>         case SYS_OPENAT: filename_arg = args->args[1]; break;
>     }
>     /* ... */
>     if (filename_arg) {
>         /* stuff */
>         blah = probe_read_str(/* ... */, filename_arg);
>     } else {
>         /* the other stuff */
>     }
> That way, you're only ever dealing in constant pointers (although judging by
>  an old thread I found[1] about ptr+const+const, the compiler might decide to
>  make some optimisations that end up looking like your existing code).

Yeah, that didn't work either:

SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
struct {
struct syscall_enter_args args;
struct augmented_filename filename;
} augmented_args;
unsigned int len = sizeof(augmented_args);
const void *filename_arg = NULL;

probe_read(&augmented_args.args, sizeof(augmented_args.args), args);

switch (augmented_args.args.syscall_nr) {
case SYS_OPEN:   filename_arg = (const void *)args->args[0]; break;
case SYS_OPENAT: filename_arg = (const void *)args->args[1]; break;
}

if (filename_arg != NULL) {
augmented_args.filename.reserved = 0;
augmented_args.filename.size = 
probe_read_str(&augmented_args.filename.value,
  
sizeof(augmented_args.filename.value),
  filename_arg);
if (augmented_args.filename.size < 
sizeof(augmented_args.filename.value)) {
len -= sizeof(augmented_args.filename.value) - 
augmented_args.filename.size;
len &= sizeof(augmented_args.filename.value) - 1;
}
} else {
len = sizeof(augmented_args.args);
}

perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, 
&augmented_args, len);
return 0;
}

And the -vv in 'perf trace' didn't seem to map to further details in the
output of the verifier debug:

# trace -vv -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to  -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/hom

Re: Help with the BPF verifier

2018-11-02 Thread Edward Cree
On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> Yeah, that didn't work either: 

> And the -vv in 'perf trace' didn't seem to map to further details in the
> output of the verifier debug:
Yeah for log_level 2 you probably need to make source-level changes to either
 perf or libbpf (I think the latter).  It's annoying that essentially no tools
 plumb through an option for that, someone should fix them ;-)
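
(For reference, the BPF_PROG_LOAD command does take the log level
directly in bpf_attr, so a standalone loader can ask for log_level 2
itself; a minimal sketch, with the prog_type and buffer size merely
illustrative:)

#include <linux/bpf.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static char vlog[1 << 20];	/* level 2 is verbose, use a big buffer */

/* load insns with verifier log_level 2 and dump the log; error handling elided */
static int prog_load_log2(const struct bpf_insn *insns, unsigned int insn_cnt)
{
	union bpf_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_TRACEPOINT;
	attr.insns     = (unsigned long)insns;
	attr.insn_cnt  = insn_cnt;
	attr.license   = (unsigned long)"GPL";
	attr.log_buf   = (unsigned long)vlog;
	attr.log_size  = sizeof(vlog);
	attr.log_level = 2;	/* 2 dumps the verifier state after each insn */

	fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
	fprintf(stderr, "%s\n", vlog);
	return fd;
}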

> libbpf: -- BEGIN DUMP LOG ---
> libbpf: 
> 0: (bf) r6 = r1
> 1: (bf) r1 = r10
> 2: (07) r1 += -328
> 3: (b7) r7 = 64
> 4: (b7) r2 = 64
> 5: (bf) r3 = r6
> 6: (85) call bpf_probe_read#4
> 7: (79) r1 = *(u64 *)(r10 -320)
> 8: (15) if r1 == 0x101 goto pc+4
>  R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> 9: (55) if r1 != 0x2 goto pc+22
>  R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> 10: (bf) r1 = r6
> 11: (07) r1 += 16
> 12: (05) goto pc+2
> 15: (79) r3 = *(u64 *)(r1 +0)
> dereference of modified ctx ptr R1 off=16 disallowed
Aha, we at least got a different error message this time.
And indeed llvm has done that optimisation, rather than the more obvious
11: r3 = *(u64 *)(r1 +16)
 because it wants to have lots of reads share a single insn.  You may be able
 to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
 with llvm knowledge can figure out how to stop it (ideally, llvm would know
 when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯
Alternatively, your prog looks short enough that maybe you could kick the C
 habit and write it directly in eBPF asm, that way no-one is optimising things
 behind your back.  (I realise this option won't appeal to everyone ;-)
The reason the verifier disallows this, iirc, is because it needs to be able
 to rewrite the offsets on ctx accesses (see convert_ctx_accesses()) in case
 underlying kernel struct doesn't match the layout of the ctx ABI.  To do this
 it needs the ctx offset to live entirely in the insn doing the access,
 otherwise different paths could lead to the same insn accessing different ctx
 offsets with different fixups needed — can't be done.

-Ed


Re: Help with the BPF verifier

2018-11-02 Thread Yonghong Song


On 11/2/18 8:42 AM, Edward Cree wrote:
> On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
>> Yeah, that didn't work either:
> 
>> And the -vv in 'perf trace' didn't seem to map to further details in the
>> output of the verifier debug:
> Yeah for log_level 2 you probably need to make source-level changes to either
>   perf or libbpf (I think the latter).  It's annoying that essentially no tools
>   plumb through an option for that, someone should fix them ;-)
> 
>> libbpf: -- BEGIN DUMP LOG ---
>> libbpf:
>> 0: (bf) r6 = r1
>> 1: (bf) r1 = r10
>> 2: (07) r1 += -328
>> 3: (b7) r7 = 64
>> 4: (b7) r2 = 64
>> 5: (bf) r3 = r6
>> 6: (85) call bpf_probe_read#4
>> 7: (79) r1 = *(u64 *)(r10 -320)
>> 8: (15) if r1 == 0x101 goto pc+4
>>   R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
>> 9: (55) if r1 != 0x2 goto pc+22
>>   R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
>> 10: (bf) r1 = r6
>> 11: (07) r1 += 16
>> 12: (05) goto pc+2
>> 15: (79) r3 = *(u64 *)(r1 +0)
>> dereference of modified ctx ptr R1 off=16 disallowed
> Aha, we at least got a different error message this time.
> And indeed llvm has done that optimisation, rather than the more obvious
> 11: r3 = *(u64 *)(r1 +16)
>   because it wants to have lots of reads share a single insn.  You may be able
>   to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
>   with llvm knowledge can figure out how to stop it (ideally, llvm would know
>   when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯

The optimization mostly looks like this:
br1:
  ...
  r1 += 16
  goto merge
br2:
  ...
  r1 += 20
  goto merge
merge:
  *(u64 *)(r1 + 0)

The compiler tries to merge common loads. There is no easy way to
stop this compiler optimization without turning off a lot of other
optimizations. The easiest way is to add barriers
__asm__ __volatile__("": : :"memory")
after the ctx memory accesses to prevent their downstream merging.
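
In the program above that looks like (this is the shape the working
version further down in the thread ends up with):

	switch (augmented_args.args.syscall_nr) {
	case SYS_OPEN:
		filename_arg = (const void *)args->args[0];
		/* barrier: stop llvm merging this ctx load with the openat one */
		__asm__ __volatile__("": : :"memory");
		break;
	case SYS_OPENAT:
		filename_arg = (const void *)args->args[1];
		break;
	}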

> Alternatively, your prog looks short enough that maybe you could kick the C
>   habit and write it directly in eBPF asm, that way no-one is optimising things
>   behind your back.  (I realise this option won't appeal to everyone ;-)

LLVM supports BPF inline assembly as well. Some examples are here:
https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/BPF/inline_asm.ll
You may try it for selective ctx accesses to work around some
compiler optimizations. I personally have not used it yet and am
actually not sure whether it works or not :-)
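
(A guess at what that could look like from C, reusing the
syscall_enter_args layout from earlier in the thread; completely
untested, as said, and it assumes clang's BPF inline-asm register
constraints actually work:)

SEC("raw_syscalls:sys_enter")
int sys_enter_asm(struct syscall_enter_args *args)
{
	unsigned long arg0;

	/* UNTESTED: a single load insn with the ctx offset baked in, so no
	 * pointer arithmetic survives for the verifier to trip over;
	 * 16 == offsetof(struct syscall_enter_args, args[0]) */
	asm volatile("%0 = *(u64 *)(%1 + 16)"
		     : "=r" (arg0)
		     : "r" (args));

	return arg0 != 0;
}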

> The reason the verifier disallows this, iirc, is because it needs to be able
>   to rewrite the offsets on ctx accesses (see convert_ctx_accesses()) in case
>   underlying kernel struct doesn't match the layout of the ctx ABI.  To do this
>   it needs the ctx offset to live entirely in the insn doing the access,
>   otherwise different paths could lead to the same insn accessing different ctx
>   offsets with different fixups needed — can't be done.
> 
> -Ed
> 


Re: Help with the BPF verifier

2018-11-03 Thread Arnaldo Carvalho de Melo
Em Fri, Nov 02, 2018 at 03:42:49PM +, Edward Cree escreveu:
> On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> > Yeah, that didn't work either: 
> 
> > And the -vv in 'perf trace' didn't seem to map to further details in the
> > output of the verifier debug:
> Yeah for log_level 2 you probably need to make source-level changes to either
>  perf or libbpf (I think the latter).  It's annoying that essentially no tools
>  plumb through an option for that, someone should fix them ;-)
> 
> > libbpf: -- BEGIN DUMP LOG ---
> > libbpf: 
> > 0: (bf) r6 = r1
> > 1: (bf) r1 = r10
> > 2: (07) r1 += -328
> > 3: (b7) r7 = 64
> > 4: (b7) r2 = 64
> > 5: (bf) r3 = r6
> > 6: (85) call bpf_probe_read#4
> > 7: (79) r1 = *(u64 *)(r10 -320)
> > 8: (15) if r1 == 0x101 goto pc+4
> >  R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 9: (55) if r1 != 0x2 goto pc+22
> >  R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 10: (bf) r1 = r6
> > 11: (07) r1 += 16
> > 12: (05) goto pc+2
> > 15: (79) r3 = *(u64 *)(r1 +0)
> > dereference of modified ctx ptr R1 off=16 disallowed
> Aha, we at least got a different error message this time.
> And indeed llvm has done that optimisation, rather than the more obvious
> 11: r3 = *(u64 *)(r1 +16)
>  because it wants to have lots of reads share a single insn.  You may be able
>  to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
>  with llvm knowledge can figure out how to stop it (ideally, llvm would know
>  when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯

set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h 
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: 
CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ 
-D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS 
$PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign 
-working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM 
-O2 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 
-DLINUX_VERSION_CODE=0x41300  -I/home/acme/lib/perf/include/bpf  -nostdinc 
-isystem /usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h  -Wno-unused-value 
-Wno-pointer-sign -working-directory 
/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c 
/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target 
bpf  -O2 -o -

So it is using -O2, let's try with -O0...

So I added this to my ~/.perfconfig, i.e. the default clang command line
template with -O2 replaced by -O0.

[root@seventh perf]# cat ~/.perfconfig 
[llvm]
clang-bpf-cmd-template = "$CLANG_EXEC -D__KERNEL__ 
-D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS 
$PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign 
-working-directory $WORKING_DIR -c \"$CLANG_SOURCE\" -target bpf 
$CLANG_EMIT_LLVM -O0 -o - $LLVM_OPTIONS_PIPE"
# dump-obj = true
[root@seventh perf]# 

And got an explosion:

# trace -vv -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to  -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h 
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/g

Re: Help with the BPF verifier

2018-11-03 Thread Arnaldo Carvalho de Melo
Em Sat, Nov 03, 2018 at 08:29:34AM -0300, Arnaldo Carvalho de Melo escreveu:
> PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
> Preprocessed source(s) and associated run script(s) are located at:
> clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.c
> clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.sh
> clang-7: note: diagnostic msg: 
> 
> 
> ERROR: unable to compile tools/perf/examples/bpf/augmented_raw_syscalls.c
> Hint: Check error message shown above.
> Hint: You can also pre-compile it into .o using:
>   clang -target bpf -O2 -c tools/perf/examples/bpf/augmented_raw_syscalls.c
>   with proper -I and -D options.
> event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
>  \___ Failed to load tools/perf/examples/bpf/augmented_raw_syscalls.c from source: Error when compiling BPF scriptlet
> 
> (add -v to see detail)
> Run 'perf list' for a list of valid events
> 
>  Usage: perf trace [<options>] [<command>]
>     or: perf trace [<options>] -- <command> [<options>]
>     or: perf trace record [<options>] [<command>]
>     or: perf trace record [<options>] -- <command> [<options>]
> 
>     -e, --event <event>   event/syscall selector. use 'perf list' to list available events
> [root@seventh perf]# 
> 
> Trying with -O1...

-O1 doesn't get clang confused, it's just the verifier that doesn't like
the result, i.e. we're back to that optimization, which isn't disabled
with -O1:

llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 
-DLINUX_VERSION_CODE=0x41300  -I/home/acme/lib/perf/include/bpf  -nostdinc 
-isystem /usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h  -Wno-unused-value 
-Wno-pointer-sign -working-directory 
/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c 
/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target 
bpf  -O1 -o - 
libbpf: loading object 'tools/perf/examples/bpf/augmented_raw_syscalls.c' from buffer
libbpf: section(1) .strtab, size 168, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) raw_syscalls:sys_enter, size 344, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_enter
libbpf: section(4) .relraw_syscalls:sys_enter, size 16, link 10, flags 0, type=9
libbpf: section(5) raw_syscalls:sys_exit, size 16, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_exit
libbpf: section(6) maps, size 56, link 0, flags 3, type=1
libbpf: section(7) license, size 4, link 0, flags 3, type=1
libbpf: license of tools/perf/examples/bpf/augmented_raw_syscalls.c is GPL
libbpf: section(8) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of tools/perf/examples/bpf/augmented_raw_syscalls.c is 41300
libbpf: section(9) .llvm_addrsig, size 6, link 10, flags 8000, type=1879002115
libbpf: skip section(9) .llvm_addrsig
libbpf: section(10) .symtab, size 240, link 1, flags 0, type=2
libbpf: maps in tools/perf/examples/bpf/augmented_raw_syscalls.c: 2 maps in 56 bytes
libbpf: map 0 is "__augmented_syscalls__"
libbpf: map 1 is "__bpf_stdout__"
libbpf: collecting relocating info for: 'raw_syscalls:sys_enter'
libbpf: relo for 4 value 28 name 124
libbpf: relocation: insn_idx=35
libbpf: relocation: find map 1 (__augmented_syscalls__) for insn 35
Added extra kernel map __entry_SYSCALL_64_trampoline fe006000-fe007000
Added extra kernel map __entry_SYSCALL_64_trampoline fe032000-fe033000
Added extra kernel map __entry_SYSCALL_64_trampoline fe05e000-fe05f000
Added extra kernel map __entry_SYSCALL_64_trampoline fe08a000-fe08b000
bpf: config program 'raw_syscalls:sys_enter'
bpf: config program 'raw_syscalls:sys_exit'
libbpf: create map __bpf_stdout__: fd=3
libbpf: create map __augmented_syscalls__: fd=4
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf: 
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -328
3: (b7) r7 = 64
4: (b7) r2 = 64
5: (bf) r3 = r6
6: (85) call bpf_probe_read#4
7: (79) r1 = *(u64 *)(r10 -320)
8: (15) if r1 == 0x101 goto pc+4
 R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
9: (55) if r1 != 0x2 goto pc+22
 R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
10: (bf) r1 = r6
11: (07) r1 += 16
12: (05) goto pc+2
15: (79) r3 = *(u64 *)(r1 +0)
dereference of modified ctx ptr R1 off=16 disallowed

libbpf: -- END LOG --
libbpf: failed to load program 'raw_syscalls:sys_enter'
libbpf: failed to load object 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
bpf: load objects failed: err=-4007: (Kernel verifier blocks program loading)

Re: Help with the BPF verifier

2018-11-05 Thread Arnaldo Carvalho de Melo
Em Fri, Nov 02, 2018 at 09:27:52PM +, Yonghong Song escreveu:
> 
> 
> On 11/2/18 8:42 AM, Edward Cree wrote:
> > On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> >> Yeah, that didn't work either:
> > 
> >> And the -vv in 'perf trace' didn't seem to map to further details in the
> >> output of the verifier debug:
> > Yeah for log_level 2 you probably need to make source-level changes to either
> >   perf or libbpf (I think the latter).  It's annoying that essentially no tools
> >   plumb through an option for that, someone should fix them ;-)
> > 
> >> libbpf: -- BEGIN DUMP LOG ---
> >> libbpf:
> >> 0: (bf) r6 = r1
> >> 1: (bf) r1 = r10
> >> 2: (07) r1 += -328
> >> 3: (b7) r7 = 64
> >> 4: (b7) r2 = 64
> >> 5: (bf) r3 = r6
> >> 6: (85) call bpf_probe_read#4
> >> 7: (79) r1 = *(u64 *)(r10 -320)
> >> 8: (15) if r1 == 0x101 goto pc+4
> >>   R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> >> 9: (55) if r1 != 0x2 goto pc+22
> >>   R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> >> 10: (bf) r1 = r6
> >> 11: (07) r1 += 16
> >> 12: (05) goto pc+2
> >> 15: (79) r3 = *(u64 *)(r1 +0)
> >> dereference of modified ctx ptr R1 off=16 disallowed
> > Aha, we at least got a different error message this time.
> > And indeed llvm has done that optimisation, rather than the more obvious
> > 11: r3 = *(u64 *)(r1 +16)
> >   because it wants to have lots of reads share a single insn.  You may be able
> >   to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
> >   with llvm knowledge can figure out how to stop it (ideally, llvm would know
> >   when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯
> 
> The optimization mostly looks like this:
> br1:
>   ...
>   r1 += 16
>   goto merge
> br2:
>   ...
>   r1 += 20
>   goto merge
> merge:
>   *(u64 *)(r1 + 0)
> 
> The compiler tries to merge common loads. There is no easy way to
> stop this compiler optimization without turning off a lot of other
> optimizations. The easiest way is to add barriers
> __asm__ __volatile__("": : :"memory")
> after the ctx memory accesses to prevent their downstream merging.

Great, this made it work:

cat /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
#include 
#include 

/* bpf-output associated map */
struct bpf_map SEC("maps") __augmented_syscalls__ = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(u32),
	.max_entries = __NR_CPUS__,
};

struct syscall_enter_args {
	unsigned long long common_tp_fields;
	long		   syscall_nr;
	unsigned long	   args[6];
};

struct syscall_exit_args {
	unsigned long long common_tp_fields;
	long		   syscall_nr;
	long		   ret;
};

struct augmented_filename {
	unsigned int	size;
	int		reserved;
	char		value[256];
};

#define SYS_OPEN 2
#define SYS_OPENAT 257

SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
	struct {
		struct syscall_enter_args args;
		struct augmented_filename filename;
	} augmented_args;
	unsigned int len = sizeof(augmented_args);
	const void *filename_arg = NULL;

	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);

	switch (augmented_args.args.syscall_nr) {
	case SYS_OPEN:   filename_arg = (const void *)args->args[0];
			 __asm__ __volatile__("": : :"memory");
			 break;
	case SYS_OPENAT: filename_arg = (const void *)args->args[1];
			 break;
	default:
			 return 0;
	}

	if (filename_arg != NULL) {
		augmented_args.filename.reserved = 0;
		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
							      sizeof(augmented_args.filename.value),
							      filename_arg);
		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
			len &= sizeof(augmented_args.filename.value) - 1;
		}
	} else {
		len = sizeof(augmented_args.args);
	}

	perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
	return 0;
}

SEC("raw_syscalls:sys_exit")
int sys_exit(struct syscall_exit_args *args)
{
	struct syscall_exit_args augmented_args;

	probe_read(&augmented_args, sizeof(augmented_args), args);
	return augmented_args.syscall_nr == SYS_OPEN ||
	       augmented_args.syscall_nr == SYS_OPENAT;
}

license(GPL);

Re: Help with the BPF verifier

2018-11-01 Thread David Miller
From: Arnaldo Carvalho de Melo 
Date: Thu, 1 Nov 2018 15:52:17 -0300

> 50	unsigned int filename_arg = 6;
 ...
> --- /wb/augmented_raw_syscalls.c.old	2018-11-01 15:43:55.000394234 -0300
> +++ /wb/augmented_raw_syscalls.c	2018-11-01 15:44:15.102367838 -0300
> @@ -67,7 +67,7 @@
>  		augmented_args.filename.reserved = 0;
>  		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
>  							      sizeof(augmented_args.filename.value),
> -							      (const void *)args->args[0]);
> +							      (const void *)args->args[filename_arg]);

args[] is sized to '6', therefore the last valid index is '5', yet you're
using '6' here, which is one entry past the end of the declared array.


Re: Help with the BPF verifier

2018-11-01 Thread Arnaldo Carvalho de Melo
Em Thu, Nov 01, 2018 at 12:10:39PM -0700, David Miller escreveu:
> From: Arnaldo Carvalho de Melo 
> Date: Thu, 1 Nov 2018 15:52:17 -0300
> 
> > 50	unsigned int filename_arg = 6;
>  ...
> > --- /wb/augmented_raw_syscalls.c.old	2018-11-01 15:43:55.000394234 -0300
> > +++ /wb/augmented_raw_syscalls.c	2018-11-01 15:44:15.102367838 -0300
> > @@ -67,7 +67,7 @@
> >  		augmented_args.filename.reserved = 0;
> >  		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
> >  							      sizeof(augmented_args.filename.value),
> > -							      (const void *)args->args[0]);
> > +							      (const void *)args->args[filename_arg]);
> 
> args[] is sized to '6', therefore the last valid index is '5', yet you're
> using '6' here, which is one entry past the end of the declared array.

Nope... this is inside an if:

	if (filename_arg <= 5) {
		augmented_args.filename.reserved = 0;
		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
							      sizeof(augmented_args.filename.value),
							      (const void *)args->args[filename_arg]);
		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
			len &= sizeof(augmented_args.filename.value) - 1;
		}
	} else {

I use 6 to mean "hey, this syscall doesn't have any string argument, don't
bother with it".

- Arnaldo


Re: Help with the BPF verifier

2018-11-01 Thread David Miller
From: Arnaldo Carvalho de Melo 
Date: Thu, 1 Nov 2018 16:13:10 -0300

> Nope... this is inside an if:
> 
> 	if (filename_arg <= 5) {
> 		augmented_args.filename.reserved = 0;
> 		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
> 							      sizeof(augmented_args.filename.value),
> 							      (const void *)args->args[filename_arg]);
> 		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
> 			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
> 			len &= sizeof(augmented_args.filename.value) - 1;
> 		}
> 	} else {
> 
> I use 6 to mean "hey, this syscall doesn't have any string argument, don't
> bother with it".

Really weird.  And it's unsigned so I can't imagine it wants you to
check that it's >= 0...

Maybe Daniel or someone else can figure it out.


Re: Help with the BPF verifier

2018-11-01 Thread Edward Cree
On 01/11/18 18:52, Arnaldo Carvalho de Melo wrote:
>  R0=inv(id=0) R1=inv6 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> 15: (b7) r2 = 0
> 16: (63) *(u32 *)(r10 -260) = r2
> 17: (67) r1 <<= 32
> 18: (77) r1 >>= 32
> 19: (67) r1 <<= 3
> 20: (bf) r2 = r6
> 21: (0f) r2 += r1
> 22: (79) r3 = *(u64 *)(r2 +16)
> R2 invalid mem access 'inv'
I wonder if you could run this with verifier log level 2?  (I'm not sure how
 you would go about plumbing that through the perf tooling.)  It seems very
 odd that it ends up with R2=inv, and I'm wondering whether R1 becomes unknown
 during the shifts or whether the addition in insn 21 somehow produces the
 unknown-ness.  (I know we used to have a thing[1] where doing ptr += K and
 then also having an offset in the LDX produced an error about
 ptr+const+const, but that seems to have been fixed at some point.)

Note however that even if we get past this, R1 at this point holds 6, so it
 looks like the verifier is walking the impossible path where we're inside the
 'if' even though filename_arg = 6.  This is a (slightly annoying) verifier
 limitation, that it walks paths with impossible combinations of constraints
 (we've previously had cases where assertions in the verifier would blow up
 because of this, e.g. registers with max_val less than min_val).  So if the
 check_ctx_access() is going to worry about whether you're off the end of the
 array (I'm not sure what your program type is and thus which is_valid_access
 callback is involved), then it'll complain about this.
If filename_arg came from some external source you'd have a different
 problem, because then it would have a totally unknown value, that on entering
 the 'if' becomes "unknown but < 6", which is still too variable to have as
 the offset of a ctx access.  Those have to be at a known constant offset, so
 that we can determine the type of the returned value.

As a way to fix this, how about [UNTESTED!]:
    const void *filename_arg = NULL;
    /* ... */
    switch (augmented_args.args.syscall_nr) {
        case SYS_OPEN: filename_arg = args->args[0]; break;
        case SYS_OPENAT: filename_arg = args->args[1]; break;
    }
    /* ... */
    if (filename_arg) {
        /* stuff */
        blah = probe_read_str(/* ... */, filename_arg);
    } else {
        /* the other stuff */
    }
That way, you're only ever dealing in constant pointers (although judging by
 an old thread I found[1] about ptr+const+const, the compiler might decide to
 make some optimisations that end up looking like your existing code).

As for what you want to do with the index coming from userspace, the verifier
 will not like that at all, as mentioned above, so I think you'll need to do
 something like:
    switch (filename_arg_from_userspace) {
        case 0: filename_arg = args->args[0]; break;
        case 1: filename_arg = args->args[1]; break;
        /* etc */
        default: filename_arg = NULL;
    }
 thus ensuring that you only ever have ctx pointers with constant offsets.
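
Following that through for the map-driven case, a hypothetical sketch
(map name and shape invented, the helper declared the way perf's bpf.h
declares its other helpers, and modulo the llvm load-merging issue
discussed elsewhere in the thread):

/* filled by userspace: syscall nr -> index of the filename arg, 6 = none */
struct bpf_map SEC("maps") filename_arg_idx = {
	.type	     = BPF_MAP_TYPE_ARRAY,
	.key_size    = sizeof(int),
	.value_size  = sizeof(unsigned int),
	.max_entries = 512,
};

static void *(*map_lookup_elem)(struct bpf_map *map, void *key) =
	(void *)BPF_FUNC_map_lookup_elem;

static inline __attribute__((always_inline))
const void *pick_filename_arg(struct syscall_enter_args *args, int syscall_nr)
{
	unsigned int *idx = map_lookup_elem(&filename_arg_idx, &syscall_nr);

	if (idx == NULL)
		return NULL;

	switch (*idx) {	/* only constant ctx offsets reach the loads */
	case 0: return (const void *)args->args[0];
	case 1: return (const void *)args->args[1];
	case 2: return (const void *)args->args[2];
	case 3: return (const void *)args->args[3];
	case 4: return (const void *)args->args[4];
	case 5: return (const void *)args->args[5];
	default: return NULL;	/* e.g. 6 == "no string arg" */
	}
}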

-Ed

[1]: https://lists.iovisor.org/g/iovisor-dev/topic/21386327#1302