Re: [iovisor-dev] Bcc on 32-bit ARM

2017-10-13 Thread Y Song via iovisor-dev
On Fri, Oct 13, 2017 at 5:53 AM, Adrian Szyndela via iovisor-dev
 wrote:
>
> W dniu 04.10.2017 o 21:14, Y Song pisze:
>>
>> On Wed, Oct 4, 2017 at 3:24 AM, Adrian Szyndela via iovisor-dev
>>  wrote:
>>>
>>> - return type of bpf_pseudo_fd is 'u64', and it is passed to functions
>>> that
>>> take pointers; the compiler generates 'trunc' in IR for this, which
>>> becomes
>>> shift-left + shift-right in final bpf code; and bpf verifier does not
>>> like
>>> it when pointers are shifted. We worked it around by changing return type
>>> of
>>> bpf_pseudo_fd/llvm.bpf.pseudo from 'u64' to 'void *' - both in bcc
>>>
>>> (https://github.com/aszyndela/bcc/commit/8eb8fb2e265b93d38c1ad53efcb7da8883a3d643)
>>> and llvm
>>> (include/llvm/IR/IntrinsicsBPF.td):
>>>
>>>   def int_bpf_pseudo : GCCBuiltin<"__builtin_bpf_pseudo">,
>>>-  Intrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i64_ty]>;
>>>+  Intrinsic<[llvm_ptr_ty], [llvm_i64_ty, llvm_i64_ty]>;
>>>
>>> Obviously we don't know what we're doing, but at least it works for us.
>>
>>
>> This should work as well. essentially bpf_pseudo tries to convert a
>> load map_id to load_imm64 address. So its return value is indeed a ptr.
>
>
> I asked about it on llvm-dev and the only suggestion I got so far is to set
> bpf target for clang:
> https://lists.llvm.org/pipermail/llvm-dev/2017-October/118149.html
>
> And it makes sense...
>
> We can easily get into trouble when we mix two targets in the process. For
> example, it looks like sizeofs are computed at the "clang" stage, but actual
> sizes of some types are "known" at the "llc" stage.
>
> However, switching to bpf target for the whole process takes away
> convenience and maintainability: we can't rely on including kernel headers
> anymore. So we have to craft everything we need by hand, including struct
> pt_regs...
>
> Any thoughts how to handle this?

All these are valid. For certain program types e.g., networking where pt_regs
are not needed, you can already use -target bpf in clang. Check
linux:tools/testing/selftests/bpf/Makefile.

Since currently we do not have good solutions for pt_regs when "clang
-target bpf" is
used, let us stick to the approach "clang -target arm"/"llc -target bpf".

> ___
> iovisor-dev mailing list
> iovisor-dev@lists.iovisor.org
> https://lists.iovisor.org/mailman/listinfo/iovisor-dev
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: [iovisor-dev] Bcc on 32-bit ARM

2017-10-13 Thread Adrian Szyndela via iovisor-dev


W dniu 04.10.2017 o 21:14, Y Song pisze:

On Wed, Oct 4, 2017 at 3:24 AM, Adrian Szyndela via iovisor-dev
 wrote:

- return type of bpf_pseudo_fd is 'u64', and it is passed to functions that
take pointers; the compiler generates 'trunc' in IR for this, which becomes
shift-left + shift-right in final bpf code; and bpf verifier does not like
it when pointers are shifted. We worked it around by changing return type of
bpf_pseudo_fd/llvm.bpf.pseudo from 'u64' to 'void *' - both in bcc
(https://github.com/aszyndela/bcc/commit/8eb8fb2e265b93d38c1ad53efcb7da8883a3d643)
and llvm
(include/llvm/IR/IntrinsicsBPF.td):

  def int_bpf_pseudo : GCCBuiltin<"__builtin_bpf_pseudo">,
   -  Intrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i64_ty]>;
   +  Intrinsic<[llvm_ptr_ty], [llvm_i64_ty, llvm_i64_ty]>;

Obviously we don't know what we're doing, but at least it works for us.


This should work as well. essentially bpf_pseudo tries to convert a
load map_id to load_imm64 address. So its return value is indeed a ptr.


I asked about it on llvm-dev and the only suggestion I got so far is to 
set bpf target for clang:

https://lists.llvm.org/pipermail/llvm-dev/2017-October/118149.html

And it makes sense...

We can easily get into trouble when we mix two targets in the process. 
For example, it looks like sizeofs are computed at the "clang" stage, 
but actual sizes of some types are "known" at the "llc" stage.


However, switching to bpf target for the whole process takes away 
convenience and maintainability: we can't rely on including kernel 
headers anymore. So we have to craft everything we need by hand, 
including struct pt_regs...


Any thoughts how to handle this?
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: [iovisor-dev] Bcc on 32-bit ARM

2017-10-11 Thread Adrian Szyndela via iovisor-dev



W dniu 04.10.2017 o 21:14, Y Song pisze:

Thanks for comprehensive hacking report to make arm32 work! More comments below.


Thanks for taking a look!


There could be some issues in BPF backend to handle 32bit pointers.
I tried your example above, "clang -O2 -target arm -emit-llvm -S t.c"
works fine. But llc complains calling convention failure.
# llc -march=bpf -filetype=asm t.ll
LLVM ERROR: Unsupported calling convention


When I set the target to

clang -O2 -target armv7l-unknown-linux-gnueabi -emit-llvm -S t.c

it works for me the same as with the modified bcc. llc does not complain 
anymore.

___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: [iovisor-dev] Bcc on 32-bit ARM

2017-10-04 Thread Y Song via iovisor-dev
Hi, Adrian,

Thanks for comprehensive hacking report to make arm32 work! More comments below.

On Wed, Oct 4, 2017 at 3:24 AM, Adrian Szyndela via iovisor-dev
 wrote:
> Hi,
>
> We're trying to launch bcc on an ARM 32-bit machine. We've managed to cook a
> version that works, but with a few quirks.
>
> Besides adding general setup for compiling on ARM (defining registers,
> calling convention, syscall number), we made some changes and corrections to
> type sizes (mostly ulonglong -> size_t in libbcc.py), disabled some parts of
> kernel headers and disabled 128-bit integer support.
> (https://github.com/aszyndela/bcc/commit/bed992f2842007ec5563ec75caffe6e03b6bb7c4)

I look at the change. Mostly it is okay. In the code, you mentioned,
+ // disable including kernel header files with assembly code
+ // as there is no assembly parser for ARM implemented yet
+ // same for arm64 arch: https://patchwork.kernel.org/patch/7605601/
+ cflags->push_back("-D__ASM_SYSREG_H");
+ cflags->push_back("-D__LINUX_ARM_ARCH__=7");

If you have latest llvm trunk code, assembly parser is there, you should not
need to have -D__ASM_SYSREG_H. But for earlier version of llvm compiler,
yes, you still need it.

>
> Additionally, we encountered the following quirks:
>
> - 'memset', which comes from rewriting in libbcc, is redefined in ARM kernel
> headers
> (https://elixir.free-electrons.com/linux/latest/source/arch/arm/include/asm/string.h#L29).
> Redefined code calls __memzero, which is not allowed in bpf code because it
> is an external function call, so we changed memset to __builtin_memset when
> the code is rewritten.
> (https://github.com/aszyndela/bcc/commit/6cc9adc4e6e8cc775b1d08641d0bf9c9f48ce3e2)

The change here should be okay.

>
> - return type of bpf_pseudo_fd is 'u64', and it is passed to functions that
> take pointers; the compiler generates 'trunc' in IR for this, which becomes
> shift-left + shift-right in final bpf code; and bpf verifier does not like
> it when pointers are shifted. We worked it around by changing return type of
> bpf_pseudo_fd/llvm.bpf.pseudo from 'u64' to 'void *' - both in bcc
> (https://github.com/aszyndela/bcc/commit/8eb8fb2e265b93d38c1ad53efcb7da8883a3d643)
> and llvm
> (include/llvm/IR/IntrinsicsBPF.td):
>
>  def int_bpf_pseudo : GCCBuiltin<"__builtin_bpf_pseudo">,
>   -  Intrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i64_ty]>;
>   +  Intrinsic<[llvm_ptr_ty], [llvm_i64_ty, llvm_i64_ty]>;
>
> Obviously we don't know what we're doing, but at least it works for us.

This should work as well. essentially bpf_pseudo tries to convert a
load map_id to load_imm64 address. So its return value is indeed a ptr.

>
> - when compiling to IR, the compiler considers pointers as 32-bit (as the
> target triple is set to something like armv7l-unknown-linux-gnueabi). But,
> when compiling from IR to bpf pointers present in IR code are considered
> 64-bit. This needs more explanation probably. Here is an example -
> hand-written code, which looks (suspiciously) similar to the code generated
> by bcc for dereferences:
>
>   struct teststruct
>   {
>   const char *field;
>   };
>   int test_ko(struct pt_regs *ctx) {
> void *conn = ctx->uregs[0];
> struct teststruct *t = ctx->uregs[1];
> const char *c = ({
>  typeof(void*) _val;
>  __builtin_memset(&_val, 0, sizeof(_val));
>  bpf_probe_read(&_val, sizeof(_val), (char *)t
>+ offsetof(struct teststruct, field));
>  _val; });
> return ({
>  typeof(const char) _val;
>  __builtin_memset(&_val, 0, sizeof(_val));
>  bpf_probe_read(&_val, sizeof(_val), (char *)c);
>  _val; });
>   }
>
> The interesting part is the first __builtin_memset. After compiling to IR it
> becomes this:
>
>   %2 = alloca i8*, align 4
>   [...]
>   %8 = bitcast i8** %2 to i32*
>   store i32 0, i32* %8, align 4
>
> It explicitly sets 4 bytes on the stack to 0, because sizeof(void*) on ARM
> is 4. Thus, 'void * val' should be zeroed, right? No. Because after
> compiling to final bpf code, i8* allocated at %2 becomes 64-bit and we got
> the following code, rejected by the verifier:
>
>   5: R6=imm0,min_value=0,max_value=0,min_align=2147483648 R10=fp
>   5: (63) *(u32 *)(r10 -8) = r6
>   [...]
>   10: (79) r3 = *(u64 *)(r10 -8)
>   invalid read from stack off -8+4 size 8

There could be some issues in BPF backend to handle 32bit pointers.
I tried your example above, "clang -O2 -target arm -emit-llvm -S t.c"
works fine. But llc complains calling convention failure.
# llc -march=bpf -filetype=asm t.ll
LLVM ERROR: Unsupported calling convention

I did not check out why though.

> But, if we switch __builtin_memset to explicit initialization, things get
> better:
>
>   int test_ok(struct pt_regs *ctx) {
> void *conn = ctx->uregs[0];
> struct teststruct *t = ctx->uregs[1];
> const char *c = ({
>  typeof(void*) _val = 0;
>  // instead of __builtin_mem

[iovisor-dev] Bcc on 32-bit ARM

2017-10-04 Thread Adrian Szyndela via iovisor-dev

Hi,

We're trying to launch bcc on an ARM 32-bit machine. We've managed to 
cook a version that works, but with a few quirks.


Besides adding general setup for compiling on ARM (defining registers, 
calling convention, syscall number), we made some changes and 
corrections to type sizes (mostly ulonglong -> size_t in libbcc.py), 
disabled some parts of kernel headers and disabled 128-bit integer support.

(https://github.com/aszyndela/bcc/commit/bed992f2842007ec5563ec75caffe6e03b6bb7c4)

Additionally, we encountered the following quirks:

- 'memset', which comes from rewriting in libbcc, is redefined in ARM 
kernel headers 
(https://elixir.free-electrons.com/linux/latest/source/arch/arm/include/asm/string.h#L29). 
Redefined code calls __memzero, which is not allowed in bpf code because 
it is an external function call, so we changed memset to 
__builtin_memset when the code is rewritten.

(https://github.com/aszyndela/bcc/commit/6cc9adc4e6e8cc775b1d08641d0bf9c9f48ce3e2)

- return type of bpf_pseudo_fd is 'u64', and it is passed to functions 
that take pointers; the compiler generates 'trunc' in IR for this, which 
becomes shift-left + shift-right in final bpf code; and bpf verifier 
does not like it when pointers are shifted. We worked it around by 
changing return type of bpf_pseudo_fd/llvm.bpf.pseudo from 'u64' to 
'void *' - both in bcc
(https://github.com/aszyndela/bcc/commit/8eb8fb2e265b93d38c1ad53efcb7da8883a3d643) 
and llvm

(include/llvm/IR/IntrinsicsBPF.td):

 def int_bpf_pseudo : GCCBuiltin<"__builtin_bpf_pseudo">,
  -  Intrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i64_ty]>;
  +  Intrinsic<[llvm_ptr_ty], [llvm_i64_ty, llvm_i64_ty]>;

Obviously we don't know what we're doing, but at least it works for us.

- when compiling to IR, the compiler considers pointers as 32-bit (as 
the target triple is set to something like 
armv7l-unknown-linux-gnueabi). But, when compiling from IR to bpf 
pointers present in IR code are considered 64-bit. This needs more 
explanation probably. Here is an example - hand-written code, which 
looks (suspiciously) similar to the code generated by bcc for dereferences:


  struct teststruct
  {
  const char *field;
  };
  int test_ko(struct pt_regs *ctx) {
void *conn = ctx->uregs[0];
struct teststruct *t = ctx->uregs[1];
const char *c = ({
 typeof(void*) _val;
 __builtin_memset(&_val, 0, sizeof(_val));
 bpf_probe_read(&_val, sizeof(_val), (char *)t
   + offsetof(struct teststruct, field));
 _val; });
return ({
 typeof(const char) _val;
 __builtin_memset(&_val, 0, sizeof(_val));
 bpf_probe_read(&_val, sizeof(_val), (char *)c);
 _val; });
  }

The interesting part is the first __builtin_memset. After compiling to 
IR it becomes this:


  %2 = alloca i8*, align 4
  [...]
  %8 = bitcast i8** %2 to i32*
  store i32 0, i32* %8, align 4

It explicitly sets 4 bytes on the stack to 0, because sizeof(void*) on 
ARM is 4. Thus, 'void * val' should be zeroed, right? No. Because after 
compiling to final bpf code, i8* allocated at %2 becomes 64-bit and we 
got the following code, rejected by the verifier:


  5: R6=imm0,min_value=0,max_value=0,min_align=2147483648 R10=fp
  5: (63) *(u32 *)(r10 -8) = r6
  [...]
  10: (79) r3 = *(u64 *)(r10 -8)
  invalid read from stack off -8+4 size 8

But, if we switch __builtin_memset to explicit initialization, things 
get better:


  int test_ok(struct pt_regs *ctx) {
void *conn = ctx->uregs[0];
struct teststruct *t = ctx->uregs[1];
const char *c = ({
 typeof(void*) _val = 0;
 // instead of __builtin_memset(&_val, 0, sizeof(_val));
 bpf_probe_read(&_val, sizeof(_val), (char *)t
   + offsetof(struct teststruct, field));
 _val; });
return ({
 typeof(const char) _val;
 __builtin_memset(&_val, 0, sizeof(_val));
 bpf_probe_read(&_val, sizeof(_val), (char *)c);
 _val; });
  }

IR is:

  %2 = alloca i8*, align 4
  [...]
  store i8* null, i8** %2, align 4, !tbaa !14

And final bpf code is:

  5: R6=imm0,min_value=0,max_value=0,min_align=2147483648 R10=fp
  5: (7b) *(u64 *)(r10 -8) = r6
  [...]
  10: (79) r3 = *(u64 *)(r10 -8)

Of course, as a workaround for this we explicitly use bpf_probe_read 
with explicit initialization in our tools. Not pretty at all...


With those changes and workarounds applied we are able to run selected 
bcc tools on 32-bit ARM. It's very hacky, so our motivation is to fix it 
properly, and possibly have it integrated upstream. There is also a 
couple of issues with uprobes on 32-bit ARM, but that's a different story.


What do you guys think about all of this? I would be grateful for comments.

The changes we've made are available at https://github.com/aszyndela/bcc

Thanks,
Adrian
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lis