To further illustrate what I meant by the impact of compiler optimizations, I ran the following quick experiment:
```cpp
// test.cc
#include <tvm/runtime/c_runtime_api.h>

// implement the function using PackedCFunc calling convention
inline int PackedCFunc(TVMValue* args, int* type_codes, int num_args,
                       TVMValue* out_ret_value, int* out_ret_tcode,
                       void* resource_handle) {
  int v0 = args[0].v_int64;
  void* ptr = args[1].v_handle;
  out_ret_tcode[0] = kTVMArgInt;
  out_ret_value[0].v_int64 = v0 + ((int*)ptr)[0];
  return 0;
}

// return x + ptr[0];
extern "C" int AddViaPackedCFunc(int x, int* ptr) {
  TVMValue args[2];
  int type_codes[2];
  TVMValue out_ret_value;
  int out_ret_tcode;
  args[0].v_int64 = x;
  args[1].v_handle = ptr;
  type_codes[0] = kTVMArgInt;
  type_codes[1] = kTVMOpaqueHandle;
  PackedCFunc(args, type_codes, 2, &out_ret_value, &out_ret_tcode, nullptr);
  return out_ret_value.v_int64;
}
```
### Result of Clang
Running the command
```bash
clang-10 -O2 -S -emit-llvm -I /path/to/tvm/3rdparty/dlpack/include \
  -I /path/to/tvm/include -o test.ll test.cc
cat test.ll
```
gives the following code (metadata removed):
```ll
; Function Attrs: nounwind readonly uwtable
define dso_local i32 @AddViaPackedCFunc(i32 %0, i32* %1) local_unnamed_addr #0 {
  %3 = load i32, i32* %1, align 4, !tbaa !2
  %4 = add nsw i32 %3, %0
  ret i32 %4
}
```
### Result of GCC
```bash
gcc -O2 -S -I /path/to/tvm/3rdparty/dlpack/include -I /path/to/tvm/include \
  -o test.s test.cc
cat test.s
```
```
    .file "test.cc"
    .text
    .p2align 4,,15
    .globl AddViaPackedCFunc
    .type AddViaPackedCFunc, @function
AddViaPackedCFunc:
.LFB1:
    .cfi_startproc
    movl (%rsi), %eax
    addl %edi, %eax
    ret
    .cfi_endproc
.LFE1:
    .size AddViaPackedCFunc, .-AddViaPackedCFunc
    .ident "GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0"
    .section .note.GNU-stack,"",@progbits
```
### Discussions
As we can see, this is essentially equivalent to the direct C call:
```c
int Add(int x, int *ptr) {
  return x + ptr[0];
}
```
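For anyone who wants to check the equivalence at runtime without the TVM headers, here is a rough, self-contained sketch. The `Value` union, the `kIntCode`/`kHandleCode` constants, and the names `PackedAdd`, `AddViaPacked`, `AddDirect`, and `CheckEquivalence` are all hypothetical stand-ins of mine, not TVM's actual definitions; they only mimic the shape of the calling convention used in `test.cc`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for TVMValue: just enough fields for this example.
union Value {
  std::int64_t v_int64;
  double v_float64;
  void* v_handle;
};

// Hypothetical type codes; the concrete values do not matter here.
constexpr int kIntCode = 0;
constexpr int kHandleCode = 3;

// Same shape as the packed function in test.cc: arguments and the return
// value travel through arrays of Value plus integer type codes.
inline int PackedAdd(Value* args, int* /*type_codes*/, int /*num_args*/,
                     Value* out_ret_value, int* out_ret_tcode,
                     void* /*resource_handle*/) {
  int v0 = static_cast<int>(args[0].v_int64);
  void* ptr = args[1].v_handle;
  out_ret_tcode[0] = kIntCode;
  out_ret_value[0].v_int64 = v0 + static_cast<int*>(ptr)[0];
  return 0;
}

// Packs the arguments on the stack, calls PackedAdd, and unpacks the result.
int AddViaPacked(int x, int* ptr) {
  Value args[2];
  int type_codes[2];
  Value out_ret_value;
  int out_ret_tcode;
  args[0].v_int64 = x;
  args[1].v_handle = ptr;
  type_codes[0] = kIntCode;
  type_codes[1] = kHandleCode;
  PackedAdd(args, type_codes, 2, &out_ret_value, &out_ret_tcode, nullptr);
  return static_cast<int>(out_ret_value.v_int64);
}

// The direct version that the optimized output reduces to.
int AddDirect(int x, int* ptr) { return x + ptr[0]; }

// Compares both versions on a few small sample inputs.
void CheckEquivalence() {
  int values[] = {0, 5, -7, 1000};
  const int xs[] = {-3, 0, 42};
  for (int& v : values) {
    for (int x : xs) {
      assert(AddViaPacked(x, &v) == AddDirect(x, &v));
    }
  }
}
```

Calling `CheckEquivalence()` from a `main` only confirms that the two versions agree on those samples; whether the generated code is actually identical still has to be read off the compiler output.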
To understand what is happening under the hood, the following optimizations are
relevant:
- Inlining, which inlines the call to `PackedCFunc`
- Mem2reg, which promotes the stack stores/loads to register operations
- Dead code elimination, which eliminates the unused type codes
- Reasoning about int32 values passed via int64: `cast<int32>(cast<int64>(x)) = x`
when x is i32
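The last identity can be spelled out in a few lines. This is only an illustrative sketch; `RoundTrip` and `CheckRoundTripSamples` are hypothetical names of mine, and the snippet checks the arithmetic fact the compiler relies on rather than showing how any particular pass implements it.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Widen an int32 to int64 (sign extension), then narrow it back.
// Every int32 value is representable in int64, so nothing is lost.
constexpr std::int32_t RoundTrip(std::int32_t x) {
  return static_cast<std::int32_t>(static_cast<std::int64_t>(x));
}

// Compile-time checks at the boundaries of the int32 range.
static_assert(RoundTrip(0) == 0, "zero survives the round trip");
static_assert(RoundTrip(-1) == -1, "negative values survive the round trip");
static_assert(RoundTrip(std::numeric_limits<std::int32_t>::max()) ==
                  std::numeric_limits<std::int32_t>::max(),
              "INT32_MAX survives the round trip");
static_assert(RoundTrip(std::numeric_limits<std::int32_t>::min()) ==
                  std::numeric_limits<std::int32_t>::min(),
              "INT32_MIN survives the round trip");

// Runtime spot check over a few sample values.
inline void CheckRoundTripSamples() {
  const std::int32_t samples[] = {1, -12345, 1 << 30};
  for (std::int32_t s : samples) {
    assert(RoundTrip(s) == s);
  }
}
```

Because the round trip is the identity for every i32 input, the optimizer is free to drop both casts once the stores and loads around them have been promoted to registers.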
