>> So even though my implementation is slightly faster we're still
>> getting a 70% perf hit.
> interesting.
>
> can you show the assembly (objdump -d) for __asan_load8 in both variants?

My disassembly:

00000000004cf6a0 <__asan_load8>:
  4cf6a0:       48 89 f8                mov    %rdi,%rax
  4cf6a3:       48 c1 e8 03             shr    $0x3,%rax
  4cf6a7:       80 b8 00 80 ff 7f 00    cmpb   $0x0,0x7fff8000(%rax)
  4cf6ae:       75 08                   jne    4cf6b8 <__asan_load8+0x18>
  4cf6b0:       f3 c3                   repz retq
  4cf6b2:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  4cf6b8:       e9 f3 55 fc ff          jmpq   494cb0 <__asan_report_load8>
  4cf6bd:       0f 1f 00                nopl   (%rax)
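
For readers who don't want to decode the listing: the fast path above is just the usual shadow check. Here is a rough C sketch of it. The helper names are made up for illustration and are not the runtime's; the shift and offset are the ones visible in the disassembly. The shadow byte is passed in as a value so that the sketch runs without a mapped shadow region.

```c
#include <stdint.h>

/* Shadow mapping as read off the disassembly: one shadow byte per
   8 application bytes, located at (addr >> 3) + 0x7fff8000. */
#define SHADOW_SCALE  3
#define SHADOW_OFFSET 0x7fff8000UL

/* Hypothetical helper: address of the shadow byte that covers addr. */
static uintptr_t shadow_addr(uintptr_t addr) {
    return (addr >> SHADOW_SCALE) + SHADOW_OFFSET;
}

/* Hypothetical helper: an 8-byte load is reported iff its shadow byte
   is non-zero (the cmpb/jne pair in the listing). */
static int load8_needs_report(int8_t shadow_byte) {
    return shadow_byte != 0;
}
```

In the real callback the non-zero case tail-jumps to __asan_report_load8 and the zero case simply returns.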

And here's the trunk version:

0000000000493b00 <__asan_load8>:
  493b00:       48 89 f8                mov    %rdi,%rax
  493b03:       48 c1 e8 03             shr    $0x3,%rax
  493b07:       80 b8 00 80 ff 7f 00    cmpb   $0x0,0x7fff8000(%rax)
  493b0e:       74 40                   je     493b50 <__asan_load8+0x50>
  493b10:       48 8b 05 51 83 26 00    mov    0x268351(%rip),%rax        # 6fbe68 <_DYNAMIC+0x13a0>
  493b17:       48 8b 00                mov    (%rax),%rax
  493b1a:       48 85 c0                test   %rax,%rax
  493b1d:       74 09                   je     493b28 <__asan_load8+0x28>
  493b1f:       48 89 38                mov    %rdi,(%rax)
  493b22:       c3                      retq
  493b23:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  493b28:       55                      push   %rbp
  493b29:       48 89 f9                mov    %rdi,%rcx
  493b2c:       41 b9 08 00 00 00       mov    $0x8,%r9d
  493b32:       45 31 c0                xor    %r8d,%r8d
  493b35:       48 89 e5                mov    %rsp,%rbp
  493b38:       48 83 ec 10             sub    $0x10,%rsp
  493b3c:       48 8b 7d 08             mov    0x8(%rbp),%rdi
  493b40:       48 8d 55 f8             lea    -0x8(%rbp),%rdx
  493b44:       48 89 ee                mov    %rbp,%rsi
  493b47:       e8 64 e3 ff ff          callq  491eb0 <__asan_report_error>
  493b4c:       c9                      leaveq
  493b4d:       0f 1f 00                nopl   (%rax)
  493b50:       f3 c3                   repz retq
  493b52:       66 66 66 66 66 2e 0f    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  493b59:       1f 84 00 00 00 00 00

> If you want to rely on a custom ABI, you should implement it on both
> callee and caller sides.

Sure.

> That might indeed improve the speed, but imho is not worth it here.

I still think that most of the overhead comes from the ABI, i.e. the
cost of the call itself (IMHO x86/amd64 are particularly bad at this).
E.g. removing _all_ code from the callbacks still gives a 16 sec
runtime, so the callback code itself accounts for only
(17.3 - 16)/(17.3 - 11) = ~20% of the overhead, and improving it
further is practically worthless.
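
The same arithmetic as a tiny C helper, in case anyone wants to plug in their own timings. The function name is hypothetical, and the parameter meanings are my reading of the three numbers above: 11 is assumed to be the run without callbacks, 16 the run with empty callbacks, and 17.3 the run with the real callbacks.

```c
/* Share of the instrumentation overhead attributable to the callback
   bodies. Assumed inputs, in seconds: baseline = run without callbacks,
   empty = callbacks with all code removed, full = real callbacks. */
static double callback_body_share(double baseline, double empty, double full) {
    return (full - empty) / (full - baseline);
}
```

With the numbers above, callback_body_share(11.0, 16.0, 17.3) comes out at roughly 0.206.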

-Y
