On Fri, Mar 24, 2017 at 11:45 AM, Andy Lutomirski <l...@amacapital.net> wrote:
>
> Is there some hack like if __builtin_is_unescaped(*val) *val = old;
> that would work?

See my recent email suggesting a completely different interface, which
avoids this problem.

My interface generates:

  0000000000000000 <T_refcount_inc>:
     0:   8b 07                   mov    (%rdi),%eax
     2:   83 f8 ff                cmp    $0xffffffff,%eax
     5:   74 12                   je     19 <T_refcount_inc+0x19>
     7:   85 c0                   test   %eax,%eax
     9:   74 0a                   je     15 <T_refcount_inc+0x15>
     b:   8d 50 01                lea    0x1(%rax),%edx
     e:   f0 0f b1 17             lock cmpxchg %edx,(%rdi)
    12:   75 ee                   jne    2 <T_refcount_inc+0x2>
    14:   c3                      retq
    15:   31 c0                   xor    %eax,%eax
    17:   0f 0b                   ud2
    19:   c3                      retq

for PeterZ's test-case, which seems optimal.

Of course, PeterZ used -Os, which isn't actually very natural for the
kernel. Using -O2 I get something else. It turns out that my macro
should use

     if (likely(__txchg_success)) goto success_label;

(that "likely()" is critical) to make gcc not try to optimize for the
looping case.

So with that "likely()" fixed, with -O2 I get:

  0000000000000000 <T_refcount_inc>:
     0:   8b 07                   mov    (%rdi),%eax
     2:   83 f8 ff                cmp    $0xffffffff,%eax
     5:   74 0d                   je     14 <T_refcount_inc+0x14>
     7:   85 c0                   test   %eax,%eax
     9:   74 12                   je     1d <T_refcount_inc+0x1d>
     b:   8d 50 01                lea    0x1(%rax),%edx
     e:   f0 0f b1 17             lock cmpxchg %edx,(%rdi)
    12:   75 02                   jne    16 <T_refcount_inc+0x16>
    14:   f3 c3                   repz retq
    16:   83 f8 ff                cmp    $0xffffffff,%eax
    19:   75 ec                   jne    7 <T_refcount_inc+0x7>
    1b:   f3 c3                   repz retq
    1d:   31 c0                   xor    %eax,%eax
    1f:   0f 0b                   ud2
    21:   c3                      retq

which again looks pretty optimal (it did indeed actually generate
bigger but potentially higher-performance code by making the good case
be a fallthrough, and the unlikely case be a _forward_ jump that will
be predicted not-taken in the absence of other prediction
information).

(Of course, this also depends on the exact behavior that PeterZ's code
had, namely an exception for use-after-free, but a silent saturation)

              Linus
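
For readers following the disassembly, here is a minimal userspace sketch of the control flow it appears to encode: return silently when the counter is saturated, bail out when it is zero, otherwise retry a compare-and-exchange with the success path marked likely. This is not the macro from the thread or the kernel's refcount API. The names sketch_likely and sketch_refcount_inc are hypothetical, the GCC/Clang __atomic builtins stand in for the kernel's cmpxchg, the relaxed memory ordering is an assumption, and the zero case returns false instead of trapping with ud2.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's likely(); same builtin underneath. */
#define sketch_likely(x) __builtin_expect(!!(x), 1)

/*
 * Sketch of a saturating increment (hypothetical name, not a kernel API).
 * Returns true if the count was incremented or is already saturated,
 * false if it was zero (where the code in the thread raises an exception).
 */
static bool sketch_refcount_inc(unsigned int *r)
{
	/* Assumption: relaxed ordering is enough for this illustration. */
	unsigned int old = __atomic_load_n(r, __ATOMIC_RELAXED);

	for (;;) {
		if (old == UINT_MAX)
			return true;	/* saturated: stay there silently */
		if (old == 0)
			return false;	/* use-after-free case: caller should complain */

		/*
		 * On failure the builtin stores the current value of *r into
		 * 'old', so the loop re-checks both limits before retrying.
		 * Marking success as likely asks the compiler to lay out the
		 * no-contention path as the fallthrough.
		 */
		if (sketch_likely(__atomic_compare_exchange_n(r, &old, old + 1,
							      false,
							      __ATOMIC_RELAXED,
							      __ATOMIC_RELAXED)))
			return true;
	}
}
```

Whether a given compiler actually makes the retry path a forward jump depends on the compiler version and the optimization level, so check the generated code with objdump rather than assuming it.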