Re: locking/atomic: Introduce atomic_try_cmpxchg()

Linus Torvalds Fri, 24 Mar 2017 12:19:05 -0700

On Fri, Mar 24, 2017 at 11:45 AM, Andy Lutomirski <l...@amacapital.net> wrote:
>
> Is there some hack like if __builtin_is_unescaped(*val) *val = old;
> that would work?


See my recent email suggesting a completely different interface, which
avoids this problem.

My interface generates:

0000000000000000 <T_refcount_inc>:
   0: 8b 07                 mov    (%rdi),%eax
   2: 83 f8 ff             cmp    $0xffffffff,%eax
   5: 74 12                 je     19 <T_refcount_inc+0x19>
   7: 85 c0                 test   %eax,%eax
   9: 74 0a                 je     15 <T_refcount_inc+0x15>
   b: 8d 50 01             lea    0x1(%rax),%edx
   e: f0 0f b1 17           lock cmpxchg %edx,(%rdi)
  12: 75 ee                 jne    2 <T_refcount_inc+0x2>
  14: c3                   retq
  15: 31 c0                 xor    %eax,%eax
  17: 0f 0b                 ud2
  19: c3                   retq

for PeterZ's test-case, which seems optimal.

Of course, PeterZ used -Os, which isn't actually very natural for the
kernel. Using -O2 I get something else. It turns out that my macro
should use

        if (likely(__txchg_success)) goto success_label;

(that "likely()" is critical) to make gcc not try to optimize for the
looping case.
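
To illustrate the point about "likely()", here is a minimal userspace
sketch. It is not the macro from the earlier email: the helper name
sketch_fetch_inc() is hypothetical, and it uses C11 <stdatomic.h> in
place of the kernel's atomics. It only shows where the hint sits in a
cmpxchg retry loop:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Branch hint, defined the same way as the kernel's likely(). */
#define likely(x) __builtin_expect(!!(x), 1)

/*
 * Hypothetical helper (not from the original email): atomically add 1
 * to *v and return the value that was there before.
 */
static unsigned int sketch_fetch_inc(atomic_uint *v)
{
	unsigned int old = atomic_load_explicit(v, memory_order_relaxed);

	for (;;) {
		unsigned int new = old + 1;
		bool ok = atomic_compare_exchange_strong_explicit(
				v, &old, new,
				memory_order_relaxed, memory_order_relaxed);

		/* Tell the compiler that success is the common case. */
		if (likely(ok))
			return new - 1;

		/* On failure, 'old' now holds the current value; retry. */
	}
}
```

Whether the hint changes the generated layout depends on the compiler
version and the optimization level, so compare the objdump output with
and without it.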

So with that "likely()" fixed, with -O2 I get:

0000000000000000 <T_refcount_inc>:
   0: 8b 07                 mov    (%rdi),%eax
   2: 83 f8 ff             cmp    $0xffffffff,%eax
   5: 74 0d                 je     14 <T_refcount_inc+0x14>
   7: 85 c0                 test   %eax,%eax
   9: 74 12                 je     1d <T_refcount_inc+0x1d>
   b: 8d 50 01             lea    0x1(%rax),%edx
   e: f0 0f b1 17           lock cmpxchg %edx,(%rdi)
  12: 75 02                 jne    16 <T_refcount_inc+0x16>
  14: f3 c3                 repz retq
  16: 83 f8 ff             cmp    $0xffffffff,%eax
  19: 75 ec                 jne    7 <T_refcount_inc+0x7>
  1b: f3 c3                 repz retq
  1d: 31 c0                 xor    %eax,%eax
  1f: 0f 0b                 ud2
  21: c3                   retq

which again looks pretty optimal (it did indeed actually generate
bigger but potentially higher-performance code by making the good case
be a fallthrough, and the unlikely case be a _forward_ jump that will
be predicted not-taken in the absence of other prediction information).

(Of course, this also depends on the exact behavior that PeterZ's code
had, namely an exception for use-after-free, but a silent saturation)
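
For reference, the behavior described above (trap on an increment from
zero, silently stay at the maximum once saturated) can be sketched in
plain C roughly as follows. This is a userspace approximation under
stated assumptions, not PeterZ's test-case and not the interface from
the earlier email: sketch_refcount_inc() is a hypothetical name, C11
<stdatomic.h> stands in for the kernel atomics, and a false return
value stands in for the ud2 exception:

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical userspace stand-in for a saturating refcount increment.
 * Returns false where the listings above would hit ud2 (increment from
 * zero, i.e. a possible use-after-free), true otherwise.
 */
static bool sketch_refcount_inc(atomic_uint *r)
{
	unsigned int old = atomic_load_explicit(r, memory_order_relaxed);

	for (;;) {
		/* Saturated: leave the counter alone, no complaint. */
		if (unlikely(old == UINT_MAX))
			return true;

		/* Zero means the object may already be freed. */
		if (unlikely(old == 0))
			return false;

		unsigned int new = old + 1;

		/* On failure 'old' is reloaded and both checks rerun. */
		if (likely(atomic_compare_exchange_strong_explicit(
				r, &old, new,
				memory_order_relaxed, memory_order_relaxed)))
			return true;
	}
}
```

The order of the two checks follows the listings: the saturation test
comes first, then the zero test, and a failed cmpxchg goes back through
both with the value it just observed.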

            Linus
