On 14/05/16 06:34, Emilio G. Cota wrote:
> +static inline void qemu_spin_lock(QemuSpin *spin)
> +{
> +    while (atomic_test_and_set_acquire(&spin->value)) {

A possible optimization might be to use unlikely() here; compare:

spin.o:     file format elf64-littleaarch64


Disassembly of section .text:

0000000000000000 <spin_lock__no_hint>:
   0:    52800022     mov    w2, #0x1                       // #1
   4:    885ffc01     ldaxr    w1, [x0]
   8:    88037c02     stxr    w3, w2, [x0]
   c:    35ffffc3     cbnz    w3, 4 <spin_lock__no_hint+0x4>
  10:    340000a1     cbz    w1, 24 <spin_lock__no_hint+0x24>
  14:    b9400001     ldr    w1, [x0]
  18:    34ffff61     cbz    w1, 4 <spin_lock__no_hint+0x4>
  1c:    d503203f     yield
  20:    17fffffd     b    14 <spin_lock__no_hint+0x14>
  24:    d65f03c0     ret

0000000000000028 <spin_lock__hint>:
  28:    52800022     mov    w2, #0x1                       // #1
  2c:    885ffc01     ldaxr    w1, [x0]
  30:    88037c02     stxr    w3, w2, [x0]
  34:    35ffffc3     cbnz    w3, 2c <spin_lock__hint+0x4>
  38:    35000061     cbnz    w1, 44 <spin_lock__hint+0x1c>
  3c:    d65f03c0     ret
  40:    d503203f     yield
  44:    b9400001     ldr    w1, [x0]
  48:    35ffffc1     cbnz    w1, 40 <spin_lock__hint+0x18>
  4c:    17fffff8     b    2c <spin_lock__hint+0x4>

spin_lock__hint(), the one where unlikely() is used, gives a slightly more
CPU-pipeline-friendly fast path: in the uncontended case it falls straight
through to the ret instead of taking a forward branch.
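
I.e. something along these lines (untested sketch; it only wraps the
existing call with QEMU's unlikely() macro from qemu/compiler.h):

static inline void qemu_spin_lock(QemuSpin *spin)
{
    /* Hint that the uncontended (fast) path is the common case. */
    while (unlikely(atomic_test_and_set_acquire(&spin->value))) {
        while (atomic_read(&spin->value)) {
            cpu_relax();
        }
    }
}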

> +        while (atomic_read(&spin->value)) {
> +            cpu_relax();
> +        }
> +    }
> +}
> +
> +static inline int qemu_spin_trylock(QemuSpin *spin)
> +{
> +    if (atomic_test_and_set_acquire(&spin->value)) {
> +        return -EBUSY;
> +    }
> +    return 0;
> +}

Here we could also benefit from unlikely(), I think.
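
E.g. (again untested, assuming the uncontended case dominates for
trylock callers as well):

static inline int qemu_spin_trylock(QemuSpin *spin)
{
    /* Treat failing to grab the lock as the unlikely case. */
    if (unlikely(atomic_test_and_set_acquire(&spin->value))) {
        return -EBUSY;
    }
    return 0;
}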

Kind regards,
Sergey
