On 14/05/16 06:34, Emilio G. Cota wrote:
> +static inline void qemu_spin_lock(QemuSpin *spin)
> +{
> +    while (atomic_test_and_set_acquire(&spin->value)) {
A possible optimization might be using unlikely() here, compare:

spin.o:     file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <spin_lock__no_hint>:
   0:   52800022        mov     w2, #0x1        // #1
   4:   885ffc01        ldaxr   w1, [x0]
   8:   88037c02        stxr    w3, w2, [x0]
   c:   35ffffc3        cbnz    w3, 4 <spin_lock__no_hint+0x4>
  10:   340000a1        cbz     w1, 24 <spin_lock__no_hint+0x24>
  14:   b9400001        ldr     w1, [x0]
  18:   34ffff61        cbz     w1, 4 <spin_lock__no_hint+0x4>
  1c:   d503203f        yield
  20:   17fffffd        b       14 <spin_lock__no_hint+0x14>
  24:   d65f03c0        ret

0000000000000028 <spin_lock__hint>:
  28:   52800022        mov     w2, #0x1        // #1
  2c:   885ffc01        ldaxr   w1, [x0]
  30:   88037c02        stxr    w3, w2, [x0]
  34:   35ffffc3        cbnz    w3, 2c <spin_lock__hint+0x4>
  38:   35000061        cbnz    w1, 44 <spin_lock__hint+0x1c>
  3c:   d65f03c0        ret
  40:   d503203f        yield
  44:   b9400001        ldr     w1, [x0]
  48:   35ffffc1        cbnz    w1, 40 <spin_lock__hint+0x18>
  4c:   17fffff8        b       2c <spin_lock__hint+0x4>

spin_lock__hint(), the one where unlikely() is used, gives a slightly more
CPU-pipeline-friendly fast path: in the uncontended case it falls straight
through to the ret instead of taking a branch.

> +        while (atomic_read(&spin->value)) {
> +            cpu_relax();
> +        }
> +    }
> +}
> +
> +static inline int qemu_spin_trylock(QemuSpin *spin)
> +{
> +    if (atomic_test_and_set_acquire(&spin->value)) {
> +        return -EBUSY;
> +    }
> +    return 0;
> +}

Here we could also benefit from unlikely(), I think; a sketch of both helpers
with the hint applied is below my signature.

Kind regards,
Sergey
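P.S. For reference, a minimal sketch of what I have in mind, i.e. the two
helpers from the patch with the hint added. It only rearranges the quoted
code; unlikely() is assumed to be QEMU's usual __builtin_expect() wrapper.

/* Sketch only: the quoted helpers with the suggested unlikely() hints.
 * QemuSpin, atomic_test_and_set_acquire(), atomic_read() and cpu_relax()
 * are taken as-is from the patch under review.
 */
static inline void qemu_spin_lock(QemuSpin *spin)
{
    /* Hint that the lock is usually free, so the fast path falls through. */
    while (unlikely(atomic_test_and_set_acquire(&spin->value))) {
        /* Spin read-only until the lock looks free, then retry the TAS. */
        while (atomic_read(&spin->value)) {
            cpu_relax();
        }
    }
}

static inline int qemu_spin_trylock(QemuSpin *spin)
{
    /* Same hint: failing to take the lock is the uncommon case. */
    if (unlikely(atomic_test_and_set_acquire(&spin->value))) {
        return -EBUSY;
    }
    return 0;
}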