--- Comment #3 from Thomas Rodgers <rodgertq at gcc dot> ---
Since this latter point has come up before, I also want to note that keeping
an atomic count of waiters per waiter-pool bucket makes a call to
notify_one/notify_all roughly 25x faster, based on my testing, than naively
issuing a FUTEX_WAKE syscall when no waiter could possibly receive the wake.

Running ./benchmark
Run on (20 X 4800 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x10)
  L1 Instruction 32 KiB (x10)
  L2 Unified 1280 KiB (x10)
  L3 Unified 24576 KiB (x1)
Load Average: 0.69, 0.61, 1.30
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may
be noisy and will incur extra overhead.
Benchmark                        Time             CPU   Iterations
BM_empty_notify_checked       3.79 ns         3.79 ns    179929051
BM_empty_notify_syscall       94.1 ns         93.9 ns      7477997

For types that can use a futex directly (e.g. int), there is no place in the
object itself to put that extra atomic count. So we can either make the type
that the underlying platform supports directly significantly more expensive to
notify, or we can use the waiter count in the waiter_pool.
