https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122878

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #2)
> There's another problem with our semaphore which doesn't directly affect
> this benchmark.
> 
> semaphore::try_acquire_for will keep trying to acquire the semaphore in a
> loop for the specified duration, but we don't reduce the duration on each
> loop. So a call to try_acquire_for(10ms) might wait for 9ms then get woken
> (either because the semaphore became available, or the wait was interrupted
> by a signal, or just wake spuriously), but then fail to acquire the
> semaphore (because another thread acquired it, or it wasn't available at all
> and we woke spuriously), and then wait for another 10ms. This will result in
> a total wait of 19ms instead of the requested 10ms. When we loop we should
> reduce the time for the next wait by the elapsed duration.

The simplest fix for this would be to implement _M_try_acquire_for as:

  return _M_try_acquire_until(__detail::__wait_clock_t::now() + __rtime);

However that would lose the current property that calling
_M_try_acquire_for(0s) (or any other zero-valued duration) will set the
__spin_only wait flag, and so just do the 16-iteration spinloop and then return
without doing the futex syscall.

So maybe we want _M_try_acquire_for to do:

  if (__rtime > chrono::nanoseconds(100)) // estimate of syscall overhead
    return _M_try_acquire_until(__detail::__wait_clock_t::now() + __rtime);
  else
    // current implementation but using __rtime.zero() as the timeout

This would solve the problem of __rtime not accounting for the time elapsed on
previous iterations (by deferring to _M_try_acquire_until with an absolute
timeout).
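
To illustrate why an absolute timeout avoids the problem, here is a minimal
sketch using a toy semaphore built on std::mutex and std::condition_variable.
It is not the libstdc++ implementation: the class toy_semaphore and its member
names are hypothetical, and the condition variable stands in for the futex
wait. The point is only that every retry waits against the same deadline, so a
wakeup that fails to acquire cannot restart the full timeout.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical toy semaphore for illustration; not the libstdc++ code.
class toy_semaphore {
public:
  using clock = std::chrono::steady_clock;

  explicit toy_semaphore(int initial) : count_(initial) {
    assert(initial >= 0);
  }

  void release() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++count_;
    }
    cv_.notify_one();
  }

  // Every iteration waits against the same absolute deadline, so time
  // already spent waiting is never counted twice.
  bool try_acquire_until(clock::time_point deadline) {
    std::unique_lock<std::mutex> lock(mutex_);
    while (count_ == 0) {
      if (cv_.wait_until(lock, deadline) == std::cv_status::timeout
          && count_ == 0)
        return false;  // deadline passed and nothing to acquire
      // Otherwise: woken early (possibly spuriously) or a release raced
      // with the timeout; re-check the count against the same deadline.
    }
    --count_;
    return true;
  }

  // The relative form computes the deadline once and defers to the
  // absolute form.
  bool try_acquire_for(std::chrono::nanoseconds rtime) {
    return try_acquire_until(
        clock::now() + std::chrono::duration_cast<clock::duration>(rtime));
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int count_;
};
```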

It would also retain the existing behaviour of just spinning a few times and
not blocking when a duration of zero is used.

It would extend that behaviour to all timeouts of less than 100ns (or a better
magic number), on the assumption that a timeout of 10ns will actually end up
waiting for far longer than that because of the overhead of making the syscall.
So just turn all "small" timeouts into a spinloop and don't block on the futex.
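
The dispatch described above could look roughly like the following sketch. It
is a toy model, not the libstdc++ code: toy_spin_semaphore, small_timeout() and
spin_only() are hypothetical names, the 100ns threshold is the same guess as
above, and a yield-and-poll loop stands in for the blocking futex wait. Small
timeouts take the bounded 16-iteration spin and never reach the "blocking"
path; larger ones are converted to an absolute deadline once.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

constexpr int spin_iterations = 16;  // spin count mentioned above

// Hypothetical toy semaphore for illustration; not the libstdc++ code.
class toy_spin_semaphore {
public:
  using clock = std::chrono::steady_clock;

  explicit toy_spin_semaphore(int initial) : count_(initial) {
    assert(initial >= 0);
  }

  void release() noexcept { count_.fetch_add(1, std::memory_order_release); }

  bool try_acquire_for(std::chrono::nanoseconds rtime) {
    if (rtime > small_timeout())
      return try_acquire_until(
          clock::now() + std::chrono::duration_cast<clock::duration>(rtime));
    return spin_only();  // zero and other "small" timeouts never block
  }

private:
  // Assumed estimate of syscall overhead; a placeholder magic number.
  static constexpr std::chrono::nanoseconds small_timeout() {
    return std::chrono::nanoseconds(100);
  }

  // Decrement the count if it is positive, without waiting.
  bool try_acquire() noexcept {
    int old = count_.load(std::memory_order_relaxed);
    while (old > 0) {
      if (count_.compare_exchange_weak(old, old - 1,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed))
        return true;
    }
    return false;
  }

  // Bounded spin: a fixed number of attempts, then give up.
  bool spin_only() noexcept {
    for (int i = 0; i < spin_iterations; ++i)
      if (try_acquire())
        return true;
    return false;
  }

  // Stand-in for the blocking wait: poll until one absolute deadline.
  bool try_acquire_until(clock::time_point deadline) {
    do {
      if (try_acquire())
        return true;
      std::this_thread::yield();
    } while (clock::now() < deadline);
    return false;
  }

  std::atomic<int> count_;
};
```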
