On Thursday 12 November 2020 at 23:07:47 +0000, Jonathan Wakely wrote:
> On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
> > The futex system call supports waiting for an absolute time if
> > FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT.  Doing so provides two
> > benefits:
> > 
> > 1. The call to gettimeofday is not required in order to calculate a
> >   relative timeout.
> > 
> > 2. If someone changes the system clock during the wait then the futex
> >   timeout will correctly expire earlier or later.  Currently that only
> >   happens if the clock is changed prior to the call to gettimeofday.
> > 
> > According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
> > v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25.  To ensure
> > that the code still works correctly with earlier kernel versions, an ENOSYS
> > error from futex[1] results in the futex_clock_realtime_unavailable flag
> > being set.  This flag is used to avoid the unnecessary unsupported futex
> > call in the future and to fall back to the previous gettimeofday and
> > relative time implementation.
> > 
> > glibc applied an equivalent switch in pthread_cond_timedwait to use
> > FUTEX_CLOCK_REALTIME and FUTEX_WAIT_BITSET rather than FUTEX_WAIT for
> > glibc-2.10 back in 2009.  See
> > glibc:cbd8aeb836c8061c23a5e00419e0fb25a34abee7
> > 
> > The futex_clock_realtime_unavailable flag is accessed using
> > std::memory_order_relaxed to stop it becoming a bottleneck.  If the first
> > two calls to _M_futex_wait_until happen to occur simultaneously then the
> > only consequence is that both will try to use FUTEX_CLOCK_REALTIME, both
> > risk discovering that it doesn't work and, if so, both set the flag.
> > 
> > [1] This is how glibc's nptl-init.c determines whether these flags are
> >    supported.
> > 
> >     * libstdc++-v3/src/c++11/futex.cc: Add new constants for required
> >     futex flags.  Add futex_clock_realtime_unavailable flag to store
> >     result of trying to use FUTEX_CLOCK_REALTIME.
> >     (__atomic_futex_unsigned_base::_M_futex_wait_until): Try to use
> >     FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME and only fall back to
> >     using gettimeofday and FUTEX_WAIT if that's not supported.
> 
> Mike,
> 
> I've been doing some performance comparisons and this patch seems to
> make quite a big difference to code that polls a future by calling
> fut.wait_until(t) using any t < now() as the timeout. For example,
> fut.wait_until(chrono::system_clock::time_point{}) to wait until the
> UNIX epoch.
> 
> With GCC 10 (or with the if (!futex_clock_realtime_unavailable.load(...))
> test commented out) I see that polling takes < 100ns. With the change, it
> takes 3000ns or more.
> 
> Now this is still far better than polling using fut.wait_for(0s) which
> takes around 50000ns due to the clock_gettime call, but I'm about to
> fix that.
> 
> I'm not sure how important it is for wait_until(past) to be fast, but
> the difference from 100ns to 3000ns seems significant. Do you see the
> same kind of numbers? Is this just a property of the futex wait with
> an absolute time?
> 
> N.B. using wait_until(system_clock::time_point::min()) or any other
> time before the epoch doesn't work. The futex syscall returns EINVAL
> which we don't check for. I'm about to fix that too.

I see similar behaviour. I suppose this is because the
gettimeofday/clock_gettime system calls are in the VDSO and therefore
usually much cheaper to call than the real system call SYS_futex.

If, rather than bailing out early when the relative timeout is negative, I
call the relative SYS_futex with rt.tv_sec = rt.tv_nsec = 0, then the
wait_until call takes about ten times longer than when using the absolute
SYS_futex. I can't really explain that.
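A minimal standalone sketch of that three-way comparison, assuming Linux and the raw syscall(SYS_futex, ...) interface. The futex word g_futex_word and the helpers wait_absolute_epoch, wait_relative_zero and ns_per_call are hypothetical names for illustration; nothing ever wakes the word, so each wait should fail straight away with ETIMEDOUT. No timings are claimed here; they depend on the machine and kernel.

```cpp
// Linux-only sketch: per-call cost of a vDSO clock_gettime() versus two
// futex waits whose timeout has already expired. The futex word is never
// woken, so each wait is expected to fail immediately with ETIMEDOUT.
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <chrono>
#include <ctime>

namespace {
int g_futex_word = 0;  // hypothetical futex word; matches the expected value 0

// Absolute wait until the UNIX epoch on CLOCK_REALTIME (the patched path).
// Returns the errno value of the failed call, or 0 if it somehow succeeded.
int wait_absolute_epoch() {
  timespec epoch{0, 0};
  long r = syscall(SYS_futex, &g_futex_word,
                   FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME, 0, &epoch,
                   nullptr, FUTEX_BITSET_MATCH_ANY);
  return r == -1 ? errno : 0;
}

// Relative wait with tv_sec = tv_nsec = 0 (the FUTEX_WAIT style of call).
int wait_relative_zero() {
  timespec zero{0, 0};
  long r = syscall(SYS_futex, &g_futex_word, FUTEX_WAIT, 0, &zero, nullptr, 0);
  return r == -1 ? errno : 0;
}

// Average wall-clock cost of one call to f, in nanoseconds.
template <typename F>
double ns_per_call(F f, int iterations) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) f();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count() /
         iterations;
}
}  // namespace
```

Calling ns_per_call(wait_absolute_epoch, 100000), ns_per_call(wait_relative_zero, 100000) and ns_per_call on a lambda that calls clock_gettime(CLOCK_REALTIME, ...) from main would print the three numbers side by side.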

Calling these functions with a time in the past is probably quite common if
you calculate a single timeout for several operations in sequence. What's
less clear is whether the performance matters that much when the return
value indicates a timeout anyway.
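To make that pattern concrete, here is a small sketch using only the standard std::future API. The helper count_ready_by is hypothetical: it waits on several futures against one shared system_clock deadline, so once the deadline has passed every remaining wait_until call is a poll with a time in the past.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper: wait for each future against a single shared
// deadline and count how many were ready by then. After the deadline
// passes, each further wait_until(deadline) is effectively a poll.
std::size_t count_ready_by(std::vector<std::future<int>>& futures,
                           std::chrono::system_clock::time_point deadline) {
  std::size_t ready = 0;
  for (auto& f : futures) {
    if (f.wait_until(deadline) == std::future_status::ready) ++ready;
  }
  return ready;
}
```

With one satisfied and one unsatisfied promise and a deadline a second in the past, only the satisfied future counts as ready; the other wait_until call returns future_status::timeout, which is exactly the call whose cost is in question.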

If gettimeofday/clock_gettime are cheap enough then I suppose we can call
them even in the absolute timeout case (losing benefit 1 above, which
appears not to really exist) to get the improved performance for timeouts
in the past, whilst retaining the correct behaviour when the clock is
warped, which is what this patch addressed (benefit 2 above).
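A rough sketch of that idea, assuming Linux and raw futex syscalls; it is not the libstdc++ implementation. The names futex_wait_until_realtime, before and realtime_unavailable are hypothetical (the flag is modelled on the patch's futex_clock_realtime_unavailable). CLOCK_REALTIME is read first so a past deadline, including a pre-epoch one that the kernel would reject with EINVAL, returns without entering the kernel; future deadlines still use the absolute wait so clock changes are honoured.

```cpp
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <atomic>
#include <cassert>
#include <cerrno>
#include <ctime>

namespace {
// Hypothetical flag, analogous to futex_clock_realtime_unavailable.
std::atomic<bool> realtime_unavailable{false};

// True if a is strictly earlier than b.
bool before(const timespec& a, const timespec& b) {
  return a.tv_sec < b.tv_sec || (a.tv_sec == b.tv_sec && a.tv_nsec < b.tv_nsec);
}
}  // namespace

// Waits while *addr == val until the CLOCK_REALTIME deadline abs.
// Returns false on timeout; true otherwise (woken, value already changed,
// or interrupted), in which case the caller re-checks its condition.
bool futex_wait_until_realtime(unsigned* addr, unsigned val,
                               const timespec& abs) {
  timespec now;
  clock_gettime(CLOCK_REALTIME, &now);  // usually served by the vDSO
  if (!before(now, abs)) return false;  // deadline already passed

  if (!realtime_unavailable.load(std::memory_order_relaxed)) {
    long r = syscall(SYS_futex, addr, FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME,
                     val, &abs, nullptr, FUTEX_BITSET_MATCH_ANY);
    if (r == -1 && errno == ENOSYS)
      realtime_unavailable.store(true, std::memory_order_relaxed);  // old kernel
    else
      return !(r == -1 && errno == ETIMEDOUT);
  }

  // Fallback: relative FUTEX_WAIT using the time read above.
  timespec rel{abs.tv_sec - now.tv_sec, abs.tv_nsec - now.tv_nsec};
  if (rel.tv_nsec < 0) {
    rel.tv_nsec += 1000000000;
    --rel.tv_sec;
  }
  long r = syscall(SYS_futex, addr, FUTEX_WAIT, val, &rel, nullptr, 0);
  return !(r == -1 && errno == ETIMEDOUT);
}
```

The trade-off is one extra clock_gettime per wait with a future deadline, in exchange for skipping SYS_futex entirely when the deadline is already in the past.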

I'll try to come up with some standalone test cases with results for
further discussion. I suspect that the glibc people will be interested too.

Thanks for investigating this.

Mike.
