https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99277

--- Comment #16 from Thomas Rodgers <rodgertq at gcc dot gnu.org> ---
(In reply to Thiago Macieira from comment #15)
> > >  5) std::barrier implementation also uses a type that futex(2) can't 
> > > handle
> 
> > barrier still uses a 1-byte enum for the atomic waits.
> 
> That can only now be fixed for libstdc++.so.7, then.

The original implementation came from Olivier Giroux and is part of libc++. The
libc++ implementation also does not use a type that futex or ulock_wait/wake
(uint64_t) can handle. I have discussed this in the past with Olivier; the
choice of char was deliberate on his part. The implementation has been tested
on a number of platforms (including time on ORNL's Summit). The following
comment, preserved from libc++, should be considered carefully before any
change here:

" 2. A great deal of attention has been paid to avoid cache line thrashing
    by flattening the tree structure into cache-line sized arrays, that
    are indexed in an efficient way."

It is my opinion that the bar for making a change here is high. I would need to
see benchmark numbers that weigh the performance differences under various
contention scenarios against the cache impact of fitting the entire tree in a
single cache line using char, versus four or eight cache lines using the type
favored by futex or ulock_wait/wake.
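
To make the arithmetic behind "one vs four or eight cache lines" concrete,
here is a minimal sketch. It assumes a 64-byte cache line and a hypothetical
64-slot flattened state array; the names kCacheLine, kSlots and
cache_lines_for are illustrative only and are not taken from the libstdc++ or
libc++ sources.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (common on x86-64, not universal).
constexpr std::size_t kCacheLine = 64;

// Assumption: hypothetical number of state slots in the flattened tree.
constexpr std::size_t kSlots = 64;

// Number of cache lines needed to hold kSlots elements of type T,
// rounding up to a whole line.
template <typename T>
constexpr std::size_t cache_lines_for() {
    return (kSlots * sizeof(T) + kCacheLine - 1) / kCacheLine;
}

// 1-byte state: the whole array fits in a single line.
static_assert(cache_lines_for<unsigned char>() == 1, "char: one line");
// 4-byte state (the width Linux futex operates on): four lines.
static_assert(cache_lines_for<std::uint32_t>() == 4, "uint32_t: four lines");
// 8-byte state (uint64_t, as mentioned for ulock_wait/wake): eight lines.
static_assert(cache_lines_for<std::uint64_t>() == 8, "uint64_t: eight lines");
```

This only shows the footprint side of the trade-off. Whether the extra lines
cost more than a proxy wait on a 1-byte type is exactly what the benchmarks
would have to answer.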
