Hi Paul,

I wrote:
> > The process had many threads active.
> 
> It should use 11 threads. You didn't see 100 threads, right?

Looking at the test's code, it should use 21 threads.


1) I measured the execution time of this test, on various distributions,
with various kernels, and various numbers of CPUs (in VirtualBox).

In Pop OS 22.04, Linux 6.8.0 PREEMPT

       1 CPU                      8 CPUs
       2 sec                     53 sec

In Pop OS 22.04, Linux 6.9.3 PREEMPT

       1 CPU   2 CPUs   4 CPUs    8 CPUs        10 CPUs
       2 sec   1.2 sec  2.4 sec  51 sec        103 sec
                                 real  51 sec  real 103 sec
                                 user  41 sec  user 121 sec
                                 sys  211 sec  sys  661 sec

In Ubuntu 22.04, Linux 5.15.0

       1 CPU    2 CPUs    4 CPUs    8 CPUs
       0.9 sec  0.5 sec  14 sec    65 sec
                                   real  65 sec
                                   user 111 sec
                                   sys  282 sec

In Ubuntu 24.04, Linux 6.8.0

       1 CPU    2 CPUs    4 CPUs    8 CPUs
       2.1 sec  1.3 sec   3 sec    58 sec
                                   real  58 sec
                                   user  36 sec
                                   sys  270 sec

So, clearly, this test takes a long time when many CPUs are available, and
the slowdown is not tied to a particular kernel version.


2) Gnulib has various implementations of locks, and the unit tests are
all similar. So I compared, on Ubuntu 24.04 with 8 CPUs:

test-pthread-mutex    0.6 sec
test-pthread-rwlock  48 sec
test-lock             4 sec
test-rwlock1          0.2..0.4 sec
test-mtx              0.6 sec

So, the Gnulib rwlocks are fast, but the glibc rwlocks are slow. What's
the difference?

The difference is that Gnulib tests, at configure time (in
m4/pthread_rwlock_rdlock.m4), whether the system's rwlocks prefer writers
and, if not, uses a different implementation. On glibc, the Gnulib rwlocks
use the libc's functions, just with a different initializer:
  PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP
instead of
  PTHREAD_RWLOCK_INITIALIZER.

And indeed, when I modify test-pthread-rwlock to use
  PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP
instead of
  PTHREAD_RWLOCK_INITIALIZER,
it executes fast:
test-pthread-rwlock modified     0.3 sec


3) This topic has been discussed in the glibc bug
https://sourceware.org/bugzilla/show_bug.cgi?id=13701
where I have raised my voice for a writer-preferring implementation.

The request was turned down by Torvald Riegel with two arguments:
  * That a writer-preferring implementation would go against Riegel's
    new "scalable" implementation of rwlocks. [1]
  * That correctly handling different priorities would be difficult to
    implement, and that therefore nothing should be changed even for the
    case of equal priorities (as here). [2]

The argument [1] does not make sense to me in view of the timings above.
The argument [2] never made sense (to me at least).


4) The time to log in and shut down (more precisely: from boot to the
login screen, and from the shutdown command to VM termination) is also
noticeably long with 8 or 10 CPUs, but not with fewer. It could be caused
by this rwlock problem, or by the kernel's scheduler; I don't know.


So, in summary, it's a glibc bug that has been closed as "WORKSFORME" and
will never be fixed [3].

In the test-pthread-rwlock test, we cannot just use
PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP, because the *purpose* of
the test is to check the behaviour of the rwlocks with the POSIX-specified
API, not with some alternative API.

Bruno

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=13701#c7
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=13701#c3
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=13701#c14
