Hi Paul,

I wrote:
> > The process had many threads active.
>
> It should use 11 threads. You didn't see 100 threads, right?

Looking at the test's code, it should use 21 threads.

1) I measured the execution time of this test, on various distributions,
with various kernels, and various numbers of CPUs (in VirtualBox).

In Pop OS 22.04, Linux 6.8.0 PREEMPT
  1 CPU     8 CPUs
  2 sec     53 sec

In Pop OS 22.04, Linux 6.9.3 PREEMPT
  1 CPU     2 CPUs    4 CPUs    8 CPUs         10 CPUs
  2 sec     1.2 sec   2.4 sec   51 sec         103 sec
                                real  51 sec   real 103 sec
                                user  41 sec   user 121 sec
                                sys  211 sec   sys  661 sec

In Ubuntu 22.04, Linux 5.15.0
  1 CPU     2 CPUs    4 CPUs    8 CPUs
  0.9 sec   0.5 sec   14 sec    65 sec
                                real  65 sec
                                user 111 sec
                                sys  282 sec

In Ubuntu 24.04, Linux 6.8.0
  1 CPU     2 CPUs    4 CPUs    8 CPUs
  2.1 sec   1.3 sec   3 sec     58 sec
                                real  58 sec
                                user  36 sec
                                sys  270 sec

So, clearly, this test takes a long time with many CPUs, and this is not
specific to a particular kernel version.

2) Gnulib has various implementations of locks, and the unit tests are all
similar. So I compared, on Ubuntu 24.04 with 8 CPUs:

  test-pthread-mutex     0.6 sec
  test-pthread-rwlock    48 sec
  test-lock              4 sec
  test-rwlock1           0.2..0.4 sec
  test-mtx               0.6 sec

So, the Gnulib rwlocks are fast, but the glibc rwlocks are slow.

What's the difference? The difference is that the Gnulib rwlocks test
whether the rwlocks prefer writers (at configure time:
m4/pthread_rwlock_rdlock.m4) and, if not, use a different implementation.
On glibc, the Gnulib rwlocks use the libc's functions, just with a different
initializer: PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP instead of
PTHREAD_RWLOCK_INITIALIZER.

And indeed, when I modify test-pthread-rwlock to use
PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP instead of
PTHREAD_RWLOCK_INITIALIZER, it executes fast:

  test-pthread-rwlock modified    0.3 sec

3) This topic has been discussed in the glibc bug
https://sourceware.org/bugzilla/show_bug.cgi?id=13701
where I have raised my voice for a writer-preferring implementation.
It was turned down by Torvald Riegel with two arguments:
  * That a writer-preferring implementation would go against Riegel's
    new "scalable" implementation of rwlocks [1].
  * That implementing the handling of different priorities was difficult
    and that, therefore, nothing should be changed for the case of equal
    priorities (as here) either. [2]
The argument [1] does not make sense to me in view of the timings above.
The argument [2] never made sense (to me at least).

4) The time to log in and to shut down (more precisely, from boot to the
login screen, and from the shutdown command to VM termination) is pretty
slow with 8 or 10 CPUs, but not with few CPUs. It could be caused by this
rwlock problem, or by the kernel's scheduler; I don't know.

So, in summary, it's a glibc bug that has been closed as "WORKSFORME" and
will never be fixed [3].

In the test-pthread-rwlock test, we cannot just use
PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP, because the *purpose*
of the test is to check the behaviour of the rwlocks with the
POSIX-specified API, not with some alternative API.

Bruno

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=13701#c7
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=13701#c3
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=13701#c14
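
For anyone who wants to reproduce the comparison from 2), here is a minimal
sketch of the two kinds of glibc rwlock involved. It is an illustration
under stated assumptions, not the code of the Gnulib test: the helper names
(default_rwlock_kind, init_writer_preferring, exercise) are made up for this
example. Instead of the static initializer
PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP it goes through the
attribute API (pthread_rwlockattr_setkind_np with
PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP), which should select the same
lock kind on glibc. It is glibc-specific and does not measure anything by
itself.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* glibc: make sure the *_NP names are visible */
#endif
#include <assert.h>
#include <pthread.h>

/* A lock set up the POSIX way, as in the unmodified test.  */
static pthread_rwlock_t posix_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Return the lock kind carried by a default-initialized attribute object.
   On glibc this is expected to be PTHREAD_RWLOCK_DEFAULT_NP, which is
   defined as PTHREAD_RWLOCK_PREFER_READER_NP.  */
static int
default_rwlock_kind (void)
{
  pthread_rwlockattr_t attr;
  int kind = -1;
  int err = pthread_rwlockattr_init (&attr);
  assert (err == 0);
  err = pthread_rwlockattr_getkind_np (&attr, &kind);
  assert (err == 0);
  pthread_rwlockattr_destroy (&attr);
  return kind;
}

/* Hypothetical helper: initialize *LOCK as a writer-preferring,
   non-recursive rwlock.  Returns 0 or an errno value.  */
static int
init_writer_preferring (pthread_rwlock_t *lock)
{
  pthread_rwlockattr_t attr;
  int err = pthread_rwlockattr_init (&attr);
  if (err != 0)
    return err;
  err = pthread_rwlockattr_setkind_np
          (&attr, PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
  if (err == 0)
    err = pthread_rwlock_init (lock, &attr);
  pthread_rwlockattr_destroy (&attr);
  return err;
}

/* Take and release LOCK once for reading and once for writing.
   Returns 0 or the first errno value encountered.  */
static int
exercise (pthread_rwlock_t *lock)
{
  int err;
  if ((err = pthread_rwlock_rdlock (lock)) != 0)
    return err;
  if ((err = pthread_rwlock_unlock (lock)) != 0)
    return err;
  if ((err = pthread_rwlock_wrlock (lock)) != 0)
    return err;
  return pthread_rwlock_unlock (lock);
}
```

Replacing init_writer_preferring (&lock) with pthread_rwlock_init (&lock, NULL)
gives the POSIX-default lock, which is the variant the test is meant to check.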