Re: [PATCH v2 5/5] locking/rwsem: Remove reader optimistic spinning
On 12/8/20 5:07 AM, Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 11:14:16PM -0500, Waiman Long wrote:
>> @@ -1032,40 +901,16 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
>>  	 *
>>  	 * We can take the read lock directly without doing
>>  	 * rwsem_optimistic_spin() if the conditions are right.
>
> This comment no longer makes sense..

You are right. I forgot to take that out.

>> -	 * Also wake up other readers if it is the first reader.
>>  	 */
>> -	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF)) &&
>> -	    rwsem_no_spinners(sem)) {
>> +	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF))) {
>>  		rwsem_set_reader_owned(sem);
>>  		lockevent_inc(rwsem_rlock_steal);
>> -		if (rcnt == 1)
>> -			goto wake_readers;
>> -		return sem;
>> -	}
>>
>> -	/*
>> -	 * Save the current read-owner of rwsem, if available, and the
>> -	 * reader nonspinnable bit.
>> -	 */
>> -	waiter.last_rowner = owner;
>> -	if (!(waiter.last_rowner & RWSEM_READER_OWNED))
>> -		waiter.last_rowner &= RWSEM_RD_NONSPINNABLE;
>>
>> -	if (!rwsem_can_spin_on_owner(sem, RWSEM_RD_NONSPINNABLE))
>> -		goto queue;
>>
>> -	/*
>> -	 * Undo read bias from down_read() and do optimistic spinning.
>> -	 */
>> -	atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
>> -	adjustment = 0;
>> -	if (rwsem_optimistic_spin(sem, false)) {
>
> since we're removing the optimistic spinning entirely on the read side.
>
> Also, I was looking at skipping patch #4, which mucks with the reader
> wakeup logic, and afaict this removal doesn't really depend on it. Or
> am I missing something?

That is true. Patch 4 isn't essential for this series. So if you are
generally OK with the current patchset, I can send out v3 that removes
patch 4 and makes your suggested change above.

Cheers,
Longman
Re: [PATCH v2 5/5] locking/rwsem: Remove reader optimistic spinning
On Fri, Nov 20, 2020 at 11:14:16PM -0500, Waiman Long wrote:
> @@ -1032,40 +901,16 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
> 	 *
> 	 * We can take the read lock directly without doing
> 	 * rwsem_optimistic_spin() if the conditions are right.

This comment no longer makes sense..

> -	 * Also wake up other readers if it is the first reader.
> 	 */
> -	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF)) &&
> -	    rwsem_no_spinners(sem)) {
> +	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF))) {
> 		rwsem_set_reader_owned(sem);
> 		lockevent_inc(rwsem_rlock_steal);
> -		if (rcnt == 1)
> -			goto wake_readers;
> -		return sem;
> -	}
>
> -	/*
> -	 * Save the current read-owner of rwsem, if available, and the
> -	 * reader nonspinnable bit.
> -	 */
> -	waiter.last_rowner = owner;
> -	if (!(waiter.last_rowner & RWSEM_READER_OWNED))
> -		waiter.last_rowner &= RWSEM_RD_NONSPINNABLE;
>
> -	if (!rwsem_can_spin_on_owner(sem, RWSEM_RD_NONSPINNABLE))
> -		goto queue;
>
> -	/*
> -	 * Undo read bias from down_read() and do optimistic spinning.
> -	 */
> -	atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
> -	adjustment = 0;
> -	if (rwsem_optimistic_spin(sem, false)) {

since we're removing the optimistic spinning entirely on the read side.

Also, I was looking at skipping patch #4, which mucks with the reader
wakeup logic, and afaict this removal doesn't really depend on it. Or
am I missing something?
Re: [PATCH v2 5/5] locking/rwsem: Remove reader optimistic spinning
On Fri, 20 Nov 2020, Waiman Long wrote:

> Reader optimistic spinning is helpful when the reader critical section
> is short and there aren't that many readers around. It also improves
> the chance that a reader can get the lock, as writer optimistic
> spinning disproportionally favors writers much more than readers.
>
> Since commit d3681e269fff ("locking/rwsem: Wake up almost all readers
> in wait queue"), all the waiting readers are woken up so that they can
> all get the read lock and run in parallel. When the number of
> contending readers is large, allowing reader optimistic spinning will
> likely cause reader fragmentation where multiple smaller groups of
> readers can get the read lock in a sequential manner separated by
> writers. That reduces reader parallelism.
>
> One possible way to address that drawback is to limit the number of
> readers (preferably one) that can do optimistic spinning. These
> readers act as representatives of all the waiting readers in the wait
> queue, as they will wake up all those waiting readers once they get
> the lock.
>
> Alternatively, as reader optimistic lock stealing has already enhanced
> fairness to readers, it may be easier to just remove reader optimistic
> spinning and simplify the optimistic spinning code as a result.
> Performance measurements (locking throughput in kops/s) using a
> locking microbenchmark with a 50/50 reader/writer distribution and
> turbo-boost disabled were done on a 2-socket Cascade Lake system
> (48-core 96-thread) to see the impact of these changes:
>
>  1) Vanilla     - 5.10-rc3 kernel
>  2) Before      - 5.10-rc3 kernel with previous patches in this series
>  3) limit-rspin - 5.10-rc3 kernel with limited reader spinning patch
>  4) no-rspin    - 5.10-rc3 kernel with reader spinning disabled
>
>  # of threads  CS Load  Vanilla  Before  limit-rspin  no-rspin
>  ------------  -------  -------  ------  -----------  --------
>        2          1      5,185    5,662     5,214       5,077
>        4          1      5,107    4,983     5,188       4,760
>        8          1      4,782    4,564     4,720       4,628
>       16          1      4,680    4,053     4,567       3,402
>       32          1      4,299    1,115     1,118       1,098
>       64          1      3,218      983     1,001         957
>       96          1      1,938      944       957         930
>        2         20      2,008    2,128     2,264       1,665
>        4         20      1,390    1,033     1,046       1,101
>        8         20      1,472    1,155     1,098       1,213
>       16         20      1,332    1,077     1,089       1,122
>       32         20        967      914       917         980
>       64         20        787      874       891         858
>       96         20        730      836       847         844
>        2        100        372      356       360         355
>        4        100        492      425       434         392
>        8        100        533      537       529         538
>       16        100        548      572       568         598
>       32        100        499      520       527         537
>       64        100        466      517       526         512
>       96        100        406      497       506         509
>
> The column "CS Load" represents the number of pause instructions
> issued in the locking critical section. A CS load of 1 is extremely
> short and is not likely in real situations. Loads of 20 (moderate) and
> 100 (long) are more realistic.
>
> It can be seen that the previous patches in this series reduce
> performance in general, except in highly contended cases with moderate
> or long critical sections, where performance improves a bit. This
> change is mostly caused by the "Prevent potential lock starvation"
> patch, which reduces reader optimistic spinning and hence reader
> fragmentation.
>
> The patch that further limits reader optimistic spinning doesn't seem
> to have much impact on overall performance, as shown in the benchmark
> data. The patch that disables reader optimistic spinning shows reduced
> performance in lightly loaded cases, but comparable or slightly better
> performance with heavier contention.
> This patch just removes reader optimistic spinning for now. As readers
> are not going to do optimistic spinning anymore, we don't need to
> consider if the OSQ is empty or not when doing lock stealing.
>
> Signed-off-by: Waiman Long

Reviewed-by: Davidlohr Bueso
[PATCH v2 5/5] locking/rwsem: Remove reader optimistic spinning
Reader optimistic spinning is helpful when the reader critical section
is short and there aren't that many readers around. It also improves
the chance that a reader can get the lock, as writer optimistic
spinning disproportionally favors writers much more than readers.

Since commit d3681e269fff ("locking/rwsem: Wake up almost all readers
in wait queue"), all the waiting readers are woken up so that they can
all get the read lock and run in parallel. When the number of
contending readers is large, allowing reader optimistic spinning will
likely cause reader fragmentation where multiple smaller groups of
readers can get the read lock in a sequential manner separated by
writers. That reduces reader parallelism.

One possible way to address that drawback is to limit the number of
readers (preferably one) that can do optimistic spinning. These
readers act as representatives of all the waiting readers in the wait
queue, as they will wake up all those waiting readers once they get
the lock.

Alternatively, as reader optimistic lock stealing has already enhanced
fairness to readers, it may be easier to just remove reader optimistic
spinning and simplify the optimistic spinning code as a result.
Performance measurements (locking throughput in kops/s) using a
locking microbenchmark with a 50/50 reader/writer distribution and
turbo-boost disabled were done on a 2-socket Cascade Lake system
(48-core 96-thread) to see the impact of these changes:

 1) Vanilla     - 5.10-rc3 kernel
 2) Before      - 5.10-rc3 kernel with previous patches in this series
 3) limit-rspin - 5.10-rc3 kernel with limited reader spinning patch
 4) no-rspin    - 5.10-rc3 kernel with reader spinning disabled

 # of threads  CS Load  Vanilla  Before  limit-rspin  no-rspin
 ------------  -------  -------  ------  -----------  --------
       2          1      5,185    5,662     5,214       5,077
       4          1      5,107    4,983     5,188       4,760
       8          1      4,782    4,564     4,720       4,628
      16          1      4,680    4,053     4,567       3,402
      32          1      4,299    1,115     1,118       1,098
      64          1      3,218      983     1,001         957
      96          1      1,938      944       957         930
       2         20      2,008    2,128     2,264       1,665
       4         20      1,390    1,033     1,046       1,101
       8         20      1,472    1,155     1,098       1,213
      16         20      1,332    1,077     1,089       1,122
      32         20        967      914       917         980
      64         20        787      874       891         858
      96         20        730      836       847         844
       2        100        372      356       360         355
       4        100        492      425       434         392
       8        100        533      537       529         538
      16        100        548      572       568         598
      32        100        499      520       527         537
      64        100        466      517       526         512
      96        100        406      497       506         509

The column "CS Load" represents the number of pause instructions
issued in the locking critical section. A CS load of 1 is extremely
short and is not likely in real situations. Loads of 20 (moderate) and
100 (long) are more realistic.

It can be seen that the previous patches in this series reduce
performance in general, except in highly contended cases with moderate
or long critical sections, where performance improves a bit. This
change is mostly caused by the "Prevent potential lock starvation"
patch, which reduces reader optimistic spinning and hence reader
fragmentation.

The patch that further limits reader optimistic spinning doesn't seem
to have much impact on overall performance, as shown in the benchmark
data. The patch that disables reader optimistic spinning shows reduced
performance in lightly loaded cases, but comparable or slightly better
performance with heavier contention.
This patch just removes reader optimistic spinning for now. As readers
are not going to do optimistic spinning anymore, we don't need to
consider if the OSQ is empty or not when doing lock stealing.

Signed-off-by: Waiman Long
---
 kernel/locking/lock_events_list.h |   5 +-
 kernel/locking/rwsem.c            | 278 +-
 2 files changed, 48 insertions(+), 235 deletions(-)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 270a0d351932..97fb6f3f840a 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -56,12 +56,9 @@ LOCK_EVENT(rwsem_sleep_reader)	/* # of reader sleeps	*/
 LOCK_EVENT(rwsem_sleep_writer)	/* # of writer sleeps	*/
 LOCK_EVENT(rwsem_wake_reader)	/* # of reader wakeups	*/