On 04/12/2019 02:05 PM, Waiman Long wrote:
> On 04/12/2019 12:41 PM, Ingo Molnar wrote:
>>
>> So beyond the primary constraint of PeterZ OK-ing it all, there's also
>> these two scalability regression reports from the ktest bot:
>>
>>   [locking/rwsem] 1b94536f2d: stress-ng.bad-altstack.ops_per_sec -32.7% regression
>
> A regression due to the lock handoff patch is kind of expected, but I
> will look into why there is such a large drop.
I don't have a high core count system on hand, so I ran the stress-ng
tests on a 2-socket 40-core 80-thread Skylake system with the
following kernels:

 1) Before the lock handoff patch
 2) After the lock handoff patch
 3) After the wake-all-reader patch
 4) After the reader-spin-on-writer patch
 5) After the writer-spin-on-reader patch

  Tests             K1      K2      K3      K4      K5
  -----             --      --      --      --      --
  bad-altstack   39928   35807   36422   40062   40747
  stackmmap        187     365     435     255     198
  vm            309589  296097  262045  281974  310439
  vm-segv       113776  114058  112318  115422  110550

Here, bad-altstack dropped 10% after the lock handoff patch, but the
performance was recovered by the later patches. The stackmmap results
don't look quite right, as the numbers are much smaller than those in
the report. I will rerun the tests when I acquire a high core count
system. In any case, the lock handoff patch is expected to reduce
throughput under heavy contention.

>> [locking/rwsem] adc32e8877: will-it-scale.per_thread_ops -21.0% regression
>
> Will look into that also.

I can reproduce the regression on the same Skylake system. The results
of the page_fault1 will-it-scale test are as follows:

  Threads        K2         K3         K4         K5
  -------        --         --         --         --
    20      5549772    5550332    5463961    5400064
    40      9540445   10286071    9705062    7706082
    60      8187245    8212307    7777247    6647705
    89      8390758    9619271    9019454    7124407

So the wake-all-reader patch is good for this benchmark. Performance
was reduced a bit by the reader-spin-on-writer patch, and got even
worse with the writer-spin-on-reader patch. In the perf output, rwsem
contention accounted for less than 1% of the total CPU cycles, so I
believe the regression was caused by the behavior change introduced by
the two reader optimistic spinning patches: they make writers less
preferred than before, and the performance of this microbenchmark may
be more dependent on writer throughput.
Looking at the lock event counts for K5:

  rwsem_opt_fail=253647
  rwsem_opt_nospin=8776
  rwsem_opt_rlock=259941
  rwsem_opt_wlock=2543
  rwsem_rlock=237747
  rwsem_rlock_fail=0
  rwsem_rlock_fast=0
  rwsem_rlock_handoff=0
  rwsem_sleep_reader=237747
  rwsem_sleep_writer=23098
  rwsem_wake_reader=6033
  rwsem_wake_writer=47032
  rwsem_wlock=15890
  rwsem_wlock_fail=10
  rwsem_wlock_handoff=3991

For K4, it was:

  rwsem_opt_fail=479626
  rwsem_opt_rlock=8877
  rwsem_opt_wlock=114
  rwsem_rlock=453874
  rwsem_rlock_fail=0
  rwsem_rlock_fast=1234
  rwsem_rlock_handoff=0
  rwsem_sleep_reader=453058
  rwsem_sleep_writer=25836
  rwsem_wake_reader=11054
  rwsem_wake_writer=71568
  rwsem_wlock=24515
  rwsem_wlock_fail=3
  rwsem_wlock_handoff=5245

It can be seen that a lot more readers got the lock via optimistic
spinning in K5. One possibility is that reader optimistic spinning
causes the readers to spread out into more lock acquisition groups
than before. The K3 results show that grouping more readers into a
single lock acquisition group helps this microbenchmark. I will need
to run more tests to find the root cause of this regression; it is not
an easy problem to solve.

In the meantime, I am going to send out an updated patchset tomorrow
so that Peter can review the patches again when he is available.

Cheers,
Longman