rwsem: optimize rwsem_wakeup()

Waiman Long Thu, 30 Apr 2015 14:14:38 -0700

 v3->v4:
   - Break out the active writer check into a separate patch and move
     it from __rwsem_do_wake() to rwsem_wake().
   - Use smp_rmb() instead of the incorrect smp_mb__after_atomic() as
     suggested by PeterZ.


 v2->v3:
   - Fix errors in commit log.

 v1->v2:
   - Add a memory barrier before calling spin_trylock for proper memory
     ordering.

This patch set aims to reduce spinlock contention in the wait_lock
due to excessive activity in the rwsem_wake code path. This, in turn,
reduces up_write/up_read latency and improve performance when the
rwsem is heavily contended.

On an 8-socket Westmere-EX server (80 cores, HT off), running AIM7's
high_systime workload (1000 users) on a vanilla 4.0 kernel produced
the following perf profile for spinlock contention:

  9.23%    reaim  [kernel.kallsyms]    [k] _raw_spin_lock_irqsave
              |--97.39%-- rwsem_wake
              |--0.69%-- try_to_wake_up
              |--0.52%-- release_pages
               --1.40%-- [...]

  1.70%    reaim  [kernel.kallsyms]    [k] _raw_spin_lock_irq
              |--96.61%-- rwsem_down_write_failed
              |--2.03%-- __schedule
              |--0.50%-- run_timer_softirq
               --0.86%-- [...]

Here the contended rwsems are the mmap_sem (mm_struct) and the
i_mmap_rwsem (address_space) with mostly write locking.  With a
patched 4.0 kernel, the perf profile became:

  1.87%    reaim  [kernel.kallsyms]    [k] _raw_spin_lock_irqsave
              |--87.64%-- rwsem_wake
              |--2.80%-- release_pages
              |--2.56%-- try_to_wake_up
              |--1.10%-- __wake_up
              |--1.06%-- pagevec_lru_move_fn
              |--0.93%-- prepare_to_wait_exclusive
              |--0.71%-- free_pid
              |--0.58%-- get_page_from_freelist
              |--0.57%-- add_device_randomness
               --2.04%-- [...]

  0.80%    reaim  [kernel.kallsyms]    [k] _raw_spin_lock_irq
              |--92.49%-- rwsem_down_write_failed
              |--4.24%-- __schedule
              |--1.37%-- run_timer_softirq
               --1.91%-- [...]

The table below shows the % improvement in throughput (1100-2000 users)
in the various AIM7's workloads:

        Workload        % increase in throughput
        --------        ------------------------
        custom                   3.8%
        five-sec                 3.5%
        fserver                  4.1%
        high_systime            22.2%
        shared                   2.1%
        short                   10.1%

Waiman Long (2):
  locking/rwsem: reduce spinlock contention in wakeup after
    up_read/up_write
  locking/rwsem: check for active writer before wakeup

 include/linux/osq_lock.h    |    5 +++
 kernel/locking/rwsem-xadd.c |   65 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 68 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 0/2] locking/rwsem: optimize rwsem_wakeup()

Reply via email to