Re: [PATCH v2] signals: Avoid unnecessary taking of sighand->siglock

2016-09-26 Thread Waiman Long

On 09/23/2016 03:43 PM, Stas Sergeev wrote:

On 23.09.2016 19:56, Waiman Long wrote:

When running a certain database workload on a high-end system with many
CPUs, it was found that spinlock contention in the sigprocmask syscalls
became a significant portion of the overall CPU cycles as shown below.

Hi, I was recently facing the same problem, and my solution
was to extract swapcontext() from libtask - it has better semantics
and does not call sigprocmask(). However much you hack sigprocmask,
it is still faster to just not call it at all.
Alternatively, perhaps a speed-up can be achieved if the
current mask is exported to glibc via the vdso.
Just my 2 cents.


The problem was in third-party software not under our control. I am 
just doing my part to try to alleviate the problem from the kernel's 
perspective.


Cheers,
Longman


Re: [PATCH v2] signals: Avoid unnecessary taking of sighand->siglock

2016-09-26 Thread Oleg Nesterov
On 09/23, Waiman Long wrote:
>
>  
> + /*
> +  * In case the signal mask hasn't changed, we won't need to take
> +  * the lock. The current blocked mask can be modified by other CPUs.
> +  * To be safe, we need to do an atomic read without lock. As a result,
> +  * this check will only be done on 64-bit architectures.
> +  */
> + if ((_NSIG_WORDS == 1) &&
> + (READ_ONCE(tsk->blocked.sig[0]) == newset->sig[0]))
> + return;

so in case you missed my reply to V1, I still think that the comment is wrong
and you should drop the _NSIG_WORDS check.

Oleg.



Re: [PATCH v2] signals: Avoid unnecessary taking of sighand->siglock

2016-09-23 Thread Stas Sergeev

On 23.09.2016 19:56, Waiman Long wrote:

When running a certain database workload on a high-end system with many
CPUs, it was found that spinlock contention in the sigprocmask syscalls
became a significant portion of the overall CPU cycles as shown below.

Hi, I was recently facing the same problem, and my solution
was to extract swapcontext() from libtask - it has better semantics
and does not call sigprocmask(). However much you hack sigprocmask,
it is still faster to just not call it at all.
Alternatively, perhaps a speed-up can be achieved if the
current mask is exported to glibc via the vdso.
Just my 2 cents.


[PATCH v2] signals: Avoid unnecessary taking of sighand->siglock

2016-09-23 Thread Waiman Long
When running a certain database workload on a high-end system with many
CPUs, it was found that spinlock contention in the sigprocmask syscalls
became a significant portion of the overall CPU cycles as shown below.

  9.30%  9.30%  905387  dataserver  /proc/kcore 0x7fff8163f4d2
  [k] _raw_spin_lock_irq
|
---_raw_spin_lock_irq
   |
   |--99.34%-- __set_current_blocked
   |  sigprocmask
   |  sys_rt_sigprocmask
   |  system_call_fastpath
   |  |
   |  |--50.63%-- __swapcontext
   |  |  |
   |  |  |--99.91%-- upsleepgeneric
   |  |
   |  |--49.36%-- __setcontext
   |  |  ktskRun

Looking further into the swapcontext function in glibc, it was found
that the function always calls sigprocmask() without checking whether
the signal mask has actually changed.

A check was added to the __set_current_blocked() function to avoid
taking the sighand->siglock spinlock if there is no change in the
signal mask. This will prevent unneeded spinlock contention when many
threads are trying to call sigprocmask().

With this patch applied, the spinlock contention in sigprocmask() was
gone.

This patch is currently only active for 64-bit architectures.

Signed-off-by: Waiman Long 
---
 v1->v2:
  - Fix compiler warning in mips.

 kernel/signal.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index af21afc..e4296b6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2485,6 +2485,16 @@ void __set_current_blocked(const sigset_t *newset)
 {
struct task_struct *tsk = current;
 
+   /*
+* In case the signal mask hasn't changed, we won't need to take
+* the lock. The current blocked mask can be modified by other CPUs.
+* To be safe, we need to do an atomic read without lock. As a result,
+* this check will only be done on 64-bit architectures.
+*/
+   if ((_NSIG_WORDS == 1) &&
+   (READ_ONCE(tsk->blocked.sig[0]) == newset->sig[0]))
+   return;
+
spin_lock_irq(&tsk->sighand->siglock);
__set_task_blocked(tsk, newset);
spin_unlock_irq(&tsk->sighand->siglock);
-- 
1.7.1