"Paul E. McKenney" <paul...@linux.vnet.ibm.com> writes:

> On Wed, Jun 11, 2014 at 01:46:08PM -0700, Eric W. Biederman wrote:
>> On the chance it is dropping the old nsproxy which calls synchronize_rcu
>> in switch_task_namespaces that is causing you problems I have attached
>> a patch that changes from rcu_read_lock to task_lock for code that
>> calls task_nsproxy from a different task. The code should be safe
>> and it should be an unquestionable performance improvement but I have only
>> compile tested it.
>>
>> If you can try the patch it will tell us if the problem is the rcu
>> access in switch_task_namespaces (the only one I am aware of in network
>> namespace creation) or if the problem rcu case is somewhere else.
>>
>> If nothing else knowing which rcu accesses are causing the slow down
>> seems important at the end of the day.
>>
>> Eric
>>
>
> If this is the culprit, another approach would be to use workqueues from
> RCU callbacks. The following (untested, probably does not even build)
> patch illustrates one such approach.

For reference, the only reason we are using rcu_read_lock today for
nsproxy is an old lock ordering problem that does not exist anymore.

I can say that in some workloads setns is a bit heavy today because of
the synchronize_rcu. setns is also more important than I had previously
thought, because pthreads break the classic unix ability to do things in
your process after fork() (sigh).

Today daemonize is gone, and notifying the parent process with a signal
relies on task_active_pid_ns, which does not use nsproxy. So the old
lock ordering problem/race is gone.

Here is the description of what was happening when the code switched
from task_lock to rcu_read_lock to protect nsproxy:

commit cf7b708c8d1d7a27736771bcf4c457b332b0f818
Author: Pavel Emelyanov <xe...@openvz.org>
Date:   Thu Oct 18 23:39:54 2007 -0700

    Make access to task's nsproxy lighter

    When someone wants to deal with some other task's namespaces it has to
    lock the task and then to get the desired namespace if the one exists.
    This is slow on read-only paths and may be impossible in some cases.

    E.g. Oleg recently noticed a race between unshare() and the (sent for
    review in cgroups) pid namespaces - when the task notifies the parent
    it has to know the parent's namespace, but taking the task_lock() is
    impossible there - the code is under write locked tasklist lock.

    On the other hand switching the namespace on task (daemonize) and
    releasing the namespace (after the last task exit) is rather rare
    operation and we can sacrifice its speed to solve the issues above.

    The access to other task namespaces is proposed to be performed
    like this:

         rcu_read_lock();
         nsproxy = task_nsproxy(tsk);
         if (nsproxy != NULL) {
                 /*
                  * work with the namespaces here
                  * e.g. get the reference on one of them
                  */
         }
         /*
          * NULL task_nsproxy() means that this task is
          * almost dead (zombie)
          */
         rcu_read_unlock();

    This patch has passed the review by Eric and Oleg :) and,
    of course, tested.

    [c...@fr.ibm.com: fix unshare()]
    [ebied...@xmission.com: Update get_net_ns_by_pid]
    Signed-off-by: Pavel Emelyanov <xe...@openvz.org>
    Signed-off-by: Eric W. Biederman <ebied...@xmission.com>
    Cc: Oleg Nesterov <o...@tv-sign.ru>
    Cc: Paul E. McKenney <paul...@linux.vnet.ibm.com>
    Cc: Serge Hallyn <se...@us.ibm.com>
    Signed-off-by: Cedric Le Goater <c...@fr.ibm.com>
    Signed-off-by: Andrew Morton <a...@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>

Eric
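
For illustration only, here is a rough userspace sketch of the task_lock
style of access discussed above, as opposed to the rcu_read_lock pattern in
the quoted commit. It is not the attached patch and not kernel code: a
pthread mutex stands in for task_lock(), and the struct layouts and the
helper names (nsproxy_alloc, nsproxy_put, task_nsproxy_get,
task_switch_nsproxy) are all hypothetical. The point it tries to show is
that when a reader takes its reference while holding the per-task lock, the
writer can drop the old nsproxy right after swapping the pointer, without
waiting for a grace period.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
struct nsproxy {
	atomic_int count;	/* reference count */
	int net_ns_id;		/* placeholder for the real namespace pointers */
};

struct task {
	pthread_mutex_t lock;	/* plays the role of task_lock() */
	struct nsproxy *nsproxy;	/* NULL once the task is almost dead */
};

/* Allocate an nsproxy holding one reference (the owning task's). */
static struct nsproxy *nsproxy_alloc(int net_ns_id)
{
	struct nsproxy *ns = malloc(sizeof(*ns));

	if (!ns)
		return NULL;
	atomic_init(&ns->count, 1);
	ns->net_ns_id = net_ns_id;
	return ns;
}

/* Drop one reference; free the object when the last one goes away. */
static void nsproxy_put(struct nsproxy *ns)
{
	assert(ns != NULL);
	if (atomic_fetch_sub(&ns->count, 1) == 1)
		free(ns);
}

/*
 * Reader side: look at another task's nsproxy. The reference is taken
 * while the lock is held, so the pointer cannot be swapped and freed
 * between the load and the increment.
 */
static struct nsproxy *task_nsproxy_get(struct task *tsk)
{
	struct nsproxy *ns;

	pthread_mutex_lock(&tsk->lock);
	ns = tsk->nsproxy;
	if (ns)
		atomic_fetch_add(&ns->count, 1);
	pthread_mutex_unlock(&tsk->lock);
	return ns;	/* NULL means the task has no namespaces left */
}

/*
 * Writer side: install a new nsproxy (or NULL) and drop the task's
 * reference on the old one immediately. No synchronize_rcu()-like wait
 * is needed because readers never touch the pointer outside the lock.
 */
static void task_switch_nsproxy(struct task *tsk, struct nsproxy *new_ns)
{
	struct nsproxy *old;

	pthread_mutex_lock(&tsk->lock);
	old = tsk->nsproxy;
	tsk->nsproxy = new_ns;
	pthread_mutex_unlock(&tsk->lock);

	if (old)
		nsproxy_put(old);
}
```

A caller would pair every successful task_nsproxy_get() with an
nsproxy_put(). The trade-off is the one the quoted commit message describes:
readers now take a lock on every access, which is only acceptable where
taking that lock is actually possible.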