On Fri, Jun 27, 2014 at 07:20:38AM -0700, Paul E. McKenney wrote:
> An 80-CPU system with a context-switch-heavy workload can require so
> many NOCB kthread wakeups that the RCU grace-period kthreads spend several
> tens of percent of a CPU just awakening things. This clearly will not
> scale well: If you add enough CPUs, the RCU grace-period kthreads would
> get behind, increasing grace-period latency.
>
> To avoid this problem, this commit divides the NOCB kthreads into leaders
> and followers, where the grace-period kthreads awaken the leaders each of
> whom in turn awakens its followers. By default, the number of groups of
> kthreads is the square root of the number of CPUs, but this default may
> be overridden using the rcutree.rcu_nocb_leader_stride boot parameter.
> This reduces the number of wakeups done per grace period by the RCU
> grace-period kthread by the square root of the number of CPUs, but of
> course by shifting those wakeups to the leaders. In addition, because
> the leaders do grace periods on behalf of their respective followers,
> the number of wakeups of the followers decreases by up to a factor of two.
> Instead of being awakened once when new callbacks arrive and again
> at the end of the grace period, the followers are awakened only at
> the end of the grace period.
>
> For a numerical example, in a 4096-CPU system, the grace-period kthread
> would awaken 64 leaders, each of which would awaken its 63 followers
> at the end of the grace period. This compares favorably with the 79
> wakeups for the grace-period kthread on an 80-CPU system.
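For reference, the grouping arithmetic described above works out as in
this minimal userspace sketch. It is illustrative only, not the kernel
implementation: int_sqrt() here is a local stand-in for the kernel
helper, and rcu_nocb_leader_stride just mimics the boot parameter's
"0 means default" convention.

#include <stdio.h>

/* Local stand-in for the kernel's int_sqrt(). */
static int int_sqrt(int x)
{
	int r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/* 0 means "use the default of sqrt(nr_cpus)", mimicking the
 * rcutree.rcu_nocb_leader_stride boot parameter. */
static int rcu_nocb_leader_stride;

int main(void)
{
	int nr_cpus = 4096;
	int ls = rcu_nocb_leader_stride;
	int cpu, nleaders = 0;

	if (ls <= 0)
		ls = int_sqrt(nr_cpus);

	/* CPU i leads the group [i, i + ls) when i is a multiple of ls. */
	for (cpu = 0; cpu < nr_cpus; cpu++)
		if (cpu % ls == 0)
			nleaders++;

	printf("%d CPUs: %d leaders, each with up to %d followers\n",
	       nr_cpus, nleaders, ls - 1);
	printf("grace-period kthread wakeups per GP: %d (was %d)\n",
	       nleaders, nr_cpus);
	return 0;
}

For nr_cpus = 4096 this prints 64 leaders with 63 followers each, so
the grace-period kthread does 64 wakeups per grace period instead of
4096, matching the numerical example in the quoted commit message.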
Urgh, how about we kill the entire nocb nonsense and try again? This is getting quite ridiculous.