On Fri, Jun 13, 2014 at 06:00:04PM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 13, 2014 at 08:52:33AM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 13, 2014 at 02:47:16PM +0200, Frederic Weisbecker wrote:
> > > On Thu, Jun 12, 2014 at 06:35:15PM -0700, Paul E. McKenney wrote:
> > > > On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > > > > > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays, but
> > > > > > actual users should be a tiny minority, if there are any at all.
> > > > > >
> > > > > > Also there is a risk that affining the GP kthread to a single CPU
> > > > > > could end up noticeably reducing RCU performance and increasing
> > > > > > energy consumption.
> > > > > >
> > > > > > So let's affine the GP kthread only when nohz full is actually used
> > > > > > (i.e. when the nohz_full= parameter is filled or
> > > > > > CONFIG_NO_HZ_FULL_ALL=y).
> > > >
> > > > Which reminds me...  Kernel-heavy workloads running NO_HZ_FULL_ALL=y
> > > > can see long RCU grace periods, as in about two seconds each.  It is
> > > > not hard for me to detect this situation.
> > >
> > > Ah yeah, that sounds quite long.
> > >
> > > > Is there some way I can call for a given CPU's scheduling-clock
> > > > interrupt to be turned on?
> > >
> > > Yeah, once the nohz kick patchset (https://lwn.net/Articles/601214/)
> > > is merged, a simple call to tick_nohz_full_kick_cpu() should do the
> > > trick, although the right condition must be there on the IPI side.
> > > Maybe with rcu_needs_cpu() or such.
> >
> > I could record the offending GP, and make rcu_needs_cpu() return true
> > if the current GP matches the offending one.
> >
> > > But it would be interesting to identify the sources of these extended
> > > grace periods. If we only restart the tick, we may ignore some deeper
> > > outstanding issue.
> >
> > Some of them have been fixable by other means, but they will probably
> > come back as system sizes grow.  And I really have put preemption points
> > into kernel code in response to RCU CPU stall warnings, and the current
> > state of NO_HZ_FULL effectively ignores these preemption points.  :-/
>
> I'm not sure I really understand the issue, though. So you have RCU CPU
> stalls due to very extended grace periods, right?
>
> I'm not sure how preemption points would solve that. Or maybe you're
> trying to trigger quiescent-state reports through these preemption points?
If we have scheduling-clock interrupts, the preemption points will help
push RCU through its state machine.  If we don't have scheduling-clock
interrupts, RCU cannot make progress in this case.

> Is it because we have dynticks CPUs staying too long in the kernel
> without taking any quiescent states? Are we perhaps missing some
> rcu_user_enter() calls or some such?

Sort of the former, but combined with the fact that CPUs executing in
the kernel still need scheduling-clock interrupts for RCU to make
progress.  I could instead move this to RCU's context-switch hook, but
that could be very bad for workloads that do lots of context switching.

							Thanx, Paul
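
P.S.  For concreteness, here is a rough and completely untested sketch of
the "record the offending GP" idea, assuming the nohz kick patchset above
has been merged.  The rcu_offending_gp variable and the
rcu_kick_offending_cpu() helper are made-up names, and the exact hook
into rcu_needs_cpu() is only approximate:

	/* Made-up name: GP number of a grace period that has run too long. */
	static unsigned long rcu_offending_gp;

	/*
	 * When an over-long grace period is detected, record its GP number
	 * and ask the offending CPU to turn its scheduling-clock tick back on.
	 */
	static void rcu_kick_offending_cpu(struct rcu_state *rsp, int cpu)
	{
		ACCESS_ONCE(rcu_offending_gp) = rsp->gpnum;
		tick_nohz_full_kick_cpu(cpu);  /* From the nohz kick patchset. */
	}

	/*
	 * Then, on the IPI side, rcu_needs_cpu() gains a check along these
	 * lines, so the tick stays on while the offending GP is in flight:
	 */
	if (ACCESS_ONCE(rcu_offending_gp) == rsp->gpnum)
		return 1;  /* Current GP is the offender; keep the tick going. */

Because the check matches on the GP number, rcu_needs_cpu() stops forcing
the tick as soon as the offending grace period completes, so the kick is
self-limiting.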