On Fri, Jun 13, 2014 at 03:49:26PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 13, 2014 at 02:10:35PM -0700, Josh Triplett wrote:
> > On Fri, Jun 13, 2014 at 01:48:22PM -0700, Paul E. McKenney wrote:
> > > On Fri, Jun 13, 2014 at 09:44:41AM -0700, Josh Triplett wrote:
> > > > On Fri, Jun 13, 2014 at 06:21:32PM +0200, Frederic Weisbecker wrote:
> > > > > On Fri, Jun 13, 2014 at 09:16:30AM -0700, Paul E. McKenney wrote:
> > > > > > > Is it because we have dynticks CPUs staying too long in the
> > > > > > > kernel without taking any quiescent states? Are we perhaps
> > > > > > > missing some rcu_user_enter() or things?
> > > > > >
> > > > > > Sort of the former, but combined with the fact that in-kernel
> > > > > > CPUs still need scheduling-clock interrupts for RCU to make
> > > > > > progress. I could move this to RCU's context-switch hook, but
> > > > > > that could be very bad for workloads that do lots of context
> > > > > > switching.
> > > > >
> > > > > Or I can restart the tick if the CPU stays in the kernel for too
> > > > > long without a tick. I think that's what we were doing before, but
> > > > > we removed that because we never implemented it correctly (we sent
> > > > > a scheduler IPI that did nothing...)
> > > >
> > > > I wonder if timer slack would make sense here: when you have at
> > > > least one RCU callback pending, set a timer with a huge amount of
> > > > timer slack, and cancel it if you end up handling the callback via
> > > > a trip through the scheduler.
> > >
> > > But in this case, we need the tick even if the current CPU has no
> > > callbacks, because it might be in an RCU read-side critical section.
> >
> > Don't we handle that case via the slowpath of rcu_read_unlock, and a
> > flag set via IPI? ("Oh, that CPU has taken too long to note a
> > quiescent state; send it an IPI to set the special flag that makes
> > unlock do the work.")
>
> There was once such logic on the force-quiescent-state path, and making
> that handle this new case was my first proposal. As Frederic pointed
> out, that change requires rcu_needs_cpu()'s cooperation, because
> otherwise the CPU will take the IPI, see that it still has but one
> runnable task, and then keep its scheduling-clock interrupt off.
Exactly. That's what happens currently: we call rcu_kick_nohz_cpu() on
extended grace periods, but the IPI doesn't reconsider the tick. In fact
it doesn't do anything at all, because the scheduler IPI, when invoked
without a reason, doesn't even call irq_enter()/irq_exit(), so
rcu_needs_cpu() is never called from there.

Now that's going to change with https://lwn.net/Articles/601836/ if we
convert rcu_kick_nohz_cpu() to tick_nohz_full_kick_cpu(). Then we have
the choice between two options:

* Add a check in tick_nohz_full_check() and restart the tick if
  necessary.

* Extend rcu_needs_cpu() to restore a similar periodic mode until the
  grace period makes some progress.

Very rough sketches of both below.
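For the first option, I mean roughly the following. Just a sketch:
tick_nohz_full_check(), tick_nohz_full_cpu() and smp_processor_id() are
real, but rcu_tick_needed() and restart_sched_tick() are made-up names
for "RCU says a grace period has waited too long on this CPU" and "turn
the periodic tick back on":

	/*
	 * Sketch of option 1: when the IPI from tick_nohz_full_kick_cpu()
	 * lands, ask RCU whether this CPU still owes it a quiescent state
	 * and, if so, restart the periodic tick.
	 */
	void tick_nohz_full_check(void)
	{
		int cpu = smp_processor_id();

		if (!tick_nohz_full_cpu(cpu))
			return;

		/* Hypothetical query: has a grace period waited too long on us? */
		if (rcu_tick_needed(cpu))
			restart_sched_tick(cpu); /* hypothetical restart primitive */
	}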
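And for the second option, extending rcu_needs_cpu() on top of its
simple (!CONFIG_RCU_FAST_NO_HZ) variant, so the tick code keeps a
periodic mode while a grace period waits on us. Again only a sketch,
signatures from memory, and rcu_gp_waiting_on() is a made-up helper:

	/*
	 * Sketch of option 2: rcu_needs_cpu() already keeps the tick alive
	 * when callbacks are queued on this CPU; also keep it alive while
	 * a grace period has been waiting on us, until it makes progress.
	 */
	int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies)
	{
		*delta_jiffies = ULONG_MAX;

		/* Callbacks queued on this CPU: the tick must stay on anyway. */
		if (rcu_cpu_has_callbacks(cpu, NULL))
			return 1;

		/* Hypothetical: a grace period has waited too long on us. */
		if (rcu_gp_waiting_on(cpu))
			return 1;

		return 0;
	}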