Re: tty^Wrcu/perf lockdep trace.

Peter Zijlstra Mon, 07 Oct 2013 04:25:14 -0700

On Fri, Oct 04, 2013 at 05:23:48PM -0700, Paul E. McKenney wrote:
> The underlying problem is that perf is invoking call_rcu() with the
> scheduler locks held, but in NOCB mode, call_rcu() will with high
> probability invoke the scheduler -- which just might want to use its
> locks.  The reason that call_rcu() needs to invoke the scheduler is
> to wake up the corresponding rcuo callback-offload kthread, which
> does the job of starting up a grace period and invoking the callbacks
> afterwards.
> 
> One solution (championed on a related problem by Lai Jiangshan) is to


That's rcu_read_unlock_special(), right? 

> simply defer the wakeup to some point where scheduler locks are no longer
> held.  Since we don't want to unnecessarily incur the cost of such
> deferral, the task before us is threefold:
> 
> 1.    Determine when it is likely that a relevant scheduler lock is held.
> 
> 2.    Defer the wakeup in such cases.
> 
> 3.    Ensure that all deferred wakeups eventually happen, preferably
>       sooner rather than later.
> 
> We use irqs_disabled_flags() as a proxy for relevant scheduler locks
> being held.  This works because the relevant locks are always acquired
> with interrupts disabled.  We may defer more often than needed, but that
> is at least safe.

Fair enough; do you feel the need for something more specific?
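A minimal user-space sketch of the enqueue-side rule quoted above: defer the wakeup whenever interrupts are disabled, otherwise wake immediately. It assumes a hypothetical struct fake_rcu_data and a wakeup counter in place of wake_up(). The irqs_disabled parameter stands in for irqs_disabled_flags(flags). This illustrates the rule only and is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one CPU's rcu_data: only the fields this
 * sketch needs, not the kernel's definition. */
struct fake_rcu_data {
	bool nocb_defer_wakeup; /* wakeup postponed until it is safe */
	int wakeups;            /* number of simulated wake_up() calls */
};

/* Simulated wake_up(&rdp->nocb_wq). In the kernel this path may need
 * scheduler locks, which is what the deferral avoids. */
static void fake_wake_up(struct fake_rcu_data *rdp)
{
	rdp->wakeups++;
}

/* Called after a callback is queued. irqs_disabled stands in for
 * irqs_disabled_flags(flags), a conservative proxy for "a scheduler
 * lock might be held", because those locks are taken with irqs off. */
static void nocb_enqueue_wake(struct fake_rcu_data *rdp, bool irqs_disabled)
{
	assert(rdp != NULL);
	if (irqs_disabled)
		rdp->nocb_defer_wakeup = true; /* may defer needlessly, but safe */
	else
		fake_wake_up(rdp);             /* per the proxy, no relevant lock held */
}
```

Erring toward deferral costs only latency. Erring the other way risks the self-deadlock that lockdep reported, which is why the coarse proxy is acceptable.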

> The wakeup deferral is tracked via a new field in the per-CPU and
> per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup.
> 
> This flag is checked by the RCU core processing.  The __rcu_pending()
> function now checks this flag, which causes rcu_check_callbacks()
> to initiate RCU core processing at each scheduling-clock interrupt
> where this flag is set.  Of course this is not sufficient because
> scheduling-clock interrupts are often turned off (the things we used to
> be able to count on!).  So the flags are also checked on entry to any
> state that RCU considers to be idle, which includes both NO_HZ_IDLE idle
> state and NO_HZ_FULL user-mode-execution state.

So RCU doesn't currently differentiate between EQS for nr_running==1 and
nr_running==0?
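A sketch of the "eventually happen" half described above, under the same hypothetical struct. There are two trigger points. The scheduling-clock tick asks an __rcu_pending()-like check, where a deferred wakeup alone now counts as pending work. Entry to an RCU idle state (NO_HZ_IDLE idle or NO_HZ_FULL user mode) flushes unconditionally, because the tick may stop there. All function names are made up for illustration. The sketch is single-threaded and ignores the ACCESS_ONCE()-style concurrency the real code needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; not the kernel's struct rcu_data. */
struct fake_rcu_data {
	bool nocb_defer_wakeup;
	bool other_work_pending; /* stand-in for the rest of __rcu_pending() */
	int wakeups;             /* number of simulated wake_up() calls */
};

/* Deliver a pending deferred wakeup, if any. */
static void flush_deferred_wakeup(struct fake_rcu_data *rdp)
{
	assert(rdp != NULL);
	if (!rdp->nocb_defer_wakeup)
		return;
	rdp->nocb_defer_wakeup = false;
	rdp->wakeups++; /* simulated wake_up(&rdp->nocb_wq) */
}

/* Analogue of the __rcu_pending() change: a deferred wakeup by itself
 * is enough to request RCU core processing. */
static bool fake_rcu_pending(const struct fake_rcu_data *rdp)
{
	return rdp->other_work_pending || rdp->nocb_defer_wakeup;
}

/* Scheduling-clock interrupt: helps only while the tick is running. */
static void fake_tick(struct fake_rcu_data *rdp)
{
	if (fake_rcu_pending(rdp))
		flush_deferred_wakeup(rdp); /* stands in for RCU core processing */
}

/* Entry to an RCU idle state: the tick may stop here, so flush now. */
static void fake_enter_rcu_idle(struct fake_rcu_data *rdp)
{
	flush_deferred_wakeup(rdp);
}
```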

> This approach should allow call_rcu() to be invoked regardless of what
> locks you might be holding, the key word being "should".

Agreed. Except it looks like you've inverted the deferred wakeup
condition :-)

> @@ -2314,6 +2323,22 @@ static int rcu_nocb_kthread(void *arg)
>       return 0;
>  }
>  
> +/* Is a deferred wakeup of rcu_nocb_kthread() required? */
> +static bool rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
> +{
> +     return ACCESS_ONCE(rdp->nocb_defer_wakeup);
> +}
> +
> +/* Do a deferred wakeup of rcu_nocb_kthread(). */
> +static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
> +{
> +     if (rcu_nocb_need_deferred_wakeup(rdp))

        !rcu_nocb_need_deferred_wakeup() ?

> +             return;
> +     ACCESS_ONCE(rdp->nocb_defer_wakeup) = false;
> +     wake_up(&rdp->nocb_wq);
> +     trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("DeferredWakeEmpty"));
> +}
> +
>  /* Initialize per-rcu_data variables for no-CBs CPUs. */
>  static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
>  {
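To make the inverted condition concrete, here is a user-space model of do_nocb_deferred_wakeup() as posted, next to the version with the negated test suggested in the reply. The struct and function names are hypothetical, and wake_up() is modeled as a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant rcu_data fields. */
struct fake_rcu_data {
	bool nocb_defer_wakeup;
	int wakeups; /* number of simulated wake_up() calls */
};

/* Mirrors rcu_nocb_need_deferred_wakeup() from the patch. */
static bool need_deferred_wakeup(const struct fake_rcu_data *rdp)
{
	assert(rdp != NULL);
	return rdp->nocb_defer_wakeup;
}

/* As posted: returns early exactly when a wakeup IS needed. */
static void do_deferred_wakeup_as_posted(struct fake_rcu_data *rdp)
{
	if (need_deferred_wakeup(rdp))
		return;
	rdp->nocb_defer_wakeup = false;
	rdp->wakeups++;
}

/* With the test negated: wake only when a wakeup was deferred. */
static void do_deferred_wakeup_fixed(struct fake_rcu_data *rdp)
{
	if (!need_deferred_wakeup(rdp))
		return;
	rdp->nocb_defer_wakeup = false;
	rdp->wakeups++;
}
```

As posted, the helper fails in both directions. It never delivers a deferred wakeup, and the flag stays set. It also issues a spurious wakeup whenever nothing was deferred.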
