On Tue, Oct 13, 2020 at 09:26:50AM -0700, Paul E. McKenney wrote:
> On Tue, Oct 13, 2020 at 01:25:44PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 13, 2020 at 12:44:50PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 13, 2020 at 12:34:06PM +0200, Peter Zijlstra wrote:
> > > > On Mon, Oct 12, 2020 at 02:28:12PM -0700, Paul E. McKenney wrote:
> > > > > It is certainly an accident waiting to happen.  Would something like
> > > > > the following make sense?
> > > > 
> > > > Sadly no.
> > > > 
> > > > > ------------------------------------------------------------------------
> > > > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index bfd38f2..52a63bc 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -4067,6 +4067,7 @@ void rcu_cpu_starting(unsigned int cpu)
> > > > >  
> > > > >       rnp = rdp->mynode;
> > > > >       mask = rdp->grpmask;
> > > > > +     lockdep_off();
> > > > >       raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > >       WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
> > > > >       newcpu = !(rnp->expmaskinitnext & mask);
> > > > > @@ -4086,6 +4087,7 @@ void rcu_cpu_starting(unsigned int cpu)
> > > > >       } else {
> > > > >               raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > >       }
> > > > > +     lockdep_on();
> > > > >       smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
> > > > >  }
> > > > 
> > > > This will just shut it up, but it will not fix the actual problem of that
> > > > spin-lock ending up in trace_lock_acquire(), which relies on RCU, which
> > > > isn't watching.
> > > > 
> > > > What we need here is to suppress tracing, not lockdep. Let me consider.
> > > 
> > > We appear to have a similar problem with rcu_report_dead(): its
> > > raw_spin_unlock()s can end up in trace_lock_release() while we just
> > > killed RCU.
> > 
> > So we can deal with the explicit trace_*() calls like the below, but I
> > really don't like it much. It also doesn't help with function tracing.
> > This is really early/late in the hotplug cycle and should be considered
> > entry; we shouldn't be tracing anything here.
> > 
> > Paul, would it be possible to use a scheme similar to IRQ/NMI for
> > hotplug? That seems to mostly rely on atomic ops, not locks.
> 
> The rest of the rcu_node tree and the various grace-period/hotplug races
> make that question non-trivial.  I will look into it, but I have no
> reason for optimism.
> 
> But there is only one way to find out...  ;-)

The aforementioned races get really ugly really fast.  So I do not
believe that a lockless approach is a winning strategy here.

But why not use something sort of like a sequence counter, but adapted
for local on-CPU use?  This should quiet the diagnostics for the full
time that RCU needs its locks.  Untested patch below.
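
First, to make the idea concrete in isolation, here is a minimal user-space
sketch (not the patch itself; the names hotplug_begin(), hotplug_end(), and
cpu_considered_online() are made up for illustration, and it uses C11 atomics
rather than the kernel primitives).  The count stays odd for the full duration
of a hotplug operation, and a reader that samples an odd value simply assumes
the CPU is online instead of trusting the online mask:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_ulong ofl_seq;          /* stand-in for rnp->ofl_seq */
static atomic_ulong qsmaskinitnext;   /* stand-in for rnp->qsmaskinitnext */

static void hotplug_begin(void)
{
        /* Count becomes odd before any hotplug state is touched. */
        atomic_fetch_add(&ofl_seq, 1);
}

static void hotplug_end(void)
{
        /* Count becomes even again once the online masks are consistent. */
        atomic_fetch_add(&ofl_seq, 1);
}

static bool cpu_considered_online(unsigned long grpmask)
{
        unsigned long seq = atomic_load(&ofl_seq);

        /* Odd count: a hotplug operation is in flight, so report "online". */
        if (seq & 0x1)
                return true;
        return (atomic_load(&qsmaskinitnext) & grpmask) != 0;
}

int main(void)
{
        hotplug_begin();
        printf("mid-hotplug: %d\n", cpu_considered_online(0x1)); /* prints 1 */
        atomic_fetch_or(&qsmaskinitnext, 0x1);
        hotplug_end();
        printf("after:       %d\n", cpu_considered_online(0x1)); /* prints 1 */
        return 0;
}

The patch applies that same pattern per leaf rcu_node, with WARN_ON_ONCE()
checks that the count is odd only within the hotplug-side critical sections.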

Thoughts?

                                                        Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1d42909..5b06886 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1152,13 +1152,15 @@ bool rcu_lockdep_current_cpu_online(void)
        struct rcu_data *rdp;
        struct rcu_node *rnp;
        bool ret = false;
+       unsigned long seq;
 
        if (in_nmi() || !rcu_scheduler_fully_active)
                return true;
        preempt_disable_notrace();
        rdp = this_cpu_ptr(&rcu_data);
        rnp = rdp->mynode;
-       if (rdp->grpmask & rcu_rnp_online_cpus(rnp))
+       seq = READ_ONCE(rnp->ofl_seq) & ~0x1;
+       if (rdp->grpmask & rcu_rnp_online_cpus(rnp) || seq != READ_ONCE(rnp->ofl_seq))
                ret = true;
        preempt_enable_notrace();
        return ret;
@@ -4065,6 +4067,8 @@ void rcu_cpu_starting(unsigned int cpu)
 
        rnp = rdp->mynode;
        mask = rdp->grpmask;
+       WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+       WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
        raw_spin_lock_irqsave_rcu_node(rnp, flags);
        WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
        newcpu = !(rnp->expmaskinitnext & mask);
@@ -4084,6 +4088,8 @@ void rcu_cpu_starting(unsigned int cpu)
        } else {
                raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
        }
+       WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+       WARN_ON_ONCE(rnp->ofl_seq & 0x1);
        smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
 }
 
@@ -4111,6 +4117,8 @@ void rcu_report_dead(unsigned int cpu)
 
        /* Remove outgoing CPU from mask in the leaf rcu_node structure. */
        mask = rdp->grpmask;
+       WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+       WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
        raw_spin_lock(&rcu_state.ofl_lock);
        raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
        rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
@@ -4123,6 +4131,8 @@ void rcu_report_dead(unsigned int cpu)
        WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
        raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
        raw_spin_unlock(&rcu_state.ofl_lock);
+       WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+       WARN_ON_ONCE(rnp->ofl_seq & 0x1);
 
        rdp->cpu_started = false;
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 805c9eb..7d802b6 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -57,6 +57,7 @@ struct rcu_node {
                                /*  beginning of each grace period. */
        unsigned long qsmaskinitnext;
                                /* Online CPUs for next grace period. */
+       unsigned long ofl_seq;  /* CPU-hotplug operation sequence count. */
        unsigned long expmask;  /* CPUs or groups that need to check in */
                                /*  to allow the current expedited GP */
                                /*  to complete. */
