On Fri, Mar 22, 2019 at 09:26:35AM -0700, Paul E. McKenney wrote:
> On Fri, Mar 22, 2019 at 11:50:49AM -0400, Joel Fernandes wrote:
> > On Fri, Mar 22, 2019 at 07:58:23AM -0700, Paul E. McKenney wrote:
> > [snip]
> > > > > >  #ifdef CONFIG_RCU_NOCB_CPU
> > > > > >  static cpumask_var_t rcu_nocb_mask; /* CPUs to have callbacks 
> > > > > > offloaded. */
> > > > > > @@ -94,6 +72,8 @@ static void __init 
> > > > > > rcu_bootup_announce_oddness(void)
> > > > > >             pr_info("\tRCU debug GP init slowdown %d jiffies.\n", 
> > > > > > gp_init_delay);
> > > > > >     if (gp_cleanup_delay)
> > > > > >             pr_info("\tRCU debug GP init slowdown %d jiffies.\n", 
> > > > > > gp_cleanup_delay);
> > > > > > +   if (!use_softirq)
> > > > > > +           pr_info("\tRCU_SOFTIRQ processing moved to rcuc 
> > > > > > kthreads.\n");
> > > > > >     if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG))
> > > > > >             pr_info("\tRCU debug extended QS entry/exit.\n");
> > > > > >     rcupdate_announce_bootup_oddness();
> > > > > > @@ -629,7 +609,10 @@ static void rcu_read_unlock_special(struct 
> > > > > > task_struct *t)
> > > > > >             /* Need to defer quiescent state until everything is 
> > > > > > enabled. */
> > > > > >             if (irqs_were_disabled) {
> > > > > >                     /* Enabling irqs does not reschedule, so... */
> > > > > > -                   raise_softirq_irqoff(RCU_SOFTIRQ);
> > > > > > +                   if (!use_softirq)
> > > > > > +                           raise_softirq_irqoff(RCU_SOFTIRQ);
> > > > > > +                   else
> > > > > > +                           invoke_rcu_core();
> > > > > 
> > > > > This can result in deadlock.  This happens when the scheduler invokes
> > > > > rcu_read_unlock() with one of the rq or pi locks held, which means 
> > > > > that
> > > > > interrupts are disabled.  And it also means that the wakeup done in
> > > > > invoke_rcu_core() could go after the same rq or pi lock.
> > > > > 
> > > > > What we really need here is some way to make soemthing happen on this
> > > > > CPU just after interrupts are re-enabled.  Here are the options I see:
> > > > > 
> > > > > 1.    Do set_tsk_need_resched() and set_preempt_need_resched(),
> > > > >       just like in the "else" clause below.  This sort of works, but
> > > > >       relies on some later interrupt or similar to get things started.
> > > > >       This is just fine for normal grace periods, but not so much for
> > > > >       expedited grace periods.
> > > > > 
> > > > > 2.    IPI some other CPU and have it IPI us back.  Not such a good 
> > > > > plan
> > > > >       when running an SMP kernel on a single CPU.
> > > > > 
> > > > > 3.    Have a "stub" RCU_SOFTIRQ that contains only the following:
> > > > > 
> > > > >       /* Report any deferred quiescent states if preemption enabled. 
> > > > > */
> > > > >       if (!(preempt_count() & PREEMPT_MASK)) {
> > > > >               rcu_preempt_deferred_qs(current);
> > > > >       } else if (rcu_preempt_need_deferred_qs(current)) {
> > > > >               set_tsk_need_resched(current);
> > > > >               set_preempt_need_resched();
> > > > >       }
> > > > > 
> > > > > 4.    Except that raise_softirq_irqoff() could potentially have this
> > > > >       same problem if rcu_read_unlock() is invoked at process level
> > > > >       from the scheduler with either rq or pi locks held.  :-/
> > > > > 
> > > > >       Which raises the question "why aren't I seeing hangs and
> > > > >       lockdep splats?"
> > > > 
> > > > Interesting, could it be you're not seeing a hang in the regular case,
> > > > because enqueuing ksoftirqd on the same CPU as where the 
> > > > rcu_read_unlock is
> > > > happening is a rare event? First, ksoftirqd has to even be awakened in 
> > > > the
> > > > first place. On the other hand, with the new code the thread is always 
> > > > awaked
> > > > and is more likely to run into the issue you found?
> > > 
> > > No, in many cases, including the self-deadlock that showed up last night,
> > > raise_softirq_irqoff() will simply set a bit in a per-CPU variable.
> > > One case where this happens is when called from an interrupt handler.
> > 
> > I think we are saying the same thing, in some cases ksoftirqd will be
> > awakened and some case it will not. I will go through all scenarios to
> > convince myself it is safe, if I find some issue I will let you know.
> 
> I am suspecting that raise_softirq_irqsoff() is in fact unsafe, just
> only very rarely unsafe.
> 
> > > > The lockdep splats should be a more common occurence though IMO. If you 
> > > > could
> > > > let me know which RCU config is hanging, I can try to debug this at my 
> > > > end as
> > > > well.
> > > 
> > > TREE01, TREE02, TREE03, and TREE09.  I would guess that TREE08 would also
> > > do the same thing, given that it also sets PREEMPT=y and tests Tree RCU.
> > > 
> > > Please see the patch I posted and tested overnight.  I suspect that there
> > > is a better fix, but this does at least seem to suppress the error.
> > 
> > Ok, will do.
> > 
> > > > > Also, having lots of non-migratable timers might be considered 
> > > > > unfriendly,
> > > > > though they shouldn't be -that- heavily utilized.  Yet, anyway...
> > > > > I could try adding logic to local_irq_enable() and 
> > > > > local_irq_restore(),
> > > > > but that probably wouldn't go over all that well.  Besides, sometimes
> > > > > interrupt enabling happens in assembly language.
> > > > > 
> > > > > It is quite likely that delays to expedited grace periods wouldn't
> > > > > happen all that often.  First, the grace period has to start while
> > > > > the CPU itself (not some blocked task) is in an RCU read-side critical
> > > > > section, second, that critical section cannot be preempted, and third
> > > > > the rcu_read_unlock() must run with interrupts disabled.
> > > > > 
> > > > > Ah, but that sequence of events is not supposed to happen with the
> > > > > scheduler lock!
> > > > > 
> > > > > From Documentation/RCU/Design/Requirements/Requirements.html:
> > > > > 
> > > > >       It is forbidden to hold any of scheduler's runqueue or
> > > > >       priority-inheritance spinlocks across an rcu_read_unlock()
> > > > >       unless interrupts have been disabled across the entire RCU
> > > > >       read-side critical section, that is, up to and including the
> > > > >       matching rcu_read_lock().
> > > > > 
> > > > > Here are the reasons we even get to rcu_read_unlock_special():
> > > > > 
> > > > > 1.    The just-ended RCU read-side critical section was preempted.
> > > > >       This clearly cannot happen if interrupts are disabled across
> > > > >       the entire critical section.
> > > > > 
> > > > > 2.    The scheduling-clock interrupt noticed that this critical
> > > > >       section has been taking a long time.  But scheduling-clock
> > > > >       interrupts also cannot happen while interrupts are disabled.
> > > > > 
> > > > > 3.    An expedited grace periods started during this critical
> > > > >       section.  But if that happened, the corresponding IPI would
> > > > >       have waited until this CPU enabled interrupts, so this
> > > > >       cannot happen either.
> > > > > 
> > > > > So the call to invoke_rcu_core() should be OK after all.
> > > > > 
> > > > > Which is a bit of a disappointment, given that I am still seeing 
> > > > > hangs!
> > > > 
> > > > Oh ok, discount whatever I just said then ;-) Indeed I remember this
> > > > requirement too now. Your neat documentation skills are indeed life 
> > > > saving :D
> > > 
> > > No, this did turn out to be the problem area.  Or at least one of the
> > > problem areas.  Again, see my earlier email.
> > 
> > Ok. Too many emails so I got confused :-D. I also forgot which version of 
> > the
> > patch are we testing since I don't think an updated one was posted. But I
> > will refer to your last night diff dig out the base patch from your git 
> > tree,
> > no problem.
> > 
> > > > > I might replace this invoke_rcu_core() with set_tsk_need_resched() and
> > > > > set_preempt_need_resched() to see if that gets rid of the hangs, but
> > > > > first...
> > > > 
> > > > Could we use the NMI watchdog to dump the stack at the time of the 
> > > > hang? May
> > > > be a deadlock will present on the stack (I think its config is called
> > > > HARDLOCKUP_DETECTOR or something).
> > > 
> > > Another approach would be to instrument the locking code that notices
> > > the recursive acquisition.  Or to run lockdep...  Because none of the
> > > failing scenarios enable lockdep!  ;-)
> > 
> > I was wondering why lockdep is not always turned on in your testing. Is it
> > due to performance concerns?
> 
> Because I also need to test without lockdep.  I sometimes use
> "--kconfig CONFIG_PROVE_LOCKING=y" to force lockdep everywhere on
> a particular rcutorture run, though.  Like on the run that I just
> now started.  ;-)

But this produced no complaints.  And yes, I did check the console output
to make sure that lockdep was in fact enabled.  Color me confused...

                                                        Thanx, Paul

Reply via email to