On Fri, Jun 22, 2018 at 12:01:49PM -0400, Steven Rostedt wrote:
> On Fri, 22 Jun 2018 06:28:43 -0700
> "Paul E. McKenney" <paul...@linux.vnet.ibm.com> wrote:
>
> > It has been some years since I traced the code flow, but what happened
> > back then is that it switches itself from an interrupt handler to not
> > without actually returning from the interrupt. This can only happen when
> > interrupting a non-idle process, thankfully, and RCU's dyntick-idle code
> > relies on this restriction. If I remember correctly, the code ends up
> > executing in the context of the interrupted process, but it has been some
> > years, so please apply appropriate skepticism.
>
> If irq_enter() is not paired with irq_exit() then major things will
> break. Especially since that's how in_interrupt() and friends rely on to
> work.
>
> Now, perhaps rcu_irq_enter() is called elsewhere (as a git grep appears
> it may be), and that rcu_irq_enter() may not be paired with
> rcu_irq_exit(). But that's not anything to do with the irq_enter() and
> irq_exit() routines being paired or not.

The non-irq_enter() calls to rcu_irq_enter() and the non-irq_exit()
calls to rcu_irq_exit() do appear to be balanced as of v4.17.

If I recall correctly, the offending piece of functionality was the
usermode helpers, which on some architectures did a syscall exception
from within the kernel to make a system call happen. This seems to
now be common code using workqueues, kernel threads, and do_execve().

Here is the best reference I could find to the specific problem I
encountered back in the day:

https://groups.google.com/forum/#!msg/linux.kernel/B5hZX1tJRs8/sOVVfhrirL8J

I do recall that there were real failures. There is no way I would have
written code tolerating half-interrupts without cause, no more than I
would have written code handling what looks to RCU like interrupts from
NMI handlers without cause. ;-)

One approach would be for me to add a WARN_ON_ONCE() to check for
misnesting. If this didn't trigger for some time long enough for the
check to propagate to the various distros' users, then this code could
be simplified. Though it would not be that big a deal, just the removal
of a store or two.

							Thanx, Paul
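
For illustration only, here is a rough user-space sketch of what a
misnesting check of the kind mentioned above might look like. It is not
the kernel's code. Every name in it (irq_nesting, model_warn_on_once(),
model_irq_enter(), model_irq_exit(), model_check_quiescent()) is
hypothetical, and the single warn-once latch is a simplification: the
real WARN_ON_ONCE() warns once per call site. The model assumes a plain
nesting counter that is forced back to zero at a point known to be
outside any interrupt, which is the sort of store that tolerates a
half-interrupt.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
static long irq_nesting;     /* models a per-CPU interrupt nesting depth */
static bool misnest_warned;  /* single latch: warn at most once overall */
static int warn_count;       /* number of warnings actually printed */

/* Simplified stand-in for WARN_ON_ONCE(): report a true condition once. */
static bool model_warn_on_once(bool cond, const char *what)
{
	if (cond && !misnest_warned) {
		misnest_warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: misnesting: %s\n", what);
	}
	return cond;
}

/* Models entry into an interrupt handler. */
static void model_irq_enter(void)
{
	irq_nesting++;
}

/* Models exit from an interrupt handler. */
static void model_irq_exit(void)
{
	/* An exit with no matching entry is one form of half-interrupt. */
	if (model_warn_on_once(irq_nesting <= 0, "exit without entry"))
		return;
	irq_nesting--;
}

/* Call at a point known to be outside any interrupt (assumed: idle entry). */
static void model_check_quiescent(void)
{
	/* A nonzero depth here means some entry never saw its exit. */
	if (model_warn_on_once(irq_nesting != 0, "entry without exit"))
		irq_nesting = 0;  /* the tolerating store that could later go away */
}
```

If a check like this stayed silent for long enough, the resetting store
in model_check_quiescent() is the kind of thing that could be dropped.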