Maybe rcu_barrier is waiting for all RCU callbacks to complete, but one or more of them no longer exist because the module was unloaded.

On Fri, Feb 3, 2012 at 11:48 AM, Robert Beckett <robert.beck...@ziilabs.com> wrote:
> Hello,
>
> Does anyone reading this list know much about the rcu subsystem?
>
> I have been debugging a problem with unmounting disks. Occasionally when
> unmounting an ext4 filesystem, the whole system would freeze.
> I traced this to it waiting for completion on an rcu_barrier.
>
> After lots of debugging, I found that the problem was that when scheduling
> the rcu barrier callback on each cpu (_rcu_barrier in kernel/rcutree.c),
> one of the cpus had just entered a cpu_idle loop, waiting on a timer with a
> max timeout.
> The on_each_cpu call uses IPI calls to schedule the callback on each cpu.
> This exits the pm_idle call, the IPI interrupt is handled, and the callback
> is called. It schedules the barrier callback on this cpu (see __call_rcu in
> kernel/rcutree.c), but does not kick off the rcu core to start handling the
> callback because interrupts are disabled (we are in an interrupt handler,
> so interrupts are correctly disabled). It then exits the interrupt handler
> for the IPI, but nothing has set the idle thread as needing a resched, so
> it stays within the inner loop of cpu_idle and waits for the massive timer
> to expire.
>
> It looks to me that something either needs to wake up the idle cpu when an
> rcu callback is scheduled on it (I couldn't figure out how to do that), or
> the callback should not be scheduled on a completely idle cpu, as this cpu
> is already in a quiescent state.
>
> A fix that I made was to keep the inner loop of cpu_idle running only while
> (!need_resched() && !rcu_pending(smp_processor_id())). This allows the
> IPI call which scheduled the rcu callback to break out of the inner loop
> when the interrupt handler is exited, because the newly queued rcu callback
> has caused rcu_pending to be true.
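
To make the loop-condition change described above concrete, here is a minimal user-space sketch. It is not kernel code: `struct cpu_state`, `ipi_queue_rcu_callback`, `stays_idle_original` and `stays_idle_patched` are hypothetical stand-ins invented for illustration, and it assumes the fix amounts to adding an rcu_pending check to the inner idle loop's condition.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the per-CPU state the idle loop checks. */
struct cpu_state {
    bool need_resched;  /* stands in for need_resched() */
    bool rcu_pending;   /* stands in for rcu_pending(smp_processor_id()) */
};

/*
 * Models the IPI handler described in the report: it queues an RCU
 * callback on the idle CPU but does not mark the idle thread as
 * needing a reschedule.
 */
static void ipi_queue_rcu_callback(struct cpu_state *cpu)
{
    cpu->rcu_pending = true;
}

/* Original inner-loop condition (as assumed here): only need_resched. */
static bool stays_idle_original(const struct cpu_state *cpu)
{
    return !cpu->need_resched;
}

/* Patched condition: also leave the inner loop when RCU work is pending. */
static bool stays_idle_patched(const struct cpu_state *cpu)
{
    return !cpu->need_resched && !cpu->rcu_pending;
}
```

In this model, after `ipi_queue_rcu_callback()` runs, `stays_idle_original()` still returns true (the CPU goes back to sleep until its timer fires), while `stays_idle_patched()` returns false, so the CPU leaves the inner loop and can process the queued callback.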
>
> Can anyone comment on whether this is in fact a bug, and if so, is this a
> reasonable fix (I suspect that there will be a more elegant solution, but
> I don't have time to discover it)?
>
> (I am running on a quad core A9 CPU, with CONFIG_PREEMPT_NONE and
> CONFIG_NO_HZ).
>
> Thanks
>
> Bob

--
unsubscribe: android-kernel+unsubscr...@googlegroups.com
website: http://groups.google.com/group/android-kernel