Maybe rcu_barrier() is waiting for all RCU callbacks to complete, but one
or more no longer exist because the module was unloaded.

On Fri, Feb 3, 2012 at 11:48 AM, Robert Beckett
<robert.beck...@ziilabs.com> wrote:

> Hello,
>
> Does anyone reading this list know much about the rcu subsystem?
>
> I have been debugging a problem with unmounting disks. Occasionally when
> unmounting an ext4 filesystem, the whole system would freeze.
> I traced this to it waiting for completion on an rcu_barrier.
>
> After lots of debugging, I found that the problem was that when scheduling
> the rcu barrier callback on each cpu (_rcu_barrier in kernel/rcutree.c),
> one of the cpus had just entered a cpu_idle loop, waiting on a timer with a
> max timeout.
> The on_each_cpu call uses IPI calls to schedule the callback on each cpu.
> This exits the pm_idle call, the IPI interrupt is handled, and the callback
> is called. It schedules the barrier callback on this cpu (see __call_rcu in
> kernel/rcutree.c), but does not kick off the rcu core to start handling the
> callback because interrupts are disabled (we are in an interrupt handler,
> so interrupts are correctly disabled). It then exits the interrupt handler
> for the IPI, but nothing has set the idle thread as needing a resched, so
> it stays within the inner loop of cpu_idle and waits for the massive timer
> to expire.
>
> It looks to me that something either needs to wake up the idle cpu when an
> rcu callback is scheduled on it (I couldn't figure out how to do that), or
> it should not be scheduled on a completely idle cpu as this cpu is already
> in a quiescent state.
>
> A fix that I made was to keep the inner loop of cpu_idle running only while
> (!need_resched() && !rcu_pending(smp_processor_id())). This allows the
> IPI call which scheduled the rcu callback to break out of the inner loop
> when the interrupt handler is exited because the newly queued rcu callback
> has caused rcu_pending to be true.
>
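
The difference between the two loop conditions can be sketched with a
small user-space model. This is only an illustration under stated
assumptions, not kernel code: struct toy_cpu, toy_ipi_queue_rcu_callback()
and toy_idle_inner_loop() are hypothetical names, the two flags stand in
for need_resched() and rcu_pending(), and max_sleeps stands in for the
long idle timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy per-CPU state (hypothetical, not a kernel structure). */
struct toy_cpu {
    bool need_resched;  /* stands in for need_resched() */
    bool rcu_pending;   /* stands in for rcu_pending(cpu) */
};

/*
 * Models the IPI handler described above: it queues the barrier
 * callback but does not ask for a reschedule.
 */
static void toy_ipi_queue_rcu_callback(struct toy_cpu *cpu)
{
    cpu->rcu_pending = true;
}

/*
 * Models the inner idle loop. The IPI arrives during the first sleep.
 * Returns how many times the loop entered its sleep step before exiting.
 * max_sleeps caps the loop and models the long timer finally expiring.
 * check_rcu selects the modified loop condition.
 */
static int toy_idle_inner_loop(struct toy_cpu *cpu, bool check_rcu,
                               int max_sleeps)
{
    int sleeps = 0;
    bool ipi_delivered = false;

    assert(cpu != NULL && max_sleeps >= 0);

    while (!cpu->need_resched && !(check_rcu && cpu->rcu_pending)) {
        if (sleeps == max_sleeps)
            break;              /* the long idle timer expires */
        sleeps++;               /* models one pm_idle() call */
        if (!ipi_delivered) {
            /* The IPI wakes the CPU and queues the callback. */
            toy_ipi_queue_rcu_callback(cpu);
            ipi_delivered = true;
        }
    }
    return sleeps;
}
```

With a zero-initialised toy_cpu and max_sleeps set to 5, the model
returns 5 when check_rcu is false (the CPU keeps going back to sleep
until the timer cap) and 1 when check_rcu is true (the loop exits right
after the IPI has queued the callback).
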
> Can anyone comment on whether this is in fact a bug, and if so, is this a
> reasonable fix (I suspect that there will be a more elegant solution, but
> I don't have time to discover it)?
>
> (I am running on a quad core A9 CPU, with CONFIG_PREEMPT_NONE and
> CONFIG_NO_HZ).
>
> Thanks
>
> Bob

-- 
unsubscribe: android-kernel+unsubscr...@googlegroups.com
website: http://groups.google.com/group/android-kernel
