On Sun, 2012-07-22 at 20:43 +0200, Mike Galbraith wrote: 
> On Sat, 2012-07-21 at 09:47 +0200, Mike Galbraith wrote: 
> > On Wed, 2012-07-18 at 07:30 +0200, Mike Galbraith wrote: 
> > > On Wed, 2012-07-18 at 06:44 +0200, Mike Galbraith wrote:
> > > 
> > > > The patch in question for missing Cc.  Maybe should be only mutex, but I
> > > > see no reason why IO dependency can only possibly exist for mutexes...
> > > 
> > > Well that was easy, box quickly said "nope, mutex only does NOT cut it".
> > 
> > And I also learned (ouch) that both doesn't cut it either.  Ksoftirqd
> > (or sirq-blk) being nailed by q->lock in blk_done_softirq() is.. not
> > particularly wonderful.  As long as that doesn't happen, IO deadlock
> > doesn't happen, troublesome filesystems just work.  If it does happen
> > though, you've instantly got a problem.
> 
> That problem being slab_lock in practice btw, though I suppose it could
> do the same with any number of others.  In encountered case, ksoftirqd
> (or sirq-blk) blocks on slab_lock while holding q->queue_lock, while a
> userspace task (dbench) blocks on q->queue_lock while holding slab_lock
> on the same cpu.  Game over.

Hello vacationing rt wizards' mail boxen (and others so bored they're
actually reading about obscure -rt IO troubles;).

ext4 is still alive, which is a positive sign, and the box hasn't yet
deadlocked either, another good sign.  Now all I have to do is (sigh) grind
filesystems to fine powder for a few days.. again.
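
For anyone skimming, the condition the patch below adds can be modelled in
plain userspace C.  This is only a sketch under stated assumptions, not
kernel code: struct task_model, should_flush_plug() and check_model() are
hypothetical names, and the reading that the migrate-disable depth already
counts the lock currently being acquired (hence "< 2" rather than "< 1") is
an assumption about how -rt spinlocks bump that counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state the patch inspects. */
struct task_model {
	/* Assumed: incremented once per sleeping spinlock taken, including
	 * the one we are about to block on. */
	int migrate_disable_depth;
	/* Models blk_needs_flush_plug(): the task has plugged IO pending. */
	bool has_plugged_io;
};

/*
 * Mirrors the patched condition: only pull the plug when no other lock
 * is held.  Flushing may need q->queue_lock, and blocking on that while
 * holding e.g. slab_lock is the inversion described in the thread.
 */
static bool should_flush_plug(const struct task_model *t)
{
	return t->migrate_disable_depth < 2 && t->has_plugged_io;
}

/* Walks the three interesting cases. */
static void check_model(void)
{
	/* Blocking on the first lock with plugged IO: flush. */
	struct task_model first_lock = { 1, true };
	/* Already holding another lock: do not flush. */
	struct task_model nested_lock = { 2, true };
	/* Nothing plugged: nothing to flush at any depth. */
	struct task_model no_plug = { 1, false };

	assert(should_flush_plug(&first_lock));
	assert(!should_flush_plug(&nested_lock));
	assert(!should_flush_plug(&no_plug));
}
```

The point of keying off nesting depth instead of lock type is that the
deadlock needs nothing mutex-specific: any lock held across the flush can
end up on the other side of q->queue_lock.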

---
 kernel/rtmutex.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -649,7 +649,14 @@ static inline void rt_spin_lock_fastlock
        if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
                rt_mutex_deadlock_account_lock(lock, current);
        else {
-               if (blk_needs_flush_plug(current))
+               /*
+                * We can't pull the plug if we're already holding a lock
+                * else we can deadlock.  eg, if we're holding slab_lock,
+                * ksoftirqd can block while processing BLOCK_SOFTIRQ after
+                * having acquired q->queue_lock.  If _we_ then block on
+                * that q->queue_lock while flushing our plug, deadlock.
+                */
+               if (__migrate_disabled(current) < 2 && blk_needs_flush_plug(current))
                        blk_schedule_flush_plug(current);
                slowfn(lock);
        }

