Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-17 Thread Mike Galbraith
On Tue, 2012-07-17 at 06:18 +0200, Mike Galbraith wrote: 
> On Mon, 2012-07-16 at 13:03 -0400, Steven Rostedt wrote: 
> > On Mon, 2012-07-16 at 18:36 +0200, Mike Galbraith wrote:
> > >  
> > > > > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > > > > priority process preempted a lower priority process that holds the same
> > > > > lock, it will deadlock.
> > > > 
> > > > Hm, how, it's doing cpu_chill()?
> > > 
> > > 'course PI is toast, so *poof*.  Since just enabling the lockdep bits
> > > seems to fix it up, maybe that's the patchlet to submit (less is more).
> > 
> > There's that too. But the issue I was talking about is with all trylock
> > loops. As holding an rt-mutex now disables migration, if a high priority
> > process preempts a task that holds the lock, and then the high prio task
> > starts spinning waiting for that lock to release, the lower priority
> > process will never get to run to release it. The cpu_chill() doesn't
> > help.
> 
> Hrm.  I better go make a testcase, this one definitely wants pounding
> through thick skull.
> 
> I think all of the chilling in patchlet is really ugly anyway, so would
> prefer to trash it all, just enable the lockdep bits.  If it turns out
> we really do need to bounce off of counts, go get a bigger hammer when
> the need arises.  For the nonce, the pre-installed hammer _seemed_ big
> enough for the job.

All night dbench session, and all day doing many full xfstests runs
(what will run on btrfs), fsstress -p64, and generic beating says the
pre-installed tool is fine all by itself, so here comes a zero line
patch.. the second best kind ;-)

rt,fs,btrfs: fix rt deadlock on extent_buffer->lock

Trivially repeatable deadlock is cured by enabling lockdep code in
btrfs_clear_path_blocking() as suggested by Chris Mason.  He also
suggested restricting blocking reader count to one, and not allowing
a spinning reader while blocking reader exists.  This has proven to
be unnecessary, the strict lock order enforcement is enough.. or
rather that's my box's opinion after long hours of hard pounding.

Signed-off-by: Mike Galbraith 
Cc: Chris Mason 

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 2e66786..1f71eb0 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -72,7 +72,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 {
	int i;
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
	/* lockdep really cares that we take all of these spinlocks
	 * in the right order.  If any of the locks in the path are not
	 * currently blocking, it is going to complain.  So, make really
@@ -89,7 +89,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
		btrfs_clear_lock_blocking(p->nodes[i]);
	}
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
	if (held)
		btrfs_clear_lock_blocking(held);
 #endif




Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-16 Thread Mike Galbraith
On Tue, 2012-07-17 at 00:34 -0400, Steven Rostedt wrote: 
> On Tue, 2012-07-17 at 00:27 -0400, Steven Rostedt wrote:
> 
> > Actually, I was mistaken. I forgot that we defined 'cpu_chill()' as
> > msleep(1) on RT, which would keep a deadlock from happening.
> 
> Perhaps cpu_chill() isn't a good name, as it doesn't really explain what
> is happening. Perhaps one of the following?
> 
>   cpu_rest()
>   cpu_sleep()
>   cpu_deep_relax()
>   cpu_dream()
>   cpu_hypnotize()

(   cpu_waste_truckloads_of_time();)



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-16 Thread Mike Galbraith
On Tue, 2012-07-17 at 00:27 -0400, Steven Rostedt wrote: 
> On Tue, 2012-07-17 at 06:18 +0200, Mike Galbraith wrote:
> >  
> > > There's that too. But the issue I was talking about is with all trylock
> > > loops. As holding an rt-mutex now disables migration, if a high priority
> > > process preempts a task that holds the lock, and then the high prio task
> > > starts spinning waiting for that lock to release, the lower priority
> > > process will never get to run to release it. The cpu_chill() doesn't
> > > help.
> > 
> > Hrm.  I better go make a testcase, this one definitely wants pounding
> > through thick skull.
> 
> Actually, I was mistaken. I forgot that we defined 'cpu_chill()' as
> msleep(1) on RT, which would keep a deadlock from happening.

Whew!  There are no stars and moons on my pointy hat, just plain white
cone, so you had me worried I was missing something critical there.

> It doesn't explain the performance enhancement you get :-/

No, it doesn't.  The only thing I can think of is that while folks are
timed sleeping, they aren't preempting and interleaving IO as much, but
I'm pulling that out of thin air.  Timed sleep should be a lot longer
than regular wakeup, so to my mind, there should be less interleave due
to more thumb twiddling.

-Mike



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-16 Thread Steven Rostedt
On Tue, 2012-07-17 at 00:27 -0400, Steven Rostedt wrote:

> Actually, I was mistaken. I forgot that we defined 'cpu_chill()' as
> msleep(1) on RT, which would keep a deadlock from happening.

Perhaps cpu_chill() isn't a good name, as it doesn't really explain what
is happening. Perhaps one of the following?

  cpu_rest()
  cpu_sleep()
  cpu_deep_relax()
  cpu_dream()
  cpu_hypnotize()

-- Steve
 




Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-16 Thread Steven Rostedt
On Tue, 2012-07-17 at 06:18 +0200, Mike Galbraith wrote:
>  
> > There's that too. But the issue I was talking about is with all trylock
> > loops. As holding an rt-mutex now disables migration, if a high priority
> > process preempts a task that holds the lock, and then the high prio task
> > starts spinning waiting for that lock to release, the lower priority
> > process will never get to run to release it. The cpu_chill() doesn't
> > help.
> 
> Hrm.  I better go make a testcase, this one definitely wants pounding
> through thick skull.

Actually, I was mistaken. I forgot that we defined 'cpu_chill()' as
msleep(1) on RT, which would keep a deadlock from happening.
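
For reference, the -rt definition under discussion lives in include/linux/delay.h; from memory of the 3.x -rt patch queue it reads roughly as follows (a sketch, not a verbatim quote):

	/* On RT a trylock-style retry loop must really sleep: busy waiting
	 * could keep a preempted lock owner off the CPU indefinitely.
	 * Mainline can just relax the pipeline and retry. */
	#ifdef CONFIG_PREEMPT_RT_FULL
	# define cpu_chill()	msleep(1)
	#else
	# define cpu_chill()	cpu_relax()
	#endif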

It doesn't explain the performance enhancement you get :-/

-- Steve




Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-16 Thread Mike Galbraith
On Mon, 2012-07-16 at 13:03 -0400, Steven Rostedt wrote: 
> On Mon, 2012-07-16 at 18:36 +0200, Mike Galbraith wrote:
> >  
> > > > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > > > priority process preempted a lower priority process that holds the same
> > > > lock, it will deadlock.
> > > 
> > > Hm, how, it's doing cpu_chill()?
> > 
> > 'course PI is toast, so *poof*.  Since just enabling the lockdep bits
> > seems to fix it up, maybe that's the patchlet to submit (less is more).
> 
> There's that too. But the issue I was talking about is with all trylock
> loops. As holding an rt-mutex now disables migration, if a high priority
> process preempts a task that holds the lock, and then the high prio task
> starts spinning waiting for that lock to release, the lower priority
> process will never get to run to release it. The cpu_chill() doesn't
> help.

Hrm.  I better go make a testcase, this one definitely wants pounding
through thick skull.

I think all of the chilling in patchlet is really ugly anyway, so would
prefer to trash it all, just enable the lockdep bits.  If it turns out
we really do need to bounce off of counts, go get a bigger hammer when
the need arises.  For the nonce, the pre-installed hammer _seemed_ big
enough for the job.

What's a good way to beat living hell out of btrfs?  I've never been
into destructive fs testing, since they usually lived on my one and only
disk.  x3550 has two, and OS clone has already been sacrificed.

-Mike



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-16 Thread Mike Galbraith
On Mon, 2012-07-16 at 11:43 -0400, Chris Mason wrote: 
> On Mon, Jul 16, 2012 at 04:55:44AM -0600, Mike Galbraith wrote:

> > Seems btrfs isn't entirely convinced either.
> > 
> > [ 2292.336229] use_block_rsv: 1810 callbacks suppressed
> > [ 2292.336231] [ cut here ]
> > [ 2292.336255] WARNING: at fs/btrfs/extent-tree.c:6344 
> > use_block_rsv+0x17d/0x190 [btrfs]()
> > [ 2292.336257] Hardware name: System x3550 M3 -[7944K3G]-
> > [ 2292.336259] btrfs: block rsv returned -28
> 
> This is unrelated.  You got far enough into the benchmark to hit an
> ENOSPC warning.  This can be ignored (I just deleted it when we used 3.0
> for oracle).

Ah great, thanks.  I'll whack it in my tree as well then.

> re: dbench performance.  dbench tends to penalize fairness.  I can
> imagine RT making it slower in general.

It seems to work just fine for my normal workloads, and cyclictest is
happy, so I'm happy.  Zillion threads is 'keep the pieces' to me ;-)

If you think the patch is ok as is, I'll go ahead and submit it after I
let dbench hammer on it overnight at least.

-Mike



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-16 Thread Steven Rostedt
On Mon, 2012-07-16 at 18:36 +0200, Mike Galbraith wrote:
>  
> > > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > > priority process preempted a lower priority process that holds the same
> > > lock, it will deadlock.
> > 
> > Hm, how, it's doing cpu_chill()?
> 
> 'course PI is toast, so *poof*.  Since just enabling the lockdep bits
> seems to fix it up, maybe that's the patchlet to submit (less is more).

There's that too. But the issue I was talking about is with all trylock
loops. As holding an rt-mutex now disables migration, if a high priority
process preempts a task that holds the lock, and then the high prio task
starts spinning waiting for that lock to release, the lower priority
process will never get to run to release it. The cpu_chill() doesn't
help.
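
A userspace reconstruction of that scenario, for the skeptical (hypothetical demo code, not from the kernel; the names and the elided SCHED_FIFO setup are made up). With both threads pinned to one CPU under FIFO priorities, the bare trylock loop livelocks exactly as described above, while substituting a real sleep for the busy-wait -- which is what cpu_chill() amounts to on RT -- lets the owner finish:

	#define _GNU_SOURCE
	#include <pthread.h>
	#include <stdatomic.h>
	#include <unistd.h>

	static atomic_flag lock = ATOMIC_FLAG_INIT;

	static void *low_prio_owner(void *arg)
	{
		(void)arg;
		while (atomic_flag_test_and_set(&lock))
			;			/* acquire the "lock" */
		usleep(10000);			/* long critical section */
		atomic_flag_clear(&lock);	/* never reached if starved */
		return NULL;
	}

	static void *high_prio_waiter(void *arg)
	{
		(void)arg;
		/* Bare trylock loop: under strict FIFO priorities on the same
		 * CPU the preempted owner never runs again, so this spins
		 * forever.  A usleep(1000) in the loop body -- the moral
		 * equivalent of cpu_chill() -- breaks the cycle. */
		while (atomic_flag_test_and_set(&lock))
			;
		atomic_flag_clear(&lock);
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;

		/* SCHED_FIFO and affinity setup elided: running it for real
		 * needs privileges and will peg a CPU. */
		pthread_create(&a, NULL, low_prio_owner, NULL);
		pthread_create(&b, NULL, high_prio_waiter, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		return 0;
	}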

-- Steve




Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-16 Thread Mike Galbraith
On Mon, 2012-07-16 at 18:26 +0200, Mike Galbraith wrote: 
> On Mon, 2012-07-16 at 12:02 -0400, Steven Rostedt wrote: 
> > On Mon, 2012-07-16 at 04:02 +0200, Mike Galbraith wrote:
> > 
> > > > Great, thanks!  I got stuck in bug land on Friday.  You mentioned
> > > > performance problems earlier on Saturday, did this improve performance?
> > > 
> > > Yeah, the read_trylock() seems to improve throughput.  That's not
> > > heavily tested, but it certainly looks like it does.  No idea why.
> > 
> > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > priority process preempted a lower priority process that holds the same
> > lock, it will deadlock.
> 
> Hm, how, it's doing cpu_chill()?

'course PI is toast, so *poof*.  Since just enabling the lockdep bits
seems to fix it up, maybe that's the patchlet to submit (less is more).

-Mike



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-16 Thread Chris Mason
On Mon, Jul 16, 2012 at 10:26:08AM -0600, Mike Galbraith wrote:
> On Mon, 2012-07-16 at 12:02 -0400, Steven Rostedt wrote: 
> > On Mon, 2012-07-16 at 04:02 +0200, Mike Galbraith wrote:
> > 
> > > > Great, thanks!  I got stuck in bug land on Friday.  You mentioned
> > > > performance problems earlier on Saturday, did this improve performance?
> > > 
> > > Yeah, the read_trylock() seems to improve throughput.  That's not
> > > heavily tested, but it certainly looks like it does.  No idea why.
> > 
> > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > priority process preempted a lower priority process that holds the same
> > lock, it will deadlock.
> 
> Hm, how, it's doing cpu_chill()?
> 
> > I'm not sure why you would get a performance benefit from this, as the
> > mutex used is an adaptive one (failure to acquire the lock will only
> > sleep if preempted or if the owner is not running).
> 
> I'm not attached to it, can whack it in a heartbeat.. especially so if
> the thing can deadlock.  I've seen enough of those of late.
> 
> > We should look at why this performs better (if it really does).
> 
> Not sure it really does, there's variance, but it looked like it did.
> 

I'd use a benchmark that is more consistent than dbench for this.  I
love dbench for generating load (and the occasional deadlock) but it
tends to steer you in the wrong direction on performance.

-chris



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-16 Thread Mike Galbraith
On Mon, 2012-07-16 at 12:02 -0400, Steven Rostedt wrote: 
> On Mon, 2012-07-16 at 04:02 +0200, Mike Galbraith wrote:
> 
> > > Great, thanks!  I got stuck in bug land on Friday.  You mentioned
> > > performance problems earlier on Saturday, did this improve performance?
> > 
> > Yeah, the read_trylock() seems to improve throughput.  That's not
> > heavily tested, but it certainly looks like it does.  No idea why.
> 
> Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> priority process preempted a lower priority process that holds the same
> lock, it will deadlock.

Hm, how, it's doing cpu_chill()?

> I'm not sure why you would get a performance benefit from this, as the
> mutex used is an adaptive one (failure to acquire the lock will only
> sleep if preempted or if the owner is not running).

I'm not attached to it, can whack it in a heartbeat.. especially so if
the thing can deadlock.  I've seen enough of those of late.

> We should look at why this performs better (if it really does).

Not sure it really does, there's variance, but it looked like it did.

-Mike




Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-16 Thread Steven Rostedt
On Mon, 2012-07-16 at 04:02 +0200, Mike Galbraith wrote:

> > Great, thanks!  I got stuck in bug land on Friday.  You mentioned
> > performance problems earlier on Saturday, did this improve performance?
> 
> Yeah, the read_trylock() seems to improve throughput.  That's not
> heavily tested, but it certainly looks like it does.  No idea why.

Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
priority process preempted a lower priority process that holds the same
lock, it will deadlock.

I'm not sure why you would get a performance benefit from this, as the
mutex used is an adaptive one (failure to acquire the lock will only
sleep if preempted or if the owner is not running).
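
For the record, the adaptive slowpath referred to looks roughly like this, simplified from memory of kernel/rtmutex.c in the -rt series, so treat it as a sketch rather than the literal code:

	/* Spin while the lock owner is still running on another CPU;
	 * return nonzero once the waiter should really go to sleep. */
	static int adaptive_wait(struct rt_mutex *lock,
				 struct task_struct *owner)
	{
		int sleep = 0;

		rcu_read_lock();
		for (;;) {
			if (owner != rt_mutex_owner(lock))
				break;		/* lock changed hands */
			barrier();
			if (!owner->on_cpu) {	/* owner off CPU: stop spinning */
				sleep = 1;
				break;
			}
			cpu_relax();
		}
		rcu_read_unlock();
		return sleep;
	}

So a failed acquisition only truly sleeps when spinning cannot help, which is why the trylock loop shouldn't have been worth anything.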

We should look at why this performs better (if it really does).

-- Steve

> 
> WRT performance, dbench isn't thrilled, but btrfs seems to work just
> fine for my routine usage, and spinning rust bucket is being all it can
> be.  I hope I don't have to care overly much about dbench's opinion.  It
> doesn't make happy multi-thread numbers with btrfs, but those numbers
> suddenly look great if you rebase relative to xfs -rt throughput :)
> 
> > One other question:
> > 
> > >  again:
> > > +#ifdef CONFIG_PREEMPT_RT_BASE
> > > +	while (atomic_read(&eb->blocking_readers))
> > > +		cpu_chill();
> > > +	while (!read_trylock(&eb->lock))
> > > +		cpu_chill();
> > > +	if (atomic_read(&eb->blocking_readers)) {
> > > +		read_unlock(&eb->lock);
> > > +		goto again;
> > > +	}
> > 
> > Why use read_trylock() in a loop instead of just trying to take the
> > lock?  Is this an RTism or are there other reasons?
> 
> First stab paranoia.  It worked, so I removed it.  It still worked but
> lost throughput, removed all my bits leaving only the lockdep bits, it
> still worked.
> 
> -Mike




Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-16 Thread Chris Mason
On Mon, Jul 16, 2012 at 04:55:44AM -0600, Mike Galbraith wrote:
> On Sat, 2012-07-14 at 12:14 +0200, Mike Galbraith wrote: 
> > On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote: 
> > > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > > > Greetings,
> > > 
> > > [ deadlocks with btrfs and the recent RT kernels ]
> > > 
> > > I talked with Thomas about this and I think the problem is the
> > > single-reader nature of the RW rwlocks.  The lockdep report below
> > > mentions that btrfs is calling:
> > > 
> > > > [  692.963099]  [] btrfs_clear_path_blocking+0x32/0x70
> > > 
> > > In this case, the task has a number of blocking read locks on the btrfs buffers,
> > > and we're trying to turn them back into spinning read locks.  Even
> > > though btrfs is taking the read rwlock, it doesn't think of this as a new
> > > lock operation because we were blocking out new writers.
> > > 
> > > If the second task has taken the spinning read lock, it is going to
> > > prevent that clear_path_blocking operation from progressing, even though
> > > it would have worked on a non-RT kernel.
> > > 
> > > The solution should be to make the blocking read locks in btrfs honor the
> > > single-reader semantics.  This means not allowing more than one blocking
> > > reader and not allowing a spinning reader when there is a blocking
> > > reader.  Strictly speaking btrfs shouldn't need recursive readers on a
> > > single lock, so I wouldn't worry about that part.
> > > 
> > > There is also a chunk of code in btrfs_clear_path_blocking that makes
> > > sure to strictly honor top down locking order during the conversion.  It
> > > only does this when lockdep is enabled because in non-RT kernels we
> > > don't need to worry about it.  For RT we'll want to enable that as well.
> > > 
> > > I'll give this a shot later today.
> > 
> > I took a poke at it.  Did I do something similar to what you had in
> > mind, or just hide behind performance stealing paranoid trylock loops?
> > Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
> > bat, so it gets posted despite skepticism.
> 
> Seems btrfs isn't entirely convinced either.
> 
> [ 2292.336229] use_block_rsv: 1810 callbacks suppressed
> [ 2292.336231] [ cut here ]
> [ 2292.336255] WARNING: at fs/btrfs/extent-tree.c:6344 
> use_block_rsv+0x17d/0x190 [btrfs]()
> [ 2292.336257] Hardware name: System x3550 M3 -[7944K3G]-
> [ 2292.336259] btrfs: block rsv returned -28

This is unrelated.  You got far enough into the benchmark to hit an
ENOSPC warning.  This can be ignored (I just deleted it when we used 3.0
for oracle).

re: dbench performance.  dbench tends to penalize fairness.  I can
imagine RT making it slower in general.

It also triggers lots of lock contention in btrfs because the dataset is
fairly small and the trees don't fan out a lot.

-chris



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-16 Thread Mike Galbraith
On Sat, 2012-07-14 at 12:14 +0200, Mike Galbraith wrote: 
> On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote: 
> > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > > Greetings,
> > 
> > [ deadlocks with btrfs and the recent RT kernels ]
> > 
> > I talked with Thomas about this and I think the problem is the
> > single-reader nature of the RW rwlocks.  The lockdep report below
> > mentions that btrfs is calling:
> > 
> > > [  692.963099]  [] btrfs_clear_path_blocking+0x32/0x70
> > 
> > In this case, the task has a number of blocking read locks on the btrfs buffers,
> > and we're trying to turn them back into spinning read locks.  Even
> > though btrfs is taking the read rwlock, it doesn't think of this as a new
> > lock operation because we were blocking out new writers.
> > 
> > If the second task has taken the spinning read lock, it is going to
> > prevent that clear_path_blocking operation from progressing, even though
> > it would have worked on a non-RT kernel.
> > 
> > The solution should be to make the blocking read locks in btrfs honor the
> > single-reader semantics.  This means not allowing more than one blocking
> > reader and not allowing a spinning reader when there is a blocking
> > reader.  Strictly speaking btrfs shouldn't need recursive readers on a
> > single lock, so I wouldn't worry about that part.
> > 
> > There is also a chunk of code in btrfs_clear_path_blocking that makes
> > sure to strictly honor top down locking order during the conversion.  It
> > only does this when lockdep is enabled because in non-RT kernels we
> > don't need to worry about it.  For RT we'll want to enable that as well.
> > 
> > I'll give this a shot later today.
> 
> I took a poke at it.  Did I do something similar to what you had in
> mind, or just hide behind performance stealing paranoid trylock loops?
> Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
> bat, so it gets posted despite skepticism.

Seems btrfs isn't entirely convinced either.

[ 2292.336229] use_block_rsv: 1810 callbacks suppressed
[ 2292.336231] [ cut here ]
[ 2292.336255] WARNING: at fs/btrfs/extent-tree.c:6344 
use_block_rsv+0x17d/0x190 [btrfs]()
[ 2292.336257] Hardware name: System x3550 M3 -[7944K3G]-
[ 2292.336259] btrfs: block rsv returned -28
[ 2292.336260] Modules linked in: joydev st sr_mod ide_gd_mod(N) ide_cd_mod 
ide_core cdrom ibm_rtl nfsd lockd ipmi_devintf nfs_acl auth_rpcgss sunrpc 
ipmi_si ipmi_msghandler ipv6 ipv6_lib af_packet cpufreq_conservative 
cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf edd fuse btrfs 
zlib_deflate ext3 jbd loop dm_mod usbhid hid cdc_ether usbnet mii sg shpchp 
pci_hotplug pcspkr bnx2 ioatdma i2c_i801 i2c_core tpm_tis tpm tpm_bios 
serio_raw i7core_edac edac_core button dca iTCO_wdt iTCO_vendor_support ext4 
mbcache jbd2 uhci_hcd ehci_hcd sd_mod usbcore rtc_cmos crc_t10dif usb_common 
fan processor ata_generic ata_piix libata megaraid_sas scsi_mod thermal 
thermal_sys hwmon
[ 2292.336296] Supported: Yes
[ 2292.336298] Pid: 12975, comm: bonnie Tainted: GW  N  3.0.35-rt56-rt 
#27
[ 2292.336300] Call Trace:
[ 2292.336312]  [] dump_trace+0x82/0x2e0
[ 2292.336320]  [] dump_stack+0x69/0x6f
[ 2292.336325]  [] warn_slowpath_common+0x7b/0xc0
[ 2292.336330]  [] warn_slowpath_fmt+0x45/0x50
[ 2292.336342]  [] use_block_rsv+0x17d/0x190 [btrfs]
[ 2292.336389]  [] btrfs_alloc_free_block+0x49/0x240 [btrfs]
[ 2292.336432]  [] __btrfs_cow_block+0x13e/0x510 [btrfs]
[ 2292.336457]  [] btrfs_cow_block+0xff/0x230 [btrfs]
[ 2292.336482]  [] btrfs_search_slot+0x360/0x7e0 [btrfs]
[ 2292.336513]  [] btrfs_del_csums+0x175/0x2f0 [btrfs]
[ 2292.336562]  [] __btrfs_free_extent+0x550/0x760 [btrfs]
[ 2292.336599]  [] run_delayed_data_ref+0x9d/0x190 [btrfs]
[ 2292.336636]  [] run_clustered_refs+0xd5/0x3a0 [btrfs]
[ 2292.336678]  [] btrfs_run_delayed_refs+0x148/0x350 [btrfs]
[ 2292.336723]  [] __btrfs_end_transaction+0xb7/0x2b0 [btrfs]
[ 2292.336796]  [] btrfs_evict_inode+0x2d3/0x340 [btrfs]
[ 2292.336863]  [] evict+0x91/0x190
[ 2292.336868]  [] do_unlinkat+0x177/0x1f0
[ 2292.336875]  [] system_call_fastpath+0x16/0x1b
[ 2292.336881]  [<7fea187f9e67>] 0x7fea187f9e66
[ 2292.336887] ---[ end trace 0004 ]---
[ 2610.370398] use_block_rsv: 1947 callbacks suppressed
[ 2610.370400] [ cut here ]

> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 4106264..ae47cc2 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -77,7 +77,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
>  {
> 	int i;
>  
> -#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
> 	/* lockdep really cares that we take all of these spinlocks
> 	 * in the right order.  If any of the locks in the path are not
> 	 * currently blocking, it is going to complain.  So, make really
> @@ -104,7 +104,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
> 		}
> 	}
>  
> -#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
> 	if (held)
> 		btrfs_clear_lock_blocking_rw(held, held_rw);
>  #endif

Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-15 Thread Mike Galbraith
On Sun, 2012-07-15 at 13:56 -0400, Chris Mason wrote: 
> On Sat, Jul 14, 2012 at 04:14:43AM -0600, Mike Galbraith wrote:
> > On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote: 
> > > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > > > Greetings,
> > > 
> > > [ deadlocks with btrfs and the recent RT kernels ]
> > > 
> > > I talked with Thomas about this and I think the problem is the
> > > single-reader nature of the RW rwlocks.  The lockdep report below
> > > mentions that btrfs is calling:
> > > 
> > > > [  692.963099]  [] btrfs_clear_path_blocking+0x32/0x70
> > > 
> > > In this case, the task has a number of blocking read locks on the btrfs buffers,
> > > and we're trying to turn them back into spinning read locks.  Even
> > > though btrfs is taking the read rwlock, it doesn't think of this as a new
> > > lock operation because we were blocking out new writers.
> > > 
> > > If the second task has taken the spinning read lock, it is going to
> > > prevent that clear_path_blocking operation from progressing, even though
> > > it would have worked on a non-RT kernel.
> > > 
> > > The solution should be to make the blocking read locks in btrfs honor the
> > > single-reader semantics.  This means not allowing more than one blocking
> > > reader and not allowing a spinning reader when there is a blocking
> > > reader.  Strictly speaking btrfs shouldn't need recursive readers on a
> > > single lock, so I wouldn't worry about that part.
> > > 
> > > There is also a chunk of code in btrfs_clear_path_blocking that makes
> > > sure to strictly honor top down locking order during the conversion.  It
> > > only does this when lockdep is enabled because in non-RT kernels we
> > > don't need to worry about it.  For RT we'll want to enable that as well.
> > > 
> > > I'll give this a shot later today.
> > 
> > I took a poke at it.  Did I do something similar to what you had in
> > mind, or just hide behind performance stealing paranoid trylock loops?
> > Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
> > bat, so it gets posted despite skepticism.
> 
> Great, thanks!  I got stuck in bug land on Friday.  You mentioned
> performance problems earlier on Saturday, did this improve performance?

Yeah, the read_trylock() seems to improve throughput.  That's not
heavily tested, but it certainly looks like it does.  No idea why.

WRT performance, dbench isn't thrilled, but btrfs seems to work just
fine for my routine usage, and spinning rust bucket is being all it can
be.  I hope I don't have to care overly much about dbench's opinion.  It
doesn't make happy multi-thread numbers with btrfs, but those numbers
suddenly look great if you rebase relative to xfs -rt throughput :)

> One other question:
> 
> >  again:
> > +#ifdef CONFIG_PREEMPT_RT_BASE
> > +	while (atomic_read(&eb->blocking_readers))
> > +		cpu_chill();
> > +	while (!read_trylock(&eb->lock))
> > +		cpu_chill();
> > +	if (atomic_read(&eb->blocking_readers)) {
> > +		read_unlock(&eb->lock);
> > +		goto again;
> > +	}
> 
> Why use read_trylock() in a loop instead of just trying to take the
> lock?  Is this an RTism or are there other reasons?

First stab paranoia.  It worked, so I removed it.  It still worked but
lost throughput, removed all my bits leaving only the lockdep bits, it
still worked.

-Mike



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-15 Thread Chris Mason
On Sat, Jul 14, 2012 at 04:14:43AM -0600, Mike Galbraith wrote:
> On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote: 
> > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > > Greetings,
> > 
> > [ deadlocks with btrfs and the recent RT kernels ]
> > 
> > I talked with Thomas about this and I think the problem is the
> > single-reader nature of the RW rwlocks.  The lockdep report below
> > mentions that btrfs is calling:
> > 
> > > [  692.963099]  [] btrfs_clear_path_blocking+0x32/0x70
> > 
> > In this case, the task has a number of blocking read locks on the btrfs buffers,
> > and we're trying to turn them back into spinning read locks.  Even
> > though btrfs is taking the read rwlock, it doesn't think of this as a new
> > lock operation because we were blocking out new writers.
> > 
> > If the second task has taken the spinning read lock, it is going to
> > prevent that clear_path_blocking operation from progressing, even though
> > it would have worked on a non-RT kernel.
> > 
> > The solution should be to make the blocking read locks in btrfs honor the
> > single-reader semantics.  This means not allowing more than one blocking
> > reader and not allowing a spinning reader when there is a blocking
> > reader.  Strictly speaking btrfs shouldn't need recursive readers on a
> > single lock, so I wouldn't worry about that part.
> > 
> > There is also a chunk of code in btrfs_clear_path_blocking that makes
> > sure to strictly honor top down locking order during the conversion.  It
> > only does this when lockdep is enabled because in non-RT kernels we
> > don't need to worry about it.  For RT we'll want to enable that as well.
> > 
> > I'll give this a shot later today.
> 
> I took a poke at it.  Did I do something similar to what you had in
> mind, or just hide behind performance stealing paranoid trylock loops?
> Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
> bat, so it gets posted despite skepticism.

Great, thanks!  I got stuck in bug land on Friday.  You mentioned
performance problems earlier on Saturday, did this improve performance?

One other question:

>  again:
> +#ifdef CONFIG_PREEMPT_RT_BASE
> +	while (atomic_read(&eb->blocking_readers))
> +		cpu_chill();
> +	while (!read_trylock(&eb->lock))
> +		cpu_chill();
> +	if (atomic_read(&eb->blocking_readers)) {
> +		read_unlock(&eb->lock);
> +		goto again;
> +	}

Why use read_trylock() in a loop instead of just trying to take the
lock?  Is this an RTism or are there other reasons?  

-chris


Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-14 Thread Mike Galbraith
On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote:

> There is also a chunk of code in btrfs_clear_path_blocking that makes
> sure to strictly honor top down locking order during the conversion.  It
> only does this when lockdep is enabled because in non-RT kernels we
> don't need to worry about it.  For RT we'll want to enable that as well.

Hm, _seems_ that alone is enough to prevent deadlock.  Throughput really
sucks though.  The other bits of my stab bump throughput for dbench 128
from ~200 mb/s to ~360 mb/s (appears it's the paranoid trylock loops).
ext3 does 775 mb/s with the same kernel.  Or, dbench 8 on ext3 gives
~1800 mb/s and ~480 mb/s btrfs.  Not exactly wonderful.

Hohum, guess I'll wait and see what your patch looks like.  I bet it'll
work a lot better than mine does :)

-Mike



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-14 Thread Mike Galbraith
On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote: 
> On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > Greetings,
> 
> [ deadlocks with btrfs and the recent RT kernels ]
> 
> I talked with Thomas about this and I think the problem is the
> single-reader nature of the RW rwlocks.  The lockdep report below
> mentions that btrfs is calling:
> 
> > [  692.963099]  [] btrfs_clear_path_blocking+0x32/0x70
> 
> In this case, the task has a number of blocking read locks on the btrfs buffers,
> and we're trying to turn them back into spinning read locks.  Even
> though btrfs is taking the read rwlock, it doesn't think of this as a new
> lock operation because we were blocking out new writers.
> 
> If the second task has taken the spinning read lock, it is going to
> prevent that clear_path_blocking operation from progressing, even though
> it would have worked on a non-RT kernel.
> 
> The solution should be to make the blocking read locks in btrfs honor the
> single-reader semantics.  This means not allowing more than one blocking
> reader and not allowing a spinning reader when there is a blocking
> reader.  Strictly speaking btrfs shouldn't need recursive readers on a
> single lock, so I wouldn't worry about that part.
> 
> There is also a chunk of code in btrfs_clear_path_blocking that makes
> sure to strictly honor top down locking order during the conversion.  It
> only does this when lockdep is enabled because in non-RT kernels we
> don't need to worry about it.  For RT we'll want to enable that as well.
> 
> I'll give this a shot later today.

I took a poke at it.  Did I do something similar to what you had in
mind, or just hide behind performance stealing paranoid trylock loops?
Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
bat, so it gets posted despite skepticism.

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4106264..ae47cc2 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -77,7 +77,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 {
	int i;
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
	/* lockdep really cares that we take all of these spinlocks
	 * in the right order.  If any of the locks in the path are not
	 * currently blocking, it is going to complain.  So, make really
@@ -104,7 +104,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
		}
	}
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
	if (held)
		btrfs_clear_lock_blocking_rw(held, held_rw);
 #endif
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 272f911..4db7c14 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -19,6 +19,7 @@
 #include <linux/sched.h>
 #include <linux/pagemap.h>
 #include <linux/spinlock.h>
+#include <linux/delay.h>
 #include <linux/page-flags.h>
 #include "ctree.h"
 #include "extent_io.h"
@@ -97,7 +98,18 @@ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw)
 void btrfs_tree_read_lock(struct extent_buffer *eb)
 {
 again:
+#ifdef CONFIG_PREEMPT_RT_BASE
+	while (atomic_read(&eb->blocking_readers))
+		cpu_chill();
+	while (!read_trylock(&eb->lock))
+		cpu_chill();
+	if (atomic_read(&eb->blocking_readers)) {
+		read_unlock(&eb->lock);
+		goto again;
+	}
+#else
	read_lock(&eb->lock);
+#endif
	if (atomic_read(&eb->blocking_writers) &&
	    current->pid == eb->lock_owner) {
		/*
@@ -131,11 +143,26 @@ int btrfs_try_tree_read_lock(struct extent_buffer *eb)
	if (atomic_read(&eb->blocking_writers))
		return 0;
 
+#ifdef CONFIG_PREEMPT_RT_BASE
+	if (atomic_read(&eb->blocking_readers))
+		return 0;
+	while (!read_trylock(&eb->lock))
+		cpu_chill();
+#else
	read_lock(&eb->lock);
+#endif
+
	if (atomic_read(&eb->blocking_writers)) {
		read_unlock(&eb->lock);
		return 0;
	}
+
+#ifdef CONFIG_PREEMPT_RT_BASE
+	if (atomic_read(&eb->blocking_readers)) {
+		read_unlock(&eb->lock);
+		return 0;
+	}
+#endif
	atomic_inc(&eb->read_locks);
	atomic_inc(&eb->spinning_readers);
	return 1;




Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-13 Thread Thomas Gleixner
Chris,

On Fri, 13 Jul 2012, Chris Mason wrote:
> On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > Greetings,
> 
> [ deadlocks with btrfs and the recent RT kernels ]
> 
> I talked with Thomas about this and I think the problem is the
> single-reader nature of the RW rwlocks.  The lockdep report below
> mentions that btrfs is calling:
> 
> > [  692.963099]  [] btrfs_clear_path_blocking+0x32/0x70
> 
> In this case, the task has a number of blocking read locks on the btrfs buffers,
> and we're trying to turn them back into spinning read locks.  Even
> though btrfs is taking the read rwlock, it doesn't think of this as a new
> lock operation because we were blocking out new writers.
> 
> If the second task has taken the spinning read lock, it is going to
> prevent that clear_path_blocking operation from progressing, even though
> it would have worked on a non-RT kernel.
> 
> The solution should be to make the blocking read locks in btrfs honor the
> single-reader semantics.  This means not allowing more than one blocking
> reader and not allowing a spinning reader when there is a blocking
> reader.  Strictly speaking btrfs shouldn't need recursive readers on a
> single lock, so I wouldn't worry about that part.
> 
> There is also a chunk of code in btrfs_clear_path_blocking that makes
> sure to strictly honor top down locking order during the conversion.  It
> only does this when lockdep is enabled because in non-RT kernels we
> don't need to worry about it.  For RT we'll want to enable that as well.

thanks for explaining this. I really got lost in that code completely.
 
> I'll give this a shot later today.

Cool.

Aside of that I'm still pondering to experiment with a non-pi variant
of rw locks which allows multiple readers. For such cases as btrfs I
think they would be well suited and avoid the performance overhead of
the single reader restriction. But that's not going to happen before
my vacation, so we'll stick with your workaround for now and let Mike
beat the hell out of it.
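
Such a lock might look roughly like the sketch below: readers are merely counted, so there is no single owner for a waiter to boost, which is precisely the PI property being traded away (hypothetical names, purely an illustration of the idea, not anything that exists in -rt):

	struct mr_rwlock {
		raw_spinlock_t		wait_lock;
		int			state;	/* >0: nr readers, -1: writer */
		wait_queue_head_t	waiters;
	};

	static void mr_read_lock(struct mr_rwlock *l)
	{
		raw_spin_lock(&l->wait_lock);
		while (l->state < 0) {		/* writer in: sleep, unboosted */
			raw_spin_unlock(&l->wait_lock);
			wait_event(l->waiters, l->state >= 0);
			raw_spin_lock(&l->wait_lock);
		}
		l->state++;			/* any number of readers may enter */
		raw_spin_unlock(&l->wait_lock);
	}

	static void mr_read_unlock(struct mr_rwlock *l)
	{
		raw_spin_lock(&l->wait_lock);
		if (--l->state == 0)
			wake_up_all(&l->waiters);
		raw_spin_unlock(&l->wait_lock);
	}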

Thanks,

Thomas


Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-13 Thread Chris Mason
On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> Greetings,

[ deadlocks with btrfs and the recent RT kernels ]

I talked with Thomas about this and I think the problem is the
single-reader nature of the RW rwlocks.  The lockdep report below
mentions that btrfs is calling:

> [  692.963099]  [] btrfs_clear_path_blocking+0x32/0x70

In this case, the task has a number of blocking read locks on the btrfs buffers,
and we're trying to turn them back into spinning read locks.  Even
though btrfs is taking the read rwlock, it doesn't think of this as a new
lock operation because we were blocking out new writers.

If the second task has taken the spinning read lock, it is going to
prevent that clear_path_blocking operation from progressing, even though
it would have worked on a non-RT kernel.

The solution should be to make the blocking read locks in btrfs honor the
single-reader semantics.  This means not allowing more than one blocking
reader and not allowing a spinning reader when there is a blocking
reader.  Strictly speaking btrfs shouldn't need recursive readers on a
single lock, so I wouldn't worry about that part.
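
For readers following along, the blocking/spinning distinction is btrfs's own construct layered on top of eb->lock; in outline it works like this (a stripped-down paraphrase of fs/btrfs/locking.c, not the verbatim code):

	/* A "spinning" holder owns eb->lock itself.  Going blocking drops
	 * the rwlock and records the holder in an atomic counter instead,
	 * so the tree lock stays logically held across sleeps. */
	void btrfs_set_lock_blocking_rw(struct extent_buffer *eb, int rw)
	{
		if (rw == BTRFS_READ_LOCK) {
			atomic_inc(&eb->blocking_readers);
			atomic_dec(&eb->spinning_readers);
			read_unlock(&eb->lock);		/* rwlock released */
		} else {
			atomic_inc(&eb->blocking_writers);
			atomic_dec(&eb->spinning_writers);
			write_unlock(&eb->lock);
		}
	}

	/* Clearing blocking re-takes the rwlock.  On RT, where the rwlock
	 * admits only a single reader, this re-acquisition is what another
	 * spinning reader can stall; mainline's rwlock would allow it. */
	void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw)
	{
		if (rw == BTRFS_READ_LOCK) {
			read_lock(&eb->lock);
			atomic_inc(&eb->spinning_readers);
			atomic_dec(&eb->blocking_readers);
		} else {
			write_lock(&eb->lock);
			atomic_inc(&eb->spinning_writers);
			atomic_dec(&eb->blocking_writers);
		}
	}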

There is also a chunk of code in btrfs_clear_path_blocking that makes
sure to strictly honor top down locking order during the conversion.  It
only does this when lockdep is enabled because in non-RT kernels we
don't need to worry about it.  For RT we'll want to enable that as well.

I'll give this a shot later today.

-chris



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-13 Thread Mike Galbraith
On Fri, 2012-07-13 at 06:47 -0400, Chris Mason wrote: 
> On Fri, Jul 13, 2012 at 04:26:26AM -0600, Thomas Gleixner wrote:
> > On Fri, 13 Jul 2012, Mike Galbraith wrote:
> > > On Fri, 2012-07-13 at 11:52 +0200, Thomas Gleixner wrote: 
> > > > On Fri, 13 Jul 2012, Mike Galbraith wrote:
> > > > > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > > > > > Bingo, that makes it more likely that this is caused by copying w/o
> > > > > > initializing the lock and then freeing the original structure.
> > > > > > 
> > > > > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > > > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > > > copy, but I have no idea whether that's the place we are looking for.
> > > > > 
> > > > Thanks a bunch Thomas.  I doubt I would have ever figured out that lala
> > > > land resulted from _copying_ a lock.  That's one I won't be forgetting
> > > > any time soon.  Box not only survived a few thousand xfstests 006 runs,
> > > > dbench seemed disinterested in deadlocking virgin 3.0-rt.
> > > > 
> > > Cute. I think that the lock copying caused the deadlock problem as
> > > the list pointed to the wrong place, so we might have ended up
> > > following the wrong chain when walking the list as long as the
> > > > original struct was not freed. That beast is freed under RCU so there
> > > > could be a rcu read side critical section fiddling with the old lock
> > > > and cause utter confusion.
> > > 
> > > Virgin 3.0-rt appears to really be solid.  But then it doesn't have
> > > pesky rwlocks.
> > 
> > Ah. So 3.0 is not having those rwlock thingies. Bummer.
> >  
> > > > /me goes and writes a nastigram^W proper changelog
> > > > 
> > > > btrfs still locks up in my enterprise kernel, so I suppose I had better
> > > > plug your fix into 3.4-rt and see what happens, and go beat hell out of
> > > > virgin 3.0-rt again to be sure box really really survives dbench.
> > > > 
> > > > A test against 3.4-rt sans enterprise mess might be nice as well.
> > > 
> > > Enterprise is 3.0-stable with um 555 btrfs patches (oh dear).
> > > 
> > > Virgin 3.4-rt and 3.2-rt deadlock gripe.  Enterprise doesn't gripe, but
> > > deadlocks, so I have another adventure in my future even if I figure out
> > > wth to do about rwlocks.
> > 
> > Hrmpf. /me goes to stare into fs/btrfs/ some more.
> 
> Please post the deadlocks here, I'll help ;)

This is the one from top of thread.  Below that is without the deadlock
detector.

[  692.857165] 
[  692.863963] [ BUG: circular locking deadlock detected! ]
[  692.869264] Not tainted
[  692.871708] 
[  692.877008] btrfs-delayed-m/1404 is deadlocking current task dbench/7937
[  692.877009] 
[  692.885183] 
[  692.885184] 1) dbench/7937 is trying to acquire this lock:
[  692.892149]  [88014d6aea80] {&(&eb->lock)->lock}
[  692.897102] .. ->owner: 880175808501
[  692.901018] .. held by:   btrfs-delayed-m: 1404 [880175808500, 120]
[  692.907657] 
[  692.907657] 2) btrfs-delayed-m/1404 is blocked on this lock:
[  692.914797]  [88014bf58d60] {&(&eb->lock)->lock}
[  692.919751] .. ->owner: 880175186101
[  692.923672] .. held by:dbench: 7937 [880175186100, 120]
[  692.930309] 
[  692.930309] btrfs-delayed-m/1404's [blocked] stackdump:
[  692.930310] 
[  692.938504]  880177575aa0 0046 88014bf58d60 
fb00
[  692.938507]  fb00 880177575fd8 fb00 
880177574000
[  692.938509]  880177575fd8 fb00 88017662f240 
880175808500
[  692.960635] Call Trace:
[  692.963085]  [] schedule+0x29/0x90
[  692.963087]  [] rt_spin_lock_slowlock+0xfd/0x330
[  692.963090]  [] __rt_spin_lock+0x9/0x10
[  692.963092]  [] rt_read_lock+0x27/0x40
[  692.963096]  [] btrfs_clear_lock_blocking_rw+0x6f/0x180
[  692.963099]  [] btrfs_clear_path_blocking+0x32/0x70
[  692.963102]  [] btrfs_search_slot+0x6b2/0x810
[  692.963105]  [] btrfs_lookup_inode+0x2a/0xa0
[  692.963107]  [] ? rt_mutex_lock+0x12/0x20
[  692.963111]  [] btrfs_update_delayed_inode+0x6c/0x160
[  692.963113]  [] ? _mutex_unlock+0x9/0x10
[  692.963116]  [] 
btrfs_async_run_delayed_node_done+0x182/0x1a0
[  692.963119]  [] worker_loop+0xaf/0x430
[  692.963121]  [] ? btrfs_queue_worker+0x1e0/0x1e0
[  692.963123]  [] ? btrfs_queue_worker+0x1e0/0x1e0
[  692.963127]  [] kthread+0xae/0xc0
[  692.963129]  [] ? schedule+0x29/0x90
[  692.963133]  [] ? __switch_to+0x14c/0x410
[  692.963137]  [] ? finish_task_switch+0x54/0xd0
[  692.963140]  [] kernel_thread_helper+0x4/0x10
[  692.963143]  [] ? __init_kthread_worker+0x50/0x50
[  692.963145]  [] ? gs_change+0x13/0x13
[  692.963146] 
[  692.963147] dbench/7937's [current] stackdump:
[  692.963147] 
[  693.098724] Pid: 7937, comm: dbench Not tainted 3.4.4-rt13 #25
[  693.104544

Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-13 Thread Chris Mason
On Fri, Jul 13, 2012 at 04:26:26AM -0600, Thomas Gleixner wrote:
> On Fri, 13 Jul 2012, Mike Galbraith wrote:
> > On Fri, 2012-07-13 at 11:52 +0200, Thomas Gleixner wrote: 
> > > On Fri, 13 Jul 2012, Mike Galbraith wrote:
> > > > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > > > > Bingo, that makes it more likely that this is caused by copying w/o
> > > > > initializing the lock and then freeing the original structure.
> > > > > 
> > > > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > > > copy, but I have no idea whether that's the place we are looking for.
> > > > 
> > > > Thanks a bunch Thomas.  I doubt I would have ever figured out that lala
> > > > land resulted from _copying_ a lock.  That's one I won't be forgetting
> > > > any time soon.  Box not only survived a few thousand xfstests 006 runs,
> > > > dbench seemed disinterested in deadlocking virgin 3.0-rt.
> > > 
> > > Cute. I think that the lock copying caused the deadlock problem as
> > > the list pointed to the wrong place, so we might have ended up
> > > following the wrong chain when walking the list as long as the
> > > original struct was not freed. That beast is freed under RCU so there
> > > could be a rcu read side critical section fiddling with the old lock
> > > and cause utter confusion.
> > 
> > Virgin 3.0-rt appears to really be solid.  But then it doesn't have
> > pesky rwlocks.
> 
> Ah. So 3.0 is not having those rwlock thingies. Bummer.
>  
> > > /me goes and writes a nastigram^W proper changelog
> > > 
> > > > btrfs still locks up in my enterprise kernel, so I suppose I had better
> > > > plug your fix into 3.4-rt and see what happens, and go beat hell out of
> > > > virgin 3.0-rt again to be sure box really really survives dbench.
> > > 
> > > A test against 3.4-rt sans enterprise mess might be nice as well.
> > 
> > Enterprise is 3.0-stable with um 555 btrfs patches (oh dear).
> > 
> > Virgin 3.4-rt and 3.2-rt deadlock gripe.  Enterprise doesn't gripe, but
> > deadlocks, so I have another adventure in my future even if I figure out
> > wth to do about rwlocks.
> 
> Hrmpf. /me goes to stare into fs/btrfs/ some more.

Please post the deadlocks here, I'll help ;)

-chris



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-13 Thread Thomas Gleixner
On Fri, 13 Jul 2012, Mike Galbraith wrote:
> On Fri, 2012-07-13 at 11:52 +0200, Thomas Gleixner wrote: 
> > On Fri, 13 Jul 2012, Mike Galbraith wrote:
> > > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > > > Bingo, that makes it more likely that this is caused by copying w/o
> > > > initializing the lock and then freeing the original structure.
> > > > 
> > > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > > copy, but I have no idea whether that's the place we are looking for.
> > > 
> > > Thanks a bunch Thomas.  I doubt I would have ever figured out that lala
> > > land resulted from _copying_ a lock.  That's one I won't be forgetting
> > > any time soon.  Box not only survived a few thousand xfstests 006 runs,
> > > dbench seemed disinterested in deadlocking virgin 3.0-rt.
> > 
> > Cute. I think that the lock copying caused the deadlock problem as
> > the list pointed to the wrong place, so we might have ended up
> > following the wrong chain when walking the list as long as the
> > original struct was not freed. That beast is freed under RCU so there
> > could be a rcu read side critical section fiddling with the old lock
> > and cause utter confusion.
> 
> Virgin 3.0-rt appears to really be solid.  But then it doesn't have
> pesky rwlocks.

Ah. So 3.0 is not having those rwlock thingies. Bummer.
 
> > /me goes and writes a nastigram^W proper changelog
> > 
> > > btrfs still locks up in my enterprise kernel, so I suppose I had better
> > > plug your fix into 3.4-rt and see what happens, and go beat hell out of
> > > virgin 3.0-rt again to be sure box really really survives dbench.
> > 
> > A test against 3.4-rt sans enterprise mess might be nice as well.
> 
> Enterprise is 3.0-stable with um 555 btrfs patches (oh dear).
> 
> Virgin 3.4-rt and 3.2-rt deadlock gripe.  Enterprise doesn't gripe, but
> deadlocks, so I have another adventure in my future even if I figure out
> wth to do about rwlocks.

Hrmpf. /me goes to stare into fs/btrfs/ some more.


Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-13 Thread Mike Galbraith
On Fri, 2012-07-13 at 11:52 +0200, Thomas Gleixner wrote: 
> On Fri, 13 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > > Bingo, that makes it more likely that this is caused by copying w/o
> > > initializing the lock and then freeing the original structure.
> > > 
> > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > copy, but I have no idea whether that's the place we are looking for.
> > 
> > Thanks a bunch Thomas.  I doubt I would have ever figured out that lala
> > land resulted from _copying_ a lock.  That's one I won't be forgetting
> > any time soon.  Box not only survived a few thousand xfstests 006 runs,
> > dbench seemed disinterested in deadlocking virgin 3.0-rt.
> 
> Cute. I think that the lock copying caused the deadlock problem as
> the list pointed to the wrong place, so we might have ended up
> following the wrong chain when walking the list as long as the
> original struct was not freed. That beast is freed under RCU so there
> could be a rcu read side critical section fiddling with the old lock
> and cause utter confusion.

Virgin 3.0-rt appears to really be solid.  But then it doesn't have
pesky rwlocks.

> /me goes and writes a nastigram^W proper changelog
> 
> > btrfs still locks up in my enterprise kernel, so I suppose I had better
> > plug your fix into 3.4-rt and see what happens, and go beat hell out of
> > virgin 3.0-rt again to be sure box really really survives dbench.
> 
> A test against 3.4-rt sans enterprise mess might be nice as well.

Enterprise is 3.0-stable with um 555 btrfs patches (oh dear).

Virgin 3.4-rt and 3.2-rt deadlock gripe.  Enterprise doesn't gripe, but
deadlocks, so I have another adventure in my future even if I figure out
wth to do about rwlocks.

-Mike



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-13 Thread Thomas Gleixner
On Thu, 12 Jul 2012, Chris Mason wrote:
> On Thu, Jul 12, 2012 at 05:07:58AM -0600, Thomas Gleixner wrote:
> > On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > > crash> struct rt_mutex 0x8801770601c8
> > > struct rt_mutex {
> > >   wait_lock = {
> > > raw_lock = {
> > >   slock = 7966
> > > }
> > >   }, 
> > >   wait_list = {
> > > node_list = {
> > >   next = 0x880175eedbe0, 
> > >   prev = 0x880175eedbe0
> > > }, 
> > > rawlock = 0x880175eedbd8, 
> > 
> > Urgh. Something here is completely wrong. That should point to
> > wait_lock, i.e. the rt_mutex itself, but it points into lala land.
> 
> This is probably the memcpy you found later this morning, right?

As Mike found out, it looks like the culprit.
 
> The reader/writer part in btrfs is just an optimization.  If we need
> them to be all writer locks for RT purposes, that's not a problem.
> 
> But, before we go down that road, we do have annotations trying
> to make sure lockdep doesn't get confused about lock classes.  Basically
> the tree is locked level by level.  So it's safe to take eb->lock while
> holding eb->lock as long as you follow the rules.
> 
> Are additional annotations required for RT?

I don't think so. I'm sure it has been caused by the lock copying as
well. Walking the wrong list can cause complete confusion all over the
place. So let's wait for Mike to finish beating the hell out of it.

Find the patch with a proper changelog below.

Thanks,

tglx
-->
From: Thomas Gleixner 
Date: Thu, 12 Jul 2012 15:30:02 +0200
Subject: btrfs: Init io_lock after cloning btrfs device struct

__btrfs_close_devices() clones btrfs device structs with
memcpy(). Some of the fields in the clone are reinitialized, but
io_lock is not. In mainline this goes unnoticed, but on RT it leaves
the plist pointing into the original lock struct, which is about to be
freed.

Initialize io_lock after cloning, so no references to the original
struct are left.

Reported-and-tested-by: Mike Galbraith 
Cc: sta...@vger.kernel.org
Signed-off-by: Thomas Gleixner 
---
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 43baaf0..06c8ced 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -512,6 +512,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 	new_device->writeable = 0;
 	new_device->in_fs_metadata = 0;
 	new_device->can_discard = 0;
+	spin_lock_init(&new_device->io_lock);
 	list_replace_rcu(&device->dev_list, &new_device->dev_list);
 
 	call_rcu(&device->rcu, free_device);
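
To see why the stale pointer is fatal, here is a minimal userspace
sketch of the failure mode and of what the re-init buys.  The types are
simplified stand-ins for the kernel's rt_mutex and plist_head, not the
real definitions:

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* On RT, a spinlock is an rt_mutex whose plist wait_list stores a
 * pointer back to its own wait_lock. */
struct raw_spinlock { unsigned int slock; };
struct plist_head  { struct raw_spinlock *rawlock; };
struct rt_mutex_sketch {
	struct raw_spinlock wait_lock;
	struct plist_head   wait_list;
};

static void rt_mutex_sketch_init(struct rt_mutex_sketch *m)
{
	m->wait_lock.slock = 0;
	m->wait_list.rawlock = &m->wait_lock;	/* self-referential */
}

int main(void)
{
	struct rt_mutex_sketch *orig  = malloc(sizeof(*orig));
	struct rt_mutex_sketch *clone = malloc(sizeof(*clone));

	rt_mutex_sketch_init(orig);
	memcpy(clone, orig, sizeof(*orig));	/* the bug in miniature */

	/* The clone's plist still points into *orig, which btrfs frees
	 * under RCU: any later waiter walks poisoned memory. */
	assert(clone->wait_list.rawlock == &orig->wait_lock);
	free(orig);

	rt_mutex_sketch_init(clone);	/* what the added init buys */
	assert(clone->wait_list.rawlock == &clone->wait_lock);
	free(clone);
	return 0;
}

The spin_lock_init() added by the patch plays the role of the second
init call here.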


Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-13 Thread Thomas Gleixner
On Fri, 13 Jul 2012, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > Bingo, that makes it more likely that this is caused by copying w/o
> > initializing the lock and then freeing the original structure.
> > 
> > A quick check for memcpy finds that __btrfs_close_devices() does a
> > memcpy of btrfs_device structs w/o initializing the lock in the new
> > copy, but I have no idea whether that's the place we are looking for.
> 
> Thanks a bunch, Thomas.  I doubt I would ever have figured out that lala
> land resulted from _copying_ a lock.  That's one I won't be forgetting
> any time soon.  Box not only survived a few thousand xfstests 006 runs,
> dbench also seemed uninterested in deadlocking virgin 3.0-rt.

Cute. I think that the lock copying caused the deadlock problem as
the list pointed to the wrong place, so we might have ended up
following the wrong chain when walking the list as long as the
original struct was not freed. That beast is freed under RCU, so an
rcu read side critical section could be fiddling with the old lock
and causing utter confusion.

/me goes and writes a nastigram^W proper changelog

> btrfs still locks up in my enterprise kernel, so I suppose I had better
> plug your fix into 3.4-rt and see what happens, and go beat hell out of
> virgin 3.0-rt again to be sure box really really survives dbench.

A test against 3.4-rt sans enterprise mess might be nice as well.

Thanks,

tglx


Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-12 Thread Mike Galbraith
On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 13:43 +0200, Thomas Gleixner wrote: 
> > > rawlock points to ...968 and the node_list to ...970.
> > > 
> > > struct rt_mutex {
> > > raw_spinlock_t  wait_lock;
> > > struct plist_head   wait_list;
> > > 
> > > The raw_lock pointer of the plist_head is initialized in
> > > __rt_mutex_init() so it points to wait_lock. 
> > > 
> > > Can you check the offset of wait_list vs. the rt_mutex itself?
> > > 
> > > I wouldn't be surprised if it's exactly 8 bytes. And then this thing
> > > looks like a copied lock with stale pointers to hell. Eew.
> > 
> > crash> struct rt_mutex -o
> > struct rt_mutex {
> >[0] raw_spinlock_t wait_lock;
> >[8] struct plist_head wait_list;
> 
> Bingo, that makes it more likely that this is caused by copying w/o
> initializing the lock and then freeing the original structure.
> 
> A quick check for memcpy finds that __btrfs_close_devices() does a
> memcpy of btrfs_device structs w/o initializing the lock in the new
> copy, but I have no idea whether that's the place we are looking for.

Thanks a bunch, Thomas.  I doubt I would ever have figured out that lala
land resulted from _copying_ a lock.  That's one I won't be forgetting
any time soon.  Box not only survived a few thousand xfstests 006 runs,
dbench also seemed uninterested in deadlocking virgin 3.0-rt.

btrfs still locks up in my enterprise kernel, so I suppose I had better
plug your fix into 3.4-rt and see what happens, and go beat hell out of
virgin 3.0-rt again to be sure box really really survives dbench.

>   tglx
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 43baaf0..06c8ced 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -512,6 +512,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
>  	new_device->writeable = 0;
>  	new_device->in_fs_metadata = 0;
>  	new_device->can_discard = 0;
> +	spin_lock_init(&new_device->io_lock);
>  	list_replace_rcu(&device->dev_list, &new_device->dev_list);
>  
>  	call_rcu(&device->rcu, free_device);




Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-12 Thread Chris Mason
On Thu, Jul 12, 2012 at 05:07:58AM -0600, Thomas Gleixner wrote:
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > crash> struct rt_mutex 0x8801770601c8
> > struct rt_mutex {
> >   wait_lock = {
> > raw_lock = {
> >   slock = 7966
> > }
> >   }, 
> >   wait_list = {
> > node_list = {
> >   next = 0x880175eedbe0, 
> >   prev = 0x880175eedbe0
> > }, 
> > rawlock = 0x880175eedbd8, 
> 
> Urgh. Something here is completely wrong. That should point to
> wait_lock, i.e. the rt_mutex itself, but it points into lala land.

This is probably the memcpy you found later this morning, right?

>  
> > Reproducer2: dbench -t 30 8
> > 
> > [  692.857164] 
> > [  692.857165] 
> > [  692.863963] [ BUG: circular locking deadlock detected! ]
> > [  692.869264] Not tainted
> > [  692.871708] 
> > [  692.877008] btrfs-delayed-m/1404 is deadlocking current task dbench/7937
> > [  692.877009] 
> > [  692.885183] 
> > [  692.885184] 1) dbench/7937 is trying to acquire this lock:
> > [  692.892149]  [88014d6aea80] {&(&eb->lock)->lock}
> > [  692.897102] .. ->owner: 880175808501
> > [  692.901018] .. held by:   btrfs-delayed-m: 1404 [880175808500, 120]
> > [  692.907657] 
> > [  692.907657] 2) btrfs-delayed-m/1404 is blocked on this lock:
> > [  692.914797]  [88014bf58d60] {&(&eb->lock)->lock}
> > [  692.919751] .. ->owner: 880175186101
> > [  692.923672] .. held by:dbench: 7937 [880175186100, 120]
> > [  692.930309] 
> > [  692.930309] btrfs-delayed-m/1404's [blocked] stackdump:
> 
> Hrmm. Both locks are rw_locks and we prevent multiple readers for the
> known reasons in RT. No idea how to deal with that one :(

The reader/writer part in btrfs is just an optimization.  If we need
them to be all writer locks for RT purposes, that's not a problem.

But, before we go down that road, we do have annotations trying
to make sure lockdep doesn't get confused about lock classes.  Basically
the tree is locked level by level.  So it's safe to take eb->lock while
holding eb->lock as long as you follow the rules.

Are additional annotations required for RT?
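
For reference, the per-level annotation pattern described above looks
roughly like the sketch below.  The identifiers are illustrative, not
the actual btrfs ones; the point is that each tree level gets its own
lockdep class, so taking a child's eb->lock while holding the parent's
registers as an ordering between two distinct classes rather than as a
recursive acquisition:

#include <linux/lockdep.h>
#include <linux/spinlock.h>

#define EB_MAX_LEVEL 8

/* one lock class per tree level (hypothetical names) */
static struct lock_class_key eb_lock_keys[EB_MAX_LEVEL];

struct eb_sketch {
	spinlock_t lock;
	int level;
};

static void eb_sketch_lock_init(struct eb_sketch *eb, int level)
{
	spin_lock_init(&eb->lock);
	eb->level = level;
	lockdep_set_class(&eb->lock, &eb_lock_keys[level]);
}

The same effect can also be had per acquisition with
spin_lock_nested() and a subclass derived from the level.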

-chris



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-12 Thread Mike Galbraith
On Thu, 2012-07-12 at 15:48 +0200, Mike Galbraith wrote: 
> On Thu, 2012-07-12 at 15:43 +0200, Thomas Gleixner wrote: 
> > On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > > copy, but I have no idea whether that's the place we are looking for.
> > > 
> > > 
> > > Cool, you found one, thanks!  I'm setting boobytraps.
> > > 
> > > Um, correction, box says I'm setting _buggy_ boobytraps :)
> > > 
> > > Tomorrow-man will test this and frob traps anew.
> > 
> > What kind of test setup do you have? i.e. raid, single disk ...
> 
> Yeah, megaraid sas.. x3550 M3.

(one disk for OS, one disk for xfstests to mangle)



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-12 Thread Mike Galbraith
On Thu, 2012-07-12 at 15:43 +0200, Thomas Gleixner wrote: 
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > copy, but I have no idea whether that's the place we are looking for.
> > 
> > 
> > Cool, you found one, thanks!  I'm setting boobytraps.
> > 
> > Um, correction, box says I'm setting _buggy_ boobytraps :)
> > 
> > Tomorrow-man will test this and frob traps anew.
> 
> What kind of test setup do you have? i.e. raid, single disk ...

Yeah, megaraid sas.. x3550 M3.

-Mike



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-12 Thread Thomas Gleixner
On Thu, 12 Jul 2012, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > A quick check for memcpy finds that __btrfs_close_devices() does a
> > memcpy of btrfs_device structs w/o initializing the lock in the new
> > copy, but I have no idea whether that's the place we are looking for.
> 
> 
> Cool, you found one, thanks!  I'm setting boobytraps.
> 
> Um, correction, box says I'm setting _buggy_ boobytraps :)
> 
> Tomorrow-man will test this and frob traps anew.

What kind of test setup do you have? i.e. raid, single disk ...

Thanks,

tglx


Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-12 Thread Mike Galbraith
On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 13:43 +0200, Thomas Gleixner wrote: 
> > > rawlock points to ...968 and the node_list to ...970.
> > > 
> > > struct rt_mutex {
> > > raw_spinlock_t  wait_lock;
> > > struct plist_head   wait_list;
> > > 
> > > The raw_lock pointer of the plist_head is initialized in
> > > __rt_mutex_init() so it points to wait_lock. 
> > > 
> > > Can you check the offset of wait_list vs. the rt_mutex itself?
> > > 
> > > I wouldn't be surprised if it's exactly 8 bytes. And then this thing
> > > looks like a copied lock with stale pointers to hell. Eew.
> > 
> > crash> struct rt_mutex -o
> > struct rt_mutex {
> >[0] raw_spinlock_t wait_lock;
> >[8] struct plist_head wait_list;
> 
> Bingo, that makes it more likely that this is caused by copying w/o
> initializing the lock and then freeing the original structure.
> 
> A quick check for memcpy finds that __btrfs_close_devices() does a
> memcpy of btrfs_device structs w/o initializing the lock in the new
> copy, but I have no idea whether that's the place we are looking for.


Cool, you found one, thanks!  I'm setting boobytraps.

Um, correction, box says I'm setting _buggy_ boobytraps :)

Tomorrow-man will test this and frob traps anew.

> 
> Thanks,
> 
>   tglx
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 43baaf0..06c8ced 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -512,6 +512,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
>  	new_device->writeable = 0;
>  	new_device->in_fs_metadata = 0;
>  	new_device->can_discard = 0;
> +	spin_lock_init(&new_device->io_lock);
>  	list_replace_rcu(&device->dev_list, &new_device->dev_list);
>  
>  	call_rcu(&device->rcu, free_device);




Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-12 Thread Thomas Gleixner
On Thu, 12 Jul 2012, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 13:43 +0200, Thomas Gleixner wrote: 
> > rawlock points to ...968 and the node_list to ...970.
> > 
> > struct rt_mutex {
> > raw_spinlock_t  wait_lock;
> > struct plist_head   wait_list;
> > 
> > The raw_lock pointer of the plist_head is initialized in
> > __rt_mutex_init() so it points to wait_lock. 
> > 
> > Can you check the offset of wait_list vs. the rt_mutex itself?
> > 
> > I wouldn't be surprised if it's exactly 8 bytes. And then this thing
> > looks like a copied lock with stale pointers to hell. Eew.
> 
> crash> struct rt_mutex -o
> struct rt_mutex {
>[0] raw_spinlock_t wait_lock;
>[8] struct plist_head wait_list;

Bingo, that makes it more likely that this is caused by copying w/o
initializing the lock and then freeing the original structure.

A quick check for memcpy finds that __btrfs_close_devices() does a
memcpy of btrfs_device structs w/o initializing the lock in the new
copy, but I have no idea whether that's the place we are looking for.

Thanks,

tglx

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 43baaf0..06c8ced 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -512,6 +512,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 	new_device->writeable = 0;
 	new_device->in_fs_metadata = 0;
 	new_device->can_discard = 0;
+	spin_lock_init(&new_device->io_lock);
 	list_replace_rcu(&device->dev_list, &new_device->dev_list);
 
 	call_rcu(&device->rcu, free_device);





Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-12 Thread Mike Galbraith
On Thu, 2012-07-12 at 13:43 +0200, Thomas Gleixner wrote: 
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 10:44 +0200, Mike Galbraith wrote: 
> > > On Thu, 2012-07-12 at 07:47 +0200, Mike Galbraith wrote: 
> > > > Greetings,
> > > > 
> > > > I'm chasing btrfs critters in an enterprise 3.0-rt kernel, and just
> > > > checked to see if they're alive in virgin latest/greatest rt kernel.  
> > > > 
> > > > Both are indeed alive and well, ie I didn't break it, nor did the
> > > > zillion patches in enterprise base kernel, so others may have an
> > > > opportunity to meet these critters up close and personal as well.
> > > 
> > > 3.2-rt both explodes and deadlocks as well.  3.0-rt (virgin I mean) does
> > > neither, so with enough re-integration investment, it might be bisectable.
> > 
> > Nope, virgin 3.0-rt just didn't feel like it at the time.  Booted it
> > again to run a hefty test over lunch; it didn't survive one xfstests 006,
> > much less hundreds.
> > 
> > crash> bt
> > PID: 7604   TASK: 880174238b20  CPU: 0   COMMAND: "btrfs-worker-0"
> >  #0 [88017455d9c8] machine_kexec at 81025794
> >  #1 [88017455da28] crash_kexec at 8109781d
> >  #2 [88017455daf8] panic at 814a0661
> >  #3 [88017455db78] __try_to_take_rt_mutex at 81086d2f
> >  #4 [88017455dbc8] rt_spin_lock_slowlock at 814a2670
> >  #5 [88017455dca8] rt_spin_lock at 814a2db9
> >  #6 [88017455dcb8] schedule_bio at 81243133
> >  #7 [88017455dcf8] btrfs_map_bio at 812477be
> >  #8 [88017455dd68] __btree_submit_bio_done at 812152f6
> >  #9 [88017455dd78] run_one_async_done at 812148fa
> > #10 [88017455dd98] run_ordered_completions at 812493e8
> > #11 [88017455ddd8] worker_loop at 81249dc9
> > #12 [88017455de88] kthread at 81070266
> > #13 [88017455df48] kernel_thread_helper at 814a9be4
> > crash> struct rt_mutex 0x880174530108
> > struct rt_mutex {
> >   wait_lock = {
> > raw_lock = {
> >   slock = 7966
> > }
> >   }, 
> >   wait_list = {
> > node_list = {
> >   next = 0x880175ecc970, 
> >   prev = 0x880175ecc970
> > }, 
> > rawlock = 0x880175ecc968, 
> 
> Pointer into lala land again.

Yeah, and freed again.

> rawlock points to ...968 and the node_list to ...970.
> 
> struct rt_mutex {
> raw_spinlock_t  wait_lock;
> struct plist_head   wait_list;
> 
> The raw_lock pointer of the plist_head is initialized in
> __rt_mutex_init() so it points to wait_lock. 
> 
> Can you check the offset of wait_list vs. the rt_mutex itself?
> 
> I wouldn't be surprised if it's exactly 8 bytes. And then this thing
> looks like a copied lock with stale pointers to hell. Eew.

crash> struct rt_mutex -o
struct rt_mutex {
   [0] raw_spinlock_t wait_lock;
   [8] struct plist_head wait_list;
  [40] struct task_struct *owner;
  [48] int save_state;
  [56] const char *file;
  [64] const char *name;
  [72] int line;
  [80] void *magic;
}
SIZE: 88


-Mike



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-12 Thread Thomas Gleixner
On Thu, 12 Jul 2012, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 10:44 +0200, Mike Galbraith wrote: 
> > On Thu, 2012-07-12 at 07:47 +0200, Mike Galbraith wrote: 
> > > Greetings,
> > > 
> > > I'm chasing btrfs critters in an enterprise 3.0-rt kernel, and just
> > > checked to see if they're alive in virgin latest/greatest rt kernel.  
> > > 
> > > Both are indeed alive and well, ie I didn't break it, nor did the
> > > zillion patches in enterprise base kernel, so others may have an
> > > opportunity to meet these critters up close and personal as well.
> > 
> > 3.2-rt both explodes and deadlocks as well.  3.0-rt (virgin I mean) does
> > neither, so with enough re-integration investment, it might be bisectable.
> 
> Nope, virgin 3.0-rt just didn't feel like it at the time.  Booted it
> again to run a hefty test over lunch; it didn't survive one xfstests 006,
> much less hundreds.
> 
> crash> bt
> PID: 7604   TASK: 880174238b20  CPU: 0   COMMAND: "btrfs-worker-0"
>  #0 [88017455d9c8] machine_kexec at 81025794
>  #1 [88017455da28] crash_kexec at 8109781d
>  #2 [88017455daf8] panic at 814a0661
>  #3 [88017455db78] __try_to_take_rt_mutex at 81086d2f
>  #4 [88017455dbc8] rt_spin_lock_slowlock at 814a2670
>  #5 [88017455dca8] rt_spin_lock at 814a2db9
>  #6 [88017455dcb8] schedule_bio at 81243133
>  #7 [88017455dcf8] btrfs_map_bio at 812477be
>  #8 [88017455dd68] __btree_submit_bio_done at 812152f6
>  #9 [88017455dd78] run_one_async_done at 812148fa
> #10 [88017455dd98] run_ordered_completions at 812493e8
> #11 [88017455ddd8] worker_loop at 81249dc9
> #12 [88017455de88] kthread at 81070266
> #13 [88017455df48] kernel_thread_helper at 814a9be4
> crash> struct rt_mutex 0x880174530108
> struct rt_mutex {
>   wait_lock = {
> raw_lock = {
>   slock = 7966
> }
>   }, 
>   wait_list = {
> node_list = {
>   next = 0x880175ecc970, 
>   prev = 0x880175ecc970
> }, 
> rawlock = 0x880175ecc968, 

Pointer into lala land again.

rawlock points to ...968 and the node_list to ...970.

struct rt_mutex {
raw_spinlock_t  wait_lock;
struct plist_head   wait_list;

The raw_lock pointer of the plist_head is initialized in
__rt_mutex_init() so it points to wait_lock. 

Can you check the offset of wait_list vs. the rt_mutex itself?

I wouldn't be surprised if it's exactly 8 bytes. And then this thing
looks like a copied lock with stale pointers to hell. Eew.
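
The check being asked for, in miniature: if wait_lock sits at offset 0
and wait_list at offset 8, a healthy lock's rawlock pointer must point
back at its own wait_lock.  A sketch of that sanity test, with a layout
assumed to match the dump above:

#include <stddef.h>
#include <stdio.h>

/* wait_lock at offset 0, wait_list at offset 8, node_list first
 * within the plist_head, rawlock right behind it. */
struct raw_spinlock { unsigned int slock; };
struct plist_head {
	struct { void *next, *prev; } node_list;
	struct raw_spinlock *rawlock;
};
struct rt_mutex_sketch {
	struct raw_spinlock wait_lock;
	struct plist_head   wait_list;
};

/* A healthy lock points back at itself; the dumped one points 8 bytes
 * below a *different*, already freed node_list instead. */
static int self_consistent(const struct rt_mutex_sketch *m)
{
	return m->wait_list.rawlock == &m->wait_lock;
}

int main(void)
{
	struct rt_mutex_sketch m;

	m.wait_list.rawlock = &m.wait_lock;
	printf("wait_list offset: %zu, consistent: %d\n",
	       offsetof(struct rt_mutex_sketch, wait_list),
	       self_consistent(&m));
	return 0;
}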

Thanks,

tglx




Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-12 Thread Thomas Gleixner
On Thu, 12 Jul 2012, Mike Galbraith wrote:
> crash> struct rt_mutex 0x8801770601c8
> struct rt_mutex {
>   wait_lock = {
> raw_lock = {
>   slock = 7966
> }
>   }, 
>   wait_list = {
> node_list = {
>   next = 0x880175eedbe0, 
>   prev = 0x880175eedbe0
> }, 
> rawlock = 0x880175eedbd8, 

Urgh. Something here is completely wrong. That should point to
wait_lock, i.e. the rt_mutex itself, but it points into lala land.

> spinlock = 0x0
>   }, 
>   owner = 0x1, 
>   save_state = 0, 
>   file = 0x0, 
>   name = 0x81781b9b "&(&device->io_lock)->lock", 
>   line = 0, 
>   magic = 0x0
> }
> crash> struct list_head 0x880175eedbe0
> struct list_head {
>   next = 0x6b6b6b6b6b6b6b6b, 
>   prev = 0x6b6b6b6b6b6b6b6b
> }

That's POISON_FREE. How the heck can this happen?

 
> Reproducer2: dbench -t 30 8
> 
> [  692.857164] 
> [  692.857165] 
> [  692.863963] [ BUG: circular locking deadlock detected! ]
> [  692.869264] Not tainted
> [  692.871708] 
> [  692.877008] btrfs-delayed-m/1404 is deadlocking current task dbench/7937
> [  692.877009] 
> [  692.885183] 
> [  692.885184] 1) dbench/7937 is trying to acquire this lock:
> [  692.892149]  [88014d6aea80] {&(&eb->lock)->lock}
> [  692.897102] .. ->owner: 880175808501
> [  692.901018] .. held by:   btrfs-delayed-m: 1404 [880175808500, 120]
> [  692.907657] 
> [  692.907657] 2) btrfs-delayed-m/1404 is blocked on this lock:
> [  692.914797]  [88014bf58d60] {&(&eb->lock)->lock}
> [  692.919751] .. ->owner: 880175186101
> [  692.923672] .. held by:dbench: 7937 [880175186100, 120]
> [  692.930309] 
> [  692.930309] btrfs-delayed-m/1404's [blocked] stackdump:

Hrmm. Both locks are rw_locks and we prevent multiple readers for the
known reasons in RT. No idea how to deal with that one :(
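
For context, a hypothetical userspace contrast of the two semantics:
with genuinely shared readers the cross-order acquisition in the report
above is harmless, while RT's one-reader-at-a-time rwlocks (needed so
PI has a single owner to boost) turn the very same pattern into a
textbook two-lock deadlock:

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t eb_a = PTHREAD_RWLOCK_INITIALIZER;
static pthread_rwlock_t eb_b = PTHREAD_RWLOCK_INITIALIZER;

static void *reader1(void *unused)
{
	(void)unused;
	pthread_rwlock_rdlock(&eb_a);	/* A then B */
	pthread_rwlock_rdlock(&eb_b);
	pthread_rwlock_unlock(&eb_b);
	pthread_rwlock_unlock(&eb_a);
	return NULL;
}

static void *reader2(void *unused)
{
	(void)unused;
	pthread_rwlock_rdlock(&eb_b);	/* B then A: fine for shared readers */
	pthread_rwlock_rdlock(&eb_a);
	pthread_rwlock_unlock(&eb_a);
	pthread_rwlock_unlock(&eb_b);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, reader1, NULL);
	pthread_create(&t2, NULL, reader2, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	/* Replace the rwlocks with mutexes (one exclusive owner, the RT
	 * semantics) and this interleaving can deadlock exactly like
	 * dbench vs. btrfs-delayed-m above. */
	puts("shared readers: no deadlock");
	return 0;
}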

Thanks,

tglx




Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-12 Thread Mike Galbraith
On Thu, 2012-07-12 at 10:44 +0200, Mike Galbraith wrote: 
> On Thu, 2012-07-12 at 07:47 +0200, Mike Galbraith wrote: 
> > Greetings,
> > 
> > I'm chasing btrfs critters in an enterprise 3.0-rt kernel, and just
> > checked to see if they're alive in virgin latest/greatest rt kernel.  
> > 
> > Both are indeed alive and well, ie I didn't break it, nor did the
> > zillion patches in enterprise base kernel, so others may have an
> > opportunity to meet these critters up close and personal as well.
> 
> 3.2-rt both explodes and deadlocks as well.  3.0-rt (virgin I mean) does
> neither, so with enough re-integration investment, it might be bisectable.

Nope, virgin 3.0-rt just didn't feel like it at the time.  Booted it
again to run a hefty test over lunch; it didn't survive one xfstests 006,
much less hundreds.

crash> bt
PID: 7604   TASK: 880174238b20  CPU: 0   COMMAND: "btrfs-worker-0"
 #0 [88017455d9c8] machine_kexec at 81025794
 #1 [88017455da28] crash_kexec at 8109781d
 #2 [88017455daf8] panic at 814a0661
 #3 [88017455db78] __try_to_take_rt_mutex at 81086d2f
 #4 [88017455dbc8] rt_spin_lock_slowlock at 814a2670
 #5 [88017455dca8] rt_spin_lock at 814a2db9
 #6 [88017455dcb8] schedule_bio at 81243133
 #7 [88017455dcf8] btrfs_map_bio at 812477be
 #8 [88017455dd68] __btree_submit_bio_done at 812152f6
 #9 [88017455dd78] run_one_async_done at 812148fa
#10 [88017455dd98] run_ordered_completions at 812493e8
#11 [88017455ddd8] worker_loop at 81249dc9
#12 [88017455de88] kthread at 81070266
#13 [88017455df48] kernel_thread_helper at 814a9be4
crash> struct rt_mutex 0x880174530108
struct rt_mutex {
  wait_lock = {
raw_lock = {
  slock = 7966
}
  }, 
  wait_list = {
node_list = {
  next = 0x880175ecc970, 
  prev = 0x880175ecc970
}, 
rawlock = 0x880175ecc968, 
spinlock = 0x0
  }, 
  owner = 0x1, 
  save_state = 0, 
  file = 0x0, 
  name = 0x81763d02 "&(&device->io_lock)->lock", 
  line = 0, 
  magic = 0x0
}



Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

2012-07-12 Thread Mike Galbraith
On Thu, 2012-07-12 at 07:47 +0200, Mike Galbraith wrote: 
> Greetings,
> 
> I'm chasing btrfs critters in an enterprise 3.0-rt kernel, and just
> checked to see if they're alive in virgin latest/greatest rt kernel.  
> 
> Both are indeed alive and well, ie I didn't break it, nor did the
> zillion patches in enterprise base kernel, so others may have an
> opportunity to meet these critters up close and personal as well.

3.2-rt both explodes and deadlocks as well.  3.0-rt (virgin I mean) does
neither, so with enough re-integration investment, it might be bisectable.

Rummaging in btrfs, that begins to look downright attractive ;-)

-Mike
