Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Tue, 2012-07-17 at 06:18 +0200, Mike Galbraith wrote:
> On Mon, 2012-07-16 at 13:03 -0400, Steven Rostedt wrote:
> > On Mon, 2012-07-16 at 18:36 +0200, Mike Galbraith wrote:
> > >
> > > > > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > > > > priority process preempted a lower priority process that holds the
> > > > > same lock, it will deadlock.
> > > >
> > > > Hm, how, it's doing cpu_chill()?
> > >
> > > 'course PI is toast, so *poof*. Since just enabling the lockdep bits
> > > seems to fix it up, maybe that's the patchlet to submit (less is more).
> >
> > There's that too. But the issue I was talking about is with all trylock
> > loops. As holding an rt-mutex now disables migration, if a high priority
> > process preempts a task that holds the lock, and then the high prio task
> > starts spinning waiting for that lock to release, the lower priority
> > process will never get to run to release it. The cpu_chill() doesn't
> > help.
>
> Hrm. I better go make a testcase, this one definitely wants pounding
> through thick skull.
>
> I think all of the chilling in patchlet is really ugly anyway, so would
> prefer to trash it all, just enable the lockdep bits. If it turns out
> we really do need to bounce off of counts, go get a bigger hammer when
> the need arises. For the nonce, the pre-installed hammer _seemed_ big
> enough for the job.

All night dbench session, and all day doing many full xfstests runs
(what will run on btrfs), fsstress -p64, and generic beating says the
pre-installed tool is fine all by itself, so here comes a zero line
patch.. the second best kind ;-)

rt,fs,btrfs: fix rt deadlock on extent_buffer->lock

Trivially repeatable deadlock is cured by enabling lockdep code in
btrfs_clear_path_blocking() as suggested by Chris Mason. He also
suggested restricting blocking reader count to one, and not allowing
a spinning reader while a blocking reader exists. This has proven to
be unnecessary, the strict lock order enforcement is enough.. or
rather that's my box's opinion after long hours of hard pounding.

Signed-off-by: Mike Galbraith
Cc: Chris Mason

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 2e66786..1f71eb0 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -72,7 +72,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 {
 	int i;
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
 	/* lockdep really cares that we take all of these spinlocks
 	 * in the right order. If any of the locks in the path are not
 	 * currently blocking, it is going to complain. So, make really
@@ -89,7 +89,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 		btrfs_clear_lock_blocking(p->nodes[i]);
 	}
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
 	if (held)
 		btrfs_clear_lock_blocking(held);
 #endif

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Tue, 2012-07-17 at 00:34 -0400, Steven Rostedt wrote:
> On Tue, 2012-07-17 at 00:27 -0400, Steven Rostedt wrote:
> >
> > Actually, I was mistaken. I forgot that we defined 'cpu_chill()' as
> > msleep(1) on RT, which would keep a deadlock from happening.
>
> Perhaps cpu_chill() isn't a good name, as it doesn't really explain what
> is happening. Perhaps one of the following?
>
> cpu_rest()
> cpu_sleep()
> cpu_deep_relax()
> cpu_dream()
> cpu_hypnotize()

( cpu_waste_truckloads_of_time();)
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Tue, 2012-07-17 at 00:27 -0400, Steven Rostedt wrote:
> On Tue, 2012-07-17 at 06:18 +0200, Mike Galbraith wrote:
> >
> > > There's that too. But the issue I was talking about is with all trylock
> > > loops. As holding an rt-mutex now disables migration, if a high priority
> > > process preempts a task that holds the lock, and then the high prio task
> > > starts spinning waiting for that lock to release, the lower priority
> > > process will never get to run to release it. The cpu_chill() doesn't
> > > help.
> >
> > Hrm. I better go make a testcase, this one definitely wants pounding
> > through thick skull.
>
> Actually, I was mistaken. I forgot that we defined 'cpu_chill()' as
> msleep(1) on RT, which would keep a deadlock from happening.

Whew! There are no stars and moons on my pointy hat, just plain white
cone, so you had me worried I was missing something critical there.

> It doesn't explain the performance enhancement you get :-/

No, it doesn't. The only thing I can think of is that while folks are
timed sleeping, they aren't preempting and interleaving IO as much, but
I'm pulling that out of thin air. Timed sleep should be a lot longer
than regular wakeup, so to my mind, there should be less interleave due
to more thumb twiddling.

-Mike
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Tue, 2012-07-17 at 00:27 -0400, Steven Rostedt wrote:
> Actually, I was mistaken. I forgot that we defined 'cpu_chill()' as
> msleep(1) on RT, which would keep a deadlock from happening.

Perhaps cpu_chill() isn't a good name, as it doesn't really explain what
is happening. Perhaps one of the following?

cpu_rest()
cpu_sleep()
cpu_deep_relax()
cpu_dream()
cpu_hypnotize()

-- Steve
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Tue, 2012-07-17 at 06:18 +0200, Mike Galbraith wrote:
>
> > There's that too. But the issue I was talking about is with all trylock
> > loops. As holding an rt-mutex now disables migration, if a high priority
> > process preempts a task that holds the lock, and then the high prio task
> > starts spinning waiting for that lock to release, the lower priority
> > process will never get to run to release it. The cpu_chill() doesn't
> > help.
>
> Hrm. I better go make a testcase, this one definitely wants pounding
> through thick skull.

Actually, I was mistaken. I forgot that we defined 'cpu_chill()' as
msleep(1) on RT, which would keep a deadlock from happening.

It doesn't explain the performance enhancement you get :-/

-- Steve
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Mon, 2012-07-16 at 13:03 -0400, Steven Rostedt wrote:
> On Mon, 2012-07-16 at 18:36 +0200, Mike Galbraith wrote:
> >
> > > > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > > > priority process preempted a lower priority process that holds the same
> > > > lock, it will deadlock.
> > >
> > > Hm, how, it's doing cpu_chill()?
> >
> > 'course PI is toast, so *poof*. Since just enabling the lockdep bits
> > seems to fix it up, maybe that's the patchlet to submit (less is more).
>
> There's that too. But the issue I was talking about is with all trylock
> loops. As holding an rt-mutex now disables migration, if a high priority
> process preempts a task that holds the lock, and then the high prio task
> starts spinning waiting for that lock to release, the lower priority
> process will never get to run to release it. The cpu_chill() doesn't
> help.

Hrm. I better go make a testcase, this one definitely wants pounding
through thick skull.

I think all of the chilling in patchlet is really ugly anyway, so would
prefer to trash it all, just enable the lockdep bits. If it turns out
we really do need to bounce off of counts, go get a bigger hammer when
the need arises. For the nonce, the pre-installed hammer _seemed_ big
enough for the job.

What's a good way to beat living hell out of btrfs? I've never been
into destructive fs testing, since they usually lived on my one and only
disk. x3550 has two, and OS clone has already been sacrificed.

-Mike
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Mon, 2012-07-16 at 11:43 -0400, Chris Mason wrote:
> On Mon, Jul 16, 2012 at 04:55:44AM -0600, Mike Galbraith wrote:
> > Seems btrfs isn't entirely convinced either.
> >
> > [ 2292.336229] use_block_rsv: 1810 callbacks suppressed
> > [ 2292.336231] [ cut here ]
> > [ 2292.336255] WARNING: at fs/btrfs/extent-tree.c:6344 use_block_rsv+0x17d/0x190 [btrfs]()
> > [ 2292.336257] Hardware name: System x3550 M3 -[7944K3G]-
> > [ 2292.336259] btrfs: block rsv returned -28
>
> This is unrelated. You got far enough into the benchmark to hit an
> ENOSPC warning. This can be ignored (I just deleted it when we used 3.0
> for oracle).

Ah great, thanks. I'll whack it in my tree as well then.

> re: dbench performance. dbench tends to penalize fairness. I can
> imagine RT making it slower in general.

It seems to work just fine for my normal workloads, and cyclictest is
happy, so I'm happy. Zillion threads is 'keep the pieces' to me ;-)

If you think the patch is ok as is, I'll go ahead and submit it after I
let dbench hammer on it overnight at least.

-Mike
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Mon, 2012-07-16 at 18:36 +0200, Mike Galbraith wrote:
> >
> > > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > > priority process preempted a lower priority process that holds the same
> > > lock, it will deadlock.
> >
> > Hm, how, it's doing cpu_chill()?
>
> 'course PI is toast, so *poof*. Since just enabling the lockdep bits
> seems to fix it up, maybe that's the patchlet to submit (less is more).

There's that too. But the issue I was talking about is with all trylock
loops. As holding an rt-mutex now disables migration, if a high priority
process preempts a task that holds the lock, and then the high prio task
starts spinning waiting for that lock to release, the lower priority
process will never get to run to release it. The cpu_chill() doesn't
help.

-- Steve
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Mon, 2012-07-16 at 18:26 +0200, Mike Galbraith wrote:
> On Mon, 2012-07-16 at 12:02 -0400, Steven Rostedt wrote:
> > On Mon, 2012-07-16 at 04:02 +0200, Mike Galbraith wrote:
> > >
> > > > Great, thanks! I got stuck in bug land on Friday. You mentioned
> > > > performance problems earlier on Saturday, did this improve performance?
> > >
> > > Yeah, the read_trylock() seems to improve throughput. That's not
> > > heavily tested, but it certainly looks like it does. No idea why.
> >
> > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > priority process preempted a lower priority process that holds the same
> > lock, it will deadlock.
>
> Hm, how, it's doing cpu_chill()?

'course PI is toast, so *poof*. Since just enabling the lockdep bits
seems to fix it up, maybe that's the patchlet to submit (less is more).

-Mike
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Mon, Jul 16, 2012 at 10:26:08AM -0600, Mike Galbraith wrote:
> On Mon, 2012-07-16 at 12:02 -0400, Steven Rostedt wrote:
> > On Mon, 2012-07-16 at 04:02 +0200, Mike Galbraith wrote:
> > >
> > > > Great, thanks! I got stuck in bug land on Friday. You mentioned
> > > > performance problems earlier on Saturday, did this improve performance?
> > >
> > > Yeah, the read_trylock() seems to improve throughput. That's not
> > > heavily tested, but it certainly looks like it does. No idea why.
> >
> > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > priority process preempted a lower priority process that holds the same
> > lock, it will deadlock.
>
> Hm, how, it's doing cpu_chill()?
>
> > I'm not sure why you would get a performance benefit from this, as the
> > mutex used is an adaptive one (failure to acquire the lock will only
> > sleep if preempted or if the owner is not running).
>
> I'm not attached to it, can whack it in a heartbeat.. especially so if
> the thing can deadlock. I've seen enough of those of late.
>
> > We should look at why this performs better (if it really does).
>
> Not sure it really does, there's variance, but it looked like it did.

I'd use a benchmark that is more consistent than dbench for this. I
love dbench for generating load (and the occasional deadlock) but it
tends to steer you in the wrong direction on performance.

-chris
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Mon, 2012-07-16 at 12:02 -0400, Steven Rostedt wrote:
> On Mon, 2012-07-16 at 04:02 +0200, Mike Galbraith wrote:
> >
> > > Great, thanks! I got stuck in bug land on Friday. You mentioned
> > > performance problems earlier on Saturday, did this improve performance?
> >
> > Yeah, the read_trylock() seems to improve throughput. That's not
> > heavily tested, but it certainly looks like it does. No idea why.
>
> Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> priority process preempted a lower priority process that holds the same
> lock, it will deadlock.

Hm, how, it's doing cpu_chill()?

> I'm not sure why you would get a performance benefit from this, as the
> mutex used is an adaptive one (failure to acquire the lock will only
> sleep if preempted or if the owner is not running).

I'm not attached to it, can whack it in a heartbeat.. especially so if
the thing can deadlock. I've seen enough of those of late.

> We should look at why this performs better (if it really does).

Not sure it really does, there's variance, but it looked like it did.

-Mike
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Mon, 2012-07-16 at 04:02 +0200, Mike Galbraith wrote:
> > Great, thanks! I got stuck in bug land on Friday. You mentioned
> > performance problems earlier on Saturday, did this improve performance?
>
> Yeah, the read_trylock() seems to improve throughput. That's not
> heavily tested, but it certainly looks like it does. No idea why.

Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
priority process preempted a lower priority process that holds the same
lock, it will deadlock.

I'm not sure why you would get a performance benefit from this, as the
mutex used is an adaptive one (failure to acquire the lock will only
sleep if preempted or if the owner is not running).

We should look at why this performs better (if it really does).

-- Steve

> WRT performance, dbench isn't thrilled, but btrfs seems to work just
> fine for my routine usage, and spinning rust bucket is being all it can
> be. I hope I don't have to care overly much about dbench's opinion. It
> doesn't make happy multi-thread numbers with btrfs, but those numbers
> suddenly look great if you rebase relative to xfs -rt throughput :)
>
> > One other question:
> >
> > > again:
> > > +#ifdef CONFIG_PREEMPT_RT_BASE
> > > +	while (atomic_read(&eb->blocking_readers))
> > > +		cpu_chill();
> > > +	while (!read_trylock(&eb->lock))
> > > +		cpu_chill();
> > > +	if (atomic_read(&eb->blocking_readers)) {
> > > +		read_unlock(&eb->lock);
> > > +		goto again;
> > > +	}
> >
> > Why use read_trylock() in a loop instead of just trying to take the
> > lock? Is this an RTism or are there other reasons?
>
> First stab paranoia. It worked, so I removed it. It still worked but
> lost throughput, removed all my bits leaving only the lockdep bits, it
> still worked.
>
> -Mike
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Mon, Jul 16, 2012 at 04:55:44AM -0600, Mike Galbraith wrote:
> On Sat, 2012-07-14 at 12:14 +0200, Mike Galbraith wrote:
> > On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote:
> > > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > > > Greetings,
> > >
> > > [ deadlocks with btrfs and the recent RT kernels ]
> > >
> > > I talked with Thomas about this and I think the problem is the
> > > single-reader nature of the RW rwlocks. The lockdep report below
> > > mentions that btrfs is calling:
> > >
> > > > [ 692.963099]  [] btrfs_clear_path_blocking+0x32/0x70
> > >
> > > In this case, the task has a number of blocking read locks on the
> > > btrfs buffers, and we're trying to turn them back into spinning read
> > > locks. Even though btrfs is taking the read rwlock, it doesn't think
> > > of this as a new lock operation because we were blocking out new
> > > writers.
> > >
> > > If the second task has taken the spinning read lock, it is going to
> > > prevent that clear_path_blocking operation from progressing, even
> > > though it would have worked on a non-RT kernel.
> > >
> > > The solution should be to make the blocking read locks in btrfs honor
> > > the single-reader semantics. This means not allowing more than one
> > > blocking reader and not allowing a spinning reader when there is a
> > > blocking reader. Strictly speaking btrfs shouldn't need recursive
> > > readers on a single lock, so I wouldn't worry about that part.
> > >
> > > There is also a chunk of code in btrfs_clear_path_blocking that makes
> > > sure to strictly honor top down locking order during the conversion.
> > > It only does this when lockdep is enabled because in non-RT kernels
> > > we don't need to worry about it. For RT we'll want to enable that as
> > > well.
> > >
> > > I'll give this a shot later today.
> >
> > I took a poke at it. Did I do something similar to what you had in
> > mind, or just hide behind performance stealing paranoid trylock loops?
> > Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
> > bat, so it gets posted despite skepticism.
>
> Seems btrfs isn't entirely convinced either.
>
> [ 2292.336229] use_block_rsv: 1810 callbacks suppressed
> [ 2292.336231] [ cut here ]
> [ 2292.336255] WARNING: at fs/btrfs/extent-tree.c:6344 use_block_rsv+0x17d/0x190 [btrfs]()
> [ 2292.336257] Hardware name: System x3550 M3 -[7944K3G]-
> [ 2292.336259] btrfs: block rsv returned -28

This is unrelated. You got far enough into the benchmark to hit an
ENOSPC warning. This can be ignored (I just deleted it when we used 3.0
for oracle).

re: dbench performance. dbench tends to penalize fairness. I can
imagine RT making it slower in general. It also triggers lots of lock
contention in btrfs because the dataset is fairly small and the trees
don't fan out a lot.

-chris
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Sat, 2012-07-14 at 12:14 +0200, Mike Galbraith wrote:
> On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote:
> > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > > Greetings,
> >
> > [ deadlocks with btrfs and the recent RT kernels ]
> >
> > I talked with Thomas about this and I think the problem is the
> > single-reader nature of the RW rwlocks. The lockdep report below
> > mentions that btrfs is calling:
> >
> > > [ 692.963099]  [] btrfs_clear_path_blocking+0x32/0x70
> >
> > In this case, the task has a number of blocking read locks on the
> > btrfs buffers, and we're trying to turn them back into spinning read
> > locks. Even though btrfs is taking the read rwlock, it doesn't think
> > of this as a new lock operation because we were blocking out new
> > writers.
> >
> > If the second task has taken the spinning read lock, it is going to
> > prevent that clear_path_blocking operation from progressing, even
> > though it would have worked on a non-RT kernel.
> >
> > The solution should be to make the blocking read locks in btrfs honor
> > the single-reader semantics. This means not allowing more than one
> > blocking reader and not allowing a spinning reader when there is a
> > blocking reader. Strictly speaking btrfs shouldn't need recursive
> > readers on a single lock, so I wouldn't worry about that part.
> >
> > There is also a chunk of code in btrfs_clear_path_blocking that makes
> > sure to strictly honor top down locking order during the conversion.
> > It only does this when lockdep is enabled because in non-RT kernels
> > we don't need to worry about it. For RT we'll want to enable that as
> > well.
> >
> > I'll give this a shot later today.
>
> I took a poke at it. Did I do something similar to what you had in
> mind, or just hide behind performance stealing paranoid trylock loops?
> Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
> bat, so it gets posted despite skepticism.

Seems btrfs isn't entirely convinced either.

[ 2292.336229] use_block_rsv: 1810 callbacks suppressed
[ 2292.336231] [ cut here ]
[ 2292.336255] WARNING: at fs/btrfs/extent-tree.c:6344 use_block_rsv+0x17d/0x190 [btrfs]()
[ 2292.336257] Hardware name: System x3550 M3 -[7944K3G]-
[ 2292.336259] btrfs: block rsv returned -28
[ 2292.336260] Modules linked in: joydev st sr_mod ide_gd_mod(N) ide_cd_mod ide_core cdrom ibm_rtl nfsd lockd ipmi_devintf nfs_acl auth_rpcgss sunrpc ipmi_si ipmi_msghandler ipv6 ipv6_lib af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf edd fuse btrfs zlib_deflate ext3 jbd loop dm_mod usbhid hid cdc_ether usbnet mii sg shpchp pci_hotplug pcspkr bnx2 ioatdma i2c_i801 i2c_core tpm_tis tpm tpm_bios serio_raw i7core_edac edac_core button dca iTCO_wdt iTCO_vendor_support ext4 mbcache jbd2 uhci_hcd ehci_hcd sd_mod usbcore rtc_cmos crc_t10dif usb_common fan processor ata_generic ata_piix libata megaraid_sas scsi_mod thermal thermal_sys hwmon
[ 2292.336296] Supported: Yes
[ 2292.336298] Pid: 12975, comm: bonnie Tainted: GW N 3.0.35-rt56-rt #27
[ 2292.336300] Call Trace:
[ 2292.336312]  [] dump_trace+0x82/0x2e0
[ 2292.336320]  [] dump_stack+0x69/0x6f
[ 2292.336325]  [] warn_slowpath_common+0x7b/0xc0
[ 2292.336330]  [] warn_slowpath_fmt+0x45/0x50
[ 2292.336342]  [] use_block_rsv+0x17d/0x190 [btrfs]
[ 2292.336389]  [] btrfs_alloc_free_block+0x49/0x240 [btrfs]
[ 2292.336432]  [] __btrfs_cow_block+0x13e/0x510 [btrfs]
[ 2292.336457]  [] btrfs_cow_block+0xff/0x230 [btrfs]
[ 2292.336482]  [] btrfs_search_slot+0x360/0x7e0 [btrfs]
[ 2292.336513]  [] btrfs_del_csums+0x175/0x2f0 [btrfs]
[ 2292.336562]  [] __btrfs_free_extent+0x550/0x760 [btrfs]
[ 2292.336599]  [] run_delayed_data_ref+0x9d/0x190 [btrfs]
[ 2292.336636]  [] run_clustered_refs+0xd5/0x3a0 [btrfs]
[ 2292.336678]  [] btrfs_run_delayed_refs+0x148/0x350 [btrfs]
[ 2292.336723]  [] __btrfs_end_transaction+0xb7/0x2b0 [btrfs]
[ 2292.336796]  [] btrfs_evict_inode+0x2d3/0x340 [btrfs]
[ 2292.336863]  [] evict+0x91/0x190
[ 2292.336868]  [] do_unlinkat+0x177/0x1f0
[ 2292.336875]  [] system_call_fastpath+0x16/0x1b
[ 2292.336881]  [<7fea187f9e67>] 0x7fea187f9e66
[ 2292.336887] ---[ end trace 0004 ]---
[ 2610.370398] use_block_rsv: 1947 callbacks suppressed
[ 2610.370400] [ cut here ]

> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 4106264..ae47cc2 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -77,7 +77,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
>  {
>  	int i;
>  
> -#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
>  	/* lockdep really cares that we take all of these spinlocks
>  	 * in the right order. If any of the locks in the path are not
>  	 * currently blocking, it is going to complain. So, make really
> @@ -104,7 +104,7 @@ noinline void btrfs_clear_path_blo
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Sun, 2012-07-15 at 13:56 -0400, Chris Mason wrote:
> On Sat, Jul 14, 2012 at 04:14:43AM -0600, Mike Galbraith wrote:
> > On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote:
> > > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > > > Greetings,
> > >
> > > [ deadlocks with btrfs and the recent RT kernels ]
> > >
> > > I talked with Thomas about this and I think the problem is the
> > > single-reader nature of the RW rwlocks. The lockdep report below
> > > mentions that btrfs is calling:
> > >
> > > > [ 692.963099]  [] btrfs_clear_path_blocking+0x32/0x70
> > >
> > > In this case, the task has a number of blocking read locks on the
> > > btrfs buffers, and we're trying to turn them back into spinning read
> > > locks. Even though btrfs is taking the read rwlock, it doesn't think
> > > of this as a new lock operation because we were blocking out new
> > > writers.
> > >
> > > If the second task has taken the spinning read lock, it is going to
> > > prevent that clear_path_blocking operation from progressing, even
> > > though it would have worked on a non-RT kernel.
> > >
> > > The solution should be to make the blocking read locks in btrfs honor
> > > the single-reader semantics. This means not allowing more than one
> > > blocking reader and not allowing a spinning reader when there is a
> > > blocking reader. Strictly speaking btrfs shouldn't need recursive
> > > readers on a single lock, so I wouldn't worry about that part.
> > >
> > > There is also a chunk of code in btrfs_clear_path_blocking that makes
> > > sure to strictly honor top down locking order during the conversion.
> > > It only does this when lockdep is enabled because in non-RT kernels
> > > we don't need to worry about it. For RT we'll want to enable that as
> > > well.
> > >
> > > I'll give this a shot later today.
> >
> > I took a poke at it. Did I do something similar to what you had in
> > mind, or just hide behind performance stealing paranoid trylock loops?
> > Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
> > bat, so it gets posted despite skepticism.
>
> Great, thanks! I got stuck in bug land on Friday. You mentioned
> performance problems earlier on Saturday, did this improve performance?

Yeah, the read_trylock() seems to improve throughput. That's not
heavily tested, but it certainly looks like it does. No idea why.

WRT performance, dbench isn't thrilled, but btrfs seems to work just
fine for my routine usage, and spinning rust bucket is being all it can
be. I hope I don't have to care overly much about dbench's opinion. It
doesn't make happy multi-thread numbers with btrfs, but those numbers
suddenly look great if you rebase relative to xfs -rt throughput :)

> One other question:
>
> > again:
> > +#ifdef CONFIG_PREEMPT_RT_BASE
> > +	while (atomic_read(&eb->blocking_readers))
> > +		cpu_chill();
> > +	while (!read_trylock(&eb->lock))
> > +		cpu_chill();
> > +	if (atomic_read(&eb->blocking_readers)) {
> > +		read_unlock(&eb->lock);
> > +		goto again;
> > +	}
>
> Why use read_trylock() in a loop instead of just trying to take the
> lock? Is this an RTism or are there other reasons?

First stab paranoia. It worked, so I removed it. It still worked but
lost throughput, removed all my bits leaving only the lockdep bits, it
still worked.

-Mike
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Sat, Jul 14, 2012 at 04:14:43AM -0600, Mike Galbraith wrote: > On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote: > > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote: > > > Greetings, > > > > [ deadlocks with btrfs and the recent RT kernels ] > > > > I talked with Thomas about this and I think the problem is the > > single-reader nature of the RW rwlocks. The lockdep report below > > mentions that btrfs is calling: > > > > > [ 692.963099] [] btrfs_clear_path_blocking+0x32/0x70 > > > > In this case, the task has a number of blocking read locks on the btrfs > > buffers, > > and we're trying to turn them back into spinning read locks. Even > > though btrfs is taking the read rwlock, it doesn't think of this as a new > > lock operation because we were blocking out new writers. > > > > If the second task has taken the spinning read lock, it is going to > > prevent that clear_path_blocking operation from progressing, even though > > it would have worked on a non-RT kernel. > > > > The solution should be to make the blocking read locks in btrfs honor the > > single-reader semantics. This means not allowing more than one blocking > > reader and not allowing a spinning reader when there is a blocking > > reader. Strictly speaking btrfs shouldn't need recursive readers on a > > single lock, so I wouldn't worry about that part. > > > > There is also a chunk of code in btrfs_clear_path_blocking that makes > > sure to strictly honor top down locking order during the conversion. It > > only does this when lockdep is enabled because in non-RT kernels we > > don't need to worry about it. For RT we'll want to enable that as well. > > > > I'll give this a shot later today. > > I took a poke at it. Did I do something similar to what you had in > mind, or just hide behind performance stealing paranoid trylock loops? > Box survived 1000 x xfstests 006 and dbench [-s] massive right off the > bat, so it gets posted despite skepticism. Great, thanks! 
I got stuck in bug land on Friday. You mentioned performance problems earlier on Saturday, did this improve performance?

One other question:

> again:
> +#ifdef CONFIG_PREEMPT_RT_BASE
> +	while (atomic_read(&eb->blocking_readers))
> +		cpu_chill();
> +	while(!read_trylock(&eb->lock))
> +		cpu_chill();
> +	if (atomic_read(&eb->blocking_readers)) {
> +		read_unlock(&eb->lock);
> +		goto again;
> +	}

Why use read_trylock() in a loop instead of just trying to take the lock? Is this an RTism or are there other reasons?

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote:
> There is also a chunk of code in btrfs_clear_path_blocking that makes
> sure to strictly honor top down locking order during the conversion. It
> only does this when lockdep is enabled because in non-RT kernels we
> don't need to worry about it. For RT we'll want to enable that as well.

Hm, _seems_ that alone is enough to prevent deadlock. Throughput really sucks though. The other bits of my stab bump throughput for dbench 128 from ~200 mb/s to ~360 mb/s (appears it's the paranoid trylock loops). ext3 does 775 mb/s with the same kernel. Or, dbench 8 on ext3 gives ~1800 mb/s and ~480 mb/s btrfs. Not exactly wonderful.

Hohum, guess I'll wait and see what your patch looks like. I bet it'll work a lot better than mine does :)

-Mike
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote: > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote: > > Greetings, > > [ deadlocks with btrfs and the recent RT kernels ] > > I talked with Thomas about this and I think the problem is the > single-reader nature of the RW rwlocks. The lockdep report below > mentions that btrfs is calling: > > > [ 692.963099] [] btrfs_clear_path_blocking+0x32/0x70 > > In this case, the task has a number of blocking read locks on the btrfs > buffers, > and we're trying to turn them back into spinning read locks. Even > though btrfs is taking the read rwlock, it doesn't think of this as a new > lock operation because we were blocking out new writers. > > If the second task has taken the spinning read lock, it is going to > prevent that clear_path_blocking operation from progressing, even though > it would have worked on a non-RT kernel. > > The solution should be to make the blocking read locks in btrfs honor the > single-reader semantics. This means not allowing more than one blocking > reader and not allowing a spinning reader when there is a blocking > reader. Strictly speaking btrfs shouldn't need recursive readers on a > single lock, so I wouldn't worry about that part. > > There is also a chunk of code in btrfs_clear_path_blocking that makes > sure to strictly honor top down locking order during the conversion. It > only does this when lockdep is enabled because in non-RT kernels we > don't need to worry about it. For RT we'll want to enable that as well. > > I'll give this a shot later today. I took a poke at it. Did I do something similar to what you had in mind, or just hide behind performance stealing paranoid trylock loops? Box survived 1000 x xfstests 006 and dbench [-s] massive right off the bat, so it gets posted despite skepticism. 
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4106264..ae47cc2 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -77,7 +77,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 {
 	int i;
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
 	/* lockdep really cares that we take all of these spinlocks
 	 * in the right order. If any of the locks in the path are not
 	 * currently blocking, it is going to complain. So, make really
@@ -104,7 +104,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 		}
 	}
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
 	if (held)
 		btrfs_clear_lock_blocking_rw(held, held_rw);
 #endif
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 272f911..4db7c14 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "ctree.h"
 #include "extent_io.h"
@@ -97,7 +98,18 @@ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw)
 void btrfs_tree_read_lock(struct extent_buffer *eb)
 {
 again:
+#ifdef CONFIG_PREEMPT_RT_BASE
+	while (atomic_read(&eb->blocking_readers))
+		cpu_chill();
+	while(!read_trylock(&eb->lock))
+		cpu_chill();
+	if (atomic_read(&eb->blocking_readers)) {
+		read_unlock(&eb->lock);
+		goto again;
+	}
+#else
 	read_lock(&eb->lock);
+#endif
 	if (atomic_read(&eb->blocking_writers) &&
 	    current->pid == eb->lock_owner) {
 		/*
@@ -131,11 +143,26 @@ int btrfs_try_tree_read_lock(struct extent_buffer *eb)
 	if (atomic_read(&eb->blocking_writers))
 		return 0;
 
+#ifdef CONFIG_PREEMPT_RT_BASE
+	if (atomic_read(&eb->blocking_readers))
+		return 0;
+	while(!read_trylock(&eb->lock))
+		cpu_chill();
+#else
 	read_lock(&eb->lock);
+#endif
+
 	if (atomic_read(&eb->blocking_writers)) {
 		read_unlock(&eb->lock);
 		return 0;
 	}
+
+#ifdef CONFIG_PREEMPT_RT_BASE
+	if (atomic_read(&eb->blocking_readers)) {
+		read_unlock(&eb->lock);
+		return 0;
+	}
+#endif
 	atomic_inc(&eb->read_locks);
 	atomic_inc(&eb->spinning_readers);
 	return 1;
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
Chris, On Fri, 13 Jul 2012, Chris Mason wrote: > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote: > > Greetings, > > [ deadlocks with btrfs and the recent RT kernels ] > > I talked with Thomas about this and I think the problem is the > single-reader nature of the RW rwlocks. The lockdep report below > mentions that btrfs is calling: > > > [ 692.963099] [] btrfs_clear_path_blocking+0x32/0x70 > > In this case, the task has a number of blocking read locks on the btrfs > buffers, > and we're trying to turn them back into spinning read locks. Even > though btrfs is taking the read rwlock, it doesn't think of this as a new > lock operation because we were blocking out new writers. > > If the second task has taken the spinning read lock, it is going to > prevent that clear_path_blocking operation from progressing, even though > it would have worked on a non-RT kernel. > > The solution should be to make the blocking read locks in btrfs honor the > single-reader semantics. This means not allowing more than one blocking > reader and not allowing a spinning reader when there is a blocking > reader. Strictly speaking btrfs shouldn't need recursive readers on a > single lock, so I wouldn't worry about that part. > > There is also a chunk of code in btrfs_clear_path_blocking that makes > sure to strictly honor top down locking order during the conversion. It > only does this when lockdep is enabled because in non-RT kernels we > don't need to worry about it. For RT we'll want to enable that as well. thanks for explaining this. I really got lost in that code completely. > I'll give this a shot later today. Cool. Aside of that I'm still pondering to experiment with a non-pi variant of rw locks which allows multiple readers. For such cases as btrfs I think they would be well suited and avoid the performance overhead of the single writer restriction. 
But that's not going to happen before my vacation, so we'll stick with your workaround for now and let Mike beat the hell out of it.

Thanks,

	Thomas
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> Greetings,

[ deadlocks with btrfs and the recent RT kernels ]

I talked with Thomas about this and I think the problem is the single-reader nature of the RW rwlocks. The lockdep report below mentions that btrfs is calling:

> [ 692.963099] [] btrfs_clear_path_blocking+0x32/0x70

In this case, the task has a number of blocking read locks on the btrfs buffers, and we're trying to turn them back into spinning read locks. Even though btrfs is taking the read rwlock, it doesn't think of this as a new lock operation because we were blocking out new writers.

If the second task has taken the spinning read lock, it is going to prevent that clear_path_blocking operation from progressing, even though it would have worked on a non-RT kernel.

The solution should be to make the blocking read locks in btrfs honor the single-reader semantics. This means not allowing more than one blocking reader and not allowing a spinning reader when there is a blocking reader. Strictly speaking btrfs shouldn't need recursive readers on a single lock, so I wouldn't worry about that part.

There is also a chunk of code in btrfs_clear_path_blocking that makes sure to strictly honor top down locking order during the conversion. It only does this when lockdep is enabled because in non-RT kernels we don't need to worry about it. For RT we'll want to enable that as well.

I'll give this a shot later today.

-chris
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Fri, 2012-07-13 at 06:47 -0400, Chris Mason wrote: > On Fri, Jul 13, 2012 at 04:26:26AM -0600, Thomas Gleixner wrote: > > On Fri, 13 Jul 2012, Mike Galbraith wrote: > > > On Fri, 2012-07-13 at 11:52 +0200, Thomas Gleixner wrote: > > > > On Fri, 13 Jul 2012, Mike Galbraith wrote: > > > > > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: > > > > > > Bingo, that makes it more likely that this is caused by copying w/o > > > > > > initializing the lock and then freeing the original structure. > > > > > > > > > > > > A quick check for memcpy finds that __btrfs_close_devices() does a > > > > > > memcpy of btrfs_device structs w/o initializing the lock in the new > > > > > > copy, but I have no idea whether that's the place we are looking > > > > > > for. > > > > > > > > > > Thanks a bunch Thomas. I doubt I would have ever figured out that > > > > > lala > > > > > land resulted from _copying_ a lock. That's one I won't be forgetting > > > > > any time soon. Box not only survived a few thousand xfstests 006 > > > > > runs, > > > > > dbench seemed disinterested in deadlocking virgin 3.0-rt. > > > > > > > > Cute. It think that the lock copying caused the deadlock problem as > > > > the list pointed to the wrong place, so we might have ended up with > > > > following down the wrong chain when walking the list as long as the > > > > original struct was not freed. That beast is freed under RCU so there > > > > could be a rcu read side critical section fiddling with the old lock > > > > and cause utter confusion. > > > > > > Virgin 3.0-rt appears to really be solid. But then it doesn't have > > > pesky rwlocks. > > > > Ah. So 3.0 is not having those rwlock thingies. Bummer. 
> > > > > > /me goes and writes a nastigram^W proper changelog > > > > > > > > > btrfs still locks up in my enterprise kernel, so I suppose I had > > > > > better > > > > > plug your fix into 3.4-rt and see what happens, and go beat hell out > > > > > of > > > > > virgin 3.0-rt again to be sure box really really survives dbench. > > > > > > > > A test against 3.4-rt sans enterprise mess might be nice as well. > > > > > > Enterprise is 3.0-stable with um 555 btrfs patches (oh dear). > > > > > > Virgin 3.4-rt and 3.2-rt deadlock gripe. Enterprise doesn't gripe, but > > > deadlocks, so I have another adventure in my future even if I figure out > > > wth to do about rwlocks. > > > > Hrmpf. /me goes to stare into fs/btrfs/ some more. > > Please post the deadlocks here, I'll help ;)

This is the one from top of thread. Below that is without the deadlock detector.

[ 692.857165]
[ 692.863963] [ BUG: circular locking deadlock detected! ]
[ 692.869264] Not tainted
[ 692.871708]
[ 692.877008] btrfs-delayed-m/1404 is deadlocking current task dbench/7937
[ 692.877009]
[ 692.885183]
[ 692.885184] 1) dbench/7937 is trying to acquire this lock:
[ 692.892149] [88014d6aea80] {&(&eb->lock)->lock}
[ 692.897102] .. ->owner: 880175808501
[ 692.901018] .. held by: btrfs-delayed-m: 1404 [880175808500, 120]
[ 692.907657]
[ 692.907657] 2) btrfs-delayed-m/1404 is blocked on this lock:
[ 692.914797] [88014bf58d60] {&(&eb->lock)->lock}
[ 692.919751] .. ->owner: 880175186101
[ 692.923672] .. held by: dbench: 7937 [880175186100, 120]
[ 692.930309]
[ 692.930309] btrfs-delayed-m/1404's [blocked] stackdump:
[ 692.930310]
[ 692.938504] 880177575aa0 0046 88014bf58d60 fb00
[ 692.938507] fb00 880177575fd8 fb00 880177574000
[ 692.938509] 880177575fd8 fb00 88017662f240 880175808500
[ 692.960635] Call Trace:
[ 692.963085] [] schedule+0x29/0x90
[ 692.963087] [] rt_spin_lock_slowlock+0xfd/0x330
[ 692.963090] [] __rt_spin_lock+0x9/0x10
[ 692.963092] [] rt_read_lock+0x27/0x40
[ 692.963096] [] btrfs_clear_lock_blocking_rw+0x6f/0x180
[ 692.963099] [] btrfs_clear_path_blocking+0x32/0x70
[ 692.963102] [] btrfs_search_slot+0x6b2/0x810
[ 692.963105] [] btrfs_lookup_inode+0x2a/0xa0
[ 692.963107] [] ? rt_mutex_lock+0x12/0x20
[ 692.963111] [] btrfs_update_delayed_inode+0x6c/0x160
[ 692.963113] [] ? _mutex_unlock+0x9/0x10
[ 692.963116] [] btrfs_async_run_delayed_node_done+0x182/0x1a0
[ 692.963119] [] worker_loop+0xaf/0x430
[ 692.963121] [] ? btrfs_queue_worker+0x1e0/0x1e0
[ 692.963123] [] ? btrfs_queue_worker+0x1e0/0x1e0
[ 692.963127] [] kthread+0xae/0xc0
[ 692.963129] [] ? schedule+0x29/0x90
[ 692.963133] [] ? __switch_to+0x14c/0x410
[ 692.963137] [] ? finish_task_switch+0x54/0xd0
[ 692.963140] [] kernel_thread_helper+0x4/0x10
[ 692.963143] [] ? __init_kthread_worker+0x50/0x50
[ 692.963145] [] ? gs_change+0x13/0x13
[ 692.963146]
[ 692.963147] dbench/7937's [current] stackdump:
[ 692.963147]
[ 693.098724] Pid: 7937, comm: dbench Not tainted 3.4.4-rt13 #25
[ 693.104544
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Fri, Jul 13, 2012 at 04:26:26AM -0600, Thomas Gleixner wrote: > On Fri, 13 Jul 2012, Mike Galbraith wrote: > > On Fri, 2012-07-13 at 11:52 +0200, Thomas Gleixner wrote: > > > On Fri, 13 Jul 2012, Mike Galbraith wrote: > > > > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: > > > > > Bingo, that makes it more likely that this is caused by copying w/o > > > > > initializing the lock and then freeing the original structure. > > > > > > > > > > A quick check for memcpy finds that __btrfs_close_devices() does a > > > > > memcpy of btrfs_device structs w/o initializing the lock in the new > > > > > copy, but I have no idea whether that's the place we are looking for. > > > > > > > > Thanks a bunch Thomas. I doubt I would have ever figured out that lala > > > > land resulted from _copying_ a lock. That's one I won't be forgetting > > > > any time soon. Box not only survived a few thousand xfstests 006 runs, > > > > dbench seemed disinterested in deadlocking virgin 3.0-rt. > > > > > > Cute. It think that the lock copying caused the deadlock problem as > > > the list pointed to the wrong place, so we might have ended up with > > > following down the wrong chain when walking the list as long as the > > > original struct was not freed. That beast is freed under RCU so there > > > could be a rcu read side critical section fiddling with the old lock > > > and cause utter confusion. > > > > Virgin 3.0-rt appears to really be solid. But then it doesn't have > > pesky rwlocks. > > Ah. So 3.0 is not having those rwlock thingies. Bummer. > > > > /me goes and writes a nastigram^W proper changelog > > > > > > > btrfs still locks up in my enterprise kernel, so I suppose I had better > > > > plug your fix into 3.4-rt and see what happens, and go beat hell out of > > > > virgin 3.0-rt again to be sure box really really survives dbench. > > > > > > A test against 3.4-rt sans enterprise mess might be nice as well. 
> > > > Enterprise is 3.0-stable with um 555 btrfs patches (oh dear). > > > > Virgin 3.4-rt and 3.2-rt deadlock gripe. Enterprise doesn't gripe, but > > deadlocks, so I have another adventure in my future even if I figure out > > wth to do about rwlocks. > > Hrmpf. /me goes to stare into fs/btrfs/ some more. Please post the deadlocks here, I'll help ;) -chris
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Fri, 13 Jul 2012, Mike Galbraith wrote: > On Fri, 2012-07-13 at 11:52 +0200, Thomas Gleixner wrote: > > On Fri, 13 Jul 2012, Mike Galbraith wrote: > > > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: > > > > Bingo, that makes it more likely that this is caused by copying w/o > > > > initializing the lock and then freeing the original structure. > > > > > > > > A quick check for memcpy finds that __btrfs_close_devices() does a > > > > memcpy of btrfs_device structs w/o initializing the lock in the new > > > > copy, but I have no idea whether that's the place we are looking for. > > > > > > Thanks a bunch Thomas. I doubt I would have ever figured out that lala > > > land resulted from _copying_ a lock. That's one I won't be forgetting > > > any time soon. Box not only survived a few thousand xfstests 006 runs, > > > dbench seemed disinterested in deadlocking virgin 3.0-rt. > > > > Cute. It think that the lock copying caused the deadlock problem as > > the list pointed to the wrong place, so we might have ended up with > > following down the wrong chain when walking the list as long as the > > original struct was not freed. That beast is freed under RCU so there > > could be a rcu read side critical section fiddling with the old lock > > and cause utter confusion. > > Virgin 3.0-rt appears to really be solid. But then it doesn't have > pesky rwlocks. Ah. So 3.0 is not having those rwlock thingies. Bummer. > > /me goes and writes a nastigram^W proper changelog > > > > > btrfs still locks up in my enterprise kernel, so I suppose I had better > > > plug your fix into 3.4-rt and see what happens, and go beat hell out of > > > virgin 3.0-rt again to be sure box really really survives dbench. > > > > A test against 3.4-rt sans enterprise mess might be nice as well. > > Enterprise is 3.0-stable with um 555 btrfs patches (oh dear). > > Virgin 3.4-rt and 3.2-rt deadlock gripe. 
Enterprise doesn't gripe, but > deadlocks, so I have another adventure in my future even if I figure out > wth to do about rwlocks. Hrmpf. /me goes to stare into fs/btrfs/ some more.
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Fri, 2012-07-13 at 11:52 +0200, Thomas Gleixner wrote: > On Fri, 13 Jul 2012, Mike Galbraith wrote: > > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: > > > Bingo, that makes it more likely that this is caused by copying w/o > > > initializing the lock and then freeing the original structure. > > > > > > A quick check for memcpy finds that __btrfs_close_devices() does a > > > memcpy of btrfs_device structs w/o initializing the lock in the new > > > copy, but I have no idea whether that's the place we are looking for. > > > > Thanks a bunch Thomas. I doubt I would have ever figured out that lala > > land resulted from _copying_ a lock. That's one I won't be forgetting > > any time soon. Box not only survived a few thousand xfstests 006 runs, > > dbench seemed disinterested in deadlocking virgin 3.0-rt. > > Cute. It think that the lock copying caused the deadlock problem as > the list pointed to the wrong place, so we might have ended up with > following down the wrong chain when walking the list as long as the > original struct was not freed. That beast is freed under RCU so there > could be a rcu read side critical section fiddling with the old lock > and cause utter confusion. Virgin 3.0-rt appears to really be solid. But then it doesn't have pesky rwlocks. > /me goes and writes a nastigram^W proper changelog > > > btrfs still locks up in my enterprise kernel, so I suppose I had better > > plug your fix into 3.4-rt and see what happens, and go beat hell out of > > virgin 3.0-rt again to be sure box really really survives dbench. > > A test against 3.4-rt sans enterprise mess might be nice as well. Enterprise is 3.0-stable with um 555 btrfs patches (oh dear). Virgin 3.4-rt and 3.2-rt deadlock gripe. Enterprise doesn't gripe, but deadlocks, so I have another adventure in my future even if I figure out wth to do about rwlocks. 
-Mike
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Thu, 12 Jul 2012, Chris Mason wrote:
> On Thu, Jul 12, 2012 at 05:07:58AM -0600, Thomas Gleixner wrote:
> > On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > > crash> struct rt_mutex 0x8801770601c8
> > > struct rt_mutex {
> > >   wait_lock = {
> > >     raw_lock = {
> > >       slock = 7966
> > >     }
> > >   },
> > >   wait_list = {
> > >     node_list = {
> > >       next = 0x880175eedbe0,
> > >       prev = 0x880175eedbe0
> > >     },
> > >     rawlock = 0x880175eedbd8,
> >
> > Urgh. Here is something completely wrong. That should point to
> > wait_lock, i.e. the rt_mutex itself, but that points into lala land.
>
> This is probably the memcpy you found later this morning, right?

As Mike found out, it looks like the culprit.

> The reader/writer part in btrfs is just an optimization. If we need
> them to be all writer locks for RT purposes, that's not a problem.
>
> But, before we go down that road, we do annotations trying
> to make sure lockdep doesn't get confused about lock classes. Basically
> the tree is locked level by level. So its safe to take eb->lock while
> holding eb->lock as long as you follow the rules.
>
> Are additional annotations required for RT?

I don't think so. I'm sure it has been caused by the lock copying as well. Walking the wrong list can cause complete confusion all over the place. So lets wait for Mike beating the hell out of it.

Find the patch with a proper changelog below.

Thanks,

	tglx

-->

From: Thomas Gleixner
Date: Thu, 12 Jul 2012 15:30:02 +0200
Subject: btrfs: Init io_lock after cloning btrfs device struct

__btrfs_close_devices() clones btrfs device structs with memcpy(). Some of the fields in the clone are reinitialized, but it's missing to init io_lock. In mainline this goes unnoticed, but on RT it leaves the plist pointing to the original about to be freed lock struct.

Initialize io_lock after cloning, so no references to the original struct are left.
Reported-and-tested-by: Mike Galbraith
Cc: sta...@vger.kernel.org
Signed-off-by: Thomas Gleixner
---
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 43baaf0..06c8ced 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -512,6 +512,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 		new_device->writeable = 0;
 		new_device->in_fs_metadata = 0;
 		new_device->can_discard = 0;
+		spin_lock_init(&new_device->io_lock);
 		list_replace_rcu(&device->dev_list, &new_device->dev_list);
 
 		call_rcu(&device->rcu, free_device);
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Fri, 13 Jul 2012, Mike Galbraith wrote: > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: > > Bingo, that makes it more likely that this is caused by copying w/o > > initializing the lock and then freeing the original structure. > > > > A quick check for memcpy finds that __btrfs_close_devices() does a > > memcpy of btrfs_device structs w/o initializing the lock in the new > > copy, but I have no idea whether that's the place we are looking for. > > Thanks a bunch Thomas. I doubt I would have ever figured out that lala > land resulted from _copying_ a lock. That's one I won't be forgetting > any time soon. Box not only survived a few thousand xfstests 006 runs, > dbench seemed disinterested in deadlocking virgin 3.0-rt. Cute. I think that the lock copying caused the deadlock problem as the list pointed to the wrong place, so we might have ended up with following down the wrong chain when walking the list as long as the original struct was not freed. That beast is freed under RCU so there could be a rcu read side critical section fiddling with the old lock and cause utter confusion. /me goes and writes a nastigram^W proper changelog > btrfs still locks up in my enterprise kernel, so I suppose I had better > plug your fix into 3.4-rt and see what happens, and go beat hell out of > virgin 3.0-rt again to be sure box really really survives dbench. A test against 3.4-rt sans enterprise mess might be nice as well. Thanks, tglx
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: > On Thu, 12 Jul 2012, Mike Galbraith wrote: > > On Thu, 2012-07-12 at 13:43 +0200, Thomas Gleixner wrote: > > > rawlock points to ...968 and the node_list to ...970. > > > > > > struct rt_mutex { > > > raw_spinlock_t wait_lock; > > > struct plist_head wait_list; > > > > > > The raw_lock pointer of the plist_head is initialized in > > > __rt_mutex_init() so it points to wait_lock. > > > > > > Can you check the offset of wait_list vs. the rt_mutex itself? > > > > > > I wouldn't be surprised if it's exactly 8 bytes. And then this thing > > > looks like a copied lock with stale pointers to hell. Eew. > > > > crash> struct rt_mutex -o > > struct rt_mutex { > >[0] raw_spinlock_t wait_lock; > >[8] struct plist_head wait_list; > > Bingo, that makes it more likely that this is caused by copying w/o > initializing the lock and then freeing the original structure. > > A quick check for memcpy finds that __btrfs_close_devices() does a > memcpy of btrfs_device structs w/o initializing the lock in the new > copy, but I have no idea whether that's the place we are looking for. Thanks a bunch Thomas. I doubt I would have ever figured out that lala land resulted from _copying_ a lock. That's one I won't be forgetting any time soon. Box not only survived a few thousand xfstests 006 runs, dbench seemed disinterested in deadlocking virgin 3.0-rt. btrfs still locks up in my enterprise kernel, so I suppose I had better plug your fix into 3.4-rt and see what happens, and go beat hell out of virgin 3.0-rt again to be sure box really really survives dbench. 
> tglx
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 43baaf0..06c8ced 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -512,6 +512,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
>  		new_device->writeable = 0;
>  		new_device->in_fs_metadata = 0;
>  		new_device->can_discard = 0;
> +		spin_lock_init(&new_device->io_lock);
>  		list_replace_rcu(&device->dev_list, &new_device->dev_list);
>  
>  		call_rcu(&device->rcu, free_device);
Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
On Thu, Jul 12, 2012 at 05:07:58AM -0600, Thomas Gleixner wrote:
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > crash> struct rt_mutex 0x8801770601c8
> > struct rt_mutex {
> >   wait_lock = {
> >     raw_lock = {
> >       slock = 7966
> >     }
> >   },
> >   wait_list = {
> >     node_list = {
> >       next = 0x880175eedbe0,
> >       prev = 0x880175eedbe0
> >     },
> >     rawlock = 0x880175eedbd8,
>
> Urgh. Here is something completely wrong. That should point to
> wait_lock, i.e. the rt_mutex itself, but that points into lala land.

This is probably the memcpy you found later this morning, right?

> > > Reproducer2: dbench -t 30 8
> >
> > [ 692.857164]
> > [ 692.857165]
> > [ 692.863963] [ BUG: circular locking deadlock detected! ]
> > [ 692.869264] Not tainted
> > [ 692.871708]
> > [ 692.877008] btrfs-delayed-m/1404 is deadlocking current task dbench/7937
> > [ 692.877009]
> > [ 692.885183]
> > [ 692.885184] 1) dbench/7937 is trying to acquire this lock:
> > [ 692.892149] [88014d6aea80] {&(&eb->lock)->lock}
> > [ 692.897102] .. ->owner: 880175808501
> > [ 692.901018] .. held by: btrfs-delayed-m: 1404 [880175808500, 120]
> > [ 692.907657]
> > [ 692.907657] 2) btrfs-delayed-m/1404 is blocked on this lock:
> > [ 692.914797] [88014bf58d60] {&(&eb->lock)->lock}
> > [ 692.919751] .. ->owner: 880175186101
> > [ 692.923672] .. held by: dbench: 7937 [880175186100, 120]
> > [ 692.930309]
> > [ 692.930309] btrfs-delayed-m/1404's [blocked] stackdump:
>
> Hrmm. Both locks are rw_locks and we prevent multiple readers for the
> known reasons in RT. No idea how to deal with that one :(

The reader/writer part in btrfs is just an optimization. If we need them to be all writer locks for RT purposes, that's not a problem.

But, before we go down that road, we do annotations trying to make sure lockdep doesn't get confused about lock classes. Basically the tree is locked level by level. So it's safe to take eb->lock while holding eb->lock as long as you follow the rules.

Are additional annotations required for RT?
-chris
On Thu, 2012-07-12 at 15:48 +0200, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 15:43 +0200, Thomas Gleixner wrote:
> > On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote:
> > > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > > copy, but I have no idea whether that's the place we are looking for.
> > >
> > > Cool, you found one, thanks! I'm setting boobytraps.
> > >
> > > Um, correction, box says I'm setting _buggy_ boobytraps :)
> > >
> > > Tomorrow-man will test this and frob traps anew.
> >
> > What kind of test setup do you have? i.e. raid, single disk ...
>
> Yeah, megaraid sas.. x3550 M3.

(one disk for OS, one disk for xfstests to mangle)
On Thu, 2012-07-12 at 15:43 +0200, Thomas Gleixner wrote:
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote:
> > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > copy, but I have no idea whether that's the place we are looking for.
> >
> > Cool, you found one, thanks! I'm setting boobytraps.
> >
> > Um, correction, box says I'm setting _buggy_ boobytraps :)
> >
> > Tomorrow-man will test this and frob traps anew.
>
> What kind of test setup do you have? i.e. raid, single disk ...

Yeah, megaraid sas.. x3550 M3.

-Mike
On Thu, 12 Jul 2012, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote:
> > A quick check for memcpy finds that __btrfs_close_devices() does a
> > memcpy of btrfs_device structs w/o initializing the lock in the new
> > copy, but I have no idea whether that's the place we are looking for.
>
> Cool, you found one, thanks! I'm setting boobytraps.
>
> Um, correction, box says I'm setting _buggy_ boobytraps :)
>
> Tomorrow-man will test this and frob traps anew.

What kind of test setup do you have? i.e. raid, single disk ...

Thanks,

	tglx
On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote:
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 13:43 +0200, Thomas Gleixner wrote:
> > > rawlock points to ...968 and the node_list to ...970.
> > >
> > > struct rt_mutex {
> > >     raw_spinlock_t    wait_lock;
> > >     struct plist_head wait_list;
> > >
> > > The raw_lock pointer of the plist_head is initialized in
> > > __rt_mutex_init() so it points to wait_lock.
> > >
> > > Can you check the offset of wait_list vs. the rt_mutex itself?
> > >
> > > I wouldn't be surprised if it's exactly 8 bytes. And then this thing
> > > looks like a copied lock with stale pointers to hell. Eew.
> >
> > crash> struct rt_mutex -o
> > struct rt_mutex {
> >    [0] raw_spinlock_t wait_lock;
> >    [8] struct plist_head wait_list;
>
> Bingo, that makes it more likely that this is caused by copying w/o
> initializing the lock and then freeing the original structure.
>
> A quick check for memcpy finds that __btrfs_close_devices() does a
> memcpy of btrfs_device structs w/o initializing the lock in the new
> copy, but I have no idea whether that's the place we are looking for.

Cool, you found one, thanks! I'm setting boobytraps.

Um, correction, box says I'm setting _buggy_ boobytraps :)

Tomorrow-man will test this and frob traps anew.
> Thanks,
>
> 	tglx
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 43baaf0..06c8ced 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -512,6 +512,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
>  		new_device->writeable = 0;
>  		new_device->in_fs_metadata = 0;
>  		new_device->can_discard = 0;
> +		spin_lock_init(&new_device->io_lock);
>  		list_replace_rcu(&device->dev_list, &new_device->dev_list);
>
>  		call_rcu(&device->rcu, free_device);
On Thu, 12 Jul 2012, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 13:43 +0200, Thomas Gleixner wrote:
> > rawlock points to ...968 and the node_list to ...970.
> >
> > struct rt_mutex {
> >     raw_spinlock_t    wait_lock;
> >     struct plist_head wait_list;
> >
> > The raw_lock pointer of the plist_head is initialized in
> > __rt_mutex_init() so it points to wait_lock.
> >
> > Can you check the offset of wait_list vs. the rt_mutex itself?
> >
> > I wouldn't be surprised if it's exactly 8 bytes. And then this thing
> > looks like a copied lock with stale pointers to hell. Eew.
>
> crash> struct rt_mutex -o
> struct rt_mutex {
>    [0] raw_spinlock_t wait_lock;
>    [8] struct plist_head wait_list;

Bingo, that makes it more likely that this is caused by copying w/o
initializing the lock and then freeing the original structure.

A quick check for memcpy finds that __btrfs_close_devices() does a
memcpy of btrfs_device structs w/o initializing the lock in the new
copy, but I have no idea whether that's the place we are looking for.

Thanks,

	tglx

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 43baaf0..06c8ced 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -512,6 +512,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 		new_device->writeable = 0;
 		new_device->in_fs_metadata = 0;
 		new_device->can_discard = 0;
+		spin_lock_init(&new_device->io_lock);
 		list_replace_rcu(&device->dev_list, &new_device->dev_list);

 		call_rcu(&device->rcu, free_device);
On Thu, 2012-07-12 at 13:43 +0200, Thomas Gleixner wrote:
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 10:44 +0200, Mike Galbraith wrote:
> > > On Thu, 2012-07-12 at 07:47 +0200, Mike Galbraith wrote:
> > > > Greetings,
> > > >
> > > > I'm chasing btrfs critters in an enterprise 3.0-rt kernel, and just
> > > > checked to see if they're alive in virgin latest/greatest rt kernel.
> > > >
> > > > Both are indeed alive and well, ie I didn't break it, nor did the
> > > > zillion patches in enterprise base kernel, so others may have an
> > > > opportunity to meet these critters up close and personal as well.
> > >
> > > 3.2-rt both explodes and deadlocks as well. 3.0-rt (virgin I mean) does
> > > neither, so with enough re-integrate investment, it might be bisectable.
> >
> > Nope, virgin 3.0-rt just didn't feel like it at the time. Booted it
> > again to run hefty test over lunch, it didn't survive 1 xfstests 006,
> > much less hundreds.
> >
> > crash> bt
> > PID: 7604  TASK: 880174238b20  CPU: 0  COMMAND: "btrfs-worker-0"
> >  #0 [88017455d9c8] machine_kexec at 81025794
> >  #1 [88017455da28] crash_kexec at 8109781d
> >  #2 [88017455daf8] panic at 814a0661
> >  #3 [88017455db78] __try_to_take_rt_mutex at 81086d2f
> >  #4 [88017455dbc8] rt_spin_lock_slowlock at 814a2670
> >  #5 [88017455dca8] rt_spin_lock at 814a2db9
> >  #6 [88017455dcb8] schedule_bio at 81243133
> >  #7 [88017455dcf8] btrfs_map_bio at 812477be
> >  #8 [88017455dd68] __btree_submit_bio_done at 812152f6
> >  #9 [88017455dd78] run_one_async_done at 812148fa
> > #10 [88017455dd98] run_ordered_completions at 812493e8
> > #11 [88017455ddd8] worker_loop at 81249dc9
> > #12 [88017455de88] kthread at 81070266
> > #13 [88017455df48] kernel_thread_helper at 814a9be4
> > crash> struct rt_mutex 0x880174530108
> > struct rt_mutex {
> >   wait_lock = {
> >     raw_lock = {
> >       slock = 7966
> >     }
> >   },
> >   wait_list = {
> >     node_list = {
> >       next = 0x880175ecc970,
> >       prev = 0x880175ecc970
> >     },
> >     rawlock = 0x880175ecc968,
>
> Pointer into lala land again.

Yeah, and freed again.

> rawlock points to ...968 and the node_list to ...970.
>
> struct rt_mutex {
>     raw_spinlock_t    wait_lock;
>     struct plist_head wait_list;
>
> The raw_lock pointer of the plist_head is initialized in
> __rt_mutex_init() so it points to wait_lock.
>
> Can you check the offset of wait_list vs. the rt_mutex itself?
>
> I wouldn't be surprised if it's exactly 8 bytes. And then this thing
> looks like a copied lock with stale pointers to hell. Eew.

crash> struct rt_mutex -o
struct rt_mutex {
   [0] raw_spinlock_t wait_lock;
   [8] struct plist_head wait_list;
  [40] struct task_struct *owner;
  [48] int save_state;
  [56] const char *file;
  [64] const char *name;
  [72] int line;
  [80] void *magic;
}
SIZE: 88

-Mike
On Thu, 12 Jul 2012, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 10:44 +0200, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 07:47 +0200, Mike Galbraith wrote:
> > > Greetings,
> > >
> > > I'm chasing btrfs critters in an enterprise 3.0-rt kernel, and just
> > > checked to see if they're alive in virgin latest/greatest rt kernel.
> > >
> > > Both are indeed alive and well, ie I didn't break it, nor did the
> > > zillion patches in enterprise base kernel, so others may have an
> > > opportunity to meet these critters up close and personal as well.
> >
> > 3.2-rt both explodes and deadlocks as well. 3.0-rt (virgin I mean) does
> > neither, so with enough re-integrate investment, it might be bisectable.
>
> Nope, virgin 3.0-rt just didn't feel like it at the time. Booted it
> again to run hefty test over lunch, it didn't survive 1 xfstests 006,
> much less hundreds.
>
> crash> bt
> PID: 7604  TASK: 880174238b20  CPU: 0  COMMAND: "btrfs-worker-0"
>  #0 [88017455d9c8] machine_kexec at 81025794
>  #1 [88017455da28] crash_kexec at 8109781d
>  #2 [88017455daf8] panic at 814a0661
>  #3 [88017455db78] __try_to_take_rt_mutex at 81086d2f
>  #4 [88017455dbc8] rt_spin_lock_slowlock at 814a2670
>  #5 [88017455dca8] rt_spin_lock at 814a2db9
>  #6 [88017455dcb8] schedule_bio at 81243133
>  #7 [88017455dcf8] btrfs_map_bio at 812477be
>  #8 [88017455dd68] __btree_submit_bio_done at 812152f6
>  #9 [88017455dd78] run_one_async_done at 812148fa
> #10 [88017455dd98] run_ordered_completions at 812493e8
> #11 [88017455ddd8] worker_loop at 81249dc9
> #12 [88017455de88] kthread at 81070266
> #13 [88017455df48] kernel_thread_helper at 814a9be4
> crash> struct rt_mutex 0x880174530108
> struct rt_mutex {
>   wait_lock = {
>     raw_lock = {
>       slock = 7966
>     }
>   },
>   wait_list = {
>     node_list = {
>       next = 0x880175ecc970,
>       prev = 0x880175ecc970
>     },
>     rawlock = 0x880175ecc968,

Pointer into lala land again.

rawlock points to ...968 and the node_list to ...970.

struct rt_mutex {
    raw_spinlock_t    wait_lock;
    struct plist_head wait_list;

The raw_lock pointer of the plist_head is initialized in
__rt_mutex_init() so it points to wait_lock.

Can you check the offset of wait_list vs. the rt_mutex itself?

I wouldn't be surprised if it's exactly 8 bytes. And then this thing
looks like a copied lock with stale pointers to hell. Eew.

Thanks,

	tglx
On Thu, 12 Jul 2012, Mike Galbraith wrote:
> crash> struct rt_mutex 0x8801770601c8
> struct rt_mutex {
>   wait_lock = {
>     raw_lock = {
>       slock = 7966
>     }
>   },
>   wait_list = {
>     node_list = {
>       next = 0x880175eedbe0,
>       prev = 0x880175eedbe0
>     },
>     rawlock = 0x880175eedbd8,

Urgh. Here is something completely wrong. That should point to
wait_lock, i.e. the rt_mutex itself, but that points into lala land.

>     spinlock = 0x0
>   },
>   owner = 0x1,
>   save_state = 0,
>   file = 0x0,
>   name = 0x81781b9b "&(&device->io_lock)->lock",
>   line = 0,
>   magic = 0x0
> }
> crash> struct list_head 0x880175eedbe0
> struct list_head {
>   next = 0x6b6b6b6b6b6b6b6b,
>   prev = 0x6b6b6b6b6b6b6b6b
> }

That's POISON_FREE. How the heck can this happen ?

> Reproducer2: dbench -t 30 8
>
> [ 692.863963] [ BUG: circular locking deadlock detected! ]
> [ 692.869264] Not tainted
> [ 692.877008] btrfs-delayed-m/1404 is deadlocking current task dbench/7937
>
> [ 692.885184] 1) dbench/7937 is trying to acquire this lock:
> [ 692.892149]  [88014d6aea80] {&(&eb->lock)->lock}
> [ 692.897102] .. ->owner: 880175808501
> [ 692.901018] .. held by: btrfs-delayed-m: 1404 [880175808500, 120]
>
> [ 692.907657] 2) btrfs-delayed-m/1404 is blocked on this lock:
> [ 692.914797]  [88014bf58d60] {&(&eb->lock)->lock}
> [ 692.919751] .. ->owner: 880175186101
> [ 692.923672] .. held by: dbench: 7937 [880175186100, 120]
>
> [ 692.930309] btrfs-delayed-m/1404's [blocked] stackdump:

Hrmm. Both locks are rw_locks and we prevent multiple readers for the
known reasons in RT. No idea how to deal with that one :(

Thanks,

	tglx
On Thu, 2012-07-12 at 10:44 +0200, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 07:47 +0200, Mike Galbraith wrote:
> > Greetings,
> >
> > I'm chasing btrfs critters in an enterprise 3.0-rt kernel, and just
> > checked to see if they're alive in virgin latest/greatest rt kernel.
> >
> > Both are indeed alive and well, ie I didn't break it, nor did the
> > zillion patches in enterprise base kernel, so others may have an
> > opportunity to meet these critters up close and personal as well.
>
> 3.2-rt both explodes and deadlocks as well. 3.0-rt (virgin I mean) does
> neither, so with enough re-integrate investment, it might be bisectable.

Nope, virgin 3.0-rt just didn't feel like it at the time. Booted it
again to run hefty test over lunch, it didn't survive 1 xfstests 006,
much less hundreds.

crash> bt
PID: 7604  TASK: 880174238b20  CPU: 0  COMMAND: "btrfs-worker-0"
 #0 [88017455d9c8] machine_kexec at 81025794
 #1 [88017455da28] crash_kexec at 8109781d
 #2 [88017455daf8] panic at 814a0661
 #3 [88017455db78] __try_to_take_rt_mutex at 81086d2f
 #4 [88017455dbc8] rt_spin_lock_slowlock at 814a2670
 #5 [88017455dca8] rt_spin_lock at 814a2db9
 #6 [88017455dcb8] schedule_bio at 81243133
 #7 [88017455dcf8] btrfs_map_bio at 812477be
 #8 [88017455dd68] __btree_submit_bio_done at 812152f6
 #9 [88017455dd78] run_one_async_done at 812148fa
#10 [88017455dd98] run_ordered_completions at 812493e8
#11 [88017455ddd8] worker_loop at 81249dc9
#12 [88017455de88] kthread at 81070266
#13 [88017455df48] kernel_thread_helper at 814a9be4
crash> struct rt_mutex 0x880174530108
struct rt_mutex {
  wait_lock = {
    raw_lock = {
      slock = 7966
    }
  },
  wait_list = {
    node_list = {
      next = 0x880175ecc970,
      prev = 0x880175ecc970
    },
    rawlock = 0x880175ecc968,
    spinlock = 0x0
  },
  owner = 0x1,
  save_state = 0,
  file = 0x0,
  name = 0x81763d02 "&(&device->io_lock)->lock",
  line = 0,
  magic = 0x0
}
On Thu, 2012-07-12 at 07:47 +0200, Mike Galbraith wrote:
> Greetings,
>
> I'm chasing btrfs critters in an enterprise 3.0-rt kernel, and just
> checked to see if they're alive in virgin latest/greatest rt kernel.
>
> Both are indeed alive and well, ie I didn't break it, nor did the
> zillion patches in enterprise base kernel, so others may have an
> opportunity to meet these critters up close and personal as well.

3.2-rt both explodes and deadlocks as well. 3.0-rt (virgin I mean) does
neither, so with enough re-integrate investment, it might be bisectable.
Rummaging in btrfs, that begins to look downright attractive ;-)

-Mike