On Mon, 18 Feb 2013, Stefan Behrens wrote:
> On Fri, 15 Feb 2013 22:56:19 +0100 (CET), Fredrik Tolf wrote:
>> The oops cut can be found here:
>> <http://www.dolda2000.com/~fredrik/tmp/btrfs-oops>
>
> This scrub issue is fixed since Linux 3.8-rc1 with commit
> 4ded4f6 Btrfs: fix BUG() in scrub when first superblock reading gives EIO

I see, thanks!

Rebooting the system did get me running again, allowing me to remove the missing device from the filesystem. However, I ran into a couple of somewhat strange things as I did that. I don't know whether they're considered bugs or not, but I thought I had best report them.

To begin with, the act of removing the missing device from the filesystem itself caused the resynchronization to the "new" device to happen in blocking mode, so the "btrfs device delete missing" operation took about a day to finish. My expectation would have been that the device removal would be a fast operation and that I would then have to scrub the filesystem or something in order to resynchronize, but I can see how this would be intended behavior.
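For reference, the sequence described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the device names and mountpoint are hypothetical placeholders, not the ones from this system, the "btrfs device add" step is assumed from the mention of a "new" device, and the run/replace_missing helpers are made up for the sketch. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical placeholders; NOT the devices or mountpoint from this report.
DEV=${DEV:-/dev/sdX}        # a surviving member of the filesystem
NEWDEV=${NEWDEV:-/dev/sdY}  # the replacement ("new") device
MNT=${MNT:-/mnt}
DRY_RUN=${DRY_RUN:-1}       # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

replace_missing() {
    # A filesystem with a missing member has to be mounted degraded.
    run mount -o degraded "$DEV" "$MNT"
    # Assumed step: add the replacement device to the filesystem.
    run btrfs device add "$NEWDEV" "$MNT"
    # This is the step that blocked for about a day while data was
    # resynchronized onto the remaining devices.
    run btrfs device delete missing "$MNT"
}
```

Calling replace_missing with the default DRY_RUN=1 just prints the three commands; running them for real needs root and DRY_RUN=0.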

However, what's weirder is that while the resynchronization was underway, I couldn't mount subvolumes on other mountpoints. The mount commands blocked (disk-slept) until the entire synchronization was done, and I don't think this was intended behavior, because I had the kernel saying the following while it happened:

Feb 16 06:01:27 nerv kernel: [ 3482.512106] INFO: task mount:3525 blocked for more than 120 seconds.
Feb 16 06:01:28 nerv kernel: [ 3482.518484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 16 06:01:28 nerv kernel: [ 3482.526324] mount           D ffff88003e220e40     0  3525   3524 0x00000000
Feb 16 06:01:28 nerv kernel: [ 3482.533587]  ffff88003e220e40 0000000000000082 ffffffffa0067470 ffff88003e2300c0
Feb 16 06:01:28 nerv kernel: [ 3482.541088]  0000000000013b40 ffff88001126dfd8 0000000000013b40 ffff88001126dfd8
Feb 16 06:01:28 nerv kernel: [ 3482.548584]  0000000000013b40 ffff88003e220e40 0000000000013b40 ffff88001126c010
Feb 16 06:01:28 nerv kernel: [ 3482.556280] Call Trace:
Feb 16 06:01:28 nerv kernel: [ 3482.558776]  [<ffffffff81396132>] ? __mutex_lock_common+0x10d/0x175
Feb 16 06:01:28 nerv kernel: [ 3482.565078]  [<ffffffff81396260>] ? mutex_lock+0x1a/0x2c
Feb 16 06:01:28 nerv kernel: [ 3482.570661]  [<ffffffffa05a38c2>] ? btrfs_scan_one_device+0x40/0x133 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.577752]  [<ffffffffa0564e8b>] ? btrfs_mount+0x1c4/0x4d8 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.584080]  [<ffffffff810e56cb>] ? pcpu_next_pop+0x37/0x43
Feb 16 06:01:28 nerv kernel: [ 3482.589709]  [<ffffffff810e52c0>] ? cpumask_next+0x18/0x1a
Feb 16 06:01:28 nerv kernel: [ 3482.595226]  [<ffffffff811012aa>] ? alloc_pages_current+0xbb/0xd8
Feb 16 06:01:28 nerv kernel: [ 3482.601345]  [<ffffffff81113778>] ? mount_fs+0x6c/0x149
Feb 16 06:01:28 nerv kernel: [ 3482.606595]  [<ffffffff811291f7>] ? vfs_kern_mount+0x67/0xdd
Feb 16 06:01:28 nerv kernel: [ 3482.612292]  [<ffffffffa056516b>] ? btrfs_mount+0x4a4/0x4d8 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.618673]  [<ffffffff810e52c0>] ? cpumask_next+0x18/0x1a
Feb 16 06:01:28 nerv kernel: [ 3482.624178]  [<ffffffff811012aa>] ? alloc_pages_current+0xbb/0xd8
Feb 16 06:01:28 nerv kernel: [ 3482.630347]  [<ffffffff81113778>] ? mount_fs+0x6c/0x149
Feb 16 06:01:28 nerv kernel: [ 3482.635580]  [<ffffffff811291f7>] ? vfs_kern_mount+0x67/0xdd
Feb 16 06:01:28 nerv kernel: [ 3482.641258]  [<ffffffff811292e0>] ? do_kern_mount+0x49/0xd6
Feb 16 06:01:29 nerv kernel: [ 3482.646855]  [<ffffffff81129a98>] ? do_mount+0x72b/0x791
Feb 16 06:01:29 nerv kernel: [ 3482.652186]  [<ffffffff81129b86>] ? sys_mount+0x88/0xc3
Feb 16 06:01:29 nerv kernel: [ 3482.657464]  [<ffffffff8139d229>] ? system_call_fastpath+0x16/0x1b

Furthermore, it struck me that the consequences of having to mount a filesystem with missing devices with -o degraded can be a bit strange. I realize what the intention of the behavior is, of course, but I think it might cause quite some difficulty when trying to mount a degraded btrfs filesystem as root on a system that you don't have physical access to, like a hosted server, because it might be hard to manipulate the boot process so as to pass that mount flag to the initrd. Note that this is not a problem with md-raid; it will simply assemble its arrays in degraded mode automatically, without intervention. I'm not necessarily saying that's better, but I thought I should bring up the point.

--

Fredrik Tolf