Re: Rebalancing RAID1

Fredrik Tolf Fri, 22 Feb 2013 16:36:16 -0800

On Mon, 18 Feb 2013, Stefan Behrens wrote:
> On Fri, 15 Feb 2013 22:56:19 +0100 (CET), Fredrik Tolf wrote:
>> The oops cut can be found here:
>> <http://www.dolda2000.com/~fredrik/tmp/btrfs-oops>
>
> This scrub issue is fixed since Linux 3.8-rc1 with commit
> 4ded4f6 Btrfs: fix BUG() in scrub when first superblock reading gives EIO


I see, thanks!

Rebooting the system did get me running again, allowing me to remove the missing device from the filesystem. However, I encountered a couple of somewhat strange happenings as I did that. I don't know if they're considered bugs or not, but I thought I had best report them.

To begin with, the act of removing the missing device from the filesystem itself caused the resynchronization to the "new" device to happen in blocking mode, so the "btrfs device delete missing" operation took about a day to finish. My expectation would have been that the device removal would be a fast operation and that I would then have to scrub the filesystem or something in order to resynchronize, but I can see how this would be intended behavior.

However, what's weirder is that while the resynchronization was underway, I couldn't mount subvolumes on other mountpoints. The mount commands blocked (disk-slept) until the entire synchronization was done, and I don't think this was intended behavior, because I had the kernel saying the following while it happened:


Feb 16 06:01:27 nerv kernel: [ 3482.512106] INFO: task mount:3525 blocked for more than 120 seconds.
Feb 16 06:01:28 nerv kernel: [ 3482.518484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 16 06:01:28 nerv kernel: [ 3482.526324] mount           D ffff88003e220e40     0  3525   3524 0x00000000
Feb 16 06:01:28 nerv kernel: [ 3482.533587]  ffff88003e220e40 0000000000000082 ffffffffa0067470 ffff88003e2300c0
Feb 16 06:01:28 nerv kernel: [ 3482.541088]  0000000000013b40 ffff88001126dfd8 0000000000013b40 ffff88001126dfd8
Feb 16 06:01:28 nerv kernel: [ 3482.548584]  0000000000013b40 ffff88003e220e40 0000000000013b40 ffff88001126c010
Feb 16 06:01:28 nerv kernel: [ 3482.556280] Call Trace:
Feb 16 06:01:28 nerv kernel: [ 3482.558776]  [<ffffffff81396132>] ? __mutex_lock_common+0x10d/0x175
Feb 16 06:01:28 nerv kernel: [ 3482.565078]  [<ffffffff81396260>] ? mutex_lock+0x1a/0x2c
Feb 16 06:01:28 nerv kernel: [ 3482.570661]  [<ffffffffa05a38c2>] ? btrfs_scan_one_device+0x40/0x133 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.577752]  [<ffffffffa0564e8b>] ? btrfs_mount+0x1c4/0x4d8 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.584080]  [<ffffffff810e56cb>] ? pcpu_next_pop+0x37/0x43
Feb 16 06:01:28 nerv kernel: [ 3482.589709]  [<ffffffff810e52c0>] ? cpumask_next+0x18/0x1a
Feb 16 06:01:28 nerv kernel: [ 3482.595226]  [<ffffffff811012aa>] ? alloc_pages_current+0xbb/0xd8
Feb 16 06:01:28 nerv kernel: [ 3482.601345]  [<ffffffff81113778>] ? mount_fs+0x6c/0x149
Feb 16 06:01:28 nerv kernel: [ 3482.606595]  [<ffffffff811291f7>] ? vfs_kern_mount+0x67/0xdd
Feb 16 06:01:28 nerv kernel: [ 3482.612292]  [<ffffffffa056516b>] ? btrfs_mount+0x4a4/0x4d8 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.618673]  [<ffffffff810e52c0>] ? cpumask_next+0x18/0x1a
Feb 16 06:01:28 nerv kernel: [ 3482.624178]  [<ffffffff811012aa>] ? alloc_pages_current+0xbb/0xd8
Feb 16 06:01:28 nerv kernel: [ 3482.630347]  [<ffffffff81113778>] ? mount_fs+0x6c/0x149
Feb 16 06:01:28 nerv kernel: [ 3482.635580]  [<ffffffff811291f7>] ? vfs_kern_mount+0x67/0xdd
Feb 16 06:01:28 nerv kernel: [ 3482.641258]  [<ffffffff811292e0>] ? do_kern_mount+0x49/0xd6
Feb 16 06:01:29 nerv kernel: [ 3482.646855]  [<ffffffff81129a98>] ? do_mount+0x72b/0x791
Feb 16 06:01:29 nerv kernel: [ 3482.652186]  [<ffffffff81129b86>] ? sys_mount+0x88/0xc3
Feb 16 06:01:29 nerv kernel: [ 3482.657464]  [<ffffffff8139d229>] ? system_call_fastpath+0x16/0x1b

Furthermore, it struck me that the consequences of having to mount a filesystem with missing devices with -o degraded can be a bit strange. I realize what the intention of the behavior is, of course, but I think it might cause quite some difficulties when trying to mount a degraded btrfs filesystem as root on a system that you don't have physical access to, like a hosted server, because it might be hard to manipulate the boot process so as to pass that mount flag to the initrd. Note that this is not a problem with md-raid; it will simply assemble its arrays in degraded mode automatically, without intervention. I'm not necessarily saying that's better, but I thought I should bring up the point.
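
To make the boot-time concern concrete: one route for getting the flag to the root mount is the kernel's rootflags= boot parameter (for example rootflags=degraded). Whether a given initrd actually honours it is an assumption to verify for your distribution. The sketch below is only illustrative; the function name cmdline_has_degraded is made up for this example. It checks whether a kernel command line carries the degraded flag, which can be useful to confirm on a remote machine before rebooting.

```sh
#!/bin/sh
# Hypothetical helper (name invented for this sketch): succeed if the given
# kernel command line contains a rootflags= option that includes "degraded".
cmdline_has_degraded() {
    for chd_word in $1; do
        case "$chd_word" in
            rootflags=*)
                # Strip the "rootflags=" prefix, then look for "degraded"
                # as one whole element of the comma-separated flag list.
                chd_flags=${chd_word#rootflags=}
                case ",$chd_flags," in
                    *,degraded,*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}

# Report on the running kernel's command line, if it is readable.
if [ -r /proc/cmdline ]; then
    if cmdline_has_degraded "$(cat /proc/cmdline)"; then
        echo "rootflags includes degraded"
    else
        echo "rootflags does not include degraded"
    fi
fi
```

This only inspects the command line the current kernel was booted with; the boot loader configuration that will be used on the next boot has to be checked separately.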


--

Fredrik Tolf
