Re: RAID-1 - handling disk failures?

2014-03-28 Thread Duncan
Tomasz Chmielewski posted on Thu, 27 Mar 2014 21:52:15 +0100 as excerpted:

> Is btrfs supposed to handle disk failures in RAID-1 mode?
>
> It doesn't seem to be the case for me, with 3.14.0-rc8.
>
> Right now, the system doesn't see the faulty drive anymore (i.e. hdparm
> -i /dev/sdd is unable to give any info).
>
> Accesses to most files on btrfs filesystem just freeze (waiting for
> IO) the process which is accessing the data.
>
> The other drive in RAID-1, /dev/sdc, is healthy.

Well, btrfs raid1 mode does handle the loss of a (single) drive, but it 
behaves rather differently from the raid1 you may be used to from mdraid 
or the like.

1) (Not directly related to your problem, but it likely differs from 
other raid1 you've worked with...) Unlike normal raid1, btrfs' so-called 
raid1 mode is actually two-way-(only-)mirrored.  No matter how many 
devices there are in the filesystem, btrfs will only do two-way-mirroring 
of each chunk.  Thus, btrfs raid1 mode only tolerates loss of a single 
device without data loss, since once you lose two, both copies of some 
chunks will be gone and not recoverable, regardless of how many devices 
were in the raid1.

2) In btrfs, once you drop below the natural minimum number of devices to 
sustain that raid type, btrfs goes read-only as writes can no longer be 
done in the configured raid mode, which naturally blocks anything 
attempting to write to the filesystem.  I suspect that's what's happening 
to you.
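
Point 1 above (only two copies of each chunk, however many devices 
there are) can be illustrated with a small model.  This is only a 
sketch: allocate_chunks and lost_chunks are made-up helper names, and 
the round-robin placement is a simplification, not the allocator btrfs 
actually uses.

```python
from itertools import combinations


def allocate_chunks(num_devices, num_chunks):
    """Place each chunk's two copies on two different devices.

    Simplified: cycles through every possible device pair in turn.
    This is an illustrative assumption, not btrfs' real allocator.
    """
    pairs = list(combinations(range(num_devices), 2))
    return [pairs[i % len(pairs)] for i in range(num_chunks)]


def lost_chunks(chunks, failed):
    """Return indices of chunks with both copies on failed devices."""
    failed = set(failed)
    return [i for i, pair in enumerate(chunks) if set(pair) <= failed]


# Four devices, twelve chunks, two copies per chunk.
chunks = allocate_chunks(num_devices=4, num_chunks=12)

print(lost_chunks(chunks, failed=[2]))     # one device gone
print(lost_chunks(chunks, failed=[0, 1]))  # two devices gone
```

Losing any one device leaves at least one copy of every chunk, so the 
first call prints an empty list.  Losing two devices removes every 
chunk that happened to be mirrored on exactly that pair, even though 
two other devices are still healthy.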

With raid0 or raid1, the natural minimum operational number of devices is 
two.  With raid5, it's three.  With raid6 and raid10, it's four.  
(However, do note that raid5/6 support isn't complete yet.  Don't 
actually rely on it working as raid5/6 if something goes wrong, just yet.)

In your raid1 case, once you drop to a single device, writes can no 
longer be done to two mirrors, so the filesystem is forced read-only.  
Naturally that's going to hang any thread trying to do a write, leaving 
it in D (disk-sleep) state.  Once those hung writers plug the IO queue, 
reads will stall behind the writes, and anything trying to read from 
that filesystem will ultimately deadlock and hang as well.

OTOH, if you have more than the minimum number of devices, say you have 
three devices for raid1 mode, drop one device and writes can still be 
done in btrfs' normal two-way-mirrored raid1 write mode to the two 
remaining devices.  I'm not actually sure if it goes read-only when a 
device drops in this case or not, but if it does, you should be able to 
set it back to read/write mode and get on with things if you need to.
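
The rule described above can be written down as a tiny lookup.  This 
is a sketch of the rule as stated in this message only: the device 
counts are the ones quoted above, not values checked against any 
particular kernel, and MIN_DEVICES and can_write_in_profile are 
hypothetical names.

```python
# Minimum device counts exactly as quoted in the message above.
# Assumption: taken on trust from the post, not verified against a kernel.
MIN_DEVICES = {
    "raid0": 2,
    "raid1": 2,
    "raid5": 3,
    "raid6": 4,
    "raid10": 4,
}


def can_write_in_profile(profile, devices_present):
    """True if enough devices remain to write in the configured profile."""
    return devices_present >= MIN_DEVICES[profile]


# Two-device raid1 that lost one device: below the minimum.
print(can_write_in_profile("raid1", 1))
# Three-device raid1 that lost one device: still at the minimum.
print(can_write_in_profile("raid1", 2))
```

The first call gives False and the second True, which is the whole 
argument for building a raid1 out to three devices.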

Basically, that means that once you drop below two devices in raid1 
mode, btrfs will drop to read-only.  If it's your rootfs or the like, 
you're pretty well hosed and will be forced to reboot pretty quickly, 
although if you catch it quickly enough you can probably umount other 
filesystems, etc, that aren't on the dropped devices.  If it's just some 
auxiliary filesystem, you'll probably lose any processes working with it, 
but otherwise you should hopefully continue to stay in operation.

Mounting the still-degraded filesystem in degraded mode (with the 
degraded mount option) after a shutdown or other full filesystem unmount 
will result in the same forced-read-only situation.  The difference is 
that since the filesystem was never writable in the first place, nothing 
should have been able to open files on it in write mode.  So you should 
be able to get it back workable enough at least to do a btrfs device add, 
bringing it back to the minimum two devices again, after which you should 
be able to remount it writable.  With it mounted writable again, you 
should be able to do a btrfs device delete missing to remove the bad 
device, followed by a rebalance to create a new second mirror of all 
chunks where one mirror was on the missing device.
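
The recovery order described above can be laid out as a command 
sequence.  The sketch below only prints the commands and never runs 
them.  The partition /dev/sdc1, the replacement /dev/sde1 and the mount 
point /mnt/backup are placeholders, and recovery_commands is a made-up 
helper name.  Whether each step succeeds on a given kernel is not 
something this sketch can promise.

```python
import shlex


def recovery_commands(surviving_dev, new_dev, mountpoint):
    """Build the command sequence in the order described above.

    Nothing is executed; the caller decides what to do with the list.
    """
    return [
        # 1. Mount the remaining device with the degraded option.
        ["mount", "-o", "degraded", surviving_dev, mountpoint],
        # 2. Add a replacement to get back to two devices.
        ["btrfs", "device", "add", new_dev, mountpoint],
        # 3. Remount writable.
        ["mount", "-o", "remount,rw", mountpoint],
        # 4. Drop the record of the missing device.
        ["btrfs", "device", "delete", "missing", mountpoint],
        # 5. Rebalance so every chunk has two mirrors again.
        ["btrfs", "balance", "start", mountpoint],
    ]


# Placeholder device names and mount point (assumptions, see above).
for cmd in recovery_commands("/dev/sdc1", "/dev/sde1", "/mnt/backup"):
    print(shlex.join(cmd))
```

Printing rather than executing is deliberate: on a filesystem in this 
state each step is worth checking by hand before moving to the next.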

Basically, all this means that in order to keep a btrfs raid1 fully 
usable without rebooting in the event of a dropped device, you'll need 
to build it out to three devices, so you can drop one and still have 
enough devices left to continue writing to a full pair of devices in 
two-way-mirroring.

Depending upon your use-case, the drop to read-only and a potentially 
forced reboot may or may not be acceptable, as long as the data's still 
there and accessible after the reboot, to copy elsewhere or whatever.  
If it's not acceptable, then as mentioned, do plan on making it three 
devices in normal mode, so the filesystem can continue writing in 
so-called raid1 mode to the two remaining devices if one drops.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: RAID-1 - handling disk failures?

2014-03-28 Thread Tomasz Chmielewski
> 2) In btrfs, once you drop below the natural minimum number of devices
> to sustain that raid type, btrfs goes read-only as writes can no
> longer be done in the configured raid mode, which naturally blocks
> anything attempting to write to the filesystem.  I suspect that's
> what's happening to you.

No, it never went into read-only mode.
If it had, I would see:

# touch testfile
touch: cannot touch `testfile': Read-only file system

rather than a process stuck waiting for IO.
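
The read-only flag can also be checked without writing anything, by 
looking at the mount options.  A minimal sketch, assuming the usual 
/proc/mounts layout (device, mount point, type, options, two numbers). 
The sample text, the mount point /mnt/backup and the helper name 
is_read_only are made up for illustration.

```python
def is_read_only(mounts_text, mountpoint):
    """Return True/False for the mount's ro flag, None if not mounted."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            return "ro" in fields[3].split(",")
    return None


# Hypothetical sample in /proc/mounts format, not taken from this system.
SAMPLE = (
    "/dev/sdc1 /mnt/backup btrfs rw,noatime,degraded 0 0\n"
    "/dev/sda1 / ext4 ro,relatime 0 0\n"
)

print(is_read_only(SAMPLE, "/mnt/backup"))
print(is_read_only(SAMPLE, "/"))
```

On a live system the text would come from reading /proc/mounts instead 
of SAMPLE.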

Anyway, the RAID-1 filesystem now looks hosed after a drive failed in
it, and the btrfs filesystem hung when I added a new device.

I'm now getting these kernel oopses when trying to write anything there:

[  553.040075] BUG: unable to handle kernel NULL pointer dereference at 
0098
[  553.040264] IP: [8111f33b] bio_get_nr_vecs+0x0/0x38
[  553.040378] PGD 0 
[  553.040484] Oops:  [#1] SMP 
[  553.040622] Modules linked in: cpufreq_ondemand cpufreq_conservative 
cpufreq_powersave cpufreq_stats bridge stp llc ipv6 btrfs xor raid6_pq 
zlib_deflate loop i2c_i801 parport_pc i2c_core parport tpm_infineon tpm_tis 
video ehci_pci pcspkr ehci_hcd lpc_ich mfd_core acpi_cpufreq button battery tpm 
ext4 crc16 jbd2 mbcache raid1 sg sd_mod ahci libahci libata scsi_mod r8169 mii
[  553.042270] CPU: 1 PID: 4951 Comm: btrfs-delalloc- Not tainted 3.14.0-rc8 #1
[  553.042351] Hardware name: System manufacturer System Product Name/P8H77-M 
PRO, BIOS 1101 02/04/2013
[  553.042474] task: 8807f3f98000 ti: 8807ebc42000 task.ti: 
8807ebc42000
[  553.042594] RIP: 0010:[8111f33b]  [8111f33b] 
bio_get_nr_vecs+0x0/0x38
[  553.042749] RSP: 0018:8807ebc43af0  EFLAGS: 00010246
[  553.042828] RAX: 0100 RBX: 1000 RCX: 000214919ca0
[  553.042909] RDX: ea001f4ccc00 RSI: 8807ff148430 RDI: 
[  553.042990] RBP: 8807ebc43b48 R08: 1000 R09: 
[  553.043071] R10:  R11: 00014a98 R12: 8807ebc43c78
[  553.043151] R13:  R14: 000214919ca0 R15: 8807ff148430
[  553.043233] FS:  () GS:88081fa4() 
knlGS:
[  553.043354] CS:  0010 DS:  ES:  CR0: 80050033
[  553.043433] CR2: 0098 CR3: 0160b000 CR4: 001407e0
[  553.043513] Stack:
[  553.043587]  a02e3b08 0010ebc43b28  
ea001f4ccc00
[  553.043835]  0411 8807ebc43b28 ea001f4ccc00 

[  553.044082]  0001 8807ff148430 8807ff1485a8 
8807ebc43c58
[  553.044330] Call Trace:
[  553.044419]  [a02e3b08] ? submit_extent_page.isra.38+0x10c/0x17e 
[btrfs]
[  553.044551]  [a02e535d] __extent_writepage+0x542/0x5d2 [btrfs]
[  553.044643]  [a02e389a] ? end_extent_writepage+0x5c/0x5c [btrfs]
[  553.044734]  [a02e58c6] extent_write_locked_range+0xbf/0x124 
[btrfs]
[  553.044865]  [a02cec56] ? btrfs_fiemap+0x4c/0x4c [btrfs]
[  553.044954]  [a02d2349] submit_compressed_extents+0x133/0x424 
[btrfs]
[  553.045084]  [a02d26bd] async_cow_submit+0x83/0x88 [btrfs]
[  553.045174]  [a02f0fcc] run_ordered_completions+0x68/0xc5 [btrfs]
[  553.045264]  [a02f1659] worker_loop+0x16e/0x495 [btrfs]
[  553.045353]  [a02f14eb] ? btrfs_queue_worker+0x269/0x269 [btrfs]
[  553.045435]  [81050c92] kthread+0xcd/0xd5
[  553.045516]  [81050bc5] ? kthread_freezable_should_stop+0x43/0x43
[  553.045598]  [8139a03c] ret_from_fork+0x7c/0xb0
[  553.045678]  [81050bc5] ? kthread_freezable_should_stop+0x43/0x43
[  553.045758] Code: c4 b8 f1 ff 48 83 c8 ff 41 59 5b 5d c3 90 90 90 55 48 89 
e5 53 48 89 f3 51 f6 46 10 08 75 05 e8 e6 62 07 00 8b 43 38 5a 5b 5d c3 48 8b 
87 98 00 00 00 55 b9 00 01 00 00 48 89 e5 48 8b 90 80 02 
[  553.048083] RIP  [8111f33b] bio_get_nr_vecs+0x0/0x38
[  553.048196]  RSP 8807ebc43af0
[  553.048272] CR2: 0098
[  553.048349] ---[ end trace 36d74486b120a453 ]---
[  581.331680] BUG: unable to handle kernel NULL pointer dereference at 
0098
[  581.331867] IP: [8111f33b] bio_get_nr_vecs+0x0/0x38
[  581.331981] PGD 0 
[  581.332087] Oops:  [#2] SMP 
[  581.332227] Modules linked in: cpufreq_ondemand cpufreq_conservative 
cpufreq_powersave cpufreq_stats bridge stp llc ipv6 btrfs xor raid6_pq 
zlib_deflate loop i2c_i801 parport_pc i2c_core parport tpm_infineon tpm_tis 
video ehci_pci pcspkr ehci_hcd lpc_ich mfd_core acpi_cpufreq button battery tpm 
ext4 crc16 jbd2 mbcache raid1 sg sd_mod ahci libahci libata scsi_mod r8169 mii
[  581.333870] CPU: 3 PID: 5025 Comm: btrfs-transacti Tainted: G  D  
3.14.0-rc8 #1
[  581.333989] Hardware name: System manufacturer System Product Name/P8H77-M 
PRO, BIOS 1101 02/04/2013
[  581.334109] task: 8807f3e3 ti: 8807e770a000 task.ti: 
8807e770a000
[  581.334226] RIP: 0010:[8111f33b]  [8111f33b] 

RAID-1 - handling disk failures?

2014-03-27 Thread Tomasz Chmielewski
Is btrfs supposed to handle disk failures in RAID-1 mode?

It doesn't seem to be the case for me, with 3.14.0-rc8.

Right now, the system doesn't see the faulty drive anymore (i.e. hdparm -i 
/dev/sdd is unable to give any info).

Accesses to most files on btrfs filesystem just freeze (waiting for IO) the 
process which is accessing the data.

The other drive in RAID-1, /dev/sdc, is healthy.

# grep -i btrfs syslog
Mar 27 09:57:59 bkp010 kernel: [157256.352840] BTRFS: bdev /dev/sdd1 errs: wr 
31, rd 1, flush 0, corrupt 0, gen 0
Mar 27 09:57:59 bkp010 kernel: [157256.353334] BTRFS: bdev /dev/sdd1 errs: wr 
32, rd 1, flush 0, corrupt 0, gen 0
Mar 27 09:57:59 bkp010 kernel: [157256.353816] BTRFS: bdev /dev/sdd1 errs: wr 
33, rd 1, flush 0, corrupt 0, gen 0
Mar 27 09:57:59 bkp010 kernel: [157256.354338] BTRFS: bdev /dev/sdd1 errs: wr 
34, rd 1, flush 0, corrupt 0, gen 0
Mar 27 09:57:59 bkp010 kernel: [157256.354826] BTRFS: bdev /dev/sdd1 errs: wr 
35, rd 1, flush 0, corrupt 0, gen 0
Mar 27 09:57:59 bkp010 kernel: [157256.355314] BTRFS: bdev /dev/sdd1 errs: wr 
36, rd 1, flush 0, corrupt 0, gen 0
Mar 27 09:57:59 bkp010 kernel: [157256.355810] BTRFS: bdev /dev/sdd1 errs: wr 
37, rd 1, flush 0, corrupt 0, gen 0
Mar 27 09:57:59 bkp010 kernel: [157256.356302] BTRFS: bdev /dev/sdd1 errs: wr 
38, rd 1, flush 0, corrupt 0, gen 0
Mar 27 09:57:59 bkp010 kernel: [157256.356790] BTRFS: bdev /dev/sdd1 errs: wr 
39, rd 1, flush 0, corrupt 0, gen 0
Mar 27 09:57:59 bkp010 kernel: [157256.357275] BTRFS: bdev /dev/sdd1 errs: wr 
40, rd 1, flush 0, corrupt 0, gen 0
Mar 27 09:58:02 bkp010 kernel: [157259.298965] BTRFS: lost page write due to 
I/O error on /dev/sdd1
Mar 27 09:58:02 bkp010 kernel: [157259.299309] BTRFS: lost page write due to 
I/O error on /dev/sdd1
Mar 27 09:58:02 bkp010 kernel: [157259.299637] BTRFS: lost page write due to 
I/O error on /dev/sdd1
Mar 27 09:58:04 bkp010 kernel: [157261.358796] btrfs_dev_stat_print_on_error: 
9038 callbacks suppressed
Mar 27 09:58:04 bkp010 kernel: [157261.358844] BTRFS: bdev /dev/sdd1 errs: wr 
9007, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:04 bkp010 kernel: [157261.359215] BTRFS: bdev /dev/sdd1 errs: wr 
9008, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:04 bkp010 kernel: [157261.359585] BTRFS: bdev /dev/sdd1 errs: wr 
9009, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:04 bkp010 kernel: [157261.359954] BTRFS: bdev /dev/sdd1 errs: wr 
9010, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:04 bkp010 kernel: [157261.360323] BTRFS: bdev /dev/sdd1 errs: wr 
9011, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:04 bkp010 kernel: [157261.360693] BTRFS: bdev /dev/sdd1 errs: wr 
9012, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:04 bkp010 kernel: [157261.361063] BTRFS: bdev /dev/sdd1 errs: wr 
9013, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:04 bkp010 kernel: [157261.361433] BTRFS: bdev /dev/sdd1 errs: wr 
9014, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:04 bkp010 kernel: [157261.361802] BTRFS: bdev /dev/sdd1 errs: wr 
9015, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:04 bkp010 kernel: [157261.362172] BTRFS: bdev /dev/sdd1 errs: wr 
9016, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:09 bkp010 kernel: [157266.046550] BTRFS: lost page write due to 
I/O error on /dev/sdd1
Mar 27 09:58:09 bkp010 kernel: [157266.046931] BTRFS: lost page write due to 
I/O error on /dev/sdd1
Mar 27 09:58:09 bkp010 kernel: [157266.047307] BTRFS: lost page write due to 
I/O error on /dev/sdd1
Mar 27 09:58:09 bkp010 kernel: [157266.427724] btrfs_dev_stat_print_on_error: 
13860 callbacks suppressed
Mar 27 09:58:09 bkp010 kernel: [157266.427788] BTRFS: bdev /dev/sdd1 errs: wr 
22877, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:09 bkp010 kernel: [157266.428288] BTRFS: bdev /dev/sdd1 errs: wr 
22878, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:09 bkp010 kernel: [157266.431504] BTRFS: bdev /dev/sdd1 errs: wr 
22879, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:09 bkp010 kernel: [157266.432047] BTRFS: bdev /dev/sdd1 errs: wr 
22880, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:09 bkp010 kernel: [157266.499055] BTRFS: bdev /dev/sdd1 errs: wr 
22881, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:09 bkp010 kernel: [157266.499453] BTRFS: bdev /dev/sdd1 errs: wr 
22882, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:09 bkp010 kernel: [157266.499847] BTRFS: bdev /dev/sdd1 errs: wr 
22883, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:09 bkp010 kernel: [157266.500238] BTRFS: bdev /dev/sdd1 errs: wr 
22884, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:09 bkp010 kernel: [157266.500625] BTRFS: bdev /dev/sdd1 errs: wr 
22885, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:09 bkp010 kernel: [157266.501692] BTRFS: bdev /dev/sdd1 errs: wr 
22886, rd 73, flush 0, corrupt 0, gen 0
Mar 27 09:58:10 bkp010 kernel: [157267.726185] BTRFS: lost page write due to 
I/O error on /dev/sdd1
Mar 27 09:58:10 bkp010 kernel: [157267.726472] BTRFS: lost page write due to 
I/O error on /dev/sdd1
Mar 27 09:58:10 bkp010 kernel: