Re: Please help with exact actions for raid1 hot-swap
On 2017-09-11 17:33, Duncan wrote:
> Austin S. Hemmelgarn posted on Mon, 11 Sep 2017 11:11:01 -0400 as excerpted:
>> On 2017-09-11 09:16, Marat Khalili wrote:
>>> Patrik, Duncan, thank you for the help. The `btrfs replace start /dev/sdb7 /dev/sdd7 /mnt/data` worked without a hitch (though I didn't try to reboot yet, still have grub/efi/several mdadm partitions to copy).
>>>
>>> Does this mean:
>>> * I should not be afraid to reboot and find /dev/sdb7 mounted again?
>>> * I will not be able to easily mount /dev/sdb7 on a different computer to do some tests?
>>
>> This depends. I don't remember if the replace command wipes the super-block on the old device after the replace completes or not.
>
> AFAIK it does.

Based on checking after I sent my reply, it does.

>> If it does not, then you can't safely mount the filesystem while that device is still in the system, but can transfer it to another system and mount it degraded (probably, not a certainty).
>
> It's worth noting that while this shouldn't be a problem here (because the magic should be gone), the problem does appear in other contexts. In particular, any context that does device duplication is a problem.
>
> This means dd-ing the content of a device to another device is a problem, because once btrfs device scan is triggered (and udev can trigger it automatically/unexpectedly), btrfs will see the second device and consider it part of the same filesystem as the first, causing problems if either one is mounted.
>
> dd-ing to a file tends to be less of a problem, because it's just a file until activated as a loopback device, and that doesn't tend to happen automatically.
>
> Similarly, lvm's device mirroring modes can be problematic, with udev again sometimes unexpectedly triggering btrfs device scan on device appearance, unless measures are taken to hide the new device.
>
> I tried lvm some time ago and decided I didn't find it useful for my own use-cases, so I don't know the details here. In particular, I'm not sure of the device hiding options, but there have certainly been threads on the list discussing the problem, and the option to hide the device to prevent it came up in one of them.

Based on my own experience, LVM works fine as of right now provided you use the standard LVM udev rules (which disable almost all udev processing on LVM internal devices). In fact, the only issues I've had in the past with BTRFS on LVM were related to dm-cache not properly hiding the backing device originally, and some generic stability issues early on with BTRFS on top of dm-thinp.

>> If it does, then you can safely keep the device in the system, but won't be able to move it to another computer and get data off of it.
>
> This should be the case. Tho it may be as simple as restoring the btrfs magic in the superblock to restore it to mountability, but I believe the replace process deletes chunks as they are transferred, so actually getting data off it may be more complicated than simply making it mountable again.
>
>> Regardless of which is the case, you won't see /dev/sdb7 mounted as a separate filesystem when you reboot.
>
>>> Also, although /dev/sdd7 is much larger than /dev/sdb7 was, `btrfs fi show` still displays it as 2.71TiB, why?
>
>> `btrfs replace` is functionally equivalent to using dd to copy the contents of the device being replaced to the new device, albeit a bit smarter (as mentioned above). This means in particular that it does not resize the filesystem (although I think I saw some discussion and possibly patches to handle that with a command-line option).
>
> This is documented. From the btrfs-replace manpage (from btrfs-progs 4.12, reformatted a bit here for posting):
>
> >>
> The <targetdev> needs to be same size or larger than the <srcdev>.
>
> Note: The filesystem has to be resized to fully take advantage of a larger target device, this can be achieved with btrfs filesystem resize <devid>:max /path
> <<

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
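The resize step from the manpage note quoted in this message can be sketched as follows. It assumes the numbers from Marat's `btrfs fi show` output, where /dev/sdd7 is devid 2 and the filesystem is mounted at /mnt/data. The `run` helper, the `grow_device` function and the `DRY_RUN` variable are hypothetical names, added only so that the sketch prints the command instead of executing it unless told otherwise.

```shell
#!/bin/sh
# Sketch only: grow one btrfs member device to fill its (larger) partition
# after `btrfs replace`. Prints the command unless DRY_RUN=0 is set.

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# grow_device DEVID MOUNTPOINT
# DEVID is the number `btrfs filesystem show` prints for the new device.
grow_device() {
    run btrfs filesystem resize "$1:max" "$2"
}

# devid 2 is /dev/sdd7 in the output quoted in this thread.
grow_device 2 /mnt/data
```

Run as root with DRY_RUN=0 to actually resize; `btrfs fi show` should then report the full partition size for that devid.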
Re: Please help with exact actions for raid1 hot-swap
Austin S. Hemmelgarn posted on Mon, 11 Sep 2017 11:11:01 -0400 as excerpted:

> On 2017-09-11 09:16, Marat Khalili wrote:
>> Patrik, Duncan, thank you for the help. The `btrfs replace start
>> /dev/sdb7 /dev/sdd7 /mnt/data` worked without a hitch (though I didn't
>> try to reboot yet, still have grub/efi/several mdadm partitions to
>> copy).
>>
>> Does this mean:
>> * I should not be afraid to reboot and find /dev/sdb7 mounted again?
>> * I will not be able to easily mount /dev/sdb7 on a different computer
>> to do some tests?
>
> This depends. I don't remember if the replace command wipes the
> super-block on the old device after the replace completes or not.

AFAIK it does.

> If it does not, then you can't safely mount the filesystem while that
> device is still in the system, but can transfer it to another system
> and mount it degraded (probably, not a certainty).

It's worth noting that while this shouldn't be a problem here (because the magic should be gone), the problem does appear in other contexts. In particular, any context that does device duplication is a problem.

This means dd-ing the content of a device to another device is a problem, because once btrfs device scan is triggered (and udev can trigger it automatically/unexpectedly), btrfs will see the second device and consider it part of the same filesystem as the first, causing problems if either one is mounted.

dd-ing to a file tends to be less of a problem, because it's just a file until activated as a loopback device, and that doesn't tend to happen automatically.

Similarly, lvm's device mirroring modes can be problematic, with udev again sometimes unexpectedly triggering btrfs device scan on device appearance, unless measures are taken to hide the new device.

I tried lvm some time ago and decided I didn't find it useful for my own use-cases, so I don't know the details here. In particular, I'm not sure of the device hiding options, but there have certainly been threads on the list discussing the problem, and the option to hide the device to prevent it came up in one of them.

> If it does, then you can safely keep the device in the system, but
> won't be able to move it to another computer and get data off of it.

This should be the case. Tho it may be as simple as restoring the btrfs magic in the superblock to restore it to mountability, but I believe the replace process deletes chunks as they are transferred, so actually getting data off it may be more complicated than simply making it mountable again.

> Regardless of which is the case, you won't see /dev/sdb7 mounted as a
> separate filesystem when you reboot.

>> Also, although /dev/sdd7 is much larger than /dev/sdb7 was, `btrfs fi
>> show` still displays it as 2.71TiB, why?

> `btrfs replace` is functionally equivalent to using dd to copy the
> contents of the device being replaced to the new device, albeit a bit
> smarter (as mentioned above). This means in particular that it does not
> resize the filesystem (although I think I saw some discussion and
> possibly patches to handle that with a command-line option).

This is documented. From the btrfs-replace manpage (from btrfs-progs 4.12, reformatted a bit here for posting):

>>
The <targetdev> needs to be same size or larger than the <srcdev>.

Note: The filesystem has to be resized to fully take advantage of a larger target device, this can be achieved with btrfs filesystem resize <devid>:max /path
<<

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
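The duplicated-device hazard described in this message can be checked for before mounting. The sketch below is one possible approach, not something proposed in the thread: it scans `blkid`-style lines for btrfs members that share the same UUID_SUB, which is what a dd clone of a member device would show. The function name `find_duplicate_subuuids` and the /dev/sde7 clone in the sample input are hypothetical; the other two sample lines are shortened from the blkid output posted in this thread.

```shell
#!/bin/sh
# Sketch only: report btrfs device UUIDs (UUID_SUB) that appear on more than
# one block device. Reads `blkid`-style lines on stdin.

find_duplicate_subuuids() {
    grep 'TYPE="btrfs"' |
        sed -n 's/.* UUID_SUB="\([^"]*\)".*/\1/p' |   # keep only the UUID_SUB value
        sort |
        uniq -d                                       # print values seen twice or more
}

# Demo input: /dev/sde7 is a hypothetical dd clone of /dev/sda7.
find_duplicate_subuuids <<'EOF'
/dev/sda7: LABEL="data" UUID="37d3313a-e2ad-4b7f-98fc-a01d815952e0" UUID_SUB="db644855-2334-4d61-a27b-9a591255aa39" TYPE="btrfs"
/dev/sdd7: LABEL="data" UUID="37d3313a-e2ad-4b7f-98fc-a01d815952e0" UUID_SUB="9c2f05e9-5996-479f-89ad-f94f7ce130e6" TYPE="btrfs"
/dev/sde7: LABEL="data" UUID="37d3313a-e2ad-4b7f-98fc-a01d815952e0" UUID_SUB="db644855-2334-4d61-a27b-9a591255aa39" TYPE="btrfs"
EOF
```

On a live system the input would come from `blkid` run as root (`blkid | find_duplicate_subuuids`); any output means two devices claim to be the same filesystem member.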
Re: Please help with exact actions for raid1 hot-swap
On 2017-09-11 09:16, Marat Khalili wrote:
> Patrik, Duncan, thank you for the help. The `btrfs replace start /dev/sdb7 /dev/sdd7 /mnt/data` worked without a hitch (though I didn't try to reboot yet, still have grub/efi/several mdadm partitions to copy). It also worked much faster than mdadm would take, apparently only moving 126GB used, not 2.71TB total.

This is why replace is preferred over add/remove. The replace operation only copies exactly the data that is needed off of the old device, instead of copying the whole device like LVM and MD need to, or rewriting the whole filesystem (like add/remove does).

For what it's worth, if you can't use replace for some reason and have to use add and remove, it is more efficient to add the new device and then remove the old one, because it will require less data movement to get a properly balanced filesystem (removing a device is actually a balance operation that prevents writes to the device being removed).

> Interestingly, according to HDD lights it mostly read from the remaining /dev/sda, not from replaced /dev/sdb (which must be completely readable now according to smartctl -- problematic sector got finally remapped after ~1day).

This is odd. I was under the impression that replace preferentially reads from the device being replaced unless you tell it to avoid reading from said device.

> It now looks like follows:
>
> $ sudo blkid /dev/sda7 /dev/sdb7 /dev/sdd7
> /dev/sda7: LABEL="data" UUID="37d3313a-e2ad-4b7f-98fc-a01d815952e0" UUID_SUB="db644855-2334-4d61-a27b-9a591255aa39" TYPE="btrfs" PARTUUID="c5ceab7e-e5f8-47c8-b922-c5fa0678831f"
> /dev/sdb7: PARTUUID="493923cd-9ecb-4ee8-988b-5d0bfa8991b3"
> /dev/sdd7: LABEL="data" UUID="37d3313a-e2ad-4b7f-98fc-a01d815952e0" UUID_SUB="9c2f05e9-5996-479f-89ad-f94f7ce130e6" TYPE="btrfs" PARTUUID="178cd274-7251-4d25-9116-ce0732d2410b"
> $ sudo btrfs fi show /dev/sdb7
> ERROR: no btrfs on /dev/sdb7
> $ sudo btrfs fi show /dev/sdd7
> Label: 'data' uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0
>     Total devices 2 FS bytes used 108.05GiB
>     devid 1 size 2.71TiB used 131.03GiB path /dev/sda7
>     devid 2 size 2.71TiB used 131.03GiB path /dev/sdd7
>
> Does this mean:
> * I should not be afraid to reboot and find /dev/sdb7 mounted again?
> * I will not be able to easily mount /dev/sdb7 on a different computer to do some tests?

This depends. I don't remember if the replace command wipes the super-block on the old device after the replace completes or not. If it does not, then you can't safely mount the filesystem while that device is still in the system, but can transfer it to another system and mount it degraded (probably, not a certainty). If it does, then you can safely keep the device in the system, but won't be able to move it to another computer and get data off of it. Regardless of which is the case, you won't see /dev/sdb7 mounted as a separate filesystem when you reboot.

> Also, although /dev/sdd7 is much larger than /dev/sdb7 was, `btrfs fi show` still displays it as 2.71TiB, why?

`btrfs replace` is functionally equivalent to using dd to copy the contents of the device being replaced to the new device, albeit a bit smarter (as mentioned above). This means in particular that it does not resize the filesystem (although I think I saw some discussion and possibly patches to handle that with a command-line option).
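The add-then-remove fallback mentioned in this reply, for cases where replace cannot be used, might look like the sketch below. The ordering (add the new device first, then remove the old one) is the one recommended above; the device names are those from this thread. The `run` helper, the `swap_by_add_remove` function and the `DRY_RUN` variable are hypothetical, and by default the sketch only prints the commands.

```shell
#!/bin/sh
# Sketch only: swap a btrfs member without `btrfs replace`, by adding the new
# device first and then removing the old one. Prints commands unless DRY_RUN=0.

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# swap_by_add_remove OLD_DEV NEW_DEV MOUNTPOINT
swap_by_add_remove() {
    old=$1 new=$2 mnt=$3
    # Add first, so the removal can migrate chunks straight onto the new device.
    run btrfs device add "$new" "$mnt"
    # Removal relocates everything off the old device before dropping it.
    run btrfs device remove "$old" "$mnt"
}

swap_by_add_remove /dev/sdb7 /dev/sdd7 /mnt/data
```

Unlike replace, the remove step rewrites data through the balance machinery, so it can be expected to take longer for the same amount of used space.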
Re: Please help with exact actions for raid1 hot-swap
Patrik, Duncan, thank you for the help. The `btrfs replace start /dev/sdb7 /dev/sdd7 /mnt/data` worked without a hitch (though I didn't try to reboot yet, still have grub/efi/several mdadm partitions to copy). It also worked much faster than mdadm would take, apparently only moving 126GB used, not 2.71TB total. Interestingly, according to HDD lights it mostly read from the remaining /dev/sda, not from replaced /dev/sdb (which must be completely readable now according to smartctl -- problematic sector got finally remapped after ~1day).

It now looks like follows:

$ sudo blkid /dev/sda7 /dev/sdb7 /dev/sdd7
/dev/sda7: LABEL="data" UUID="37d3313a-e2ad-4b7f-98fc-a01d815952e0" UUID_SUB="db644855-2334-4d61-a27b-9a591255aa39" TYPE="btrfs" PARTUUID="c5ceab7e-e5f8-47c8-b922-c5fa0678831f"
/dev/sdb7: PARTUUID="493923cd-9ecb-4ee8-988b-5d0bfa8991b3"
/dev/sdd7: LABEL="data" UUID="37d3313a-e2ad-4b7f-98fc-a01d815952e0" UUID_SUB="9c2f05e9-5996-479f-89ad-f94f7ce130e6" TYPE="btrfs" PARTUUID="178cd274-7251-4d25-9116-ce0732d2410b"
$ sudo btrfs fi show /dev/sdb7
ERROR: no btrfs on /dev/sdb7
$ sudo btrfs fi show /dev/sdd7
Label: 'data' uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0
    Total devices 2 FS bytes used 108.05GiB
    devid 1 size 2.71TiB used 131.03GiB path /dev/sda7
    devid 2 size 2.71TiB used 131.03GiB path /dev/sdd7

Does this mean:
* I should not be afraid to reboot and find /dev/sdb7 mounted again?
* I will not be able to easily mount /dev/sdb7 on a different computer to do some tests?

Also, although /dev/sdd7 is much larger than /dev/sdb7 was, `btrfs fi show` still displays it as 2.71TiB, why?

--

With Best Regards,
Marat Khalili
Re: Please help with exact actions for raid1 hot-swap
On 2017-09-10 02:33, Marat Khalili wrote:
> It doesn't need replaced disk to be readable, right? Then what prevents same procedure to work without a spare bay?

In theory, nothing. In practice, there are reliability issues with mounting a filesystem degraded (and you should be avoiding running any array degraded, regardless of if it's BTRFS or actual RAID (be that LVM, MD, or hardware)). It's also significantly faster to do it with a spare drive bay because that will just read from the device being replaced and copy data directly, while pulling the device to be replaced requires rebuilding the data (there is more involved than just copying, even with a raid1 profile).
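With a spare bay, the in-place replacement discussed in this thread might be scripted roughly as below. The device names are the ones used in the thread (/dev/sdb7 being replaced, /dev/sdd7 as the new partition, /mnt/data as the mount point). The `run` helper, the `replace_in_place` function, the `avoid-src` keyword and the `DRY_RUN` variable are hypothetical. Passing `-r`, which makes replace read from the source device only when no other good mirror exists, is an optional assumption for a disk that is already returning read errors; nobody in the thread used it.

```shell
#!/bin/sh
# Sketch only: replace a btrfs member while both old and new devices are
# attached. Prints commands unless DRY_RUN=0 is set.

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# replace_in_place SRC_DEV NEW_DEV MOUNTPOINT [avoid-src]
replace_in_place() {
    src=$1 dst=$2 mnt=$3
    if [ "${4:-}" = "avoid-src" ]; then
        # -r: prefer the healthy mirror, fall back to the old device only if needed.
        run btrfs replace start -r "$src" "$dst" "$mnt"
    else
        run btrfs replace start "$src" "$dst" "$mnt"
    fi
    # -1: print the progress once instead of polling until completion.
    run btrfs replace status -1 "$mnt"
}

replace_in_place /dev/sdb7 /dev/sdd7 /mnt/data
```

The old disk should stay attached until `btrfs replace status` reports that the operation has finished.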
Re: Please help with exact actions for raid1 hot-swap
On 10 September 2017 at 08:33, Marat Khalili wrote:
> It doesn't need replaced disk to be readable, right?

Only enough to be mountable, which it already is, so your read errors on /dev/sdb aren't a problem.

> Then what prevents same procedure to work without a spare bay?

It is basically the same procedure but with a bunch of gotchas due to bugs and odd behaviour. Only having one shot at it, before it can only be mounted read-only, is especially problematic (will be fixed in Linux 4.14).
Re: Please help with exact actions for raid1 hot-swap
It doesn't need replaced disk to be readable, right? Then what prevents same procedure to work without a spare bay?

--

With Best Regards,
Marat Khalili

On September 9, 2017 1:29:08 PM GMT+03:00, Patrik Lundquist wrote:
>On 9 September 2017 at 12:05, Marat Khalili wrote:
>> Forgot to add, I've got a spare empty bay if it can be useful here.
>
>That makes it much easier since you don't have to mount it degraded,
>with the risks involved.
>
>Add and partition the disk.
>
># btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data
>
>Remove the old disk when it is done.
Re: Please help with exact actions for raid1 hot-swap
Patrik Lundquist posted on Sat, 09 Sep 2017 12:29:08 +0200 as excerpted:

> On 9 September 2017 at 12:05, Marat Khalili wrote:
>> Forgot to add, I've got a spare empty bay if it can be useful here.
>
> That makes it much easier since you don't have to mount it degraded,
> with the risks involved.
>
> Add and partition the disk.
>
> # btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data
>
> Remove the old disk when it is done.

I did this with my dozen-plus (but small) btrfs raid1s on ssd partitions several kernel cycles ago. It went very smoothly. =:^)

(TL;DR can stop there.)

I had actually been taking advantage of btrfs raid1's checksumming and scrub ability to continue running a failing ssd, with more and more sectors going bad and being replaced from spares, for quite some time after I'd have otherwise replaced it. Everything of value was backed up, and I was simply doing it for the experience with both btrfs raid1 scrubbing and continuing ssd sector failure. But eventually the scrubs were finding and fixing errors every boot, especially when off for several hours, and further experience was of diminishing value while the hassle factor was building fast, so I attached the spare ssd, partitioned it up, did a final scrub on all the btrfs, and then one btrfs at a time btrfs replaced the devices from the old ssd's partitions to the new one's partitions. Given that I was already used to running scrubs at every boot, the entirely uneventful replacements were actually somewhat anticlimactic, but that was a good thing! =:^)

Then more recently I bought a larger/newer pair of ssds (1 TB each, the old ones were quarter TB each) and converted my media partitions and secondary backups, which had still been on reiserfs on spinning rust, to btrfs raid1 on ssd as well, making me all-btrfs on all-ssd now, with everything but /boot and its backups on the other ssds being btrfs raid1, and /boot and its backups being btrfs dup. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: Please help with exact actions for raid1 hot-swap
On 9 September 2017 at 12:05, Marat Khalili wrote:
> Forgot to add, I've got a spare empty bay if it can be useful here.

That makes it much easier since you don't have to mount it degraded, with the risks involved.

Add and partition the disk.

# btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data

Remove the old disk when it is done.
Re: Please help with exact actions for raid1 hot-swap
Forgot to add, I've got a spare empty bay if it can be useful here.

--

With Best Regards,
Marat Khalili
Re: Please help with exact actions for raid1 hot-swap
On 9 September 2017 at 09:46, Marat Khaliliwrote: > > Dear list, > > I'm going to replace one hard drive (partition actually) of a btrfs raid1. > Can you please spell exactly what I need to do in order to get my filesystem > working as RAID1 again after replacement, exactly as it was before? I saw > some bad examples of drive replacement in this list so I afraid to just > follow random instructions on wiki, and putting this system out of action > even temporarily would be very inconvenient. I recently replaced both disks in a two disk Btrfs raid1 to increase capacity and took some notes. Using systemd? systemd will automatically unmount a degraded disk and ruin your one chance to replace the disk as long as Btrfs has the bug where it notes single chunks and one disk missing and refuses to mount degraded again. Comment out your mount in fstab and run "systemctl daemon-reload". The mount file in /var/run/systemd/generator/ will be removed. (Is there a better way?) Unmount the volume. # hdparm -Y /dev/sdb # echo 1 > /sys/block/sdb/device/delete Replace the disk. Create partitions etc. You might have to restart smartd, if using it. Make Btrfs forget the old device. Will otherwise think the old disk is still there. (Is there a better way?) # rmmod btrfs; modprobe btrfs # btrfs device scan # mount -o degraded /dev/sda7 /mnt/data # btrfs device usage /mnt/data # btrfs replace start /dev/sdbX /mnt/data # btrfs replace status /mnt/data Convert single or dup chunks to raid1 # btrfs balance start -fv -dconvert=raid1,soft -mconvert=raid1,soft -sconvert=raid1,soft /mnt/data Unmount, restore fstab, reload systemd again, mount. 
>
> For this filesystem:
>
>> $ sudo btrfs fi show /dev/sdb7
>> Label: 'data' uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0
>> Total devices 2 FS bytes used 106.23GiB
>> devid 1 size 2.71TiB used 126.01GiB path /dev/sda7
>> devid 2 size 2.71TiB used 126.01GiB path /dev/sdb7
>> $ grep /mnt/data /proc/mounts
>> /dev/sda7 /mnt/data btrfs rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0
>> $ sudo btrfs fi df /mnt/data
>> Data, RAID1: total=123.00GiB, used=104.57GiB
>> System, RAID1: total=8.00MiB, used=48.00KiB
>> Metadata, RAID1: total=3.00GiB, used=1.67GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>> $ uname -a
>> Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>
> I've got this in dmesg:
>
>> [Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0 action 0x0
>> [ +0.51] ata6.00: irq_stat 0x4008
>> [ +0.29] ata6.00: failed command: READ FPDMA QUEUED
>> [ +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag 3 ncq 57344 in
>>          res 41/40:00:68:6c:f3/00:00:79:00:00/40 Emask 0x409 (media error)
>> [ +0.94] ata6.00: status: { DRDY ERR }
>> [ +0.26] ata6.00: error: { UNC }
>> [ +0.001195] ata6.00: configured for UDMA/133
>> [ +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> [ +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error [current] [descriptor]
>> [ +0.04] sd 6:0:0:0: [sdb] tag#3 Add. Sense: Unrecovered read error - auto reallocate failed
>> [ +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 79 f3 6c 50 00 00 00 70 00 00
>> [ +0.03] blk_update_request: I/O error, dev sdb, sector 2045996136
>> [ +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
>> [ +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>> [ +0.77] ata6: EH complete
>
> There's still 1 in Current_Pending_Sector line of smartctl output as of now,
> so it probably won't heal by itself.
>
> --
> With Best Regards,
> Marat Khalili
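For anyone reading the quoted dmesg: the numbers in it are consistent with each other, which can be checked with plain shell arithmetic. This is only a reading aid under my own interpretation of the log. The variable names are made up; the hex values are the LBA bytes from the "cmd" line (and the CDB), the "res" line, and the sector count 0x70.

```shell
# Values copied from the dmesg excerpt; names are illustrative only.
start=$((0x79f36c50))   # first LBA of the failed READ (cmd line / CDB bytes 79 f3 6c 50)
bad=$((0x79f36c68))     # LBA reported in the "res" line
count=$((0x70))         # number of sectors requested

echo "request start:  $start"
echo "failing sector: $bad"
echo "offset in req:  $((bad - start)) of $count sectors"
echo "request bytes:  $((count * 512))"
```

The failing sector comes out as 2045996136, the same number blk_update_request prints, and 112 sectors of 512 bytes is the 57344 shown after "ncq". Note that the kernel reports this sector for dev sdb, so it is counted from the start of the whole disk rather than from the start of sdb7.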