Progress of device deletion?
Dear list,

Is there a way to find out what btrfs device delete is doing, and how far it has come? I know there's no status command, but I'm thinking it might be possible to obtain something via some other channel, such as btrfs fi df and btrfs fi show, which I have been using to try and figure out what is happening.

In this context, I've been trying to remove two drives from a filesystem of four, using the following command:

btrfs dev del /dev/sdg1 /dev/sdh1 /home/pub

It has been running for almost 36 hours by now, however, and I'm kinda wondering what's happening. :)

I've been trying to monitor it using btrfs fi df /home/pub and btrfs fi show. Before starting to remove the devices, they gave the following output:

$ sudo btrfs fi df /home/pub
Data, RAID0: total=10.9TB, used=3.35TB
System, RAID1: total=32.00MB, used=200.00KB
Metadata, RAID1: total=5.00GB, used=3.68GB
$ sudo btrfs fi show
Label: none  uuid: 40d346bb-2c77-4a78-8803-1e441bf0aff7
        Total devices 4 FS bytes used 3.35TB
        devid    5 size 2.73TB used 2.71TB path /dev/sdh1
        devid    4 size 2.73TB used 2.71TB path /dev/sdg1
        devid    3 size 2.73TB used 2.71GB path /dev/sdd1
        devid    2 size 2.73TB used 2.71GB path /dev/sde1

The Data part of the df output has since been decreasing (which I have been using as a sign of progress), but only until it hit 3.36TB:

$ sudo btrfs fi df /home/pub/video/
Data, RAID0: total=3.36TB, used=3.35TB
System, RAID1: total=32.00MB, used=200.00KB
Metadata, RAID1: total=5.00GB, used=3.68GB

It has been sitting there for quite some hours by now. show, for its part, displays the following:

$ sudo btrfs fi show
Label: none  uuid: 40d346bb-2c77-4a78-8803-1e441bf0aff7
        Total devices 4 FS bytes used 3.35TB
        devid    5 size 2.73TB used 965.00GB path /dev/sdh1
        devid    4 size 2.73TB used 2.71TB path /dev/sdg1
        devid    3 size 2.73TB used 968.03GB path /dev/sdd1
        devid    2 size 2.73TB used 969.03GB path /dev/sde1

This I find quite weird. Why is the usage of sdd1 and sde1 decreasing, when those are not the disks I'm trying to remove, while sdg1 sits there at its original usage, when it is one of those I have asked to have removed?

By the way, since the Data part hit 3.36TB, the usages of sdd1, sde1 and sdh1 have been fluctuating between around 850GB and the values shown right now.

Is there any way I can find out what's going on?

--
Fredrik Tolf
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
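As a rough aid for the kind of monitoring described in the message above, the per-device "used" figures can be pulled out of btrfs fi show and sampled periodically. This is only a sketch: the function name device_usage and the 60-second interval are made up for illustration, and it assumes the "devid N size X used Y path P" layout shown in this thread. It reports the same numbers quoted above, not a real progress indicator.

```shell
#!/bin/sh
# Print "<device> <used>" for every devid line of `btrfs fi show` read on stdin.
# Assumes the field layout seen in this thread:
#   devid N size X used Y path /dev/...
device_usage() {
    awk '$1 == "devid" {
        for (i = 2; i < NF; i++) {
            if ($i == "used") used = $(i + 1)
            if ($i == "path") path = $(i + 1)
        }
        print path, used
    }'
}

# Hypothetical usage: take a sample once a minute while the deletion runs.
#   while sleep 60; do date; sudo btrfs fi show | device_usage; done
```

Logging the samples with a timestamp makes it easier to tell a stalled operation from one whose usage merely fluctuates.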
Re: Changing allocation mode
On Wed, 27 Feb 2013, Liu Bo wrote:
> On Sat, Feb 23, 2013 at 01:46:03AM +0100, Fredrik Tolf wrote:
>> If I were transferring the data to a new filesystem on mdraid, the procedure I would use for that last portion of the data would be to remove one disk only from either of the old mdraid mirror arrays (putting that array in degraded mode), and then create a new mirror in degraded mode with only that disk, add that mirror to the new filesystem, expand it, copy the last data, and then delete the old mirrors, moving the rest of the disks to the new filesystem.
>
> That sounds like using seed device, although seed disk is designed for another different purpose.

It does? I must admit I don't see quite how that would be applicable.

--
Fredrik Tolf
Re: Rebalancing RAID1
On Mon, 18 Feb 2013, Stefan Behrens wrote:
> On Fri, 15 Feb 2013 22:56:19 +0100 (CET), Fredrik Tolf wrote:
>> The oops cut can be found here:
>> http://www.dolda2000.com/~fredrik/tmp/btrfs-oops
>
> This scrub issue is fixed since Linux 3.8-rc1 with commit
> 4ded4f6 Btrfs: fix BUG() in scrub when first superblock reading gives EIO

I see, thanks!

Rebooting the system did get me running again, allowing me to remove the missing device from the filesystem. However, I encountered a couple of somewhat strange happenings as I did that. I don't know if they're considered bugs or not, but I thought I had best report them.

To begin with, the act of removing the missing device from the filesystem itself caused the resynchronization to the new device to happen in blocking mode, so the btrfs device delete missing operation took about a day to finish. My expectation would have been that the device removal would be a fast operation and that I would have had to scrub the filesystem or something in order to resynchronize, but I can see how this would be intended behavior.

However, what's weirder is that while the resynchronization was underway, I couldn't mount subvolumes on other mountpoints. The mount commands blocked (disk-slept) until the entire synchronization was done, and I don't think this was intended behavior, because I had the kernel saying the following while it happened:

Feb 16 06:01:27 nerv kernel: [ 3482.512106] INFO: task mount:3525 blocked for more than 120 seconds.
Feb 16 06:01:28 nerv kernel: [ 3482.518484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 16 06:01:28 nerv kernel: [ 3482.526324] mount D 88003e220e40 0 3525 3524 0x
Feb 16 06:01:28 nerv kernel: [ 3482.533587] 88003e220e40 0082 a0067470 88003e2300c0
Feb 16 06:01:28 nerv kernel: [ 3482.541088] 00013b40 88001126dfd8 00013b40 88001126dfd8
Feb 16 06:01:28 nerv kernel: [ 3482.548584] 00013b40 88003e220e40 00013b40 88001126c010
Feb 16 06:01:28 nerv kernel: [ 3482.556280] Call Trace:
Feb 16 06:01:28 nerv kernel: [ 3482.558776] [81396132] ? __mutex_lock_common+0x10d/0x175
Feb 16 06:01:28 nerv kernel: [ 3482.565078] [81396260] ? mutex_lock+0x1a/0x2c
Feb 16 06:01:28 nerv kernel: [ 3482.570661] [a05a38c2] ? btrfs_scan_one_device+0x40/0x133 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.577752] [a0564e8b] ? btrfs_mount+0x1c4/0x4d8 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.584080] [810e56cb] ? pcpu_next_pop+0x37/0x43
Feb 16 06:01:28 nerv kernel: [ 3482.589709] [810e52c0] ? cpumask_next+0x18/0x1a
Feb 16 06:01:28 nerv kernel: [ 3482.595226] [811012aa] ? alloc_pages_current+0xbb/0xd8
Feb 16 06:01:28 nerv kernel: [ 3482.601345] [81113778] ? mount_fs+0x6c/0x149
Feb 16 06:01:28 nerv kernel: [ 3482.606595] [811291f7] ? vfs_kern_mount+0x67/0xdd
Feb 16 06:01:28 nerv kernel: [ 3482.612292] [a056516b] ? btrfs_mount+0x4a4/0x4d8 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.618673] [810e52c0] ? cpumask_next+0x18/0x1a
Feb 16 06:01:28 nerv kernel: [ 3482.624178] [811012aa] ? alloc_pages_current+0xbb/0xd8
Feb 16 06:01:28 nerv kernel: [ 3482.630347] [81113778] ? mount_fs+0x6c/0x149
Feb 16 06:01:28 nerv kernel: [ 3482.635580] [811291f7] ? vfs_kern_mount+0x67/0xdd
Feb 16 06:01:28 nerv kernel: [ 3482.641258] [811292e0] ? do_kern_mount+0x49/0xd6
Feb 16 06:01:29 nerv kernel: [ 3482.646855] [81129a98] ? do_mount+0x72b/0x791
Feb 16 06:01:29 nerv kernel: [ 3482.652186] [81129b86] ? sys_mount+0x88/0xc3
Feb 16 06:01:29 nerv kernel: [ 3482.657464] [8139d229] ? system_call_fastpath+0x16/0x1b

Furthermore, it struck me that the consequences of having to mount a filesystem with missing devices with -o degraded can be a bit strange. I realize what the intention of the behavior is, of course, but I think it might cause quite some difficulties when trying to mount a degraded btrfs filesystem as root on a system that you don't have physical access to, like a hosted server, because it might be hard to manipulate the boot process so as to pass that mount flag to the initrd.

Note that this is not a problem with md-raid; it will simply assemble its arrays in degraded mode automatically, without intervention. I'm not necessarily saying that's better, but I thought I should bring up the point.

--
Fredrik Tolf
Changing allocation mode
Dear list,

I'm still in the process of transferring all the data I have to the btrfs filesystem that you helped me debug in a previous thread, and I have a small question, if you will humour me.

I have the data I want to transfer on an old ReiserFS partition, consisting of 2 mdraid mirrors, one of which consists of two 1.5 TB disks, and the other of two 3 TB disks. The btrfs I'm copying the data to consists of only two 3 TB disks that I have put in RAID-1 mode, and the data on the old filesystem is only slightly larger than 3 TB.

I am now at the point where I have transferred just under 3 TB. If I were transferring the data to a new filesystem on mdraid, the procedure I would use for that last portion of the data would be to remove one disk only from either of the old mdraid mirror arrays (putting that array in degraded mode), and then create a new mirror in degraded mode with only that disk, add that mirror to the new filesystem, expand it, copy the last data, and then delete the old mirrors, moving the rest of the disks to the new filesystem.

Is there a way to mirror this procedure in btrfs? I'm not yet familiar enough with all the btrfs concepts to know quite what I'm talking about, but I'm guessing that what I want to do is to temporarily set the allocator to allocate new extents on a single disk only, and then add a single disk to the filesystem. Then I would copy the rest of the data, abandon the old filesystem, add another disk, and rebalance those singly-allocated extents to RAID-1 mode.

Have I described a conceivable idea in saying so? And if so, how does one actually do that? I don't know if I'm just blind, but I haven't found any btrfs command to change the allocation algorithm without having to rebalance the existing data, which seems a bit unnecessary in this case.

Thanks for any help you can offer!

--
Fredrik Tolf
Re: Rebalancing RAID1
On Fri, 15 Feb 2013, Martin Steigerwald wrote:
> So or so I think a kernel bug is involved here.

Well, *some* kernel bug is certainly involved. :)

I did wipe the filesystem off the device and reinserted it as a new device into the filesystem. After that, btrfs fi show gave me the following:

$ sudo ./btrfs fi show
Label: none  uuid: 40d346bb-2c77-4a78-8803-1e441bf0aff7
        Total devices 3 FS bytes used 2.66TB
        devid    3 size 2.73TB used 0.00 path /dev/sdi1
        devid    2 size 2.73TB used 2.67TB path /dev/sde1
        *** Some devices missing

I then proceeded to try to remove the missing devices with btrfs dev del missing /mnt, but it made no difference whatsoever, with the kernel saying the following:

Feb 15 07:12:29 nerv kernel: [262110.799823] btrfs: no missing devices found to remove

This seems odd, seeing as btrfs fi show says there are missing devices while the kernel contradicts that.

Either way, I tried to start a scrub on the filesystem, too, to see if that would make a difference, but that oopsed the kernel. :)

The oops cut can be found here:
http://www.dolda2000.com/~fredrik/tmp/btrfs-oops

So with that, I'm certainly going to reboot the machine. :)

--
Fredrik Tolf
Re: Rebalancing RAID1
On Wed, 13 Feb 2013, Chris Murphy wrote:
> On Feb 12, 2013, at 11:18 PM, Fredrik Tolf fred...@dolda2000.com wrote:
>> That's not typical for actual media problems, in my experience. :)
>
> Quite typical, because these drives don't support SCTERC which almost certainly means their error timeouts are well above that of the linux SCSI layer which is 30 seconds. Their timeouts are likely around 2 minutes. So in fact they never report back a URE because the command timer times out and resets the drive.

That's interesting to read. I haven't ever actually experienced missing a bad sector reported by a hard drive, though; and not for a lack of experience with bad sectors.

Either way, though, with the assumption that it actually was a cable problem rather than bad medium...

> However, in your case, with both the kernel message ICRC ABRT, and the following SMART entry, this is your cable problem.

... I'd still like to solve the problem as it is, so that I know what to do the next time I get some device error.

> So the question is whether the cable problem has actually been fixed, and if you're still getting ICRC errors from the kernel.

I'm not getting any block-layer errors from the kernel. The errors I posted originally are the only ones I'm getting.

> As this is hdi, I'm wondering how many drives are connected, and if this could be power induced rather than just cable induced.

With the general change, I actually decreased the number of drives in the system from 10 to 8, so unless the new drives are incredibly more power-hungry than the old ones, that shouldn't be a problem.

> Once that's solved, you should do a scrub, rather than a rebalance.

Oh, will scrubbing actually rebalance the array? I was under the impression that it only checked for bad checksums.

I'm still wondering what those errors actually mean, though. I'm still getting them occasionally, even when I'm not rebalancing (just not as often). I'm also very curious about what it means that it's still complaining about sdd rather than sdi.

It's worth noting that I still haven't un- and remounted the filesystem since the drive disconnected. I assumed that I shouldn't need to and that the multiple-device layer of btrfs should handle the situation correctly. Is that assumption correct?

--
Fredrik Tolf
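The 30-second figure mentioned in the message above is the kernel's per-device command timer, which Linux exposes in sysfs. A minimal sketch for reading it follows; the function name scsi_timeout and its optional second argument are hypothetical, added only so the function can be tried against a directory other than /sys.

```shell
#!/bin/sh
# Print the kernel's command timeout, in seconds, for a disk such as "sde".
# The optional second argument replaces the sysfs root (/sys by default);
# it is an illustration-only convenience, not part of any real tool.
scsi_timeout() {
    cat "${2:-/sys}/block/$1/device/timeout"
}

# Hypothetical usage:
#   scsi_timeout sde                    # commonly prints 30
#   sudo smartctl -l scterc /dev/sde    # reports whether the drive supports SCT ERC
```

Comparing the two values shows whether a drive's internal error recovery could outlast the kernel's timer, which is the situation described in the quoted reply.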
Re: Rebalancing RAID1
On Thu, 14 Feb 2013, Chris Murphy wrote:
>>> So the question is whether the cable problem has actually been fixed, and if you're still getting ICRC errors from the kernel.
>>
>> I'm not getting any block-layer errors from the kernel. The errors I posted originally are the only ones I'm getting.
>
> Previously you reported:
> Feb 12 16:36:51 nerv kernel: [36769.574831] ata6.00: status: { DRDY ERR }
> Feb 12 16:36:52 nerv kernel: [36769.578867] ata6.00: error: { ICRC ABRT }
>
> These are not block errors. You should not proceed until you're certain this isn't still intermittently occurring.

Sorry for being unclear. By block-layer errors I intended to mean hardware/driver errors, as those are, as opposed to filesystem errors, but I guess that's not the vernacular use of the term.

To try to be clearer, then: I am not getting ICRC errors anymore, or any driver-related errors whatsoever. I was only getting them when sdd was originally lost, and have not been getting any of them since.

The errors I am currently getting, and the ones I was getting during the rebalance, are those I reported in the original mail; that is:

Feb 14 08:32:30 nerv kernel: [180511.760850] lost page write due to I/O error on /dev/sdd1
Feb 14 08:32:30 nerv kernel: [180511.764690] btrfs: bdev /dev/sdd1 errs: wr 288650, rd 26, flush 1, corrupt 0, gen 0

I am only getting those messages from the kernel, and nothing else. Currently, those two messages are the only ones I'm getting at all (except with slightly different numeric parameters, of course); while I was trying to rebalance, I also got messages looking like this:

Feb 12 22:57:16 nerv kernel: [59596.948464] btrfs: relocating block group 2879804932096 flags 17
Feb 12 22:57:45 nerv kernel: [59626.618280] btrfs_end_buffer_write_sync: 8 callbacks suppressed
Feb 12 22:57:45 nerv kernel: [59626.621893] btrfs_dev_stat_print_on_error: 8 callbacks suppressed
Feb 12 22:57:48 nerv kernel: [59629.569278] btrfs: found 46 extents

I hope that clears it up.

>>> Once that's solved, you should do a scrub, rather than a rebalance.
>>
>> Oh, will scrubbing actually rebalance the array? I was under the impression that it only checked for bad checksums.
>
> Scrubbing does not balance the volume. Based on the information you supplied I don't really see the reason for a rebalance.

Maybe my terminology is wrong again, then, because I do see a reason to get the data properly replicated across the drives, which it doesn't seem to be now. That's what I meant by rebalancing.

> What you do next depends on what your goal is for this data, on these two disks, using btrfs. If the idea is to trust the data on the volume; you still have the source data so I'd mkfs.btrfs on the disks and start over. If the idea is to experiment and learn, you might want to do a btrfsck, followed by a scrub.

I'm still keeping the original data just in case, of course. However, my primary goal right now is to learn how to manage redundancy reliably with btrfs. I mean, with md, I can easily handle a device failure and fix it up without having to remount or reboot; and I've assumed that I should be able to do that with btrfs as well (please correct me if that assumption is invalid, though).

> Btrfs is stable on stable hardware. Your hardware most definitely was not stable during a series of writes. So I'd say all bets are off. That doesn't mean it can't be fixed, but the very fact you're still getting errors indicates something is still wrong.

Isn't btrfs' RAID1 supposed to be stable as long as only one disk fails, though?

> This:
> Feb 12 22:57:45 nerv kernel: [59626.644110] lost page write due to I/O error on /dev/sdd1
>
> Are not btrfs errors.

I see. I thought that was a btrfs error, but I was wrong then. Since I'm not actually getting any driver errors, though, and it's referring to sdd, doesn't that just mean, as I suspect, that btrfs is still trying to use the old defunct sdd instead of sdi, as the drive became named after it was redetected?

> This:
> Feb 12 16:36:51 nerv kernel: [36769.574831] ata6.00: status: { DRDY ERR }
> Feb 12 16:36:52 nerv kernel: [36769.578867] ata6.00: error: { ICRC ABRT }

Just to be overly redundant: I'm not getting those anymore, and I only ever got them before the drive was redetected as sdi.

--
Fredrik Tolf
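Since the "errs:" lines quoted in the message above differ only in their counters (wr 66339 in the first report, wr 288650 here), one way to see whether write errors are still accumulating is to extract the device and write counter from the kernel log. The sketch below assumes the exact message format shown in this thread; the function name write_errors is made up for illustration.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "<device> <write error count>"
# for each btrfs "bdev ... errs:" line, in the format seen in this thread.
write_errors() {
    sed -n 's/.*btrfs: bdev \([^ ]*\) errs: wr \([0-9]*\),.*/\1 \2/p'
}

# Hypothetical usage: show the most recent counter.
#   dmesg | write_errors | tail -n 1
```

If the counter stops growing between two samples, no further writes to that device name have failed in the meantime.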
Rebalancing RAID1
Dear list,

I'm sorry if this is a dumb n3wb question, but I couldn't find anything about it, so please bear with me.

I just decided to try BtrFS for the first time, to replace an old ReiserFS data partition currently on an mdadm mirror. To do so, I'm using two 3 TB disks that were initially detected as sdd and sde, on which I have a single large GPT partition, so the devices I'm using for btrfs are sdd1 and sde1.

I created a filesystem on them using RAID1 from the start (mkfs.btrfs -d raid -m raid1 /dev/sd{d,e}1), and started copying the data from the old partition onto it during the night. As it happened, I immediately got reason to try out BtrFS recovery, because sometime during the copying operation /dev/sdd had some kind of cable failure and was removed from the system. A while later, however, it was apparently auto-redetected, this time as /dev/sdi, and BtrFS seems to have inserted it back into the filesystem somehow.

The current situation looks like this:

$ sudo ./btrfs fi show
Label: none  uuid: 40d346bb-2c77-4a78-8803-1e441bf0aff7
        Total devices 2 FS bytes used 1.64TB
        devid    1 size 2.73TB used 1.64TB path /dev/sdi1
        devid    2 size 2.73TB used 2.67TB path /dev/sde1

Btrfs v0.20-rc1-56-g6cd836d

As you can see, /dev/sdi1 has much less space used, which I can only assume is because extents weren't allocated on it while it was off-line. I'm now trying to remedy this, but I'm not sure if I'm doing it right.

I'm running btrfs fi bal start /mnt, and it gives me a ton of kernel messages that look like this:

Feb 12 22:57:16 nerv kernel: [59596.948464] btrfs: relocating block group 2879804932096 flags 17
Feb 12 22:57:45 nerv kernel: [59626.618280] btrfs_end_buffer_write_sync: 8 callbacks suppressed
Feb 12 22:57:45 nerv kernel: [59626.621893] lost page write due to I/O error on /dev/sdd1
Feb 12 22:57:45 nerv kernel: [59626.621893] btrfs_dev_stat_print_on_error: 8 callbacks suppressed
Feb 12 22:57:45 nerv kernel: [59626.621893] btrfs: bdev /dev/sdd1 errs: wr 66339, rd 26, flush 1, corrupt 0, gen 0
Feb 12 22:57:45 nerv kernel: [59626.644110] lost page write due to I/O error on /dev/sdd1
[Lots of the above, and occasionally a couple of lines like these]
Feb 12 22:57:48 nerv kernel: [59629.569278] btrfs: found 46 extents
Feb 12 22:57:50 nerv kernel: [59631.685067] btrfs_dev_stat_print_on_error: 5 callbacks suppressed

This barrage of messages, combined with the fact that the rebalance is going quite slowly (btrfs fi bal stat indicates about 1 extent per minute, where an extent seems to be about 1 GB, which is several times slower than copying the data onto the filesystem was), leads me to think that something is wrong. Is it, or should I just wait 2 days for it to complete, ignoring the errors?

Also, why does it say that the errors are occurring on /dev/sdd1? Is it just remembering the whole filesystem by that name since that's how I mounted it, or is it still trying to access the old removed instance of that disk, and is that, then, why it's giving all these errors?

Thanks for reading!

--
Fredrik Tolf
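The two-day figure in the message above follows from simple arithmetic, sketched here under loudly stated assumptions: that every allocated chunk has to be relocated, that a chunk is about 1 GB, and that the 2.67TB used on /dev/sde1 therefore corresponds to roughly 2734 chunks (taking 1TB as 1024GB). The function name balance_eta_hours is hypothetical.

```shell
#!/bin/sh
# Estimate the hours a balance needs: chunks to relocate, divided by the
# observed rate in chunks per minute, converted from minutes to hours.
balance_eta_hours() {
    awk -v chunks="$1" -v per_min="$2" \
        'BEGIN { printf "%.1f\n", chunks / per_min / 60 }'
}

# Hypothetical usage with the figures from this thread:
#   balance_eta_hours 2734 1
```

At one chunk per minute this comes to a little under two days, which matches the rough guess in the message.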
Re: Rebalancing RAID1
On Tue, 12 Feb 2013, Chris Murphy wrote:
> On Feb 12, 2013, at 4:01 PM, Fredrik Tolf fred...@dolda2000.com wrote:
>> mkfs.btrfs -d raid -m raid1 /dev/sd{d,e}1
>
> Is that a typo? -d raid isn't valid.

Ah yes, sorry. That was a typo.

> What do you get for:
> btrfs fi df /mnt

$ sudo ./btrfs fi df /mnt
Data, RAID1: total=2.66TB, used=2.66TB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=388.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=4.00GB, used=3.66GB
Metadata: total=8.00MB, used=0.00

> Please report the result for each drive:
> smartctl -a /dev/sdX

They're a bit long for mail, so see here:

http://www.dolda2000.com/~fredrik/tmp/smart-hde
http://www.dolda2000.com/~fredrik/tmp/smart-hdi

There's not a whole lot to see, though.

> smartctl -l scterc /dev/sdX

Warning: device does not support SCT Error Recovery Control command

>> Also, why does it say that the errors are occurring on /dev/sdd1? Is it just remembering the whole filesystem by that name since that's how I mounted it, or is it still trying to access the old removed instance of that disk, and is that, then, why it's giving all these errors?
>
> I suspect bad sectors at the moment.

Doesn't seem that way to me; partly because of the SMART data, and partly because of the errors that were logged as the drive failed:

Feb 12 16:36:49 nerv kernel: [36769.546522] ata6.00: Ata error. fis:0x21
Feb 12 16:36:49 nerv kernel: [36769.550454] ata6: SError: { Handshk }
Feb 12 16:36:51 nerv kernel: [36769.554129] ata6.00: failed command: WRITE FPDMA QUEUED
Feb 12 16:36:51 nerv kernel: [36769.559375] ata6.00: cmd 61/00:00:00:ec:2e/04:00:cd:00:00/40 tag 0 ncq 524288 out
Feb 12 16:36:51 nerv kernel: [36769.559375] res 41/84:d0:00:98:2e/84:00:cd:00:00/40 Emask 0x10 (ATA bus error)
Feb 12 16:36:51 nerv kernel: [36769.574831] ata6.00: status: { DRDY ERR }
Feb 12 16:36:52 nerv kernel: [36769.578867] ata6.00: error: { ICRC ABRT }

That's not typical for actual media problems, in my experience. :)

> What kernel version?

Oh, sorry, it's 3.7.1. The system is otherwise a pretty much vanilla Debian Squeeze (currently Stable) that I've just compiled a newer kernel (and btrfs-tools) for.

Thanks for replying!

--
Fredrik Tolf