Re: btrfs balance enospc
> Does/should a balance imply removal of missing devices (as long as the minimum number of devices are still available)?

That's a really good question. As a user I would expect it to balance over the remaining devices, assuming you still have a complete picture. Doing a device delete missing after a balance should be just some pool metadata updates at that point.

Anyway... I solved my problem by moving/deleting files to free up space to the point that balance no longer complained about enospc.

I suppose btrfs needs extra working space to do a balance... above and beyond the actual size of the existing data/metadata to be moved? I had a total of three devices, with what appeared to be plenty of space on the two that were to remain, but balance/remove was still complaining about being out of disk space.

It would be a good idea for some metrics to be calculated at the start of a removal or balance, to tell the user: hey, you need to free up XXX more bytes in order for this operation to be successful.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs balance enospc
On Sep 17, 2014, at 11:51 AM, Mark Murawski markm-li...@intellasoft.net wrote:

>> Does/should a balance imply removal of missing devices (as long as the minimum number of devices are still available)?
>
> That's a really good question. As a user I would expect it to balance over remaining devices assuming you still have a complete picture. Doing a device delete missing after a balance should be just some pool metadata updates at that point.
>
> Anyway... I solved my problem by moving/deleting files to free up space to the point that balance no longer complained about enospc.

Another option in such a case is to add a new device. It can be small; even a 2GB loop device or USB stick would do in a bind. Then delete the device when you're done.

Chris Murphy
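The temporary-device workaround above can be sketched as a dry-run command plan. This is only an illustration: the function name, the backing-file path and the loop device name are hypothetical, nothing is executed, and the backing file is assumed to live on a different filesystem from the one being balanced.

```python
import shlex


def spare_device_plan(mountpoint, backing_file, loop_dev, size="2G"):
    """Return the shell commands, in order, for lending a loop device to a balance.

    Nothing is executed here; the caller reviews the plan and runs it as root.
    All paths are placeholders chosen by the caller.
    """
    steps = [
        # Create a sparse backing file (must NOT be on the filesystem being balanced).
        ["truncate", "-s", size, backing_file],
        # Attach the file to a loop device.
        ["losetup", loop_dev, backing_file],
        # Lend the loop device to the filesystem, then balance.
        ["btrfs", "device", "add", loop_dev, mountpoint],
        ["btrfs", "balance", "start", mountpoint],
        # Migrate everything off the loop device again and detach it.
        ["btrfs", "device", "delete", loop_dev, mountpoint],
        ["losetup", "-d", loop_dev],
    ]
    return [shlex.join(step) for step in steps]


for command in spare_device_plan("/", "/mnt/other/btrfs-spare.img", "/dev/loop0"):
    print(command)
```

The device delete step has to finish before the loop device is detached; if the loop device disappears first (for example across a reboot), the filesystem is left with one more missing device.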
Re: btrfs balance enospc
Mark Murawski posted on Wed, 17 Sep 2014 13:51:51 -0400 as excerpted:

>> Does/should a balance imply removal of missing devices (as long as the minimum number of devices are still available)?
>
> That's a really good question. As a user I would expect it to balance over remaining devices assuming you still have a complete picture. Doing a device delete missing after a balance should be just some pool metadata updates at that point.

A balance does not imply removal of missing devices. And at this point I'd say it shouldn't, tho perhaps some day, after the code is somewhat more stable, it could.

In fact, until recently kernelspace btrfs (which does all the work in a balance; userspace is simply the way you tell it what to do) didn't even properly detect dynamically added/removed devices. That resulted in definitely unintuitive behavior where the balance would still queue up chunks to be rewritten to the missing device, chunks that would obviously never be written because the device was missing and wasn't coming back! (!!!)

AFAIK (I'm a sysadmin and list regular, not a developer) that arguably pathological behavior has been fixed now, at least in theory, and the kernel should properly detect missing devices and should no longer try to write to them when doing a balance. So now, at least in theory and assuming good copies of all data and metadata on the remaining device from the original pair, a balance to it and a just-added device in raid1 mode should leave only the device metadata for btrfs device delete missing to fix up afterward.

However, as of now, there are still at least two bug reports being traced down in the dynamic device detection code (see the current thread where btrfs fi show on a two-device filesystem is pointing to the wrong place for one of the devices, and another where show says a device is missing that isn't), and possibly others yet to be found, so it's not yet a good idea to have btrfs doing an automatic device delete missing on balance.

After the bug fixes are in and the code churn in that area has calmed down for a couple of kernel cycles, perhaps then we can debate whether a balance should automatically delete missing devices when appropriate, or not, but certainly now isn't the time.

--
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: btrfs balance enospc
Playing around with this filesystem I hot-removed a device from the array and put in a replacement.

Label: 'Root' uuid: d71404d4-468e-47d5-8f06-3b65fa7776aa
  Total devices 2 FS bytes used 7.43GiB
  devid 1 size 9.31GiB used 8.90GiB path /dev/sdc6
  devid 3 size 9.31GiB used 8.90GiB path /dev/disk/by-uuid/d71404d4-468e-47d5-8f06-3b65fa7776aa

removed /dev/sdc

Label: 'Root' uuid: d71404d4-468e-47d5-8f06-3b65fa7776aa
  Total devices 2 FS bytes used 7.43GiB
  devid 3 size 9.31GiB used 8.90GiB path /dev/disk/by-uuid/d71404d4-468e-47d5-8f06-3b65fa7776aa
  *** Some devices missing

cartman {~} root# btrfs device add /dev/sdi6 /
cartman {~} root# btrfs fi show
Label: 'Root' uuid: d71404d4-468e-47d5-8f06-3b65fa7776aa
  Total devices 3 FS bytes used 7.43GiB
  devid 3 size 9.31GiB used 8.90GiB path /dev/disk/by-uuid/d71404d4-468e-47d5-8f06-3b65fa7776aa
  devid 4 size 10.00GiB used 0.00 path /dev/sdi6
  *** Some devices missing

cartman {~} root# btrfs filesystem balance start /

Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2411, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2412, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2413, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2414, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2415, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2416, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2417, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2418, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2419, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2420, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:14 localhost kernel: BTRFS: lost page write due to I/O error on /dev/sdc6
Sep 16 12:47:14 localhost kernel: BTRFS: lost page write due to I/O error on /dev/sdc6
Sep 16 12:47:14 localhost kernel: BTRFS info (device sdd6): found 59023 extents
Sep 16 12:47:14 localhost kernel: use_block_rsv: 4 callbacks suppressed
Sep 16 12:47:14 localhost kernel: [ cut here ]
Sep 16 12:47:14 localhost kernel: WARNING: CPU: 1 PID: 5109 at fs/btrfs/extent-tree.c:7273 btrfs_alloc_free_block+0x455/0x4a0()
Sep 16 12:47:14 localhost kernel: BTRFS: block rsv returned -28
Sep 16 12:47:14 localhost kernel: Modules linked in:
Sep 16 12:47:14 localhost kernel: CPU: 1 PID: 5109 Comm: tail Tainted: G W 3.16.1 #2
Sep 16 12:47:14 localhost kernel: Hardware name: Gigabyte Technology Co., Ltd. GA-MA74GM-S2/GA-MA74GM-S2, BIOS F1 04/17/2008
Sep 16 12:47:14 localhost kernel: 819e3610 817e4409 88005a9eba68
Sep 16 12:47:14 localhost kernel: 8106f6f2 8800379fe980 880073a7 1000
Sep 16 12:47:14 localhost kernel: 88001fc635a0 8800747b6000 8106f7d5 819f5978
Sep 16 12:47:14 localhost kernel: Call Trace:
Sep 16 12:47:14 localhost kernel: [817e4409] ? dump_stack+0x49/0x6a
Sep 16 12:47:14 localhost kernel: [8106f6f2] ? warn_slowpath_common+0x82/0xb0
Sep 16 12:47:14 localhost kernel: [8106f7d5] ? warn_slowpath_fmt+0x45/0x50
Sep 16 12:47:14 localhost kernel: [8135f074] ? ___ratelimit+0x94/0x100
Sep 16 12:47:14 localhost kernel: [81296625] ? btrfs_alloc_free_block+0x455/0x4a0
Sep 16 12:47:14 localhost kernel: [810992b7] ? set_next_entity+0x37/0x80
Sep 16 12:47:14 localhost kernel: [812ca111] ? read_extent_buffer+0xb1/0x110
Sep 16 12:47:14 localhost kernel: [81091de9] ? finish_task_switch+0x49/0xe0
Sep 16 12:47:14 localhost kernel: [81280d9f] ? btrfs_copy_root+0xef/0x2a0
Sep 16 12:47:14 localhost kernel: [812f1853] ? create_reloc_root+0x1e3/0x2a0
Sep 16 12:47:14 localhost kernel: [812f7848] ? btrfs_init_reloc_root+0xb8/0xd0
Sep 16 12:47:14 localhost kernel: [812a708f] ? record_root_in_trans+0xaf/0x110
Sep 16 12:47:14 localhost kernel: [812a8496] ? btrfs_record_root_in_trans+0x46/0x80
Sep 16 12:47:14 localhost kernel: [812a98fc] ? start_transaction+0x8c/0x4f0
Sep 16 12:47:14 localhost kernel: [812b1168] ? btrfs_dirty_inode+0x58/0xe0
Sep 16 12:47:14 localhost kernel: [8113b382] ? touch_atime+0x152/0x160
Sep 16 12:47:14 localhost kernel: [810e3eb5] ? generic_file_read_iter+0x545/0x5a0
Sep 16 12:47:14 localhost kernel: [810a1d49] ? remove_wait_queue+0x19/0x60
Sep 16 12:47:14 localhost kernel: [810a1bc4] ? prepare_to_wait+0x24/0x90
Sep 16 12:47:14
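One way to read the repeated "errs:" records in the log above is to compare the counters between the first and last record. The sketch below is a hypothetical helper, not part of btrfs-progs; it assumes the record layout shown in the log and uses two of the logged lines as sample input.

```python
import re

# Layout assumed from the kernel messages quoted in this thread.
ERRS = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_errs(log_text):
    """Return one dict per 'errs:' record: device name plus integer counters."""
    records = []
    for match in ERRS.finditer(log_text):
        fields = match.groupdict()
        records.append({k: (v if k == "dev" else int(v)) for k, v in fields.items()})
    return records


def changed_counters(records):
    """Return {counter: delta} for counters that differ between first and last record."""
    first, last = records[0], records[-1]
    return {k: last[k] - first[k] for k in first if k != "dev" and last[k] != first[k]}


sample = """\
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2411, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2420, rd 0, flush 38, corrupt 137167, gen 25
"""
print(changed_counters(parse_errs(sample)))  # prints {'wr': 9}
```

Only the write counter moves during the balance, which matches the report that btrfs keeps trying to write to the pulled device.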
Re: btrfs balance enospc
The smart stats on the disk are fine. The /dev/sdc messages are from me playing around and pulling out the drive. btrfs fi show shows the drive as missing, yet btrfs is still trying to write to it.

Basically my goal is to remove this drive and stick it in another box, and I can't get btrfs to move all the data off of it due to enospc.

I'm gonna try to move some data off and remove/balance again.

On 09/16/14 13:26, Chris Murphy wrote:

> Better to use btrfs replace. But sequence wise you should do btrfs device delete missing, which should then effectively do a balance to the newly added device. So while the sequence isn't really correct, that's probably not why you're getting this failure.
>
>> Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2411, rd 0, flush 38, corrupt 137167, gen 25
>
> Please post results of smartctl -x /dev/sdc
>
> I'd expect with Btrfs having problems writing to a device, that there'd be libata messages related to this also. Do you have earlier kernel messages indicating the drive or controller are reporting errors?
RE: btrfs balance enospc
> From: li...@colorremedies.com
> Date: Tue, 16 Sep 2014 11:26:16 -0600
>
> On Sep 16, 2014, at 10:51 AM, Mark Murawski markm-li...@intellasoft.net wrote:
>
>> Playing around with this filesystem I hot-removed a device from the array and put in a replacement.
>>
>> Label: 'Root' uuid: d71404d4-468e-47d5-8f06-3b65fa7776aa
>>   Total devices 2 FS bytes used 7.43GiB
>>   devid 1 size 9.31GiB used 8.90GiB path /dev/sdc6
>>   devid 3 size 9.31GiB used 8.90GiB path /dev/disk/by-uuid/d71404d4-468e-47d5-8f06-3b65fa7776aa
>>
>> removed /dev/sdc
>>
>> Label: 'Root' uuid: d71404d4-468e-47d5-8f06-3b65fa7776aa
>>   Total devices 2 FS bytes used 7.43GiB
>>   devid 3 size 9.31GiB used 8.90GiB path /dev/disk/by-uuid/d71404d4-468e-47d5-8f06-3b65fa7776aa
>>   *** Some devices missing
>>
>> cartman {~} root# btrfs device add /dev/sdi6 /
>> cartman {~} root# btrfs fi show
>> Label: 'Root' uuid: d71404d4-468e-47d5-8f06-3b65fa7776aa
>>   Total devices 3 FS bytes used 7.43GiB
>>   devid 3 size 9.31GiB used 8.90GiB path /dev/disk/by-uuid/d71404d4-468e-47d5-8f06-3b65fa7776aa
>>   devid 4 size 10.00GiB used 0.00 path /dev/sdi6
>>   *** Some devices missing
>>
>> cartman {~} root# btrfs filesystem balance start /
>
> Better to use btrfs replace. But sequence wise you should do btrfs device delete missing, which should then effectively do a balance to the newly added device. So while the sequence isn't really correct, that's probably not why you're getting this failure.

Does/should a balance imply removal of missing devices (as long as the minimum number of devices are still available)?

>> Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2411, rd 0, flush 38, corrupt 137167, gen 25
>
> Please post results of smartctl -x /dev/sdc
>
>> Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2412, rd 0, flush 38, corrupt 137167, gen 25
>> Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2413, rd 0, flush 38, corrupt 137167, gen 25
>> Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2414, rd 0, flush 38, corrupt 137167, gen 25
>> Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2415, rd 0, flush 38, corrupt 137167, gen 25
>> Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2416, rd 0, flush 38, corrupt 137167, gen 25
>> Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2417, rd 0, flush 38, corrupt 137167, gen 25
>> Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2418, rd 0, flush 38, corrupt 137167, gen 25
>> Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2419, rd 0, flush 38, corrupt 137167, gen 25
>> Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2420, rd 0, flush 38, corrupt 137167, gen 25
>> Sep 16 12:47:14 localhost kernel: BTRFS: lost page write due to I/O error on /dev/sdc6
>> Sep 16 12:47:14 localhost kernel: BTRFS: lost page write due to I/O error on /dev/sdc6
>
> I'd expect with Btrfs having problems writing to a device, that there'd be libata messages related to this also. Do you have earlier kernel messages indicating the drive or controller are reporting errors?
Re: btrfs balance enospc
On Sep 16, 2014, at 1:54 PM, Mark Murawski markm-li...@intellasoft.net wrote:

> The smart stats on the disk are fine. The /dev/sdc messages are from me playing around and pulling out the drive. btrfs fi show, shows the drive as missing, yet it's still trying to write to it.

That's known.

> Basically my goal is to remove this drive and stick it in another box and I can't get btrfs to move all the data off of it due to enospc.

If you can mount it rw, even degraded, you can make ro snapshots of things you want to keep, and then btrfs send to a new volume.

Chris Murphy
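The snapshot-and-send route suggested above can be laid out as a dry-run plan. This is a sketch only: the helper name, the snapshot directory and the target mount point are hypothetical, nothing is executed, and it assumes the snapshot directory is on the source filesystem and the target is a mounted btrfs volume.

```python
import shlex


def send_plan(subvolumes, snapshot_dir, target_mount):
    """Return shell command lines: a read-only snapshot, then send/receive, per subvolume."""
    lines = []
    for subvol in subvolumes:
        # Derive a flat snapshot name from the subvolume path ("/" becomes "root").
        name = subvol.strip("/").replace("/", "_") or "root"
        snap = f"{snapshot_dir.rstrip('/')}/{name}.ro"
        # btrfs send only accepts read-only snapshots, hence -r.
        lines.append(shlex.join(["btrfs", "subvolume", "snapshot", "-r", subvol, snap]))
        lines.append(
            shlex.join(["btrfs", "send", snap])
            + " | "
            + shlex.join(["btrfs", "receive", target_mount])
        )
    return lines


for line in send_plan(["/", "/home"], "/snapshots", "/mnt/new"):
    print(line)
```

Taking a snapshot needs a writable mount, which is why the advice is conditional on being able to mount rw.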
Re: btrfs balance enospc
On 15/09/14, Mark Murawski wrote:

> I should have plenty of space for this operation, but it fails [...]

This might be useful: https://btrfs.wiki.kernel.org/index.php/Balance_Filters

Regards,
Leonidas
Re: btrfs balance enospc
btrfs balance start -dusage=86 /
Done, had to relocate 1 out of 13 chunks

cartman {~} root# btrfs fi df /
Data, RAID1: total=7.03GiB, used=7.01GiB
System, RAID1: total=32.00MiB, used=4.00KiB
Metadata, RAID1: total=1.00GiB, used=438.88MiB
unknown, single: total=148.00MiB, used=0.00

What's this 'unknown' section? Maybe this is the problem? How do I get rid of it?

I still get enospc after a balance with a filter, and then a regular balance:

cartman {~} root# btrfs balance start /
ERROR: error during balancing '/' - No space left on device

On 09/15/14 13:07, Leonidas Spyropoulos wrote:

> On 15/09/14, Mark Murawski wrote:
>
>> I should have plenty of space for this operation, but it fails [...]
>
> This might be useful: https://btrfs.wiki.kernel.org/index.php/Balance_Filters
>
> Regards,
> Leonidas
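To see how little slack the allocated chunks have, the btrfs fi df output above can be parsed and the difference between total and used computed per block-group type. The parser below is a hypothetical helper written against the exact layout shown in this message; other btrfs-progs versions may print the table differently.

```python
import re

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def to_bytes(text):
    """Convert a size such as '7.03GiB', '4.00KiB' or a bare '0.00' to bytes."""
    match = re.fullmatch(r"([0-9.]+)(B|[KMGT]iB)?", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2) or "B"]


def parse_fi_df(output):
    """Map block-group type to (profile, total_bytes, used_bytes)."""
    rows = {}
    for line in output.splitlines():
        match = re.match(r"\s*(\w+), (\w+): total=([^,\s]+), used=([^,\s]+)", line)
        if match:
            kind, profile, total, used = match.groups()
            rows[kind] = (profile, to_bytes(total), to_bytes(used))
    return rows


sample = """\
Data, RAID1: total=7.03GiB, used=7.01GiB
System, RAID1: total=32.00MiB, used=4.00KiB
Metadata, RAID1: total=1.00GiB, used=438.88MiB
unknown, single: total=148.00MiB, used=0.00
"""
for kind, (profile, total, used) in parse_fi_df(sample).items():
    print(f"{kind:<9} {profile:<7} slack={(total - used) / 2**20:8.2f} MiB")
```

With these figures the data chunks are almost completely full, so a balance has very little room inside existing chunks to relocate data into.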
Re: btrfs balance enospc
On Sep 15, 2014, at 11:37 AM, Mark Murawski markm-li...@intellasoft.net wrote:

> btrfs balance start -dusage=86 /
> Done, had to relocate 1 out of 13 chunks
>
> cartman {~} root# btrfs fi df /
> Data, RAID1: total=7.03GiB, used=7.01GiB
> System, RAID1: total=32.00MiB, used=4.00KiB
> Metadata, RAID1: total=1.00GiB, used=438.88MiB
> unknown, single: total=148.00MiB, used=0.00
>
> What's this 'unknown' section? Maybe this is the problem? How do I get rid of it?

It's not a problem, it's cosmetic for now, something introduced in kernel 3.15 but progs doesn't yet give a label for.

> I still get enospc after a balance with a filter, and then a regular balance:
>
> cartman {~} root# btrfs balance start /
> ERROR: error during balancing '/' - No space left on device

Maybe try mount option enospc_debug and retry, see if you get more information in dmesg. I'm not sure if a balance in this case wants to create a new data and metadata chunk (on each device), or if it can start without creating any chunks. If it wants to create new chunks, it's 1GiB for data, and 256MiB for metadata. That's 1.256GiB but you only have 1.25GiB unallocated on each device: size 9.31GiB minus used 8.06GiB.

Chris Murphy
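The headroom estimate above can be worked through in binary units. The sketch below takes the chunk sizes and device figures from the message as given; whether a balance really has to allocate both chunks up front is the open question raised there, not something this code establishes. The 0.005 GiB rounding allowance is an assumption based on the two-decimal figures that btrfs fi show prints.

```python
GIB = 2**30
MIB = 2**20

# Chunk sizes quoted in the message above (taken as given, not measured).
DATA_CHUNK = 1 * GIB
METADATA_CHUNK = 256 * MIB


def headroom(size_gib, allocated_gib, rounding=0.005):
    """Return (low, nominal, high) bytes left after one data and one metadata chunk.

    size_gib and allocated_gib are two-decimal figures as printed by
    'btrfs fi show', so each may be off by up to `rounding` GiB.
    """
    needed = DATA_CHUNK + METADATA_CHUNK
    nominal = (size_gib - allocated_gib) * GIB - needed
    spread = 2 * rounding * GIB
    return nominal - spread, nominal, nominal + spread


needed = DATA_CHUNK + METADATA_CHUNK
print(f"needed per device: {needed / GIB:.3f} GiB")
low, nominal, high = headroom(9.31, 8.06)
print(f"headroom range: {low / MIB:+.1f} to {high / MIB:+.1f} MiB")
```

In binary units 1 GiB plus 256 MiB comes to exactly 1.25 GiB, so the requirement and the reported unallocated space agree to within the rounding of the printed figures. Either way there is essentially no margin, which is consistent with the enospc result.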
Re: btrfs balance enospc
Chris Murphy posted on Mon, 15 Sep 2014 14:54:57 -0600 as excerpted:

>> I still get enospc after a balance with a filter, and then a regular balance:
>>
>> cartman {~} root# btrfs balance start /
>> ERROR: error during balancing '/' - No space left on device
>
> Maybe try mount option enospc_debug and retry, see if you get more information in dmesg. I'm not sure if a balance in this case wants to create a new data and metadata chunk (on each device), or if it can start without creating any chunks. If it wants to create new chunks, it's 1GiB for data, and 256MiB for metadata. That's 1.256GiB but you only have 1.25GiB unallocated on each device: size 9.31GiB minus used 8.06GiB.

Another possibility that has hit a few people: Did you (MM/OP, not CM) convert that filesystem from ext* to btrfs? If so, read on. If not, this doesn't apply, so you may skip it.

Assuming such a conversion, did you delete the subvolume containing the original ext* yet, or not? If not, that may be the problem, because that subvolume must be left intact in order to allow rollback to ext* if desired. If you know you won't be rolling back, delete the ext* reserved subvolume as described on the wiki.

Meanwhile, after deleting that subvolume, be sure to complete the defrag and balance as suggested on the wiki, because failing to do so can lead to other problems later. Basically, the biggest extent size supported by btrfs is 1 GiB, the size of a btrfs data chunk, while ext* supports larger (unlimited size?) extents. Failing to complete the defrag in particular as suggested can mean large files with extents > 1 GiB in size, which gives btrfs balance indigestion since it expects to see only 1 GiB or smaller extents.

Several folks who converted from ext3/4 have reported failed balances due to these too-large extents. Fixing the problem later can require manually moving, one by one, all files large enough to be candidates for the problem (thus files > 1 GiB) out of btrfs and back in, which results in properly chunk-split extents when the file is moved back to btrfs.

Everyone who has reported this problem so far has also reported that the move-out-and-back-in process solved the problem for them, but if there are lots of such files it can be a pain. Doing the defrag on the formerly ext* files before starting to use the now-btrfs for other things is definitely preferred, ESPECIALLY before trying to snapshot affected subvolumes, since that locks the problem in place until those snapshots are deleted. =:^)
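Finding the candidate files for the move-out-and-back-in fix described above can be sketched as a size scan. This is a hypothetical helper, not a btrfs tool: it looks at file size only, which cannot show how large the individual extents are, so it only narrows the list. The 1 GiB default threshold follows the message; pass a different value if needed.

```python
import os
import stat

GIB = 2**30


def files_at_least(root, min_bytes=GIB):
    """Return (size, path) pairs for regular files under root, largest first.

    Only files whose size is at least min_bytes are listed. Symlinks are not
    followed, and files that vanish or cannot be examined are skipped.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable; ignore and keep scanning
            if stat.S_ISREG(info.st_mode) and info.st_size >= min_bytes:
                found.append((info.st_size, path))
    return sorted(found, reverse=True)
```

Calling files_at_least("/") on the converted filesystem would list everything of 1 GiB or more; the scan also descends into any other filesystems mounted below the starting directory, so it is best pointed at the btrfs mount point itself.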
Re: btrfs balance enospc
On Sep 15, 2014, at 6:08 PM, Duncan 1i5t5.dun...@cox.net wrote:

> Assuming such a conversion, did you delete the subvolume containing the original ext* yet, or not? If not, that may be the problem, because that subvolume must be left intact in order to allow rollback to ext* if desired. If you know you won't be rolling back, delete the ext* reserved subvolume as described on the wiki.

Good catch, I always forget this.

Chris Murphy
Re: btrfs balance enospc
I wish I could follow your procedure, but this wasn't an ext conversion. I made this with mkfs for btrfs with kernel circa 3.8ish.

On 09/15/14 20:08, Duncan wrote:

> Chris Murphy posted on Mon, 15 Sep 2014 14:54:57 -0600 as excerpted:
>
>>> I still get enospc after a balance with a filter, and then a regular balance:
>>>
>>> cartman {~} root# btrfs balance start /
>>> ERROR: error during balancing '/' - No space left on device
>>
>> Maybe try mount option enospc_debug and retry, see if you get more information in dmesg. I'm not sure if a balance in this case wants to create a new data and metadata chunk (on each device), or if it can start without creating any chunks. If it wants to create new chunks, it's 1GiB for data, and 256MiB for metadata. That's 1.256GiB but you only have 1.25GiB unallocated on each device: size 9.31GiB minus used 8.06GiB.
>
> Another possibility that has hit a few people: Did you (MM/OP, not CM) convert that filesystem from ext* to btrfs? If so, read on. If not, this doesn't apply, so you may skip it.
>
> Assuming such a conversion, did you delete the subvolume containing the original ext* yet, or not? If not, that may be the problem, because that subvolume must be left intact in order to allow rollback to ext* if desired. If you know you won't be rolling back, delete the ext* reserved subvolume as described on the wiki.
>
> Meanwhile, after deleting that subvolume, be sure to complete the defrag and balance as suggested on the wiki, because failing to do so can lead to other problems later. Basically, the biggest extent size supported by btrfs is 1 GiB, the size of a btrfs data chunk, while ext* supports larger (unlimited size?) extents. Failing to complete the defrag in particular as suggested can mean large files with extents > 1 GiB in size, which gives btrfs balance indigestion since it expects to see only 1 GiB or smaller extents.
>
> Several folks who converted from ext3/4 have reported failed balances due to these too-large extents. Fixing the problem later can require manually moving, one by one, all files large enough to be candidates for the problem (thus files > 1 GiB) out of btrfs and back in, which results in properly chunk-split extents when the file is moved back to btrfs.
>
> Everyone who has reported this problem so far has also reported that the move-out-and-back-in process solved the problem for them, but if there are lots of such files it can be a pain. Doing the defrag on the formerly ext* files before starting to use the now-btrfs for other things is definitely preferred, ESPECIALLY before trying to snapshot affected subvolumes, since that locks the problem in place until those snapshots are deleted. =:^)