Re: balancing every night broke balancing so now I can't balance anymore?
On 2017-05-15 04:14, Hugo Mills wrote:
> On Sun, May 14, 2017 at 04:16:52PM -0700, Marc MERLIN wrote:
>> On Sun, May 14, 2017 at 09:21:11PM +, Hugo Mills wrote:
>>>> 2) balance -musage=0
>>>> 3) balance -musage=20
>>>
>>> In most cases, this is going to make ENOSPC problems worse, not
>>> better. The reason for doing this kind of balance is to recover unused
>>> space and allow it to be reallocated. The typical behaviour is that
>>> data gets overallocated, and it's metadata which runs out. So, the
>>> last thing you want to be doing is reducing the metadata allocation,
>>> because that's the scarce resource.
>>>
>>> Also, I'd usually recommend using limit=n, where n is approximately
>>> the amount of data overallocation (allocated space less used
>>> space). It's much more controllable than usage.
>>
>> Thanks for that.
>> So, would you just remove the balance -musage=20 altogether?
>
> Yes.

The advantages to doing that depend also on how much excess free space
you have and what your usual usage is. If you're balancing a filesystem
for a mail server that has lots of free space, you may indeed want to
re-balance metadata chunks regularly because you're likely to be
rewriting significant amounts of metadata regularly.

>> As for limit= I'm not sure if it would be helpful since I run this
>> nightly. Anything that doesn't get done tonight due to limit, would be
>> done tomorrow?
>
> I'm suggesting limit= on its own. It's a fixed amount of work
> compared to usage=, which may not do anything at all. For example,
> it's perfectly possible to have a filesystem which is, say, 30% full,
> and yet is still a fully-allocated filesystem with more than 20% of
> every chunk used. In that case your usage= wouldn't balance anything,
> and you'd still be left in the situation of risking ENOSPC from
> running out of metadata.

FWIW, I normally use '-dusage=80 -mlimit=16' for my nightly balances.
The usage filter at 80% means you won't waste time re-balancing full or
mostly full chunks, and the limit filter of 16 takes on average about 5
minutes on the consumer SSDs I have.

> All you need to do is ensure that you have enough unallocated space
> for the metadata to expand into if it needs to. That's the ultimate
> goal of all this.
>
> If you have SSDs, it may also be beneficial to use nossd as a mount
> option, because that seems to have some pathology in overallocating
> chunks in normal usage. Hans investigated this in detail a month or
> two ago.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
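As a rough illustration of the limit=n sizing quoted above, here is a minimal Python sketch. It assumes the filter is applied to data chunks (-dlimit) and that data chunks are about 1 GiB each, which is common but not guaranteed. The function name suggest_data_limit is made up for this example, and the figures are the Data,single numbers from the btrfs fi usage output posted at the start of the thread.

```python
import math


def suggest_data_limit(size_gib, used_gib, chunk_gib=1.0):
    """Return a candidate for limit=n: overallocated data space, in chunks.

    chunk_gib is an assumption (1 GiB data chunks), not something btrfs
    reports in the usage output.
    """
    overallocated = max(size_gib - used_gib, 0.0)
    return math.floor(overallocated / chunk_gib)


# Data,single: Size:221.60GiB, Used:166.28GiB
n = suggest_data_limit(221.60, 166.28)
print(f"btrfs balance start -dlimit={n} /mnt/btrfs_pool1")
```

For a nightly job one would probably cap this at a much smaller value, along the lines of the -mlimit=16 mentioned above, so that each run does a bounded amount of work.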
Re: balancing every night broke balancing so now I can't balance anymore?
On 15/05/2017 at 10:14, Hugo Mills wrote:
> [...]
>> As for limit= I'm not sure if it would be helpful since I run this
>> nightly. Anything that doesn't get done tonight due to limit, would be
>> done tomorrow?
> I'm suggesting limit= on its own. It's a fixed amount of work
> compared to usage=, which may not do anything at all. For example,
> it's perfectly possible to have a filesystem which is, say, 30% full,
> and yet is still a fully-allocated filesystem with more than 20% of
> every chunk used. In that case your usage= wouldn't balance anything,
> and you'd still be left in the situation of risking ENOSPC from
> running out of metadata.

Hugo, as I haven't had any feedback on my approach to this problem,
could you have a look at my script, or simply at the principle: is there
any drawback, compared with using limit, in calling balance multiple
times with an increasing usage value (the same value for data and
metadata) until you get enough free space?

For reference:
https://github.com/jtek/ceph-utils/blob/master/btrfs-auto-rebalance.rb

Lionel
Re: balancing every night broke balancing so now I can't balance anymore?
On Sun, May 14, 2017 at 04:16:52PM -0700, Marc MERLIN wrote:
> On Sun, May 14, 2017 at 09:21:11PM +, Hugo Mills wrote:
> > > 2) balance -musage=0
> > > 3) balance -musage=20
> >
> >    In most cases, this is going to make ENOSPC problems worse, not
> > better. The reason for doing this kind of balance is to recover unused
> > space and allow it to be reallocated. The typical behaviour is that
> > data gets overallocated, and it's metadata which runs out. So, the
> > last thing you want to be doing is reducing the metadata allocation,
> > because that's the scarce resource.
> >
> >    Also, I'd usually recommend using limit=n, where n is approximately
> > the amount of data overallocation (allocated space less used
> > space). It's much more controllable than usage.
>
> Thanks for that.
> So, would you just remove the balance -musage=20 altogether?

   Yes.

> As for limit= I'm not sure if it would be helpful since I run this
> nightly. Anything that doesn't get done tonight due to limit, would be
> done tomorrow?

   I'm suggesting limit= on its own. It's a fixed amount of work
compared to usage=, which may not do anything at all. For example,
it's perfectly possible to have a filesystem which is, say, 30% full,
and yet is still a fully-allocated filesystem with more than 20% of
every chunk used. In that case your usage= wouldn't balance anything,
and you'd still be left in the situation of risking ENOSPC from
running out of metadata.

   All you need to do is ensure that you have enough unallocated space
for the metadata to expand into if it needs to. That's the ultimate
goal of all this.

   If you have SSDs, it may also be beneficial to use nossd as a mount
option, because the ssd option seems to have some pathology in
overallocating chunks in normal usage. Hans investigated this in
detail a month or two ago.

   Hugo.

--
Hugo Mills             | "You know, the British have always been nice to mad
hugo@... carfax.org.uk | people."
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                      Laura Jesson, Brief Encounter
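Hugo's example can be made concrete with a small Python sketch. The chunk fill levels below are invented for illustration (a fully allocated filesystem, roughly 30% full, with every chunk between 21% and 39% used). The helper names are hypothetical, and the code only models the two filters; it does not call btrfs.

```python
def selected_by_usage(chunk_usage_pct, usage):
    """Model of usage=N: pick only chunks that are less than N percent full."""
    return [u for u in chunk_usage_pct if u < usage]


def selected_by_limit(chunk_usage_pct, limit):
    """Model of limit=N: pick at most N chunks, whatever their fill level."""
    return chunk_usage_pct[:limit]


# Invented fill levels for 100 chunks: all between 21% and 39% used.
chunks = [21 + (i % 19) for i in range(100)]

print(sum(chunks) / len(chunks))           # average fill, close to 30%
print(len(selected_by_usage(chunks, 20)))  # usage=20 matches no chunk at all
print(len(selected_by_limit(chunks, 16)))  # limit=16 always finds work to do
```

The point of the model is only that a usage threshold can select nothing on a fully allocated filesystem, while a limit always relocates a fixed number of chunks.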
Re: balancing every night broke balancing so now I can't balance anymore?
On Sun, May 14, 2017 at 09:21:11PM +, Hugo Mills wrote:
> > 2) balance -musage=0
> > 3) balance -musage=20
>
>    In most cases, this is going to make ENOSPC problems worse, not
> better. The reason for doing this kind of balance is to recover unused
> space and allow it to be reallocated. The typical behaviour is that
> data gets overallocated, and it's metadata which runs out. So, the
> last thing you want to be doing is reducing the metadata allocation,
> because that's the scarce resource.
>
>    Also, I'd usually recommend using limit=n, where n is approximately
> the amount of data overallocation (allocated space less used
> space). It's much more controllable than usage.

Thanks for that.
So, would you just remove the balance -musage=20 altogether?

As for limit= I'm not sure if it would be helpful since I run this
nightly. Anything that doesn't get done tonight due to limit, would be
done tomorrow?

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: balancing every night broke balancing so now I can't balance anymore?
On 14/05/2017 at 23:30, Kai Krakow wrote:
> On Sun, 14 May 2017 22:57:26 +0200,
> Lionel Bouton wrote:
>
>> I've coded one Ruby script which tries to balance between the cost of
>> reallocating group and the need for it.[...]
>> Given its current size, I should probably push it on github...
> Yes, please... ;-)

Most of our BTRFS filesystems are used by Ceph OSD, so here it is:

https://github.com/jtek/ceph-utils/blob/master/btrfs-auto-rebalance.rb

Best regards,

Lionel
Re: balancing every night broke balancing so now I can't balance anymore?
On Sun, 14 May 2017 22:57:26 +0200, Lionel Bouton wrote:

> I've coded one Ruby script which tries to balance between the cost of
> reallocating group and the need for it. The basic idea is that it
> tries to keep the proportion of free space "wasted" by being allocated
> although it isn't used below a threshold. It will bring this
> proportion down enough through balance that minor reallocation won't
> trigger a new balance right away. It should handle pathological
> conditions as well as possible and it won't spend more than 2 hours
> working on a single filesystem by default. We deploy this as a daily
> cron script through Puppet on all our systems and it works very well
> (I didn't have to use balance manually to manage free space since we
> did that). Note that by default it sleeps a random amount of time to
> avoid IO spikes on VMs running on the same host. You can either edit
> it or pass it "0" which will be used for the max amount of time to
> sleep bypassing this precaution.
>
> Here is the latest version : https://pastebin.com/Rrw1GLtx
> Given its current size, I should probably push it on github...

Yes, please... ;-)

> I've seen other maintenance scripts mentioned on this list so you
> might find something simpler or more targeted to your needs by browsing
> through the list's history.

--
Regards,
Kai

Replies to list-only preferred.
Re: balancing every night broke balancing so now I can't balance anymore?
On Sun, May 14, 2017 at 01:15:09PM -0700, Marc MERLIN wrote:
> On Sun, May 14, 2017 at 09:13:35PM +0200, Hans van Kranenburg wrote:
> > On 05/13/2017 10:54 PM, Marc MERLIN wrote:
> > > Kernel 4.11, btrfs-progs v4.7.3
> > >
> > > I run scrub and balance every night, been doing this for 1.5 years on this
> > > filesystem.
> >
> > What are the exact commands you run every day?
>
> http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html
> (at the bottom)
> every night:
> 1) scrub
> 2) balance -musage=0
> 3) balance -musage=20

   In most cases, this is going to make ENOSPC problems worse, not
better. The reason for doing this kind of balance is to recover unused
space and allow it to be reallocated. The typical behaviour is that
data gets overallocated, and it's metadata which runs out. So, the
last thing you want to be doing is reducing the metadata allocation,
because that's the scarce resource.

   Also, I'd usually recommend using limit=n, where n is approximately
the amount of data overallocation (allocated space less used
space). It's much more controllable than usage.

   Hugo.

> 4) balance -dusage=0
> 5) balance -dusage=20
>
> > > How did I get into such a misbalanced state when I balance every night?
> >
> > I don't know, since I don't know what you do exactly. :)
>
> Now you do :)
>
> > > My filesystem is not full, I can write just fine, but I sure cannot
> > > rebalance now.
> >
> > Yes, because you have quite some allocated but unused space. If btrfs
> > cannot just allocate more chunks, it starts trying a bit harder to reuse
> > all the empty spots in the already existing chunks.
>
> Ok. shouldn't balance fix problems just like this?
> I have 60GB-ish free, or in this case that's also >25%, that's a lot
>
> Speaking of unallocated, I have more now:
> Device unallocated:            993.00MiB
>
> This kind of just magically fixed itself during snapshot rotation and
> deletion I think.
> Sure enough, balance works again, but this feels pretty fragile.
> Looking again:
>     Device size:                 228.67GiB
>     Device allocated:            227.70GiB
>     Device unallocated:          993.00MiB
>     Free (estimated):             58.53GiB      (min: 58.53GiB)
>
> You're saying that I need unallocated space for new chunks to be
> created, which is required by balance.
> Should btrfs not take care of keeping some space for me?
> Shouldn't a nightly balance, which I'm already doing, help even more
> with this?
>
> > > Besides adding another device to add space, is there a way around this
> > > and more generally not getting into that state anymore considering that
> > > I already rebalance every night?
> >
> > Add monitoring and alerting on the amount of unallocated space.
> >
> > FWIW, this is what I use for that purpose:
> >
> > https://packages.debian.org/sid/munin-plugins-btrfs
> > https://packages.debian.org/sid/monitoring-plugins-btrfs
> >
> > And, of course the btrfs-heatmap program keeps being a fun tool to
> > create visual timelapses of your filesystem, so you can learn how your
> > usage pattern is resulting in allocation of space by btrfs, and so that
> > you can visually see what the effect of your btrfs balance attempts is:
>
> That's interesting, but ultimately, users shouldn't have to micromanage
> their filesystem to that level, even btrfs.
>
> a) What is wrong in my nightly script that I should fix/improve?
> b) How do I recover from my current state?
>
> Thanks,
> Marc

--
Hugo Mills             | You stay in the theatre because you're afraid of
hugo@... carfax.org.uk | having no money? There's irony...
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                  Slings and Arrows
Re: balancing every night broke balancing so now I can't balance anymore?
On Sun, 14 May 2017 13:15:09 -0700, Marc MERLIN wrote:

> On Sun, May 14, 2017 at 09:13:35PM +0200, Hans van Kranenburg wrote:
> > On 05/13/2017 10:54 PM, Marc MERLIN wrote:
> > > Kernel 4.11, btrfs-progs v4.7.3
> > >
> > > I run scrub and balance every night, been doing this for 1.5
> > > years on this filesystem.
> >
> > What are the exact commands you run every day?
>
> http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html
> (at the bottom)
> every night:
> 1) scrub
> 2) balance -musage=0
> 3) balance -musage=20
> 4) balance -dusage=0
> 5) balance -dusage=20
>
> > > How did I get into such a misbalanced state when I balance every
> > > night?
> >
> > I don't know, since I don't know what you do exactly. :)
>
> Now you do :)
>
> > > My filesystem is not full, I can write just fine, but I sure
> > > cannot rebalance now.
> >
> > Yes, because you have quite some allocated but unused space. If
> > btrfs cannot just allocate more chunks, it starts trying a bit
> > harder to reuse all the empty spots in the already existing
> > chunks.
>
> Ok. shouldn't balance fix problems just like this?
> I have 60GB-ish free, or in this case that's also >25%, that's a lot
>
> Speaking of unallocated, I have more now:
> Device unallocated:            993.00MiB
>
> This kind of just magically fixed itself during snapshot rotation and
> deletion I think.
> Sure enough, balance works again, but this feels pretty fragile.
> Looking again:
>     Device size:                 228.67GiB
>     Device allocated:            227.70GiB
>     Device unallocated:          993.00MiB
>     Free (estimated):             58.53GiB      (min: 58.53GiB)
>
> You're saying that I need unallocated space for new chunks to be
> created, which is required by balance.
> Should btrfs not take care of keeping some space for me?
> Shouldn't a nightly balance, which I'm already doing, help even more
> with this?
>
> > > Besides adding another device to add space, is there a way around
> > > this and more generally not getting into that state anymore
> > > considering that I already rebalance every night?
> >
> > Add monitoring and alerting on the amount of unallocated space.
> >
> > FWIW, this is what I use for that purpose:
> >
> > https://packages.debian.org/sid/munin-plugins-btrfs
> > https://packages.debian.org/sid/monitoring-plugins-btrfs
> >
> > And, of course the btrfs-heatmap program keeps being a fun tool to
> > create visual timelapses of your filesystem, so you can learn how
> > your usage pattern is resulting in allocation of space by btrfs,
> > and so that you can visually see what the effect of your btrfs
> > balance attempts is:
>
> That's interesting, but ultimately, users shouldn't have to
> micromanage their filesystem to that level, even btrfs.
>
> a) What is wrong in my nightly script that I should fix/improve?

You may want to try
https://www.spinics.net/lists/linux-btrfs/msg52076.html

> b) How do I recover from my current state?

That script may work its way through.

--
Regards,
Kai

Replies to list-only preferred.
Re: balancing every night broke balancing so now I can't balance anymore?
On 14/05/2017 at 22:15, Marc MERLIN wrote:
> On Sun, May 14, 2017 at 09:13:35PM +0200, Hans van Kranenburg wrote:
>> On 05/13/2017 10:54 PM, Marc MERLIN wrote:
>>> Kernel 4.11, btrfs-progs v4.7.3
>>>
>>> I run scrub and balance every night, been doing this for 1.5 years on this
>>> filesystem.
>> What are the exact commands you run every day?
>
> http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html
> (at the bottom)
> every night:
> 1) scrub
> 2) balance -musage=0
> 3) balance -musage=20
> 4) balance -dusage=0
> 5) balance -dusage=20

usage=20 is pretty low: it means you don't try to reallocate and regroup
block groups that are more than 20% full. Constantly using this setting
has left lots of allocated block groups that are mostly empty on your
filesystem (a little more than 20% used).

The rebalance subject is a bit complex. With an empty filesystem you
hardly need it, as group creation is sparse and it's OK to have mostly
empty groups. When your filesystem begins to fill up you have to raise
the usage target to be able to reclaim space (as the fs fills up, most
of your groups do too) so that new block group creation can happen.

I've coded one Ruby script which tries to balance between the cost of
reallocating groups and the need for it. The basic idea is that it tries
to keep the proportion of free space "wasted" by being allocated
although it isn't used below a threshold. It will bring this proportion
down enough through balance that minor reallocation won't trigger a new
balance right away. It should handle pathological conditions as well as
possible and it won't spend more than 2 hours working on a single
filesystem by default. We deploy this as a daily cron script through
Puppet on all our systems and it works very well (I haven't had to use
balance manually to manage free space since we did that).
Note that by default it sleeps a random amount of time to avoid IO
spikes on VMs running on the same host. You can either edit it or pass
it "0", which will be used as the maximum sleep time, bypassing this
precaution.

Here is the latest version: https://pastebin.com/Rrw1GLtx
Given its current size, I should probably push it on github...

I've seen other maintenance scripts mentioned on this list, so you might
find something simpler or more targeted to your needs by browsing
through the list's history.

Best regards,

Lionel
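The principle Lionel describes can be sketched roughly as follows in Python. This is not his Ruby script: the threshold, the usage steps and all function names are assumptions picked for illustration. The sketch only computes the "wasted" proportion (space allocated to chunks but unused, relative to total free space) and the usage values a driver loop might try in turn.

```python
def wasted_fraction(device_size, allocated, used):
    """Share of the free space that is locked inside allocated chunks.

    All three arguments are in the same unit (GiB here).
    """
    free = device_size - used
    if free <= 0:
        return 0.0
    return (allocated - used) / free


def usage_targets(device_size, allocated, used,
                  threshold=0.5, step=10, max_usage=90):
    """Usage values to try in increasing order, or [] if no balance is needed.

    threshold, step and max_usage are hypothetical defaults.
    """
    if wasted_fraction(device_size, allocated, used) <= threshold:
        return []
    return list(range(step, max_usage + 1, step))


# Figures from the filesystem discussed in this thread (GiB).
print(wasted_fraction(228.67, 228.67, 171.25))
print(usage_targets(228.67, 228.67, 171.25))
```

A real driver would run balance with each value for both -dusage and -musage, re-read the usage figures after each pass, and stop as soon as the fraction drops below the threshold or a time budget (2 hours by default in the script described above) runs out.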
Re: balancing every night broke balancing so now I can't balance anymore?
On Sun, May 14, 2017 at 09:13:35PM +0200, Hans van Kranenburg wrote:
> On 05/13/2017 10:54 PM, Marc MERLIN wrote:
> > Kernel 4.11, btrfs-progs v4.7.3
> >
> > I run scrub and balance every night, been doing this for 1.5 years on this
> > filesystem.
>
> What are the exact commands you run every day?

http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html
(at the bottom)
every night:
1) scrub
2) balance -musage=0
3) balance -musage=20
4) balance -dusage=0
5) balance -dusage=20

> > How did I get into such a misbalanced state when I balance every night?
>
> I don't know, since I don't know what you do exactly. :)

Now you do :)

> > My filesystem is not full, I can write just fine, but I sure cannot
> > rebalance now.
>
> Yes, because you have quite some allocated but unused space. If btrfs
> cannot just allocate more chunks, it starts trying a bit harder to reuse
> all the empty spots in the already existing chunks.

Ok. Shouldn't balance fix problems just like this?
I have 60GB-ish free, or in this case that's also >25%, that's a lot.

Speaking of unallocated, I have more now:
Device unallocated:            993.00MiB

This kind of just magically fixed itself during snapshot rotation and
deletion I think.
Sure enough, balance works again, but this feels pretty fragile.
Looking again:
    Device size:                 228.67GiB
    Device allocated:            227.70GiB
    Device unallocated:          993.00MiB
    Free (estimated):             58.53GiB      (min: 58.53GiB)

You're saying that I need unallocated space for new chunks to be
created, which is required by balance.
Should btrfs not take care of keeping some space for me?
Shouldn't a nightly balance, which I'm already doing, help even more
with this?

> > Besides adding another device to add space, is there a way around this
> > and more generally not getting into that state anymore considering that
> > I already rebalance every night?
>
> Add monitoring and alerting on the amount of unallocated space.
>
> FWIW, this is what I use for that purpose:
>
> https://packages.debian.org/sid/munin-plugins-btrfs
> https://packages.debian.org/sid/monitoring-plugins-btrfs
>
> And, of course the btrfs-heatmap program keeps being a fun tool to
> create visual timelapses of your filesystem, so you can learn how your
> usage pattern is resulting in allocation of space by btrfs, and so that
> you can visually see what the effect of your btrfs balance attempts is:

That's interesting, but ultimately, users shouldn't have to micromanage
their filesystem to that level, even btrfs.

a) What is wrong in my nightly script that I should fix/improve?
b) How do I recover from my current state?

Thanks,
Marc
Re: balancing every night broke balancing so now I can't balance anymore?
On 05/13/2017 10:54 PM, Marc MERLIN wrote:
> Kernel 4.11, btrfs-progs v4.7.3
>
> I run scrub and balance every night, been doing this for 1.5 years on this
> filesystem.

What are the exact commands you run every day?

> But it has just started failing:
> [...]
> saruman:~# btrfs fi usage /mnt/btrfs_pool1/
> Overall:
>     Device size:                 228.67GiB
>     Device allocated:            228.67GiB
>     Device unallocated:            1.00MiB
>     Device missing:                  0.00B
>     Used:                        171.25GiB
>     Free (estimated):             55.32GiB      (min: 55.32GiB)
>     Data ratio:                       1.00
>     Metadata ratio:                   1.00
>     Global reserve:              512.00MiB      (used: 0.00B)
>
> Data,single: Size:221.60GiB, Used:166.28GiB
>    /dev/mapper/pool1     221.60GiB
>
> Metadata,single: Size:7.03GiB, Used:4.96GiB
>    /dev/mapper/pool1       7.03GiB
>
> System,single: Size:32.00MiB, Used:48.00KiB
>    /dev/mapper/pool1      32.00MiB
>
> Unallocated:
>    /dev/mapper/pool1       1.00MiB
>
> How did I get into such a misbalanced state when I balance every night?

I don't know, since I don't know what you do exactly. :)

> My filesystem is not full, I can write just fine, but I sure cannot
> rebalance now.

Yes, because you have quite some allocated but unused space. If btrfs
cannot just allocate more chunks, it starts trying a bit harder to reuse
all the empty spots in the already existing chunks.

> Besides adding another device to add space, is there a way around this
> and more generally not getting into that state anymore considering that
> I already rebalance every night?

Add monitoring and alerting on the amount of unallocated space.

FWIW, this is what I use for that purpose:

https://packages.debian.org/sid/munin-plugins-btrfs
https://packages.debian.org/sid/monitoring-plugins-btrfs

And, of course the btrfs-heatmap program keeps being a fun tool to
create visual timelapses of your filesystem, so you can learn how your
usage pattern is resulting in allocation of space by btrfs, and so that
you can visually see what the effect of your btrfs balance attempts is:

https://github.com/knorrie/btrfs-heatmap/
https://packages.debian.org/sid/btrfs-heatmap
https://apps.fedoraproject.org/packages/btrfs-heatmap
https://aur.archlinux.org/packages/python-btrfs-heatmap/

--
Hans van Kranenburg
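For readers who want to see what such a check boils down to, here is a minimal Python sketch of alerting on unallocated space. It is not the code of the plugins linked above. It parses the human-readable "Device unallocated" line of btrfs fi usage as shown in this thread; the function names and the 5 GiB threshold are assumptions made for the example.

```python
import re

# Binary unit suffixes as printed in the usage output shown in this thread.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def unallocated_bytes(usage_text):
    """Extract the 'Device unallocated' value from usage output, in bytes."""
    match = re.search(
        r"Device unallocated:\s*([\d.]+)\s*(B|KiB|MiB|GiB|TiB)", usage_text
    )
    if match is None:
        raise ValueError("no 'Device unallocated' line found")
    value, unit = match.groups()
    return int(float(value) * UNITS[unit])


def low_on_unallocated(usage_text, min_bytes=5 * 2**30):
    """True when unallocated space is below min_bytes (hypothetical 5 GiB)."""
    return unallocated_bytes(usage_text) < min_bytes


SAMPLE = """Overall:
    Device size:                 228.67GiB
    Device allocated:            228.67GiB
    Device unallocated:            1.00MiB
"""

if low_on_unallocated(SAMPLE):
    print("WARNING: little unallocated space left for new chunks")
```

In a cron job the text would come from running btrfs fi usage on the mount point rather than from a sample string, and the warning would go to whatever alerting channel is in use.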
Re: balancing every night broke balancing so now I can't balance anymore?
Marc MERLIN posted on Sat, 13 May 2017 13:54:31 -0700 as excerpted:

> Kernel 4.11, btrfs-progs v4.7.3
>
> I run scrub and balance every night, been doing this for 1.5 years on
> this filesystem.
> But it has just started failing:
> saruman:~# btrfs balance start -musage=0 /mnt/btrfs_pool1
> Done, had to relocate 0 out of 235 chunks
> saruman:~# btrfs balance start -dusage=0 /mnt/btrfs_pool1
> Done, had to relocate 0 out of 235 chunks

Those aren't failing (as you likely know, but to explain for others
following along); there's nothing to do as there are no entirely empty
chunks.

But...

> saruman:~# btrfs balance start -musage=1 /mnt/btrfs_pool1
> ERROR: error during balancing '/mnt/btrfs_pool1':
> No space left on device
> saruman:~# btrfs balance start -dusage=10 /mnt/btrfs_pool1
> Done, had to relocate 0 out of 235 chunks
> saruman:~# btrfs balance start -dusage=20 /mnt/btrfs_pool1
> ERROR: error during balancing '/mnt/btrfs_pool1':
> No space left on device

... Errors there. ENOSPC

[from dmesg]
> BTRFS info (device dm-2): 1 enospc errors during balance
> BTRFS info (device dm-2): relocating block group 598566305792 flags data
> BTRFS info (device dm-2): 1 enospc errors during balance
> BTRFS info (device dm-2): 1 enospc errors during balance
> BTRFS info (device dm-2): relocating block group 598566305792 flags data
> BTRFS info (device dm-2): 1 enospc errors during balance

> saruman:~# btrfs fi show /mnt/btrfs_pool1/
> Label: 'btrfs_pool1'  uuid: bc115001-a8d1-445c-9ec9-6050620efd0a
>         Total devices 1 FS bytes used 169.73GiB
>         devid    1 size 228.67GiB used 228.67GiB path /dev/mapper/pool1

> saruman:~# btrfs fi usage /mnt/btrfs_pool1/
> Overall:
>     Device size:                 228.67GiB
>     Device allocated:            228.67GiB
>     Device unallocated:            1.00MiB
>     Device missing:                  0.00B
>     Used:                        171.25GiB
>     Free (estimated):             55.32GiB      (min: 55.32GiB)
>     Data ratio:                       1.00
>     Metadata ratio:                   1.00
>     Global reserve:              512.00MiB      (used: 0.00B)
>
> Data,single: Size:221.60GiB, Used:166.28GiB
>    /dev/mapper/pool1     221.60GiB
>
> Metadata,single: Size:7.03GiB, Used:4.96GiB
>    /dev/mapper/pool1       7.03GiB
>
> System,single: Size:32.00MiB, Used:48.00KiB
>    /dev/mapper/pool1      32.00MiB
>
> Unallocated:
>    /dev/mapper/pool1       1.00MiB

So we see it's fully chunk-allocated, with no unallocated space, but
gigs and gigs of empty space within the chunk allocations, data chunks
in particular.

> How did I get into such a misbalanced state when I balance every night?
>
> My filesystem is not full, I can write just fine, but I sure cannot
> rebalance now.

Well, you can write just fine... for now. After accounting for the
global reserve coming out of metadata's reported free, there's about 1.5
GiB of space in the metadata, and about 55 GiB of space in the data, so
you should actually be able to write for some time before running out of
either.

You just can't rebalance to chunk-defrag and reclaim chunks to
unallocated, so they can be used for the other chunk type if necessary.

You're correct to be worried about this, but it's not immediately
urgent.

> Besides adding another device to add space, is there a way around this
> and more generally not getting into that state anymore considering that
> I already rebalance every night?

What you /haven't/ yet said is what your nightly rebalance command,
presumably scheduled, with -dusage and -musage, actually is. How did
you determine the usage amount to feed to the command? Was it dynamic,
presumably determined by some script and changing based on the amount
of unutilized space trapped within the data chunks, or static, with the
same usage value given every night?

The other thing we don't have, and you might not have any idea either if
it was simply scheduled and you hadn't been specifically checking, is a
trendline of whether the post-balance unallocated space has been
reducing over time, while the post-balance unutilized space within the
data chunks was growing, or whether it happened all of a sudden.
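The "about 1.5 GiB" and "about 55 GiB" figures above follow directly from the quoted usage output. A short Python sketch of that arithmetic is shown here; the helper name is made up, and treating the 512 MiB global reserve as 0.5 GiB taken out of metadata follows the reasoning in the paragraph above.

```python
def headroom_gib(size_gib, used_gib, reserve_gib=0.0):
    """Space still free inside already-allocated chunks, in GiB."""
    return round(size_gib - used_gib - reserve_gib, 2)


# Metadata,single: Size:7.03GiB, Used:4.96GiB, minus the 512MiB global reserve
metadata_free = headroom_gib(7.03, 4.96, reserve_gib=0.5)

# Data,single: Size:221.60GiB, Used:166.28GiB
data_free = headroom_gib(221.60, 166.28)

print(f"metadata headroom: {metadata_free} GiB")
print(f"data headroom: {data_free} GiB")
```

The data figure matches the "Free (estimated): 55.32GiB" line in the usage output, since with only 1.00MiB unallocated nearly all remaining free space sits inside existing data chunks.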
If you've been following current discussion threads here, you may
already know one possible specific trigger, as discussed, and more
generically, there could be other specific triggers in the same general
category.

In that thread the specific culprit appeared to be btrfs behavior with
the ssd mount option (autodetected based on the device's rotational
value as reported by sysfs), in particular as it interacted with
systemd's journal files, but it would apply to anything else with a
similar write pattern.

The overall btrfs usage pattern was problematic: much like what you
apparently were getting (but didn't catch before full allocation, while
he did), btrfs was continuing to allocate new chunks, even though there
was plenty of space left within existing chunks, none of which were
entirely empty (so
balancing every night broke balancing so now I can't balance anymore?
Kernel 4.11, btrfs-progs v4.7.3

I run scrub and balance every night, been doing this for 1.5 years on this
filesystem.
But it has just started failing:

saruman:~# btrfs balance start -musage=0 /mnt/btrfs_pool1
Done, had to relocate 0 out of 235 chunks
saruman:~# btrfs balance start -dusage=0 /mnt/btrfs_pool1
Done, had to relocate 0 out of 235 chunks

saruman:~# btrfs balance start -musage=1 /mnt/btrfs_pool1
ERROR: error during balancing '/mnt/btrfs_pool1': No space left on device
saruman:~# btrfs balance start -dusage=10 /mnt/btrfs_pool1
Done, had to relocate 0 out of 235 chunks
saruman:~# btrfs balance start -dusage=20 /mnt/btrfs_pool1
ERROR: error during balancing '/mnt/btrfs_pool1': No space left on device
There may be more info in syslog - try dmesg | tail

BTRFS info (device dm-2): 1 enospc errors during balance
BTRFS info (device dm-2): relocating block group 598566305792 flags data
BTRFS info (device dm-2): 1 enospc errors during balance
BTRFS info (device dm-2): 1 enospc errors during balance
BTRFS info (device dm-2): relocating block group 598566305792 flags data
BTRFS info (device dm-2): 1 enospc errors during balance

saruman:~# btrfs fi show /mnt/btrfs_pool1/
Label: 'btrfs_pool1'  uuid: bc115001-a8d1-445c-9ec9-6050620efd0a
        Total devices 1 FS bytes used 169.73GiB
        devid    1 size 228.67GiB used 228.67GiB path /dev/mapper/pool1

saruman:~# btrfs fi usage /mnt/btrfs_pool1/
Overall:
    Device size:                 228.67GiB
    Device allocated:            228.67GiB
    Device unallocated:            1.00MiB
    Device missing:                  0.00B
    Used:                        171.25GiB
    Free (estimated):             55.32GiB      (min: 55.32GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:221.60GiB, Used:166.28GiB
   /dev/mapper/pool1     221.60GiB

Metadata,single: Size:7.03GiB, Used:4.96GiB
   /dev/mapper/pool1       7.03GiB

System,single: Size:32.00MiB, Used:48.00KiB
   /dev/mapper/pool1      32.00MiB

Unallocated:
   /dev/mapper/pool1       1.00MiB

How did I get into such a misbalanced state when I balance every night?

My filesystem is not full, I can write just fine, but I sure cannot
rebalance now.

Besides adding another device to add space, is there a way around this
and more generally not getting into that state anymore considering that
I already rebalance every night?

Thanks,
Marc