Christian Rohmann posted on Fri, 04 Sep 2015 10:28:21 +0200 as excerpted:

> Hello Duncan,
>
> thanks a million for taking the time and effort to explain all that.
> I understand that all the devices must have been chunk-allocated for
> btrfs to tell me all available "space" was used (read "allocated to
> data chunks").
>
> The filesystem is quite old already with kernels starting at 3.12 (I
> believe) and now 4.2 with always the most current version of
> btrfs-progs debian has available.
IIRC I wrote the previous reply without knowing the kernel you were on.
You had posted the userspace version, which was current, but not the
kernel version, so it's good to see that posted now.

Since kernel 3.12 predates automatic empty-chunk reclaim, empty chunks
wouldn't have been reclaimed back then, as I guessed.  You're running a
current 4.2 now and they would be, but only if they're totally empty,
and I'm guessing they're not, particularly if you don't run with the
autodefrag mount option turned on or do regular manual defrags.
(Defrag doesn't directly affect chunks -- that's what balance is for --
but failing to defrag means things are more fragmented, which causes
btrfs to spread out more in the available chunks as it tries to put new
files in as few extents as possible, possibly in empty chunks if they
haven't been reclaimed and the free space in fuller chunks is too small
for the full file, up to the chunk size, of course.)

And of course, only with 4.1 (nominally 3.19, but there were initial
problems) was raid6 mode fully code-complete and functional.  Before
that, runtime worked -- it calculated and wrote the parity stripes as
it should -- but the code to recover from problems wasn't complete, so
you were effectively running a slow raid0 in terms of recovery ability,
but one that got "magically" updated to raid6 once the recovery code
was actually there and working.

So I'm guessing you have some 8-strip-stripe chunks at, say, 20% full
or some such.  There's 19.19 TiB of data used of 22.85 TiB allocated, a
spread of over 3 TiB.  A full nominal-size data stripe allocation,
given 12 devices in raid6, will be 10x1GiB data plus 2x1GiB parity, so
that's roughly 3.5 TiB / 10 GiB worth of extra stripes, 350 stripes or
so, that should be freeable.  (The fact that you probably have 8-strip,
12-strip, and 4-strip stripes on the same filesystem will of course
change that a bit, as will the fact that four devices are much smaller
than the other eight.)

> On 09/03/2015 04:22 AM, Duncan wrote:
[snipped]
>
> I am running a full balance now, it's at 94% remaining (running for
> 48 hrs already ;-) ).
>
> Is there any way I should / could "scan" for empty data chunks or
> almost empty data chunks which could be freed in order to have more
> chunks available for the actual balancing or new chunks that should
> be used with a 10 drive RAID6?  I understand that btrfs NOW does that
> somewhat automagically, but my FS is quite old and used already and
> there is new data coming in all the time, so I want that properly
> spread across all the drives.

There are balance filters: -dusage=20, for instance, would only
rebalance data (-d) chunks with usage under 20%.  Of course there's
more about balance filters in the manpage and on the wiki.

The great thing about -dusage= (and -musage= where appropriate) is that
it can often free and deallocate large numbers of chunks in a fraction
of the time it'd take to do a full balance.  Not only are you dealing
with only a fraction of the chunks, but since the ones it picks are,
for example, only 20% full (with usage=20) or less, they take only 20%
(or less) of the time to balance that a full chunk would.
Additionally, 20% full or less means you reclaim chunks at 4:1 or
better -- five old chunks are rewritten into a single new one, freeing
four!
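In case you've not run a filtered balance before, it's just the normal
balance command with the filter appended -- something like the
following, where the mountpoint is only a placeholder of course, and
the manpage has the authoritative filter syntax for the btrfs-progs
version you're running:

    # rewrite only data chunks under ~20% used, consolidating them and
    # returning the freed chunks to unallocated space
    btrfs balance start -dusage=20 /mnt/yourpool

    # metadata chunks can get the same treatment where appropriate
    btrfs balance start -musage=20 /mnt/yourpool

    # check on, or cancel, a running balance from another terminal
    btrfs balance status /mnt/yourpool
    btrfs balance cancel /mnt/yourpool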
So in a scenario with a whole bunch of chunks at less than, say, 2/3
full (usage=67, rewriting three chunks into two), this can reclaim a
whole lot of chunks in a relatively small amount of time, certainly so
compared to a full balance, since rewriting a 100% full chunk takes the
full amount of time and doesn't reclaim anything.

But, given that the whole reason you're messing with it is to try to
even out the stripes across all devices, a full rewrite is eventually
in order anyway.  However, knowing about the filters would have let you
do a -dusage=20 or possibly -dusage=50 before the full balance, leaving
the full balance more room to work in and possibly allowing a more
effective balance to the widest stripes possible.  Likely above 50, and
almost certainly above 67, the returns wouldn't be worth it, since the
time taken for the filtered balance would be longer, and an unfiltered
balance was planned afterward anyway.

Here, I'd have tried something like 20 first, then 50 if I wasn't happy
with the results of 20.  The thing is, either 20 would give me good
results in a reasonably short time, or there'd be so few candidates
that it'd be very fast to give me the bad results, thus allowing me to
try 50.  Same with 50 and 67, tho I'd definitely be unhappy if 50
didn't free at least a TiB or so back to unallocated, hopefully some of
it on the first eight devices, ideally giving the full balance room
enough to do full 12-device stripes, keeping enough free on the
original eight devices as it went to go 12-wide until the smaller
devices were full, then 8-wide, eliminating the 4-wide stripes
entirely.

Tho as Hugo suggested, having the original larger eight devices all the
way full, and thus a good likelihood of all three stripe widths, isn't
ideal, and it might actually take a couple of balances (yes, at a week
apiece or whatever =:^() to straighten things out.  A good -dusage=
filtered balance pre-pass would likely have taken under a day and, with
luck, would have allowed a single full balance to do the job, but it's
a bit late for that now...

Meanwhile, FWIW, that long maintenance time is one of the reasons I'm a
strong partitioning advocate.  Between the fact that I use SSDs and the
fact that my btrfs partitions are all under 50 GiB per partition (which
probably wouldn't be practical for you, but half to 1 TiB per device
partition might be...), full scrubs typically take under a minute here,
and full balances are still in the single-digit minutes.  Of course, I
have other partitions/filesystems too, and doing all of them would take
a bit longer, say an hour, but with maintenance time under 10 minutes
per filesystem, doing it is not only not a pain, it's actually trivial,
whereas maintenance that's going to take a week is definitely a pain,
something you're going to avoid if possible, meaning there's a fair
chance a minor problem will be allowed to get far worse before it's
addressed than it would be if the maintenance were a matter of a few
hours, say a day at most.

But that's just me.  I've fine-tuned my partitioning layout over
multiple multi-year generations and have it set up so I don't have the
hassle of "oh, I'm out of space on this partition, gotta symlink to a
different one" that a lot of folks point to as the reason they prefer
big storage pools like lvm or multi-whole-physical-device btrfs.  And
obviously, I'm not scaling storage to the double-digit TiB you are,
either.  So your system, your layout and rules.
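Coming back to the filtered balances for a moment, in case it helps
next time: the stepped 20-then-50-then-67 approach above is simple
enough to script.  This is only a rough sketch -- the mountpoint and
the particular cut-offs are placeholders, and you'd obviously stop
early once a pass frees enough space back to unallocated:

    #!/bin/sh
    # Sketch: run progressively less selective filtered balances,
    # checking the allocation picture between passes.
    MNT=/mnt/yourpool    # placeholder -- adjust to your mountpoint

    for cutoff in 20 50 67; do
        echo "=== filtered balance: -dusage=$cutoff ==="
        btrfs balance start -dusage="$cutoff" "$MNT"
        # see how much each pass returned to unallocated
        btrfs filesystem show "$MNT"
        btrfs filesystem df "$MNT"
    done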
As for the partitioning, I'm simply passing on one reason that I'm such
a strong partitioning advocate, here.  Plus I know you'd REALLY like
those 10-minute full balances right about now! =:^)

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman