Hugo Mills posted on Thu, 14 Nov 2013 21:00:56 +0000 as excerpted:

>> Is there a formula to calculate how much space btrfs _might_ need?
>
> Not really. I'd expect to need something in the range 250-1500 GiB of
> headroom, depending on the size of the filesystem (and on the size of
> the metadata).
As a somewhat more concrete answer...

While recently doing a bit of research on something else, I came across comments that on a large enough filesystem, data chunks default to 1 GiB, while metadata chunks default to 256 MiB. And we know that data mode defaults to SINGLE, while metadata mode defaults to DUP.

So on a default single-device btrfs of several gigs plus, assuming the files being manipulated are under 1 GiB in size, keeping an unallocated-space reserve of 1.5 GiB should be reasonable. That's enough unallocated space to allocate one more 1 GiB data chunk, plus one more 256 MiB metadata chunk, doubled to half a GiB due to DUP mode. Obviously in the single-mode-metadata case the metadata requirement would be only a single copy, so 256 MiB for it, making 1.25 GiB the total unallocated minimum.

btrfs filesystem show is the command used to see what the allocated space for a filesystem looks like, per device. However, it doesn't report UNALLOCATED space, only size and used (aka allocated), so an admin must do the math to figure the unallocated space. If the files being manipulated are over a gig in size, round up to the nearest whole GiB for the data and add another half GiB to cover the quarter-gig DUP metadata case.

If the filesystem is under a gig in size, btrfs defaults to mixed data+metadata, with chunks of 256 MiB if there's space, but apparently rather more flexibility, in order to better utilize all available space. At such "small" sizes[1], full allocation with nothing left to allocate is common, but one does hope people using such sized filesystems have a good idea what will be going on them, and that they won't /need/ to allocate further chunks after the initial filesystem population. And quite in contrast to the multi-TB filesystems, rebalancing such a filesystem in order to recover lost space should be relatively fast even on spinning rust.
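In case it helps, the single-device arithmetic above fits in a few lines of Python. This is just my own back-of-the-envelope sketch (the helper name is mine, not a btrfs tool), using the 1 GiB data / 256 MiB metadata chunk-size defaults described above:

```python
# Back-of-the-envelope estimate of the minimum unallocated reserve to
# keep on a single-device btrfs, per the default chunk sizes above.
GIB = 1024 ** 3
DATA_CHUNK = 1 * GIB            # default data chunk: 1 GiB
META_CHUNK = 256 * 1024 ** 2    # default metadata chunk: 256 MiB

def single_device_reserve(metadata_dup=True):
    """Room for one more data chunk plus one more metadata chunk
    (the metadata chunk counts twice under DUP mode)."""
    meta_copies = 2 if metadata_dup else 1
    return DATA_CHUNK + meta_copies * META_CHUNK

print(single_device_reserve(True) / GIB)   # DUP metadata: 1.5 (GiB)
print(single_device_reserve(False) / GIB)  # single metadata: 1.25 (GiB)
```

For files over a gig, replace DATA_CHUNK with the file size rounded up to the next whole GiB, per the round-up rule above.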
For filesystems of 1 GiB up to say 10 GiB, it's a more open question, although at that size there's still a rather good chance that the sysadmin has a reasonably good idea what's going on the filesystem and has planned accordingly, with some "reasonable" level of over-allocation for future-proofing and plan fuzziness. Rebalances should still complete in reasonable time as well, so it shouldn't be a /huge/ problem unless the admin simply isn't tracking the situation.

The multi-device situation adds another dimension. Apparently, except for single mode, btrfs at this point only ever allocates in pairs (plus raid5/6 checksum chunks if applicable, and pairs of pairs in raid10 mode), regardless of the number of devices available, which does simplify calculations to some degree.

Btrfs' multi-device default (for >1 GiB per-device sizes, anyway) is single data, raid1 metadata. So to reserve space for one chunk of either type, we'd need at least 1 GiB unallocated on ONE device to allow at least one single-mode data chunk allocation, PLUS at least 256 MiB unallocated on each of TWO devices to cover at least one raid1-mode metadata chunk allocation. Thus, with two devices, we'd require at least 1.25 GiB free/unallocated on one device (a 1 GiB data chunk plus one copy of the 256 MiB metadata chunk) and 256 MiB on the other (the second copy of the metadata). For a three-plus-device filesystem, that would work, OR 256 MiB on each of two devices (for the raid1 metadata) and 1 GiB on a third (for the data).

For raid1 data, the 1 GiB data chunks must have two copies, each on its own device, and the above multi-device default scenario would modify accordingly:

2-device case: 1.25 GiB minimum unallocated on each device (one copy each of a data and a metadata chunk).
3-device case: That, OR 1.25/1.0/0.25 GiB.
4-device-plus case: Either of those, or 1.0/1.0/0.25/0.25 GiB.
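To make the default multi-device (single data, raid1 metadata) two-device case concrete, here's the same sort of rough sketch. Again, the helper is my own illustration of the arithmetic above, assuming the default chunk sizes:

```python
# Minimum per-device unallocated space for the default two-device
# layout: single data (one 1 GiB chunk on one device) plus raid1
# metadata (one 256 MiB chunk copy on EACH of the two devices).
GIB = 1024 ** 3
DATA_CHUNK = 1 * GIB
META_CHUNK = 256 * 1024 ** 2

def two_device_default_reserve():
    dev_a = DATA_CHUNK + META_CHUNK  # data chunk + first metadata copy
    dev_b = META_CHUNK               # second metadata copy
    return dev_a, dev_b

a, b = two_device_default_reserve()
print(a / GIB, b / GIB)  # 1.25 0.25
```

With three or more devices, the same total can instead be spread as 1 GiB on one device and 256 MiB on each of two others, per the alternative above.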
For single metadata plus default single data, we're back to the 1.25 GiB total case, in two separate chunks of 1 GiB and 256 MiB, either on separate devices or the same device.

I haven't personally played with the raid0 case as it doesn't fit my use-case, but the wiki documentation suggests that it still allocates chunks only in pairs, striping the data/metadata across the pair. So we're looking at a minimum of 1 GiB on each of two separate devices for a raid0 data chunk allocation (which would then allow two gigs of data), and a minimum of 256 MiB on each of two separate devices for a raid0 metadata chunk allocation (which would hold a half-gig of metadata). Permutations are, as they say, "left as an exercise for the reader." =:^)

Apparently raid10 mode is pairs of pairs, so it allocates in sets of four. Metadata: 256 MiB on each of four separate devices, for 512 MiB of metadata capacity. Data: 1 GiB on each of four separate devices, holding 2 GiB worth of data. Again, permutations "left as an exercise for the reader."

Finally, there's the mixed data/metadata chunk mode that's the default on <1 GiB filesystems. Default chunk sizes there are 256 MiB, with the same pair-allocation rules for multi-device filesystems as above. But as discussed under the single-device case, these filesystems are often capacity-planned and fully allocated from the beginning, with no further chunk allocation necessary once the filesystem is populated.

That leaves raid5/6, with the caveat that these raid modes aren't yet ready for normal use (even more so than the still-experimental btrfs as a whole, where good backups are STRONGLY RECOMMENDED; with raid5/6 mode, REALLY expect your data to be eaten for breakfast, so do NOT use it in present form for anything but temporary testing!). raid5 should work like raid0 above, but requiring one more device's chunk reserved for the raid5 checksumming, thus reserving in threes with no additional capacity over raid0.
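Tabulating the per-allocation behavior described so far (my own summary of the pair-allocation rules as I understand them; the exact behavior may of course change as btrfs develops):

```python
GIB = 1024 ** 3
# profile: (devices a 1 GiB data-chunk allocation touches,
#           usable data capacity in GiB that allocation provides)
DATA_ALLOC = {
    "single": (1, 1),  # one chunk, one copy
    "raid0":  (2, 2),  # striped pair
    "raid1":  (2, 1),  # mirrored pair
    "raid10": (4, 2),  # pairs of pairs
    "raid5":  (3, 2),  # raid0 pair plus one checksum device
}

for profile, (devices, usable_gib) in DATA_ALLOC.items():
    print(f"{profile}: 1 GiB unallocated on each of {devices} "
          f"device(s) buys {usable_gib} GiB of data capacity")
```

Metadata follows the same device counts with 256 MiB chunks instead of 1 GiB.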
raid6 is the same but with yet another device reserved, thus reserving in fours. Again, permutations "left as an exercise for the reader."

Presumably raid50/60 will be possible with little change in the code once raid5/6 stabilize, since it's a logical combination with raid0, with the required parallel chunk reservation 6 and 8 devices wide respectively. But AFAIK that's not even supported at all yet, and even if it is, it's hardly worth trying since the raid5/6 component remains so highly unstable at this point.

And of course there's N-way mirroring on the roadmap as well, but implementation remains some way out, beyond raid5/6 normalization. When it comes, its parallel chunk reservation characteristics can be predicted from the raid1 discussion above, multiplying by the N in the N-way mirroring instead of by the hard-coded two of the current raid1 case. (This is actually a case I'm strongly interested in, 3-way mirroring, perhaps even in the raid10 variant thus requiring six devices minimum, but given btrfs history to date and current progress on raid5/6, I don't expect to see it in anything like normalized form until well into next year, perhaps a year from now, at the earliest.)

---
[1] Re <1 GiB being "small": I still can't help but think of my first computer when I mention that, a 486-class machine with a 130 MB (128 MiB or some such, half the size of my /boot and 1/128th the size of my main memory, today!) hard drive, and that was the early 90s, so while I've a bit of computer experience, I'm still a relative newcomer compared to many in the *ix community. It was several disk-upgrade generations later when I got my first gig-sized drive, and it sure didn't seem "small" at the time! My how times do change!

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."
-- Richard Stallman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html