On 2019-06-18 14:45, Hugo Mills wrote:
On Tue, Jun 18, 2019 at 08:26:32PM +0200, Stéphane Lesimple wrote:
I've been a btrfs user for quite a number of years now, but it seems
I need the wisdom of the btrfs gurus on this one!

I have a 5-hdd btrfs raid1 setup with 4x3T+1x10T drives.
A few days ago, I replaced one of the 3T drives with a new 10T one,
running btrfs replace and then resizing the FS to use all the
available space of the new device.

The filesystem was 90% full before I expanded it so, as expected,
most of the space on the new device wasn't actually allocatable in
raid1, as very little space was available on the 4 other devs.

Of course the solution is to run a balance, but as the filesystem is
now quite big, I'd like to avoid running a full rebalance. This
would be quite I/O intensive, would run for several days, and would
put unnecessary stress on the drives. It also seems excessive, as in
theory only a few TB would need to be moved: if I'm correct, only
one of the two copies of a sufficient number of block groups needs
to be moved to the new device, so that the total available space on
the 4 preexisting devices at least equals the available space on the
new device: ~7TB instead of moving ~22TB.
I don't need to have a perfectly balanced FS, I just want all the
space to be allocatable.
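
To make the constraint described above concrete, here is a minimal
sketch of how much space raid1 can still allocate given the
unallocated space on each device. It assumes the usual two-copy raid1
rule (each chunk needs room on two different devices); the function
name and the figures are hypothetical and are not taken from the
actual filesystem.

```python
def raid1_allocatable(unallocated):
    """Return how much data two-copy raid1 can still allocate.

    `unallocated` lists the unallocated space per device, all in the
    same unit. Every chunk needs space on two different devices, so
    the largest device can only be paired with space on the others.
    """
    total = sum(unallocated)
    largest = max(unallocated)
    return min(total // 2, total - largest)


# Hypothetical figures in GB: four nearly full devices, one new device.
print(raid1_allocatable([100, 100, 100, 100, 7000]))
```

With these made-up numbers, almost all of the space on the new device
is unusable until chunks are moved off the other four devices.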

I tried using the -ddevid option, but it only instructs btrfs to work
on the block groups allocated on said device. As it happens, it
tends to move data between the 4 preexisting devices and doesn't fix
my problem. A full balance with -dlimit=100 did no better.

    -dlimit=100 will only move 100 GiB of data (i.e. 200 GiB of raw
space, counting both raid1 copies), so it'll
be a pretty limited change. You'll need to use a larger number than
that if you want it to have a significant visible effect.

Last I checked, that's not how the limit filter works. AFAIUI, it's an upper limit on how full a chunk can be to be considered for the balance operation. So, balancing with only `-dlimit=100` should actually balance all data chunks (but only data chunks, because you haven't asked for metadata balancing).

    The -ddevid=<old_10T> option would be my recommendation. That
device has more chunks on it, so they're likely to have their copies
spread across the other four devices. This should help with the
balance.

    Alternatively, just do a full balance and then cancel it when the
amount of unallocated space is reasonably well spread across the
devices (specifically, the new device's unallocated space is less than
the sum of the unallocated space on the other devices).
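
The stopping condition described above can be sketched as a small
check. This is only an illustration: the function name and the devid
to unallocated-space mapping are hypothetical, and the figures would
have to be read by hand from the filesystem's usage report.

```python
def balance_can_stop(unallocated, new_devid):
    """Return True once the new device's unallocated space is less
    than the sum of the unallocated space on all other devices.

    `unallocated` maps devid -> unallocated space, all in one unit.
    """
    others = sum(space for devid, space in unallocated.items()
                 if devid != new_devid)
    return unallocated[new_devid] < others


# Hypothetical figures in GB, with devid 5 as the new device.
print(balance_can_stop({1: 100, 2: 100, 3: 100, 4: 100, 5: 7000}, 5))
print(balance_can_stop({1: 900, 2: 900, 3: 900, 4: 900, 5: 3500}, 5))
```

The first call reports that the balance should keep running, the
second that it could be cancelled.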

Is there a way to ask the block group allocator to prefer writing to
a specific device during a balance? Something like -ddestdevid=N?
This would just be a hint to the allocator and the usual constraints
would always apply (and prevail over the hint when needed).

    No, there isn't. Having control over the allocator (or bypassing
it) would be pretty difficult to implement, I think.

    It would be really great if there was an ioctl that allowed you to
say things like "take the chunks of this block group and put them on
devices 2, 4 and 5 in RAID-5", because you could do a load of
optimisation with reshaping the FS in userspace with that. But I
suspect it's a long way down the list of things to do.

Or is there any obvious solution I'm completely missing?

    I don't think so.

    Hugo.
