Re: Help me understand what is going on with my RAID1 FS

2017-09-10 Thread Ferenc-Levente Juhos
>The problem is that each raid1 block group contains two chunks on two
>separate devices, so it can't fully utilize three devices no matter what.
>If that doesn't suit you, then you need to add a 4th disk. After
>that, the FS will be able to use all unallocated space on all disks in the raid1
>profile. But even then you'll be able to safely lose only one disk,
>since BTRFS will still be storing only 2 copies of data.

I hope I didn't give the impression that I want to utilize all three
devices fully. It was clear to me that there will be 2 TB of wasted space.
Also, I'm not questioning the chunk allocator for RAID1 at all. It is
clear, and always has been, that for RAID1 the chunks need to be
allocated on different physical devices.
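The 2 TB of wasted space (and the 6 TB of usable space the size calculator gives) follows from a simple bound for two-copy RAID1: the largest device can only be mirrored against the space on the other devices. A small sketch that ignores chunk granularity and metadata; `raid1_usable` is an illustrative name, and reading df's Size as raw capacity divided by two copies is an assumption that happens to match the 6.4T in the quoted df output:

```python
def raid1_usable(sizes):
    """Usable space when every block is stored twice on two distinct devices.

    At most half of the raw space is usable; if one device is larger than
    all the others combined, only as much as the others can mirror.
    """
    total = sum(sizes)
    return min(total / 2, total - max(sizes))


# Nominal drive sizes in decimal TB, as used by the size calculator.
drives_tb = [8, 3, 3]
usable = raid1_usable(drives_tb)
stranded = sum(drives_tb) - 2 * usable  # raw space that holds neither copy
print(f"usable {usable:g} TB, stranded {stranded:g} TB")  # usable 6 TB, stranded 2 TB

# The same drives in TiB, as reported by `btrfs fi show` and `df -h`.
drives_tib = [7.28, 2.73, 2.73]
raw = sum(drives_tib)
usable_tib = raid1_usable(drives_tib)
print(f"raw / 2 = {raw / 2:.2f} TiB")  # 6.37 (df -h shows Size 6.4T)
print(f"usable = {usable_tib:.2f} TiB = {usable_tib * 2**40 / 10**12:.2f} TB")
# usable = 5.46 TiB = 6.00 TB
```

In other words, 5.46 TiB of usable space is the 6 TB the calculator reports, while 6.4T is simply half of the raw capacity.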
If I understood Kai's point of view correctly, he even suggested that I
might need to run a balance to make sure that the free space on the three
devices is being used smartly. Hence the questions about balancing.

In the worst case, it could happen like this:

Again I have disks of sizes 3, 3, 8:
Fig. 1
Drive1(8) Drive2(3) Drive3(3)
    -         X1        X1
    -         X2        X2
    -         X3        X3
Here the new drive is completely unused. Even if one X1 chunk were on
Drive1, it would still be a sub-optimal allocation.

The following is the optimal allocation. Will btrfs allocate like this,
considering that Drive1 has the most free space?
Fig. 2
Drive1(8) Drive2(3) Drive3(3)
    X1        X1        -
    X2        -         X2
    X3        X3        -
    X4        -         X4

From my point of view, Fig. 2 shows the optimal allocation: by the time
Drive2 and Drive3 are full (3 TB each), Drive1 must hold 6 TB
(because it is exclusively holding the mirrors for both Drive2 and Drive3).
At that point btrfs can rightly say that, since two of the drives are
completely full, it can't allocate any more chunks, and the remaining
2 TB of space on Drive1 is wasted. This is clear; it's even pointed out
by the btrfs size calculator.
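Assuming the policy Dmitrii recalls in the quoted reply below (each new RAID1 block group is placed on the two devices with the most unallocated space), Fig. 2 is exactly what falls out. A minimal Python sketch of that assumed policy, using 1 TB chunks for readability (real chunks are far smaller; `allocate_raid1` is a hypothetical name, not a btrfs function):

```python
def allocate_raid1(sizes, chunk=1):
    """Place mirrored chunk pairs until fewer than two devices have room.

    Assumed policy: each new block group goes to the two devices with the
    most unallocated space (ties broken by device order).
    Returns (layout, used): one device-index pair per block group, and the
    space used on each device.
    """
    if len(sizes) < 2:
        raise ValueError("RAID1 needs at least two devices")
    free = list(sizes)
    layout = []
    while True:
        # Device indices sorted by unallocated space, largest first (stable sort).
        order = sorted(range(len(free)), key=lambda d: -free[d])
        a, b = order[0], order[1]
        if free[b] < chunk:  # no second device with room: stop allocating
            break
        free[a] -= chunk
        free[b] -= chunk
        layout.append((a, b))
    used = [s - f for s, f in zip(sizes, free)]
    return layout, used


if __name__ == "__main__":
    sizes = [8, 3, 3]  # Drive1, Drive2, Drive3 in TB, as in the figures above
    layout, used = allocate_raid1(sizes)
    for n, (a, b) in enumerate(layout, start=1):
        print(f"X{n}: Drive{a + 1} + Drive{b + 1}")
    print("used per drive (TB):", used)                    # [6, 3, 3]
    print("stranded on Drive1 (TB):", sizes[0] - used[0])  # 2
```

For sizes 8, 3, 3 it alternates X1 on Drive1+Drive2, X2 on Drive1+Drive3 and so on, ending with 6 TB used on Drive1, 3 TB on each small drive, and 2 TB stranded on Drive1. The quoted `btrfs fi show` output (devid 3 with 2.48TiB used, twice the 1.24TiB on each smaller disk) is consistent with this pattern.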

But again, if the above statements are true, then df might as well tell
the "truth" and report that I have 3.5 TB of free space, not 1.5 TB (as
it reports now). Again, I fully understand Kai's explanation here.
Coming back to my first e-mail, my "problem" was that df is reporting
1.5 TB free, whereas the whole FS holds only 2.5 TB of data.
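The gap between the two numbers can be reproduced from the figures in the quoted `btrfs fi show` output. This sketch compares the estimate Kai describes (free space of the smallest member) with a simple upper bound for two-copy data: every new byte needs room on two different devices, so the device with the most free space can only be mirrored against the free space on the others. Both function names are hypothetical, and the bound ignores chunk granularity and metadata.

```python
# Unallocated space per device in TiB: size minus "used" from the quoted
# `btrfs fi show` output (devid 1..3). Chunk granularity is ignored.
free = {"sdb": 2.73 - 1.24, "sdc": 2.73 - 1.24, "sda": 7.28 - 2.48}


def smallest_member_estimate(free):
    """Kai's description of df: only the smallest member's free space counts."""
    return min(free.values())


def two_copy_bound(free):
    """Most new data that fits with 2 copies on distinct devices: a device
    can only be mirrored against free space on the other devices."""
    total = sum(free.values())
    return min(total / 2, total - max(free.values()))


print(f"smallest-member estimate: {smallest_member_estimate(free):.2f} TiB")  # 1.49
print(f"two-copy bound:           {two_copy_bound(free):.2f} TiB")            # 2.98
```

This gives about 1.49 TiB versus 2.98 TiB. Note that 2.98 TiB is roughly 3.3 TB; the 3.5 TB figure above subtracts 2.5 TiB of data from a 6 TB (decimal) capacity, so it mixes units.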

So the question remains: is df intentionally not smart enough to give a
more accurate estimate, or is the assumption that the allocator picks
the drive with the most free space mistaken?
If I follow Kai's line of reasoning and I need to rebalance because the
allocation is not as shown above (Fig. 2), then my question is still
legitimate: are there any filters one might use to speed up or
selectively balance in my case, or will I need to do a full balance?

On Sun, Sep 10, 2017 at 7:19 PM, Dmitrii Tcvetkov  wrote:
>> @Kai and Dmitrii
>> Thank you for your explanations. If I understand you correctly, you're
>> saying that btrfs makes no attempt to "optimally" use the physical
>> devices in the FS: once a new RAID1 block group needs to be
>> allocated, it will semi-randomly pick two devices with enough space and
>> allocate two equal-sized chunks, one on each. This new chunk may or
>> may not fall onto my newly added 8 TB drive. Am I understanding this
>> correctly?
> If I remember correctly, the chunk allocator allocates new chunks on the
> devices which have the most unallocated space.
>
>> Is there some sort of balance filter that would speed up this sort of
>> balancing? Will balance be smart enough to make the "right" decision?
>> As far as I have read, the chunk allocator used during balance is the
>> same one that is used during normal operation. If the allocator is already
>> sub-optimal during normal operations, what's the guarantee that it
>> will make a "better" decision during balancing?
>
> I don't really see how that would be possible with the raid1 profile. How
> can you fill all three devices if you can split data only twice? There
> will be a moment when two of the three disks are full and BTRFS can't
> allocate a new raid1 block group because it has only one drive with
> unallocated space.
>
>>
>> When I say "right" and "better" I mean this:
>> Drive1(8) Drive2(3) Drive3(3)
>> X1X1
>> X2X2
>> X3X3
>> X4X4
>> I was convinced until now that the chunk allocator at least tries to
>> find the best possible allocation. I'm sure it's complicated to develop
>> a generic algorithm that fits all setups, but it should be possible.
>
>
> The problem is that each raid1 block group contains two chunks on two
> separate devices, so it can't fully utilize three devices no matter what.
> If that doesn't suit you, then you need to add a 4th disk. After
> that, the FS will be able to use all unallocated space on all disks in the raid1
> profile. But even then you'll be able to safely lose only one disk,
> since BTRFS will still be storing only 2 copies of data.

Re: Help me understand what is going on with my RAID1 FS

2017-09-10 Thread Ferenc-Levente Juhos
@Kai and Dmitrii
Thank you for your explanations. If I understand you correctly, you're
saying that btrfs makes no attempt to "optimally" use the physical
devices in the FS: once a new RAID1 block group needs to be
allocated, it will semi-randomly pick two devices with enough space and
allocate two equal-sized chunks, one on each. This new chunk may or
may not fall onto my newly added 8 TB drive. Am I understanding this
correctly?
> You will probably need to
>run balance once in a while to evenly redistribute allocated chunks
>across all disks.

Is there some sort of balance filter that would speed up this sort of
balancing? Will balance be smart enough to make the "right" decision?
As far as I have read, the chunk allocator used during balance is the
same one that is used during normal operation. If the allocator is already
sub-optimal during normal operations, what's the guarantee that it
will make a "better" decision during balancing?

When I say "right" and "better" I mean this:
Drive1(8) Drive2(3) Drive3(3)
X1X1
X2X2
X3X3
X4X4
I was convinced until now that the chunk allocator at least tries to
find the best possible allocation. I'm sure it's complicated to develop
a generic algorithm that fits all setups, but it should be possible.

On Sun, Sep 10, 2017 at 5:47 PM, Kai Krakow  wrote:
> On Sun, 10 Sep 2017 15:45:42 +0200,
> FLJ wrote:
>
>> Hello all,
>>
>> I have had a BTRFS RAID1 volume running for the past year. I avoided
>> all the pitfalls known to me that would mess up this volume. I never
>> experimented with quotas, no-COW, snapshots, defrag, nothing really.
>> The volume has been RAID1 from day 1 and has worked reliably until now.
>>
>> Until yesterday it consisted of two 3 TB drives, something along these
>> lines:
>>
>> Label: 'BigVault'  uuid: a37ad5f5-a21b-41c7-970b-13b6c4db33db
>> Total devices 2 FS bytes used 2.47TiB
>> devid    1 size 2.73TiB used 2.47TiB path /dev/sdb
>> devid    2 size 2.73TiB used 2.47TiB path /dev/sdc
>>
>> Yesterday I added a new drive to the FS and did a full rebalance
>> (without filters) overnight, which went through without any issues.
>>
>> Now I have:
>> Label: 'BigVault'  uuid: a37ad5f5-a21b-41c7-970b-13b6c4db33db
>> Total devices 3 FS bytes used 2.47TiB
>> devid    1 size 2.73TiB used 1.24TiB path /dev/sdb
>> devid    2 size 2.73TiB used 1.24TiB path /dev/sdc
>> devid    3 size 7.28TiB used 2.48TiB path /dev/sda
>>
>> # btrfs fi df /mnt/BigVault/
>> Data, RAID1: total=2.47TiB, used=2.47TiB
>> System, RAID1: total=32.00MiB, used=384.00KiB
>> Metadata, RAID1: total=4.00GiB, used=2.74GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>> But df -h still gives me:
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/sdb        6.4T  2.5T  1.5T  63% /mnt/BigVault
>>
>> I've heard and read about the difficulty of reporting free space due
>> to the flexibility of BTRFS, snapshots, subvolumes, etc., but I only
>> have a single volume: no subvolumes, no snapshots, no quotas, and both
>> data and metadata are RAID1.
>>
>> My expectation would have been that, in the case of BigVault,
>> Size == Used + Avail.
>>
>> Actually, based on http://carfax.org.uk/btrfs-usage/index.html I
>> would have expected 6 TB of usable space. Here I get 6.4, which is odd,
>> but that only 1.5 TB is available is even stranger.
>>
>> Could anyone explain what I did wrong or why my expectations are
>> wrong?
>>
>> Thank you in advance
>
> Btrfs reports estimated free space based on the free space of the
> smallest member, as that is all it can guarantee. In your case this is
> 2.73 minus 1.24, which is roughly 1.5T. But since this free space is
> distributed across three disks, one of which has much more free space,
> it will probably be used up at half the rate of actual allocation.
> Due to how btrfs allocates free space in chunks, however, that may not
> be possible; hence the unexpectedly low value. You will probably need to
> run balance once in a while to evenly redistribute allocated chunks
> across all disks.
>
> It may give you better estimates if you combine sdb and sdc into one
> logical device, e.g. using raid0 or jbod via md or lvm.
>
>
> --
> Regards,
> Kai
>
> Replies to list-only preferred.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html