On 19.04.2021 18:22, Jonah Sabean wrote:
> I'm running Ubuntu 21.04 (technically not a stable "release" yet, but
> it will be in a few days, so if this is an Ubuntu-specific issue I'd
> like to report it before it is!).
> 
> The btrfs volume in question is two 8TB hard disks that were in RAID1
> at the time the filesystem was created. Kernel version is Ubuntu's
> 5.11.0-14-generic with btrfs-progs version 5.10.1-1build1 in the
> hirsute repos currently. This array is mostly non-changing archived
> data, if that even matters.
> 
> I replaced a missing disk (sda is the replacement disk) last night
> while in a degraded mount (left it all night to complete) with `btrfs
> replace start 1 /dev/sda1 /mnt/btrfs` (1 was the missing disk in btrfs
> fi show) and it appears to have worked fine. However, when I ran
> `btrfs fi usage` it returned:
> 
> Overall:
>     Device size:                  14.55TiB
>     Device allocated:              2.41TiB
>     Device unallocated:           12.14TiB
>     Device missing:                  0.00B
>     Used:                          1.60TiB
>     Free (estimated):              8.63TiB      (min: 6.61TiB)
>     Free (statfs, df):            12.14TiB
>     Data ratio:                       1.50
>     Metadata ratio:                   1.43
>     Global reserve:              512.00MiB      (used: 0.00B)
>     Multiple profiles:                 yes      (data, metadata, system)
> 
> Data,single: Size:820.00GiB, Used:3.25MiB (0.00%)
>    /dev/sdb1     820.00GiB
> 
> Data,RAID1: Size:819.00GiB, Used:818.64GiB (99.96%)
>    /dev/sda1     819.00GiB
>    /dev/sdb1     819.00GiB
> 
> Metadata,single: Size:4.00GiB, Used:864.00KiB (0.02%)
>    /dev/sdb1       4.00GiB
> 
> Metadata,RAID1: Size:3.00GiB, Used:1.69GiB (56.23%)
>    /dev/sda1       3.00GiB
>    /dev/sdb1       3.00GiB
> 
> System,single: Size:32.00MiB, Used:144.00KiB (0.44%)
>    /dev/sdb1      32.00MiB
> 
> System,RAID1: Size:8.00MiB, Used:80.00KiB (0.98%)
>    /dev/sda1       8.00MiB
>    /dev/sdb1       8.00MiB
> 
> Unallocated:
>    /dev/sda1       6.47TiB
>    /dev/sdb1       5.67TiB
> 
> So a small amount of actual data and metadata was still single on the
> disk I was rebuilding from (sdb), but it had massively allocated
> "single" chunks in the process (relatively equal to what I had in
> actual data), and to a lesser extent, metadata too. 

Mounting a raid1 btrfs writable in degraded mode creates chunks with the
single profile. This is a long-standing issue. What is rather surprising
is that you apparently have a chunk size of 819GiB, which is suspiciously
close to 10% of 8TiB. btrfs does indeed limit chunk size to 10% of total
space, but it should not exceed 10GiB. Could this be an Ubuntu-specific
issue?

So when you wrote data in degraded mode, it had to allocate new chunks
with the "single" profile.

> Why didn't it free
> those up as it replaced the missing disk and duplicated the data in
> RAID1? 

Device replacement restored the mirrored data (chunks with the "raid1"
profile) on the new device. It had no reason to touch chunks with the
"single" profile because, from btrfs's point of view, those chunks never
had any data on the replaced device, so there was nothing to write there.

> Shouldn't it all be RAID1 once it's complete,

No. btrfs replace restores the content of the missing device. It is not
a replacement for profile conversion.
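
The usual cleanup after such a replace is the soft convert you ran below; a
sketch that also converts the leftover system chunks (same mount point;
converting system chunks additionally requires --force):

sudo btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft -sconvert=raid1,soft -f /mnt/btrfs

The "soft" filter skips chunks that already have the target profile, so this
only rewrites what the degraded mount left behind.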

> why even have
> such small amounts remain single? Easy fix I thought, as at first
> glance I didn't realize 800GiB was allocated single, only paying
> attention to the small amounts used, so I did a soft convert to fix
> this.
> sudo btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/btrfs
> 
> Convert was pretty quick... took just a few minutes, but of course
> it's all allocated just as raid1 now (with presumably 0 actual data in
> most of them):

Correct. To convert a profile, btrfs must allocate new chunks in the new
profile and copy the data over.
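
Chunks that end up completely empty after such a conversion can then be
reclaimed with a usage filter, as you did later with -dusage=0; the metadata
equivalent is -musage, for example

sudo btrfs balance start -dusage=0 -musage=0 /mnt/btrfs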
...
> 
> My questions are:
> 1. Why did it have so many 'single' allocated chunks to begin with?

It does not look like "chunks"; rather, it really looks like a single
"chunk". The output of

btrfs inspect-internal dump-tree -d /dev/xxx

may be interesting.
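
The chunk and device extent items in that dump each carry a length field, so
the chunk sizes can be skimmed with something like

btrfs inspect-internal dump-tree -d /dev/xxx | grep length

Many ~1GiB lengths would point at normal chunk sizing; one enormous length
would confirm an oversized chunk.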

> Everything was RAID1 all up until the disk replacement, so it clearly
> did this during the `btrfs replace` process. 

No, it did this during the degraded writable mount.

> Did I do this wrong, or
> is there a bug?

There is a misfeature whereby btrfs creates "single" chunks during a
degraded mount. Ideally it would create degraded raid1 chunks.

> 2. Would the btrfs replace have failed if the filesystem was more full
> and those chunks were not possible to allocate (it basically allocated
> double the amount of data I have after all, so if the fs was 50%+
> full...)?

btrfs replace duplicates the data that was on the missing device. If you
were able to write this data while the device was present, btrfs replace
cannot fail due to missing space (provided, of course, that the
replacement device is at least as large).
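
If you want to double-check that condition before starting a replace, the
sizes are easy to compare up front, for example

btrfs filesystem show /mnt/btrfs
lsblk -b /dev/sda1

where the former lists devids and sizes and flags any missing device, and
lsblk -b prints the exact size of the replacement in bytes.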

> 3. How do I prevent this from happening in the future, should I need
> to replace a disk?

Do not write anything in degraded mode.
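
If you only need to read data while a device is missing, a read-only
degraded mount avoids creating any new chunks at all:

mount -o degraded,ro /dev/sdb1 /mnt/btrfs

The replace itself still needs a writable degraded mount, so the practical
rule is not to write your own data into the filesystem until the replace has
finished and any leftover "single" chunks have been converted back.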

> Is this possibly an Ubuntu-related issue (perhaps
> because btrfs-progs is older relative to the kernel?).
> 
> The 7GiB metadata isn't so bad; however, I did proceed to run
> btrfs balance start -dusage=0 /mnt/btrfs
> 
> Is it possible to run balance with `-dusage=0` along with the convert
> to do that all in one balance? Obviously, that doesn't solve the
> actual issue to begin with; I'm just curious, as I did it in two steps.
> 
> FWIW: The `dusage=0` filter freed up pretty much everything as I
> expected it to, and it looks pretty much identical to how it did
> before the disk replacement:
> Data,RAID1: Size:819.00GiB, Used:818.64GiB (99.96%)
>   /dev/sda1     819.00GiB
>   /dev/sdb1     819.00GiB
> 
> I'm willing to do the process all over again as all this data is on
> another system; I just would like assurance that I don't run into this
> same issue twice.
> 
> Thanks,
> -Jonah
> 
