Re: fatal database corruption with btrfs "out of space" with ~50 GB left

Tomasz Chmielewski Wed, 14 Feb 2018 20:20:27 -0800

On 2018-02-15 10:47, Qu Wenruo wrote:

On 2018-02-14 22:19, Tomasz Chmielewski wrote:

Just FYI, how dangerous running btrfs can be - we had a fatal,
unrecoverable MySQL corruption when btrfs decided to do one of these "I
have ~50 GB left, so let's do out of space (and corrupt some files at
the same time, ha ha!)".


I've recently been looking into unexpected corruption problems in btrfs.

Would you please provide some extra info about how the corruption
happened?


1) Is there any power reset?
   Btrfs should be bullet proof, but in fact it's not, so I'm here to
   get some clue.


No power reset.

2) Are MySQL files set with nodatacow?
   If so, data corruption is more or less expected, but should be
   handled by checkpoint of MySQL.


Yes, MySQL files were using "nodatacow".

I've seen many cases of "filesystem full" with ext4, but none led to
database corruption (i.e. the database would always recover after
releasing some space).

On the other hand, I've seen a handful of "out of space" events with
gigabytes of free space on btrfs, which led to light, heavy or
unrecoverable MySQL or mongo corruption.

Can it be because of how "predictable" out-of-space situations are
with btrfs compared to other filesystems?

- in short, ext4 will report out of space when there are 0 bytes left
(perhaps slightly sooner for non-root users) - the application trying to
write data will see "out of space" at some point, and it can stay like
this for hours (i.e. until some data is removed manually)

- on the other hand, btrfs can report out of space when there are still
10, 50 or 100 GB left, meaning any capacity planning is close to
impossible; also, the application trying to write data can see the
fs transitioning between "out of space" and "data written
successfully" many times per minute/second?
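To make the application side of this concrete, here is a minimal sketch of
a write path that treats "out of space" as a per-call result rather than a
permanent state. The helper name append_durably is hypothetical and this is
not how MySQL or mongo handle it; it only shows that ENOSPC can come back
from either write() or fsync(), and that a later call may succeed again if
the filesystem finds space in between.

```python
import errno
import os


def append_durably(path: str, payload: bytes) -> bool:
    """Append payload to path and fsync it.

    Returns True once the data has been written and fsync() succeeded,
    False if the filesystem reported ENOSPC at either step.
    Any other OSError is re-raised unchanged.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        view = memoryview(payload)
        # os.write() may write fewer bytes than requested, so loop.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # With delayed allocation, the error can surface here instead.
        os.fsync(fd)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return False
        raise
    finally:
        os.close(fd)
```

A caller that gets False cannot assume anything about how much of the
payload reached the file, so it has to be able to detect and discard a
partial record on recovery. If the result flips between True and False many
times per second, that recovery path is exercised far more often than with a
filesystem that stays full until someone frees space.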

3) Is the filesystem metadata corrupted? (i.e. does btrfs check report
   errors?)
   If so, that should be the problem I'm looking into.

I don't think so, there are no scary things in dmesg. However, I didn't
unmount the filesystem to run btrfs check.

4) Metadata/data ratio?
   "btrfs fi usage" should give a fairly good picture of it.
   And "btrfs fi df" also helps.

Here it is - however, that's after removing some 80 GB of data, so it
most likely doesn't reflect the state when the failure happened.


# btrfs fi usage /var/lib/lxd
Overall:
    Device size:                 846.25GiB
    Device allocated:            840.05GiB
    Device unallocated:            6.20GiB
    Device missing:                  0.00B
    Used:                        498.26GiB
    Free (estimated):            167.96GiB      (min: 167.96GiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:411.00GiB, Used:246.14GiB
   /dev/sda3     411.00GiB
   /dev/sdb3     411.00GiB

Metadata,RAID1: Size:9.00GiB, Used:2.99GiB
   /dev/sda3       9.00GiB
   /dev/sdb3       9.00GiB

System,RAID1: Size:32.00MiB, Used:80.00KiB
   /dev/sda3      32.00MiB
   /dev/sdb3      32.00MiB

Unallocated:
   /dev/sda3       3.10GiB
   /dev/sdb3       3.10GiB



# btrfs fi df /var/lib/lxd
Data, RAID1: total=411.00GiB, used=246.15GiB
System, RAID1: total=32.00MiB, used=80.00KiB
Metadata, RAID1: total=9.00GiB, used=2.99GiB
GlobalReserve, single: total=512.00MiB, used=0.00B



# btrfs fi show /var/lib/lxd
Label: 'btrfs'  uuid: f5f30428-ec5b-4497-82de-6e20065e6f61
        Total devices 2 FS bytes used 249.15GiB
        devid    1 size 423.13GiB used 420.03GiB path /dev/sda3
        devid    2 size 423.13GiB used 420.03GiB path /dev/sdb3
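
For anyone reading the numbers above, the following sketch reproduces the
"Free (estimated)" and "Used" figures from the other lines of the
"btrfs fi usage" output. The formula is my reconstruction from the printed
values, not something taken from the btrfs-progs source, so treat it as an
assumption that happens to match this output. All constants are copied from
the output above; the variable and function names are made up for the
example.

```python
# All figures in GiB, copied from the "btrfs fi usage" output above.
DEVICE_SIZE = 846.25         # raw size of both devices together
DEVICE_ALLOCATED = 840.05    # raw space already assigned to chunks
DEVICE_UNALLOCATED = 6.20    # raw space not yet assigned to any chunk
DATA_SIZE = 411.00           # Data,RAID1 "Size"
DATA_USED = 246.14           # Data,RAID1 "Used"
METADATA_USED = 2.99         # Metadata,RAID1 "Used"
RATIO = 2.00                 # RAID1: every byte is stored twice


def free_estimated(data_size, data_used, unallocated_raw, ratio):
    """Free space inside existing data chunks, plus the usable share of
    unallocated raw space if all of it became data chunks (assumption)."""
    return (data_size - data_used) + unallocated_raw / ratio


estimate = free_estimated(DATA_SIZE, DATA_USED, DEVICE_UNALLOCATED, RATIO)
used_raw = RATIO * (DATA_USED + METADATA_USED)  # System (80 KiB) ignored
allocated_share = DEVICE_ALLOCATED / DEVICE_SIZE

print(f"Free (estimated): {estimate:.2f} GiB")      # 167.96
print(f"Used (raw):       {used_raw:.2f} GiB")      # 498.26
print(f"Allocated share:  {allocated_share:.1%}")   # 99.3%
```

So nearly all of the estimated free space sits inside data chunks that are
already allocated; only 3.10 GiB per device is still unallocated.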



Tomasz Chmielewski
https://lxadm.com
