As the investigation into unexpected btrfs corruption goes on, here we
expose a strange v1 space cache corruption.

The script is uploaded as a gist:
https://gist.github.com/adam900710/d37f38070f7fc4d858ffe856c516b426

The script itself is pretty straightforward:

0) Create a btrfs with a large enough data chunk
   The original single data chunk created by mkfs is not large enough.
   Do a full balance to create a large enough data chunk, so the space
   cache will live in a data chunk which also has its own cache.

1) Run some fsstress load on top of dm-log-writes.
   The load is pretty small; just -n 200 is enough to reproduce it.

   dm-log-writes records all the operations for later analysis.

2) Use dm-log-writes to replay to each FLUSH and FUA operation and run
   fsck
   In the script this is done manually, just to check both FUA and
   FLUSH.
   In fact, the --check fua option can do it in one line.

   btrfs check won't return an error, as it detects the invalid free
   space cache and just ignores it, but we still get free space cache
   related error messages.
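
For reference, the "one line" variant of step 2 can be sketched as
below. This is only a sketch, not what the script in the gist does: it
assumes the replay-log tool from xfstests with its --log, --replay,
--check and --fsck options as I understand them, and the device paths
are hypothetical placeholders. It only builds the command line, it does
not run anything.

```python
import shlex


def replay_check_argv(log_dev, replay_dev, check="fua",
                      fsck_cmd="btrfs check"):
    """Build the argv for one replay-log run that fscks at each check point.

    log_dev and replay_dev are hypothetical device paths. fsck_cmd is
    handed to --fsck as a single string, with the replay device appended.
    """
    if check not in ("flush", "fua"):
        raise ValueError("check must be 'flush' or 'fua'")
    return [
        "replay-log",
        "--log", log_dev,
        "--replay", replay_dev,
        "--check", check,
        "--fsck", f"{fsck_cmd} {shlex.quote(replay_dev)}",
    ]


# Hypothetical devices, for illustration only.
argv = replay_check_argv("/dev/test/log", "/dev/test/scratch")
print(shlex.join(argv))
```

Passing check="flush" gives the FLUSH variant. Since btrfs check does
not fail on an invalid free space cache, its output still has to be
inspected for the cache related messages.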

Then we get free space cache corruption at both FLUSH and FUA
operations.
And some of it can even survive across *several* transactions.

Furthermore, when such corruption happens, the space cache file extent
seems to be CoWed instead of being overwritten.
In my test environment, the whole 64K file extent of the metadata block
group cache just gets CoWed.
(In the previous trans its bytenr is XXX, but in the next trans it's
YYY; the inode size doesn't change at all, but nbytes seems to be
increasing.)
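
To make the CoW observation concrete, here is a minimal sketch of how
the per-transaction values could be compared. It assumes the bytenr,
inode size and nbytes of the cache inode have already been collected by
hand for each transaction (for example from tree dumps); the record
type, the function name and all numbers are made up for illustration
and are not the real XXX/YYY values.

```python
from dataclasses import dataclass


@dataclass
class CacheExtent:
    transid: int   # transaction the values were sampled in
    bytenr: int    # disk bytenr of the cache file extent
    isize: int     # inode size of the cache inode
    nbytes: int    # nbytes of the cache inode


def find_cow_events(samples):
    """Return transitions where the extent moved but the size did not.

    Each event is (prev_transid, cur_transid, prev_bytenr, cur_bytenr,
    nbytes_delta). A changed bytenr with an unchanged inode size is
    what a CoW of the extent looks like, as opposed to an overwrite.
    """
    ordered = sorted(samples, key=lambda s: s.transid)
    events = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.bytenr != prev.bytenr and cur.isize == prev.isize:
            events.append((prev.transid, cur.transid,
                           prev.bytenr, cur.bytenr,
                           cur.nbytes - prev.nbytes))
    return events


# Made-up sample values, not taken from the real test run.
samples = [
    CacheExtent(transid=10, bytenr=0x1000000, isize=65536, nbytes=65536),
    CacheExtent(transid=11, bytenr=0x1000000, isize=65536, nbytes=65536),
    CacheExtent(transid=12, bytenr=0x1400000, isize=65536, nbytes=131072),
]
events = find_cow_events(samples)
for event in events:
    print(event)
```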

Although the kernel and btrfs check can both report such a problem
thanks to the free space bytes difference, that's already the last line
of defense.
The corrupted free space cache passes both the generation and csum
checks.
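
Why this matters can be shown with a toy model of the three checks.
This is not the kernel code: the cache layout, the function names and
the use of zlib.crc32 (btrfs uses crc32c) are all simplifications
assumed for illustration. The point is only that a cache which is
self-consistent but describes the wrong free space passes the first two
checks and is caught only by the byte count comparison.

```python
import struct
import zlib


def make_cache(generation, entries):
    """Serialize (offset, length) free extents into a toy cache blob."""
    payload = b"".join(struct.pack("<QQ", off, length)
                       for off, length in entries)
    return {"generation": generation,
            "payload": payload,
            "csum": zlib.crc32(payload)}


def validate_cache(cache, inode_generation, expected_free_bytes):
    """Run the checks in order: generation, csum, free space bytes."""
    if cache["generation"] != inode_generation:
        return "generation mismatch"
    if zlib.crc32(cache["payload"]) != cache["csum"]:
        return "csum mismatch"
    free = sum(length for _, length
               in struct.iter_unpack("<QQ", cache["payload"]))
    if free != expected_free_bytes:
        return "free space bytes mismatch"
    return "ok"


# Made-up block group with 8192 bytes really free.
good = make_cache(7, [(0, 4096), (8192, 4096)])
# Self-consistent cache (valid generation and csum) with wrong content.
bad = make_cache(7, [(0, 4096)])

print(validate_cache(good, 7, 8192))
print(validate_cache(bad, 7, 8192))
```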

I'll keep digging, but advice from anyone who is familiar with the free
space cache would really help in this case.

Thanks,
Qu
