Hi.

I have discovered a case where replacing missing devices causes
metadata corruption. Does anybody know anything about this?

I am using a 4.4.5 kernel with the latest global spare patches.

If we have a RAID6 array (it may be reproducible on RAID5 too),
replace one missing drive with another, and then remove a second
drive and replace it as well, plenty of errors show up in the log:

[  748.641766] BTRFS error (device sdf): failed to rebuild valid
logical 7366459392 for dev /dev/sde
[  748.678069] BTRFS error (device sdf): failed to rebuild valid
logical 7381139456 for dev /dev/sde
[  748.693559] BTRFS error (device sdf): failed to rebuild valid
logical 7290974208 for dev /dev/sde
[  752.039100] BTRFS error (device sdf): bad tree block start
13048831955636601734 6919258112
[  752.647869] BTRFS error (device sdf): bad tree block start
12819300352 6919290880
[  752.658520] BTRFS error (device sdf): bad tree block start
31618367488 6919290880
[  752.712633] BTRFS error (device sdf): bad tree block start
31618367488 6919290880

After the device replacement finishes, scrub shows uncorrectable
errors (the invocation is sketched after the check output below).
Btrfs check complains about errors too:
root@test:~/# btrfs check -p /dev/sdc
Checking filesystem on /dev/sdc
UUID: 833fef31-5536-411c-8f58-53b527569fa5
checksum verify failed on 9359163392 found E4E3BDB6 wanted 00000000
checksum verify failed on 9359163392 found E4E3BDB6 wanted 00000000
checksum verify failed on 9359163392 found 4D1F4197 wanted DE0E50EC
bytenr mismatch, want=9359163392, have=9359228928

Errors found in extent allocation tree or chunk allocation
checking free space cache [.]
checking fs roots [.]
checking csums
checking root refs
found 1049788420 bytes used err is 0
total csum bytes: 1024000
total tree bytes: 1179648
total fs tree bytes: 16384
total extent tree bytes: 16384
btree space waste bytes: 124962
file data blocks allocated: 1049755648
 referenced 1049755648
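
For reference, the scrub mentioned above can be run along these lines
("-B" waits for completion and prints the error summary, including the
uncorrectable error count; the mount point is an assumption):

# btrfs scrub start -B /mnt
# btrfs scrub status /mnt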

After the first replacement, the chunks do not seem to be spread
across all devices; note that /dev/sdg carries no System chunk:
Label: none  uuid: 3db39446-6810-47bf-8732-d5a8793500f3
        Total devices 4 FS bytes used 1002.00MiB
        devid    1 size 8.00GiB used 1.28GiB path /dev/sdc
        devid    2 size 8.00GiB used 1.28GiB path /dev/sdd
        devid    3 size 8.00GiB used 1.28GiB path /dev/sdf
        devid    4 size 8.00GiB used 1.25GiB path /dev/sdg

# btrfs device usage /mnt/
/dev/sdc, ID: 1
   Device size:             8.00GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:        256.00MiB
   System,RAID6:           32.00MiB
   Unallocated:             6.72GiB

/dev/sdd, ID: 2
   Device size:             8.00GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:        256.00MiB
   System,RAID6:           32.00MiB
   Unallocated:             6.72GiB

/dev/sdf, ID: 3
   Device size:             8.00GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:        256.00MiB
   System,RAID6:           32.00MiB
   Unallocated:             6.72GiB

/dev/sdg, ID: 4
   Device size:             8.00GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:        256.00MiB
   Unallocated:             6.75GiB


Steps to reproduce:
1) Create and mount a RAID6 filesystem
2) Remove a drive belonging to the RAID, try a write, and let the
kernel code close the device
3) Replace the missing device with 'btrfs replace start'
4) Remove the drive in another slot, try a write, and wait for it to
be closed
5) Start replacing the missing drive -> ERRORS.

If a full balance was done after step 3), no errors appeared. A rough
reproduction script is sketched below.
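
To make the sequence concrete, here is a minimal reproduction sketch.
It is an approximation, not the exact script used: the device names,
mount point, write sizes, and the sysfs-based hot-removal are all
assumptions.

# 1) create and mount the RAID6 filesystem, write some data
mkfs.btrfs -f -d raid6 -m raid6 /dev/sdc /dev/sdd /dev/sde /dev/sdf
mount /dev/sdc /mnt
dd if=/dev/zero of=/mnt/data bs=1M count=1000 conv=fsync

# 2) simulate pulling /dev/sde (devid 3); the write lets the kernel
#    notice the failure and close the device
echo 1 > /sys/block/sde/device/delete
dd if=/dev/zero of=/mnt/poke1 bs=1M count=10 conv=fsync

# 3) replace the missing devid 3 with a spare
btrfs replace start -B 3 /dev/sdg /mnt

# workaround: a full balance at this point avoids the errors below
# btrfs balance start /mnt

# 4) pull a drive in another slot the same way
echo 1 > /sys/block/sdd/device/delete
dd if=/dev/zero of=/mnt/poke2 bs=1M count=10 conv=fsync

# 5) replace the second missing device (devid 2) -> "failed to
#    rebuild valid logical" errors appear in dmesg
btrfs replace start -B 2 /dev/sdh /mnt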