Hi Patrik,
Thanks for posting a test case. More below.

On 03/26/2016 07:51 PM, Patrik Lundquist wrote:
> So with the lessons learned:
>
> # mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
> # mount /dev/sdb /mnt; dmesg | tail
> # touch /mnt/test1; sync; btrfs device usage /mnt
>
> Only raid10 profiles.
>
> # echo 1 >/sys/block/sde/device/delete
>
> We lost a disk.
>
> # touch /mnt/test2; sync; dmesg | tail
>
> We've got write errors.
>
> # btrfs device usage /mnt
>
> No 'single' profiles because we haven't remounted yet.
>
> # reboot
> # wipefs -a /dev/sde; reboot
> # mount -o degraded /dev/sdb /mnt; dmesg | tail
> # btrfs device usage /mnt
>
> Still only raid10 profiles.
>
> # touch /mnt/test3; sync; btrfs device usage /mnt
>
> Now we've got 'single' profiles. Replace now or get hosed.

Since you are replacing the failed device without an unmount/mount or reboot in between, this should work. You would need those parts of the hot spare/auto replace patches only if the test case had an unmount/mount or reboot at this stage.

> # btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail
> # btrfs device stats /mnt
>
> [/dev/sde].write_io_errs 0
> [/dev/sde].read_io_errs 0
> [/dev/sde].flush_io_errs 0
> [/dev/sde].corruption_errs 0
> [/dev/sde].generation_errs 0
>
> We didn't inherit the /dev/sde error count. Is that a bug?

No, it's the other way around: it would have been a bug if the replace target had inherited the error counters.

> # btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft -sconvert=raid10,soft -vf /mnt; dmesg | tail
> # btrfs device usage /mnt
>
> Back to only 'raid10' profiles.
>
> # umount /mnt; mount /dev/sdb /mnt; dmesg | tail
> # btrfs device stats /mnt
>
> [/dev/sde].write_io_errs 11
> [/dev/sde].read_io_errs 0
> [/dev/sde].flush_io_errs 2
> [/dev/sde].corruption_errs 0
> [/dev/sde].generation_errs 0
>
> The old counters are back. That's good, but wtf?

No, I doubt they are the old counters. The steps above never showed the old error counts. Since you created the file test3, there would have been some write I/O errors, but we don't see them after the balance. So I suspect these are new flush errors rather than the old counters coming back.

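To tell old counters from new ones, it may help to record only the non-zero counters after each step of the test case (after the device delete, after the replace, after the balance, after the remount). Below is a rough sketch, not something from the patches: `nonzero_stats` is a made-up helper name, and it assumes each line of `btrfs device stats` output has the form shown in the quoted output, a counter name followed by a number.

```shell
# Hypothetical helper: print only the non-zero counters from
# `btrfs device stats`-style text read on stdin.
# Assumes each line looks like "[/dev/sde].write_io_errs 11".
nonzero_stats() {
    awk 'NF == 2 && $2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print $1, $2 }'
}

# Try it on the output quoted above (no btrfs filesystem needed).
sample='[/dev/sde].write_io_errs 11
[/dev/sde].read_io_errs 0
[/dev/sde].flush_io_errs 2
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0'
printf '%s\n' "$sample" | nonzero_stats
```

On a live system the same filter would be fed from `btrfs device stats /mnt | nonzero_stats`, run once after each step so the point where the counters change is visible.
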
> # btrfs device stats -z /dev/sde
>
> Give /dev/sde a clean bill of health. Won't warn when mounting again.

Thanks, Anand