Re: Possible Raid Bug

Anand Jain Sun, 27 Mar 2016 20:54:43 -0700


Hi Patrik,


Thanks for posting a test case. more below.

On 03/26/2016 07:51 PM, Patrik Lundquist wrote:

So with the lessons learned:

# mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# mount /dev/sdb /mnt; dmesg | tail
# touch /mnt/test1; sync; btrfs device usage /mnt

Only raid10 profiles.

# echo 1 >/sys/block/sde/device/delete

We lost a disk.

# touch /mnt/test2; sync; dmesg | tail

We've got write errors.

# btrfs device usage /mnt

No 'single' profiles because we haven't remounted yet.

# reboot
# wipefs -a /dev/sde; reboot

# mount -o degraded /dev/sdb /mnt; dmesg | tail
# btrfs device usage /mnt

Still only raid10 profiles.

# touch /mnt/test3; sync; btrfs device usage /mnt

Now we've got 'single' profiles. Replace now or get hosed.


 Since you are replacing the failed device without mount/unmount/reboot,
 so this should work.

 And you would need those parts of hot spare/auto replace patches only
 if the test case had unmount/mount or reboot at this stage.

# btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail

# btrfs device stats /mnt

[/dev/sde].write_io_errs   0
[/dev/sde].read_io_errs    0
[/dev/sde].flush_io_errs   0
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

We didn't inherit the /dev/sde error count. Is that a bug?


  No. Its other way, it would have been a bug if the replace-target
  inherited the error counters.

# btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft
-sconvert=raid10,soft -vf /mnt; dmesg | tail

# btrfs device usage /mnt

Back to only 'raid10' profiles.

# umount /mnt; mount /dev/sdb /mnt; dmesg | tail

# btrfs device stats /mnt

[/dev/sde].write_io_errs   11
[/dev/sde].read_io_errs    0
[/dev/sde].flush_io_errs   2
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

The old counters are back. That's good, but wtf?


 No. I doubt if they are old counters. The steps above didn't
 show old error counts, but since you have created a file
 test3 so there will be some write_io_errors, which we don;t
 see after the balance. So I doubt if they are old counter
 but instead they are new flush errors.

# btrfs device stats -z /dev/sde

Give /dev/sde a clean bill of health. Won't warn when mounting again.





Thanks, Anand

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Possible Raid Bug

Reply via email to