On 4.10.19, 10:50, Anand Jain wrote:
> In open_fs_devices() we identify an alien device, but we don't reset its
> device::name. So the btrfs-progs device list does not show the device as
> missing, as the script below demonstrates.
>
> mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs
> mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdb
> sleep 3 # avoid racing with udev's useless scans if needed
> btrfs dev add -f /dev/sdb /btrfs
> mount -o degraded /dev/sdc /btrfs1
>
> No missing device:
> btrfs fi show -m /btrfs1
> Label: none uuid: 3eb7cd50-4594-458f-9d68-c243cc49954d
> Total devices 2 FS bytes used 128.00KiB
> devid 1 size 12.00GiB used 1.26GiB path /dev/sdc
> devid 2 size 12.00GiB used 1.26GiB path /dev/sdb
>
> Signed-off-by: Anand Jain <anand.j...@oracle.com>
> ---
> PS: Fundamentally, it's the wrong approach for btrfs-progs to deduce the
> device missing state in userland instead of obtaining it from the kernel.
> I objected to those patches, but they still got merged; this bug is one
> of their side effects. Ironically, I wrote patches to read device_state
> from the kernel using ioctl, procfs and sysfs, but they didn't get due
> attention before the merge.
>
> fs/btrfs/volumes.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 06ec3577c6b4..05ade8c7342b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -803,10 +803,10 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
> disk_super = (struct btrfs_super_block *)bh->b_data;
> devid = btrfs_stack_device_id(&disk_super->dev_item);
> if (devid != device->devid)
> - goto error_brelse;
> + goto free_alien;
>
> if (memcmp(device->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE))
> - goto error_brelse;
> + goto free_alien;
>
IMO a better approach is to return a specific error code from
btrfs_open_one_device() and do the deletion in open_fs_devices().
Otherwise it's not apparent why you use list_for_each_entry_safe() in one
function to delete something in a different one (whose name, by the way,
doesn't suggest that a deletion is going on). Given the nature of the
error, I think -ENODEV or -ENXIO is appropriate.
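Roughly the shape I have in mind, as a userspace sketch rather than kernel code: the structs, open_one_device() and open_all_devices() are hypothetical stand-ins for btrfs_open_one_device() and open_fs_devices(), and the list helpers are a minimal reimplementation of the list_head idiom. The helper only reports -ENODEV for an alien device; the caller, which owns the list walk, does the removal and the num_devices accounting.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-in for the kernel's struct list_head (sketch only). */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified device and fs_devices structures. */
struct dev {
	unsigned long long devid;      /* devid we expect for this entry */
	unsigned long long disk_devid; /* devid read from the on-disk super */
	struct list_head dev_list;
};

struct fs_devs {
	int num_devices;
	struct list_head devices;
};

/* Hypothetical open helper: reports an alien device, never frees it. */
static int open_one_device(struct dev *d)
{
	if (d->disk_devid != d->devid)
		return -ENODEV; /* alien device; -ENXIO would also fit */
	return 0;
}

/* The caller owns the list walk, so it also owns the deletion. */
static void open_all_devices(struct fs_devs *fs)
{
	struct list_head *pos, *tmp;

	/* Cache ->next before the body runs, since the body may free pos. */
	for (pos = fs->devices.next, tmp = pos->next; pos != &fs->devices;
	     pos = tmp, tmp = pos->next) {
		struct dev *d = container_of(pos, struct dev, dev_list);

		if (open_one_device(d) == -ENODEV) {
			list_del(&d->dev_list);
			fs->num_devices--;
			assert(fs->num_devices >= 0);
			free(d);
		}
		/* Any other failure would simply be skipped, as in the current loop. */
	}
}
```

With the deletion sitting next to the loop, it's obvious why the walk has to cache the next entry, and the helper's name stays honest.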
> device->generation = btrfs_super_generation(disk_super);
>
> @@ -845,6 +845,11 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>
> return 0;
>
> +free_alien:
> + fs_devices->num_devices--;
> + list_del(&device->dev_list);
> + btrfs_free_device(device);
> +
> error_brelse:
> brelse(bh);
> blkdev_put(bdev, flags);
> @@ -1329,11 +1334,13 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
> fmode_t flags, void *holder)
> {
> struct btrfs_device *device;
> + struct btrfs_device *tmp_device;
> struct btrfs_device *latest_dev = NULL;
>
> flags |= FMODE_EXCL;
>
> - list_for_each_entry(device, &fs_devices->devices, dev_list) {
> + list_for_each_entry_safe(device, tmp_device, &fs_devices->devices,
> + dev_list) {
> /* Just open everything we can; ignore failures here */
> if (btrfs_open_one_device(fs_devices, device, flags, holder))
> continue;
>
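For reference, this is why the switch to the _safe variant is needed with the patch as posted. Below is a simplified userspace re-creation of the two iteration macros (modelled on include/linux/list.h, not copied from it), plus a hypothetical open_one() that, like the patched btrfs_open_one_device(), frees the entry behind the caller's back. With plain list_for_each_entry() the step expression would read pos->member.next from the freed entry; the _safe form caches it in n before the body runs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Tiny userspace stand-ins for the kernel's list_head helpers (sketch only). */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_entry(ptr, type, member) container_of(ptr, type, member)

#define list_next_entry(pos, member) \
	list_entry((pos)->member.next, __typeof__(*(pos)), member)

/* Plain walk: the step expression reads pos->member.next AFTER the body. */
#define list_for_each_entry(pos, head, member)                         \
	for (pos = list_entry((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);                                   \
	     pos = list_next_entry(pos, member))

/* Safe walk: n caches the next entry BEFORE the body runs. */
#define list_for_each_entry_safe(pos, n, head, member)                 \
	for (pos = list_entry((head)->next, __typeof__(*pos), member), \
	     n = list_next_entry(pos, member);                         \
	     &pos->member != (head);                                   \
	     pos = n, n = list_next_entry(n, member))

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Hypothetical entry type standing in for struct btrfs_device. */
struct item {
	int id;
	int alien;
	struct list_head list;
};

/* Like the patched open helper: may unlink and free @it behind the caller's back. */
static int open_one(struct item *it)
{
	if (it->alien) {
		list_del(&it->list);
		free(it);
		return -1;
	}
	return 0;
}

/* Only correct because of the _safe walk; list_for_each_entry would touch freed memory. */
static int count_opened(struct list_head *head)
{
	struct item *it, *tmp;
	int opened = 0;

	list_for_each_entry_safe(it, tmp, head, list) {
		if (open_one(it))
			continue;
		opened++;
	}
	return opened;
}
```

Which is also the readability problem: nothing at the call site tells you that open_one() can free the entry.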