On 10/05/2017 04:11 AM, Liu Bo wrote:
On Tue, Oct 03, 2017 at 11:59:20PM +0800, Anand Jain wrote:
From: Anand Jain <anand.j...@oracle.com>

Write and flush errors are critical errors, upon which the device fd
must be closed and marked as failed.


Can we defer the job of closing device to umount?

 Originally I think it was designed to handle the bad disk
 same as missing device. as below [1]. This patch just does that.
 But I am curious to know if there is any issue to close failed
 device ?

[1]
-----
static int read_one_dev(struct btrfs_fs_info *fs_info,
                        struct extent_buffer *leaf,
                        struct btrfs_dev_item *dev_item)
::
            /*
             * this happens when a device that was properly setup
             * in the device info lists suddenly goes bad.
             * device->bdev is NULL, and so we have to set
             * device->missing to one here
             */
------

 So here I misuse the device missing scenario (device->bdev = NULL)
 to the failed device scenario. (code comment mentioned it).

 Secondly as in the patch 1/2 commit log and also Austin mentioned it,
 if failed device close is deferred it will also defer the cleanup of
 the failed device at the block layer.

 But yes block layer can still clean up when btrfs closes the
 device at the time of replace, also where in the long run, pulling
 out of the failed disk would (planned) trigger a udev notification
 into the btrfs sysfs interface and it can close the device.

 Anything that I have missed ?

 Further, jfyi closing of failed device isn't related to the
 looping for device failure though.

We can go mark the device failed and skip it while doing read/write,
and umount can do the cleanup work.

 In servers a need of umount is considered as downtime, things
 shouldn't wait for umount to cleanup.

That way we don't need a dedicated thread looping around to detect a
rare situation.

 I think its better to make it event based instead of thread looping,
 pls take a look..

   [PATCH v8.1 2/2] btrfs: mark device failed for write and flush errors

Thanks, Anand


Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to