On 10/14/2017 02:46 AM, Liu Bo wrote:
On Sun, Oct 08, 2017 at 10:23:58PM +0800, Anand Jain wrote:


On 10/07/2017 07:56 AM, Liu Bo wrote:
On Thu, Oct 05, 2017 at 09:56:59PM +0800, Anand Jain wrote:


On 10/05/2017 04:11 AM, Liu Bo wrote:
On Tue, Oct 03, 2017 at 11:59:20PM +0800, Anand Jain wrote:
From: Anand Jain <anand.j...@oracle.com>

Write and flush errors are critical errors, upon which the device fd
must be closed and marked as failed.


Can we defer the job of closing the device to umount?

   Originally I think it was designed to handle a bad disk the same
   as a missing device, as below [1]. This patch just does that.

   But I am curious to know if there is any issue with closing the
   failed device?

    Anything on this?

No big deal, it's just different from md's behavior, but surely it
doesn't make a big difference whether we close the device fd here or not.

 Ok.



[1]
-----
static int read_one_dev(struct btrfs_fs_info *fs_info,
                          struct extent_buffer *leaf,
                          struct btrfs_dev_item *dev_item)
::
              /*
               * this happens when a device that was properly setup
               * in the device info lists suddenly goes bad.
               * device->bdev is NULL, and so we have to set
               * device->missing to one here
               */
------

   So here I misuse the device-missing scenario (device->bdev = NULL)
   for the failed-device scenario (as the code comment mentions).


This is OK to me, we can unify this stuff later; for now, ->missing
is used to bypass the failed disk for r/w.
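
(Just to illustrate the idea of treating a failed device like a missing
one, here is a minimal user-space sketch; the struct and helper names
are made up and are not the actual btrfs code.)

-----
/* Illustrative model: once a device is flagged missing/failed, the
 * write path simply skips it instead of submitting IO to it. */
#include <stdbool.h>
#include <stdio.h>

struct demo_device {
	const char *name;
	bool missing;	/* set when bdev is gone or the device has failed */
};

static void submit_write(struct demo_device *dev)
{
	if (dev->missing) {
		printf("%s: skipped, device is missing/failed\n", dev->name);
		return;
	}
	printf("%s: write submitted\n", dev->name);
}

int main(void)
{
	struct demo_device good = { "sda", false };
	struct demo_device bad  = { "sdb", true };

	submit_write(&good);
	submit_write(&bad);
	return 0;
}
------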

   Secondly, as noted in the patch 1/2 commit log and as Austin also
   mentioned, if closing the failed device is deferred, the cleanup of
   the failed device at the block layer is also deferred.

   But yes, the block layer can still clean up when btrfs closes the
   device at the time of replace. Also, in the long run, pulling out
   the failed disk would (as planned) trigger a udev notification
   into the btrfs sysfs interface, which can then close the device.

   Anything that I have missed?


I'm more inclined to do cleanup by making udev call 'device remove',
which is supposed to do all the validation checks you need to do
here.

    Validation checks like what?

OK, I just realized that I was talking about what we should do upon a
device getting pulled out. In that case, a validation check of the raid
profile is needed, such as whether we can keep the raid profile or need
to degrade it, and 'device delete' can do that.

 OK.

Although I agree with the fact that write/flush errors are critical
ones,


it's still too strict if there are only a few write/flush errors,
given that cable issues or timeouts from some layers may also lead to
those errors.  A compromise would be to have a threshold to give
drives another chance.
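
(For illustration only, a threshold approach could look roughly like
the sketch below; the counter, the helper and the threshold value are
made up.)

-----
/* Illustrative: tolerate a few write/flush errors before marking the
 * device failed. */
#include <stdbool.h>
#include <stdio.h>

#define DEMO_ERROR_THRESHOLD 5	/* made-up value */

struct demo_device {
	const char *name;
	int write_errors;
	bool failed;
};

static void note_write_error(struct demo_device *dev)
{
	if (++dev->write_errors >= DEMO_ERROR_THRESHOLD && !dev->failed) {
		dev->failed = true;
		printf("%s: %d errors, marking failed\n",
		       dev->name, dev->write_errors);
	}
}

int main(void)
{
	struct demo_device dev = { "sdb", 0, false };

	for (int i = 0; i < 6; i++)
		note_write_error(&dev);	/* marked failed on the 5th error */
	return 0;
}
------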


 Fundamentally, IO retries are the responsibility of the IO module; it
 knows what type of IO error is worth a retry. If the FS is told that an
 IO has failed, it should trust that the IO module has done its best
 effort. Adding additional retries at the FS layer is _not_ a good
 idea - it would affect the time to fail over, which is crucial for HA
 (high availability) configurations.

 Instead, to deal with spurious IO errors, the plan is to create an
 offline device state, as I wrote about here [1].

   [1]
   https://www.spinics.net/lists/linux-btrfs/msg49643.html

 I removed the offline part here to keep the changes specific to the
 trigger, and it can be added later.


Thanks, Anand

thanks,

-liubo


Thanks, Anand

   Further, jfyi, closing the failed device isn't related to the
   looping for device failure, though.

We can mark the device failed and skip it while doing read/write,
and umount can do the cleanup work.

   On servers, needing an umount is considered downtime; things
   shouldn't have to wait for umount to clean up.


OK.

That way we don't need a dedicated thread looping around to detect a
rare situation.

   I think it's better to make it event-based instead of thread looping,
   pls take a look:

     [PATCH v8.1 2/2] btrfs: mark device failed for write and flush errors
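
(A minimal sketch of the event-driven idea: the failure handling is
invoked from the IO completion path itself rather than from a polling
thread. The names below are made up for illustration and are not the
actual btrfs code.)

-----
/* Illustrative: mark the device failed directly from the (simulated)
 * end-io callback on a write/flush error; no polling thread needed. */
#include <stdbool.h>
#include <stdio.h>

struct demo_device {
	const char *name;
	bool failed;
};

struct demo_bio {
	struct demo_device *dev;
	int error;
	void (*end_io)(struct demo_bio *bio);	/* completion callback */
};

/* Completion callback: react to the error event right here instead of
 * having a dedicated thread poll the device state. */
static void demo_end_io(struct demo_bio *bio)
{
	if (bio->error && !bio->dev->failed) {
		bio->dev->failed = true;
		printf("%s: marked failed on write/flush error\n",
		       bio->dev->name);
	}
}

int main(void)
{
	struct demo_device dev = { "sdb", false };
	struct demo_bio bio = { &dev, -5 /* simulated -EIO */, demo_end_io };

	bio.end_io(&bio);	/* simulate IO completion with an error */
	return 0;
}
------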


Event-based is good; for anything about a failing disk, we would have
to follow what md is doing eventually, i.e. let udev handle it.

thanks,
-liubo

