Re: [PATCH 12/12] btrfs: check device for critical errors and mark failed

Yauhen Kharuzhy Tue, 29 Mar 2016 17:50:32 -0700

On Tue, Mar 29, 2016 at 10:22:29PM +0800, Anand Jain wrote:
> Write and Flush errors are considered as critical errors,
> upon which the device will be brought offline and marked as
> failed. Write and Flush errors are identified using device
> error statistics.
> 
> Signed-off-by: Anand Jain <anand.j...@oracle.com>
> 
> btrfs: check for failed device and hot replace
> 
> This patch creates casualty_kthread to check for the failed
> devices, and triggers device replace.
> 
> Signed-off-by: Anand Jain <anand.j...@oracle.com>
> ---
>  fs/btrfs/ctree.h   |   2 +
>  fs/btrfs/disk-io.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/disk-io.h |   2 +
>  fs/btrfs/volumes.c |   1 +
>  fs/btrfs/volumes.h |   4 ++
>  5 files changed, 169 insertions(+), 1 deletion(-)
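The rule in the quoted description can be sketched in userspace C. Write and flush errors count as critical and mark the device failed. Read, corruption and generation errors do not. This is only a minimal illustration under stated assumptions. The enum, struct and function names are hypothetical stand-ins, loosely modeled on the btrfs device error statistics categories, and are not taken from the patch. The sketch also assumes that any nonzero write or flush counter is critical. The real patch may instead compare against previously seen values.

```c
#include <stdbool.h>

/* Hypothetical per-device error categories, loosely modeled on the btrfs
 * device stat categories; NOT the kernel's actual definitions. */
enum demo_dev_stat {
	DEMO_STAT_WRITE_ERRS,
	DEMO_STAT_READ_ERRS,
	DEMO_STAT_FLUSH_ERRS,
	DEMO_STAT_CORRUPTION_ERRS,
	DEMO_STAT_GENERATION_ERRS,
	DEMO_STAT_MAX
};

/* Hypothetical device: cumulative error counters plus a failed flag. */
struct demo_device {
	int stats[DEMO_STAT_MAX];
	bool failed;
};

/* Assumption: any nonzero write or flush error count is critical.
 * Other counters never mark the device failed.
 * Returns true if the device is marked failed after the check. */
static bool demo_check_critical_errors(struct demo_device *dev)
{
	if (dev->stats[DEMO_STAT_WRITE_ERRS] > 0 ||
	    dev->stats[DEMO_STAT_FLUSH_ERRS] > 0)
		dev->failed = true;
	return dev->failed;
}
```

A device that has only accumulated read errors stays online. The first flush or write error flips it to failed, which is the state the casualty thread then acts on.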


btrfs_check_and_handle_casualty() tries to perform auto-replacement
only once after each failure. If no hot spare was added to the system
before the failure, the only remaining way to replace the drive is to
perform the replace manually. This sounds reasonable, so this is just a
clarification: are you sure that we shouldn't start auto-replacement if
a hot spare is added after the drive failure?

V1 of the patchset retried auto-replacement endlessly until a
replacement drive was added.
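The difference between the two behaviours can be sketched as two handlers that a periodic checker calls on each pass, given whether a spare is available at that moment. This is a hypothetical userspace illustration, not code from either patchset version. The struct, the function names and the spare_available parameter are all assumptions made for the sketch.

```c
#include <stdbool.h>

/* Hypothetical state for one failed device; not the patch's structures. */
struct demo_casualty {
	bool failed;            /* marked failed after a critical error */
	bool replace_attempted; /* one-shot policy: an attempt was made */
	bool replaced;          /* a hot spare has taken over */
};

/* One-shot policy (as described for btrfs_check_and_handle_casualty()):
 * try once per failure; if no spare exists at that moment, give up and
 * leave the replace to the administrator. */
static void demo_handle_once(struct demo_casualty *c, bool spare_available)
{
	if (!c->failed || c->replaced || c->replace_attempted)
		return;
	c->replace_attempted = true;
	if (spare_available)
		c->replaced = true;
}

/* Retry policy (V1 behaviour): every pass re-checks for a spare, so a
 * spare added after the failure is still picked up. */
static void demo_handle_retry(struct demo_casualty *c, bool spare_available)
{
	if (!c->failed || c->replaced)
		return;
	if (spare_available)
		c->replaced = true;
}
```

If the first pass sees no spare and a spare appears before the second pass, demo_handle_once() leaves the device unreplaced, while demo_handle_retry() replaces it. That is exactly the case the question above is about.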



-- 
Yauhen Kharuzhy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

