Re: Global hotspare functionality

Austin S. Hemmelgarn Tue, 29 Mar 2016 13:01:18 -0700

On 2016-03-29 15:24, Yauhen Kharuzhy wrote:

On Tue, Mar 29, 2016 at 10:41:36PM +0800, Anand Jain wrote:


  No. No. No, please don't do that, it would lead to trouble in handling
  slow devices. I purposely didn't do it.


Hmm. Can you explain please? Sometimes admins may want
autoreplacement to work automatically if a drive failed and was removed
before unmounting and remounting again. The simplest way to achieve this
is to add a spare and always mount the FS with the 'degraded' option (we
need to use this option in any case if we have the root fs on RAID, for
instance, to avoid a non-bootable state). So, if the autoreplacement code
also checks for missing drives, this will work without user
intervention. To allow the user to decide whether he wants
autoreplacement, we can add a mount option like '(no)hotspare' (I have
already done this for our project and will send a patch after rebasing
onto your new series). Yes, there are side effects if you want to run
experiments with missing drives in the FS, but you can disable
autoreplacement in such a case.

If you know about any pitfalls in such scenarios, please point me to
them, I am a newbie in FS-related kernel things.

If a disk is particularly slow to start up for some reason (maybe it's
going bad, maybe it's just got a slow interconnect (think SD cards),
maybe it's just really cold so the bearings are seizing up), then this
would potentially force it out of the array when it shouldn't be.

That said, having things set to always allow degraded mounts is
_extremely dangerous_. If the user does not know anything failed, they
also can't know they need to get anything fixed. While notification
could be used, it also introduces a period of time where the user is at
risk of data loss without them having explicitly agreed to this risk (by
manually telling it to mount degraded).

I could possibly understand doing this for something that needs to be
guaranteed to come on line when powered on, but **only** if it notifies
responsible parties that there was a problem **and** it is explicitly
documented, and even then I'd be wary of doing this unless there was
something in place to handle the possibility of false positives (yes,
they do happen), and to make certain that the failed hardware got
replaced as soon as possible.
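
As a rough illustration only, the conditions described above could be
written down as a small decision function. This is not btrfs code and
not part of any posted patch: the names `Policy`, `Action`, `decide`,
and the `grace_seconds` parameter are all hypothetical, and the grace
period is just one assumed way of coping with slow devices and false
positives.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait"                # device may only be slow; keep waiting
    REFUSE = "refuse"            # do not mount; an admin has to decide
    DEGRADED_NOTIFY = "degraded" # mount degraded, replace, and notify


@dataclass
class Policy:
    # All three fields are hypothetical knobs, not real mount options.
    grace_seconds: float       # how long a missing device may still show up
    allow_degraded: bool       # explicit, documented opt-in by the admin
    notifier_configured: bool  # responsible parties will actually be told


def decide(missing_for_seconds: float, policy: Policy) -> Action:
    """Pick an action for a device that has not appeared yet."""
    # A slow disk (bad interconnect, cold bearings) gets time to appear
    # before it is treated as failed.
    if missing_for_seconds < policy.grace_seconds:
        return Action.WAIT
    # Degraded operation only with explicit opt-in AND notification.
    if policy.allow_degraded and policy.notifier_configured:
        return Action.DEGRADED_NOTIFY
    # Otherwise stay safe and leave the decision to a human.
    return Action.REFUSE
```

With such a policy, a device that has been missing for less than the
grace period is never replaced, and an unattended degraded mount only
happens when both the opt-in and the notification are in place.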

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
