Re: Mis-Design of Btrfs?

Ric Wheeler Fri, 15 Jul 2011 09:52:13 -0700

On 07/15/2011 05:23 PM, da...@lang.hm wrote:

On Fri, 15 Jul 2011, Chris Mason wrote:
Excerpts from Ric Wheeler's message of 2011-07-15 08:58:04 -0400:
On 07/15/2011 12:34 PM, Chris Mason wrote:
By bubble up I mean that if you have multiple layers capable of doing
retries, the lowest levels would retry first.  Basically by the time we
get an -EIO_ALREADY_RETRIED we know there's nothing that lower level can
do to help.
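A minimal Python sketch of the "bubble up" idea. It assumes each layer can be modeled as a plain function and that an already-retried failure can be marked with a flag. The `ReadError` class, its `already_retried` attribute, and the retry counts are hypothetical illustrations; EIO_ALREADY_RETRIED is treated here as a hypothetical marker, not an existing errno. The point is only that the lowest layer exhausts its retries first, and the layers above then stop instead of multiplying the attempts.

```python
class ReadError(Exception):
    """Hypothetical I/O error; already_retried stands in for EIO_ALREADY_RETRIED."""

    def __init__(self, already_retried=False):
        super().__init__("read failed")
        self.already_retried = already_retried


def make_disk(fail_times):
    """Simulated device that fails its first fail_times reads, then succeeds."""
    state = {"attempts": 0}

    def read():
        state["attempts"] += 1
        if state["attempts"] <= fail_times:
            raise ReadError()
        return b"data"

    return read, state


def retrying_layer(lower, retries):
    """Wrap a lower layer: retry plain failures, pass tagged ones straight up."""

    def read():
        for _ in range(retries + 1):
            try:
                return lower()
            except ReadError as err:
                if err.already_retried:
                    # A lower layer already did its retries; more would not help.
                    raise
        # Our own retries are exhausted: tag the error for the layers above.
        raise ReadError(already_retried=True)

    return read
```

With two stacked layers of 2 retries each and a disk that always fails, the disk is read 3 times in total rather than 9, because the upper layer sees the tagged error and gives up immediately.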
the problem with doing this is that it can end up stalling the box for significant amounts of time while all the retries happen.
we already see this happening today, where a disk read failure is retried multiple times by the disk, multiple times by the raid controller, and then multiple times by Linux, resulting in multi-minute stalls when you hit a disk error in some cases.
having the lower layers do the retries automatically runs the risk of making this even worse.
This needs to be able to be throttled by some layer that can see the entire picture (either by cutting off the retries after a set number of attempts, after some amount of time, or by spacing out the retries to allow other queries to get in and let the box do useful work in the meantime).
David Lang
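The throttling described above (cut off after a count, cut off after a time, space out attempts) can be sketched as a single retry budget. This is a minimal Python illustration under assumed names: `bounded_retry` and its parameters are hypothetical, and the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time


def bounded_retry(op, max_attempts, deadline_s, spacing_s,
                  clock=time.monotonic, sleep=time.sleep):
    """Call op() until it succeeds, giving up after max_attempts calls or when
    the next attempt would start past deadline_s seconds, and sleeping
    spacing_s between attempts so other I/O can be serviced."""
    start = clock()
    last_err = None
    attempts = 0
    while attempts < max_attempts:
        if attempts > 0:
            if clock() - start + spacing_s > deadline_s:
                break  # the next attempt would begin after the deadline
            sleep(spacing_s)  # leave room for other queries between retries
        attempts += 1
        try:
            return op()
        except OSError as err:
            last_err = err
    raise TimeoutError(f"gave up after {attempts} attempt(s)") from last_err
```

With a 1.0 s deadline and 0.3 s spacing, an operation that always fails is tried 4 times (at roughly 0, 0.3, 0.6 and 0.9 s) regardless of a higher attempt cap.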

That should not be an issue - we have a "fast fail" path for IO that should avoid retrying for just those reasons (i.e., for multi-path or when recovering a flaky drive).

This is not a scheme for unbounded retries. If you have a 3-disk mirror in RAID1, you would read the data no more than 2 extra times, and almost never more than once. That should be *much* faster than the multiple-second timeouts you see when waiting for the SCSI timeout to fire, etc.
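A minimal Python sketch of why the mirror case is bounded: read one copy, verify its checksum, and move on to the next copy only on failure, so N copies cost at most N - 1 extra reads. It assumes `read_copy` is a hypothetical fast-fail read of one mirror (no long device-level retries), and it uses `zlib.crc32` purely as a stand-in checksum; Btrfs itself defaults to crc32c.

```python
import zlib


def read_verified(copies, expected_crc, read_copy):
    """Try each mirror copy in turn; return (data, extra_reads) for the first
    copy whose checksum matches. With len(copies) mirrors this makes at most
    len(copies) - 1 extra reads after the first one."""
    for index, copy_id in enumerate(copies):
        try:
            data = read_copy(copy_id)  # assumed to fail fast on I/O errors
        except OSError:
            continue  # I/O error on this mirror: try the next copy
        if zlib.crc32(data) == expected_crc:
            return data, index  # index == number of extra reads needed
    raise OSError("all mirror copies failed or had bad checksums")
```

For a 3-way mirror where the first copy is corrupt and the second returns an I/O error, the third copy is returned after exactly 2 extra reads; in the common case the first copy verifies and no extra read happens.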


Ric

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
