Re: How does btrfs handle bad blocks in raid1?

Chris Murphy Tue, 14 Jan 2014 13:38:27 -0800

On Jan 14, 2014, at 2:19 PM, Roman Mamedov <r...@romanrm.net> wrote:

> On Tue, 14 Jan 2014 14:05:11 -0700
> Chris Murphy <li...@colorremedies.com> wrote:
> 
>> 
>> On Jan 14, 2014, at 12:37 PM, Roman Mamedov <r...@romanrm.net> wrote:
>>> 
>>> I vaguely remember having some drives that were not able to remap a single
>>> block on write, but doing that successfully if I overwrote a sizable area
>>> around (and including) that block, or overwrite the whole drive. And after
>>> that they worked without issue not exhibiting further bad blocks.
>> 
>> Presumably the SMART self-assessment for this drive was FAIL? 
> 
> No of course not, why?

Reserve sectors are fundamental to ECC. If there are no more reserves, the
status should be a failed drive, because it can no longer relocate data that
is experiencing transient read errors on its own.

> SMART goes to FAIL only if one of the attributes falls
> below threshold, in this case that would be Reallocated Sector Count having
> too many sectors. But nope, it had either zero or a single-digit number.

It sounds like we aren't talking about the same thing. I'm considering 
persistent write failure as a result of no more reserve sectors being available.

> I don't ever remember seeing a SMART FAIL drive that would function in any
> usual sense of that word.

Oh, I have. One drive reported imminent failure for a week before we got
around to replacing it, and it still hadn't actually failed by then.

>> And if so what's the point of the work around when we only have a pass/fail
>> level granularity for drive health?
> 
> Not sure what you're referring to here. As said above, the FAIL/PASS status is
> largely useless, and the more important indicators are the values and dynamics
> in Reallocated sector count, Current pending sectors, Reported uncorrectable,
> etc.

Well, not totally useless: if it flags the user with an hour's notice in GNOME,
they can do some minimal backup. I've seen that happen on OS X Server (the
client version doesn't produce SMART warnings in user space).
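
To make the distinction concrete, here is a minimal Python sketch of looking at
both things: the pass/fail criterion (normalized value at or below threshold)
and the raw counts of the attributes mentioned above. It assumes the usual
column layout of `smartctl -A` output. The sample text is made up for
illustration and does not come from a real drive, and the names WATCHED,
parse_attributes and assess are hypothetical helpers, not part of any tool.

```python
import re

# Attributes named in this thread, spelled the way smartctl usually prints them.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Reported_Uncorrect"}

# Made-up sample in the layout of `smartctl -A` output, not from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       7
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
"""


def parse_attributes(text):
    """Return {name: (value, thresh, raw)} for each attribute row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        value, thresh = int(fields[3]), int(fields[5])
        # RAW_VALUE can carry trailing annotations; keep the leading integer.
        match = re.match(r"\d+", fields[9])
        raw = int(match.group()) if match else 0
        attrs[fields[1]] = (value, thresh, raw)
    return attrs


def assess(attrs):
    """Return (failed, watch): attributes at or below threshold, and watched
    attributes whose raw count is nonzero."""
    failed = [n for n, (v, t, _) in attrs.items() if t > 0 and v <= t]
    watch = sorted(n for n, (_, _, raw) in attrs.items() if n in WATCHED and raw > 0)
    return failed, watch


if __name__ == "__main__":
    failed, watch = assess(parse_attributes(SAMPLE))
    print("at or below threshold:", failed)
    print("nonzero raw counts:", watch)
```

With the sample above nothing is at or below threshold, so the overall verdict
would be a pass, yet two of the watched attributes already have nonzero raw
counts.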

> 
>> a way to send a command to the firmware to persistently increase the reserve
>> sectors at the expense of available space - in effect it reduces the LBA
>> count by e.g. 10MB, thereby increasing the reserve pool by 10MB.
> 
> Yes please that, and also a pony. :)

That seems a lot easier to implement than anything else being discussed. 

Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
