On Wed, Jul 1, 2015 at 7:38 PM, Donald Pearson
<donaldwhpear...@gmail.com> wrote:

> Here's the drive vomiting in my logs after it got halfway through the
> dd image attempt.
>
> Jul  1 17:05:51 san01 kernel: sd 0:0:6:0: [sdg] FAILED Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Jul  1 17:05:51 san01 kernel: sd 0:0:6:0: [sdg] Sense Key : Medium
> Error [current]
> Jul  1 17:05:51 san01 kernel: sd 0:0:6:0: [sdg] Add. Sense:
> Unrecovered read error
> Jul  1 17:05:51 san01 kernel: sd 0:0:6:0: [sdg] CDB: Read(10) 28 00 5a
> 5b f1 e0 00 01 00 00
> Jul  1 17:05:51 san01 kernel: blk_update_request: critical medium
> error, dev sdg, sector 1515975136
> Jul  1 17:05:57 san01 kernel: sd 0:0:6:0: [sdg] FAILED Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Jul  1 17:05:57 san01 kernel: sd 0:0:6:0: [sdg] Sense Key : Medium
> Error [current]
> Jul  1 17:05:57 san01 kernel: sd 0:0:6:0: [sdg] Add. Sense:
> Unrecovered read error
> Jul  1 17:05:57 san01 kernel: sd 0:0:6:0: [sdg] CDB: Read(10) 28 00 5a
> 5b f2 e0 00 01 00 00

This looks like a typical URE. There are a number of reasons why a
sector can be bad, but basically the drive ECC has given up being able
to correct the problem, and it reports the command, the error, and the
sector involved. What *should* happen is that Btrfs reconstructs the data
(or metadata) on that sector and then (since kernel 3.19) writes it back
to the bad sector's LBA. The drive tries the write to that bad sector and
verifies it. If the failure is persistent, that LBA is remapped to a
different physical sector and the bad one is retired (it no longer has an
LBA). There are no kernel messages for any of this; it's all handled
inside the drive itself.
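
As a rough sketch of what that looks like in practice (assuming the volume
were mountable, /mnt is a placeholder mount point, and /dev/sdg is still
the affected device), a scrub is one way to exercise that repair path, and
the hex LBA in the quoted CDB does match the reported sector:

  # confirm the CDB LBA matches the reported sector (0x5a5bf1e0 = 1515975136)
  printf '%d\n' 0x5a5bf1e0

  # with the filesystem mounted, a scrub reads everything and rewrites
  # anything that fails checksum verification from a good copy
  btrfs scrub start -B /mnt
  btrfs scrub status /mnt

  # per-device error counters (read/write/flush/corruption/generation)
  btrfs device stats /mnt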

But this sounds like a dd read of the raw device, where Btrfs is not
involved (because you can't mount the volume), so none of this correction
happens. What I wonder, though, is what's in the much earlier logs: if
this same problem happened while the volume was still mounted, did Btrfs
try to fix it, and did it run into trouble doing so?
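
If the goal is just to get a complete image off the raw device despite the
UREs, a hedged sketch (output paths are placeholders, and this assumes GNU
ddrescue is available; plain dd needs conv=noerror to keep going past read
errors):

  # plain dd: continue after read errors, zero-pad unreadable blocks so
  # the output stays offset-aligned (a failed read loses the whole block)
  dd if=/dev/sdg of=/backup/sdg.img bs=64K conv=noerror,sync

  # GNU ddrescue is usually the better tool for a failing drive: it skips
  # troublesome areas first and keeps a map file so the copy can resume
  ddrescue -d /dev/sdg /backup/sdg.img /backup/sdg.map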

So it might be useful to see whether there's anything in /var/log/messages
or journalctl -bX from the time the original problem was first developing.
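
Something along these lines would pull that history (the boot offset is a
guess; adjust it based on the --list-boots output):

  # list available boots, then read kernel messages from the relevant one
  journalctl --list-boots
  journalctl -k -b -2 | grep -iE 'sdg|btrfs'

  # or, on a syslog setup, search the rotated logs directly
  grep -iE 'sdg|btrfs' /var/log/messages*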

Bad sectors are completely ordinary. They're not really common; out of
maybe 50 drives I've had two exhibit this. But drives are designed to
take this into account, and so are hardware RAID, Linux kernel md raid,
LVM raid, Btrfs, and ZFS. So it's kinda important to know more about this
edge case to find out where the problem is.
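
One quick way to see how far along the drive is with remapping (assuming
smartmontools is installed and the drive reports the usual attributes):

  # Reallocated_Sector_Ct (5) = sectors already remapped,
  # Current_Pending_Sector (197) = sectors waiting for a write to be fixed
  smartctl -A /dev/sdg

  # a long self-test will surface any further unreadable sectors
  smartctl -t long /dev/sdg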



-- 
Chris Murphy