On Wed, Jul 1, 2015 at 7:38 PM, Donald Pearson <donaldwhpear...@gmail.com> wrote:
> Here's the drive vomiting in my logs after it got halfway through the
> dd image attempt.
>
> Jul 1 17:05:51 san01 kernel: sd 0:0:6:0: [sdg] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Jul 1 17:05:51 san01 kernel: sd 0:0:6:0: [sdg] Sense Key : Medium Error [current]
> Jul 1 17:05:51 san01 kernel: sd 0:0:6:0: [sdg] Add. Sense: Unrecovered read error
> Jul 1 17:05:51 san01 kernel: sd 0:0:6:0: [sdg] CDB: Read(10) 28 00 5a 5b f1 e0 00 01 00 00
> Jul 1 17:05:51 san01 kernel: blk_update_request: critical medium error, dev sdg, sector 1515975136
> Jul 1 17:05:57 san01 kernel: sd 0:0:6:0: [sdg] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Jul 1 17:05:57 san01 kernel: sd 0:0:6:0: [sdg] Sense Key : Medium Error [current]
> Jul 1 17:05:57 san01 kernel: sd 0:0:6:0: [sdg] Add. Sense: Unrecovered read error
> Jul 1 17:05:57 san01 kernel: sd 0:0:6:0: [sdg] CDB: Read(10) 28 00 5a 5b f2 e0 00 01 00 00

This looks like a typical URE. There are a number of reasons why a sector can go bad, but basically the drive's ECC has given up being able to correct the problem, so it reports the command, the error, and the sector involved.

What *should* happen is that Btrfs reconstructs the data (or metadata) on that sector and then (since kernel 3.19) writes it back to the bad sector's LBA. The drive tries to write to that bad sector and verifies the write. If there is a persistent failure, that LBA is mapped to a different physical sector and the bad one is removed from service (it has no LBA). There will be no kernel messages for any of this; it's all handled inside the drive itself.

But this sounds like a dd read of the raw device, where Btrfs is not involved (because you can't mount the volume), so none of this correction happens. What I wonder, though, is whether the same problem shows up in the much earlier logs, from when the volume was still mounted: did Btrfs try to fix the problem, and were there problems fixing it? So it might be useful to check /var/log/messages or journalctl -bX from the time the original problem was first developing.

Bad sectors are completely ordinary. They're not really common; out of maybe 50 drives I've had two exhibit this. But drives are designed to take this into account, and so are hardware RAID, Linux kernel md raid, LVM raid, Btrfs, and ZFS. So... it's kinda important to know more about this edge case to find out where the problem is.

-- 
Chris Murphy
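
For reference, one rough way to dig that earlier history out of the journal. This is only a sketch: it assumes journald is keeping a persistent journal and that the drive is still enumerated as sdg, so adjust the boot offset and device name to match your system.

    # list the boots the journal knows about, then pick the one
    # covering the time the volume was still mountable
    journalctl --list-boots

    # kernel messages from that boot (here, two boots ago),
    # filtered to Btrfs and the suspect disk
    journalctl -k -b -2 | grep -Ei 'btrfs|sdg'

    # the drive's own view of pending/reallocated sectors
    smartctl -x /dev/sdg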