Re: Btrfs suddenly unmountable, open_ctree failed

Chris Murphy Mon, 23 Jun 2014 21:16:20 -0700

On Jun 23, 2014, at 8:58 PM, Mike Hartman <m...@hartmanipulation.com> wrote:


> I have a dd image, but not a btrfs-image. I ran the btrfs-image
> command, but it threw the same errors as everything else and generated
> a 0 byte file.
> 
> I agree that it SOUNDS like some kind of media failure, but if so it
> seems odd to me that I was able to dd the entire partition with no
> read errors. Even if there was something wrong with the drive that
> prevented writing, you'd think the ability to read it all would result
> in a recoverable image.

I've read of too many SSD failure cases to trust an SSD to fail gracefully. 
I guess I don't really trust an HDD either, but at least HDDs don't 
self-destruct upon reaching end of life, as apparently some SSDs do:
http://techreport.com/review/26523/the-ssd-endurance-experiment-casualties-on-the-way-to-a-petabyte

Anyway, it could be an ECC failure where the check passes but the data is 
actually corrupt. In that case it's silent data corruption, which triggers 
neither ECC errors nor read failures. You just get bad data, which is also 
how dd could read the whole partition without a single read error. And it's 
really bad luck if this happens to Btrfs metadata that isn't DUP but is 
fundamental for mounting the filesystem, or for repairing it so it can be 
mounted.

Of course, it could just be a bug, so it's worth trying David's integration 
branch.


> Firmware Version: 0006

Firmware 0007 is current for this SSD.


> 173 Wear_Leveling_Count     PO--CK   086   086   000    -    728
>   1  0x018  6      49304625928  Logical Sectors Written
> 202 Perc_Rated_Life_Used    ---RC-   086   086   000    -    14

Those are all reasonable.
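
As a rough sanity check on the write volume, the Logical Sectors Written 
counter can be converted to bytes. This is a minimal sketch that assumes 
512-byte logical sectors; the logical sector size isn't shown in the quoted 
output, so treat the result as an estimate.

```python
# Rough estimate of total host writes from the "Logical Sectors Written"
# counter quoted above.
# ASSUMPTION: 512-byte logical sectors (not shown in the quoted output).
LOGICAL_SECTORS_WRITTEN = 49_304_625_928
SECTOR_SIZE_BYTES = 512  # assumed, see above


def bytes_written(sectors: int, sector_size: int = SECTOR_SIZE_BYTES) -> int:
    """Convert a logical-sector count to a byte count."""
    return sectors * sector_size


total = bytes_written(LOGICAL_SECTORS_WRITTEN)
# Report both decimal terabytes and binary tebibytes.
print(f"{total:,} bytes = {total / 1e12:.1f} TB = {total / 2**40:.1f} TiB")
```

Under that assumption it prints 25,243,968,475,136 bytes, roughly 25.2 TB 
(23.0 TiB) of host writes.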

> 181 Non4k_Aligned_Access    -O---K   100   100   000    -    36 0 35

Probably unrelated, but that's a curious attribute and value.

> 199 UDMA_CRC_Error_Count    -OS-CK   100   100   000    -    15

That's not good: it means interface problems have happened at some point. 
And such errors can also happen without being caught, which results in 
corruption. Drive ECC will not correct these problems, because the data is 
already wrong by the time it reaches the drive. So how many writes were good 
on the way to the drive but corrupted by the time they got there? There's no 
way to know this from the available information.
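
For reading the attribute lines quoted above: in the brief format that 
smartctl -x prints, the columns are ID, attribute name, flags, normalized 
value, worst value, threshold, a FAIL marker ("-" if never failed), and the 
raw value, which can span several tokens (as in the Non4k_Aligned_Access 
line). For UDMA_CRC_Error_Count it's the raw value that matters. Below is a 
minimal sketch that parses such lines and flags a nonzero CRC count; the 
helper and field names are hypothetical, and it only handles the brief layout 
shown here.

```python
from dataclasses import dataclass

# Meaning of each letter in the FLAGS column; "-" marks an unset flag.
FLAG_NAMES = {
    "P": "prefailure warning",
    "O": "updated online",
    "S": "speed/performance",
    "R": "error rate",
    "C": "event count",
    "K": "auto-keep",
}


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    flags: list[str]
    value: int   # normalized current value
    worst: int   # worst normalized value seen
    thresh: int  # failure threshold
    fail: str    # "-" if the attribute never failed
    raw: str     # may contain several tokens, e.g. "36 0 35"


def parse_brief_line(line: str) -> SmartAttribute:
    """Parse one attribute line in brief format (hypothetical helper)."""
    parts = line.split()
    if len(parts) < 8:
        raise ValueError(f"not a brief-format attribute line: {line!r}")
    return SmartAttribute(
        attr_id=int(parts[0]),
        name=parts[1],
        flags=[FLAG_NAMES[c] for c in parts[2] if c != "-"],
        value=int(parts[3]),
        worst=int(parts[4]),
        thresh=int(parts[5]),
        fail=parts[6],
        raw=" ".join(parts[7:]),
    )


def crc_errors(attr: SmartAttribute) -> int:
    """Return the interface CRC error count (first raw token), else 0."""
    if attr.name != "UDMA_CRC_Error_Count":
        return 0
    return int(attr.raw.split()[0])


lines = [
    "173 Wear_Leveling_Count     PO--CK   086   086   000    -    728",
    "181 Non4k_Aligned_Access    -O---K   100   100   000    -    36 0 35",
    "199 UDMA_CRC_Error_Count    -OS-CK   100   100   000    -    15",
]
for line in lines:
    attr = parse_brief_line(line)
    if crc_errors(attr) > 0:
        print(f"{attr.name}: {crc_errors(attr)} interface CRC errors recorded")
```

A nonzero count only shows that errors were detected; it says nothing about 
how many went undetected.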

>   6  0x008  4             9821  Number of Hardware Resets

Why is the hardware being reset so many times?


Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
