Martin Monperrus posted on Fri, 24 Apr 2015 19:44:47 +0200 as excerpted:

> Hi Duncan,
> 
>> The kernel log (dmesg, also logged to syslog/journald on most systems)
>> from during the scrub should capture more information on those errors.
> Thanks. The dmesg log indeed contains the file path (see below).
> 
> The error is in /home/martin/XXXXX. It is related to a low-level error
> ("failed command: READ DMA").
> 
> Beyond this corrupted file, is my disk dead?
> Can I repair the file system or re-create a new one on the same disk?

A direct answer is beyond my knowledge level, certainly without SMART 
status information, etc.  What I do know is that, assuming the rest of 
the device is responding fine, most drives keep a number of reserved 
sectors available and will automatically substitute one in on a *write* 
to an affected dead sector.

So if the device in general appears to be working fine, and assuming the 
SMART status still passes, I'd back up everything else on that partition, 
unmount it, then do something like a badblocks destructive write (-w) 
test on the partition.  If it comes back clean, I'd consider the device 
usable again.
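A minimal sketch of that cycle (the partition name /dev/sdX1 is 
hypothetical, substitute your real one, and remember badblocks -w 
destroys everything on the partition, so back up first):

```shell
#!/bin/sh
# Hypothetical partition -- substitute the real one.  The -w test
# below DESTROYS all data on it, so back everything up first.
DEV=/dev/sdX1

# Make sure the filesystem is not mounted before the destructive test.
umount "$DEV" 2>/dev/null

# Destructive write test: badblocks writes test patterns across the
# whole partition and reads each back, which forces the drive to remap
# any bad sectors on write; -s shows progress.
badblocks -w -s "$DEV"
```

If that reports no errors, you can recreate the filesystem (mkfs.btrfs 
or whatever you prefer) on the partition and restore from backup.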

Also note that if you run smartctl -A (attributes) on the device before 
attempting anything else and check the raw value for ID 5 (reallocated 
sector count), then check again after doing something like that badblocks 
-w, you can see whether it actually reallocated any sectors.  Finally, 
note that while a one-off reallocation is possible, once a drive starts 
reallocating sectors it often fails relatively quickly, as that can 
indicate failing media, and once it starts to go it often doesn't stop.  
So once you see that value move off zero, do keep an eye on it, and if 
you notice the value starting to climb, get the data off that thing as 
soon as possible.
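Something like the following (device name again hypothetical; needs 
smartmontools installed) pulls out just that raw value, so you can run 
it before and after the badblocks pass, or periodically, and compare:

```shell
#!/bin/sh
# Hypothetical whole-disk device; adjust to match your system.
DEV=/dev/sdX

# smartctl -A prints the attribute table; field 1 is the attribute ID
# and the last field is the raw value, so this extracts the raw count
# for ID 5 (Reallocated_Sector_Ct).
smartctl -A "$DEV" | awk '$1 == 5 { print "reallocated sectors:", $NF }'
```

A raw value that keeps climbing between runs means the drive is eating 
through its spare sectors, and it's time to get the data off.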

And of course it should go without saying, but I'll repeat the sysadmin's 
data value rule of thumb anyway, for the benefit of others reading as 
well.  If you care about the data, by definition, you have a (tested) 
backup (a corollary rule states that an untested backup isn't a backup at 
all).  If you don't have a backup, by definition you do NOT care about 
that data, /regardless/ of any claims to the contrary.  Unfortunately, 
many (most?) people end up learning this the hard way, finding out too 
late how much more value the data had than they thought, and thus that 
they /should/ have cared about it more (more backups, more testing of 
them) than they did.

(For those who end up in that situation...)  On the flip side there's the 
big picture.  During hurricane Katrina a data hosting firm in New Orleans 
made (tech) headlines by blogging live their struggle to stay powered and 
online.  I was one of thousands watching that, along with the mainstream 
news about the flooding, looting and dying going on.  Obviously losing a 
bit of data ends up pretty far down the list when you're wet and cold and 
just lost your house and possibly members of your family!  A bit of data 
loss might hurt a bit, but in the big picture, if you're still healthy, 
and have a job and a home and family, it's /not/ the end of the world.  A 
bit of perspective helps! =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html