Martin Monperrus posted on Fri, 24 Apr 2015 19:44:47 +0200 as excerpted:

> Hi Duncan,
>
>> The kernel log (dmesg, also logged to syslog/journald on most systems)
>> from during the scrub should capture more information on those errors.
>
> Thanks.  The dmesg log indeed contains the file path (see below).
>
> The error is in /home/martin/XXXXX.  It is related to a low-level error
> ("failed command: READ DMA").
>
> Beyond this corrupted file, is my disk dead?
> Can I repair the file system or re-create a new one on the same disk?
A direct answer is beyond my knowledge level, certainly without SMART status information, etc.  What I do know is that, assuming the rest of the device is responding fine, most drives keep a number of reserved sectors available and will automatically substitute one in on a *write* to an affected dead sector.  So if the device in general appears to be working fine, and assuming the SMART status still passes, I'd back up everything else on that partition, unmount it, then do something like a badblocks destructive write (-w) test to the partition.  If it comes back clean, I'd consider the device usable again.

Also note that if you run smartctl -A (attributes) on the device before attempting anything else and check the raw value for ID 5 (reallocated sector count), then check again after doing something like that badblocks -w, you can see whether it actually reallocated any sectors.

Finally, note that while it's possible to have a one-off, once a drive starts reallocating sectors it often fails relatively quickly, as that can indicate a failing media layer, and once it starts to go, it often doesn't stop.  So once you see that value move from zero, do keep an eye on it, and if you notice the value starting to climb, get the data off that thing as soon as possible.

And of course it should go without saying, but I'll repeat the sysadmin's data value rule of thumb anyway, for the benefit of others reading as well.  If you care about the data, by definition, you have a (tested) backup (a corollary rule states that an untested backup isn't a backup at all).  If you don't have a backup, by definition you do NOT care about that data, /regardless/ of any claims to the contrary.

Unfortunately, many (most?) people end up learning this the hard way, finding out too late how much more value the data had than they thought, and thus that they /should/ have cared about it (more backups, more testing of them) more than they did.  (For those who end up in that situation...)
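As a concrete sketch of that before/after check (/dev/sdX and /dev/sdX1 are placeholders for your device and partition, and the sample attribute line is only an assumed-typical smartctl -A output format -- the column layout can vary by smartmontools version and drive):

```shell
# Workflow sketch (needs root and a real device; badblocks -w is DESTRUCTIVE):
#   smartctl -A /dev/sdX          # note raw value of ID 5 before
#   umount /dev/sdX1
#   badblocks -w -s /dev/sdX1     # destructive write test of the partition
#   smartctl -A /dev/sdX          # compare raw value of ID 5 after

# Pulling the ID 5 raw value out of smartctl -A output: the raw value is
# the last (10th) whitespace-separated column on the attribute line.
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0'
echo "$sample" | awk '$1 == 5 { print $10 }'
# prints: 0
```

If the second reading is higher than the first, the drive quietly remapped sectors during the write pass, which is exactly the "keep an eye on it" situation described above.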
On the flip side, there's the big picture.  During hurricane Katrina, a data hosting firm in New Orleans made (tech) headlines by blogging live their struggle to stay powered and online.  I was one of thousands watching that, along with the mainstream news about the flooding, looting and dying going on.

Obviously, losing a bit of data ends up pretty far down the list when you're wet and cold and just lost your house and possibly members of your family!  A bit of data loss might hurt, but in the big picture, if you're still healthy, and have a job and a home and family, it's /not/ the end of the world.

A bit of perspective helps! =:^)

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html