On Mon, Feb 04, 2013 at 01:03:07AM +1100, Joel Sing wrote:
> On Mon, 4 Feb 2013, Erling Westenvik wrote:
> > On Sun, Feb 03, 2013 at 11:11:17AM +0530, Girish Venkatachalam wrote:
> > > I hate to say it but I am sure your hard disk is dying. Replace it
> > > ASAP
> >
> > No no, that's all right. Death is an inevitable part of life. I know
> > the disk is dying and I'm going to replace it (or just throw away
> > the machine which is a piece of junk anyway) but I'd love to get out
> > of it the amendments to it's last will before it passes out
> > completely.
> >
> > When a NON-ENCRYPTED disk has damaged areas one may still be able to
> > access the undamaged areas upon a reboot - possibly by mounting it
> > as a secondary disk on a working system and using various recovery
> > tools, etc.
> >
> > However: the last time I had an ENCRYPTED disk with damaged areas,
> > the whole disk got rendered useless. It wouldn't respond to
> > keydisk/passphrase and hence there was no way to access "undamaged"
> > data.
> >
> > The machine is still powered on. It still return ping but not ssh.
> > When typing on the keyboard, characters get echo'ed on the screen.
> > Do I have any options besides rebooting and praying?
>
> None. Well, aside from a custom kernel.
>
> One of the current "features" with softraid (regardless of discipline)
> is that if a drive reports an I/O error, we mark the given chunk as
> being offline. In the case of disciplines that have redundant data,
> this is exactly what we want, since it should force failover to an
> online chunk. However, in the case of disciplines that do not have
> dedundancy, the single chunk failure results in the entire volume
> going offline.
>
> I suspect this is what has happened. You have not mentioned how the
> crypto volume is used, however I'm going to guess that you either have
> your entire system on it, or at least some critical parts of your
> system. Since it has gone offline things have stopped working and
> there is no way to recover from this without rebooting.
>
> I plan on changing softraid so that disciplines without redundant data
> simply pass the failure from the underlying chunk up to userland, but
> leave the volume state alone - after all, you can attempt to recover
> data from a online volume, which is much more useful than losing the
> lot in one hit.

Ok, I'm getting it. Thanks.

I always seem to forget to mention something important. Sorry for
that. The setup is based on an article on undeadly.org by Stephan
Sperling:

http://undeadly.org/cgi?action=article&sid=20110530221728

That's an fdisk partition spanning the whole of one physical disk (wd0)
and three disklabel partitions a, b and d on that, with partition d
being the crypto volume and the keying material stored on a USB key
disk.

On a couple of other encrypted machines I have, I've started to use the
new boot code (which works great, but which I so far haven't been able
to make work with a key disk).

Hopefully some of your comments above - especially the last paragraph
about volumes going offline - will make it into the relevant
documentation. I suspect problems like mine are likely to arise more
frequently as more and more people start to use softraid.
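
For anyone reading this in the archives, the behaviour described in the
quoted text can be pictured with a toy model in C. This is only a sketch
of my understanding, not the actual softraid code: the struct, the enum
and both function names are made up for illustration, and the return
value -1 merely stands in for whatever error would reach userland.

```c
/* Hypothetical, simplified model; NOT the real softraid(4) code. */
enum state { ONLINE, OFFLINE };

struct volume {
	int		redundant;	/* 1: e.g. a mirror, 0: e.g. crypto */
	int		nchunks;	/* number of chunks in use (max 4) */
	enum state	chunk[4];	/* per-chunk state */
	enum state	vol;		/* state of the whole volume */
};

/*
 * Behaviour as described: an I/O error marks the chunk offline.
 * Without redundancy that takes the whole volume offline too.
 */
void
chunk_io_error(struct volume *v, int idx)
{
	int i, online = 0;

	v->chunk[idx] = OFFLINE;
	for (i = 0; i < v->nchunks; i++)
		if (v->chunk[i] == ONLINE)
			online++;
	if (!v->redundant || online == 0)
		v->vol = OFFLINE;
}

/*
 * Planned behaviour as described: without redundancy, report the
 * failure to the caller but leave chunk and volume state alone.
 */
int
chunk_io_error_planned(struct volume *v, int idx)
{
	if (!v->redundant)
		return (-1);	/* error goes up, volume stays online */
	chunk_io_error(v, idx);
	return (v->vol == ONLINE ? 0 : -1);
}
```

With a single-chunk, non-redundant volume, the first function leaves
the volume offline after one bad read, while the second keeps it online
so that the undamaged areas could still be read.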