Re: Advice for handling softraid reporting i/o error
Thank you to everyone @misc who provided support and advice, especially Joel and Barry.

It turned out that the machine came back up quite fine after rebooting, without my having to do anything special. It went through fsck without complaining and I've been able to make backups of my small pieces of precious data. I will continue to use the machine for experiments and learning only, however.

Sorry for the noise. If it were not for the softraid-specific complaints from the machine, I would not have taken the issue up here @misc.

Cheers,
Erling

On Sun, Feb 03, 2013 at 01:13:16AM +0100, Erling Westenvik wrote:
> I have an old laptop configured with softraid encryption using a USB keydisk. The machine was never intended to be used for anything more than just testing. However, I started putting a few cvs repositories on it and slowly the machine became somewhat important.
>
> Today, when doing a cvs import of a little programming project on my web server, the ssh connection died in the middle of the transfer. I have not tried to restart it. This is what's on the screen right now.
>
> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> 3187832; cn 820 tn 230 sn 42), retrying
> wd0: transfer error, downgrading to Ultra-DMA mode 4
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4
> wd0d: uncorrectable data error reading fsbn 6890352 of 6890352-6890479 (wd0 bn 1
> 3187832; cn 820 tn 230 sn 42), retrying
> wd0d: uncorrectable data error reading fsbn 6890391 of 6890352-6890479 (wd0 bn 1
> 3187871; cn 820 tn 231 sn 18), retrying
> wd0d: uncorrectable data error reading fsbn 6890391 of 6890352-6890479 (wd0 bn 1
> 3187871; cn 820 tn 231 sn 18), retrying
> softraid0: i/o error on block 6890352
> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
>
> Kind of self-explanatory: old machine with faulty disk! I do have backups but would like to have a copy of some recent commits.
>
> Switching console gives me a login prompt, but after entering a user name and pressing enter the machine just hangs. The machine will answer to ping but not ssh.
>
> My question is: Do I have any options other than trying to reboot? Optionally into single user mode?
>
> Cheers,
> Erling
Re: Advice for handling softraid reporting i/o error
On Sun, Feb 03, 2013 at 11:11:17AM +0530, Girish Venkatachalam wrote:
> I hate to say it but I am sure your hard disk is dying. Replace it ASAP

No no, that's all right. Death is an inevitable part of life. I know the disk is dying and I'm going to replace it (or just throw away the machine, which is a piece of junk anyway), but I'd love to get the amendments to its last will out of it before it passes out completely.

When a NON-ENCRYPTED disk has damaged areas, one may still be able to access the undamaged areas after a reboot, possibly by mounting it as a secondary disk on a working system and using various recovery tools. However, the last time I had an ENCRYPTED disk with damaged areas, the whole disk was rendered useless. It wouldn't respond to keydisk/passphrase, and hence there was no way to access the undamaged data.

The machine is still powered on. It still answers ping but not ssh. When I type on the keyboard, characters get echoed on the screen.

Do I have any options besides rebooting and praying?
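The idea of reading around damaged areas, raised above for a non-encrypted disk, can be sketched in Python. This is only an illustration under stated assumptions: `salvage`, `read_block` and `fake_disk` are hypothetical names, not part of any recovery tool, and the 512-byte block size is assumed. Unreadable blocks are replaced with zeros so that everything after them keeps its offset in the image.

```python
import io


def salvage(read_block, total_blocks, out, block_size=512):
    """Copy blocks 0..total_blocks-1 to out, zero-filling unreadable ones.

    read_block(n) is a hypothetical callable that returns block n as bytes
    or raises OSError when the device reports an I/O error.
    Returns the list of block numbers that could not be read.
    """
    bad = []
    for n in range(total_blocks):
        try:
            data = read_block(n)
        except OSError:
            data = b"\0" * block_size  # keep later blocks at their offsets
            bad.append(n)
        out.write(data)
    return bad


def fake_disk(n):
    """Stand-in for a failing disk: block 2 is unreadable (made-up data)."""
    if n == 2:
        raise OSError("uncorrectable data error")
    return bytes([n]) * 512


image = io.BytesIO()
bad_blocks = salvage(fake_disk, 4, image)
print(bad_blocks, len(image.getvalue()))
```

On a real system one would more likely reach for an existing imaging tool. The sketch only shows why a single bad block need not cost the whole image, provided the volume itself can still be read.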
Re: Advice for handling softraid reporting i/o error
On Mon, 4 Feb 2013, Erling Westenvik wrote:
> On Sun, Feb 03, 2013 at 11:11:17AM +0530, Girish Venkatachalam wrote:
> > I hate to say it but I am sure your hard disk is dying. Replace it ASAP
>
> No no, that's all right. Death is an inevitable part of life. I know the disk is dying and I'm going to replace it (or just throw away the machine, which is a piece of junk anyway), but I'd love to get the amendments to its last will out of it before it passes out completely.
>
> When a NON-ENCRYPTED disk has damaged areas, one may still be able to access the undamaged areas after a reboot, possibly by mounting it as a secondary disk on a working system and using various recovery tools. However, the last time I had an ENCRYPTED disk with damaged areas, the whole disk was rendered useless. It wouldn't respond to keydisk/passphrase, and hence there was no way to access the undamaged data.
>
> The machine is still powered on. It still answers ping but not ssh. When I type on the keyboard, characters get echoed on the screen.
>
> Do I have any options besides rebooting and praying?

None. Well, aside from a custom kernel.

One of the current features of softraid (regardless of discipline) is that if a drive reports an I/O error, we mark the given chunk as being offline. In the case of disciplines that have redundant data, this is exactly what we want, since it should force failover to an online chunk. However, in the case of disciplines that do not have redundancy, the single chunk failure results in the entire volume going offline. I suspect this is what has happened.

You have not mentioned how the crypto volume is used; however, I'm going to guess that you have either your entire system on it, or at least some critical parts of your system. Since it has gone offline, things have stopped working and there is no way to recover from this without rebooting.

I plan on changing softraid so that disciplines without redundant data simply pass the failure from the underlying chunk up to userland, but leave the volume state alone. After all, you can attempt to recover data from an online volume, which is much more useful than losing the lot in one hit.

--
Reason is not automatic. Those who deny it cannot be conquered by it. Do not count on them. Leave them alone.
-- Ayn Rand
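The chunk and volume behaviour described in this message can be modelled with a toy state machine. This is a sketch in Python, not softraid code: `State`, `Volume` and `chunk_io_error` are made-up names for illustration, and the two states used here are a simplification of whatever the real driver tracks.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ONLINE = "online"
    OFFLINE = "offline"


@dataclass
class Volume:
    redundant: bool          # True for a discipline with redundant data
    chunks: list             # one State per chunk
    state: State = State.ONLINE


def chunk_io_error(vol, idx, planned=False):
    """Model what happens when chunk idx reports an I/O error.

    planned=False follows the behaviour described as current,
    planned=True follows the proposed change. Returns the volume state.
    """
    if planned and not vol.redundant:
        # Proposed: the failed I/O is reported upwards, state is left alone.
        return vol.state
    vol.chunks[idx] = State.OFFLINE
    if all(c is State.OFFLINE for c in vol.chunks):
        # No online chunk left to fail over to.
        vol.state = State.OFFLINE
    return vol.state


crypto = Volume(redundant=False, chunks=[State.ONLINE])
mirror = Volume(redundant=True, chunks=[State.ONLINE, State.ONLINE])
crypto_planned = Volume(redundant=False, chunks=[State.ONLINE])

print(chunk_io_error(crypto, 0))                # only chunk lost, volume offline
print(chunk_io_error(mirror, 0))                # second chunk is still online
print(chunk_io_error(crypto_planned, 0, True))  # volume left online
```

In this model the single-chunk crypto volume goes offline on the first error, which matches the hang reported in the thread, while the same error under the proposed behaviour leaves the volume available for recovery attempts.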
Re: Advice for handling softraid reporting i/o error
On Mon, Feb 04, 2013 at 01:03:07AM +1100, Joel Sing wrote:
> On Mon, 4 Feb 2013, Erling Westenvik wrote:
> > Do I have any options besides rebooting and praying?
>
> None. Well, aside from a custom kernel.
>
> One of the current features of softraid (regardless of discipline) is that if a drive reports an I/O error, we mark the given chunk as being offline. In the case of disciplines that have redundant data, this is exactly what we want, since it should force failover to an online chunk. However, in the case of disciplines that do not have redundancy, the single chunk failure results in the entire volume going offline. I suspect this is what has happened.
>
> You have not mentioned how the crypto volume is used; however, I'm going to guess that you have either your entire system on it, or at least some critical parts of your system. Since it has gone offline, things have stopped working and there is no way to recover from this without rebooting.
>
> I plan on changing softraid so that disciplines without redundant data simply pass the failure from the underlying chunk up to userland, but leave the volume state alone. After all, you can attempt to recover data from an online volume, which is much more useful than losing the lot in one hit.

OK, I'm getting it. Thanks.

I always seem to forget to mention something important. Sorry for that. The setup is based on an article on undeadly.org by Stephan Sperling:

http://undeadly.org/cgi?action=article&sid=20110530221728

That is, one fdisk partition spanning the whole of one physical disk (wd0), with three disklabel partitions a, b and d on it. Partition d is the crypto volume, and the keying material is stored on a USB key disk.

On a couple of other encrypted machines I have, I've started to use the new boot code (which works great, but which I so far haven't been able to make work with a key disk).

Hopefully some of your comments above, especially the last paragraph about volumes going offline, will make it into the relevant documentation. I suspect problems like mine will arise more frequently as more and more people start to use softraid.
Advice for handling softraid reporting i/o error
I have an old laptop configured with softraid encryption using a USB keydisk. The machine was never intended to be used for anything more than just testing. However, I started putting a few cvs repositories on it and slowly the machine became somewhat important.

Today, when doing a cvs import of a little programming project on my web server, the ssh connection died in the middle of the transfer. I have not tried to restart it. This is what's on the screen right now.

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
3187832; cn 820 tn 230 sn 42), retrying
wd0: transfer error, downgrading to Ultra-DMA mode 4
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4
wd0d: uncorrectable data error reading fsbn 6890352 of 6890352-6890479 (wd0 bn 1
3187832; cn 820 tn 230 sn 42), retrying
wd0d: uncorrectable data error reading fsbn 6890391 of 6890352-6890479 (wd0 bn 1
3187871; cn 820 tn 231 sn 18), retrying
wd0d: uncorrectable data error reading fsbn 6890391 of 6890352-6890479 (wd0 bn 1
3187871; cn 820 tn 231 sn 18), retrying
softraid0: i/o error on block 6890352
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

Kind of self-explanatory: old machine with faulty disk! I do have backups but would like to have a copy of some recent commits.

Switching console gives me a login prompt, but after entering a user name and pressing enter the machine just hangs. The machine will answer to ping but not ssh.

My question is: Do I have any options other than trying to reboot? Optionally into single user mode?

Cheers,
Erling
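For anyone trying to read the disk error lines in the output above, here is a small Python sketch that pulls the numbers out of one of them. It rests on two assumptions that are not stated in the thread: that "fsbn X of A-B" means block X failed within a transfer covering blocks A through B of the partition, and that sectors are 512 bytes. The names `ERR` and `locate` are made up for this example.

```python
import re

# Pattern written only for the error lines shown in this thread.
ERR = re.compile(
    r"(?P<part>\w+): uncorrectable data error reading "
    r"fsbn (?P<bad>\d+) of (?P<first>\d+)-(?P<last>\d+)"
)


def locate(line, sector_size=512):
    """Return where the failing sector sits inside the transfer, or None."""
    m = ERR.search(line)
    if m is None:
        return None
    bad, first, last = int(m["bad"]), int(m["first"]), int(m["last"])
    sectors = last - first + 1
    return {
        "partition": m["part"],
        "sectors_in_transfer": sectors,
        "transfer_bytes": sectors * sector_size,  # assumes 512-byte sectors
        "bad_sector_offset": bad - first,
    }


line = "wd0d: uncorrectable data error reading fsbn 6890391 of 6890352-6890479"
info = locate(line)
print(info)
```

Under those assumptions the failing transfer covers 128 sectors of wd0d, and the retries fail first at the start of the range and then 39 sectors in, so the damage reported here is confined to one small region.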
Re: Advice for handling softraid reporting i/o error
Oh, and the machine runs 5.1 or 5.2 release...
Re: Advice for handling softraid reporting i/o error
On Sat, Feb 02, 2013 at 07:23:03PM -0600, Amit Kulkarni wrote:
> post an exact dmesg, i.e. at least the head -10 (top 10 lines). softraid guys (jsing@) will then know what changed exactly and when.

Thanks. The reason I ask is that I have no way of getting into ddb unless I reboot, which is what I'm trying to avoid if there are other ways into the system in its current powered-on state. The last time I had a hardware failure on an encrypted disk, it rendered the whole disk useless.

Does the output I provided indicate whether the failure affects vital system blocks?

Sorry if my questions are stupid. I'm trying my best here.