Re: Advice for handling softraid reporting i/o error

2013-02-04 Thread Erling Westenvik
Thank you to everyone @misc who provided support and advice, especially
Joel and Barry.

It turned out that the machine came back up fine after a reboot, without
my having to do anything special. It went through fsck without
complaint and I've been able to make backups of my small pieces of
precious data. I will continue to use the machine for experiments and
learning only, however.

Sorry for the noise. Had it not been for the softraid-specific complaints
from the machine, I would not have brought the issue up here @misc.

Cheers,

Erling





Re: Advice for handling softraid reporting i/o error

2013-02-03 Thread Erling Westenvik
On Sun, Feb 03, 2013 at 11:11:17AM +0530, Girish Venkatachalam wrote:
 I hate to say it but I am sure your hard disk is dying. Replace it ASAP

No no, that's all right. Death is an inevitable part of life. I know the
disk is dying and I'm going to replace it (or just throw away the
machine which is a piece of junk anyway) but I'd love to get out of it
the amendments to its last will before it passes out completely.

When a NON-ENCRYPTED disk has damaged areas one may still be able to
access the undamaged areas upon a reboot - possibly by mounting it as a
secondary disk on a working system and using various recovery tools,
etc.

However: the last time I had an ENCRYPTED disk with damaged areas, the
whole disk got rendered useless. It wouldn't respond to
keydisk/passphrase and hence there was no way to access undamaged
data.

The machine is still powered on. It still answers ping but not ssh. When
typing on the keyboard, characters get echoed on the screen. Do I have
any options besides rebooting and praying?




Re: Advice for handling softraid reporting i/o error

2013-02-03 Thread Joel Sing
On Mon, 4 Feb 2013, Erling Westenvik wrote:
 On Sun, Feb 03, 2013 at 11:11:17AM +0530, Girish Venkatachalam wrote:
  I hate to say it but I am sure your hard disk is dying. Replace it ASAP

 No no, that's all right. Death is an inevitable part of life. I know the
 disk is dying and I'm going to replace it (or just throw away the
 machine which is a piece of junk anyway) but I'd love to get out of it
 the amendments to its last will before it passes out completely.

 When a NON-ENCRYPTED disk has damaged areas one may still be able to
 access the undamaged areas upon a reboot - possibly by mounting it as a
 secondary disk on a working system and using various recovery tools,
 etc.

 However: the last time I had an ENCRYPTED disk with damaged areas, the
 whole disk got rendered useless. It wouldn't respond to
 keydisk/passphrase and hence there was no way to access undamaged
 data.

 The machine is still powered on. It still answers ping but not ssh. When
 typing on the keyboard, characters get echoed on the screen. Do I have
 any options besides rebooting and praying?

None. Well, aside from a custom kernel.

One of the current features with softraid (regardless of discipline) is that 
if a drive reports an I/O error, we mark the given chunk as being offline. In 
the case of disciplines that have redundant data, this is exactly what we 
want, since it should force failover to an online chunk. However, in the case 
of disciplines that do not have redundancy, the single chunk failure results
in the entire volume going offline.

I suspect this is what has happened. You have not mentioned how the crypto
volume is used; however, I'm going to guess that you either have your entire
system on it, or at least some critical parts of your system. Since it has
gone offline, things have stopped working and there is no way to recover from
this without rebooting.

I plan on changing softraid so that disciplines without redundant data simply 
pass the failure from the underlying chunk up to userland, but leave the 
volume state alone - after all, you can attempt to recover data from an online
volume, which is much more useful than losing the lot in one hit.
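
As a rough illustration of the behaviour described above, here is a toy
model in Python. It is not the softraid source: the class, the method,
the state names and the keep_online_on_error switch are all made up for
this sketch, and the "planned" branch only reflects the intention stated
in this message, not any committed code.

```python
from enum import Enum


class State(Enum):
    ONLINE = "online"
    DEGRADED = "degraded"
    OFFLINE = "offline"


class Volume:
    """Toy model of a softraid volume; not the real kernel structures."""

    def __init__(self, chunks, redundant, keep_online_on_error=False):
        self.chunks = [State.ONLINE] * chunks
        self.redundant = redundant
        # Hypothetical switch standing in for the planned change.
        self.keep_online_on_error = keep_online_on_error
        self.state = State.ONLINE

    def io_error(self, chunk):
        """Handle an I/O error reported by the drive behind `chunk`.

        Returns True if the error is passed up to the caller.
        """
        if not self.redundant and self.keep_online_on_error:
            # Planned behaviour: report the failure, leave state alone.
            return True
        # Current behaviour: the failing chunk is marked offline.
        self.chunks[chunk] = State.OFFLINE
        if self.redundant and State.ONLINE in self.chunks:
            # Another chunk can serve the data, so the volume fails over.
            self.state = State.DEGRADED
            return False
        # No online chunk left to fail over to: the volume goes offline.
        self.state = State.OFFLINE
        return True


crypto = Volume(chunks=1, redundant=False)
crypto.io_error(0)
print(crypto.state)  # State.OFFLINE: one bad block takes the volume down
```

With keep_online_on_error=True the same call leaves the volume online,
so reads of undamaged blocks could still be attempted.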




-- 

Reason is not automatic. Those who deny it cannot be conquered by it.
 Do not count on them. Leave them alone. -- Ayn Rand



Re: Advice for handling softraid reporting i/o error

2013-02-03 Thread Erling Westenvik
On Mon, Feb 04, 2013 at 01:03:07AM +1100, Joel Sing wrote:
 On Mon, 4 Feb 2013, Erling Westenvik wrote:
  On Sun, Feb 03, 2013 at 11:11:17AM +0530, Girish Venkatachalam wrote:
   I hate to say it but I am sure your hard disk is dying. Replace it
   ASAP
 
  No no, that's all right. Death is an inevitable part of life. I know
  the disk is dying and I'm going to replace it (or just throw away
  the machine which is a piece of junk anyway) but I'd love to get out
  of it the amendments to its last will before it passes out
  completely.
 
  When a NON-ENCRYPTED disk has damaged areas one may still be able to
  access the undamaged areas upon a reboot - possibly by mounting it
  as a secondary disk on a working system and using various recovery
  tools, etc.
 
  However: the last time I had an ENCRYPTED disk with damaged areas,
  the whole disk got rendered useless. It wouldn't respond to
  keydisk/passphrase and hence there was no way to access undamaged
  data.
 
  The machine is still powered on. It still answers ping but not ssh.
  When typing on the keyboard, characters get echoed on the screen.
  Do I have any options besides rebooting and praying?
 
 None. Well, aside from a custom kernel.
 
 One of the current features with softraid (regardless of discipline)
 is that if a drive reports an I/O error, we mark the given chunk as
 being offline. In the case of disciplines that have redundant data,
 this is exactly what we want, since it should force failover to an
 online chunk. However, in the case of disciplines that do not have
 redundancy, the single chunk failure results in the entire volume
 going offline.
 
 I suspect this is what has happened. You have not mentioned how the
 crypto volume is used, however I'm going to guess that you either have
 your entire system on it, or at least some critical parts of your
 system. Since it has gone offline things have stopped working and
 there is no way to recover from this without rebooting.
 
 I plan on changing softraid so that disciplines without redundant data
 simply pass the failure from the underlying chunk up to userland, but
 leave the volume state alone - after all, you can attempt to recover
 data from an online volume, which is much more useful than losing the
 lot in one hit.

Ok, I'm getting it. Thanks. I always seem to forget to mention something
important. Sorry for that. The setup is based on an article on
undeadly.org by Stefan Sperling:

http://undeadly.org/cgi?action=article&sid=20110530221728

That's an fdisk partition spanning the whole of one physical disk (wd0),
with three disklabel partitions a, b and d on it. Partition d is the
crypto volume, and the keying material is stored on a USB key disk.

On a couple of other encrypted machines I have, I've started to use the
new boot code (which works great but which I so far haven't been able
to make work with a key disk).

Hopefully some of your comments above - especially the last paragraph
about volumes going offline - will make it into the relevant
documentation. I suspect problems like mine are likely to arise more
frequently as more and more people start to use softraid.



Advice for handling softraid reporting i/o error

2013-02-02 Thread Erling Westenvik
I have an old laptop configured with softraid encryption using a USB
keydisk. The machine was never intended to be used for anything more
than just testing. However, I started putting a few cvs repositories
on it and slowly the machine became somewhat important.

Today, when doing a cvs import of a little programming project on my
web server, the ssh connection died in the middle of the transfer. I
have not tried to restart it. This is what's on the screen right now.

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
3187832; cn 820 tn 230 sn 42), retrying
wd0: transfer error, downgrading to Ultra-DMA mode 4
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4
wd0d: uncorrectable data error reading fsbn 6890352 of 6890352-6890479
(wd0 bn 1 3187832; cn 820 tn 230 sn 42), retrying
wd0d: uncorrectable data error reading fsbn 6890391 of 6890352-6890479
(wd0 bn 1 3187871; cn 820 tn 231 sn 18), retrying
wd0d: uncorrectable data error reading fsbn 6890391 of 6890352-6890479
(wd0 bn 1 3187871; cn 820 tn 231 sn 18), retrying
softraid0: i/o error on block 6890352
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

Kind of self-explanatory: old machine with faulty disk! I do have backups
but would like to have a copy of some recent commits.

Switching console gives me a login prompt but after entering a user name
and pressing enter the machine just hangs. The machine will answer
ping but not ssh.

My question is:

Do I have any options other than trying to reboot? Optionally into
single user mode?

Cheers,

Erling



Re: Advice for handling softraid reporting i/o error

2013-02-02 Thread Erling Westenvik
Oh, and the machine runs 5.1 or 5.2 release...




Re: Advice for handling softraid reporting i/o error

2013-02-02 Thread Erling Westenvik
On Sat, Feb 02, 2013 at 07:23:03PM -0600, Amit Kulkarni wrote:
 Post an exact dmesg, i.e. at least the head -10 (top 10 lines). The softraid
 guys (jsing@) will then know exactly what changed and when.

Thanks. The reason I ask is because I have no way of getting into ddb
unless I reboot - which is what I'm trying to avoid if there are other
ways into the system in its current powered-on state.

The last time I had a hardware failure on an encrypted disk it rendered
the whole disk useless. Does the output I provided below indicate
whether the failure affects vital system blocks?

Sorry if my questions are stupid. I'm trying my best here.
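
The quoted kernel messages do at least locate the damage, even if they
cannot say whether the affected blocks are vital. A minimal sketch of
the arithmetic in Python follows. Two things in it are assumptions, not
facts from the thread: a translated geometry of 255 heads and 63
sectors per track, and 512-byte sectors. It also reads "bn 1" followed
by "3187832" as a single number, 13187832, apparently split where the
console wrapped at column 80. The parse function and its dictionary
keys are names invented for this example.

```python
import re

# Assumed, not stated in the thread: 255 heads x 63 sectors per track,
# and 512-byte sectors.
HEADS, SECTORS_PER_TRACK, SECTOR_BYTES = 255, 63, 512

# Kernel messages from the thread, with the console line wrap undone
# ("bn 1" + "3187832" read as "bn 13187832").
LOG = """\
wd0d: uncorrectable data error reading fsbn 6890352 of 6890352-6890479 (wd0 bn 13187832; cn 820 tn 230 sn 42), retrying
wd0d: uncorrectable data error reading fsbn 6890391 of 6890352-6890479 (wd0 bn 13187871; cn 820 tn 231 sn 18), retrying
"""

PATTERN = re.compile(
    r"fsbn (\d+) of (\d+)-(\d+) \(wd0 bn (\d+); cn (\d+) tn (\d+) sn (\d+)\)"
)


def parse(log):
    """Return one dict of derived values per error line."""
    results = []
    for m in PATTERN.finditer(log):
        fsbn, first, last, bn, cn, tn, sn = map(int, m.groups())
        results.append({
            "fsbn": fsbn,
            "bn": bn,
            # Disk-relative block recomputed from cylinder/track/sector.
            "bn_from_chs": (cn * HEADS + tn) * SECTORS_PER_TRACK + sn,
            # Difference between disk and partition block numbers, i.e.
            # where partition d would start on wd0, in sectors.
            "partition_offset": bn - fsbn,
            # Size of the failed transfer in bytes.
            "transfer_bytes": (last - first + 1) * SECTOR_BYTES,
        })
    return results


for entry in parse(LOG):
    print(entry)
```

Under those assumptions the recomputed block numbers match the reported
ones on both lines, which supports reading the split number as
13187832, and both lines give the same offset for partition d.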

 
 On Sat, Feb 2, 2013 at 6:16 PM, Erling Westenvik erling.westen...@gmail.com
  wrote:
 
  Oh, and the machine runs 5.1 or 5.2 release...
 
  On Sun, Feb 03, 2013 at 01:13:16AM +0100, Erling Westenvik wrote:
   I have an old laptop configured with softraid encryption using a USB
   keydisk. The machine was never intended to be used for anything more
   than just testing. However, I started putting a few cvs repositories
   on it and slowly the machine became somewhat important.
  
   Today, when doing a cvs import of a little programming project on my
   web server, the ssh connection died in the middle of the transfer. I
   have not tried to restart it. This is what's on the screen right now.
  
   -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
   3187832; cn 820 tn 230 sn 42), retrying
   wd0: transfer error, downgrading to Ultra-DMA mode 4
   wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4
   wd0d: uncorrectable data error reading fsbn 6890352 of 6890352-6890479
   (wd0 bn 1 3187832; cn 820 tn 230 sn 42), retrying
   wd0d: uncorrectable data error reading fsbn 6890391 of 6890352-6890479
   (wd0 bn 1 3187871; cn 820 tn 231 sn 18), retrying
   wd0d: uncorrectable data error reading fsbn 6890391 of 6890352-6890479
   (wd0 bn 1 3187871; cn 820 tn 231 sn 18), retrying
   softraid0: i/o error on block 6890352
   -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
  
   Kind of self-explanatory: old machine with faulty disk! I do have backups
   but would like to have a copy of some recent commits.
  
   Switching console gives me a login prompt but after entering a user name
   and pressing enter the machine just hangs. The machine will answer
   ping but not ssh.
  
   My question is:
  
   Do I have any options other than trying to reboot? Optionally into
   single user mode?
  
   Cheers,
  
   Erling