softraid mirror I/O error, cannot recover data

2013-04-04 Thread Juha Erkkila
Hello misc,

I am having a problem with a mirroring softraid configuration.
Every time I try to access a particular partition in the softraid
volume I start to get I/O errors so that softraid totally "breaks",
that is, becomes non-operative.

Note that it very much looks as if both of these
disks are actually defective.

$ disklabel sd0
# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: WDC WD3200AAKS-0
duid: 800258c4df44247f
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 38913
total sectors: 625142448
boundstart: 63
boundend: 625137345
drivedata: 0 

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  a:           530082               63  4.2BSD   2048 16384    1 # /
  b:          8401995           530145    swap
  c:        625142448                0  unused
  d:        616205205          8932140    RAID

$ disklabel sd1
# /dev/rsd1c:
type: SCSI
disk: SCSI disk
label: WDC WD3200AAKS-0
duid: ec6572ba6b33d733
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 38913
total sectors: 625142448
boundstart: 63
boundend: 625137345
drivedata: 0 

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  a:           530082               63  4.2BSD   2048 16384    1
  b:          8401995           530145    swap
  c:        625142448                0  unused
  d:        616205205          8932140    RAID

The situation now:

$ bioctl -i sd3
Volume      Status               Size Device
softraid0 0 Degraded     315496794624 sd3     RAID1
          0 Online       315496794624 0:0.0   noencl
          1 Offline                 0 0:1.0   noencl

$ disklabel sd3
# /dev/rsd3c:
type: SCSI
disk: SCSI disk
label: SR RAID 1
duid: 80ebde8831825f63
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 38356
total sectors: 616204677
boundstart: 0
boundend: 616204677
drivedata: 0 

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  c:        616204677                0  unused
  d:         12594944                0  4.2BSD   2048 16384    1 # /tmp
  e:        125821088         12594944  4.2BSD   2048 16384    1 # /usr
  f:          8385920        138416032  4.2BSD   2048 16384    1 # /var
  g:        125821088        146801952  4.2BSD   2048 16384    1 # /home
  h:        343581632        272623040  4.2BSD   4096 32768    1


I can use the partitions /dev/sd3{d,e,f,g} just fine.  However, whenever
I try to do any of the following:

  * fsck /dev/sd3h
  * dd if=/dev/sd3h of=/dev/null
  * dump /dev/sd3h
  * bioctl -R /dev/someotherraidvolume sd3

after a few seconds I get I/O errors from the softraid volume.
After that, *any* operation on sd3, for example "bioctl -i sd3"
or "disklabel sd3", simply returns an I/O error, and the system
must be rebooted.

Note that I said that I think both disks are defective.  When
the system is booted with softraid disabled and then the disks
(sd0 and sd1) are read with
"dd if=/dev/rsd0c of=/dev/null conv=noerror,sync",
dd reports several I/O errors (a few such errors every now and then).
Possibly some softraid-related metadata is corrupted?

Any ideas on how to recover data from /dev/sd3h?  Could I run the above
dd command with a file as output and then use scan_ffs on it?  Might
that work?

This is 5.3, built from source.

Juha

dmesg (btw, "Logitech Logitech Cordless RumblePad 2" works great with
mupen64plus, a bit of configuration was needed though ;-) ):

OpenBSD 5.3 (GENERIC) #0: Sun Mar 17 21:03:56 EET 2013
bu...@iso.turnipsi.no-ip.org:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: AMD Sempron(tm) 140 Processor ("AuthenticAMD" 686-class, 1024KB L2 cache) 2.71 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW,SSE3,MWAIT,CX16,POPCNT,LAHF,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,WDT,ITSC
real mem  = 1878126592 (1791MB)
avail mem = 1836462080 (1751MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 09/14/09, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.5 @ 0xf06d0 (63 entries)
bios0: vendor American Megatrends Inc. version "1501" date 09/14/2009
bios0: ASUSTeK Computer INC. M4A78 PRO
acpi0 at bios0: rev 0
acpi0: sleep states S0 S1 S3 S4 S5
acpi0: tables DSDT FACP APIC MCFG OEMB HPET SSDT
acpi0: wakeup devices PCE2(S4) PCE3(S4) PCE4(S4) PCE5(S4) PCE6(S4) PCE7(S4) PCE9(S4) PCEA(S4) PCEB(S4) PCEC(S4) SBAZ(S4) UAR1(S4) PS2K(S4) PS2M(S4) P0PC(S4) UHC1(S4) UHC2(S4) UHC3(S4) USB4(S4) UHC5(S4) UHC6(S4) UHC7(S4)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD erratum 721 detected and fixed
cpu0: apic clock running at 200MHz
ioapi

Re: softraid mirror I/O error, cannot recover data

2013-04-06 Thread Stuart Henderson
On 2013-04-04, Juha Erkkila wrote:
> Hello misc,
>
> I am having a problem with a mirroring softraid configuration.
> Every time I try to access a particular partition in the softraid
> volume I start to get I/O errors so that softraid totally "breaks",
> that is, becomes non-operative.
>
> Note that it very much looks as if both of these
> disks are actually defective.

> Note that I said that I think both disks are defective.  When
> the system is booted with softraid disabled and then the disks
> (sd0 and sd1) are read with
> "dd if=/dev/rsd0c of=/dev/null conv=noerror,sync",
> dd reports several I/O errors (a few such errors every now and then).

You might want to try different cables, though really it sounds
like it's probably time to restore from your backups.



Re: softraid mirror I/O error, cannot recover data

2013-04-09 Thread Juha Erkkila
On Sat, Apr 06, 2013 at 10:43:13AM +, Stuart Henderson wrote:
> On 2013-04-04, Juha Erkkila wrote:
> > Hello misc,
> >
> > I am having a problem with a mirroring softraid configuration.
> > Every time I try to access a particular partition in the softraid
> > volume I start to get I/O errors so that softraid totally "breaks",
> > that is, becomes non-operative.
> >
> > Note that it very much looks as if both of these
> > disks are actually defective.
> 
> > Note that I said that I think both disks are defective.  When
> > the system is booted with softraid disabled and then the disks
> > (sd0 and sd1) are read with
> > "dd if=/dev/rsd0c of=/dev/null conv=noerror,sync",
> > dd reports several I/O errors (a few such errors every now and then).
> 
> You might want to try different cables, though really it sounds
> like it's probably time to restore from your backups.

Well I do have backups, but I did not want to go that route because
my most recent backup did not have my latest files, and I wanted
to recover those.

Anyway, I did find a way to recover the data.  Basically I just
read the other disk in the softraid mirror directly, skipping the
softraid metadata at the start of the chunk, like this:

$ dd if=/dev/sd0d of=shifted-raid.img conv=noerror,sync skip=528

Some read errors occurred along the way, but oh well.  And then:

$ vnconfig vnd0 shifted-raid.img
$ fsck /dev/vnd0h
$ dump -0a -f vnd0h.dump /dev/vnd0h

Worked like a charm ;-)  I guess there might have been an easier
way, but I could not figure it out.
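
For anyone wondering where skip=528 comes from: it matches the size
difference between the RAID chunk (sd0d) and the softraid volume (sd3)
in the disklabel output from the first message.  A minimal sketch of
that check follows; the variable names are made up for illustration,
and it assumes the whole difference sits at the start of the chunk,
which the thread does not state explicitly.

```shell
#!/bin/sh
# Sizes in 512-byte sectors, copied from the disklabel output in the thread.
chunk_sectors=616205205    # sd0 partition d (fstype RAID)
volume_sectors=616204677   # sd3 "total sectors"

# Sectors in the chunk that are not part of the volume data.
# ASSUMPTION: all of them are at the start of the chunk.
meta_sectors=$((chunk_sectors - volume_sectors))
meta_bytes=$((meta_sectors * 512))

echo "skip=$meta_sectors sectors ($meta_bytes bytes)"
```

With dd's default 512-byte block size, that sector count is the value
that goes into skip=.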

Before giving I/O errors, softraid reported ``softraid0: retrying
read on block 591205648''.  Perhaps this will help someone else
who runs into the same problem later.

Juha
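
The block number in that softraid message can be compared against the
disklabel output quoted earlier in the thread.  The sketch below is
only a rough illustration: it assumes the reported block is a 512-byte
sector number relative to the start of the sd3 volume, which the
thread does not confirm, and the variable names are hypothetical.

```shell
#!/bin/sh
# All values are 512-byte sectors taken from the disklabel output in the thread.
block=591205648        # from "softraid0: retrying read on block ..."
h_offset=272623040     # sd3 partition h offset
h_size=343581632       # sd3 partition h size
chunk_offset=8932140   # sd0 partition d offset
data_skip=528          # skip= value used in the dd command

# Does the block fall inside partition h of the volume?
h_end=$((h_offset + h_size))
if [ "$block" -ge "$h_offset" ] && [ "$block" -lt "$h_end" ]; then
    in_h=yes
else
    in_h=no
fi

# ASSUMPTION: the reported block is relative to the start of the volume.
block_in_h=$((block - h_offset))
block_on_sd0=$((chunk_offset + data_skip + block))

echo "inside sd3h: $in_h"
echo "sector within sd3h: $block_in_h"
echo "approximate sector on sd0: $block_on_sd0"
```

Under that assumption the block lands inside sd3h, which would be
consistent with only that partition triggering the errors.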