My HP ProLiant DL140 G3 running 4.9 GENERIC.MP with one modification
(lowered UDMA mode, see just below) lost its softraid RAID1 volume: it
will not be brought online at boot.  Can someone tell me whether it's
possible, and if so how, to bring this volume up from what I think is a
softraid partition that still has good (enough) data on it?  I think
the failure that caused the drive to go offline is in the disk
controller, not the disk itself.

Thank you, I hope.

This is the sequence of events:

Days ago:
* BIOS detects the disks as UDMA mode 5
* kernel autodetects them as UDMA mode 6; I changed the kernel to use
mode 5, and the change is reflected in the dmesg below
* I set up a softraid RAID1 volume on 2 identical partitions on 2
identical disks
        * wd0d and wd1d, both 1.0TB, on 1.5TB WDC WD15EARS-00Z5B1
drives, see disklabels below
* I moved the drives between machines
* I booted the machine; softraid saw one disk as roaming or something
on boot and brought up the softraid volume degraded, from 1 disk only
* I started a softraid rebuild from the one device that was online in
the volume
        * I forget which one was being rebuilt from which
* rebuild got to about 75%

Today:
* machine started throwing disk errors (just samples):

wd1(pciide1:1:0): timeout
        type: ata
        c_bcount: 16384
        c_skip: 0
pciide1:1:0: bus-master DMA error: missing interrupt, status=0x21

wd0(pciide1:0:0): timeout
        type: ata
        c_bcount: 16384
        c_skip: 0
pciide1:0:0: bus-master DMA error: missing interrupt, status=0x21


wd0d: device timeout writing fsbn 698343248 of 698343248-698343279 (wd0
bn 708833693; cn 44122 tn 218 sn 29), retrying
wd0: soft error (corrected)
wd0(pciide1:0:0): timeout
        type: ata
        c_bcount: 16384
        c_skip: 0
pciide1:0:0: bus-master DMA error: missing interrupt, status=0x21

wd0d: device timeout reading fsbn 544078240 of 544078240-544078271 (wd0
bn 554568685; cn 34520 tn 77 sn 34), retrying
wd0: soft error (corrected)
wd0(pciide1:0:0): timeout
        type: ata
        c_bcount: 16384
        c_skip: 0
pciide1:0:0: bus-master DMA error: missing interrupt, status=0x21



* rebooted
* more disk errors (just samples):

login: wd0d: uncorrectable data error reading fsbn 195963120 of
195963120-195963151 (wd0 bn 206453565; cn 12851 tn 35 sn 45), retrying
wd0d: uncorrectable data error reading fsbn 195963120 of
195963120-195963151 (wd0 bn 206453565; cn 12851 tn 35 sn 45), retrying
wd0d: uncorrectable data error reading fsbn 195963120 of
195963120-195963151 (wd0 bn 206453565; cn 12851 tn 35 sn 45), retrying
wd0: soft error (corrected)


* the disk device being rebuilt from went offline (!!!) <-- I think
this partition still has good (enough) data
* console filled with I/O errors from postfix and others trying to
write to disk
* I shut down the machine

* now when I boot:

softraid0 at root
softraid0: trying to bring up sd0 degraded
softraid0: sd0 offline, will not be brought online
root on wd0a swap on wd0b dump on wd0b


* disklabel and dmesg output

# disklabel wd0
# /dev/rwd0c:
type: ESDI
disk: ESDI/IDE disk
label: WDC WD15EARS-00Z
duid: 56a7443e9e93163c
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 182401
total sectors: 2930277168
boundstart: 64
boundend: 2930277168
drivedata: 0

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  a:          2104448               64  4.2BSD   2048 16384    1 # /
  b:          8385933          2104512    swap
  c:       2930277168                0  unused
  d:       2097157230         10490445    RAID
  e:          2104448       2107647680  4.2BSD   2048 16384    1


# disklabel wd1
# /dev/rwd1c:
type: ESDI
disk: ESDI/IDE disk
label: WDC WD15EARS-00Z
duid: ab1a569ef35a3c17
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 182401
total sectors: 2930277168
boundstart: 64
boundend: 2930277168
drivedata: 0

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  a:          2104448               64  4.2BSD   2048 16384    1
  b:          8385933          2104512    swap
  c:       2930277168                0  unused
  d:       2097157230         10490445    RAID
  e:          2104480       2107647680  4.2BSD   2048 16384    1
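
For what it's worth, here is a rough sanity check of the label
arithmetic, as a sketch only. The numbers are copied from the wd0
disklabel above; the names WD0 and layout_problems are made up for
illustration and are not part of any OpenBSD tool. It only confirms
that the RAID partition is about 1.07 * 10^12 bytes and that no
partition overlaps its neighbour or runs past the end of the disk. It
says nothing about the state of the softraid metadata.

```python
SECTOR_BYTES = 512            # "bytes/sector" from the disklabel
TOTAL_SECTORS = 2930277168    # "total sectors" from the disklabel

# (size, offset) in sectors, copied from the wd0 label.
# 'c' is the whole disk and is left out on purpose.
WD0 = {
    "a": (2104448, 64),
    "b": (8385933, 2104512),
    "d": (2097157230, 10490445),
    "e": (2104448, 2107647680),
}


def layout_problems(label, total=TOTAL_SECTORS):
    """Return a list of overlaps or out-of-bounds partitions (empty if none)."""
    problems = []
    # Sort by starting offset so only neighbours need comparing.
    parts = sorted(label.items(), key=lambda kv: kv[1][1])
    for name, (size, offset) in parts:
        if offset + size > total:
            problems.append((name, "past end of disk"))
    for (n1, (s1, o1)), (n2, (_s2, o2)) in zip(parts, parts[1:]):
        if o1 + s1 > o2:
            problems.append((n1, n2))
    return problems


raid_bytes = WD0["d"][0] * SECTOR_BYTES
print("wd0d size in bytes:", raid_bytes)
print("layout problems:", layout_problems(WD0))
```

The wd1 label differs from wd0 only in its duid and in the size of the
e partition (2104480 instead of 2104448 sectors), so the same check
applies to it with that one number changed.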

# dmesg
OpenBSD 4.9 (GENERIC.MP) #794: Wed Mar  2 07:19:02 MST 2011
    dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
cpu0: Intel(R) Xeon(R) CPU 5110 @ 1.60GHz ("GenuineIntel" 686-class)
1.60 GHz
cpu0:
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,
CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,TM2
,SSSE3,CX16,xTPR,PDCM,DCA
real mem  = 2146054144 (2046MB)
avail mem = 2100785152 (2003MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 12/31/99, BIOS32 rev. 0 @
0xfd361, SMBIOS rev. 2.31 @ 0xdc010 (57 entries)
bios0: vendor HP version "O08" date 06/03/2009
bios0: HP ProLiant DL140 G3
acpi0 at bios0: rev 0
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP SPMI APIC MCFG BOOT SPCR SSDT
acpi0: wakeup devices BPD0(S5) BMF3(S5) P0P4(S5) P0P6(S5) PEX0(S5)
PEX1(S5) PEX2(S5) PEX3(S5) USB1(S5) USB2(S5) USB3(S5) EUSB(S5) PCIB(S5)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: apic clock running at 265MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Xeon(R) CPU 5110 @ 1.60GHz ("GenuineIntel" 686-class)
1.60 GHz
cpu1:
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,
CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,TM2
,SSSE3,CX16,xTPR,PDCM,DCA
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
ioapic1 at mainbus0: apid 3 pa 0xfec80000, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xe0000000, bus 0-31
acpiprt0 at acpi0: bus 1 (P0P2)
acpiprt1 at acpi0: bus 2 (BMD0)
acpiprt2 at acpi0: bus 3 (BPD0)
acpiprt3 at acpi0: bus -1 (BPD1)
acpiprt4 at acpi0: bus -1 (BPD2)
acpiprt5 at acpi0: bus 11 (BMF3)
acpiprt6 at acpi0: bus 16 (P0P4)
acpiprt7 at acpi0: bus 18 (P0P6)
acpiprt8 at acpi0: bus 0 (PCI0)
acpiprt9 at acpi0: bus 30 (PEX0)
acpiprt10 at acpi0: bus 31 (PEX1)
acpiprt11 at acpi0: bus -1 (PEX2)
acpiprt12 at acpi0: bus -1 (PEX3)
acpiprt13 at acpi0: bus 32 (PCIB)
acpicpu0 at acpi0
acpicpu1 at acpi0
acpibtn0 at acpi0: PWRB
bios0: ROM list: 0xc0000/0x8000 0xc8000/0x1000 0xdc000/0x4000!
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "Intel 5000X Host" rev 0x31
ppb0 at pci0 dev 2 function 0 "Intel 5000 PCIE x8" rev 0x31
pci1 at ppb0 bus 1
ppb1 at pci1 dev 0 function 0 "Intel 6321ESB PCIE" rev 0x01
pci2 at ppb1 bus 2
ppb2 at pci2 dev 0 function 0 "Intel 6321ESB PCIE" rev 0x01
pci3 at ppb2 bus 3
ppb3 at pci1 dev 0 function 3 "Intel 6321ESB PCIE-PCIX" rev 0x01
pci4 at ppb3 bus 11
ppb4 at pci0 dev 3 function 0 "Intel 5000 PCIE" rev 0x31
pci5 at ppb4 bus 12
ppb5 at pci0 dev 4 function 0 "Intel 5000 PCIE x16" rev 0x31: apic 2
int 16 (irq 0)
pci6 at ppb5 bus 16
ppb6 at pci0 dev 5 function 0 "Intel 5000 PCIE" rev 0x31: apic 2 int 16
(irq 0)
pci7 at ppb6 bus 17
ppb7 at pci0 dev 6 function 0 "Intel 5000 PCIE" rev 0x31: apic 2 int 16
(irq 0)
pci8 at ppb7 bus 18
ppb8 at pci0 dev 7 function 0 "Intel 5000 PCIE" rev 0x31: apic 2 int 16
(irq 0)
pci9 at ppb8 bus 19
pchb1 at pci0 dev 16 function 0 "Intel 5000 Error Reporting" rev 0x31
pchb2 at pci0 dev 16 function 1 "Intel 5000 Error Reporting" rev 0x31
pchb3 at pci0 dev 16 function 2 "Intel 5000 Error Reporting" rev 0x31
pchb4 at pci0 dev 17 function 0 "Intel 5000 Reserved" rev 0x31
pchb5 at pci0 dev 19 function 0 "Intel 5000 Reserved" rev 0x31
pchb6 at pci0 dev 21 function 0 "Intel 5000 FBD" rev 0x31
pchb7 at pci0 dev 22 function 0 "Intel 5000 FBD" rev 0x31
ppb9 at pci0 dev 28 function 0 "Intel 6321ESB PCIE" rev 0x09
pci10 at ppb9 bus 30
bge0 at pci10 dev 0 function 0 "Broadcom BCM5721" rev 0x11, BCM5750 B1
(0x4101): apic 2 int 16 (irq 7), address 00:1c:c4:3c:0f:2b
brgphy0 at bge0 phy 1: BCM5750 10/100/1000baseT PHY, rev. 0
ppb10 at pci0 dev 28 function 1 "Intel 6321ESB PCIE" rev 0x09
pci11 at ppb10 bus 31
bge1 at pci11 dev 0 function 0 "Broadcom BCM5721" rev 0x11, BCM5750 B1
(0x4101): apic 2 int 17 (irq 11), address 00:1c:c4:3c:0f:2c
brgphy1 at bge1 phy 1: BCM5750 10/100/1000baseT PHY, rev. 0
uhci0 at pci0 dev 29 function 0 "Intel 6321ESB USB" rev 0x09: apic 2
int 23 (irq 5)
uhci1 at pci0 dev 29 function 1 "Intel 6321ESB USB" rev 0x09: apic 2
int 23 (irq 5)
uhci2 at pci0 dev 29 function 2 "Intel 6321ESB USB" rev 0x09: apic 2
int 23 (irq 5)
ehci0 at pci0 dev 29 function 7 "Intel 6321ESB USB" rev 0x09: apic 2
int 23 (irq 5)
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb11 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0xd9
pci12 at ppb11 bus 32
vga1 at pci12 dev 2 function 0 "Matrox MGA G200e (ServerEngines)" rev
0x02
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ichpcib0 at pci0 dev 31 function 0 "Intel 6321ESB LPC" rev 0x09: PM
disabled
pciide0 at pci0 dev 31 function 1 "Intel 6321ESB IDE" rev 0x09: DMA,
channel 0 configured to compatibility, channel 1 configured to
compatibility
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 ignored (disabled)
pciide1 at pci0 dev 31 function 2 "Intel 6321ESB SATA" rev 0x09: DMA,
channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide1: using apic 2 int 19 (irq 10) for native-PCI interrupt
wd0 at pciide1 channel 0 drive 0: <WDC WD15EARS-00Z5B1>
wd0: 16-sector PIO, LBA48, 1430799MB, 2930277168 sectors
wd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 5
wd1 at pciide1 channel 1 drive 0: <WDC WD15EARS-00Z5B1>
wd1: 16-sector PIO, LBA48, 1430799MB, 2930277168 sectors
wd1(pciide1:1:0): using PIO mode 4, Ultra-DMA mode 5
ichiic0 at pci0 dev 31 function 3 "Intel 6321ESB SMBus" rev 0x09: apic
2 int 19 (irq 10)
iic0 at ichiic0
usb1 at uhci0: USB revision 1.0
uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb2 at uhci1: USB revision 1.0
uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb3 at uhci2: USB revision 1.0
uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
isa0 at ichpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
mtrr: Pentium Pro MTRR support
uhidev0 at uhub3 port 1 configuration 1 interface 0 "ServerEngines SE
USB Device" rev 1.10/0.01 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 modifier keys, 6 key codes
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhidev1 at uhub3 port 1 configuration 1 interface 1 "ServerEngines SE
USB Device" rev 1.10/0.01 addr 2
uhidev1: iclass 3/1
ums0 at uhidev1: 8 buttons, Z dir
wsmouse0 at ums0 mux 0
vscsi0 at root
scsibus0 at vscsi0: 256 targets
softraid0 at root
softraid0: trying to bring up sd0 degraded
softraid0: sd0 offline, will not be brought online
root on wd0a swap on wd0b dump on wd0b
#
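
To help read the wd0d error lines quoted earlier: the fsbn in each
message is relative to the start of the partition, and adding the wd0d
offset from the disklabel gives the whole-disk block number (bn) that
the kernel prints. The cn/tn/sn values then follow from the label's
geometry. A small sketch of that arithmetic, with the constants taken
from the disklabel and the function name locate being hypothetical:

```python
# Geometry and partition offset copied from the `disklabel wd0` output.
SECTORS_PER_TRACK = 63
TRACKS_PER_CYLINDER = 255
WD0D_OFFSET = 10490445  # start of wd0d, in 512-byte sectors


def locate(fsbn, part_offset=WD0D_OFFSET):
    """Map a partition-relative block number to (bn, cn, tn, sn)."""
    bn = fsbn + part_offset
    sectors_per_cylinder = SECTORS_PER_TRACK * TRACKS_PER_CYLINDER
    cn, rest = divmod(bn, sectors_per_cylinder)
    tn, sn = divmod(rest, SECTORS_PER_TRACK)
    return bn, cn, tn, sn


# The three fsbn values that appear in the error samples.
for fsbn in (698343248, 544078240, 195963120):
    print(fsbn, locate(fsbn))
```

The results agree with the bn/cn/tn/sn values in the kernel messages,
so the failing blocks all fall inside the RAID partition wd0d rather
than in a, b or e.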
