Hello,

A bit over a week ago one of my remote boxes stopped responding. Once
I got out there I found that it had panicked. It doesn't have a serial
console, so I took pictures of the screen and tried to bring it back.
The box came back, but my softraid volume never recovered; this is all
documented in this thread:
http://marc.info/?t=145476742800004&r=1&w=2
The end result seemed to be that the metadata had been destroyed and
softraid couldn't recover. I cut my losses, recreated the softraid volume
and started copying the data back. After copying with rsync for about a
day (it's roughly 8TB of data, so it takes some time), the machine
panicked again, and once again my softraid volume seems unrecoverable.

I'm not sure whether I have some broken hardware (drives): I don't see
any indication of it in the logs, and, to the extent SMART can be
trusted, it hasn't found any errors on any of the drives either. Any
hints on tracking this down further? If it is indeed flaky hardware, it
would be good to narrow down which part is bad and whether that is
what's causing the panics, or whether I've just hit a bug.

One thing perhaps worth mentioning: once I'd grabbed all the relevant
info, booting the box from the ddb prompt doesn't work. It locks up
hard and has to be reset to be brought back. I left "boot dump" running
overnight, and eight hours later nothing had happened. Last time I tried
"boot reboot", and it too hung until I reset the machine.

The transcribed panic, trace, ps, show uvm, show bcstats and show
registers output, as well as the dmesg, are below. I've also put the
pictures online in case I made any transcription errors.
http://www.huldtgren.com/panics/20160211/
Thanks,
.jh
uvm_fault(0xffffffff8193f240, 0x38, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at sr_validate_io+0x36: movl 0x38(%r9),%r10d
ddb{1}> trace
sr_validate_io() at sr_validate_io+0x36
sr_raid5_rw() at sr_raid5_rw+0x40
sr_raid_recreate_wu() at sr_raid_recreate_wu+0x2c
sr_wu_done_callback() at sr_wu_done_callback+0x17a
taskq_thread() at taskq_thread+0x6c
end trace frame: 0x0, count: -5
ddb{1}> mach ddbcpu 0
Stopped at Debugger+0x9: leave
ddb{0}> trace
Debugger() at Debugger+0x9
x86_ipi_handler() at x86_ipi_handler+0x76
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1c
--- interrupt ---
__mp_lock() at __mp_lock+0x42
intr_handler() at intr_handler+0xac
Xintr_ioapic_edge26() at Xintr_ioapic_edge26+0xc9
--- interrupt ---
acpicpu_idle() at acpicpu_idle+0x22d
cpu_idle_cycle() at cpu_idle_cycle+0x10
end trace frame: 0x0, count: -9
ddb{1}> ps
TID PPID PGRP UID S FLAGS WAIT COMMAND
11955 20247 11955 0 3 0x100083 ttyin ksh
20247 17094 20247 1000 3 0x10008b pause ksh
17094 12950 12950 1000 3 0x90 select sshd
12950 14152 12950 0 3 0x92 poll sshd
25055 15716 15716 0 3 0x1 biowait rsync
2981 15716 15716 0 3 0x100083 select ssh
15716 12399 15716 0 3 0x83 select rsync
24843 30167 24843 0 3 0x100083 poll top
30167 26494 30167 0 3 0x10008b pause ksh
26494 18198 26494 1000 3 0x10008b pause ksh
18198 11183 11183 1000 3 0x90 select sshd
11183 14152 11183 0 3 0x92 poll sshd
17762 10653 10653 0 3 0x1 bqwait rsync
27221 10653 10653 0 3 0x100083 select ssh
10653 24283 10653 0 3 0x83 select rsync
24283 21067 10653 0 3 0x10008b pause ksh
21067 21976 21067 1000 3 0x10008b pause ksh
21976 29573 29573 1000 3 0x90 select sshd
29573 14152 29573 0 3 0x92 poll sshd
12399 18909 12399 0 3 0x10008b pause ksh
18909 1465 18909 1000 3 0x10008b pause ksh
1465 17133 17133 1000 3 0x90 select sshd
17133 14152 17133 0 3 0x92 poll sshd
*25002 0 0 0 7 0x14200 srdis
6512 1 1 0 3 0x100082 ttyopn getty
30348 1 30348 0 3 0x100083 ttyin getty
29827 1 29827 0 3 0x100083 ttyin getty
19630 1 19630 0 3 0x100083 ttyin getty
23836 1 23836 0 3 0x100083 ttyin getty
13712 1 13712 0 3 0x100083 ttyin getty
10592 1 10592 0 3 0x100098 poll cron
21899 1 32625 0 3 0x80 nanosleep smartd
395 1 395 99 3 0x100090 poll sndiod
18412 1 18412 110 3 0x100090 poll sndiod
10383 7847 7847 95 3 0x100090 kqread smtpd
18817 7847 7847 95 3 0x100090 kqread smtpd
27786 7847 7847 95 3 0x100090 kqread smtpd
7668 7847 7847 95 3 0x100090 kqread smtpd
11261 7847 7847 95 3 0x100090 kqread smtpd
22058 7847 7847 103 3 0x100090 kqread smtpd
7847 1 7847 0 3 0x100080 kqread smtpd
14152 1 14152 0 3 0x80 select sshd
6278 0 0 0 3 0x14200 acct acct
25633 22697 13290 83 3 0x100090 poll ntpd
22697 13290 13290 83 3 0x100090 poll ntpd
13290 1 13290 0 3 0x100080 poll ntpd
25391 26413 26413 74 3 0x100090 bpf pflogd
26413 1 26413 0 3 0x80 netio pflogd
30086 28459 28459 73 2 0x100090 syslogd
28459 1 28459 0 3 0x100080 netio syslogd
23580 0 0 0 3 0x14200 pgzero zerothread
20233 0 0 0 3 0x14200 aiodoned aiodoned
23607 0 0 0 3 0x14200 bqwait update
27258 0 0 0 3 0x14200 cleaner cleaner
24738 0 0 0 3 0x14200 reaper reaper
7021 0 0 0 3 0x14200 pgdaemon pagedaemon
22416 0 0 0 3 0x14200 bored crypto
22107 0 0 0 3 0x14200 pftm pfpurge
6058 0 0 0 3 0x14200 usbtsk usbtask
2695 0 0 0 3 0x14200 usbatsk usbatsk
7442 0 0 0 3 0x14200 bored i915
18399 0 0 0 3 0x40014200 acpi0 acpi0
17628 0 0 0 3 0x40014200 idle1
31251 0 0 0 3 0x14200 bored sensors
27152 0 0 0 3 0x14200 bored softnet
31412 0 0 0 3 0x14200 bored systqmp
28803 0 0 0 3 0x14200 biowait systq
20636 0 0 0 7 0x40014200 idle0
6619 0 0 0 3 0x14200 bored sbar
1 0 1 0 3 0x82 wait init
0 -1 0 0 3 0x10200 scheduler swapper
ddb{1}> show uvm
Current UVM status:
pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
3987405 VM pages: 13148 active, 51 inactive, 1431 wired, 3785494 free (473202 zero)
min 10% (25) anon, 10% (25) vnode, 5% (12) vtext
pages 0 anon, 0 vnode, 0 vtext
freemin=132913, free-target=177217, inactive-target=0, wired-max=1329136
faults=796094835, traps=1013108540, intrs=740034249, ctxswitch=1748677990 fpuswitch=1438992
softint=502553412, syscalls=-1748437559, kmapent=22
fault counts
noram=0, noanon=0, pgwait=0, pgrele=0
ok relocks(total)=5541(5542), anget(retries)=1770007(0), amapcopy=174102653
neighbor anon/obj pg=178470/895842, gets(lock/unlock)=249162/5542
cases: anon=1583821, anoncow=186186, obj=232267, prcopy=16894, przero=79407
daemon and swap counts:
woke=0, revs=0, scans=0, obscans=0, anscans=0
busy=0, freed=0, reactivate=0, deactivate=0
pageouts=0, pending=0, nswget=0
nswapdev=1, nanon=0, nanonneeded=0 nfreeanon=0
swpages=2767434, swpginuse=0, swpgonly=0 paging=0
kernel pointers:
objs(kern)=0xffffffff819088a0
ddb{1}> show bcstats
Current Buffer Cache status:
numbufs 10540 busymapped 129, delwri 99
kvaslots 6553 avail kva slots 6424
bufpages 167072, dirtypages 1524
pendingreads 0, pendingwrites 129
ddb{1}> show registers
rdi 0xffff8000007cc500
rsi 0xffff800032d90e08
rbp 0xffff800032d90d90
rbx 0xffff8000007cc500
rdx 0xffffffff8170c2a9 xhci_hubd+0x2269
rcx 0xffff8000007d6000
rax 0
r8 0xffff800000776000
r9 0
r10 0x1
r11 0x210
r12 0xffff8000007cc500
r13 0
r14 0
r15 0
rip 0xffffffff811178a6 sr_validate_io+0x36
cs 0x8
rflags 0x10202 __ALIGN_SIZE+0x36
rsp 0xffff800032d90d80
ss 0x10
sr_validate_io+0x36: movl 0x38(%r9),%r10d
ddb{1}>
OpenBSD 5.9 (GENERIC.MP) #1870: Mon Feb 8 17:34:23 MST 2016
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 16838438912 (16058MB)
avail mem = 16323923968 (15567MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xee8c0 (29 entries)
bios0: vendor American Megatrends Inc. version "P2.10" date 05/12/2015
bios0: ASRock Z97 Extreme4
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT SSDT SSDT SSDT MCFG HPET SSDT SSDT AAFT UEFI SSDT
acpi0: wakeup devices PEGP(S4) PEG0(S4) PEGP(S4) PEG1(S4) PEGP(S4) PEG2(S4) PS2K(S4) PS2M(S4) UAR1(S4) USB1(S3) PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Pentium(R) CPU G3258 @ 3.20GHz, 3199.47 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,XSAVE,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,ERMS,INVPCID,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Pentium(R) CPU G3258 @ 3.20GHz, 3199.07 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,XSAVE,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,ERMS,INVPCID,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 8 pa 0xfec00000, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG0)
acpiprt2 at acpi0: bus -1 (PEG1)
acpiprt3 at acpi0: bus -1 (PEG2)
acpiprt4 at acpi0: bus 1 (RP01)
acpiprt5 at acpi0: bus 2 (RP04)
acpiprt6 at acpi0: bus 3 (RP06)
acpiprt7 at acpi0: bus 4 (RP07)
acpiec0 at acpi0: not present
acpicpu0 at acpi0: C2(500@67 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C2(500@67 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpibat0 at acpi0: BAT0 not present
acpibat1 at acpi0: BAT1 not present
acpibat2 at acpi0: BAT2 not present
dwiic at acpi0 not configured
dwiic at acpi0 not configured
acpibtn0 at acpi0: PWRB
acpibtn1 at acpi0: SLPB
acpibtn2 at acpi0: LID0
acpivideo0 at acpi0: GFX0
acpivout0 at acpivideo0: DD1F
cpu0: Enhanced SpeedStep 3199 MHz: speeds: 3201, 3200, 3000, 2900, 2700, 2500, 2300, 2200, 2000, 1800, 1700, 1500, 1300, 1100, 1000, 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 4G Host" rev 0x06
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics" rev 0x06
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 1280x1024
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
azalia0 at pci0 dev 3 function 0 "Intel Core 4G HD Audio" rev 0x06: msi
azalia0: No codecs found
xhci0 at pci0 dev 20 function 0 "Intel 9 Series xHCI" rev 0x00: msi
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 "Intel xHCI root hub" rev 3.00/1.00 addr 1
"Intel 9 Series MEI" rev 0x00 at pci0 dev 22 function 0 not configured
em0 at pci0 dev 25 function 0 "Intel I218-V" rev 0x00: msi, address d0:50:99:5a:09:1c
ehci0 at pci0 dev 26 function 0 "Intel 9 Series USB" rev 0x00: apic 8 int 16
usb1 at ehci0: USB revision 2.0
uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
azalia1 at pci0 dev 27 function 0 "Intel 9 Series HD Audio" rev 0x00: msi
azalia1: codecs: Realtek ALC1150
audio0 at azalia1
ppb0 at pci0 dev 28 function 0 "Intel 9 Series PCIE" rev 0xd0
pci1 at ppb0 bus 1
ppb1 at pci0 dev 28 function 3 "Intel 9 Series PCIE" rev 0xd0: msi
pci2 at ppb1 bus 2
ahci0 at pci2 dev 0 function 0 "ASMedia ASM1061 AHCI" rev 0x02: msi, AHCI 1.2
ahci0: port 0: 6.0Gb/s
ahci0: port 1: 6.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, Hitachi HDS72302, MN6O> SCSI3 0/direct fixed naa.5000cca36ac10354
sd0: 1907729MB, 512 bytes/sector, 3907029168 sectors
sd1 at scsibus1 targ 1 lun 0: <ATA, Hitachi HDS72302, MN6O> SCSI3 0/direct fixed naa.5000cca36ac1b3d2
sd1: 1907729MB, 512 bytes/sector, 3907029168 sectors
ppb2 at pci0 dev 28 function 5 "Intel 9 Series PCIE" rev 0xd0: msi
pci3 at ppb2 bus 3
ahci1 at pci3 dev 0 function 0 "ASMedia ASM1061 AHCI" rev 0x01: msi, AHCI 1.2
ahci1: port 0: 6.0Gb/s
scsibus2 at ahci1: 32 targets
sd2 at scsibus2 targ 0 lun 0: <ATA, SanDisk SDSSDA12, Z220> SCSI3 0/direct fixed naa.5001b44e915e1444
sd2: 114473MB, 512 bytes/sector, 234441648 sectors, thin
ppb3 at pci0 dev 28 function 6 "Intel 9 Series PCIE" rev 0xd0: msi
pci4 at ppb3 bus 4
xhci1 at pci4 dev 0 function 0 "ASMedia ASM1042A xHCI" rev 0x00: msi
usb2 at xhci1: USB revision 3.0
uhub2 at usb2 "ASMedia xHCI root hub" rev 3.00/1.00 addr 1
ehci1 at pci0 dev 29 function 0 "Intel 9 Series USB" rev 0x00: apic 8 int 23
usb3 at ehci1: USB revision 2.0
uhub3 at usb3 "Intel EHCI root hub" rev 2.00/1.00 addr 1
pcib0 at pci0 dev 31 function 0 "Intel Z97 LPC" rev 0x00
ahci2 at pci0 dev 31 function 2 "Intel 9 Series AHCI" rev 0x00: msi, AHCI 1.3
ahci2: port 0: 6.0Gb/s
ahci2: port 1: 6.0Gb/s
ahci2: port 2: 6.0Gb/s
ahci2: port 3: 6.0Gb/s
ahci2: port 4: 6.0Gb/s
ahci2: port 5: 6.0Gb/s
scsibus3 at ahci2: 32 targets
sd3 at scsibus3 targ 0 lun 0: <ATA, ST2000VX000-1CU1, CV23> SCSI3 0/direct fixed naa.5000c500733462fd
sd3: 1907729MB, 512 bytes/sector, 3907029168 sectors
sd4 at scsibus3 targ 1 lun 0: <ATA, WDC WD20PURX-64P, 80.0> SCSI3 0/direct fixed naa.50014ee2b64e9980
sd4: 1907729MB, 512 bytes/sector, 3907029168 sectors
sd5 at scsibus3 targ 2 lun 0: <ATA, WDC WD20PURX-64P, 80.0> SCSI3 0/direct fixed naa.50014ee2608d2ab9
sd5: 1907729MB, 512 bytes/sector, 3907029168 sectors
sd6 at scsibus3 targ 3 lun 0: <ATA, WDC WD20PURX-64P, 80.0> SCSI3 0/direct fixed naa.50014ee2b62ccc07
sd6: 1907729MB, 512 bytes/sector, 3907029168 sectors
sd7 at scsibus3 targ 4 lun 0: <ATA, ST2000VX000-1CU1, CV23> SCSI3 0/direct fixed naa.5000c5007334c0a0
sd7: 1907729MB, 512 bytes/sector, 3907029168 sectors
sd8 at scsibus3 targ 5 lun 0: <ATA, ST2000VX000-1CU1, CV23> SCSI3 0/direct fixed naa.5000c50073337feb
sd8: 1907729MB, 512 bytes/sector, 3907029168 sectors
ichiic0 at pci0 dev 31 function 3 "Intel 9 Series SMBus" rev 0x00: apic 8 int 18
iic0 at ichiic0
spdmem0 at iic0 addr 0x51: 8GB DDR3 SDRAM PC3-12800
spdmem1 at iic0 addr 0x53: 8GB DDR3 SDRAM PC3-12800
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
uhub4 at uhub1 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhub5 at uhub3 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
vscsi0 at root
scsibus4 at vscsi0: 256 targets
softraid0 at root
scsibus5 at softraid0: 256 targets
softraid0: trying to bring up sd9 degraded
softraid0: sd9 was not shutdown properly
softraid0: sd9 is offline, will not be brought online
root on sd2a (10906fca8f4bb5bf.a) swap on sd2b dump on sd2b
WARNING: / was not properly unmounted