hello,

A bit over a week ago one of my remote boxes stopped responding. Once I
got out there I noticed it had panicked. It doesn't have a serial
console, so I took some pictures of the screen and tried to bring it
back. The box came back, but my softraid volume never recovered; this is
all documented in this thread:

http://marc.info/?t=145476742800004&r=1&w=2

The end result seemed to be that the metadata had been destroyed and
softraid couldn't recover. I cut my losses, recreated the softraid
volume and started copying the data back. After copying with rsync for
about a day (it's roughly 8TB of data, so it takes some time), the
machine panicked again, and once again my softraid volume seems to be
unrecoverable. I'm not sure whether I have some broken hardware
(drives), but I don't see any indication of it in the logs, and to the
extent SMART can be trusted it hasn't found any errors on any of the
drives either. Any hints on tracking this down further? If it is indeed
flaky hardware, it would be good to narrow down which part is bad and
whether that is what's causing the panics, or whether I've just hit a
bug.

One thing perhaps worth mentioning: once I'd grabbed all the relevant
info, rebooting the box from the ddb prompt is not possible. It locks up
hard and has to be reset to be brought back. I left "boot dump" running
overnight, and eight hours later nothing had happened. Last time I tried
"boot reboot", and it too hung until I reset the machine.

Transcribed panic, trace, ps, uvm, bcstats and registers, as well as
dmesg, are below. I've also put the pictures online in case I made any
transcription errors.

http://www.huldtgren.com/panics/20160211/

thanks,

.jh


uvm_fault(0xffffffff8193f240, 0x38, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      sr_validate_io+0x36:    movl    0x38(%r9),%r10d
ddb{1}> trace
sr_validate_io() at sr_validate_io+0x36
sr_raid5_rw() at sr_raid5_rw+0x40
sr_raid_recreate_wu() at sr_raid_recreate_wu+0x2c
sr_wu_done_callback() at sr_wu_done_callback+0x17a
taskq_thread() at taskq_thread+0x6c
end trace frame: 0x0, count: -5
ddb{1}> mach ddbcpu 0
Stopped at      Debugger+0x9:   leave
ddb{0}> trace
Debugger() at Debugger+0x9
x86_ipi_handler() at x86_ipi_handler+0x76
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1c
--- interrupt ---
__mp_lock() at __mp_lock+0x42
intr_handler() at intr_handler+0xac
Xintr_ioapic_edge26() at Xintr_ioapic_edge26+0xc9
--- interrupt ---
acpicpu_idle() at acpicpu_idle+0x22d
cpu_idle_cycle() at cpu_idle_cycle+0x10
end trace frame: 0x0, count: -9
ddb{1}> ps
   TID   PPID   PGRP   UID  S       FLAGS  WAIT       COMMAND
 11955  20247  11955     0  3    0x100083  ttyin      ksh
 20247  17094  20247  1000  3    0x10008b  pause      ksh
 17094  12950  12950  1000  3        0x90  select     sshd
 12950  14152  12950     0  3        0x92  poll       sshd
 25055  15716  15716     0  3         0x1  biowait    rsync
  2981  15716  15716     0  3    0x100083  select     ssh
 15716  12399  15716     0  3        0x83  select     rsync
 24843  30167  24843     0  3    0x100083  poll       top
 30167  26494  30167     0  3    0x10008b  pause      ksh
 26494  18198  26494  1000  3    0x10008b  pause      ksh
 18198  11183  11183  1000  3        0x90  select     sshd
 11183  14152  11183     0  3        0x92  poll       sshd
 17762  10653  10653     0  3         0x1  bqwait     rsync
 27221  10653  10653     0  3    0x100083  select     ssh
 10653  24283  10653     0  3        0x83  select     rsync
 24283  21067  10653     0  3    0x10008b  pause      ksh
 21067  21976  21067  1000  3    0x10008b  pause      ksh
 21976  29573  29573  1000  3        0x90  select     sshd
 29573  14152  29573     0  3        0x92  poll       sshd
 12399  18909  12399     0  3    0x10008b  pause      ksh
 18909   1465  18909  1000  3    0x10008b  pause      ksh
  1465  17133  17133  1000  3        0x90  select     sshd
 17133  14152  17133     0  3        0x92  poll       sshd
*25002      0      0     0  7     0x14200             srdis
  6512      1      1     0  3    0x100082  ttyopn     getty
 30348      1  30348     0  3    0x100083  ttyin      getty
 29827      1  29827     0  3    0x100083  ttyin      getty
 19630      1  19630     0  3    0x100083  ttyin      getty
 23836      1  23836     0  3    0x100083  ttyin      getty
 13712      1  13712     0  3    0x100083  ttyin      getty
 10592      1  10592     0  3    0x100098  poll       cron
 21899      1  32625     0  3        0x80  nanosleep  smartd
   395      1    395    99  3    0x100090  poll       sndiod
 18412      1  18412   110  3    0x100090  poll       sndiod
 10383   7847   7847    95  3    0x100090  kqread     smtpd
 18817   7847   7847    95  3    0x100090  kqread     smtpd
 27786   7847   7847    95  3    0x100090  kqread     smtpd
  7668   7847   7847    95  3    0x100090  kqread     smtpd
 11261   7847   7847    95  3    0x100090  kqread     smtpd
 22058   7847   7847   103  3    0x100090  kqread     smtpd
  7847      1   7847     0  3    0x100080  kqread     smtpd
 14152      1  14152     0  3        0x80  select     sshd
  6278      0      0     0  3     0x14200  acct       acct
 25633  22697  13290    83  3    0x100090  poll       ntpd
 22697  13290  13290    83  3    0x100090  poll       ntpd
 13290      1  13290     0  3    0x100080  poll       ntpd
 25391  26413  26413    74  3    0x100090  bpf        pflogd
 26413      1  26413     0  3        0x80  netio      pflogd
 30086  28459  28459    73  2    0x100090             syslogd
 28459      1  28459     0  3    0x100080  netio      syslogd
 23580      0      0     0  3     0x14200  pgzero     zerothread
 20233      0      0     0  3     0x14200  aiodoned   aiodoned
 23607      0      0     0  3     0x14200  bqwait     update
 27258      0      0     0  3     0x14200  cleaner    cleaner
 24738      0      0     0  3     0x14200  reaper     reaper
  7021      0      0     0  3     0x14200  pgdaemon   pagedaemon
 22416      0      0     0  3     0x14200  bored      crypto
 22107      0      0     0  3     0x14200  pftm       pfpurge
  6058      0      0     0  3     0x14200  usbtsk     usbtask
  2695      0      0     0  3     0x14200  usbatsk    usbatsk
  7442      0      0     0  3     0x14200  bored      i915
 18399      0      0     0  3  0x40014200  acpi0      acpi0
 17628      0      0     0  3  0x40014200             idle1
 31251      0      0     0  3     0x14200  bored      sensors
 27152      0      0     0  3     0x14200  bored      softnet
 31412      0      0     0  3     0x14200  bored      systqmp
 28803      0      0     0  3     0x14200  biowait    systq
 20636      0      0     0  7  0x40014200             idle0
  6619      0      0     0  3     0x14200  bored      sbar
     1      0      1     0  3        0x82  wait       init
     0     -1      0     0  3     0x10200  scheduler  swapper
ddb{1}> show uvm
Current UVM status:
 pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
3987405 VM pages: 13148 active, 51 inactive, 1431 wired, 3785494 free (473202 zero)
 min 10% (25) anon, 10% (25) vnode, 5% (12) vtext
 pages 0 anon, 0 vnode, 0 vtext
 freemin=132913, free-target=177217, inactive-target=0, wired-max=1329136
faults=796094835, traps=1013108540, intrs=740034249, ctxswitch=1748677990 fpuswitch=1438992
 softint=502553412, syscalls=-1748437559, kmapent=22
 fault counts
  noram=0, noanon=0, pgwait=0, pgrele=0
ok relocks(total)=5541(5542), anget(retries)=1770007(0), amapcopy=174102653
  neighbor anon/obj pg=178470/895842, gets(lock/unlock)=249162/5542
cases: anon=1583821, anoncow=186186, obj=232267, prcopy=16894, przero=79407
 daemon and swap counts:
  woke=0, revs=0, scans=0, obscans=0, anscans=0
  busy=0, freed=0, reactivate=0, deactivate=0
  pageouts=0, pending=0, nswget=0
  nswapdev=1, nanon=0, nanonneeded=0 nfreeanon=0
  swpages=2767434, swpginuse=0, swpgonly=0 paging=0
 kernel pointers:
   objs(kern)=0xffffffff819088a0
ddb{1}> show bcstats
Current Buffer Cache status:
numbufs 10540 busymapped 129, delwri 99
kvaslots 6553 avail kva slots 6424
bufpages 167072, dirtypages 1524
pendingreads 0, pendingwrites 129
ddb{1}> show registers
rdi             0xffff8000007cc500
rsi             0xffff800032d90e08
rbp             0xffff800032d90d90
rbx             0xffff8000007cc500
rdx             0xffffffff8170c2a9      xhci_hubd+0x2269
rcx             0xffff8000007d6000
rax                              0
r8              0xffff800000776000
r9                               0
r10                            0x1
r11                          0x210
r12             0xffff8000007cc500
r13                              0
r14                              0
r15                              0
rip             0xffffffff811178a6      sr_validate_io+0x36
cs                             0x8
rflags                     0x10202      __ALIGN_SIZE+0x36
rsp             0xffff800032d90d80
ss                            0x10
sr_validate_io+0x36:    movl  0x38(%r9),%r10d
ddb{1}>

OpenBSD 5.9 (GENERIC.MP) #1870: Mon Feb  8 17:34:23 MST 2016
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 16838438912 (16058MB)
avail mem = 16323923968 (15567MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xee8c0 (29 entries)
bios0: vendor American Megatrends Inc. version "P2.10" date 05/12/2015
bios0: ASRock Z97 Extreme4
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT SSDT SSDT SSDT MCFG HPET SSDT SSDT AAFT UEFI SSDT
acpi0: wakeup devices PEGP(S4) PEG0(S4) PEGP(S4) PEG1(S4) PEGP(S4) PEG2(S4) PS2K(S4) PS2M(S4) UAR1(S4) USB1(S3) PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Pentium(R) CPU G3258 @ 3.20GHz, 3199.47 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,XSAVE,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,ERMS,INVPCID,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Pentium(R) CPU G3258 @ 3.20GHz, 3199.07 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,XSAVE,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,ERMS,INVPCID,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 8 pa 0xfec00000, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG0)
acpiprt2 at acpi0: bus -1 (PEG1)
acpiprt3 at acpi0: bus -1 (PEG2)
acpiprt4 at acpi0: bus 1 (RP01)
acpiprt5 at acpi0: bus 2 (RP04)
acpiprt6 at acpi0: bus 3 (RP06)
acpiprt7 at acpi0: bus 4 (RP07)
acpiec0 at acpi0: not present
acpicpu0 at acpi0: C2(500@67 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C2(500@67 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpibat0 at acpi0: BAT0 not present
acpibat1 at acpi0: BAT1 not present
acpibat2 at acpi0: BAT2 not present
dwiic at acpi0 not configured
dwiic at acpi0 not configured
acpibtn0 at acpi0: PWRB
acpibtn1 at acpi0: SLPB
acpibtn2 at acpi0: LID0
acpivideo0 at acpi0: GFX0
acpivout0 at acpivideo0: DD1F
cpu0: Enhanced SpeedStep 3199 MHz: speeds: 3201, 3200, 3000, 2900, 2700, 2500, 2300, 2200, 2000, 1800, 1700, 1500, 1300, 1100, 1000, 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 4G Host" rev 0x06
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics" rev 0x06
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 1280x1024
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
azalia0 at pci0 dev 3 function 0 "Intel Core 4G HD Audio" rev 0x06: msi
azalia0: No codecs found
xhci0 at pci0 dev 20 function 0 "Intel 9 Series xHCI" rev 0x00: msi
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 "Intel xHCI root hub" rev 3.00/1.00 addr 1
"Intel 9 Series MEI" rev 0x00 at pci0 dev 22 function 0 not configured
em0 at pci0 dev 25 function 0 "Intel I218-V" rev 0x00: msi, address d0:50:99:5a:09:1c
ehci0 at pci0 dev 26 function 0 "Intel 9 Series USB" rev 0x00: apic 8 int 16
usb1 at ehci0: USB revision 2.0
uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
azalia1 at pci0 dev 27 function 0 "Intel 9 Series HD Audio" rev 0x00: msi
azalia1: codecs: Realtek ALC1150
audio0 at azalia1
ppb0 at pci0 dev 28 function 0 "Intel 9 Series PCIE" rev 0xd0
pci1 at ppb0 bus 1
ppb1 at pci0 dev 28 function 3 "Intel 9 Series PCIE" rev 0xd0: msi
pci2 at ppb1 bus 2
ahci0 at pci2 dev 0 function 0 "ASMedia ASM1061 AHCI" rev 0x02: msi, AHCI 1.2
ahci0: port 0: 6.0Gb/s
ahci0: port 1: 6.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, Hitachi HDS72302, MN6O> SCSI3 0/direct fixed naa.5000cca36ac10354
sd0: 1907729MB, 512 bytes/sector, 3907029168 sectors
sd1 at scsibus1 targ 1 lun 0: <ATA, Hitachi HDS72302, MN6O> SCSI3 0/direct fixed naa.5000cca36ac1b3d2
sd1: 1907729MB, 512 bytes/sector, 3907029168 sectors
ppb2 at pci0 dev 28 function 5 "Intel 9 Series PCIE" rev 0xd0: msi
pci3 at ppb2 bus 3
ahci1 at pci3 dev 0 function 0 "ASMedia ASM1061 AHCI" rev 0x01: msi, AHCI 1.2
ahci1: port 0: 6.0Gb/s
scsibus2 at ahci1: 32 targets
sd2 at scsibus2 targ 0 lun 0: <ATA, SanDisk SDSSDA12, Z220> SCSI3 0/direct fixed naa.5001b44e915e1444
sd2: 114473MB, 512 bytes/sector, 234441648 sectors, thin
ppb3 at pci0 dev 28 function 6 "Intel 9 Series PCIE" rev 0xd0: msi
pci4 at ppb3 bus 4
xhci1 at pci4 dev 0 function 0 "ASMedia ASM1042A xHCI" rev 0x00: msi
usb2 at xhci1: USB revision 3.0
uhub2 at usb2 "ASMedia xHCI root hub" rev 3.00/1.00 addr 1
ehci1 at pci0 dev 29 function 0 "Intel 9 Series USB" rev 0x00: apic 8 int 23
usb3 at ehci1: USB revision 2.0
uhub3 at usb3 "Intel EHCI root hub" rev 2.00/1.00 addr 1
pcib0 at pci0 dev 31 function 0 "Intel Z97 LPC" rev 0x00
ahci2 at pci0 dev 31 function 2 "Intel 9 Series AHCI" rev 0x00: msi, AHCI 1.3
ahci2: port 0: 6.0Gb/s
ahci2: port 1: 6.0Gb/s
ahci2: port 2: 6.0Gb/s
ahci2: port 3: 6.0Gb/s
ahci2: port 4: 6.0Gb/s
ahci2: port 5: 6.0Gb/s
scsibus3 at ahci2: 32 targets
sd3 at scsibus3 targ 0 lun 0: <ATA, ST2000VX000-1CU1, CV23> SCSI3 0/direct fixed naa.5000c500733462fd
sd3: 1907729MB, 512 bytes/sector, 3907029168 sectors
sd4 at scsibus3 targ 1 lun 0: <ATA, WDC WD20PURX-64P, 80.0> SCSI3 0/direct fixed naa.50014ee2b64e9980
sd4: 1907729MB, 512 bytes/sector, 3907029168 sectors
sd5 at scsibus3 targ 2 lun 0: <ATA, WDC WD20PURX-64P, 80.0> SCSI3 0/direct fixed naa.50014ee2608d2ab9
sd5: 1907729MB, 512 bytes/sector, 3907029168 sectors
sd6 at scsibus3 targ 3 lun 0: <ATA, WDC WD20PURX-64P, 80.0> SCSI3 0/direct fixed naa.50014ee2b62ccc07
sd6: 1907729MB, 512 bytes/sector, 3907029168 sectors
sd7 at scsibus3 targ 4 lun 0: <ATA, ST2000VX000-1CU1, CV23> SCSI3 0/direct fixed naa.5000c5007334c0a0
sd7: 1907729MB, 512 bytes/sector, 3907029168 sectors
sd8 at scsibus3 targ 5 lun 0: <ATA, ST2000VX000-1CU1, CV23> SCSI3 0/direct fixed naa.5000c50073337feb
sd8: 1907729MB, 512 bytes/sector, 3907029168 sectors
ichiic0 at pci0 dev 31 function 3 "Intel 9 Series SMBus" rev 0x00: apic 8 int 18
iic0 at ichiic0
spdmem0 at iic0 addr 0x51: 8GB DDR3 SDRAM PC3-12800
spdmem1 at iic0 addr 0x53: 8GB DDR3 SDRAM PC3-12800
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
uhub4 at uhub1 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhub5 at uhub3 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
vscsi0 at root
scsibus4 at vscsi0: 256 targets
softraid0 at root
scsibus5 at softraid0: 256 targets
softraid0: trying to bring up sd9 degraded
softraid0: sd9 was not shutdown properly
softraid0: sd9 is offline, will not be brought online
root on sd2a (10906fca8f4bb5bf.a) swap on sd2b dump on sd2b
WARNING: / was not properly unmounted
