On Fri, Jul 17, 2020 at 10:43 AM Jason King <[email protected]> wrote:
> Do you see anything in fmadm dump?
>
> It panics before fmd can start.
>
> From: Schweiss, Chip <[email protected]>
> Reply: illumos-discuss <[email protected]>
> Date: July 17, 2020 at 6:05:50 AM
> To: illumos-discuss <[email protected]>, omnios-discuss <[email protected]>
> Subject: [discuss] OmniOS 030 server in continuous panic loop
>
> I believe I have a hardware failure in one of my servers. I get about
> 1-2 minutes of uptime before it panics again.
>
> I've been able to pull /var/adm/messages (attached), but without a stable
> boot I'm not sure how I can collect the crash dump.
>
> Here's the panic stack:
>
> Jul 17 03:31:21 mir-zfs02 genunix: [ID 843051 kern.info] NOTICE: SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
> Jul 17 03:31:22 mir-zfs02 unix: [ID 836849 kern.notice]
> Jul 17 03:31:22 mir-zfs02 panic[cpu14]/thread=fffffcc269d0dc20:
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 647700 kern.notice] pcieb-14: PCI(-X) Express Fatal Error. (0x101)
> Jul 17 03:31:22 mir-zfs02 unix: [ID 100000 kern.notice]
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d0db60 pcieb:pcieb_intr_handler+1c9 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d0dbd0 apix:apix_dispatch_pending_autovect+101 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d0dc00 apix:apix_dispatch_pending_hardint+34 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d166a0 unix:switch_sp_and_call+13 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d16700 apix:apix_do_interrupt+1e9 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d16710 unix:_interrupt+ba ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d16840 unix:fakesoftint+23 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d16870 genunix:disp_lock_exit+47 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d168b0 genunix:cv_signal+8a ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d16910 genunix:evch_evq_pub+89 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d16970 genunix:evch_chpublish+f1 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d16a90 genunix:sysevent_evc_publish+15b ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d16ae0 genunix:fm_ereport_post+9d ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d16b10 genunix:fm_drain+4d ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d16b60 genunix:errorq_drain+108 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d16b80 genunix:errorq_intr+11 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d16bd0 unix:av_dispatch_softvect+78 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269d16c00 apix:apix_dispatch_softint+35 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269cc59b0 unix:switch_sp_and_call+13 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269cc5a00 apix:apix_do_softint+34 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269cc5a60 apix:apix_do_interrupt+382 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269cc5a70 unix:lwp_rtt_initial+87 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269cc5be0 unix:disp_getwork+b7 ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269cc5c00 unix:idle+bc ()
> Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] fffffcc269cc5c10 unix:thread_start+8 ()
> Jul 17 03:31:22 mir-zfs02 unix: [ID 100000 kern.notice]
> Jul 17 03:31:21 mir-zfs02 genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/dump/dump, offset 65536, content: kernel
> Jul 17 03:31:21 mir-zfs02 ahci: [ID 405573 kern.info] NOTICE: ahci0: ahci_tran_reset_dport port 3 reset port
> Jul 17 03:40:13 mir-zfs02 genunix: [ID 100000 kern.notice]
> Jul 17 03:40:13 mir-zfs02 genunix: [ID 665016 kern.notice] 100% done: 16798672 pages dumped,
> Jul 17 03:40:13 mir-zfs02 genunix: [ID 851671 kern.notice] dump succeeded
>
> I'm guessing the HBA may be the cause because of the genunix in the
> stack. Am I on the right track, or is there something else I could try
> first?
>
> -Chip
> *illumos <https://illumos.topicbox.com/latest>* / illumos-discuss / see
> discussions <https://illumos.topicbox.com/groups/discuss> + participants
> <https://illumos.topicbox.com/groups/discuss/members> + delivery options
> <https://illumos.topicbox.com/groups/discuss/subscription> Permalink
> <https://illumos.topicbox.com/groups/discuss/Td764b67a7e01daf0-Mcf7c049413c12563e9d0b80c>
>

------------------------------------------
illumos: illumos-discuss
Permalink: https://illumos.topicbox.com/groups/discuss/Td764b67a7e01daf0-Mc272a756ad3ae87479dad5eb
Delivery options: https://illumos.topicbox.com/groups/discuss/subscription
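In the log above, each frame of the panic stack is written to /var/adm/messages as its own kern.notice line of the form `<frame address> <module>:<function>+<hex offset> ()`, with the most recent frame (here `pcieb:pcieb_intr_handler`) first. When the box only stays up long enough to copy that file, a small script can pull the frames back out in order. This is a minimal sketch, assuming exactly the line format shown above; `parse_panic_frames` and `Frame` are hypothetical names, not an illumos tool, and this is no substitute for looking at the saved dump with mdb.

```python
import re
from typing import List, NamedTuple

# Assumed frame-line format (as in the log above), e.g.
# "... [ID 655072 kern.notice] fffffcc269d0db60 pcieb:pcieb_intr_handler+1c9 ()"
FRAME_RE = re.compile(
    r"\b([0-9a-f]{16}) ([A-Za-z0-9_]+):([A-Za-z0-9_]+)\+([0-9a-f]+) \(\)\s*$"
)


class Frame(NamedTuple):
    address: int   # frame pointer printed at the start of the line
    module: str    # kernel module, e.g. "pcieb", "apix", "genunix"
    function: str  # function name within that module
    offset: int    # offset into the function, logged in hex


def parse_panic_frames(lines: List[str]) -> List[Frame]:
    """Return stack frames in log order (most recent first), ignoring other lines."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            addr, module, func, off = m.groups()
            frames.append(Frame(int(addr, 16), module, func, int(off, 16)))
    return frames


if __name__ == "__main__":
    sample = [
        "Jul 17 03:31:22 mir-zfs02 genunix: [ID 647700 kern.notice] pcieb-14: "
        "PCI(-X) Express Fatal Error. (0x101)",
        "Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] "
        "fffffcc269d0db60 pcieb:pcieb_intr_handler+1c9 ()",
        "Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] "
        "fffffcc269d0dbd0 apix:apix_dispatch_pending_autovect+101 ()",
        "Jul 17 03:31:22 mir-zfs02 genunix: [ID 655072 kern.notice] "
        "fffffcc269d16b10 genunix:fm_drain+4d ()",
    ]
    # Prints one "module:function+0xoffset" line per frame, skipping the
    # non-frame "Fatal Error" line.
    for f in parse_panic_frames(sample):
        print(f"{f.module}:{f.function}+{f.offset:#x}")
```

To use it on a copied messages file, pass `open(path).read().splitlines()` to `parse_panic_frames`; the regex anchors on the trailing `()` so ordinary NOTICE lines are skipped.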
