https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208957
Bug ID: 208957 Summary: Kernel panic (page fault) on 10.3-STABLE with VIMAGE & Infiniband modules Product: Base System Version: 10.3-RELEASE Hardware: amd64 OS: Any Status: New Severity: Affects Some People Priority: --- Component: kern Assignee: freebsd-b...@freebsd.org Reporter: jus...@postgresql.org CC: freebsd-amd64@FreeBSD.org CC: freebsd-amd64@FreeBSD.org The VIMAGE option is causing a kernel panic (page fault) when compiled along with the Infiniband options on 10.3-STABLE. It's 100% reproducible, and easily triggered. ;) Note - compiled this multiple times over the last few days, across several systems, just to ensure it's not due to bad hw in a system. It panic reliably every time, on them all. Definitely a software bug of some sort. Note - Anecdotal evidence suggests the repeated problems of VIMAGE + Infiniband is a large part of the reason Infiniband isn't supported on FreeNAS. The NAS4Free project also has difficulties with Infiniband, very likely also due to this. :( https://bugs.freenas.org/issues/2014#note-18 Anyway, backtrace info below in case it helps: (commands taken from https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html) *********************************************************************************** root@cluster1:/usr/obj/usr/src/sys/CONNECTX # kgdb kernel.debug /var/crash/vmcore.0 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (irq271: mlx4_core0) trap number = 12 panic: page fault cpuid = 0 KDB: stack backtrace: #0 0xffffffff807263d0 at kdb_backtrace+0x60 #1 0xffffffff806e8c76 at vpanic+0x126 #2 0xffffffff806e8b43 at panic+0x43 #3 0xffffffff80b8bf3b at trap_fatal+0x36b #4 0xffffffff80b8c23d at trap_pfault+0x2ed #5 0xffffffff80b8b8ba at trap+0x47a #6 0xffffffff80b71892 at calltrap+0x8 #7 0xffffffff807be1a2 at netisr_dispatch_src+0x62 #8 0xffffffff808f89fa at ipoib_cm_handle_rx_wc+0x22a #9 0xffffffff808fcc98 at ipoib_ib_completion+0x78 #10 0xffffffff80930c43 at mlx4_cq_completion+0x63 #11 0xffffffff80933d43 at mlx4_eq_int+0x2c3 #12 0xffffffff80932fac at mlx4_msi_x_interrupt+0xc #13 0xffffffff806b35cb at intr_event_execute_handlers+0xab #14 0xffffffff806b3a16 at ithread_loop+0x96 #15 0xffffffff806b104a at fork_exit+0x9a #16 0xffffffff80b71dce at fork_trampoline+0xe Uptime: 3m47s Dumping 485 out of 7857 MB:..4%..14%..24%..33%..43%..53%..63%..73%..83%..93% Reading symbols from /boot/kernel/ums.ko.symbols...done. Loaded symbols for /boot/kernel/ums.ko.symbols #0 doadump (textdump=<value optimized out>) at pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) list *0xffffffff808f89fa 0xffffffff808f89fa is in ipoib_cm_handle_rx_wc (/usr/src/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c:565). 560 mb->m_pkthdr.rcvif = dev; 561 proto = *mtod(mb, uint16_t *); 562 m_adj(mb, IPOIB_ENCAP_LEN); 563 564 IPOIB_MTAP_PROTO(dev, mb, proto); 565 ipoib_demux(dev, mb, ntohs(proto)); 566 567 repost: 568 if (has_srq) { 569 if (unlikely(ipoib_cm_post_receive_srq(priv, wr_id))) Current language: auto; currently minimal (kgdb) list *0xffffffff807be1a2 0xffffffff807be1a2 is in netisr_dispatch_src (/usr/src/sys/net/netisr.c:976). 971 if (dispatch_policy == NETISR_DISPATCH_DIRECT) { 972 nwsp = DPCPU_PTR(nws); 973 npwp = &nwsp->nws_work[proto]; 974 npwp->nw_dispatched++; 975 npwp->nw_handled++; 976 netisr_proto[proto].np_handler(m); 977 error = 0; 978 goto out_unlock; 979 } 980 (kgdb) list *0xffffffff80b71892 0xffffffff80b71892 is at /usr/src/sys/amd64/amd64/exception.S:238. 233 .type calltrap,@function 234 calltrap: 235 movq %rsp,%rdi 236 call trap 237 MEXITCOUNT 238 jmp doreti /* Handle any pending ASTs */ 239 240 /* 241 * alltraps_noen entry point. Unlike alltraps above, we want to 242 * leave the interrupts disabled. This corresponds to (kgdb) list *0xffffffff80b8b8ba 0xffffffff80b8b8ba is in trap (/usr/src/sys/amd64/amd64/trap.c:447). 442 443 KASSERT(cold || td->td_ucred != NULL, 444 ("kernel trap doesn't have ucred")); 445 switch (type) { 446 case T_PAGEFLT: /* page fault */ 447 (void) trap_pfault(frame, FALSE); 448 goto out; 449 450 case T_DNA: 451 KASSERT(!PCB_USER_FPU(td->td_pcb), (kgdb) *********************************************************************************** Kernel configuration used: --- # # GENERIC -- Generic kernel configuration file for FreeBSD/amd64 # # For more information on this file, please read the config(5) manual page, # and/or the handbook section on Kernel Configuration Files: # # http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html # # The handbook is also available locally in /usr/share/doc/handbook # if you've installed the doc distribution, otherwise always see the # FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the # latest information. # # An exhaustive list of options and more detailed explanations of the # device lines is also present in the ../../conf/NOTES and NOTES files. # If you are in doubt as to the purpose or necessity of a line, check first # in NOTES. # # $FreeBSD: stable/10/sys/amd64/conf/GENERIC 286132 2015-07-31 15:25:07Z gjb $ cpu HAMMER ident CONNECTX2 makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols makeoptions WITH_CTF=1 # Run ctfconvert(1) for DTrace support ##################################################################### # NETWORKING OPTIONS # # DEVICE_POLLING adds support for mixed interrupt-polling handling # of network device drivers, which has significant benefits in terms # of robustness to overloads and responsivity, as well as permitting # accurate scheduling of the CPU time between kernel network processing # and other activities. The drawback is a moderate (up to 1/HZ seconds) # potential increase in response times. # It is strongly recommended to use HZ=1000 or 2000 with DEVICE_POLLING # to achieve smoother behaviour. # Additionally, you can enable/disable polling at runtime with help of # the ifconfig(8) utility, and select the CPU fraction reserved to # userland with the sysctl variable kern.polling.user_frac # (default 50, range 0..100). # # Not all device drivers support this mode of operation at the time of # this writing. See polling(4) for more details. options DEVICE_POLLING # BPF_JITTER adds support for BPF just-in-time compiler. options BPF_JITTER # OpenFabrics Enterprise Distribution (Infiniband). options OFED options OFED_DEBUG_INIT # Sockets Direct Protocol options SDP options SDP_DEBUG # IP over Infiniband options IPOIB options IPOIB_DEBUG options IPOIB_CM ##################################################################### options SCHED_ULE # ULE scheduler options PREEMPTION # Enable kernel thread preemption options INET # InterNETworking options INET6 # IPv6 communications protocols options TCP_OFFLOAD # TCP offload options SCTP # Stream Control Transmission Protocol options FFS # Berkeley Fast Filesystem options SOFTUPDATES # Enable FFS soft updates support options UFS_ACL # Support for access control lists options UFS_DIRHASH # Improve performance on big directories options UFS_GJOURNAL # Enable gjournal-based UFS journaling options QUOTA # Enable disk quotas for UFS options MD_ROOT # MD is a potential root device options NFSCL # New Network Filesystem Client options NFSD # New Network Filesystem Server options NFSLOCKD # Network Lock Manager options NFS_ROOT # NFS usable as /, requires NFSCL options MSDOSFS # MSDOS Filesystem options CD9660 # ISO 9660 Filesystem options PROCFS # Process filesystem (requires PSEUDOFS) options PSEUDOFS # Pseudo-filesystem framework options GEOM_PART_GPT # GUID Partition Tables. options GEOM_RAID # Soft RAID functionality. options GEOM_LABEL # Provides labelization options COMPAT_FREEBSD32 # Compatible with i386 binaries options COMPAT_FREEBSD4 # Compatible with FreeBSD4 options COMPAT_FREEBSD5 # Compatible with FreeBSD5 options COMPAT_FREEBSD6 # Compatible with FreeBSD6 options COMPAT_FREEBSD7 # Compatible with FreeBSD7 options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI options KTRACE # ktrace(1) support options STACK # stack(9) support options SYSVSHM # SYSV-style shared memory options SYSVMSG # SYSV-style message queues options SYSVSEM # SYSV-style semaphores options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions options PRINTF_BUFR_SIZE=128 # Prevent printf output being interspersed. options KBD_INSTALL_CDEV # install a CDEV entry in /dev options HWPMC_HOOKS # Necessary kernel hooks for hwpmc(4) options AUDIT # Security event auditing options CAPABILITY_MODE # Capsicum capability mode options CAPABILITIES # Capsicum capabilities options PROCDESC # Support for process descriptors options MAC # TrustedBSD MAC Framework options KDTRACE_FRAME # Ensure frames are compiled in options KDTRACE_HOOKS # Kernel DTrace hooks options DDB_CTF # Kernel ELF linker loads CTF data options INCLUDE_CONFIG_FILE # Include this file in kernel options RACCT # Resource accounting framework options RACCT_DEFAULT_TO_DISABLED # Set kern.racct.enable=0 by default options RCTL # Resource limits # Debugging support. Always need this: options KDB # Enable kernel debugger support. options KDB_TRACE # Print a stack trace for a panic. # Make an SMP-capable kernel by default options SMP # Symmetric MultiProcessor Kernel # CPU frequency control device cpufreq # Bus support. device acpi options ACPI_DMAR device pci # Floppy drives #device fdc # ATA controllers device ahci # AHCI-compatible SATA controllers device ata # Legacy ATA/SATA controllers options ATA_STATIC_ID # Static device numbering device mvs # Marvell 88SX50XX/88SX60XX/88SX70XX/SoC SATA device siis # SiliconImage SiI3124/SiI3132/SiI3531 SATA # SCSI Controllers device ahc # AHA2940 and onboard AIC7xxx devices options AHC_REG_PRETTY_PRINT # Print register bitfields in debug # output. Adds ~128k to driver. device ahd # AHA39320/29320 and onboard AIC79xx devices options AHD_REG_PRETTY_PRINT # Print register bitfields in debug # output. Adds ~215k to driver. device esp # AMD Am53C974 (Tekram DC-390(T)) device hptiop # Highpoint RocketRaid 3xxx series device isp # Qlogic family device ispfw # Firmware for QLogic HBAs- normally a module device mpt # LSI-Logic MPT-Fusion device mps # LSI-Logic MPT-Fusion 2 device mpr # LSI-Logic MPT-Fusion 3 device ncr # NCR/Symbios Logic device sym # NCR/Symbios Logic (newer chipsets + those of `ncr') device trm # Tekram DC395U/UW/F DC315U adapters device adv # Advansys SCSI adapters device adw # Advansys wide SCSI adapters device aic # Adaptec 15[012]x SCSI adapters, AIC-6[23]60. device bt # Buslogic/Mylex MultiMaster SCSI adapters device isci # Intel C600 SAS controller # ATA/SCSI peripherals device scbus # SCSI bus (required for ATA/SCSI) device ch # SCSI media changers device da # Direct Access (disks) device sa # Sequential Access (tape etc) device cd # CD device pass # Passthrough device (direct ATA/SCSI access) device ses # Enclosure Services (SES and SAF-TE) #device ctl # CAM Target Layer # RAID controllers interfaced to the SCSI subsystem device amr # AMI MegaRAID device arcmsr # Areca SATA II RAID #XXX it is not 64-bit clean, -scottl #device asr # DPT SmartRAID V, VI and Adaptec SCSI RAID device ciss # Compaq Smart RAID 5* device dpt # DPT Smartcache III, IV - See NOTES for options device hptmv # Highpoint RocketRAID 182x device hptnr # Highpoint DC7280, R750 device hptrr # Highpoint RocketRAID 17xx, 22xx, 23xx, 25xx device hpt27xx # Highpoint RocketRAID 27xx device iir # Intel Integrated RAID device ips # IBM (Adaptec) ServeRAID device mly # Mylex AcceleRAID/eXtremeRAID device twa # 3ware 9000 series PATA/SATA RAID device tws # LSI 3ware 9750 SATA+SAS 6Gb/s RAID controller # RAID controllers #device aac # Adaptec FSA RAID #device aacp # SCSI passthrough for aac (requires CAM) #device aacraid # Adaptec by PMC RAID #device ida # Compaq Smart RAID #device mfi # LSI MegaRAID SAS #device mlx # Mylex DAC960 family #device mrsas # LSI/Avago MegaRAID SAS/SATA, 6Gb/s and 12Gb/s #XXX PCI ID conflicts with ahd(4) and mvs(4) #device pmspcv # PMC-Sierra SAS/SATA Controller driver #XXX pointer/int warnings #device pst # Promise Supertrak SX6000 #device twe # 3ware ATA RAID # NVM Express (NVMe) support device nvme # base NVMe driver device nvd # expose NVMe namespaces as disks, depends on nvme # atkbdc0 controls both the keyboard and the PS/2 mouse device atkbdc # AT keyboard controller device atkbd # AT keyboard device psm # PS/2 mouse device kbdmux # keyboard multiplexer device vga # VGA video card driver options VESA # Add support for VESA BIOS Extensions (VBE) device splash # Splash screen and screen saver support # syscons is the default console driver, resembling an SCO console device sc options SC_PIXEL_MODE # add support for the raster text mode # vt is the new video console driver device vt device vt_vga device vt_efifb device agp # support several AGP chipsets # PCCARD (PCMCIA) support # PCMCIA and cardbus bridge support device cbb # cardbus (yenta) bridge device pccard # PC Card (16-bit) bus device cardbus # CardBus (32-bit) bus # Serial (COM) ports device uart # Generic UART driver # Parallel port device ppc device ppbus # Parallel port bus (required) device lpt # Printer device ppi # Parallel port interface device device vpo # Requires scbus and da device puc # Multi I/O cards and multi-channel UARTs # PCI Ethernet NICs. #device bxe # Broadcom NetXtreme II BCM5771X/BCM578XX 10GbE #device de # DEC/Intel DC21x4x (``Tulip'') device em # Intel PRO/1000 Gigabit Ethernet Family #device igb # Intel PRO/1000 PCIE Server Gigabit Family #device ix # Intel PRO/10GbE PCIE PF Ethernet #device ixv # Intel PRO/10GbE PCIE VF Ethernet #device ixl # Intel XL710 40Gbe PCIE Ethernet #device ixlv # Intel XL710 40Gbe VF PCIE Ethernet device mlx4ib # Mellanox ConnectX HCA InfiniBand device mlxen # Mellanox ConnectX HCA Ethernet device mthca # Mellanox HCA InfiniBand #device le # AMD Am7900 LANCE and Am79C9xx PCnet #device ti # Alteon Networks Tigon I/II gigabit Ethernet #device txp # 3Com 3cR990 (``Typhoon'') #device vx # 3Com 3c590, 3c595 (``Vortex'') # PCI Ethernet NICs that use the common MII bus controller code. # NOTE: Be sure to keep the 'device miibus' line in order to use these NICs! device miibus # MII bus support #device ae # Attansic/Atheros L2 FastEthernet #device age # Attansic/Atheros L1 Gigabit Ethernet #device alc # Atheros AR8131/AR8132 Ethernet #device ale # Atheros AR8121/AR8113/AR8114 Ethernet #device bce # Broadcom BCM5706/BCM5708 Gigabit Ethernet #device bfe # Broadcom BCM440x 10/100 Ethernet #device bge # Broadcom BCM570xx Gigabit Ethernet #device cas # Sun Cassini/Cassini+ and NS DP83065 Saturn #device dc # DEC/Intel 21143 and various workalikes #device et # Agere ET1310 10/100/Gigabit Ethernet #device fxp # Intel EtherExpress PRO/100B (82557, 82558) #device gem # Sun GEM/Sun ERI/Apple GMAC #device hme # Sun HME (Happy Meal Ethernet) #device jme # JMicron JMC250 Gigabit/JMC260 Fast Ethernet #device lge # Level 1 LXT1001 gigabit Ethernet #device msk # Marvell/SysKonnect Yukon II Gigabit Ethernet #device nfe # nVidia nForce MCP on-board Ethernet #device nge # NatSemi DP83820 gigabit Ethernet #device nve # nVidia nForce MCP on-board Ethernet Networking #device pcn # AMD Am79C97x PCI 10/100 (precedence over 'le') device re # RealTek 8139C+/8169/8169S/8110S #device rl # RealTek 8129/8139 #device sf # Adaptec AIC-6915 (``Starfire'') #device sge # Silicon Integrated Systems SiS190/191 #device sis # Silicon Integrated Systems SiS 900/SiS 7016 #device sk # SysKonnect SK-984x & SK-982x gigabit Ethernet #device ste # Sundance ST201 (D-Link DFE-550TX) #device stge # Sundance/Tamarack TC9021 gigabit Ethernet #device tl # Texas Instruments ThunderLAN #device tx # SMC EtherPower II (83c170 ``EPIC'') #device vge # VIA VT612x gigabit Ethernet #device vr # VIA Rhine, Rhine II #device wb # Winbond W89C840F #device xl # 3Com 3c90x (``Boomerang'', ``Cyclone'') # ISA Ethernet NICs. pccard NICs included. #device cs # Crystal Semiconductor CS89x0 NIC # 'device ed' requires 'device miibus' #device ed # NE[12]000, SMC Ultra, 3c503, DS8390 cards #device ex # Intel EtherExpress Pro/10 and Pro/10+ #device ep # Etherlink III based cards #device fe # Fujitsu MB8696x based cards #device sn # SMC's 9000 series of Ethernet chips #device xe # Xircom pccard Ethernet # Wireless NIC cards #device wlan # 802.11 support #options IEEE80211_DEBUG # enable debug msgs #options IEEE80211_AMPDU_AGE # age frames in AMPDU reorder q's #options IEEE80211_SUPPORT_MESH # enable 802.11s draft support #device wlan_wep # 802.11 WEP support #device wlan_ccmp # 802.11 CCMP support #device wlan_tkip # 802.11 TKIP support #device wlan_amrr # AMRR transmit rate control algorithm #device an # Aironet 4500/4800 802.11 wireless NICs. #device ath # Atheros NICs #device ath_pci # Atheros pci/cardbus glue #device ath_hal # pci/cardbus chip support #options AH_SUPPORT_AR5416 # enable AR5416 tx/rx descriptors #options AH_AR5416_INTERRUPT_MITIGATION # AR5416 interrupt mitigation #options ATH_ENABLE_11N # Enable 802.11n support for AR5416 and later #device ath_rate_sample # SampleRate tx rate control for ath #device bwi # Broadcom BCM430x/BCM431x wireless NICs. #device bwn # Broadcom BCM43xx wireless NICs. #device ipw # Intel 2100 wireless NICs. #device iwi # Intel 2200BG/2225BG/2915ABG wireless NICs. #device iwn # Intel 4965/1000/5000/6000 wireless NICs. #device malo # Marvell Libertas wireless NICs. #device mwl # Marvell 88W8363 802.11n wireless NICs. #device ral # Ralink Technology RT2500 wireless NICs. #device wi # WaveLAN/Intersil/Symbol 802.11 wireless NICs. #device wpi # Intel 3945ABG wireless NICs. # Pseudo devices. device loop # Network loopback device random # Entropy device device padlock_rng # VIA Padlock RNG device rdrand_rng # Intel Bull Mountain RNG device ether # Ethernet support device vlan # 802.1Q VLAN support device tun # Packet tunnel. device md # Memory "disks" device gif # IPv6 and IPv4 tunneling device faith # IPv6-to-IPv4 relaying (translation) device firmware # firmware assist module # The `bpf' device enables the Berkeley Packet Filter. # Be aware of the administrative consequences of enabling this! # Note that 'bpf' is required for DHCP. device bpf # Berkeley packet filter # USB support options USB_DEBUG # enable debug msgs device uhci # UHCI PCI->USB interface device ohci # OHCI PCI->USB interface device ehci # EHCI PCI->USB interface (USB 2.0) device xhci # XHCI PCI->USB interface (USB 3.0) device usb # USB Bus (required) device ukbd # Keyboard device umass # Disks/Mass storage - Requires scbus and da # Sound support device sound # Generic sound driver (required) #device snd_cmi # CMedia CMI8338/CMI8738 #device snd_csa # Crystal Semiconductor CS461x/428x #device snd_emu10kx # Creative SoundBlaster Live! and Audigy #device snd_es137x # Ensoniq AudioPCI ES137x device snd_hda # Intel High Definition Audio device snd_ich # Intel, NVidia and other ICH AC'97 Audio device snd_via8233 # VIA VT8233x Audio # MMC/SD device mmc # MMC/SD bus device mmcsd # MMC/SD memory card device sdhci # Generic PCI SD Host Controller # VirtIO support device virtio # Generic VirtIO bus (required) device virtio_pci # VirtIO PCI device device vtnet # VirtIO Ethernet device device virtio_blk # VirtIO Block device device virtio_scsi # VirtIO SCSI device device virtio_balloon # VirtIO Memory Balloon device # HyperV drivers and enchancement support # NOTE: HYPERV depends on hyperv. They must be added or removed together. options HYPERV # Hyper-V kernel infrastructure device hyperv # HyperV drivers # Xen HVM Guest Optimizations # NOTE: XENHVM depends on xenpci. They must be added or removed together. options XENHVM # Xen HVM kernel infrastructure device xenpci # Xen HVM Hypervisor services driver # VMware support device vmx # VMware VMXNET3 Ethernet # 2016-04-21 JC Added VIMAGE just to verify it's the crash causer options VIMAGE --- -- You are receiving this mail because: You are on the CC list for the bug. _______________________________________________ freebsd-amd64@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-amd64 To unsubscribe, send any mail to "freebsd-amd64-unsubscr...@freebsd.org"