[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296 --- Comment #12 from Josh Gitlin --- (In reply to Andriy Gapon from comment #10) > it's quite possible that ARC contributes to the problem but > there is a bug in kmem_back / kmem_malloc. This is what I felt as well when reading the source. I didn't see any specific out of memory error, but rather a page fault which (to my untrained eye) looked like the kernel trying to access a KVA page that did not exist. But I was very unsure of my theory that it was a bug as opposed to a misconfiguration. What I found odd was that we had crashes on production systems where the config in place hadn't changed in years... -- You are receiving this mail because: You are the assignee for the bug. ___ freebsd-bugs@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-bugs To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"
[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296 --- Comment #11 from rai...@ultra-secure.de ---
Hi,

I compiled a kernel myself with

make buildkernel && make installkernel

because I wasn't sure whether the default kernel package contains a kernel with debug symbols. I had thought the debug kernel lived next to the kernel in /boot/kernel...

(ewserv-log03-prod ) 0 # uname -a
FreeBSD ewserv-log03-prod.everyware.zone 11.2-RELEASE-p4 FreeBSD 11.2-RELEASE-p4 #0: Fri Sep 28 16:37:02 CEST 2018 r...@ewserv-log03-prod.everyware.zone:/usr/obj/usr/src/sys/GENERIC amd64
(ewserv-log03-prod ) 0 # ll /usr/lib/debug/boot/kernel/kernel.debug
-r-xr-xr-x 1 root wheel 86179448 Sep 28 16:37 /usr/lib/debug/boot/kernel/kernel.debug
(ewserv-log03-prod ) 0 # ll /boot/kernel/kernel
-r-xr-xr-x 1 root wheel 27781528 Sep 28 16:37 /boot/kernel/kernel

What is the correct way to get a kernel with debug symbols?

I can reboot and run my tests again without the ARC reduction, to make sure this is the kernel that is producing the crashdump. It needed less than an hour to lock up.

We would like to get this server back into production, but for now I can do whatever is necessary to solve this problem (apart from allowing direct logins - I'd have to wipe it).
[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296 --- Comment #10 from Andriy Gapon --- (In reply to rainer from comment #9) The problem might be similar but it is certainly different. In the other bug they are getting a panic (unfortunately the panic message is not shown), while you are getting a fatal trap / page fault. Also, in your case there are no ARC calls in the stack trace. It goes straight from the ZIO code to the VM code. So, it's quite possible that ARC contributes to the problem (e.g., by creating memory pressure or some such), but there is a bug in kmem_back / kmem_malloc. Finally, in comment #3 the stack trace recorded by ddb and the stack trace shown by kgdb do not match. I suspect that this is because you passed the wrong kernel to kgdb, or because /usr/lib/debug/boot/kernel does not match /boot/kernel.
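For reference, a minimal sketch of handing kgdb the debug kernel explicitly, so that the symbols are known to come from the same build as the dump. The two paths are the ones quoted elsewhere in this thread; the helper name kgdb_command and the KERNEL_DEBUG / VMCORE variables are made up for illustration. The script only prints the command, so it can be reviewed before it is run by hand.

```sh
#!/bin/sh
# Sketch only: build the kgdb command line for a crash dump.
# Defaults are the paths quoted in this thread; override via the environment.
KERNEL_DEBUG="${KERNEL_DEBUG:-/usr/lib/debug/boot/kernel/kernel.debug}"
VMCORE="${VMCORE:-/var/crash/vmcore.1}"

# Hypothetical helper: print the command instead of executing it.
kgdb_command() {
    printf 'kgdb %s %s\n' "$KERNEL_DEBUG" "$VMCORE"
}

kgdb_command
```

The kernel named here has to be the one that was actually running when the dump was written; if a new kernel was installed after the panic, the debug file under /usr/lib/debug no longer corresponds to the vmcore.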
[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296 --- Comment #9 from rai...@ultra-secure.de --- Hi, the firmware revision is 1.60 (from the HPE website). But it seems to be an ARC problem that is actually pretty widespread and just did not materialize on my other servers because ARC was already limited there. Also, one of the first panics we got had the driver name somewhere in the backtrace - but that was on the old firmware. I was notified privately of this PR, which seems to describe a similar problem: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231794
[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

Deepak Ukey changed:

What |Removed |Added
CC   |        |deepak.u...@microsemi.com

--- Comment #8 from Deepak Ukey ---
Hi,

Can you please tell me how to reproduce the issue, or which steps cause this panic? Also, can you please tell me which firmware version you are using for the E208i-p SR Gen10 / P408i-a SR Gen10 cards, so that I can try to reproduce this on my setup and help you resolve it?

Thanks.
[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296 --- Comment #7 from rai...@ultra-secure.de --- At least, it ran through the night.
[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296 --- Comment #6 from rai...@ultra-secure.de ---
OK. This is a setting that I have in my sysctl.conf.local, but commented out by default (because not all hosts use ZFS, and I somehow thought that it's only needed on hosts that do other stuff).

I stumbled upon this PR [1], too, a while ago and adjusted the setting on my ZFS hosts. Just not on this one, because this one isn't supposed to run much else - the other hosts run mysql and/or apache+php+nginx etc.

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229764

Or rather, I took my settings from this one: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=163461
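The thread never names the setting or its value. As an assumption, an ARC limit of this kind is commonly expressed through the vfs.zfs.arc_max tunable, given in bytes. The sketch below only does the arithmetic and prints a line in /boot/loader.conf form; the helper name arc_max_line and the 16 GiB figure are purely illustrative and are not the author's actual configuration.

```sh
#!/bin/sh
# Sketch only: print a loader.conf-style line capping the ZFS ARC.
# Assumption: the tunable in question is vfs.zfs.arc_max (not named in the thread).

# Hypothetical helper: convert a size in GiB to bytes and format the line.
arc_max_line() {
    gib="$1"
    bytes=$((gib * 1024 * 1024 * 1024))
    printf 'vfs.zfs.arc_max="%s"\n' "$bytes"
}

# Illustrative value: 16 GiB on a host with roughly 32 GB of RAM.
arc_max_line 16
```

A suitable cap depends on what else the host runs, which is the distinction the comment above draws between the dedicated log server and the hosts running databases and web servers.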
[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

Josh Gitlin changed:

What |Removed |Added
CC   |        |jgitlin+freebsd@goboomtown.com

--- Comment #5 from Josh Gitlin ---
I have experienced nearly the same issue, and requested help from the freebsd-fs list because I thought it might be related to a kernel change or a misconfiguration (even though the config we were using had not changed). See: https://lists.freebsd.org/pipermail/freebsd-fs/2018-September/026725.html

The panic stack trace we saw was exactly the same. It happened under ZFS load (but not unusually high load, and not higher than we've seen in production before).
[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296 --- Comment #4 from rai...@ultra-secure.de --- BTW: I've been running memtest86 v7.5 (the free edition of the commercial version that does UEFI) on this server for 8h and it showed no errors.
[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296 --- Comment #3 from rai...@ultra-secure.de ---
After updating the firmware, I still get panics.

The handbook should be clearer about the fact that you can't get a crashdump from ZFS. After adding an additional swap-partition on a USB drive, I got this crash-dump:

(ewserv-log03-prod ) 0 # kgdb /boot/kernel/kernel /var/crash/vmcore.1
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x5a
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x80dff90d
stack pointer = 0x28:0xfe084ed93f00
frame pointer = 0x28:0xfe084ed93f40
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 0 (zio_write_issue_10)
trap number = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
#0 0x80b3d567 at kdb_backtrace+0x67
#1 0x80af6b07 at vpanic+0x177
#2 0x80af6983 at panic+0x43
#3 0x80f77fcf at trap_fatal+0x35f
#4 0x80f78029 at trap_pfault+0x49
#5 0x80f777f7 at trap+0x2c7
#6 0x80f57dac at calltrap+0x8
#7 0x80dee7e2 at kmem_back+0xf2
#8 0x80dee6c0 at kmem_malloc+0x60
#9 0x80de6172 at keg_alloc_slab+0xe2
#10 0x80de8b7e at keg_fetch_slab+0x14e
#11 0x80de83b4 at zone_fetch_slab+0x64
#12 0x80de848f at zone_import+0x3f
#13 0x80de4b99 at uma_zalloc_arg+0x3d9
#14 0x82351ab2 at zio_write_compress+0x1e2
#15 0x8235074c at zio_execute+0xac
#16 0x80b4ed74 at taskqueue_run_locked+0x154
#17 0x80b4fed8 at taskqueue_thread_loop+0x98
Uptime: 40m34s
Dumping 5489 out of 32379 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from /usr/lib/debug//boot/kernel/geom_mirror.ko.debug...done.
done.
Loaded symbols for /boot/kernel/geom_mirror.ko
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/accf_data.ko...Reading symbols from /usr/lib/debug//boot/kernel/accf_data.ko.debug...done.
done.
Loaded symbols for /boot/kernel/accf_data.ko
Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from /usr/lib/debug//boot/kernel/accf_http.ko.debug...done.
done.
Loaded symbols for /boot/kernel/accf_http.ko
Reading symbols from /boot/kernel/cc_htcp.ko...Reading symbols from /usr/lib/debug//boot/kernel/cc_htcp.ko.debug...done.
done.
Loaded symbols for /boot/kernel/cc_htcp.ko
Reading symbols from /boot/kernel/ums.ko...Reading symbols from /usr/lib/debug//boot/kernel/ums.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ums.ko
Reading symbols from /boot/kernel/tmpfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/tmpfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/tmpfs.ko
#0 0x80af68fb in doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:309
309 if (dumping)
(kgdb) bt
#0 0x80af68fb in doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:309
#1 0x80af6925 in doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:315
#2 0x80af671b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:382
#3 0x80af6b41 in vpanic (fmt=, ap=0xfe084ed93c50) at /usr/src/sys/kern/kern_shutdown.c:769
#4 0x80af6983 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:706
#5 0x80f77fcf in trap_fatal (frame=0xfe084ed93e40, eva=90) at /usr/src/sys/amd64/amd64/trap.c:875
#6 0x80f78029 in trap_pfault (frame=0xfe084ed93e40, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:712
#7 0x80f777f7 in trap (frame=0xfe084ed93e40) at /usr/src/sys/amd64/amd64/trap.c:514
#8 0x80f57dac in Xtss_pti () at /usr/src/sys/amd64/amd64/exception.S:159
#9 0x80dff90d in vm_page_rename (m=0x3ff, new_object=0xf80018d8d000, new_pindex=) at /usr/src/sys/vm/vm_page.c:1342
#10 0x80dee7e2 in kmem_suballoc (parent=0x262, min=0x14000, max=0x81ebc558, size=874980, superpage_align=) at /usr/src/sys/vm/vm_kern.c:290
#11
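The crash-dump setup described in comment #3 (a dump device on a swap partition outside ZFS) might look roughly like the /etc/rc.conf fragment below. This is a sketch under assumptions: the device name /dev/da10p1 is hypothetical and is not taken from the thread, which does not say which partition was used. dumpdev and dumpdir are the standard rc.conf variables for the dump device and for the directory where savecore(8) stores vmcore.N on the next boot.

```sh
# /etc/rc.conf (fragment), sketch only.
# Hypothetical device: a swap partition on a non-ZFS disk, such as a USB drive.
dumpdev="/dev/da10p1"
# Directory where savecore(8) writes vmcore.N after the reboot.
dumpdir="/var/crash"
```

The dump device has to be large enough for the dump; in the output above the kernel wrote 5489 MB out of 32379 MB of RAM.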
[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

Mark Linimon changed:

What     |Removed |Added
Keywords |        |panic
[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296 --- Comment #2 from rai...@ultra-secure.de --- You are right, there is an update on HPE's website. Unfortunately, it's not yet part of an SPP. So I'll have to figure out a way to install it. Thanks a lot.
[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

Yuri Pankov changed:

What |Removed |Added
CC   |        |yur...@yuripv.net

--- Comment #1 from Yuri Pankov ---
Just for the record (I have no idea whether it's related, or whether there's a relevant firmware update from HPE): the 1.34 firmware you seem to be running was unstable for me as well with a Microsemi HBA 1100-8i. Updating to 1.60 from the Microsemi site solved it.
[Bug 231296] smartpqi - kernel panics
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

Bug ID: 231296
Summary: smartpqi - kernel panics
Product: Base System
Version: 11.2-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: b...@freebsd.org
Reporter: rai...@ultra-secure.de

Created attachment 197020
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=197020&action=edit
pic of kernel panic

Hi,

this is an HPE DL380 Gen10 system.

smartpqi0: port 0x4000-0x40ff mem 0xe280-0xe2807fff at device 0.0 numa-domain 0 on pci4
smartpqi0: using MSI-X interrupts (16 vectors)
smartpqi1: port 0xc000-0xc0ff mem 0xf380-0xf3807fff at device 0.0 numa-domain 0 on pci9
smartpqi1: using MSI-X interrupts (16 vectors)

(server ) 0 # camcontrol devlist
at scbus0 target 64 lun 0 (pass0,da0)
at scbus0 target 66 lun 0 (pass1,da1)
at scbus0 target 187 lun 0 (pass2,ses0)
at scbus0 target 1088 lun 0 (pass3)
at scbus1 target 64 lun 0 (pass4,da2)
at scbus1 target 65 lun 0 (pass5,da3)
at scbus1 target 66 lun 0 (pass6,da4)
at scbus1 target 67 lun 0 (pass7,da5)
at scbus1 target 68 lun 0 (pass8,da6)
at scbus1 target 69 lun 0 (pass9,da7)
at scbus1 target 70 lun 0 (pass10,da8)
at scbus1 target 71 lun 0 (pass11,da9)
at scbus1 target 187 lun 0 (pass12,ses1)
at scbus1 target 1088 lun 0 (pass13)
at scbus2 target 0 lun 0 (da10,pass14)
at scbus3 target 0 lun 0 (da11,pass15)

We get very frequent kernel panics. The server is receiving syslogs via syslog-ng314-3.14.1_1