** Package changed: kerneloops (Ubuntu) => linux (Ubuntu)

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1683699
Title: [LTCTest][Opal][FW860] Oops: Kernel access of bad area, sig: 11 [#1] during frozen PE EEH error injection

Status in linux package in Ubuntu: New

Bug description:

== Comment: #0 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2016-08-13 08:28:54 ==
---Problem Description---
Install the latest FW860 firmware (build SV860_028) on P8 PowerNV 8284-22A hardware, then install Ubuntu 16.10 on top of it. During EEH frozen-PE error injection, an "Oops: Kernel access of bad area, sig: 11 [#1]" was observed.

Contact Information = ppaid...@in.ibm.com

---uname output---
Linux lep8b 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:04:07 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = PowerNV 8284-22A

---System Hang---
The system hangs and needs a hard power off/on to bring it back up.

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. Install FW860 firmware at level SV860_028 on P8 PowerNV 8284-22A hardware.
2. Install Ubuntu 16.10 on top of it.
3. Inject the following frozen-PE EEH error:
   echo 0:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0004/err_injct && lspci -ns 0004:00:00.0; echo $?
4. A kernel oops is observed immediately.

*Additional Instructions for ppaid...@in.ibm.com:
-Post a private note with access information to the machine that the bug is occurring on.

Call Traces:
root@lep8b:~# echo 0:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0004/err_injct && lspci -ns 0004:00:00.0; echo $?
[ 271.110859] EEH: Frozen PE#0 on PHB#4 detected
[ 271.110967] EEH: PE location: N/A, PHB location: N/A
0004:00:00.0 0604: 1014:03dc
0
root@lep8b:~# [ 277.108098] Unable to handle kernel paging request for data at address 0x00000010
[ 277.108183] Faulting instruction address: 0xc000000000083c7c
[ 277.108198] Oops: Kernel access of bad area, sig: 11 [#1]
[ 277.108253] SMP NR_CPUS=2048 NUMA PowerNV
[ 277.108310] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc kvm_hv kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables leds_powernv ibmpowernv powernv_rng ipmi_powernv uio_pdrv_genirq ipmi_msghandler uio ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure be2net lpfc vxlan ip6_udp_tunnel udp_tunnel scsi_transport_fc ipr
[ 277.109391] CPU: 9 PID: 973 Comm: eehd Not tainted 4.4.0-34-generic #53-Ubuntu
[ 277.109467] task: c000000feb3c2a20 ti: c000000feb408000 task.ti: c000000feb408000
[ 277.109542] NIP: c000000000083c7c LR: c000000000083c78 CTR: c000000000083c20
[ 277.109617] REGS: c000000feb40b760 TRAP: 0300 Not tainted (4.4.0-34-generic)
[ 277.109691] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28008822 XER: 00000000
[ 277.109880] CFAR: c000000000008468 DAR: 0000000000000010 DSISR: 40000000 SOFTE: 1
GPR00: c000000000083c78 c000000feb40b9e0 c0000000015b5d00 0000000000000000
GPR04: 0000000000000001 c000000feb40bac0 c000002d74b54220 0000000000000f9f
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000026
GPR12: c000000000083c20 c000000007b45580 c0000000000e63d8 c000002d74c40100
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d42468
GPR24: c000000000d42440 0000000000000100 c000000000036460 0000000000000000
GPR28: c00000000161a3f0 0000000000000001 c000002ffff81000 c0000000fe440000
[ 277.110878] NIP [c000000000083c7c] pnv_eeh_reset+0x5c/0x170
[ 277.110931] LR [c000000000083c78] pnv_eeh_reset+0x58/0x170
[ 277.110981] Call Trace:
[ 277.111009] [c000000feb40b9e0] [c000000000083c78] pnv_eeh_reset+0x58/0x170 (unreliable)
[ 277.111098] [c000000feb40ba60] [c000000000038250] eeh_reset_pe+0xb0/0x1c0
[ 277.111175] [c000000feb40bb00] [c000000000af472c] eeh_reset_device+0xd8/0x228
[ 277.111255] [c000000feb40bba0] [c00000000003c4c0] eeh_handle_normal_event+0x390/0x440
[ 277.111429] [c000000feb40bc20] [c00000000003c964] eeh_handle_event+0x184/0x370
[ 277.111601] [c000000feb40bcd0] [c00000000003cd28] eeh_event_handler+0x1d8/0x1e0
[ 277.111772] [c000000feb40bd80] [c0000000000e64e0] kthread+0x110/0x130
[ 277.111910] [c000000feb40be30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
[ 277.112068] Instruction dump:
[ 277.112143] 60000000 813f0000 ebdf0010 792affe3 408200d4 e95e0250 812a000c 2f890002
[ 277.112385] 419e0054 7fe3fb78 4bfb7065 60000000 <e9230010> 2fa90000 419e00dc e9290010
[ 277.112629] ---[ end trace a6aa80c26ba676f6 ]---
[ 277.116859]
[ 277.116910] Sending IPI to other CPUs
[ 277.118085] IPI complete
[ 277.120271] kexec: waiting for cpu 0 (physical 32) to enter OPAL
-> smp_release_cpus()
spinning_secondaries = 191
<- smp_release_cpus()
<- setup_system()
[ 0.397633] Kernel panic - not syncing: Out of memory and no killable processes...
[ 0.397633]
[ 0.397769] CPU: 4 PID: 1 Comm: swapper/1 Not tainted 4.4.0-34-generic #53-Ubuntu
[ 0.397843] Call Trace:
[ 0.397870] [c00000000c583190] [c000000008af983c] dump_stack+0xb0/0xf0 (unreliable)
[ 0.397959] [c00000000c5831d0] [c000000008af5a70] panic+0x100/0x2c0
[ 0.398035] [c00000000c583260] [c000000008231e04] out_of_memory+0x5e4/0x5f0
[ 0.398114] [c00000000c583310] [c00000000823a434] __alloc_pages_nodemask+0xc54/0xc90
[ 0.398204] [c00000000c583500] [c0000000082a0a6c] alloc_page_interleave+0x6c/0xe0
[ 0.398292] [c00000000c583550] [c0000000082a1558] alloc_pages_current+0x138/0x1a0
[ 0.398381] [c00000000c5835a0] [c00000000822cdcc] __page_cache_alloc+0x11c/0x160
[ 0.398470] [c00000000c5835e0] [c00000000822cf84] pagecache_get_page+0x174/0x2a0
[ 0.398558] [c00000000c583650] [c00000000822d4b4] grab_cache_page_write_begin+0x54/0x80
[ 0.398646] [c00000000c583690] [c00000000831d484] simple_write_begin+0x54/0x180
[ 0.398735] [c00000000c5836e0] [c00000000822ca64] generic_perform_write+0x104/0x280
[ 0.398823] [c00000000c583780] [c00000000822ed08] __generic_file_write_iter+0x208/0x250
[ 0.398912] [c00000000c5837e0] [c00000000822ee40] generic_file_write_iter+0xf0/0x280
[ 0.399000] [c00000000c583830] [c0000000082e1844] new_sync_write+0xc4/0x120
[ 0.399076] [c00000000c5838d0] [c0000000082e2640] vfs_write+0xc0/0x230
[ 0.399152] [c00000000c583920] [c0000000082e367c] SyS_write+0x6c/0x110
[ 0.399229] [c00000000c583970] [c000000008ea700c] xwrite+0x4c/0xb4
[ 0.399305] [c00000000c5839b0] [c000000008ea7164] do_copy+0xf0/0x170
[ 0.399381] [c00000000c5839e0] [c000000008ea6774] write_buffer+0x5c/0x88
[ 0.399458] [c00000000c583a10] [c000000008ea67fc] flush_buffer+0x5c/0xf0
[ 0.399534] [c00000000c583a60] [c000000008eea034] __gunzip+0x378/0x470
[ 0.399610] [c00000000c583ae0] [c000000008ea75ac] unpack_to_rootfs+0x1f8/0x34c
[ 0.399699] [c00000000c583ba0] [c000000008ea7910] populate_rootfs+0x94/0x164
[ 0.399775] [c00000000c583c20] [c00000000800b49c] do_one_initcall+0x12c/0x2a0
[ 0.399852] [c00000000c583cf0] [c000000008ea4204] kernel_init_freeable+0x28c/0x37c
[ 0.399940] [c00000000c583dc0] [c00000000800be0c] kernel_init+0x2c/0x160
[ 0.400016] [c00000000c583e30] [c000000008009538] ret_from_kernel_thread+0x5c/0xa4
[ 0.418756] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
[ 0.418756]

root@lep8b:~# uname -a
Linux lep8b 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:04:07 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
root@lep8b:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.10 (Yakkety Yak)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.10"
VERSION_ID="16.10"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=yakkety
root@lep8b:~# update_flash -d
Current firwmare version :
P side : FW860.00 (SV860_026)
T side : FW860.00 (SV860_028)
Boot side : FW860.00 (SV860_028)
root@lep8b:~# cat /sys/firmware/opal/msglog | grep -i skiboot
[45182541432,5] SkiBoot skiboot-5.3.0-rc2 starting...
root@lep8b:~#
root@lep8b:~# lspci
0000:00:00.0 PCI bridge: IBM Device 03dc
0000:01:00.0 RAID bus controller: IBM Obsidian-E PCI-E SCSI controller (rev 01)
0001:00:00.0 PCI bridge: IBM Device 03dc
0001:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0001:02:01.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0001:02:08.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0001:02:09.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0001:03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0001:03:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0001:03:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0001:03:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0001:04:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 01)
0002:00:00.0 PCI bridge: IBM Device 03dc
0002:01:00.0 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter (rev 10)
0002:01:00.1 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter (rev 10)
0003:00:00.0 PCI bridge: IBM Device 03dc
0003:01:00.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:01.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:08.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:09.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:10.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:11.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0003:04:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 01)
0003:05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:05:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:05:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:05:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:0b:00.0 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
0003:0b:00.1 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
0004:00:00.0 PCI bridge: IBM Device 03dc
0005:00:00.0 PCI bridge: IBM Device 03dc
0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
0006:00:00.0 PCI bridge: IBM Device 03dc
0006:01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0006:01:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0006:01:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0006:01:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)

== Comment: #1 - Milton D. Miller II <milt...@us.ibm.com> - 2016-09-09 19:05:32 ==
The opcode shows a dereference of offset 0x10 from a pointer, and the DAR was 0x10, so the pointer was NULL. Disassembly of the printed opcodes shows that an out-of-module call was made and its result was used as a base. The loaded value was then compared against NULL and used again as a base with the same 16-byte offset.

Looking at upstream, eeh_pe_bus_get() can return NULL. In pnv_eeh_reset(), both the returned bus and bus->parent are passed to pci_is_root_bus(), which checks the word at offset 16 for NULL. The parent field immediately follows a list head, so the offset lines up.

I have not looked at the full function disassembly. Even so, it appears that pnv_eeh_reset() needs to handle a NULL bus returned from eeh_pe_bus_get() before checking whether the bus or its parent is a root bus.

== Comment: #2 - Russell Currey <rus...@au1.ibm.com> - 2016-09-11 21:46:21 ==
Thanks for the details, Milton, you're right. I'll write a patch to fix this in EEH and make sure all eeh_pe_bus_get() callers check for failure.

== Comment: #3 - Russell Currey <rus...@au1.ibm.com> - 2016-09-12 00:19:27 ==

== Comment: #4 - Russell Currey <rus...@au1.ibm.com> - 2016-09-12 00:20:25 ==
I've attached a patch that should stop the oops; can you test it? Note that the failure to find a bus is still an issue, and we need to find its cause.

== Comment: #5 - Milton D. Miller II <milt...@us.ibm.com> - 2016-09-12 12:36:18 ==
Originator: there is a second problem. The kdump process failed because it ran out of memory. Please open a second defect to investigate that, unless you know of kdump setup instructions that were not followed. You should be able to recreate it with

echo c > /proc/sysrq-trigger

and look for the message:

[ 0.397633] Kernel panic - not syncing: Out of memory and no killable processes...

[note: it appears to have failed while unpacking the initrd early in the dump process on your machine.
This may be related to the partition definition, such as memory size and distribution policy.]

== Comment: #6 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-11 07:15:03 ==
@mamatha Please create an Ubuntu mirror request for this; the patches have been merged upstream.
https://patchwork.ozlabs.org/patch/668552/
Please backport the patches to the respective 16.04.2/16.10 kernels.
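The NULL-pointer analysis in Comment #1 can be checked with a small userspace model in C. The same model shows the guard discussed in Comments #2 and #4.

This is a sketch under stated assumptions, not the kernel code or the upstream patch:
- It assumes the 4.4-era layout of struct pci_bus, with a two-pointer list head followed by parent.
- It replaces eeh_pe_bus_get() with a pointer that the caller passes in.
- sketch_eeh_reset() and enum reset_path are hypothetical names.
- The -EIO return value is an illustrative choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace model of the leading fields of the kernel's struct pci_bus.
 * Assumption: as in 4.4-era kernels, a two-pointer list head comes first
 * and `parent` follows it, so on a 64-bit build `parent` sits at 0x10,
 * the same value as the DAR in the oops. */
struct list_head {
	struct list_head *next, *prev;
};

struct pci_bus {
	struct list_head node;  /* two pointers: 16 bytes on ppc64le */
	struct pci_bus *parent; /* NULL for a root bus */
};

static_assert(offsetof(struct pci_bus, parent) == 2 * sizeof(void *),
	      "parent is assumed to directly follow the list head");

/* Same test as the kernel helper: a root bus has no parent.
 * It reads bus->parent, so a NULL bus means a load from address 0x10. */
int pci_is_root_bus(const struct pci_bus *bus)
{
	return bus->parent == NULL;
}

/* Hypothetical labels for the two reset paths in pnv_eeh_reset(). */
enum reset_path { RESET_ROOT_PORT = 1, RESET_BRIDGE = 2 };

/* Sketch of the guarded logic. `bus` stands in for the value returned by
 * eeh_pe_bus_get(), which may be NULL. Checking it first turns the oops
 * into an error return (-EIO here is illustrative). */
int sketch_eeh_reset(struct pci_bus *bus)
{
	if (!bus) {
		fprintf(stderr, "sketch: no PCI bus found for this PE\n");
		return -EIO;
	}
	/* Short-circuit: bus->parent is only examined when it is non-NULL. */
	if (pci_is_root_bus(bus) || pci_is_root_bus(bus->parent))
		return RESET_ROOT_PORT;
	return RESET_BRIDGE;
}
```

How the model behaves:
- sketch_eeh_reset(NULL) returns -EIO instead of reading address 0x10.
- A bus whose parent is a root bus takes the root-port path.
- A bus two levels below the root takes the bridge path.

The NULL check has to come before the first pci_is_root_bus() call, because that helper reads bus->parent unconditionally.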