The fix for this bug should be included in the next kernel SRU, which is scheduled for 9/26.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1487085

Title:
  Ubuntu 14.04.3 LTS Crash in notifier_call_chain after boot

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  Users of the 3.19 kernel on POWER8 machines get a kernel crash on boot.

  [Test Case]
  Boot the system.

  [Fix]
  Commit 792f96e9a769b799a2944e9369e4ea1e467135b2 needed to be backported in addition to d7cf83fcaf1b1668201eae4cdd6e6fe7a2448654. Our 3.19 kernel had only a partial backport of the first patch (792f96e).

  --

  ---Problem Description---
  I installed Ubuntu 14.04.3 LTS on a Palmetto system, and it crashes after booting to the login prompt. This happens every time I boot Ubuntu 14.04.3 LTS. I have reinstalled Ubuntu, and I have also replaced the hard disk and reinstalled again. It still crashes.

  ---uname output---
  Linux paul40 3.19.0-26-generic #28~14.04.1-Ubuntu SMP Wed Aug 12 14:10:52 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = Palmetto

  ---System Hang---
  The Ubuntu OS crashes and the host cannot be accessed.
  The system must be rebooted.

  ---Steps to Reproduce---
  Boot the system.

  Oops output:

  [ 33.132376] Unable to handle kernel paging request for data at address 0x200000000000000
  [ 33.132565] Faulting instruction address: 0xc0000000000dbc60
  [ 33.133422] Oops: Kernel access of bad area, sig: 11 [#1]
  [ 33.134410] SMP NR_CPUS=2048 NUMA PowerNV
  [ 33.134478] Modules linked in: ast ttm drm_kms_helper joydev mac_hid drm hid_generic usbhid hid syscopyarea sysfillrect sysimgblt i2c_algo_bit ofpart cmdlinepart at24 uio_pdrv_genirq powernv_flash mtd ipmi_powernv powernv_rng opal_prd ipmi_msghandler uio uas usb_storage ahci libahci
  [ 33.139112] CPU: 24 PID: 0 Comm: swapper/24 Not tainted 3.19.0-26-generic #28~14.04.1-Ubuntu
  [ 33.139943] task: c0000000013cccb0 ti: c000000fff700000 task.ti: c000000001448000
  [ 33.141642] NIP: c0000000000dbc60 LR: c0000000000dbd94 CTR: 0000000000000000
  [ 33.142605] REGS: c000000fff703980 TRAP: 0300 Not tainted (3.19.0-26-generic)
  [ 33.143417] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28002888 XER: 00000000
  [ 33.144244] CFAR: c000000000008468 DAR: 0200000000000000 DSISR: 40000000 SOFTE: 0
  GPR00: c0000000000dbd94 c000000fff703c00 c00000000144cc00 c0000000015f03c0
  GPR04: 0000000000000007 c0000000015f03b8 ffffffffffffffff 0000000000000000
  GPR08: 0000000000000000 0200000000000000 c00000000006c394 9000000000001003
  GPR12: 0000000000002200 c00000000fb8d800 0000000000000058 0000000000000000
  GPR16: c000000001448000 c000000001448000 c000000001448080 c000000000e9a880
  GPR20: c000000001448080 0000000000000001 0000000000000002 0000000000000012
  GPR24: c000000f1e432200 0000000000000000 0000000000000000 c0000000015f03b8
  GPR28: 0000000000000007 0000000000000000 c0000000015f03c0 ffffffffffffffff
  [ 33.157013] NIP [c0000000000dbc60] notifier_call_chain+0x70/0x100
  [ 33.157818] LR [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
  [ 33.162090] Call Trace:
  [ 33.162845] [c000000fff703c00] [0000000000000008] 0x8 (unreliable)
  [ 33.163644] [c000000fff703c50] [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
  [ 33.164647] [c000000fff703c90] [c00000000006f2a8] opal_message_notify+0xa8/0x100
  [ 33.165476] [c000000fff703d00] [c0000000000dbc88] notifier_call_chain+0x98/0x100
  [ 33.167007] [c000000fff703d50] [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
  [ 33.167816] [c000000fff703d90] [c00000000006f654] opal_do_notifier.part.5+0x74/0xa0
  [ 33.172166] [c000000fff703dd0] [c00000000006f6d8] opal_interrupt+0x58/0x70
  [ 33.172997] [c000000fff703e10] [c0000000001273d0] handle_irq_event_percpu+0x90/0x2b0
  [ 33.174507] [c000000fff703ed0] [c000000000127658] handle_irq_event+0x68/0xd0
  [ 33.175312] [c000000fff703f00] [c00000000012baf4] handle_fasteoi_irq+0xe4/0x240
  [ 33.176124] [c000000fff703f30] [c0000000001265c8] generic_handle_irq+0x58/0x90
  [ 33.176936] [c000000fff703f60] [c000000000010f10] __do_irq+0x80/0x190
  [ 33.182406] [c000000fff703f90] [c00000000002476c] call_do_irq+0x14/0x24
  [ 33.183258] [c00000000144ba30] [c0000000000110c0] do_IRQ+0xa0/0x120
  [ 33.184072] [c00000000144ba90] [c0000000000025d8] hardware_interrupt_common+0x158/0x180
  [ 33.184907] --- interrupt: 501 at arch_local_irq_restore+0x5c/0x90
  [ 33.184907]     LR = arch_local_irq_restore+0x40/0x90
  [ 33.186473] [c00000000144bd80] [c000000f2ae19808] 0xc000000f2ae19808 (unreliable)
  [ 33.188024] [c00000000144bda0] [c00000000085d5d8] cpuidle_enter_state+0xa8/0x260
  [ 33.192695] [c00000000144be00] [c000000000108be8] cpu_startup_entry+0x488/0x4e0
  [ 33.193543] [c00000000144bee0] [c00000000000bdb4] rest_init+0xa4/0xc0
  [ 33.194327] [c00000000144bf00] [c000000000da3e80] start_kernel+0x53c/0x558
  [ 33.195084] [c00000000144bf90] [c000000000008c6c] start_here_common+0x20/0xa8
  [ 33.196569] Instruction dump:
  [ 33.196619] 7cfd3b78 60000000 60000000 e93e0000 2fa90000 419e00a4 2fbf0000 419e009c
  [ 33.197605] 2e3d0000 60000000 60000000 60420000 <e9490000> ebc90008 7d234b78 7f84e378
  [ 33.202763] ---[ end trace 71076895a9f126ba ]---
  [ 33.202836]
  [ 35.203605] Kernel panic - not syncing: Fatal exception in interrupt
  [ 35.203727] drm_kms_helper: panic occurred, switching back to text console
  [ 35.204692] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

  Ah! This is caused by an overflow of the notifier chain array while an OPAL message is being handled. Upstream commit 792f96e fixes this issue. However, commit 792f96e has been only partially applied to the Ubuntu 14.04.3 kernel sources, which is why you are seeing this issue.

  commit 792f96e9a769b799a2944e9369e4ea1e467135b2
  Author: Neelesh Gupta <neele...@linux.vnet.ibm.com>
  Date:   Wed Feb 11 11:57:06 2015 +0530

      powerpc/powernv: Fix the overflow of OPAL message notifiers head array

      Fixes the condition check of incoming message type which can
      otherwise shoot beyond the message notifiers head array.

      Signed-off-by: Neelesh Gupta <neele...@linux.vnet.ibm.com>
      Reviewed-by: Vasant Hegde <hegdevas...@linux.vnet.ibm.com>
      Reviewed-by: Anshuman Khandual <khand...@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>

  Below is the hunk from the above commit that is missing from Ubuntu 14.04.3:

  ------------------------------------------------
  @@ -354,7 +350,7 @@ static void opal_handle_message(void)
   	type = be32_to_cpu(msg.msg_type);
   
   	/* Sanity check */
  -	if (type > OPAL_MSG_TYPE_MAX) {
  +	if (type >= OPAL_MSG_TYPE_MAX) {
   		pr_warning("%s: Unknown message type: %u\n", __func__, type);
   		return;
   	}
  ------------------------------------------------

  I just checked: the above hunk applies cleanly to the Ubuntu 14.04.3 kernel sources. We should mirror this bug to Ubuntu and ask them to apply the above hunk.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1487085/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
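The hunk from commit 792f96e quoted above is an off-by-one in a bounds check. The following is a minimal user-space C sketch of it, not kernel code. It assumes, as the fix implies, that the notifier head array has OPAL_MSG_TYPE_MAX entries, so the valid types run from 0 to OPAL_MSG_TYPE_MAX - 1. The enum members, notifier_calls, old_check_accepts, new_check_accepts and handle_message are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified message types (not the kernel's real list).
 * As the fix implies for OPAL_MSG_TYPE_MAX, the last enumerator is the
 * number of valid types, not a valid type itself. */
enum msg_type {
	MSG_TYPE_A = 0,
	MSG_TYPE_B,
	MSG_TYPE_C,
	MSG_TYPE_MAX, /* == 3, so the valid types are 0, 1 and 2 */
};

/* Stand-in for the per-type notifier head array: one slot per valid
 * type, so the only legal indices are 0 .. MSG_TYPE_MAX - 1. */
static unsigned notifier_calls[MSG_TYPE_MAX];

/* Check as left by the partial backport: it rejects only type > MAX,
 * so type == MAX passes and would index one slot past the end. */
static bool old_check_accepts(uint32_t type)
{
	return !(type > MSG_TYPE_MAX);
}

/* Check after commit 792f96e: it rejects type >= MAX. */
static bool new_check_accepts(uint32_t type)
{
	return !(type >= MSG_TYPE_MAX);
}

/* Dispatch guarded by the fixed check: unknown types are logged and
 * dropped instead of being used as an array index. */
static int handle_message(uint32_t type)
{
	if (!new_check_accepts(type)) {
		fprintf(stderr, "%s: Unknown message type: %u\n", __func__,
			(unsigned)type);
		return -1;
	}
	assert(type < MSG_TYPE_MAX); /* guaranteed by the check above */
	notifier_calls[type]++;      /* in bounds for every accepted type */
	return 0;
}
```

In this sketch, old_check_accepts(MSG_TYPE_MAX) returns true and new_check_accepts(MSG_TYPE_MAX) returns false. Under the old check, a message whose type equals the MAX value would be used as an index one element past the end of the array. That would be consistent with the diagnosis above, where notifier_call_chain() faulted on the bogus address 0x0200000000000000 (DAR) while handling an OPAL message.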