------- Comment From bjki...@us.ibm.com 2016-01-26 11:27 EDT-------
The relevant patch is in the vivid kernel, which is what 14.04.3 ships. Moving to accepted, awaiting verification.
** Tags removed: targetmilestone-inin14044
** Tags added: targetmilestone-inin14043

https://bugs.launchpad.net/bugs/1483170

Title:
  NVidia: Ubuntu: OS crashed into xmon Prompt; scsi_report_bus_reset

Status in linux package in Ubuntu:
  New

Bug description:
  Problem Description:
  ====================
  This system is running non-virtualized Ubuntu with one NVIDIA K80 GPU. During a hardbootme run, the OS crashed. Here are the details from xmon:

  0:mon> e
  cpu 0x0: Vector: 300 (Data Access) at [c000003ffff8f3b0]
      pc: c00000000069ba80: scsi_report_bus_reset+0x60/0xb0
      lr: d00000001cae524c: ipr_erp_start+0x3bc/0x644 [ipr]
      sp: c000003ffff8f630
     msr: 9000000000009033
     dar: 100178
   dsisr: 40000000
    current = 0xc000000001359b10
    paca    = 0xc00000000fb80000   softe: 0   irq_happened: 0x01
      pid   = 0, comm = swapper/0
  0:mon> r
  R00 = d00000001cae524c   R16 = 0000000000200000
  R01 = c000003ffff8f630   R17 = 0000000000000000
  R02 = c0000000013d8028   R18 = 00000000fffefa58
  R03 = c000000fdcb00000   R19 = c000000000e4a000
  R04 = 0000000000000000   R20 = c000000001412180
  R05 = 0000000000000002   R21 = 0000000000000001
  R06 = 0000000000000067   R22 = 0000000000000002
  R07 = 0000000006290000   R23 = 00000000000001f0
  R08 = 0000000000000001   R24 = c00000001010ea00
  R09 = 00000000001000f0   R25 = c000000fdcb00730
  R10 = 00000000000000ff   R26 = 0000000000000001
  R11 = d00000001cae6518   R27 = 0000000006290000
  R12 = c00000000069ba20   R28 = c000000fdce40cf0
  R13 = c00000000fb80000   R29 = c000000fa4c50300
  R14 = c00000000135a120   R30 = 0000000000000000
  R15 = 0000000000000000   R31 = c000000fdcb00000
  pc  = c00000000069ba80 scsi_report_bus_reset+0x60/0xb0
  cfar= c000000000009368 slb_miss_realmode+0x50/0x78
  lr  = d00000001cae524c ipr_erp_start+0x3bc/0x644 [ipr]
  msr = 9000000000009033   cr  = 28044444
  ctr = c00000000069ba20   xer = 0000000000000000   trap = 300
  dar = 0000000000100178   dsisr = 40000000
  0:mon> t
  [c000003ffff8f660] d00000001cae524c ipr_erp_start+0x3bc/0x644 [ipr]
  [c000003ffff8f6c0] d00000001caddb20 ipr_scsi_done+0x100/0x120 [ipr]
  [c000003ffff8f700] d00000001cadc5bc ipr_isr_mhrrq+0x10c/0x250 [ipr]
  [c000003ffff8f760] c00000000012ff90 handle_irq_event_percpu+0x90/0x2b0
  [c000003ffff8f820] c000000000130218 handle_irq_event+0x68/0xd0
  [c000003ffff8f850] c000000000135380 handle_fasteoi_irq+0xe0/0x250
  [c000003ffff8f880] c00000000012f188 generic_handle_irq+0x58/0x90
  [c000003ffff8f8b0] c0000000000119d0 __do_irq+0x80/0x190
  [c000003ffff8f8e0] c000000000011bec do_IRQ+0x10c/0x120
  [c000003ffff8f940] c000000000002794 hardware_interrupt_common+0x114/0x180
  --- Exception: 501 (Hardware Interrupt) at c0000000006a45b4 scsi_io_completion+0x1e4/0x800
  [c000003ffff8fd00] c00000000069662c scsi_finish_command+0x15c/0x1b0
  [c000003ffff8fd80] c0000000006a41d8 scsi_softirq_done+0x198/0x200
  [c000003ffff8fe00] c0000000004cbbd4 blk_done_softirq+0xb4/0xe0
  [c000003ffff8fe40] c0000000000b5244 __do_softirq+0x174/0x3e0
  [c000003ffff8ff30] c0000000000b5888 irq_exit+0xf8/0x140
  [c000003ffff8ff60] c0000000000119dc __do_irq+0x8c/0x190
  [c000003ffff8ff90] c000000000025320 call_do_irq+0x14/0x24
  [c0000000013d7840] c000000000011b80 do_IRQ+0xa0/0x120
  [c0000000013d78a0] c000000000002794 hardware_interrupt_common+0x114/0x180
  --- Exception: 501 (Hardware Interrupt) at c0000000000110d4 arch_local_irq_restore+0x74/0x90
  [c0000000013d7b90] c0000000000162f8 __switch_to+0x208/0x350 (unreliable)
  [c0000000013d7bb0] c0000000000ef70c finish_task_switch+0x7c/0x1e0
  [c0000000013d7bf0] c0000000009d6c40 __schedule+0x370/0x910
  [c0000000013d7e10] c0000000009d7880 schedule_preempt_disabled+0x20/0x30
  [c0000000013d7e30] c0000000001121e4 cpu_startup_entry+0x1c4/0x500
  [c0000000013d7ee0] c00000000000ccd4 rest_init+0xa4/0xc0
  [c0000000013d7f00] c000000000d53e4c start_kernel+0x520/0x53c
  [c0000000013d7f90] c000000000009b6c start_here_common+0x20/0xa8
  0:mon>

  == Comment: #1 - Brian J. King <bjki...@us.ibm.com> - 2015-05-28 17:08:13 ==
  Make sure the host lock is held when calling scsi_report_bus_reset. This fixes a crash that occurred because the __devices list in the SCSI host was changing while we were iterating through it.

  == Comment: #8 - Wen Xiong <wenxi...@us.ibm.com> - 2015-08-06 11:09:25 ==
  Release of bug changed to Ubuntu 14.04.
  He has tested the patch and reported: "yes the patch worked".
  We upstreamed the patch last month. Here is the commit link:
  https://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/drivers/scsi/ipr.c?h=misc&id=36b8e180e1e929e00b351c3b72aab3147fc14116
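The fix in Comment #1 follows a general rule: the lock that protects a list must be held for the whole walk, so no other path can unlink (and free) an entry while the walk is in progress. Below is a minimal userspace sketch of that rule, assuming a pthread mutex as a stand-in for the SCSI host lock. The names model_host, model_device, reset_seen and the model_* functions are hypothetical illustrations; this is not the upstream ipr patch or the SCSI midlayer's code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical model of a host whose device list is guarded by host_lock. */
struct model_device {
    int channel;
    int reset_seen;               /* hypothetical flag set when a bus reset is reported */
    struct model_device *next;
};

struct model_host {
    pthread_mutex_t host_lock;    /* protects the devices list */
    struct model_device *devices;
};

static int model_host_init(struct model_host *host)
{
    host->devices = NULL;
    return pthread_mutex_init(&host->host_lock, NULL);
}

/* Walks the list without locking: the caller must already hold host_lock. */
static int model_report_bus_reset_locked(struct model_host *host, int channel)
{
    /* Debug check: trylock on a held default mutex returns EBUSY. */
    assert(pthread_mutex_trylock(&host->host_lock) == EBUSY);

    int marked = 0;
    for (struct model_device *d = host->devices; d != NULL; d = d->next) {
        if (d->channel == channel) {
            d->reset_seen = 1;
            marked++;
        }
    }
    return marked;
}

/* Error-recovery path: take the lock around the whole list walk. */
static int model_erp_bus_was_reset(struct model_host *host, int channel)
{
    pthread_mutex_lock(&host->host_lock);
    int marked = model_report_bus_reset_locked(host, channel);
    pthread_mutex_unlock(&host->host_lock);
    return marked;
}

/* Adds a device at the head of the list under the lock; NULL on allocation failure. */
static struct model_device *model_add_device(struct model_host *host, int channel)
{
    struct model_device *d = calloc(1, sizeof(*d));
    if (d == NULL)
        return NULL;
    d->channel = channel;
    pthread_mutex_lock(&host->host_lock);
    d->next = host->devices;
    host->devices = d;
    pthread_mutex_unlock(&host->host_lock);
    return d;
}

/* Unlinks under the lock, then frees; returns 1 if the device was found. */
static int model_remove_device(struct model_host *host, struct model_device *dev)
{
    int found = 0;
    pthread_mutex_lock(&host->host_lock);
    for (struct model_device **pp = &host->devices; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == dev) {
            *pp = dev->next;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&host->host_lock);
    if (found)
        free(dev);
    return found;
}
```

Because model_remove_device takes the same lock before unlinking, it cannot free a node while model_erp_bus_was_reset is walking the list. Without the lock, the walk could follow the next pointer of a node that was just removed, which is the kind of race Comment #1 describes for the __devices list.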