** Tags added: ubuntu-17.04

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1676678
Title:
ISST-LTE:dotg6:Kernel access of bad area, sig: 11 - during stress tests

Status in The Ubuntu-power-systems project:
Incomplete
Status in linux package in Ubuntu:
Incomplete

Bug description:

---Problem Description---
After running stress tests (IO, TCP, BASE) for a few hours, Ubuntu 17.04 KVM guest dotg6 crashed, produced a kdump, and rebooted.

---uname output---
Linux dotg6 4.10.0-13-generic #15-Ubuntu SMP Thu Mar 9 20:27:28 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = KVM guest on an 8247-22L (host also running Ubuntu 17.04)

Stack trace output:

[ 1909.621800] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1909.621870] SMP NR_CPUS=2048
[ 1909.621871] NUMA
[ 1909.621925] pSeries
[ 1909.622016] Modules linked in: minix nls_iso8859_1 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache binfmt_misc xfs libcrc32c vmx_crypto sunrpc ip_tables x_tables autofs4 btrfs xor raid6_pq dm_service_time crc32c_vpmsum virtio_scsi virtio_net scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[ 1909.622401] CPU: 2 PID: 27704 Comm: ppc64_cpu Not tainted 4.10.0-13-generic #15-Ubuntu
[ 1909.622536] task: c000000042a64200 task.stack: c00000003423c000
[ 1909.622627] NIP: d0000000016a14f4 LR: d0000000016a14a0 CTR: c000000000609d00
[ 1909.622737] REGS: c00000003423f7f0 TRAP: 0380 Not tainted (4.10.0-13-generic)
[ 1909.622850] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[ 1909.622860] CR: 24002428 XER: 20000000
[ 1909.623016] CFAR: c00000000061a238 SOFTE: 1
[ 1909.623016] GPR00: d0000000016a14a0 c00000003423fa70 d0000000016ab8cc c000000170fd5000
[ 1909.623016] GPR04: ffffffffffffffff 0000000000000000 0000000000000000 0000000000007530
[ 1909.623016] GPR08: c00000000146c700 757465736d642f6e c00000000146dbe0 d0000000016a2ef8
[ 1909.623016] GPR12: c000000000609d00 c000000001b81200 0000000000000008 0000000000000001
[ 1909.623016] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000046f05f80
[ 1909.623016] GPR20: 0000000046f061f8 0000000000000000 0000000046f05f58 c00000017fd4a808
[ 1909.623016] GPR24: 0000000000000001 c000000170fc7a30 c000000001326eb0 d0000000016a1648
[ 1909.623016] GPR28: c000000001471c28 c000000170fc7860 0000000071ae4a20 0000000000000058
[ 1909.623990] NIP [d0000000016a14f4] __virtscsi_set_affinity+0xac/0x200 [virtio_scsi]
[ 1909.624114] LR [d0000000016a14a0] __virtscsi_set_affinity+0x58/0x200 [virtio_scsi]
[ 1909.624235] Call Trace:
[ 1909.624278] [c00000003423fa70] [d0000000016a14a0] __virtscsi_set_affinity+0x58/0x200 [virtio_scsi] (unreliable)
[ 1909.624445] [c00000003423fac0] [d0000000016a1678] virtscsi_cpu_online+0x30/0x70 [virtio_scsi]
[ 1909.624746] [c00000003423fae0] [c0000000000db73c] cpuhp_invoke_callback+0x3ec/0x5a0
[ 1909.624887] [c00000003423fb50] [c0000000000dba88] cpuhp_down_callbacks+0x78/0xf0
[ 1909.625037] [c00000003423fba0] [c000000000268bb0] _cpu_down+0x150/0x1b0
[ 1909.625174] [c00000003423fc00] [c0000000000de1b4] do_cpu_down+0x64/0xb0
[ 1909.625330] [c00000003423fc40] [c00000000074b834] cpu_subsys_offline+0x24/0x40
[ 1909.625485] [c00000003423fc60] [c000000000743284] device_offline+0xf4/0x130
[ 1909.625610] [c00000003423fca0] [c000000000743434] online_store+0x64/0xb0
[ 1909.625736] [c00000003423fce0] [c00000000073e37c] dev_attr_store+0x3c/0x60
[ 1909.625862] [c00000003423fd00] [c0000000003faa18] sysfs_kf_write+0x68/0xa0
[ 1909.625984] [c00000003423fd20] [c0000000003f98bc] kernfs_fop_write+0x17c/0x250
[ 1909.626132] [c00000003423fd70] [c00000000033c98c] __vfs_write+0x3c/0x70
[ 1909.626253] [c00000003423fd90] [c00000000033e414] vfs_write+0xd4/0x240
[ 1909.626374] [c00000003423fde0] [c00000000033ffc8] SyS_write+0x68/0x110
[ 1909.626501] [c00000003423fe30] [c00000000000b184] system_call+0x38/0xe0
[ 1909.626624] Instruction dump:
[ 1909.626691] 2f890000 419e0064 3be00000 393f0021 3880ffff 792926e4 7d3d4a14 e9290010
[ 1909.626835] 2fa90000 7d234b78 419e002c e9290020 <e9290330> e9290058 2fa90000 7d2c4b78
[ 1909.627003] ---[ end trace ecc8a323beb021a2 ]---

crash> bt
PID: 27704 TASK: c000000042a64200 CPU: 2 COMMAND: "ppc64_cpu"
 #0 [c00000003423f630] crash_kexec at c0000000001a04c4
 #1 [c00000003423f670] oops_end at c000000000024da8
 #2 [c00000003423f6f0] bad_page_fault at c0000000000627b0
 #3 [c00000003423f760] slb_miss_bad_addr at c000000000026828
 #4 [c00000003423f780] bad_addr_slb at c000000000008acc
 Data SLB Access [380] exception frame:
 R0: d0000000016a14a0 R1: c00000003423fa70 R2: d0000000016ab8cc
 R3: c000000170fd5000 R4: ffffffffffffffff R5: 0000000000000000
 R6: 0000000000000000 R7: 0000000000007530 R8: c00000000146c700
 R9: 757465736d642f6e R10: c00000000146dbe0 R11: d0000000016a2ef8
 R12: c000000000609d00 R13: c000000001b81200 R14: 0000000000000008
 R15: 0000000000000001 R16: 0000000000000000 R17: 0000000000000000
 R18: 0000000000000000 R19: 0000000046f05f80 R20: 0000000046f061f8
 R21: 0000000000000000 R22: 0000000046f05f58 R23: c00000017fd4a808
 R24: 0000000000000001 R25: c000000170fc7a30 R26: c000000001326eb0
 R27: d0000000016a1648 R28: c000000001471c28 R29: c000000170fc7860
 R30: 0000000071ae4a20 R31: 0000000000000058
 NIP: d0000000016a14f4 MSR: 800000000280b033 OR3: c00000000061a238
 CTR: c000000000609d00 LR: d0000000016a14a0 XER: 0000000020000000
 CCR: 0000000024002428 MQ: 0000000000000001 DAR: 757465736d64329e
 DSISR: c00000000001b910 Syscall Result: 0000000000000000
 #5 [c00000003423fa70] __virtscsi_set_affinity at d0000000016a14f4 [virtio_scsi]
 [Link Register] [c00000003423fa70] __virtscsi_set_affinity at d0000000016a14a0 (unreliable)
 #6 [c00000003423fac0] virtscsi_cpu_online at d0000000016a1678 [virtio_scsi]
 #7 [c00000003423fae0] cpuhp_invoke_callback at c0000000000db73c
 #8 [c00000003423fb50] cpuhp_down_callbacks at c0000000000dba88
 #9 [c00000003423fba0] _cpu_down at c000000000268bb0
#10 [c00000003423fc00] do_cpu_down at c0000000000de1b4
#11 [c00000003423fc40] cpu_subsys_offline at c00000000074b834
#12 [c00000003423fc60] device_offline at c000000000743284
#13 [c00000003423fca0] online_store at c000000000743434
#14 [c00000003423fce0] dev_attr_store at c00000000073e37c
#15 [c00000003423fd00] sysfs_kf_write at c0000000003faa18
#16 [c00000003423fd20] kernfs_fop_write at c0000000003f98bc
#17 [c00000003423fd70] __vfs_write at c00000000033c98c
#18 [c00000003423fd90] vfs_write at c00000000033e414
#19 [c00000003423fde0] sys_write at c00000000033ffc8
#20 [c00000003423fe30] system_call at c00000000000b184
 System Call [c01] exception frame:
 R0: 0000000000000004 R1: 00003ffff823a5c0 R2: 00003fff7bf57f00
 R3: 0000000000000008 R4: 0000010029020080 R5: 0000000000000001
 R6: 00003fff7bee0d2c R7: 0000010029020010 R8: 0000000000000000
 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000
 R12: 0000000000000000 R13: 00003fff7bfed060
 NIP: 00003fff7bf350cc MSR: 800000000280f033 OR3: 0000000000000008
 CTR: 0000000000000000 LR: 0000000046f01e0c XER: 0000000000000000
 CCR: 0000000048000484 MQ: 0000000000000001 DAR: 00003fff7bd7e2c8
 DSISR: 0000000040000000 Syscall Result: 0000000000000008

The initial output when invoking the crash tool on the vmcore:

KERNEL: /usr/lib/debug/boot/vmlinux-4.10.0-13-generic
DUMPFILE: /var/crash/201703221034/dump.201703221034 [PARTIAL DUMP]
CPUS: 7
DATE: Wed Mar 22 10:34:11 2017
UPTIME: 00:11:42
LOAD AVERAGE: 35.29, 25.73, 15.42
TASKS: 704
NODENAME: dotg6
RELEASE: 4.10.0-13-generic
VERSION: #15-Ubuntu SMP Thu Mar 9 20:27:28 UTC 2017
MACHINE: ppc64le (3425 Mhz)
MEMORY: 6 GB
PANIC: "Unable to handle kernel paging request for data at address 0x757465736d64329e"
PID: 27704
COMMAND: "ppc64_cpu"
TASK: c000000042a64200 [THREAD_INFO: c00000003423c000]
CPU: 2
STATE: TASK_RUNNING (PANIC)

> Can this problem be reproduced with some certainty? If so, I could probably
> provide a debug patch to the guest kernel and collect some information when
> this happens.

This guest has now crashed twice with this error and the same backtrace, so it will likely happen again, but there is no predictable timeframe for a crash.
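The registers in the Oops and crash output can be cross-checked with a little arithmetic. The Python sketch below was written for this note and is not part of any IBM or Canonical tooling. It decodes the faulting instruction word `e9290330` from the instruction dump as a PowerPC DS-form load, checks that R9 plus the displacement equals the reported DAR, and prints R9's bytes in little-endian (ppc64le) memory order. That R9 does not look like a kernel address (those in this trace start with c000... or d000...) is an observation only; it does not by itself establish a root cause.

```python
import struct

# Values copied from the Oops / crash output in this report.
FAULTING_INSN = 0xE9290330   # the <e9290330> word in the instruction dump
R9 = 0x757465736D642F6E      # GPR09 at the time of the fault
DAR = 0x757465736D64329E     # data address from the exception frame


def decode_ds_form(insn: int) -> tuple[int, int, int, int]:
    """Split a 32-bit PowerPC DS-form instruction into (opcode, RT, RA, disp)."""
    opcode = insn >> 26            # primary opcode, bits 0-5
    rt = (insn >> 21) & 0x1F       # target register, bits 6-10
    ra = (insn >> 16) & 0x1F       # base register, bits 11-15
    ds = insn & 0xFFFC             # DS field, with the 2 low XO bits cleared
    if ds & 0x8000:                # sign-extend the 16-bit displacement
        ds -= 0x10000
    return opcode, rt, ra, ds


opcode, rt, ra, disp = decode_ds_form(FAULTING_INSN)
xo = FAULTING_INSN & 0x3           # opcode 58 with XO = 0 is "ld"
# Effective address of the load: (RA) + disp, wrapped to 64 bits.
effective_address = (R9 + disp) & 0xFFFFFFFFFFFFFFFF
# Bytes of R9 as they would sit in memory on a little-endian system.
r9_bytes = struct.pack("<Q", R9)

print(f"opcode={opcode} xo={xo} -> ld r{rt},{disp:#x}(r{ra})")
print(f"R9 + {disp:#x} = {effective_address:#x}, matches DAR: {effective_address == DAR}")
print(f"R9 as little-endian bytes: {r9_bytes!r}")
```

Running it prints `ld r9,0x330(r9)`, confirms that R9 + 0x330 equals the DAR, and shows R9's bytes as `b'n/dmsetu'`, i.e. printable ASCII rather than a pointer.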
A test running on this guest periodically turns SMT off and on, and it may be what triggers this crash: the backtrace shows the fault in __virtscsi_set_affinity occurring while ppc64_cpu was taking a CPU offline through sysfs (online_store, device_offline, _cpu_down). Running the SMT test more frequently may reproduce the crash more consistently.

Mirroring to Canonical for their awareness while IBM continues the investigation.
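A tighter SMT on/off loop might make the crash reproduce faster. On POWER, `ppc64_cpu --smt=off` and `ppc64_cpu --smt=on` (from powerpc-utils) do this, and the backtrace shows that path ending in writes to the CPU hotplug `online` files under /sys/devices/system/cpu. The sketch below is hypothetical: the helper names, the assumption that SMT threads are numbered consecutively per core, and the `threads_per_core` value you pass in are all illustrative, not taken from the test that actually ran on dotg6. It accepts a sysfs root so it can be exercised against a scratch directory; on the guest it would need root and `CPU_SYSFS`.

```python
import time
from pathlib import Path

# Standard Linux CPU hotplug directory; each cpuN/online accepts "0" or "1".
CPU_SYSFS = Path("/sys/devices/system/cpu")


def secondary_threads(cpu_ids: list[int], threads_per_core: int) -> list[int]:
    """Return CPU ids that are not the first thread of their core.

    Assumption (hypothetical): core N owns CPUs
    N*threads_per_core .. N*threads_per_core + threads_per_core - 1.
    """
    return [cpu for cpu in cpu_ids if cpu % threads_per_core != 0]


def set_online(root: Path, cpu: int, online: bool) -> None:
    """Write "1" (online) or "0" (offline) to <root>/cpuN/online."""
    (root / f"cpu{cpu}" / "online").write_text("1" if online else "0")


def toggle_smt(root: Path, cpu_ids: list[int], threads_per_core: int,
               iterations: int, pause_s: float = 0.0) -> int:
    """Offline, then re-online, every secondary thread; return the number of writes."""
    targets = secondary_threads(cpu_ids, threads_per_core)
    writes = 0
    for _ in range(iterations):
        for online in (False, True):   # roughly "SMT off", then "SMT on"
            for cpu in targets:
                set_online(root, cpu, online)
                writes += 1
            time.sleep(pause_s)        # optional settle time between phases
    return writes
```

Setting `pause_s` to zero maximizes the hotplug rate, which is the point when trying to hit a race; a shell loop around `ppc64_cpu --smt=off` / `--smt=on` would exercise the same kernel path through the tool that appears in the backtrace.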