** Changed in: linux (Ubuntu)
     Assignee: Tim Gardner (timg-tpi) => (unassigned)
-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1659111

Title:
  UbuntuKVM guest crashed while running I/O stress test with Ubuntu kernel 4.4.0-47-generic

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Won't Fix
Status in linux source package in Zesty:
  Incomplete

Bug description:
Attn. Canonical: For your awareness only at this time.

== Comment: #0 - LEKSHMI C. PILLAI - 2016-11-22 03:49:38 ==
Machine INFO
KVM HOST: luckyv1
Guest: lucky05

lucky05 crashed while running the I/O stress test for SAN disks. I installed lucky05 and enabled xmon on it, then started the RAW disk test on around 50 disks. After 6-7 hours of running, the machine dropped into xmon.

Logs:
[25023.224182] Unable to handle kernel paging request for data at address 0x00000000
[25023.224257] Faulting instruction address: 0xc000000000324c60
cpu 0x3: Vector: 300 (Data Access) at [c0000000fffc3620]
    pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290
    lr: c00000000032831c: writeback_sb_inodes+0x30c/0x590
    sp: c0000000fffc38a0
   msr: 8000000100009033
   dar: 0
 dsisr: 40000000
  current = 0xc0000000ff99e470
  paca    = 0xc00000000fb41c80   softe: 0   irq_happened: 0x01
    pid   = 14736, comm = kworker/u16:8
enter ? for help
[c0000000fffc3900] c00000000032831c writeback_sb_inodes+0x30c/0x590
[c0000000fffc3a10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
[c0000000fffc3a70] c000000000328aec wb_writeback+0x30c/0x450
[c0000000fffc3b40] c0000000003296b4 wb_workfn+0x264/0x570
[c0000000fffc3c50] c0000000000dd930 process_one_work+0x1e0/0x5a0
[c0000000fffc3ce0] c0000000000dde84 worker_thread+0x194/0x680
[c0000000fffc3d80] c0000000000e6980 kthread+0x110/0x130
[c0000000fffc3e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
3:mon> f
3:mon> th
[c0000000fffc3900] c00000000032831c writeback_sb_inodes+0x30c/0x590
[c0000000fffc3a10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
[c0000000fffc3a70] c000000000328aec wb_writeback+0x30c/0x450
[c0000000fffc3b40] c0000000003296b4 wb_workfn+0x264/0x570
[c0000000fffc3c50] c0000000000dd930 process_one_work+0x1e0/0x5a0
[c0000000fffc3ce0] c0000000000dde84 worker_thread+0x194/0x680
[c0000000fffc3d80] c0000000000e6980 kthread+0x110/0x130
[c0000000fffc3e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
3:mon> sh
[27384.651055] INFO: rcu_sched detected stalls on CPUs/tasks:
[27384.651220] (detected by 4, t=40598 jiffies, g=2849830, c=2849829, q=992)
[27384.651286] All QSes seen, last rcu_sched kthread activity 40596 (4301188714-4301148118), jiffies_till_next_fqs=1, root ->qsmask 0x0
[27384.651501] rcu_sched kthread starved for 40596 jiffies! g2849830 c2849829 f0x2 s3 ->state=0x0
[27384.651747] INFO: rcu_sched detected stalls on CPUs/tasks:
[27384.651905] (detected by 4, t=590354 jiffies, g=2849830, c=2849829, q=1285)
[27384.652012] All QSes seen, last rcu_sched kthread activity 590352 (4301738470-4301148118), jiffies_till_next_fqs=1, root ->qsmask 0x0
[27384.652191] rcu_sched kthread starved for 590352 jiffies! g2849830 c2849829 f0x2 s3 ->state=0x0
[27384.730645] Unable to handle kernel paging request for data at address 0xffffffffffffffd8
[27384.730781] Faulting instruction address: 0xc0000000000e7258
cpu 0x3: Vector: 300 (Data Access) at [c0000000fffc3000]
    pc: c0000000000e7258: kthread_data+0x28/0x40
    lr: c0000000000de940: wq_worker_sleeping+0x30/0x110
    sp: c0000000fffc3280
   msr: 8000000100009033
   dar: ffffffffffffffd8
 dsisr: 40000000
  current = 0xc0000000ff99e470
  paca    = 0xc00000000fb41c80   softe: 0   irq_happened: 0x01
    pid   = 14736, comm = kworker/u16:8
enter ? for help

== Comment: #1 - LEKSHMI C. PILLAI - 2016-11-22 04:05:41 ==
3:mon> th
[c0000000fffc32b0] c0000000000de940 wq_worker_sleeping+0x30/0x110
[c0000000fffc32f0] c000000000af31bc __schedule+0x6ec/0x990
[c0000000fffc33c0] c000000000af34a8 schedule+0x48/0xc0
[c0000000fffc33f0] c0000000000bd3d0 do_exit+0x760/0xc30
[c0000000fffc34b0] c000000000020bf4 die+0x314/0x470
[c0000000fffc3540] c000000000050d98 bad_page_fault+0xd8/0x150
[c0000000fffc35b0] c000000000008680 handle_page_fault+0x2c/0x30
--- Exception: 300 (Data Access) at c000000000324c60 locked_inode_to_wb_and_lock_list+0x50/0x290
[c0000000fffc3900] c00000000032831c writeback_sb_inodes+0x30c/0x590
[c0000000fffc3a10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
[c0000000fffc3a70] c000000000328aec wb_writeback+0x30c/0x450
[c0000000fffc3b40] c0000000003296b4 wb_workfn+0x264/0x570
[c0000000fffc3c50] c0000000000dd930 process_one_work+0x1e0/0x5a0
[c0000000fffc3ce0] c0000000000dde84 worker_thread+0x194/0x680
[c0000000fffc3d80] c0000000000e6980 kthread+0x110/0x130
[c0000000fffc3e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
3:mon>

== Comment: #6 - Laurent Dufour - 2016-11-23 03:00:16 ==
Logged in to luckyv1 and found a lot of ipr issues on this node:
[525973.896624] qla2xxx 0005:09:00.0: vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update
[525973.956619] qla2xxx 0005:09:00.1: vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update
[529433.834853] ipr 0001:04:00.0: FFFE: Soft device bus error recovered by the IOA
[529433.834867] ipr: -----Failing Device Information-----
[529433.834870] ipr: World Wide Unique ID: 500507605EC10C000000000000000000
[529433.834873] ipr: Device Resource Path: FF
[529433.834875] ipr: Primary Problem Description: Command Timeout
[529433.834878] ipr: Secondary Problem Description: Command timeout expired
[529433.834880] ipr: SCSI Sense Data:
[529433.834882] ipr: 00000000: 00000000 00000000 00000000 00000000
[529433.834884] ipr: 00000010: 00000000 00000000 00000000 00000000
[529433.834886] ipr: SCSI Command Descriptor Block:
[529433.834889] ipr: 00000000: 9E120004 0F000000 00000000 0020AD00
[529433.834891] ipr: Additional IOA Data:
[529433.834893] ipr: 00000000: 4646001C 44010007 00050000 04700002
[529433.834895] ipr: 00000010: 3B894A49 1EE620CC 04700002 49574631
[529433.834897] ipr: 00000020: 455300CC 06B00027 00000020 84000000
[529433.834899] ipr: 00000030: 00000000 05801000 0B29A7C0 00000000
[529433.834901] ipr: 00000040: 00000000 00000000 00000000 00000000
[529433.834904] ipr: 00000050: 00000000 00000000 00000000 00000000
[529433.834906] ipr: 00000060: 00000000 00000000 00000000 00000000
[529433.834908] ipr: 00000070: 00000000 00000000 00000000 00000000
[529433.834910] ipr: 00000080: 00000000 00000000 00000000 00000000
[529433.834912] ipr: 00000090: 00000000 00000000 00000000 00000000
[529433.834914] ipr: 000000A0: 00000000 D4000018 80000000 FFFFFFFF
[529433.834917] ipr: 000000B0: FFFFFFFF 00000000 0980EC21 00000000
[529433.834919] ipr: 000000C0: 00000000 00000000 01769A24 00000000
[529433.834921] ipr: 000000D0: 01D3C300 E0050000 FFFFFFFE 0B5A0000
[529433.834923] ipr: 000000E0: 00000000 9E120004 0F000000 00000000
[529433.834926] ipr: 000000F0: 43440010 9E120004 0F000000 00000000
[529433.834928] ipr: 00000100: 0020AD00 45480010 0100E038 9E12FFFF
[529433.834930] ipr: 00000110: 01080002 00000000 45540004 00001463

In addition, some NFS issues are reported:
[563034.817901] nfs: server 10.33.11.31 not responding, timed out
[563405.504308] nfs: server 10.33.11.31 not responding, timed out

That said, chig5 entered xmon due to a bad pointer in the kernel:
3:mon> e
cpu 0x3: Vector: 300 (Data Access) at [c0000000fffc3000]
    pc: c0000000000e7258: kthread_data+0x28/0x40
    lr: c0000000000de940: wq_worker_sleeping+0x30/0x110
    sp: c0000000fffc3280
   msr: 8000000100009033
   dar: ffffffffffffffd8
 dsisr: 40000000
  current = 0xc0000000ff99e470
  paca    = 0xc00000000fb41c80   softe: 0   irq_happened: 0x01
    pid   = 14736, comm = kworker/u16:8
3:mon> th
[c0000000fffc32b0] c0000000000de940 wq_worker_sleeping+0x30/0x110
[c0000000fffc32f0] c000000000af31bc __schedule+0x6ec/0x990
[c0000000fffc33c0] c000000000af34a8 schedule+0x48/0xc0
[c0000000fffc33f0] c0000000000bd3d0 do_exit+0x760/0xc30
[c0000000fffc34b0] c000000000020bf4 die+0x314/0x470
[c0000000fffc3540] c000000000050d98 bad_page_fault+0xd8/0x150
[c0000000fffc35b0] c000000000008680 handle_page_fault+0x2c/0x30
--- Exception: 300 (Data Access) at c000000000324c60 locked_inode_to_wb_and_lock_list+0x50/0x290
[c0000000fffc3900] c00000000032831c writeback_sb_inodes+0x30c/0x590
[c0000000fffc3a10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
[c0000000fffc3a70] c000000000328aec wb_writeback+0x30c/0x450
[c0000000fffc3b40] c0000000003296b4 wb_workfn+0x264/0x570
[c0000000fffc3c50] c0000000000dd930 process_one_work+0x1e0/0x5a0
[c0000000fffc3ce0] c0000000000dde84 worker_thread+0x194/0x680
[c0000000fffc3d80] c0000000000e6980 kthread+0x110/0x130
[c0000000fffc3e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4

Looking at the other guests, as Lekshmi mentioned that all the guests are crashing.
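
As a reading aid for the xmon `e` output above, here is a minimal Python sketch (not part of the original report) that pulls the main fields out of an exception summary and interprets `dar` as a signed 64-bit value. The helper names `parse_xmon_exception` and `as_signed64` are hypothetical, and the sample text is copied from the second fault quoted above. The `th` backtrace shows that this second fault was taken on the exit path (die, do_exit, schedule, wq_worker_sleeping) after the original exception in locked_inode_to_wb_and_lock_list(). Reading the address as a small negative offset from a NULL pointer is an assumption, not something stated in the thread.

```python
import re

# Hypothetical helper, not an xmon or kernel API: matches the "key: value"
# and "key = value" pairs printed by the xmon `e` command.
FIELD_RE = re.compile(r"\b(pc|lr|sp|msr|dar|dsisr|pid|comm)\s*[:=]\s*(\S+)")


def parse_xmon_exception(text):
    """Return a dict of the fields found in an xmon exception summary."""
    fields = {}
    for key, value in FIELD_RE.findall(text):
        # Drop the trailing ':' after pc/lr addresses and the ',' after pid.
        fields[key] = value.rstrip(":,")
    return fields


def as_signed64(hex_value):
    """Interpret a hex string as a two's-complement 64-bit integer."""
    n = int(hex_value, 16)
    return n - (1 << 64) if n >= (1 << 63) else n


# Sample copied from the second fault on lucky05 (flattened to one line).
sample = (
    "cpu 0x3: Vector: 300 (Data Access) at [c0000000fffc3000] "
    "pc: c0000000000e7258: kthread_data+0x28/0x40 "
    "lr: c0000000000de940: wq_worker_sleeping+0x30/0x110 "
    "sp: c0000000fffc3280 msr: 8000000100009033 "
    "dar: ffffffffffffffd8 dsisr: 40000000 "
    "pid = 14736, comm = kworker/u16:8"
)

info = parse_xmon_exception(sample)
offset = as_signed64(info["dar"])
print(info["pc"], info["comm"], hex(offset))
```

With the sample above, `dar` decodes to -0x28, which is why the second oops is best read together with the first one rather than as a separate problem.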
== Comment: #7 - Laurent Dufour - 2016-11-23 03:24:34 ==
The guest lucky01 (4.4.0-47-generic) is fine:
root@lucky01:/Blast# date
Wed Nov 23 03:04:23 CST 2016

The guest lucky02 (4.4.0-47-generic) has entered xmon due to the same issue as lucky05:
7:mon> e
cpu 0x7: Vector: 300 (Data Access) at [c0000001f265b620]
    pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290
    lr: c00000000032831c: writeback_sb_inodes+0x30c/0x590
    sp: c0000001f265b8a0
   msr: 8000000100009033
   dar: 0
 dsisr: 40000000
  current = 0xc0000001f222fcc0
  paca    = 0xc00000000fb44280   softe: 0   irq_happened: 0x01
    pid   = 12062, comm = kworker/u16:3
7:mon> t
[c0000001f265b900] c00000000032831c writeback_sb_inodes+0x30c/0x590
[c0000001f265ba10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
[c0000001f265ba70] c000000000328aec wb_writeback+0x30c/0x450
[c0000001f265bb40] c0000000003296b4 wb_workfn+0x264/0x570
[c0000001f265bc50] c0000000000dd930 process_one_work+0x1e0/0x5a0
[c0000001f265bce0] c0000000000dde84 worker_thread+0x194/0x680
[c0000001f265bd80] c0000000000e6980 kthread+0x110/0x130
[c0000001f265be30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
--- Exception: 0 at 0000000000000000

The guest lucky03 didn't enter xmon but is not responding any more. Unfortunately, sysrq is not enabled on this guest. There is still some activity on this guest:
root@luckyv1:~# virsh qemu-monitor-command --hmp lucky03 'info cpus'
* CPU #0: nip=0xc0000000001035e0 thread_id=76434
  CPU #1: nip=0xc0000000000863dc thread_id=76435
  CPU #2: nip=0xc0000000000863dc thread_id=76436
  CPU #3: nip=0xc0000000000863dc thread_id=76437
  CPU #4: nip=0xc0000000000863dc thread_id=76439
  CPU #5: nip=0xc0000000000863dc thread_id=76440
  CPU #6: nip=0x0000000010072f68 thread_id=76441
  CPU #7: nip=0xc0000000000863dc thread_id=76442

The guest lucky04 is not responding either and has not entered xmon, and sysrq is not enabled on this node.
But the node seems to be still active:
root@luckyv1:~# virsh qemu-monitor-command --hmp lucky04 'info cpus'
* CPU #0: nip=0xc000000000af8834 thread_id=68201
  CPU #1: nip=0xc0000000000863dc thread_id=68202
  CPU #2: nip=0xc0000000000645ac thread_id=68203
  CPU #3: nip=0xc0000000000863dc thread_id=68204
  CPU #4: nip=0xc0000000000863dc thread_id=68205
  CPU #5: nip=0xc0000000000863dc thread_id=68206
  CPU #6: nip=0xc000000000064590 thread_id=68207
  CPU #7: nip=0xc000000000af8904 thread_id=68208

The guest lucky06 is alive:
root@lucky06:/# cat /proc/version; date
Linux version 4.4.0-47-generic (buildd@bos01-ppc64el-008) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) ) #68-Ubuntu SMP Wed Oct 26 19:38:24 UTC 2016
Wed Nov 23 03:20:19 CST 2016

To summarize:
lucky01 good
lucky02 panic in locked_inode_to_wb_and_lock_list()
lucky03 not responding but still active
lucky04 not responding but still active
lucky05 panic in locked_inode_to_wb_and_lock_list()
lucky06 good

== Comment: #10 - Laurent Dufour - 2016-11-24 10:27:52 ==
Here is the data I captured on lucky02, which panicked the same way lucky05 did.
CPU 7 panicked due to a data access error:
7:mon> e
cpu 0x7: Vector: 300 (Data Access) at [c0000001f265b620]
    pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290
    lr: c00000000032831c: writeback_sb_inodes+0x30c/0x590
    sp: c0000001f265b8a0
   msr: 8000000100009033
   dar: 0
 dsisr: 40000000
  current = 0xc0000001f222fcc0
  paca    = 0xc00000000fb44280   softe: 0   irq_happened: 0x01
    pid   = 12062, comm = kworker/u16:3
7:mon> r
R00 = c00000000032831c   R16 = c0000001fc972ef8
R01 = c0000001f265b8a0   R17 = c0000001fc972e70
R02 = c0000000015c6a00   R18 = c0000001fc972f60
R03 = c0000001fc972e70   R19 = 0000000000000000
R04 = c0000001f2230700   R20 = 0000000000000000
R05 = 0000000000000000   R21 = c0000001f2658000
R06 = 00000001fef30000   R22 = c0000001f35d5c88
R07 = 000108f684c40713   R23 = c0000001f35d5c68
R08 = 0000000000000000   R24 = 0000000000000000
R09 = 0000000000000000   R25 = c0000001fc972ef8
R10 = 0000000080000007   R26 = 0000000000000000
R11 = 00000000030883ec   R27 = 0000000000000000
R12 = 0000000000000000   R28 = 0000000000000001
R13 = c00000000fb44280   R29 = c0000001fc972e70
R14 = c0000000000e6878   R30 = c0000001f265bba0
R15 = 0000000000000000   R31 = 0000000000000000
pc  = c000000000324c60 locked_inode_to_wb_and_lock_list+0x50/0x290
cfar= 00003fff9647a5a8
lr  = c00000000032831c writeback_sb_inodes+0x30c/0x590
msr = 8000000100009033   cr  = 24652882
ctr = c000000000110b50   xer = 0000000020000000   trap = 300
dar = 0000000000000000   dsisr = 40000000
7:mon> t
[c0000001f265b900] c00000000032831c writeback_sb_inodes+0x30c/0x590
[c0000001f265ba10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
[c0000001f265ba70] c000000000328aec wb_writeback+0x30c/0x450
[c0000001f265bb40] c0000000003296b4 wb_workfn+0x264/0x570
[c0000001f265bc50] c0000000000dd930 process_one_work+0x1e0/0x5a0
[c0000001f265bce0] c0000000000dde84 worker_thread+0x194/0x680
[c0000001f265bd80] c0000000000e6980 kthread+0x110/0x130
[c0000001f265be30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4

The system tried to access data pointed to by r31, which contains data retrieved from the inode whose address is stored in r29. The panic happened during the inlined call to wb_get(), when inode->i_wb is used. So here inode->i_wb is NULL, which is not expected to happen.

At this time, CPU 6 is waiting for the same inode's spinlock, inode->i_lock, to be released here:
6:mon> t
[link register   ] c000000000064624 __spin_yield+0xb4/0xc0
[c0000000fdb93900] c0000000fdb93940 (unreliable)
[c0000000fdb93970] c000000000af8968 _raw_spin_lock+0xd8/0xe0
[c0000000fdb939a0] c000000000327330 __mark_inode_dirty+0xd0/0x4a0
[c0000000fdb93a20] c0000000003326f0 mark_buffer_dirty+0x1f0/0x210
[c0000000fdb93a60] c000000000334ff0 __block_commit_write.isra.7+0xf0/0x170
[c0000000fdb93ad0] c00000000033513c block_write_end+0x7c/0x100
[c0000000fdb93b20] c00000000033a340 blkdev_write_end+0x60/0xa0
[c0000000fdb93b80] c00000000022d340 generic_perform_write+0x180/0x280
[c0000000fdb93c20] c00000000022f568 __generic_file_write_iter+0x208/0x250
[c0000000fdb93c80] c00000000033b498 blkdev_write_iter+0x98/0x160
[c0000000fdb93cf0] c0000000002e24a4 new_sync_write+0xc4/0x120
[c0000000fdb93d90] c0000000002e32a0 vfs_write+0xc0/0x230
[c0000000fdb93de0] c0000000002e42dc SyS_write+0x6c/0x110
[c0000000fdb93e30] c000000000009204 system_call+0x38/0xb4
--- Exception: c01 (System Call) at 00003fff944c6728
SP (3ffef9ffe0c0) is in userspace

CPU 6 holds the inode->i_lock in the call to inode_to_wb_and_lock_list(). Why is inode->i_wb NULL?

== Comment: #11 - Laurent Dufour - 2016-11-25 11:57:50 ==
I found that lucky03 hit the panic also. I took a closer look, and it seems that there is a locking / memory barrier issue between the code run in locked_inode_to_wb_and_lock_list() and another CPU. I found that CPU 5 was running 'latest_blast' at the time CPU 0 hit the panic. The same applies to lucky02.
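
To make the register reasoning in comment #10 easier to check, here is a minimal Python sketch (not part of the original report) that parses xmon `r` output into a dictionary and confirms that r31 is zero while r29 holds a kernel pointer. The helper name `parse_xmon_registers` is hypothetical, and the sample rows are copied from the lucky02 dump quoted in comment #10. It only reads the dump. It does not explain why inode->i_wb was NULL.

```python
import re

# Hypothetical helper, not an xmon or kernel API: matches entries such as
# "R29 = c0000001fc972e70" in the output of the xmon `r` command.
REG_RE = re.compile(r"\bR(\d{2})\s*=\s*([0-9a-f]{16})\b")


def parse_xmon_registers(text):
    """Map general-purpose register number -> integer value."""
    return {int(num): int(val, 16) for num, val in REG_RE.findall(text)}


# A few rows copied from the lucky02 register dump in comment #10.
sample = (
    "R00 = c00000000032831c R16 = c0000001fc972ef8 "
    "R03 = c0000001fc972e70 R19 = 0000000000000000 "
    "R13 = c00000000fb44280 R29 = c0000001fc972e70 "
    "R15 = 0000000000000000 R31 = 0000000000000000"
)

regs = parse_xmon_registers(sample)

# r29 is described in the comment as holding the inode address; r31 is the
# value loaded from it that was then dereferenced.
print("r29 =", hex(regs[29]))
print("r31 is NULL:", regs[31] == 0)
```

The same function can be pointed at a full dump pasted from the console; only the `Rnn = value` pairs are picked up, so the `pc`, `lr` and `msr` lines are ignored.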

== Comment: #13 - Laurent Dufour - 2016-12-05 07:32:30 ==
I did some tests on luckyv05 and was able to recreate it on a vanilla 4.8 kernel:
[113031.075540] Unable to handle kernel paging request for data at address 0x00000000
[113031.075614] Faulting instruction address: 0xc0000000003692e0
0:mon> t
[c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590
[c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150
[c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450
[c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580
[c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590
[c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660
[c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130
[c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c
--- Exception: 0 at 0000000000000000
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c0000000fb65f620]
    pc: c0000000003692e0: locked_inode_to_wb_and_lock_list+0x50/0x290
    lr: c00000000036cb6c: writeback_sb_inodes+0x30c/0x590
    sp: c0000000fb65f8a0
   msr: 800000010280b033
   dar: 0
 dsisr: 40000000
  current = 0xc0000001d69be400
  paca    = 0xc000000003480000   softe: 0   irq_happened: 0x01
    pid   = 18689, comm = kworker/u16:10
Linux version 4.8.0 (laurent@lucky05) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #1 SMP Thu Dec 1 09:25:13 CST 2016

So this is not an Ubuntu-specific issue but a more general one, which is not fixed by the patch https://patchwork.kernel.org/patch/9247955/ as had been expected while investigating bug 142781.
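
Since the absolute addresses differ between kernel builds, one way to confirm that two reports show the same fault is to compare the symbol and offset of the faulting pc instead. Below is a minimal Python sketch of that comparison (not part of the original report). The helper name `fault_signature` is hypothetical, and the two `pc:` lines are copied from the 4.4.0-47-generic and vanilla 4.8.0 reports quoted in this thread.

```python
import re

# Hypothetical helper, not an xmon or kernel API: matches lines such as
# "pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290".
PC_RE = re.compile(r"pc:\s*([0-9a-f]+):\s*([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def fault_signature(exception_text):
    """Return (symbol, offset, size) of the faulting pc, or None if absent.

    The absolute address is deliberately dropped because it changes from
    one kernel build to the next.
    """
    match = PC_RE.search(exception_text)
    if match is None:
        return None
    _addr, symbol, offset, size = match.groups()
    return symbol, int(offset, 16), int(size, 16)


# pc lines copied from the two reports quoted in this thread.
reports = {
    "4.4.0-47-generic": "pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290",
    "4.8.0": "pc: c0000000003692e0: locked_inode_to_wb_and_lock_list+0x50/0x290",
}

signatures = {kernel: fault_signature(text) for kernel, text in reports.items()}
same = len(set(signatures.values())) == 1
print(signatures)
print("same fault site:", same)
```

A matching signature only shows that the fault was taken at the same instruction offset in the same function. It is a quick triage check, not proof of a common root cause.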

== Comment: #17 - Laurent Dufour - 2016-12-07 03:22:05 ==
For the record, I also hit the bug with 4.9-rc8:
4:mon> t
[c000000012a7f900] c0000000003787cc writeback_sb_inodes+0x30c/0x590
[c000000012a7fa10] c000000000378b34 __writeback_inodes_wb+0xe4/0x150
[c000000012a7fa70] c000000000378f9c wb_writeback+0x30c/0x450
[c000000012a7fb40] c000000000379df8 wb_workfn+0x268/0x580
[c000000012a7fc50] c0000000000f8c20 process_one_work+0x1e0/0x590
[c000000012a7fce0] c0000000000f9078 worker_thread+0xa8/0x650
[c000000012a7fd80] c000000000101a30 kthread+0x110/0x130
[c000000012a7fe30] c00000000000c0e8 ret_from_kernel_thread+0x5c/0x74
4:mon> e
cpu 0x4: Vector: 300 (Data Access) at [c000000012a7f620]
    pc: c000000000374f40: locked_inode_to_wb_and_lock_list+0x50/0x290
    lr: c0000000003787cc: writeback_sb_inodes+0x30c/0x590
    sp: c000000012a7f8a0
   msr: 800000010280b033
   dar: 0
 dsisr: 40000000
  current = 0xc000000011540000
  paca    = 0xc000000003482400   softe: 0   irq_happened: 0x01
    pid   = 8357, comm = kworker/u16:3
Linux version 4.9.0-rc8 (root@lucky05) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Tue Dec 6 05:17:47 CST 2016

== Comment: #24 - Thiago Jung Bauermann - 2017-01-11 16:09:45 ==
Dan Williams posted a patch series on 01/06 which aims to solve this bug:
https://www.spinics.net/lists/linux-fsdevel/msg106092.html

Unfortunately, the kernel test robot found problems with it:
http://lkml.iu.edu/hypermail/linux/kernel/1701.1/00239.html

Still, I think it's useful to perform tests to confirm that:
1. v4.10 is still affected by the problem and
2. Dan's patches fix this bug.

Therefore, could you please reproduce this bug on the unmodified v4.10-rc3 build below?
http://kernel.stglabs.ibm.com/~bauermann/bug149014/v4.10-rc3/

This will allow us to confirm point 1. Then, can you please try to reproduce it with the build below?
http://kernel.stglabs.ibm.com/~bauermann/bug149014/fix-backing_dev_info-lifetime-v2/

This one is v4.10-rc3 with Dan Williams's two patches from my link above applied to it.

== Comment: #28 - Lata Kuntal - 2017-01-16 01:34:05 ==
I am seeing the same crash on one of the UbuntuKVM 16.04.02 guests, gusg8. Pasting the console logs below:
root@guskvm:~# virsh console gusg8 --force
Connected to domain gusg8
Escape character is ^]
0:mon>
0:mon>
0:mon> t
[c00000023d1ab900] c00000000036a41c writeback_sb_inodes+0x30c/0x590
[c00000023d1aba10] c00000000036a784 __writeback_inodes_wb+0xe4/0x150
[c00000023d1aba70] c00000000036abfc wb_writeback+0x30c/0x450
[c00000023d1abb40] c00000000036ba38 wb_workfn+0x268/0x580
[c00000023d1abc50] c0000000000ef5e8 process_one_work+0x1e8/0x5b0
[c00000023d1abce0] c0000000000efa58 worker_thread+0xa8/0x650
[c00000023d1abd80] c0000000000f8224 kthread+0x114/0x140
[c00000023d1abe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c
--- Exception: 0 at 0000000000000000
0:mon>
0:mon>
0:mon> d
0000000000000000 **************** **************** | |
0:mon> r
R00 = c00000000036a41c   R16 = c00000027ca0e868
R01 = c00000023d1ab8a0   R17 = c00000027ca0e7e0
R02 = c0000000014a6600   R18 = c00000027ca0e8d0
R03 = c00000027ca0e7e0   R19 = 0000000000000000
R04 = c0000001b092e710   R20 = 0000000000000000
R05 = 0000000000000000   R21 = c00000023d1a8000
R06 = 000000027ee30000   R22 = c000000273aace50
R07 = 00001d0c11165f1a   R23 = c000000273aace30
R08 = 0000000000000000   R24 = 0000000000000000
R09 = 0000000000000000   R25 = 0000000000000000
R10 = 0000000080000000   R26 = c00000027ca0e868
R11 = c0000000014daae0   R27 = 0000000000000000
R12 = 0000000000005500   R28 = 0000000000000001
R13 = c00000000fb80000   R29 = c00000027ca0e7e0
R14 = c0000000000f8118   R30 = c00000023d1abba0
R15 = 0000000000000000   R31 = 0000000000000000
pc  = c000000000366be4 locked_inode_to_wb_and_lock_list+0x54/0x290
cfar= d000000004bbf2e4 xfs_buf_delwri_submit_buffers+0x1e4/0x2b0 [xfs]
lr  = c00000000036a41c writeback_sb_inodes+0x30c/0x590
msr = 800000010280b033   cr  = 24aa2882
ctr = c000000000122210   xer = 0000000020000000   trap = 300
dar = 0000000000000000   dsisr = 40000000
0:mon> c
cpus stopped: 0x0-0x3
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c00000023d1ab620]
    pc: c000000000366be4: locked_inode_to_wb_and_lock_list+0x54/0x290
    lr: c00000000036a41c: writeback_sb_inodes+0x30c/0x590
    sp: c00000023d1ab8a0
   msr: 800000010280b033
   dar: 0
 dsisr: 40000000
  current = 0xc0000001b092dc00
  paca    = 0xc00000000fb80000   softe: 0   irq_happened: 0x01
    pid   = 774, comm = kworker/u8:3
Linux version 4.8.0-34-generic (buildd@bos01-ppc64el-026) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #36~16.04.1-Ubuntu SMP Wed Dec 21 18:53:20 UTC 2016 (Ubuntu 4.8.0-34.36~16.04.1-generic 4.8.11)
0:mon>

== Comment: #33 - Thiago Jung Bauermann - 2017-01-23 15:31:24 ==
Lekshmi mentioned that she wasn't able to reproduce this bug with kernel 4.10.0-rc3fixlifetime+, so I replied to Dan's patch series mentioning that it fixes this bug:
https://www.spinics.net/lists/linux-fsdevel/msg106830.html

Let's see if he answers back with a status or thoughts regarding the patch series.

== Comment: #34 - LEKSHMI C. PILLAI - 2017-01-24 00:26:22 ==
Hi,

The fix worked with the 4.10.0-rc3fixlifetime+ kernel. We need to know which kernel the fix is going into, and whether we can get a workaround for 16.04.02, i.e. kernel 4.8.

Thanks
Lekshmi

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1659111/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp