Package: src:linux Version: 3.16.43-2+deb8u2 Severity: normal Dear Kernel Maintainers,
I randomly get disk lock on guest VMs on top of a ganeti cluster using KVM/DRBD. Kernel logs on the guest say: ----- task jbd2/vda1-8:107 blocked for more than 120 seconds. [1814040.120033] INFO: task jbd2/vda1-8:107 blocked for more than 120 seconds. [1814040.120798] Not tainted 3.16.0-4-amd64 #1 [1814040.121297] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1814040.122132] jbd2/vda1-8 D ffff880036931948 0 107 2 0x00000000 [1814040.122139] ffff8800369314f0 0000000000000046 0000000000012f40 ffff880139f8ffd8 [1814040.122144] 0000000000012f40 ffff8800369314f0 ffff880139f8fdc8 ffff880139f8fe60 [1814040.122148] ffff8800368240b8 ffff8800369314f0 ffff880098d5be00 ffff880139f8fdb0 [1814040.122152] Call Trace: [1814040.122171] [<ffffffffa0160fb0>] ? jbd2_journal_commit_transaction+0x260/0x1a30 [jbd2] [1814040.122179] [<ffffffff8105297b>] ? kvm_clock_read+0x1b/0x20 [1814040.122186] [<ffffffff8101cab5>] ? sched_clock+0x5/0x10 … ----- There is nothing in the logs (kernel.log, system.log,…) on the hosts. Hosts and guests both use a stable Jessie kernel. This report come from the guest having this issue. The hosts have 256G of RAM and SSDs disks in a hard RAID10 and use qemu-kvm 2.1+dfsg-12+deb8u6, drbd-utils 8.9.5-1~bpo8+1. Some people talk about decreasing vm.dirty_background_ratio and vm.dirty_ratio that could help: * https://forum.proxmox.com/threads/vm-blocked-due-to-hung_task_timeout_secs.22488/page-3#post-173136 I was thinking of upgrading the kernel to the jessie-backports version because some people solve similar issue like this: * https://phabricator.wikimedia.org/T129891#2853793 Do you think any of the solutions above could be helpful? Is there an other way I can troubleshoot this issue? Thanks in advance for your help and your time. Best regards, Thomas Pierson -- Package-specific info: ** Version: Linux version 3.16.0-4-amd64 (debian-ker...@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.43-2+deb8u2 (2017-06-26) ** Command line: BOOT_IMAGE=/boot/vmlinuz-3.16.0-4-amd64 root=UUID=e25af31e-8d5f-497e-b8d1-2e99b63bf41e ro text console=ttyS0,38400n8 quiet ** Not tainted ** Kernel log: [1814160.132439] [<ffffffff811aaf32>] ? vfs_write+0xb2/0x1f0 [1814160.132443] [<ffffffff811aba72>] ? SyS_write+0x42/0xa0 [1814160.132450] [<ffffffff8151a48d>] ? system_call_fast_compare_end+0x10/0x15 [1814160.132455] INFO: task master:684 blocked for more than 120 seconds. [1814160.133139] Not tainted 3.16.0-4-amd64 #1 [1814160.133636] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1814160.134468] master D ffff8800bb16b9c8 0 684 1 0x00000004 [1814160.134472] ffff8800bb16b570 0000000000000086 0000000000012f40 ffff8800bb243fd8 [1814160.134476] 0000000000012f40 ffff8800bb16b570 ffff880036824000 ffff880036824070 [1814160.134480] 0000000000021dad 0000000000000002 0000000000000000 ffff8801305c7358 [1814160.134484] Call Trace: [1814160.134491] [<ffffffffa015e074>] ? wait_transaction_locked+0x74/0xa0 [jbd2] [1814160.134495] [<ffffffff810a9610>] ? prepare_to_wait_event+0xf0/0xf0 [1814160.134501] [<ffffffffa015e3a1>] ? start_this_handle+0x281/0x5a0 [jbd2] [1814160.134508] [<ffffffffa015e8b4>] ? jbd2__journal_start+0xf4/0x200 [jbd2] [1814160.134518] [<ffffffffa0182e05>] ? ext4_dirty_inode+0x25/0x60 [ext4] [1814160.134522] [<ffffffff811d21e1>] ? __mark_inode_dirty+0x171/0x2b0 [1814160.134526] [<ffffffff811c47b1>] ? update_time+0x81/0xc0 [1814160.134531] [<ffffffff8106c5e2>] ? current_fs_time+0x12/0x60 [1814160.134534] [<ffffffff811c49b0>] ? file_update_time+0x80/0xd0 [1814160.134540] [<ffffffff811b2dc0>] ? pipe_write+0x3e0/0x450 [1814160.134545] [<ffffffff811aa7f4>] ? new_sync_write+0x74/0xa0 [1814160.134549] [<ffffffff811aaf32>] ? vfs_write+0xb2/0x1f0 [1814160.134552] [<ffffffff811aba72>] ? SyS_write+0x42/0xa0 [1814160.134557] [<ffffffff811bf6ed>] ? SyS_poll+0x5d/0xf0 [1814160.134562] [<ffffffff8151a48d>] ? system_call_fast_compare_end+0x10/0x15 [1814160.134566] INFO: task pickup:25876 blocked for more than 120 seconds. [1814160.135273] Not tainted 3.16.0-4-amd64 #1 [1814160.135770] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1814160.136615] pickup D ffff880036f25908 0 25876 684 0x00000000 [1814160.136619] ffff880036f254b0 0000000000000086 0000000000012f40 ffff8800baf2bfd8 [1814160.136623] 0000000000012f40 ffff880036f254b0 ffff880036824000 ffff880036824070 [1814160.136633] 0000000000021dad 0000000000000002 0000000000000000 ffff8801305c7058 [1814160.136663] Call Trace: [1814160.136676] [<ffffffffa015e074>] ? wait_transaction_locked+0x74/0xa0 [jbd2] [1814160.136681] [<ffffffff810a9610>] ? prepare_to_wait_event+0xf0/0xf0 [1814160.136687] [<ffffffffa015e3a1>] ? start_this_handle+0x281/0x5a0 [jbd2] [1814160.136694] [<ffffffffa015e8b4>] ? jbd2__journal_start+0xf4/0x200 [jbd2] [1814160.136703] [<ffffffffa0182e05>] ? ext4_dirty_inode+0x25/0x60 [ext4] [1814160.136708] [<ffffffff811d21e1>] ? __mark_inode_dirty+0x171/0x2b0 [1814160.136712] [<ffffffff811c47b1>] ? update_time+0x81/0xc0 [1814160.136716] [<ffffffff811c48e9>] ? touch_atime+0xf9/0x140 [1814160.136732] [<ffffffff811b30b0>] ? pipe_read+0x280/0x2f0 [1814160.136738] [<ffffffff811aa751>] ? new_sync_read+0x71/0xa0 [1814160.136742] [<ffffffff811aada3>] ? vfs_read+0x93/0x170 [1814160.136750] [<ffffffff811ab9d2>] ? SyS_read+0x42/0xa0 [1814160.136755] [<ffffffff8151a48d>] ? system_call_fast_compare_end+0x10/0x15 [1814160.136762] INFO: task puppet:28066 blocked for more than 120 seconds. [1814160.137464] Not tainted 3.16.0-4-amd64 #1 [1814160.137960] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1814160.138798] puppet D ffff8800bb16a728 0 28066 886 0x00000000 [1814160.138802] ffff8800bb16a2d0 0000000000000082 0000000000012f40 ffff8800baf8ffd8 [1814160.138806] 0000000000012f40 ffff8800bb16a2d0 ffff88013fd137f0 ffff88013ffc3a10 [1814160.138809] ffff8800baf8fac0 0000000000000002 ffffffff811da390 ffff8800bbbd67e0 [1814160.138813] Call Trace: [1814160.138818] [<ffffffff811da390>] ? generic_block_bmap+0x50/0x50 [1814160.138822] [<ffffffff81516f49>] ? io_schedule+0x99/0x120 [1814160.138826] [<ffffffff811da39a>] ? sleep_on_buffer+0xa/0x10 [1814160.138831] [<ffffffff815173d1>] ? __wait_on_bit_lock+0x41/0xa0 [1814160.138834] [<ffffffff811da390>] ? generic_block_bmap+0x50/0x50 [1814160.138839] [<ffffffff815174a7>] ? out_of_line_wait_on_bit_lock+0x77/0x90 [1814160.138843] [<ffffffff810a9640>] ? autoremove_wake_function+0x30/0x30 [1814160.138851] [<ffffffffa015fa10>] ? do_get_write_access+0x250/0x4f0 [jbd2] [1814160.138858] [<ffffffffa015fcd2>] ? jbd2_journal_get_write_access+0x22/0x40 [jbd2] [1814160.138870] [<ffffffffa01a9fa6>] ? __ext4_journal_get_write_access+0x36/0x80 [ext4] [1814160.138879] [<ffffffffa0178fe6>] ? __ext4_new_inode+0x436/0x1350 [ext4] [1814160.138891] [<ffffffffa0189a24>] ? ext4_create+0xc4/0x160 [ext4] [1814160.138895] [<ffffffff811b8e54>] ? vfs_create+0xc4/0x130 [1814160.138899] [<ffffffff811b9f98>] ? do_last+0x10d8/0x1330 [1814160.138902] [<ffffffff811ba2ab>] ? path_openat+0xbb/0x680 [1814160.138908] [<ffffffff8116c30d>] ? handle_mm_fault+0x84d/0x1150 [1814160.138912] [<ffffffff811bb01a>] ? do_filp_open+0x3a/0x90 [1814160.138916] [<ffffffff811c722c>] ? __alloc_fd+0x7c/0x120 [1814160.138922] [<ffffffff811aa169>] ? do_sys_open+0x129/0x220 [1814160.138927] [<ffffffff8151a48d>] ? system_call_fast_compare_end+0x10/0x15 [1814160.138930] INFO: task cron:28670 blocked for more than 120 seconds. [1814160.139613] Not tainted 3.16.0-4-amd64 #1 [1814160.140122] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1814160.140955] cron D ffff880139d347a8 0 28670 343 0x00000000 [1814160.140959] ffff880139d34350 0000000000000082 0000000000012f40 ffff88013a20ffd8 [1814160.140962] 0000000000012f40 ffff880139d34350 ffff88013fc137f0 ffff88013ffcf830 [1814160.140966] 0000000000000002 ffffffff811da390 ffff88013a20fba0 0000000000000000 [1814160.140972] Call Trace: [1814160.140977] [<ffffffff811da390>] ? generic_block_bmap+0x50/0x50 [1814160.140981] [<ffffffff81516f49>] ? io_schedule+0x99/0x120 [1814160.140985] [<ffffffff811da39a>] ? sleep_on_buffer+0xa/0x10 [1814160.140989] [<ffffffff815172cc>] ? __wait_on_bit+0x5c/0x90 [1814160.140992] [<ffffffff811da390>] ? generic_block_bmap+0x50/0x50 [1814160.140997] [<ffffffff81517377>] ? out_of_line_wait_on_bit+0x77/0x90 [1814160.141001] [<ffffffff810a9640>] ? autoremove_wake_function+0x30/0x30 [1814160.141013] [<ffffffffa0187ac7>] ? ext4_find_entry+0x117/0x4d0 [ext4] [1814160.141023] [<ffffffffa0188371>] ? ext4_lookup+0x41/0x140 [ext4] [1814160.141029] [<ffffffff811b3e39>] ? lookup_real+0x19/0x50 [1814160.141032] [<ffffffff811b9920>] ? do_last+0xa60/0x1330 [1814160.141036] [<ffffffff811ba2ab>] ? path_openat+0xbb/0x680 [1814160.141042] [<ffffffff8116c0fc>] ? handle_mm_fault+0x63c/0x1150 [1814160.141046] [<ffffffff811bb01a>] ? do_filp_open+0x3a/0x90 [1814160.141050] [<ffffffff811c722c>] ? __alloc_fd+0x7c/0x120 [1814160.141055] [<ffffffff811aa169>] ? do_sys_open+0x129/0x220 [1814160.141061] [<ffffffff8151a48d>] ? system_call_fast_compare_end+0x10/0x15 ** Model information sys_vendor: QEMU product_name: Standard PC (i440FX + PIIX, 1996) product_version: pc-i440fx-2.1 chassis_vendor: QEMU chassis_version: pc-i440fx-2.1 bios_vendor: SeaBIOS bios_version: 1.7.5-20140531_083030-gandalf ** Loaded modules: xt_multiport ip_vs libcrc32c crc32c_generic binfmt_misc nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ppdev evdev serio_raw pcspkr virtio_balloon pvpanic parport_pc parport ttm drm_kms_helper processor thermal_sys drm button autofs4 ext4 crc16 mbcache jbd2 sg sr_mod cdrom ata_generic virtio_blk virtio_net floppy ata_piix libata virtio_pci virtio_ring scsi_mod uhci_hcd ehci_hcd i2c_piix4 i2c_core virtio psmouse usbcore usb_common ** PCI devices: 00:00.0 Host bridge [0600]: Intel Corporation 440FX - 82441FX PMC [Natoma] [8086:1237] (rev 02) Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100] Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:01.0 ISA bridge [0601]: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] [8086:7000] Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100] Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:01.1 IDE interface [0101]: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] [8086:7010] (prog-if 80 [Master]) Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8] Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8] Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) Region 4: I/O ports at c0c0 [size=16] Kernel driver in use: ata_piix 00:01.2 USB controller [0c03]: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] [8086:7020] (rev 01) (prog-if 00 [UHCI]) Subsystem: Red Hat, Inc QEMU Virtual Machine [1af4:1100] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin D routed to IRQ 11 Region 4: I/O ports at c040 [size=32] Kernel driver in use: uhci_hcd 00:01.3 Bridge [0680]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI [8086:7113] (rev 03) Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100] Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Interrupt: pin A routed to IRQ 9 Kernel driver in use: piix4_smbus 00:02.0 VGA compatible controller [0300]: Cirrus Logic GD 5446 [1013:00b8] (prog-if 00 [VGA controller]) Subsystem: Red Hat, Inc QEMU Virtual Machine [1af4:1100] Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M] Region 1: Memory at febd0000 (32-bit, non-prefetchable) [size=4K] Expansion ROM at febc0000 [disabled] [size=64K] 00:03.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon [1af4:1002] Subsystem: Red Hat, Inc Device [1af4:0005] Physical Slot: 3 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 10 Region 0: I/O ports at c060 [size=32] Kernel driver in use: virtio-pci 00:04.0 SCSI storage controller [0100]: Red Hat, Inc Virtio block device [1af4:1001] Subsystem: Red Hat, Inc Device [1af4:0002] Physical Slot: 4 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 11 Region 0: I/O ports at c000 [size=64] Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K] Capabilities: <access denied> Kernel driver in use: virtio-pci 00:05.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000] Subsystem: Red Hat, Inc Device [1af4:0001] Physical Slot: 5 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 10 Region 0: I/O ports at c080 [size=32] Region 1: Memory at febd2000 (32-bit, non-prefetchable) [size=4K] Expansion ROM at feb40000 [disabled] [size=256K] Capabilities: <access denied> Kernel driver in use: virtio-pci 00:06.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000] Subsystem: Red Hat, Inc Device [1af4:0001] Physical Slot: 6 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 11 Region 0: I/O ports at c0a0 [size=32] Region 1: Memory at febd3000 (32-bit, non-prefetchable) [size=4K] Expansion ROM at feb80000 [disabled] [size=256K] Capabilities: <access denied> Kernel driver in use: virtio-pci ** USB devices: not available -- System Information: Debian Release: 8.8 APT prefers oldstable-updates APT policy: (500, 'oldstable-updates'), (500, 'oldstable') Architecture: amd64 (x86_64) Kernel: Linux 3.16.0-4-amd64 (SMP w/2 CPU cores) Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system) Versions of packages linux-image-3.16.0-4-amd64 depends on: ii debconf [debconf-2.0] 1.5.56 ii initramfs-tools [linux-initramfs-tool] 0.120+deb8u3 ii kmod 18-3 ii linux-base 3.5 Versions of packages linux-image-3.16.0-4-amd64 recommends: pn firmware-linux-free <none> pn irqbalance <none> Versions of packages linux-image-3.16.0-4-amd64 suggests: pn debian-kernel-handbook <none> ii grub-pc 2.02~beta2-22+deb8u1 pn linux-doc-3.16 <none> Versions of packages linux-image-3.16.0-4-amd64 is related to: pn firmware-atheros <none> pn firmware-bnx2 <none> pn firmware-bnx2x <none> pn firmware-brcm80211 <none> pn firmware-intelwimax <none> pn firmware-ipw2x00 <none> pn firmware-ivtv <none> pn firmware-iwlwifi <none> pn firmware-libertas <none> pn firmware-linux <none> pn firmware-linux-nonfree <none> pn firmware-myricom <none> pn firmware-netxen <none> pn firmware-qlogic <none> pn firmware-ralink <none> pn firmware-realtek <none> pn xen-hypervisor <none> -- debconf information: linux-image-3.16.0-4-amd64/postinst/depmod-error-initrd-3.16.0-4-amd64: false linux-image-3.16.0-4-amd64/prerm/removing-running-kernel-3.16.0-4-amd64: true linux-image-3.16.0-4-amd64/postinst/mips-initrd-3.16.0-4-amd64: