[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Changed in: linux (Ubuntu)
       Status: Fix Committed => Fix Released

** Changed in: ubuntu-z-systems
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1670634

Title:
  blk-mq: possible deadlock on CPU hot(un)plug

Status in Ubuntu on IBM z Systems:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released

Bug description:
  == Comment: #0 - Carsten Jacobi - 2017-03-07 03:35:31 ==
  I'm evaluating Ubuntu-Xenial on z for development purposes. The test
  system is installed in an LPAR with one FCP-LUN which is accessible via
  4 paths (all paths are configured).
  The system hangs regularly when I build packages with "pdebuild" using
  the pbuilder packaging suite.
  The local Linux development team helped me out with a pre-analysis that
  I can post here (thanks a lot for that):

  With the default settings and under a certain workload,
  blk_mq seems to get into a presumed "deadlock".
  Possibly this happens on CPU hot(un)plug.

  After the I/O stalled, a dump was pulled manually.
  The following information is from the crash dump pre-analysis.

  $ zgetdump -i dump.0
  General dump info:
    Dump format: elf
    Version: 1
    UTS node name..: mclint
    UTS kernel release.: 4.4.0-65-generic
    UTS kernel version.: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
    System arch: s390x (64 bit)
    CPU count (online).: 2
    Dump memory range..: 8192 MB

  Memory map:
    - 0001b831afff (7043 MB)
    0001b831b000 - 0001 (1149 MB)

  Things look similar with HWE kernel ubuntu16.04-4.8.0-34.36~16.04.1.

  KERNEL: vmlinux.full
  DUMPFILE: dump.0
  CPUS: 2
  DATE: Fri Mar 3 14:31:07 2017
  UPTIME: 02:11:20
  LOAD AVERAGE: 13.00, 12.92, 11.37
  TASKS: 411
  NODENAME: mclint
  RELEASE: 4.4.0-65-generic
  VERSION: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
  MACHINE: s390x (unknown Mhz)
  MEMORY: 7.8 GB
  PANIC: ""
  PID: 0
  COMMAND: "swapper/0"
  TASK: bad528 (1 of 2) [THREAD_INFO: b78000]
  CPU: 0
  STATE: TASK_RUNNING (ACTIVE)
  INFO: no panic task found

  crash> dev -d
  MAJOR GENDISK NAME REQUEST_QUEUE TOTAL ASYNC SYNC DRV
  ...
  8 1e1d6d800 sda 1e1d51210 0 23151 4294944145 N/A(MQ)
  8 1e4e06800 sdc 2081b180 23148 4294944148 N/A(MQ)
  8 1f07800 sdb 20c75680 23195 4294944101 N/A(MQ)
  8 1e4e06000 sdd 1e4e31210 0 23099 4294944197 N/A(MQ)
  252 1e1d6c800 dm-0 1e1d51b18 9 1 8 N/A(MQ)
  ...

  So both dm-mpath and sd have requests pending in their block multiqueue.
  The large numbers for sd look strange and seem to be the unsigned
  formatting of the values shown for async multiplied by -1.

  [0.798256] Linux version 4.4.0-65-generic (buildd@z13-011) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017 (Ubuntu 4.4.0-65.86-generic 4.4.49)
  [0.798262] setup: Linux is running natively in 64-bit mode
  [0.798290] setup: Max memory size: 8192MB
  [0.798298] setup: Reserving 196MB of memory at 7996MB for crashkernel (System RAM: 7996MB)
  [0.836923] Kernel command line: root=/dev/mapper/mclint_vg-root rootflags=subvol=@ crashkernel=196M BOOT_IMAGE=0
  [ 5281.179428] INFO: task xfsaild/dm-11:1604 blocked for more than 120 seconds.
  [ 5281.179437] Not tainted 4.4.0-65-generic #86-Ubuntu
  [ 5281.179438] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 5281.179440] xfsaild/dm-11 D 007bcf52 0 1604 2 0x
  [ 5281.179444] 0001e931c230 001a6964 0001e6f9b958 0001e6f9b9d8
         0001e15795f0 0001e6f9b988 00ce8c00 0001ea805c70
         0001ea805c00 00ba5ed0 0001e931c1d0 0001e1579b20
         0001ea805c00 0001e15795f0 0001ea805c00 007d3978
         007bc9f8 0001e6f9b9d8 0001e6f9ba40
  [ 5281.179454] Call Trace:
  [ 5281.179461] ([<007bc9f8>] __schedule+0x300/0x810)
  [ 5281.179462] [<007bcf52>] schedule+0x4a/0xb0
  [ 5281.179465] [<007c02aa>] schedule_timeout+0x232/0x2a8
  [ 5281.179466] [<007bde50>] wait_for_common+0x110/0x1c8
  [ 5281.179472] [<0017b602>] flush_work+0x42/0x58
  [ 5281.179564] [<03ff805e14ba>] xlog_cil_force_lsn+0x7a/0x238 [xfs]
  [ 5281.179589] [<03ff805dee82>] _xfs_log_force+0x9a/0x2e8 [xfs]
  [ 5281.179615] [<03ff805df114>] xfs_log_force+0x44/0x100 [xfs]
  [ 5281.179640]
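The remark in the bug description that the large SYNC values look like "the unsigned formatting of the values shown for async multiplied by -1" can be checked with a little arithmetic. The following is only a sketch: it assumes the counter is printed as an unsigned 32-bit value (the report does not state the width), and the helper names are made up for illustration. The numbers are copied from the `dev -d` output above.

```python
def as_unsigned32(value: int) -> int:
    """Reinterpret a (possibly negative) integer as an unsigned 32-bit value.

    Assumption: the SYNC column is formatted from a 32-bit quantity.
    """
    return value & 0xFFFFFFFF


# (device, ASYNC, SYNC) as shown in the `dev -d` output of the bug report.
ROWS = [
    ("sda", 23151, 4294944145),
    ("sdc", 23148, 4294944148),
    ("sdb", 23195, 4294944101),
    ("sdd", 23099, 4294944197),
]


def check_rows(rows):
    """Return, per device, whether SYNC equals -ASYNC printed as uint32."""
    return {
        name: as_unsigned32(-async_count) == sync_count
        for name, async_count, sync_count in rows
    }


if __name__ == "__main__":
    for name, matches in check_rows(ROWS).items():
        print(f"{name}: SYNC == uint32(-ASYNC) -> {matches}")
```

If the 32-bit assumption holds, every sd row satisfies SYNC = 2^32 - ASYNC, which is consistent with a negative counter being printed as unsigned rather than with four billion requests actually being in flight.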
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
This bug was fixed in the package linux - 4.4.0-97.120

---
linux (4.4.0-97.120) xenial; urgency=low

  * linux: 4.4.0-97.120 -proposed tracker (LP: #1718149)

  * blk-mq: possible deadlock on CPU hot(un)plug (LP: #1670634)
    - [Config] s390x -- disable CONFIG_{DM, SCSI}_MQ_DEFAULT

  * Xenial update to 4.4.87 stable release (LP: #1715678)
    - irqchip: mips-gic: SYNC after enabling GIC region
    - i2c: ismt: Don't duplicate the receive length for block reads
    - i2c: ismt: Return EMSGSIZE for block reads with bogus length
    - ceph: fix readpage from fscache
    - cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs
    - cpuset: Fix incorrect memory_pressure control file mapping
    - alpha: uapi: Add support for __SANE_USERSPACE_TYPES__
    - CIFS: remove endian related sparse warning
    - wl1251: add a missing spin_lock_init()
    - xfrm: policy: check policy direction value
    - drm/ttm: Fix accounting error when fail to get pages for pool
    - kvm: arm/arm64: Fix race in resetting stage2 PGD
    - kvm: arm/arm64: Force reading uncached stage2 PGD
    - epoll: fix race between ep_poll_callback(POLLFREE) and
      ep_free()/ep_remove()
    - crypto: algif_skcipher - only call put_page on referenced and used pages
    - Linux 4.4.87

  * Xenial update to 4.4.86 stable release (LP: #1715430)
    - scsi: isci: avoid array subscript warning
    - ALSA: au88x0: Fix zero clear of stream->resources
    - btrfs: remove duplicate const specifier
    - i2c: jz4780: drop superfluous init
    - gcov: add support for gcc version >= 6
    - gcov: support GCC 7.1
    - lightnvm: initialize ppa_addr in dev_to_generic_addr()
    - p54: memset(0) whole array
    - lpfc: Fix Device discovery failures during switch reboot test.
    - arm64: mm: abort uaccess retries upon fatal signal
    - x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
    - arm64: fpsimd: Prevent registers leaking across exec
    - scsi: sg: protect accesses to 'reserved' page array
    - scsi: sg: reset 'res_in_use' after unlinking reserved array
    - drm/i915: fix compiler warning in drivers/gpu/drm/i915/intel_uncore.c
    - Linux 4.4.86

  * Xenial update to 4.4.85 stable release (LP: #1714298)
    - af_key: do not use GFP_KERNEL in atomic contexts
    - dccp: purge write queue in dccp_destroy_sock()
    - dccp: defer ccid_hc_tx_delete() at dismantle time
    - ipv4: fix NULL dereference in free_fib_info_rcu()
    - net_sched/sfq: update hierarchical backlog when drop packet
    - ipv4: better IP_MAX_MTU enforcement
    - sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
    - tipc: fix use-after-free
    - ipv6: reset fn->rr_ptr when replacing route
    - ipv6: repair fib6 tree in failure case
    - tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
    - irda: do not leak initialized list.dev to userspace
    - net: sched: fix NULL pointer dereference when action calls some targets
    - net_sched: fix order of queue length updates in qdisc_replace()
    - mei: me: add broxton pci device ids
    - mei: me: add lewisburg device ids
    - Input: trackpoint - add new trackpoint firmware ID
    - Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310
    - ALSA: core: Fix unexpected error at replacing user TLV
    - ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)
    - ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
    - i2c: designware: Fix system suspend
    - drm: Release driver tracking before making the object available again
    - drm/atomic: If the atomic check fails, return its value first
    - drm: rcar-du: lvds: Fix PLL frequency-related configuration
    - drm: rcar-du: lvds: Rename PLLEN bit to PLLON
    - drm: rcar-du: Fix crash in encoder failure error path
    - drm: rcar-du: Fix display timing controller parameter
    - drm: rcar-du: Fix H/V sync signal polarity configuration
    - tracing: Fix freeing of filter in create_filter() when set_str is false
    - cifs: Fix df output for users with quota limits
    - cifs: return ENAMETOOLONG for overlong names in
      cifs_open()/cifs_lookup()
    - nfsd: Limit end of page list when decoding NFSv4 WRITE
    - perf/core: Fix group {cpu,task} validation
    - Bluetooth: hidp: fix possible might sleep error in hidp_session_thread
    - Bluetooth: cmtp: fix possible might sleep error in cmtp_session
    - Bluetooth: bnep: fix possible might sleep error in bnep_session
    - binder: use group leader instead of open thread
    - binder: Use wake up hint for synchronous transactions.
    - ANDROID: binder: fix proc->tsk check.
    - iio: imu: adis16480: Fix acceleration scale factor for adis16480
    - iio: hid-sensor-trigger: Fix the race with user space powering up
      sensors
    - staging: rtl8188eu: add RNX-N150NUB support
    - ASoC: simple-card: don't fail if sysclk setting is not supported
    - ASoC: rsnd: disable SRC.out only when stop timing
    - ASoC: rsnd: avoid
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
Hi @jacobi,

Thank you very much for verifying the fix!

Kleber

** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-xenial' to 'verification-done-xenial'. If the
problem still exists, change the tag 'verification-needed-xenial' to
'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!

** Tags added: verification-needed-xenial
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Changed in: linux (Ubuntu)
       Status: Triaged => Fix Committed

** Changed in: ubuntu-z-systems
       Status: Triaged => Fix Committed
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Changed in: linux (Ubuntu Xenial)
       Status: New => Fix Committed
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
For the record, the following commit fixes the issue for later kernels
(>4.11). What we're seeing with the 4.4 kernel is most likely a
different issue though.

commit ba74b6f7fcc07355d087af6939712eed4a454821 (refs/bisect/new)
Author: Christoph Hellwig
Date:   Thu Aug 24 18:07:02 2017 +0200

    virtio_pci: fix cpu affinity support

    Commit 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for
    virtqueues"") removed the adjustment of the pre_vectors for the virtio
    MSI-X vector allocation which was added in commit fb5e31d9 ("virtio:
    allow drivers to request IRQ affinity when creating VQs"). This will
    lead to an incorrect assignment of MSI-X vectors, and potential
    deadlocks when offlining cpus.

    Signed-off-by: Christoph Hellwig
    Fixes: 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for virtqueues")
    Reported-by: YASUAKI ISHIMATSU
    Cc: sta...@vger.kernel.org
    Signed-off-by: Michael S. Tsirkin

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 007a4f366086..1c4797e53f68 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -107,6 +107,7 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 	const char *name = dev_name(&vp_dev->vdev.dev);
+	unsigned flags = PCI_IRQ_MSIX;
 	unsigned i, v;
 	int err = -ENOMEM;
 
@@ -126,10 +127,13 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 					GFP_KERNEL))
 		goto error;
 
+	if (desc) {
+		flags |= PCI_IRQ_AFFINITY;
+		desc->pre_vectors++; /* virtio config vector */
+	}
+
 	err = pci_alloc_irq_vectors_affinity(vp_dev->pci_dev, nvectors,
-					     nvectors, PCI_IRQ_MSIX |
-					     (desc ? PCI_IRQ_AFFINITY : 0),
-					     desc);
+					     nvectors, flags, desc);
 	if (err < 0)
 		goto error;
 	vp_dev->msix_enabled = 1;
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
Ok, we will turn these options off for Xenial 4.4 and bring s390x in line with the other architectures. Note that we *think* that the reason that they're enabled for s390x is that we initially received a config file from IBM that we used as a base.
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Tags removed: kernel-key
** Tags added: kernel-da-key
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
The difference between the Xenial and SUSE kernel is that Xenial has:

  CONFIG_SCSI_MQ_DEFAULT=y
  CONFIG_DM_MQ_DEFAULT=y

but SUSE:

  # CONFIG_SCSI_MQ_DEFAULT is not set
  # CONFIG_DM_MQ_DEFAULT is not set

If I disable blk-mq in the Xenial kernel, the test passes. The easiest 'fix' would be to simply disable blk-mq. This can be accomplished via the kernel command-line parameters:

  scsi_mod.use_blk_mq=0 dm_mod.use_blk_mq=0

I also noticed that s390x is the only architecture where these options are enabled in the Xenial kernel. Is there a specific requirement for this?
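To make the comparison above concrete, here is a minimal sketch (not part of the original comment) that checks a kernel config text for the two options and prints the matching workaround parameters. The function names and the two sample config strings are made up for illustration; only the option names and the two command-line parameters come from the comment itself.

```python
import re

# Option name -> workaround parameter, both taken from the comment above.
WORKAROUNDS = {
    "CONFIG_SCSI_MQ_DEFAULT": "scsi_mod.use_blk_mq=0",
    "CONFIG_DM_MQ_DEFAULT": "dm_mod.use_blk_mq=0",
}


def mq_defaults(config_text):
    """Return {option: True/False} for the two blk-mq default options.

    "CONFIG_X=y" counts as enabled; "# CONFIG_X is not set" or a missing
    line counts as disabled.
    """
    return {
        opt: re.search(rf"^{opt}=y\s*$", config_text, re.MULTILINE) is not None
        for opt in WORKAROUNDS
    }


def workaround_params(defaults):
    """Build the command-line string for every option that defaults to on."""
    return " ".join(p for opt, p in WORKAROUNDS.items() if defaults.get(opt))


# Hypothetical config excerpts mirroring the two kernels compared above.
xenial = "CONFIG_SCSI_MQ_DEFAULT=y\nCONFIG_DM_MQ_DEFAULT=y\n"
suse = "# CONFIG_SCSI_MQ_DEFAULT is not set\n# CONFIG_DM_MQ_DEFAULT is not set\n"

print("Xenial:", workaround_params(mq_defaults(xenial)) or "(nothing to disable)")
print("SUSE:  ", workaround_params(mq_defaults(suse)) or "(nothing to disable)")
```

On a real system the config text would normally come from the file shipped with the kernel package (for example under /boot); that location is an assumption and is not stated in this thread.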
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
Test passes with 4.13 and 4.12 and SUSE's SLE12 SP2 4.4.21-69-generic kernel. Test fails with 4.11.

** Changed in: linux (Ubuntu)
   Assignee: (unassigned) => Juerg Haefliger (juergh)
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
I'm able to reproduce the issue locally.
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
Are you running specific CPU (un)plug tests in parallel with pdebuild? Can you post the contents of /etc/cpuplugd.conf?
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
Sorry for the late reply, I'm just getting around to looking at this. Yes, I do have access to the folder now, thanks!

Questions: Is this setup currently working with another distribution, so that you're only experiencing issues when running Ubuntu? If so, what's that other distro and kernel? Also, can you give me a high-level overview of your storage architecture?
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
I've tried to take a look at the dump files, but the BOX link referenced in comment #4 doesn't work for me. Either I don't have access or it has been removed.
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
The purpose of testing v4.12-rc7 is to narrow down the last kernel version that had the hang bug (the bad kernel) and the first kernel version that did not (the good kernel). This will allow us to identify the exact commit that fixes the hang bug, which can be accomplished by performing a "Reverse" bisect[0]. Once we know the commit that fixes the bug, we can SRU it to all the previous Ubuntu releases.

Are you not able to test for the hang bug without compiling the DKMS-OpenAFS packages? If so, did they compile okay when you tested v4.12-rc8?

[0] https://wiki.ubuntu.com/Kernel/KernelBisection
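As a rough illustration of the "Reverse" bisect idea described in this comment, the following minimal Python sketch searches an ordered list of kernel versions for the last bad and first good one. It is only a sketch under stated assumptions: the function name `reverse_bisect`, the version list, and the pretended test outcomes are all hypothetical and do not reflect actual test results from this bug. It also assumes that the oldest version is known bad, the newest is known good, and that the fix stays in every later version once merged.

```python
from typing import Callable, Sequence, Tuple


def reverse_bisect(versions: Sequence[str],
                   is_fixed: Callable[[str], bool]) -> Tuple[str, str]:
    """Return (last_bad, first_good) for versions ordered oldest to newest.

    Assumes versions[0] is known bad, versions[-1] is known good, and
    that every version after the first good one is good as well.
    """
    if len(versions) < 2:
        raise ValueError("need at least one bad and one good version")
    lo, hi = 0, len(versions) - 1  # lo: known bad, hi: known good
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(versions[mid]):
            hi = mid  # the fix is at mid or earlier
        else:
            lo = mid  # the fix landed after mid
    return versions[lo], versions[hi]


# Hypothetical candidate list and outcome, for illustration only.
candidates = ["v4.4", "v4.8", "v4.10", "v4.11-rc7",
              "v4.11-rc8", "v4.12-rc7", "v4.12-rc8"]
pretend_first_good = "v4.12-rc7"  # assumption, not a reported result


def is_fixed(version: str) -> bool:
    # Stand-in for booting the kernel and running the hang reproducer.
    return candidates.index(version) >= candidates.index(pretend_first_good)


last_bad, first_good = reverse_bisect(candidates, is_fixed)
print(last_bad, first_good)
```

In practice each `is_fixed` call corresponds to installing a kernel build and running the workload that triggers the hang, so halving the range at every step keeps the number of test boots small. Once the version range is narrowed, the same search can be repeated over the commits between the two versions.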
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
Thanks for the update. Can you test v4.11-rc7? It is available from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc7/

We can perform a "Reverse" kernel bisect if we can identify the last bad kernel version and the first good one.
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
@jac...@de.ibm.com Can you confirm whether v4.11-rc8 fixed the bug? Per comment #24 it is unclear whether it does.

If we find that it is fixed in the v4.11-rc8 kernel, or any other newer kernel, we can perform a "Reverse" bisect to identify the commit that fixes the bug, then backport it to prior releases.
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Tags removed: kernel-da-key ** Tags added: kernel-key
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
I copied over the latest 4_11_0-041100rc8 dump from IBM Box to our Canonical private file share into my home: ~fheimes/mclint_20170607_kernel_4_11_0-041100rc8_without_openafs.dump.bz2
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Changed in: ubuntu-z-systems Status: New => Triaged
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Tags removed: kernel-key ** Tags added: kernel-da-key
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
@jac...@de.ibm.com It would be good to know if this bug is already fixed in the mainline kernel. Would it be possible for you to test 4.11-rc8? It can be downloaded from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc8/
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
Is there an update on this bug?
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Changed in: linux (Ubuntu) Status: New => Triaged
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
I copied over the latest 4.4-72 dump from IBM Box to our Canonical private file share into my home: /~fheimes/mclint_20170406_kernel_4_4_0-72_without_openafs.dump.bz2 also reachable via https ...
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Changed in: linux (Ubuntu) Importance: Undecided => High ** Tags added: kernel-key
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
An x86-64 fixdep is an artifact of cross compiling.
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
Carsten, I am currently thinking there are two possibilities here. Maybe three.

1) The fix I submitted is not in the kernel(s) you are running.
2) The s390 compiler does not produce the necessary code to implicitly convert long int to bool.
3) You are hitting a different bug that just happens to look the same.

For #2, a simple compiler test could be done to check what code is produced when assigning a long int value to a bool (GCC _Bool). If you want to pursue that, let me know. I am not familiar with s390 object code, so we might need someone to interpret the objdump. As far as I can tell, the s390 Linux kernel does use GCC _Bool for the data type "bool", so it would then be a matter of what code the s390 GCC produces in this case.
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
I'm confused about the release mechanics, I guess. Looking at the git repository, I see tag "Ubuntu-4.4.0-65.86" (for example) and that tag commit does contain the fix. Is it then possible for a kernel labeled "4.4.0-65-generic #86" to NOT contain that patch? Am I making a gross assumption that these tags reflect what was released?
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
Due to an emergency CVE rebase, that patch still hasn't made it into the wild. Here is a test kernel that definitely has the patch: http://kernel.ubuntu.com/~rtg/lp1670634/
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
Douglas - The patch referred to in LP #1662673 ("percpu-refcount: fix reference leak during percpu-atomic transition") is in Ubuntu-4.4.0-65.86, which has yet to be released.
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
This looks a lot like the problem in LP #1662673, but that fix is supposed to be in kernel 4.4.0-65-generic. Might be worth confirming though. Or perhaps confirming that the fix actually works on the Z architecture (depends on how the architecture/compiler handles 'bool').
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Changed in: ubuntu-z-systems
   Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)

** Changed in: ubuntu-z-systems
   Importance: Undecided => High
[Kernel-packages] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Tags removed: bot-comment

** Tags added: s390x

** Also affects: ubuntu-z-systems
   Importance: Undecided
   Status: New

** Package changed: ubuntu => linux (Ubuntu)
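The bug description notes that the large SYNC values in the `dev -d` output look like the ASYNC counts multiplied by -1 and printed as unsigned numbers. That reading can be checked with a few lines of Python. This is only a sketch: the 32-bit width is an assumption inferred from the magnitude of the numbers, not something stated in the report, and the helper name `as_u32` is made up for illustration.

```python
# (ASYNC, SYNC) pairs copied from the `dev -d` output in the bug description.
pairs = {
    "sda": (23151, 4294944145),
    "sdc": (23148, 4294944148),
    "sdb": (23195, 4294944101),
    "sdd": (23099, 4294944197),
}


def as_u32(value):
    """Reinterpret a signed integer as a 32-bit unsigned value (assumed width)."""
    return value & 0xFFFFFFFF


for disk, (async_count, sync_count) in pairs.items():
    # If the hypothesis holds, -ASYNC viewed as unsigned equals the SYNC column.
    matches = as_u32(-async_count) == sync_count
    print(f"{disk}: -{async_count} as u32 = {as_u32(-async_count)}, matches SYNC: {matches}")
```

If all four rows match, the SYNC column is consistent with a counter that went negative by exactly the ASYNC amount, which fits the description's guess but does not by itself explain the stall.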