[Group.of.nepali.translators] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Changed in: linux (Ubuntu)
       Status: Fix Committed => Fix Released

** Changed in: ubuntu-z-systems
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of नेपाली भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1670634

Title:
  blk-mq: possible deadlock on CPU hot(un)plug

Status in Ubuntu on IBM z Systems:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released

Bug description:
  == Comment: #0 - Carsten Jacobi - 2017-03-07 03:35:31 ==

  I'm evaluating Ubuntu-Xenial on z for development purposes. The test system is installed in an LPAR with one FCP-LUN which is accessible by 4 paths (all paths are configured).
  The system hangs regularly when I make packages with "pdebuild" using the pbuilder packaging suite.
  The local Linux development team helped me out with a pre-analysis that I can post here (thanks a lot for that):

  With the default settings and under a certain workload, blk_mq seems to get into a presumed "deadlock".
  Possibly this happens on CPU hot(un)plug.

  After the I/O stalled, a dump was pulled manually.
  The following information is from the crash dump pre-analysis.

  $ zgetdump -i dump.0
  General dump info:
    Dump format: elf
    Version: 1
    UTS node name..: mclint
    UTS kernel release.: 4.4.0-65-generic
    UTS kernel version.: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
    System arch: s390x (64 bit)
    CPU count (online).: 2
    Dump memory range..: 8192 MB
  Memory map:
    - 0001b831afff (7043 MB)
    0001b831b000 - 0001 (1149 MB)

  Things look similar with HWE kernel ubuntu16.04-4.8.0-34.36~16.04.1.

  KERNEL: vmlinux.full
  DUMPFILE: dump.0
  CPUS: 2
  DATE: Fri Mar 3 14:31:07 2017
  UPTIME: 02:11:20
  LOAD AVERAGE: 13.00, 12.92, 11.37
  TASKS: 411
  NODENAME: mclint
  RELEASE: 4.4.0-65-generic
  VERSION: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
  MACHINE: s390x (unknown Mhz)
  MEMORY: 7.8 GB
  PANIC: ""
  PID: 0
  COMMAND: "swapper/0"
  TASK: bad528 (1 of 2) [THREAD_INFO: b78000]
  CPU: 0
  STATE: TASK_RUNNING (ACTIVE)
  INFO: no panic task found

  crash> dev -d
  MAJOR GENDISK NAME REQUEST_QUEUE TOTAL ASYNC SYNC DRV
  ...
  8 1e1d6d800 sda 1e1d51210 0 23151 4294944145 N/A(MQ)
  8 1e4e06800 sdc 2081b180 23148 4294944148 N/A(MQ)
  8 1f07800 sdb 20c75680 23195 4294944101 N/A(MQ)
  8 1e4e06000 sdd 1e4e31210 0 23099 4294944197 N/A(MQ)
  252 1e1d6c800 dm-0 1e1d51b18 9 1 8 N/A(MQ)
  ...

  So both dm-mpath and sd have requests pending in their block multiqueue.
  The large SYNC values for the sd devices look strange; they appear to be the ASYNC values multiplied by -1 and printed as unsigned numbers.

  [0.798256] Linux version 4.4.0-65-generic (buildd@z13-011) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017 (Ubuntu 4.4.0-65.86-generic 4.4.49)
  [0.798262] setup: Linux is running natively in 64-bit mode
  [0.798290] setup: Max memory size: 8192MB
  [0.798298] setup: Reserving 196MB of memory at 7996MB for crashkernel (System RAM: 7996MB)
  [0.836923] Kernel command line: root=/dev/mapper/mclint_vg-root rootflags=subvol=@ crashkernel=196M BOOT_IMAGE=0
  [ 5281.179428] INFO: task xfsaild/dm-11:1604 blocked for more than 120 seconds.
  [ 5281.179437] Not tainted 4.4.0-65-generic #86-Ubuntu
  [ 5281.179438] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 5281.179440] xfsaild/dm-11 D 007bcf52 0 1604 2 0x
  [ 5281.179444]0001e931c230 001a6964 0001e6f9b958 0001e6f9b9d8 0001e15795f0 0001e6f9b988 00ce8c00 0001ea805c70 0001ea805c00 00ba5ed0 0001e931c1d0 0001e1579b20 0001ea805c00 0001e15795f0 0001ea805c00 007d3978 007bc9f8 0001e6f9b9d8 0001e6f9ba40
  [ 5281.179454] Call Trace:
  [ 5281.179461] ([<007bc9f8>] __schedule+0x300/0x810)
  [ 5281.179462] [<007bcf52>] schedule+0x4a/0xb0
  [ 5281.179465] [<007c02aa>] schedule_timeout+0x232/0x2a8
  [ 5281.179466] [<007bde50>] wait_for_common+0x110/0x1c8
  [ 5281.179472] [<0017b602>] flush_work+0x42/0x58
  [ 5281.179564] [<03ff805e14ba>] xlog_cil_force_lsn+0x7a/0x238 [xfs]
  [ 5281.179589] [<03ff805dee82>] _xfs_log_force+
[Group.of.nepali.translators] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
This bug was fixed in the package linux - 4.4.0-97.120

---
linux (4.4.0-97.120) xenial; urgency=low

  * linux: 4.4.0-97.120 -proposed tracker (LP: #1718149)

  * blk-mq: possible deadlock on CPU hot(un)plug (LP: #1670634)
    - [Config] s390x -- disable CONFIG_{DM, SCSI}_MQ_DEFAULT

  * Xenial update to 4.4.87 stable release (LP: #1715678)
    - irqchip: mips-gic: SYNC after enabling GIC region
    - i2c: ismt: Don't duplicate the receive length for block reads
    - i2c: ismt: Return EMSGSIZE for block reads with bogus length
    - ceph: fix readpage from fscache
    - cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs
    - cpuset: Fix incorrect memory_pressure control file mapping
    - alpha: uapi: Add support for __SANE_USERSPACE_TYPES__
    - CIFS: remove endian related sparse warning
    - wl1251: add a missing spin_lock_init()
    - xfrm: policy: check policy direction value
    - drm/ttm: Fix accounting error when fail to get pages for pool
    - kvm: arm/arm64: Fix race in resetting stage2 PGD
    - kvm: arm/arm64: Force reading uncached stage2 PGD
    - epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()
    - crypto: algif_skcipher - only call put_page on referenced and used pages
    - Linux 4.4.87

  * Xenial update to 4.4.86 stable release (LP: #1715430)
    - scsi: isci: avoid array subscript warning
    - ALSA: au88x0: Fix zero clear of stream->resources
    - btrfs: remove duplicate const specifier
    - i2c: jz4780: drop superfluous init
    - gcov: add support for gcc version >= 6
    - gcov: support GCC 7.1
    - lightnvm: initialize ppa_addr in dev_to_generic_addr()
    - p54: memset(0) whole array
    - lpfc: Fix Device discovery failures during switch reboot test.
    - arm64: mm: abort uaccess retries upon fatal signal
    - x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
    - arm64: fpsimd: Prevent registers leaking across exec
    - scsi: sg: protect accesses to 'reserved' page array
    - scsi: sg: reset 'res_in_use' after unlinking reserved array
    - drm/i915: fix compiler warning in drivers/gpu/drm/i915/intel_uncore.c
    - Linux 4.4.86

  * Xenial update to 4.4.85 stable release (LP: #1714298)
    - af_key: do not use GFP_KERNEL in atomic contexts
    - dccp: purge write queue in dccp_destroy_sock()
    - dccp: defer ccid_hc_tx_delete() at dismantle time
    - ipv4: fix NULL dereference in free_fib_info_rcu()
    - net_sched/sfq: update hierarchical backlog when drop packet
    - ipv4: better IP_MAX_MTU enforcement
    - sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
    - tipc: fix use-after-free
    - ipv6: reset fn->rr_ptr when replacing route
    - ipv6: repair fib6 tree in failure case
    - tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
    - irda: do not leak initialized list.dev to userspace
    - net: sched: fix NULL pointer dereference when action calls some targets
    - net_sched: fix order of queue length updates in qdisc_replace()
    - mei: me: add broxton pci device ids
    - mei: me: add lewisburg device ids
    - Input: trackpoint - add new trackpoint firmware ID
    - Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310
    - ALSA: core: Fix unexpected error at replacing user TLV
    - ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)
    - ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
    - i2c: designware: Fix system suspend
    - drm: Release driver tracking before making the object available again
    - drm/atomic: If the atomic check fails, return its value first
    - drm: rcar-du: lvds: Fix PLL frequency-related configuration
    - drm: rcar-du: lvds: Rename PLLEN bit to PLLON
    - drm: rcar-du: Fix crash in encoder failure error path
    - drm: rcar-du: Fix display timing controller parameter
    - drm: rcar-du: Fix H/V sync signal polarity configuration
    - tracing: Fix freeing of filter in create_filter() when set_str is false
    - cifs: Fix df output for users with quota limits
    - cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()
    - nfsd: Limit end of page list when decoding NFSv4 WRITE
    - perf/core: Fix group {cpu,task} validation
    - Bluetooth: hidp: fix possible might sleep error in hidp_session_thread
    - Bluetooth: cmtp: fix possible might sleep error in cmtp_session
    - Bluetooth: bnep: fix possible might sleep error in bnep_session
    - binder: use group leader instead of open thread
    - binder: Use wake up hint for synchronous transactions.
    - ANDROID: binder: fix proc->tsk check.
    - iio: imu: adis16480: Fix acceleration scale factor for adis16480
    - iio: hid-sensor-trigger: Fix the race with user space powering up sensors
    - staging: rtl8188eu: add RNX-N150NUB support
    - ASoC: simple-card: don't fail if sysclk setting is not supported
    - ASoC: rsnd: disable SRC.out only when stop timing
    - ASoC: rsnd: avoid po
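
The changelog entry for LP: #1670634 above disables two kernel config options on s390x, CONFIG_DM_MQ_DEFAULT and CONFIG_SCSI_MQ_DEFAULT. A minimal sketch of how one might check whether a given kernel config has them disabled is shown below. The function name `mq_default_state` and the sample config fragment are hypothetical and only for illustration; the sketch assumes the usual kernel config syntax (`CONFIG_X=y` for enabled, `# CONFIG_X is not set` for disabled).

```python
import re

# Option names expanded from the changelog entry
# "disable CONFIG_{DM, SCSI}_MQ_DEFAULT".
MQ_DEFAULT_OPTIONS = ("CONFIG_DM_MQ_DEFAULT", "CONFIG_SCSI_MQ_DEFAULT")


def mq_default_state(config_text):
    """Map each option to 'y', 'n' (explicitly not set) or None (not mentioned)."""
    state = {name: None for name in MQ_DEFAULT_OPTIONS}
    for line in config_text.splitlines():
        line = line.strip()
        enabled = re.fullmatch(r"(CONFIG_\w+)=y", line)
        disabled = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if enabled and enabled.group(1) in state:
            state[enabled.group(1)] = "y"
        elif disabled and disabled.group(1) in state:
            state[disabled.group(1)] = "n"
    return state


# Hypothetical config fragment, NOT copied from the actual Ubuntu package.
sample = """\
CONFIG_BLK_DEV_DM=y
# CONFIG_DM_MQ_DEFAULT is not set
# CONFIG_SCSI_MQ_DEFAULT is not set
"""

print(mq_default_state(sample))
```

On an installed system the text would typically come from a file such as /boot/config-<kernel release>; that path is an assumption about the usual Ubuntu layout and is not stated in this thread.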
[Group.of.nepali.translators] [Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug
** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of नेपाली भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1670634

Title:
  blk-mq: possible deadlock on CPU hot(un)plug

Status in Ubuntu on IBM z Systems:
  Triaged
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  New

Bug description:
  == Comment: #0 - Carsten Jacobi - 2017-03-07 03:35:31 ==

  I'm evaluating Ubuntu-Xenial on z for development purposes. The test system is installed in an LPAR with one FCP-LUN which is accessible by 4 paths (all paths are configured).
  The system hangs regularly when I make packages with "pdebuild" using the pbuilder packaging suite.
  The local Linux development team helped me out with a pre-analysis that I can post here (thanks a lot for that):

  With the default settings and under a certain workload, blk_mq seems to get into a presumed "deadlock".
  Possibly this happens on CPU hot(un)plug.

  After the I/O stalled, a dump was pulled manually.
  The following information is from the crash dump pre-analysis.

  $ zgetdump -i dump.0
  General dump info:
    Dump format: elf
    Version: 1
    UTS node name..: mclint
    UTS kernel release.: 4.4.0-65-generic
    UTS kernel version.: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
    System arch: s390x (64 bit)
    CPU count (online).: 2
    Dump memory range..: 8192 MB
  Memory map:
    - 0001b831afff (7043 MB)
    0001b831b000 - 0001 (1149 MB)

  Things look similar with HWE kernel ubuntu16.04-4.8.0-34.36~16.04.1.

  KERNEL: vmlinux.full
  DUMPFILE: dump.0
  CPUS: 2
  DATE: Fri Mar 3 14:31:07 2017
  UPTIME: 02:11:20
  LOAD AVERAGE: 13.00, 12.92, 11.37
  TASKS: 411
  NODENAME: mclint
  RELEASE: 4.4.0-65-generic
  VERSION: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
  MACHINE: s390x (unknown Mhz)
  MEMORY: 7.8 GB
  PANIC: ""
  PID: 0
  COMMAND: "swapper/0"
  TASK: bad528 (1 of 2) [THREAD_INFO: b78000]
  CPU: 0
  STATE: TASK_RUNNING (ACTIVE)
  INFO: no panic task found

  crash> dev -d
  MAJOR GENDISK NAME REQUEST_QUEUE TOTAL ASYNC SYNC DRV
  ...
  8 1e1d6d800 sda 1e1d51210 0 23151 4294944145 N/A(MQ)
  8 1e4e06800 sdc 2081b180 23148 4294944148 N/A(MQ)
  8 1f07800 sdb 20c75680 23195 4294944101 N/A(MQ)
  8 1e4e06000 sdd 1e4e31210 0 23099 4294944197 N/A(MQ)
  252 1e1d6c800 dm-0 1e1d51b18 9 1 8 N/A(MQ)
  ...

  So both dm-mpath and sd have requests pending in their block multiqueue.
  The large SYNC values for the sd devices look strange; they appear to be the ASYNC values multiplied by -1 and printed as unsigned numbers.

  [0.798256] Linux version 4.4.0-65-generic (buildd@z13-011) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017 (Ubuntu 4.4.0-65.86-generic 4.4.49)
  [0.798262] setup: Linux is running natively in 64-bit mode
  [0.798290] setup: Max memory size: 8192MB
  [0.798298] setup: Reserving 196MB of memory at 7996MB for crashkernel (System RAM: 7996MB)
  [0.836923] Kernel command line: root=/dev/mapper/mclint_vg-root rootflags=subvol=@ crashkernel=196M BOOT_IMAGE=0
  [ 5281.179428] INFO: task xfsaild/dm-11:1604 blocked for more than 120 seconds.
  [ 5281.179437] Not tainted 4.4.0-65-generic #86-Ubuntu
  [ 5281.179438] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 5281.179440] xfsaild/dm-11 D 007bcf52 0 1604 2 0x
  [ 5281.179444]0001e931c230 001a6964 0001e6f9b958 0001e6f9b9d8 0001e15795f0 0001e6f9b988 00ce8c00 0001ea805c70 0001ea805c00 00ba5ed0 0001e931c1d0 0001e1579b20 0001ea805c00 0001e15795f0 0001ea805c00 007d3978 007bc9f8 0001e6f9b9d8 0001e6f9ba40
  [ 5281.179454] Call Trace:
  [ 5281.179461] ([<007bc9f8>] __schedule+0x300/0x810)
  [ 5281.179462] [<007bcf52>] schedule+0x4a/0xb0
  [ 5281.179465] [<007c02aa>] schedule_timeout+0x232/0x2a8
  [ 5281.179466] [<007bde50>] wait_for_common+0x110/0x1c8
  [ 5281.179472] [<0017b602>] flush_work+0x42/0x58
  [ 5281.179564] [<03ff805e14ba>] xlog_cil_force_lsn+0x7a/0x238 [xfs]
  [ 5281.179589] [<03ff805dee82>] _xfs_log_force+0x9a/0x2e8 [xfs]
  [ 5281.179615] [<03ff805df114>] xfs_log_force+0x44/0x100 [xfs]
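
The report's remark about the `dev -d` output, that the large numbers for the sd devices look like the async values multiplied by -1 and printed as unsigned, can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the counters are 32 bits wide (inferred from the size of the values, not stated in the report), and it takes the smaller number in each sd row as ASYNC and the larger one as SYNC, following the report's wording. The helper name `as_unsigned32` is made up for this example.

```python
# (ASYNC, SYNC) pairs copied from the sd rows of the `dev -d` output.
rows = {
    "sda": (23151, 4294944145),
    "sdc": (23148, 4294944148),
    "sdb": (23195, 4294944101),
    "sdd": (23099, 4294944197),
}


def as_unsigned32(value):
    """Reinterpret a (possibly negative) integer as an unsigned 32-bit value."""
    # Assumption: 32-bit counter width, inferred from the magnitudes only.
    return value & 0xFFFFFFFF


# For each device, test whether SYNC equals -ASYNC viewed as unsigned 32-bit.
matches = {
    name: as_unsigned32(-async_count) == sync_count
    for name, (async_count, sync_count) in rows.items()
}

for name, ok in matches.items():
    print(name, ok)
```

Every pair also sums to 2**32, which is another way of stating the same relationship. This is consistent with the report's reading of the numbers but says nothing about why the counters ended up in that state.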