[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Changed in: linux (Ubuntu Xenial)
       Status: Confirmed => New

** Changed in: linux (Ubuntu)
       Status: Confirmed => Fix Released

** Changed in: linux (Ubuntu Xenial)
       Status: New => Invalid

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1802021

Title:
  [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

Status in linux package in Ubuntu:
  Fix Released
Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid
Status in linux-azure source package in Xenial:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in linux-azure source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in linux-azure source package in Cosmic:
  Fix Released

Bug description:
  We had a customer seeing traces like the following:

  Stack trace from kern.log:
  2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.
  2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 4.15.0-1023-azure #24~16.04.1-Ubuntu
  2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 0x8000
  2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound fsnotify_mark_destroy_workfn
  2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace:
  2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0
  2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? check_preempt_wakeup+0xfb/0x240
  2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ? sched_clock_local+0x17/0x90
  2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80
  2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: schedule_timeout+0x1db/0x370
  2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? __enqueue_entity+0x5c/0x60
  2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? enqueue_entity+0x112/0x670
  2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: wait_for_completion+0xb4/0x140
  2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70
  2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: __synchronize_srcu.part.13+0x85/0xb0
  2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? trace_raw_output_rcu_utilization+0x50/0x50
  2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: fsnotify_mark_destroy_workfn+0x7c/0xe0
  2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: process_one_work+0x14d/0x410
  2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460
  2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140
  2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? process_one_work+0x410/0x410
  2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? kthread_destroy_worker+0x50/0x50
  2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130
  2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20
  2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40

  Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.

  We are seeing more issues with fsnotify-related callbacks. These are
  not soft/hard lockups, but they seem to significantly degrade the
  responsiveness of systemd (and from there everything else).
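When triaging many hosts it can help to pull the blocked task and the call chain out of such a kern.log excerpt automatically. The following is only a sketch: the helper name `parse_hung_task` and both regular expressions are assumptions made for illustration, written against the syslog-style lines quoted in this report. Frames prefixed with `?` are addresses the kernel's stack unwinder could not confirm, so the sketch skips them.

```python
import re

# Hypothetical patterns, tuned to the log lines quoted in this bug report.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
FRAME_RE = re.compile(
    r"kernel:\s+(?P<guess>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_task(lines):
    """Return ((comm, pid, seconds), frames) for the first hung-task report."""
    header = None
    frames = []
    for line in lines:
        m = HUNG_RE.search(line)
        if m and header is None:
            header = (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
            continue
        f = FRAME_RE.search(line)
        # Keep only frames without the "?" marker.
        if f and header is not None and not f.group("guess"):
            frames.append(f.group("sym"))
    return header, frames


SAMPLE = """\
2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.
2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0
2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? check_preempt_wakeup+0xfb/0x240
2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: wait_for_completion+0xb4/0x140
2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: __synchronize_srcu.part.13+0x85/0xb0
2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: fsnotify_mark_destroy_workfn+0x7c/0xe0
""".splitlines()

if __name__ == "__main__":
    print(parse_hung_task(SAMPLE))
```

A `__synchronize_srcu` frame directly under `wait_for_completion` is the signature to look for when checking whether a host is hitting this particular bug.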
  The following upstream commit may fix this issue, but it is in Paul's
  RCU tree and not in linux-next or upstream yet:
  https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev=1a05c0cd2fee234a10362cc8f66057557cbb291f

  srcu: Lock srcu_data structure in srcu_gp_start()

  The srcu_gp_start() function is called with the srcu_struct structure's
  ->lock held, but not with the srcu_data structure's ->lock. This is
  problematic because this function accesses and updates the srcu_data
  structure's ->srcu_cblist, which is protected by that lock. Failing to
  hold this lock can result in corruption of the SRCU callback lists,
  which in turn can result in arbitrarily bad results. This commit
  therefore makes srcu_gp_start() acquire the srcu_data structure's ->lock
  across the calls to rcu_segcblist_advance() and
  rcu_segcblist_accelerate(), thus preventing this corruption.

  Please investigate this issue and evaluate the proposed fix.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1802021/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe :
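The locking rule in the commit message quoted above (the outer srcu_struct lock does not protect the per-CPU callback list, so the grace-period start path must also take the srcu_data lock) can be illustrated with a userspace toy model. This is not kernel code and not the actual patch: the classes `SrcuData` and `SrcuStruct`, the two-list layout, and `run_demo` are hypothetical stand-ins, and moving entries from `pending` to `ready` only loosely mimics what rcu_segcblist_advance() and rcu_segcblist_accelerate() do.

```python
import threading


class SrcuData:
    """Toy stand-in for srcu_data: callback lists guarded by their own lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.pending = []  # callbacks still waiting for a grace period
        self.ready = []    # callbacks moved forward by gp_start()


class SrcuStruct:
    """Toy stand-in for srcu_struct: its lock does NOT cover SrcuData's lists."""

    def __init__(self):
        self.lock = threading.Lock()
        self.sda = SrcuData()


def enqueue_callback(sdp, cb):
    # Enqueuers only ever take the srcu_data lock.
    with sdp.lock:
        sdp.pending.append(cb)


def gp_start(sp):
    # The caller already holds sp.lock, as the commit message describes.
    assert sp.lock.locked()
    sdp = sp.sda
    # The fix: also hold the srcu_data lock while the lists are rearranged.
    # Without it, an append landing between extend() and clear() is lost.
    with sdp.lock:
        sdp.ready.extend(sdp.pending)
        sdp.pending.clear()


def run_demo(producers=4, per_producer=1000):
    """Enqueue concurrently with repeated gp_start() calls; return list sizes."""
    sp = SrcuStruct()

    def produce():
        for i in range(per_producer):
            enqueue_callback(sp.sda, i)

    threads = [threading.Thread(target=produce) for _ in range(producers)]
    for t in threads:
        t.start()
    for _ in range(200):
        with sp.lock:
            gp_start(sp)
    for t in threads:
        t.join()
    with sp.lock:
        gp_start(sp)  # final pass picks up anything still pending
    return len(sp.sda.ready), len(sp.sda.pending)


if __name__ == "__main__":
    print(run_demo())
```

With the inner lock in place every enqueued callback ends up in `ready` exactly once. A lost callback in this model corresponds to a waiter that is never woken, which is consistent with the hung synchronize_srcu() callers in the trace.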
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux (Ubuntu Xenial)
       Status: New => Confirmed

Status in linux package in Ubuntu:
  Confirmed
Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed
Status in linux-azure source package in Xenial:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in linux-azure source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in linux-azure source package in Cosmic:
  Fix Released
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Tags added: cscc

Status in linux package in Ubuntu:
  Confirmed
Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  New
Status in linux-azure source package in Xenial:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in linux-azure source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in linux-azure source package in Cosmic:
  Fix Released
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
This bug was fixed in the package linux - 4.15.0-47.50

---
linux (4.15.0-47.50) bionic; urgency=medium

  * linux: 4.15.0-47.50 -proposed tracker (LP: #1819716)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts
    - [Packaging] resync retpoline extraction

  * C++ demangling support missing from perf (LP: #1396654)
    - [Packaging] fix a mistype

  * arm-smmu-v3 arm-smmu-v3.3.auto: CMD_SYNC timeout (LP: #1818162)
    - iommu/arm-smmu-v3: Fix unexpected CMD_SYNC timeout

  * Crash in nvme_irq_check() when using threaded interrupts (LP: #1818747)
    - nvme-pci: fix out of bounds access in nvme_cqe_pending

  * CVE-2019-9213
    - mm: enforce min addr even if capable() in expand_downwards()

  * CVE-2019-3460
    - Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt

  * amdgpu with mst WARNING on blanking (LP: #1814308)
    - drm/amd/display: Don't use dc_link in link_encoder
    - drm/amd/display: Move wait for hpd ready out from edp power control.
    - drm/amd/display: eDP sequence BL off first then DP blank.
    - drm/amd/display: Fix unused variable compilation error
    - drm/amd/display: Fix warning about misaligned code
    - drm/amd/display: Fix MST dp_blank REG_WAIT timeout

  * tun/tap: unable to manage carrier state from userland (LP: #1806392)
    - tun: implement carrier change

  * CVE-2019-8980
    - exec: Fix mem leak in kernel_read_file

  * raw_skew in timer from the ubuntu_kernel_selftests failed on Bionic
    (LP: #1811194)
    - selftest: timers: Tweak raw_skew to SKIP when ADJ_OFFSET/other clock
      adjustments are in progress

  * [Packaging] Allow overlay of config annotations (LP: #1752072)
    - [Packaging] config-check: Add an include directive

  * CVE-2019-7308
    - bpf: move {prev_,}insn_idx into verifier env
    - bpf: move tmp variable into ax register in interpreter
    - bpf: enable access to ax register also from verifier rewrite
    - bpf: restrict map value pointer arithmetic for unprivileged
    - bpf: restrict stack pointer arithmetic for unprivileged
    - bpf: restrict unknown scalars of mixed signed bounds for unprivileged
    - bpf: fix check_map_access smin_value test when pointer contains offset
    - bpf: prevent out of bounds speculation on pointer arithmetic
    - bpf: fix sanitation of alu op with pointer / scalar type from different
      paths
    - bpf: add various test cases to selftests

  * CVE-2017-5753
    - bpf: properly enforce index mask to prevent out-of-bounds speculation
    - bpf: fix inner map masking to prevent oob under speculation

  * BPF: kernel pointer leak to unprivileged userspace (LP: #1815259)
    - bpf/verifier: disallow pointer subtraction

  * squashfs hardening (LP: #1816756)
    - squashfs: more metadata hardening
    - squashfs metadata 2: electric boogaloo
    - squashfs: more metadata hardening
    - Squashfs: Compute expected length from inode size rather than block
      length

  * efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted (LP: #1814982)
    - efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted

  * Update ENA driver to version 2.0.3K (LP: #1816806)
    - net: ena: update driver version from 2.0.2 to 2.0.3
    - net: ena: fix race between link up and device initalization
    - net: ena: fix crash during failed resume from hibernation

  * ipset kernel error: 4.15.0-43-generic (LP: #1811394)
    - netfilter: ipset: Fix wraparound in hash:*net* types

  * Silent "Unknown key" message when pressing keyboard backlight hotkey
    (LP: #1817063)
    - platform/x86: dell-wmi: Ignore new keyboard backlight change event

  * CVE-2018-18021
    - arm64: KVM: Tighten guest core register access from userspace
    - KVM: arm/arm64: Introduce vcpu_el1_is_32bit
    - arm64: KVM: Sanitize PSTATE.M when being set from userspace

  * CVE-2018-14678
    - x86/entry/64: Remove %ebx handling from error_entry/exit

  * CVE-2018-19824
    - ALSA: usb-audio: Fix UAF decrement if card has no live interfaces in
      card.c

  * CVE-2019-3459
    - Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer

  * Bionic update: upstream stable patchset 2019-02-08 (LP: #1815234)
    - fork: unconditionally clear stack on fork
    - spi: spi-s3c64xx: Fix system resume support
    - Input: elan_i2c - add ACPI ID for lenovo ideapad 330
    - Input: i8042 - add Lenovo LaVie Z to the i8042 reset list
    - Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST
    - kvm, mm: account shadow page tables to kmemcg
    - delayacct: fix crash in delayacct_blkio_end() after delayacct init
      failure
    - tracing: Fix double free of event_trigger_data
    - tracing: Fix possible double free in event_enable_trigger_func()
    - kthread, tracing: Don't expose half-written comm when creating kthreads
    - tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
    - tracing: Quiet gcc warning about maybe unused link
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
This bug was fixed in the package linux - 4.18.0-17.18

---
linux (4.18.0-17.18) cosmic; urgency=medium

  * linux: 4.18.0-17.18 -proposed tracker (LP: #1819624)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts

  * C++ demangling support missing from perf (LP: #1396654)
    - [Packaging] fix a mistype

  * arm-smmu-v3 arm-smmu-v3.3.auto: CMD_SYNC timeout (LP: #1818162)
    - iommu/arm-smmu-v3: Fix unexpected CMD_SYNC timeout

  * Crash in nvme_irq_check() when using threaded interrupts (LP: #1818747)
    - nvme-pci: fix out of bounds access in nvme_cqe_pending

  * CVE-2019-9003
    - ipmi: fix use-after-free of user->release_barrier.rda

  * CVE-2019-9162
    - netfilter: nf_nat_snmp_basic: add missing length checks in ASN.1 cbs

  * CVE-2019-9213
    - mm: enforce min addr even if capable() in expand_downwards()

  * CVE-2019-3460
    - Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt

  * tun/tap: unable to manage carrier state from userland (LP: #1806392)
    - tun: implement carrier change

  * CVE-2019-8980
    - exec: Fix mem leak in kernel_read_file

  * [Packaging] Allow overlay of config annotations (LP: #1752072)
    - [Packaging] config-check: Add an include directive

  * amdgpu with mst WARNING on blanking (LP: #1814308)
    - drm/amd/display: Fix MST dp_blank REG_WAIT timeout

  * CVE-2019-7308
    - bpf: move {prev_,}insn_idx into verifier env
    - bpf: move tmp variable into ax register in interpreter
    - bpf: enable access to ax register also from verifier rewrite
    - bpf: restrict map value pointer arithmetic for unprivileged
    - bpf: restrict stack pointer arithmetic for unprivileged
    - bpf: restrict unknown scalars of mixed signed bounds for unprivileged
    - bpf: fix check_map_access smin_value test when pointer contains offset
    - bpf: prevent out of bounds speculation on pointer arithmetic
    - bpf: fix sanitation of alu op with pointer / scalar type from different
      paths
    - bpf: add various test cases to test_verifier
    - bpf: add various test cases to selftests

  * CVE-2017-5753
    - bpf: fix inner map masking to prevent oob under speculation

  * Use memblock quirk instead of delayed allocation for GICv3 LPI tables
    (LP: #1816425)
    - efi/arm: Revert "Defer persistent reservations until after
      paging_init()"
    - arm64, mm, efi: Account for GICv3 LPI tables in static memblock reserve
      table

  * efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted (LP: #1814982)
    - efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted

  * Update ENA driver to version 2.0.3K (LP: #1816806)
    - net: ena: update driver version from 2.0.2 to 2.0.3
    - net: ena: fix race between link up and device initalization
    - net: ena: fix crash during failed resume from hibernation

  * Silent "Unknown key" message when pressing keyboard backlight hotkey
    (LP: #1817063)
    - platform/x86: dell-wmi: Ignore new keyboard backlight change event

  * CVE-2018-19824
    - ALSA: usb-audio: Fix UAF decrement if card has no live interfaces in
      card.c

  * CVE-2019-3459
    - Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer

  * CONFIG_TEST_BPF is disabled (LP: #1813955)
    - [Config]: Reenable TEST_BPF

  * installer does not support iSCSI iBFT (LP: #1817321)
    - d-i: add iscsi_ibft to scsi-modules

  * CVE-2019-7222
    - KVM: x86: work around leak of uninitialized stack contents
      (CVE-2019-7222)

  * CVE-2019-7221
    - KVM: nVMX: unconditionally cancel preemption timer in free_nested
      (CVE-2019-7221)

  * CVE-2019-6974
    - kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)

  * hns3 nic speed may not match optical port speed (LP: #1817969)
    - net: hns3: Config NIC port speed same as that of optical module

  * [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start() (LP: #1802021)
    - srcu: Lock srcu_data structure in srcu_gp_start()

  * libsas disks can have non-unique by-path names (LP: #1817784)
    - scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached

  * Bluetooth not working (Intel CyclonePeak) (LP: #1817518)
    - Bluetooth: btusb: Add support for Intel bluetooth device 8087:0029

  * CVE-2019-8912
    - net: crypto set sk to NULL when af_alg_release.
    - net: socket: set sock->sk to NULL after calling proto_ops::release()

  * 4.18.0 thinkpad_acpi : thresholds for BAT1 not writable (LP: #1812099)
    - platform/x86: thinkpad_acpi: Fix multi-battery bug

  * [ALSA] [PATCH] System76 darp5 and oryp5 fixups (LP: #1815831)
    - ALSA: hda/realtek - Headset microphone support for System76 darp5
    - ALSA: hda/realtek - Headset microphone and internal speaker support for
      System76 oryp5

  * CVE-2019-8956
    - sctp: walk the list of asoc safely

  * Constant noise in the headphone on Lenovo X1 machines (LP: #1817263)
    - ALSA: hda/realtek: Disable PC beep in passthrough on
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Updating bug tags to verification done.

As mentioned by users in this LP bug, the verification period of 5 days
is _usually_ not enough to reproduce this problem. However, we have some
data points that support the fix being good.

1) The fix was first delivered in linux-azure, 3 weeks ago, and has
reportedly resolved the issue for @alanjcastonguay: the issue used to
occur within 4 days at the most, and hasn't happened for 2 weeks on 8
nodes (which is statistically very positive; and it helps that the fix
is not specific to -azure).

2) One of the users who reported this in linux (-generic) has verified a
test kernel with this fix for weeks; based on that, the fix was
submitted after linux-azure had it. The same user has verified
-proposed for about a week now, and it's looking good.

3) Users in this LP bug have been running the -proposed kernel on
multiple nodes for about a week now too, and haven't hit the issue yet.

On top of 1), with 2) and 3) combined, and given the schedule for
-proposed verification, this seems to be a reasonable compromise between
results and test time.

cheers,
Mauricio

** Tags removed: verification-needed-bionic verification-needed-cosmic
** Tags added: verification-done-bionic verification-done-cosmic

Status in linux package in Ubuntu:
  Confirmed
Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  New
Status in linux-azure source package in Xenial:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed
Status in linux-azure source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Committed
Status in linux-azure source package in Cosmic:
  Fix Released
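The "statistically very positive" remark in the verification message can be given a rough number. The sketch below assumes, purely for illustration, that on an unfixed cluster the hangs arrive as a Poisson process with a mean gap of 4 days (the longest gap reported, so a conservative choice). Neither the exponential model nor the function name `prob_no_failure` comes from the thread.

```python
import math


def prob_no_failure(observed_days, mean_days_between_failures):
    """P(no event in observed_days) under an assumed exponential model."""
    return math.exp(-observed_days / mean_days_between_failures)


if __name__ == "__main__":
    # 14 quiet days against an assumed 4-day mean gap between hangs.
    p = prob_no_failure(14, 4)
    print(f"chance of 2 quiet weeks if the bug were still present: {p:.1%}")
```

Under these assumptions an unfixed cluster would stay quiet for two weeks only about 3% of the time. A shorter assumed mean gap makes that figure smaller still, which supports treating the two quiet weeks as meaningful evidence.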
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
I was experiencing a situation with moby dockerd entering a state
similar to comment 0 here, running kubernetes and linux kernel
4.15.0-1037-azure. This was an 8 node cluster.

Observed with combinations:
kubernetes 1.11.5 + moby runtime 3.0.1 + Ubuntu 16.04.5
kubernetes 1.11.7 + moby runtime 3.0.4 + Ubuntu 16.04.10

The longest window between outages prior was 4 days, with the shortest
being less than a day.

I have observed 2 weeks of uptime on 8 nodes without observation of the
original symptoms since upgrading the kubernetes node kernel to
4.15.0-1040-azure. I am confident the kernel patch has resolved our
problem.

Ref https://github.com/moby/moby/issues/38750 and
https://github.com/Azure/AKS/issues/838, both closed.

** Bug watch added: github.com/moby/moby/issues #38750
   https://github.com/moby/moby/issues/38750

** Bug watch added: github.com/Azure/AKS/issues #838
   https://github.com/Azure/AKS/issues/838

Status in linux package in Ubuntu:
  Confirmed
Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  New
Status in linux-azure source package in Xenial:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed
Status in linux-azure source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Committed
Status in linux-azure source package in Cosmic:
  Fix Released
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
As Marius mentioned, we have already deployed this patch on several instances and are now monitoring them to see whether the problem still happens. At least 2 of them were affected by the bug before patching, so we had to reboot them.

If the information in the previous post is correct, and considering how long it may take for the issue to reproduce (or not), I think you might want to include this fix in the existing release cycle.
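Monitoring of the kind described above could be partly automated by scanning kern.log for hung-task reports like the one quoted in the bug description. The following is only a rough sketch, not something used in this thread: the function name count_srcu_hangs is made up for illustration, the default log path is an assumption, and treating ret_from_fork as the end of a trace is a heuristic that will not hold for every report.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the bug report): count hung-task reports
# whose call trace mentions synchronize_srcu.
count_srcu_hangs() {
    awk '
        # A new hung-task report starts here.
        /INFO: task .* blocked for more than [0-9]+ seconds/ {
            in_report = 1; counted = 0; next
        }
        # Count each report at most once if its trace touches SRCU.
        in_report && !counted && /synchronize_srcu/ { hits++; counted = 1 }
        # Heuristic: ret_from_fork is taken as the last frame of the trace.
        in_report && /ret_from_fork/ { in_report = 0 }
        END { print hits + 0 }
    ' "$1"
}

# Assumed default location; pass another path as the first argument.
log="${1:-/var/log/kern.log}"
if [ -r "$log" ]; then
    echo "SRCU-related hung-task reports in $log: $(count_srcu_hangs "$log")"
fi
```

A count that stays at zero over a long enough uptime is only weak evidence, since the thread notes the issue can take weeks to appear.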
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
@mfo Just to confirm that we installed the right proposed version of the kernel, since this bug has no steps to reproduce; it only reproduces randomly after a certain amount of time.

These are the steps we followed to install it:

  echo 'deb http://archive.ubuntu.com/ubuntu/ xenial-proposed restricted main multiverse universe' > /etc/apt/sources.list.d/kernel.list
  echo -e "Package: *\nPin: release a=xenial-proposed\nPin-Priority: 400" > /etc/apt/preferences.d/proposed-updates
  apt-get -t xenial-proposed install -y linux-image-unsigned-4.15.0-47-generic
  reboot

Installed packages:

  dpkg -l | grep -i 4.15.0-47
  ii  linux-image-unsigned-4.15.0-47-generic  4.15.0-47.50~16.04.1  amd64  Linux kernel image for version 4.15.0 on 64 bit x86 SMP
  ii  linux-modules-4.15.0-47-generic         4.15.0-47.50~16.04.1  amd64  Linux kernel extra modules for version 4.15.0 on 64 bit x86 SMP

  dpkg -l | grep -i hwe
  ii  linux-generic-hwe-16.04          4.15.0.45.66  amd64  Complete Generic Linux kernel and headers
  ii  linux-headers-generic-hwe-16.04  4.15.0.45.66  amd64  Generic Linux kernel headers
  ii  linux-image-generic-hwe-16.04    4.15.0.45.66  amd64  Generic Linux kernel image

  uname -a
  Linux myhostname 4.15.0-47-generic #50~16.04.1-Ubuntu SMP Fri Mar 15 16:06:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
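The "did we install the right kernel" check above can also be scripted. This is a minimal sketch under stated assumptions: kernel_at_least is a hypothetical helper, it relies on GNU `sort -V`, and the 4.15.0-47 threshold is taken from the linux-hwe changelog quoted elsewhere in this thread. The comparison is only meaningful within one kernel flavour (for example -generic); linux-azure uses a different ABI numbering.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when kernel release $1 is at least $2.
# Uses GNU sort's version ordering; the smaller of the two sorts first.
kernel_at_least() {
    local have="$1" want="$2"
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

running="$(uname -r)"     # e.g. 4.15.0-47-generic
fixed="4.15.0-47"         # assumed threshold for the -generic flavour only

# Strip the flavour suffix so only version and ABI number are compared.
if kernel_at_least "${running%-generic}" "$fixed"; then
    echo "running kernel $running is at or past $fixed"
else
    echo "running kernel $running predates $fixed"
fi
```

Note that the meta packages in the dpkg output above still report 4.15.0.45.66, so checking `uname -r` after the reboot is what actually confirms the running kernel.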
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Thanks Mauricio! @mfo

Will begin deploying this and let you guys know as soon as possible.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1802021

Title:
  [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Fix Released
Status in linux source package in Xenial: New
Status in linux-azure source package in Xenial: Fix Released
Status in linux source package in Bionic: Fix Committed
Status in linux-azure source package in Bionic: Fix Released
Status in linux source package in Cosmic: Fix Committed
Status in linux-azure source package in Cosmic: Fix Released

Bug description:
  We had a customer seeing traces like the following:

  Stack trace from kern.log:
  2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.
  2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 4.15.0-1023-azure #24~16.04.1-Ubuntu
  2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 0x8000
  2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound fsnotify_mark_destroy_workfn
  2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace:
  2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0
  2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? check_preempt_wakeup+0xfb/0x240
  2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ? sched_clock_local+0x17/0x90
  2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80
  2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: schedule_timeout+0x1db/0x370
  2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? __enqueue_entity+0x5c/0x60
  2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? enqueue_entity+0x112/0x670
  2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: wait_for_completion+0xb4/0x140
  2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70
  2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: __synchronize_srcu.part.13+0x85/0xb0
  2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? trace_raw_output_rcu_utilization+0x50/0x50
  2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: fsnotify_mark_destroy_workfn+0x7c/0xe0
  2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: process_one_work+0x14d/0x410
  2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460
  2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140
  2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? process_one_work+0x410/0x410
  2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? kthread_destroy_worker+0x50/0x50
  2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130
  2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20
  2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40

  Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.

  We are seeing more issues with fsnotify-related callbacks. These are not soft/hard lockups, but they seem to significantly degrade the responsiveness of systemd (and from there everything else).

  The following upstream commit may fix this issue, but it is in Paul's RCU tree and not in linux-next or upstream yet:

  https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=1a05c0cd2fee234a10362cc8f66057557cbb291f

  srcu: Lock srcu_data structure in srcu_gp_start()

  The srcu_gp_start() function is called with the srcu_struct structure's ->lock held, but not with the srcu_data structure's ->lock. This is problematic because this function accesses and updates the srcu_data structure's ->srcu_cblist, which is protected by that lock. Failing to hold this lock can result in corruption of the SRCU callback lists, which in turn can result in arbitrarily bad results. This commit therefore makes srcu_gp_start() acquire the srcu_data structure's ->lock across the calls to rcu_segcblist_advance() and rcu_segcblist_accelerate(), thus preventing this corruption.

  Please investigate this issue and evaluate the proposed fix.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1802021/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hi Marius @lazamarius1,

Per the kernel.ubuntu.com schedule, the version for Bionic/linux -> Xenial/linux-hwe should land soon. You can verify the version/timestamps for each package/release at the bottom of these pages (the linux-hwe version comes a bit after the corresponding linux version):

https://launchpad.net/ubuntu/+source/linux
https://launchpad.net/ubuntu/+source/linux-hwe

As far as testing goes, yes, this issue might take longer to reproduce, but the initial testing by another user, done in order to first submit the fix to Ubuntu, showed good results, so that is already a good sign for the fix on its own. That other user will also test its integration with other fixes, i.e. with the kernel in -proposed, so together with your testing that should improve the chances of finding out whether the issue still happens. There is also regression testing of the kernel builds, which can spot failures, so that contributes too.

Hope this helps,
Mauricio
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
@lazamarius1,

Actually, linux-hwe for Xenial (built from the Bionic kernel) with this fix has just been uploaded. See https://launchpad.net/ubuntu/+source/linux-hwe

Changelog

linux-hwe (4.15.0-47.50~16.04.1) xenial; urgency=medium
  ...
  * [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start() (LP: #1802021)
    - srcu: Prohibit call_srcu() use under raw spinlocks
    - srcu: Lock srcu_data structure in srcu_gp_start()
  ...
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hi Brad @brad-figg,

5 days are not enough to test this bug. Citing David:

"@overlord: AFAIK, there is no simple reproducer test case for this issue. The ideal testing scenario for a bug and fix like this one would be for each user who reported this issue to use a test kernel with the fix in their environment and report back if the issue still manifests or not after some time has passed with that test kernel in your affected environment."

In most cases, systems showed the bug symptoms only after 40+ days of uptime.

Also, is there a Xenial fix for linux-hwe yet? We really need this patch in the 4.15 kernel.

Thanks,
Marius
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-cosmic' to 'verification-done-cosmic'. If the problem still exists, change the tag 'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

** Tags added: verification-needed-cosmic

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Fix Released
Status in linux source package in Xenial: New
Status in linux-azure source package in Xenial: Fix Released
Status in linux source package in Bionic: Fix Committed
Status in linux-azure source package in Bionic: Fix Released
Status in linux source package in Cosmic: Fix Committed
Status in linux-azure source package in Cosmic: Fix Released
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
This bug was fixed in the package linux-azure - 4.15.0-1040.44

---
linux-azure (4.15.0-1040.44) xenial; urgency=medium

  * linux-azure: 4.15.0-1040.44 -proposed tracker (LP: #1817038)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync retpoline extraction

  * CONFIG_SECURITY_SELINUX_DISABLE should be disabled on 4.15/4.18 Azure (LP: #1813866)
    - [Config]: disable CONFIG_SECURITY_SELINUX_DISABLE
    - [Config] Update configs

  * Allow I/O schedulers to be loaded with modprobe in linux-azure (LP: #1813211)
    - [Config] linux-azure: Enable all IO schedulers as modules

  * [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start() (LP: #1802021)
    - srcu: Prohibit call_srcu() use under raw spinlocks
    - srcu: Lock srcu_data structure in srcu_gp_start()

  [ Ubuntu: 4.15.0-46.49 ]

  * linux: 4.15.0-46.49 -proposed tracker (LP: #1814726)

  * mprotect fails on ext4 with dax (LP: #1799237)
    - x86/speculation/l1tf: Exempt zeroed PTEs from inversion

  * kernel BUG at /build/linux-vxxS7y/linux-4.15.0/mm/slub.c:296! (LP: #1812086)
    - iscsi target: fix session creation failure handling
    - scsi: iscsi: target: Set conn->sess to NULL when iscsi_login_set_conn_values fails
    - scsi: iscsi: target: Fix conn_ops double free

  * user_copy in user from ubuntu_kernel_selftests failed on KVM kernel (LP: #1812198)
    - selftests: user: return Kselftest Skip code for skipped tests
    - selftests: kselftest: change KSFT_SKIP=4 instead of KSFT_PASS
    - selftests: kselftest: Remove outdated comment

  * RTL8822BE WiFi Disabled in Kernel 4.18.0-12 (LP: #1806472)
    - SAUCE: staging: rtlwifi: allow RTLWIFI_DEBUG_ST to be disabled
    - [Config] CONFIG_RTLWIFI_DEBUG_ST=n
    - SAUCE: Add r8822be to signature inclusion list

  * kernel oops in bcache module (LP: #1793901)
    - SAUCE: bcache: never writeback a discard operation

  * CVE-2018-18397
    - userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails
    - userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
    - userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
    - userfaultfd: shmem: add i_size checks
    - userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set

  * Ignore "incomplete report" from Elan touchpanels (LP: #1813733)
    - HID: i2c-hid: Ignore input report if there's no data present on Elan touchpanels

  * Vsock connect fails with ENODEV for large CID (LP: #1813934)
    - vhost/vsock: fix vhost vsock cid hashing inconsistent

  * SRU: Fix thinkpad 11e 3rd boot hang (LP: #1804604)
    - ACPI / LPSS: Force LPSS quirks on boot

  * Bionic update: upstream stable patchset 2019-01-17 (LP: #1812229)
    - scsi: sd_zbc: Fix variable type and bogus comment
    - KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel.
    - x86/apm: Don't access __preempt_count with zeroed fs
    - x86/events/intel/ds: Fix bts_interrupt_threshold alignment
    - x86/MCE: Remove min interval polling limitation
    - fat: fix memory allocation failure handling of match_strdup()
    - ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk
    - ARCv2: [plat-hsdk]: Save accl reg pair by default
    - ARC: Fix CONFIG_SWAP
    - ARC: configs: Remove CONFIG_INITRAMFS_SOURCE from defconfigs
    - ARC: mm: allow mprotect to make stack mappings executable
    - mm: memcg: fix use after free in mem_cgroup_iter()
    - mm/huge_memory.c: fix data loss when splitting a file pmd
    - cpufreq: intel_pstate: Register when ACPI PCCH is present
    - vfio/pci: Fix potential Spectre v1
    - stop_machine: Disable preemption when waking two stopper threads
    - drm/i915: Fix hotplug irq ack on i965/g4x
    - drm/nouveau: Use drm_connector_list_iter_* for iterating connectors
    - drm/nouveau: Avoid looping through fake MST connectors
    - gen_stats: Fix netlink stats dumping in the presence of padding
    - ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns
    - ipv6: fix useless rol32 call on hash
    - ipv6: ila: select CONFIG_DST_CACHE
    - lib/rhashtable: consider param->min_size when setting initial table size
    - net: diag: Don't double-free TCP_NEW_SYN_RECV sockets in tcp_abort
    - net: Don't copy pfmemalloc flag in __copy_skb_header()
    - skbuff: Unconditionally copy pfmemalloc in __skb_clone()
    - net/ipv4: Set oif in fib_compute_spec_dst
    - net: phy: fix flag masking in __set_phy_supported
    - ptp: fix missing break in switch
    - qmi_wwan: add support for Quectel EG91
    - tg3: Add higher cpu clock for 5762.
    - hv_netvsc: Fix napi reschedule while receive completion is busy
    - net/mlx4_en: Don't reuse RX page when XDP is set
    - net: systemport: Fix CRC forwarding check for SYSTEMPORT Lite
    - ipv6: make DAD fail with enhanced DAD when nonce length differs
    - net: usb: asix: replace mii_nway_restart in resume path
    - alpha: fix osf_wait4() breakage
    -
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Andrei, I will discuss with engineering to confirm availability and get back to you.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1802021

Title:
  [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Fix Released
Status in linux source package in Xenial: New
Status in linux-azure source package in Xenial: Fix Committed
Status in linux source package in Bionic: Fix Committed
Status in linux-azure source package in Bionic: Fix Released
Status in linux source package in Cosmic: Fix Committed
Status in linux-azure source package in Cosmic: Fix Released

Bug description:
We had a customer seeing traces like the following:

Stack trace from kern.log:
2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.
2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 4.15.0-1023-azure #24~16.04.1-Ubuntu
2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 0x8000
2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound fsnotify_mark_destroy_workfn
2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace:
2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0
2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? check_preempt_wakeup+0xfb/0x240
2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ? sched_clock_local+0x17/0x90
2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80
2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: schedule_timeout+0x1db/0x370
2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? __enqueue_entity+0x5c/0x60
2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? enqueue_entity+0x112/0x670
2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: wait_for_completion+0xb4/0x140
2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70
2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: __synchronize_srcu.part.13+0x85/0xb0
2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? trace_raw_output_rcu_utilization+0x50/0x50
2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0
2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? synchronize_srcu+0xd3/0xe0
2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: fsnotify_mark_destroy_workfn+0x7c/0xe0
2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: process_one_work+0x14d/0x410
2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460
2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140
2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? process_one_work+0x410/0x410
2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? kthread_destroy_worker+0x50/0x50
2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130
2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20
2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40

Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.

We are seeing more issues with fsnotify-related callbacks. These are not soft/hard lockups, but they seem to significantly degrade the responsiveness of systemd (and from there everything else).

The following upstream commit may fix this issue, but it is in Paul's RCU tree and not in linux-next or upstream yet:
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=1a05c0cd2fee234a10362cc8f66057557cbb291f

srcu: Lock srcu_data structure in srcu_gp_start()

The srcu_gp_start() function is called with the srcu_struct structure's ->lock held, but not with the srcu_data structure's ->lock. This is problematic because this function accesses and updates the srcu_data structure's ->srcu_cblist, which is protected by that lock. Failing to hold this lock can result in corruption of the SRCU callback lists, which in turn can result in arbitrarily bad results. This commit therefore makes srcu_gp_start() acquire the srcu_data structure's ->lock across the calls to rcu_segcblist_advance() and rcu_segcblist_accelerate(), thus preventing this corruption.

Please investigate this issue and evaluate the proposed fix.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1802021/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp
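The locking rule described in the commit message quoted in this bug can be illustrated with a small userspace analogy. This is only a sketch in Python, not kernel code: the class names `SrcuStruct` and `SrcuData`, the `pending` and `ready` lists, and `run_demo()` are hypothetical stand-ins, and moving `pending` into `ready` is a deliberately simplified substitute for what rcu_segcblist_advance() and rcu_segcblist_accelerate() do. The only point it tries to make is that code called with the outer lock held must still take the per-data lock before touching the callback list.

```python
import threading


class SrcuData:
    """Hypothetical stand-in for the kernel's srcu_data structure."""

    def __init__(self):
        self.lock = threading.Lock()  # models srcu_data ->lock
        self.pending = []             # callbacks waiting for a grace period
        self.ready = []               # callbacks considered ready to invoke

    def enqueue(self, cb):
        # Rough analogue of queuing a callback: done under the per-data lock.
        with self.lock:
            self.pending.append(cb)


class SrcuStruct:
    """Hypothetical stand-in for the kernel's srcu_struct structure."""

    def __init__(self):
        self.lock = threading.Lock()  # models srcu_struct ->lock
        self.data = SrcuData()

    def gp_start(self):
        # The caller is expected to hold self.lock (this check only confirms
        # that the lock is held by someone, not by whom).
        assert self.lock.locked()
        # The fix in miniature: also take the per-data lock while the
        # callback list is being rearranged.
        with self.data.lock:
            self.data.ready.extend(self.data.pending)
            self.data.pending.clear()


def run_demo(n_producers=4, per_producer=1000):
    """Queue callbacks from several threads while grace periods start."""
    ssp = SrcuStruct()
    stop = threading.Event()

    def producer():
        for i in range(per_producer):
            ssp.data.enqueue(i)

    def gp_worker():
        while not stop.is_set():
            with ssp.lock:
                ssp.gp_start()

    worker = threading.Thread(target=gp_worker)
    producers = [threading.Thread(target=producer) for _ in range(n_producers)]
    worker.start()
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    stop.set()
    worker.join()
    with ssp.lock:
        ssp.gp_start()  # final pass so nothing is left in pending
    return len(ssp.data.ready)
```

With both locks taken as shown, `run_demo()` should account for every queued callback. If the `with self.data.lock:` line in `gp_start()` were removed, a callback appended between `extend()` and `clear()` could be dropped, which is a loose analogue of the list corruption the commit message describes.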
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Do you know when a fix for linux-xenial will be available? Looks like that's the only one remaining.

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Fix Released
Status in linux source package in Xenial: New
Status in linux-azure source package in Xenial: Fix Committed
Status in linux source package in Bionic: Fix Committed
Status in linux-azure source package in Bionic: Fix Released
Status in linux source package in Cosmic: Fix Committed
Status in linux-azure source package in Cosmic: Fix Released
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
This is the final reminder to please verify that the kernel in -proposed resolves the issue for which you've filed this bug report. Canonical is planning to release these kernels early next week. Thank you in advance!

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Fix Released
Status in linux source package in Xenial: New
Status in linux-azure source package in Xenial: Fix Committed
Status in linux source package in Bionic: Fix Committed
Status in linux-azure source package in Bionic: Fix Released
Status in linux source package in Cosmic: Fix Committed
Status in linux-azure source package in Cosmic: Fix Released
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Changed in: linux (Ubuntu Bionic)
   Status: Confirmed => Fix Committed

** Changed in: linux (Ubuntu Cosmic)
   Status: Confirmed => Fix Committed

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Fix Released
Status in linux source package in Xenial: New
Status in linux-azure source package in Xenial: Fix Committed
Status in linux source package in Bionic: Fix Committed
Status in linux-azure source package in Bionic: Fix Released
Status in linux source package in Cosmic: Fix Committed
Status in linux-azure source package in Cosmic: Fix Released
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
This bug was fixed in the package linux-azure - 4.18.0-1011.11

---
linux-azure (4.18.0-1011.11) cosmic; urgency=medium

  * linux-azure: 4.18.0-1011.11 -proposed tracker (LP: #1816081)

  * 4.15.0-1037 does not see all PCI devices on GPU VMs (LP: #1816106)
    - Revert "PCI: hv: Make sure the bus domain is really unique"

linux-azure (4.18.0-1009.9) cosmic; urgency=medium

  * Allow I/O schedulers to be loaded with modprobe in linux-azure (LP: #1813211)
    - [Config] linux-azure: Enable all IO schedulers as modules

  * [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start() (LP: #1802021)
    - srcu: Lock srcu_data structure in srcu_gp_start()

  * CONFIG_SECURITY_SELINUX_DISABLE should be disabled on 4.15/4.18 Azure (LP: #1813866)
    - [Config]: disable CONFIG_SECURITY_SELINUX_DISABLE

  [ Ubuntu: 4.18.0-15.16 ]

  * Ubuntu boot failure. 4.18.0-14 boot stalls. (does not boot) (LP: #1814555)
    - Revert "drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5"

  * Userspace break as a result of missing patch backport (LP: #1813873)
    - tty: Don't hold ldisc lock in tty_reopen() if ldisc present

 -- Stefan Bader Fri, 15 Feb 2019 17:16:24 +0100

** Changed in: linux-azure (Ubuntu)
   Status: Confirmed => Fix Released

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Fix Released
Status in linux source package in Xenial: New
Status in linux-azure source package in Xenial: Fix Committed
Status in linux source package in Bionic: Confirmed
Status in linux-azure source package in Bionic: Fix Released
Status in linux source package in Cosmic: Confirmed
Status in linux-azure source package in Cosmic: Fix Released
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Also affects: linux-azure (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Changed in: linux-azure (Ubuntu Xenial)
   Status: New => Fix Committed

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Confirmed
Status in linux source package in Xenial: New
Status in linux-azure source package in Xenial: Fix Committed
Status in linux source package in Bionic: Confirmed
Status in linux-azure source package in Bionic: Fix Released
Status in linux source package in Cosmic: Confirmed
Status in linux-azure source package in Cosmic: Fix Released
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
This bug was fixed in the package linux-azure - 4.18.0-1011.11

** Changed in: linux-azure (Ubuntu Cosmic)
   Status: Fix Committed => Fix Released

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Confirmed
Status in linux source package in Bionic: Confirmed
Status in linux-azure source package in Bionic: Fix Released
Status in linux source package in Cosmic: Confirmed
Status in linux-azure source package in Cosmic: Fix Released
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
This bug was fixed in the package linux-azure - 4.18.0-1011.11

** Changed in: linux-azure (Ubuntu Bionic)
   Status: Fix Committed => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2017-5715
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2017-5753
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2017-5754
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-14625
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-14633
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-15471
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-16882
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-18653
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-18710
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-18955
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-19407
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-5391
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-6559
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-7755
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-9363

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Confirmed
Status in linux source package in Bionic: Confirmed
Status in linux-azure source package in Bionic: Fix Released
Status in linux source package in Cosmic: Confirmed
Status in linux-azure source package in Cosmic: Fix Released
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
@lazamarius1: Just to clarify, the fix is scheduled to go into the 4.15 kernel in Bionic, which is the same kernel as the Xenial HWE kernel. So there's no need to add anything to the Affects section. You will see a new linux-hwe 4.15 kernel in xenial-proposed once this is ready to test.

Thanks!

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Confirmed
Status in linux source package in Bionic: Confirmed
Status in linux-azure source package in Bionic: Fix Committed
Status in linux source package in Cosmic: Confirmed
Status in linux-azure source package in Cosmic: Fix Committed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hi @lazamarius1,

The fix for linux generic should be applied in the next kernel SRU cycle. The current cycle ends in late February [1].

[1] https://kernel.ubuntu.com/

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Confirmed
Status in linux source package in Bionic: Confirmed
Status in linux-azure source package in Bionic: Fix Committed
Status in linux source package in Cosmic: Confirmed
Status in linux-azure source package in Cosmic: Fix Committed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hi @brad-figg,

Can we get a proposed fix for the Xenial linux(-generic) package? Or is that planned after the Bionic one?

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1802021

Title:
  [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Confirmed
Status in linux source package in Bionic: Confirmed
Status in linux-azure source package in Bionic: Fix Committed
Status in linux source package in Cosmic: Confirmed
Status in linux-azure source package in Cosmic: Fix Committed

Bug description:
  We had a customer seeing traces like the following:

  Stack trace from kern.log:
  2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.
  2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 4.15.0-1023-azure #24~16.04.1-Ubuntu
  2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 0x8000
  2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound fsnotify_mark_destroy_workfn
  2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace:
  2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0
  2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? check_preempt_wakeup+0xfb/0x240
  2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ? sched_clock_local+0x17/0x90
  2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80
  2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: schedule_timeout+0x1db/0x370
  2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? __enqueue_entity+0x5c/0x60
  2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? enqueue_entity+0x112/0x670
  2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: wait_for_completion+0xb4/0x140
  2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70
  2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: __synchronize_srcu.part.13+0x85/0xb0
  2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? trace_raw_output_rcu_utilization+0x50/0x50
  2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: fsnotify_mark_destroy_workfn+0x7c/0xe0
  2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: process_one_work+0x14d/0x410
  2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460
  2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140
  2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? process_one_work+0x410/0x410
  2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? kthread_destroy_worker+0x50/0x50
  2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130
  2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20
  2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40

  Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.

  We are seeing more issues with fsnotify-related callbacks. These are not soft/hard lockups, but they seem to significantly degrade the responsiveness of systemd (and from there everything else).

  The following upstream commit may fix this issue, but it is in Paul's RCU tree and not in linux-next or upstream yet:
  https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=1a05c0cd2fee234a10362cc8f66057557cbb291f

  srcu: Lock srcu_data structure in srcu_gp_start()

  The srcu_gp_start() function is called with the srcu_struct structure's ->lock held, but not with the srcu_data structure's ->lock. This is problematic because this function accesses and updates the srcu_data structure's ->srcu_cblist, which is protected by that lock. Failing to hold this lock can result in corruption of the SRCU callback lists, which in turn can result in arbitrarily bad results. This commit therefore makes srcu_gp_start() acquire the srcu_data structure's ->lock across the calls to rcu_segcblist_advance() and rcu_segcblist_accelerate(), thus preventing this corruption.

  Please investigate this issue and evaluate the proposed fix.
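To check whether a machine has logged the hung-task warning quoted in the bug description, a small filter along these lines may help. This is only a sketch: the function name `count_hung_tasks` is hypothetical, and the log path in the usage comment is an assumption that varies between systems.

```shell
#!/bin/sh
# Sketch: count hung-task warnings in kernel log text read from stdin.
# The function name is hypothetical; the pattern mirrors the
# "blocked for more than 120 seconds" message in the trace.
count_hung_tasks() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'blocked for more than [0-9][0-9]* seconds'
}

# Example usage (the log path is an assumption):
#   count_hung_tasks < /var/log/kern.log
```

Note that grep exits with a non-zero status when nothing matches, so callers running under `set -e` may want to append `|| true`.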
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-bionic

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Confirmed
Status in linux source package in Bionic: Confirmed
Status in linux-azure source package in Bionic: Fix Committed
Status in linux source package in Cosmic: Confirmed
Status in linux-azure source package in Cosmic: Fix Committed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hi Marcelo (@mhcerri),

We have another user who confirmed that the 2 patches submitted for linux-azure also fix the problem on linux(-generic):

  srcu: Prohibit call_srcu() use under raw spinlocks
  srcu: Lock srcu_data structure in srcu_gp_start()

Could they be submitted for linux as well?

Thank you very much,
Mauricio

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Confirmed
Status in linux source package in Bionic: Confirmed
Status in linux-azure source package in Bionic: Fix Committed
Status in linux source package in Cosmic: Confirmed
Status in linux-azure source package in Cosmic: Fix Committed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed

** Changed in: linux (Ubuntu Bionic)
       Status: Incomplete => Confirmed

** Changed in: linux (Ubuntu Cosmic)
       Status: Incomplete => Confirmed

Status in linux package in Ubuntu: Confirmed
Status in linux-azure package in Ubuntu: Confirmed
Status in linux source package in Bionic: Confirmed
Status in linux-azure source package in Bionic: Fix Committed
Status in linux source package in Cosmic: Confirmed
Status in linux-azure source package in Cosmic: Fix Committed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** No longer affects: linux-hwe (Ubuntu)

** No longer affects: linux-hwe (Ubuntu Bionic)

** No longer affects: linux-hwe (Ubuntu Cosmic)

** Also affects: linux (Ubuntu)
   Importance: Undecided
       Status: New

** No longer affects: linux-azure (Ubuntu Bionic)

** No longer affects: linux-azure (Ubuntu Cosmic)

** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
       Status: New

** Also affects: linux-azure (Ubuntu Cosmic)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: linux-azure (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: linux-azure (Ubuntu Bionic)
       Status: New => Fix Committed

** Changed in: linux-azure (Ubuntu Cosmic)
       Status: New => Fix Committed

** Changed in: linux-azure (Ubuntu Bionic)
     Assignee: (unassigned) => Marcelo Cerri (mhcerri)

** Changed in: linux-azure (Ubuntu Cosmic)
     Assignee: (unassigned) => Marcelo Cerri (mhcerri)

Status in linux package in Ubuntu: Incomplete
Status in linux-azure package in Ubuntu: Confirmed
Status in linux source package in Bionic: Incomplete
Status in linux-azure source package in Bionic: Fix Committed
Status in linux source package in Cosmic: Incomplete
Status in linux-azure source package in Cosmic: Fix Committed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
You can clone the ubuntu-xenial kernel:

  git clone git://kernel.ubuntu.com/ubuntu/ubuntu-xenial.git

And then grep for the commit you're looking for. There are a few different ways to do it; I use:

  git log --oneline | grep "Expose SMT control init function"

Status in linux-azure package in Ubuntu: Confirmed
Status in linux-hwe package in Ubuntu: Confirmed
Status in linux-azure source package in Bionic: In Progress
Status in linux-hwe source package in Bionic: Confirmed
Status in linux-azure source package in Cosmic: In Progress
Status in linux-hwe source package in Cosmic: Confirmed
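The search described in this message can be wrapped in a small helper, sketched here under a few assumptions: the function name `subject_in_log` is hypothetical, the clone directory in the usage comment is whatever `git clone` created, and the subject shown is one of the two patch titles mentioned in this thread.

```shell
#!/bin/sh
# Sketch: read `git log --oneline` output on stdin and succeed (exit 0)
# if any line contains the given commit subject.
# The function name is hypothetical.
subject_in_log() {
    # -F treats the subject as a literal string, so characters such as
    # "(" and ")" in it are not interpreted as a regular expression.
    # -q suppresses output; only the exit status is used.
    grep -F -q -- "$1"
}

# Example usage from inside the cloned tree (directory name assumed):
#   cd ubuntu-xenial
#   git log --oneline | subject_in_log "srcu: Lock srcu_data structure in srcu_gp_start()" \
#       && echo "commit present"
```

A literal match is used because commit subjects often contain parentheses and other regex metacharacters.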
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Thanks David, my intention was to patch the systems and wait for the issue to reproduce (we usually get it back in 3-8 days or so).

Also, thanks for the pointers on checking the release notes for specific releases. Is there a connection between the changes listed in the second link and the code base for those changes? That is, could the LP bug be missing from the changes list only because it was not tagged correctly, while the code fix is already there? Does this make sense?

I know the code fix is already available starting with 4.19.16, but I don't know how the back-porting is handled.

Status in linux-azure package in Ubuntu: Confirmed
Status in linux-hwe package in Ubuntu: Confirmed
Status in linux-azure source package in Bionic: In Progress
Status in linux-hwe source package in Bionic: Confirmed
Status in linux-azure source package in Cosmic: In Progress
Status in linux-hwe source package in Cosmic: Confirmed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hi overlord.

An easy way to check which updates are included in a released kernel is to look at the "-changes" mailing list for your Ubuntu release. In this case that is https://lists.ubuntu.com/archives/xenial-changes/

You can find the new kernel here: https://lists.ubuntu.com/archives/xenial-changes/2019-February/023480.html

As you can see there, kernel 4.15.0-45 does not include the fix for this bug (LP #1802021).

Status in linux-azure package in Ubuntu: Confirmed
Status in linux-hwe package in Ubuntu: Confirmed
Status in linux-azure source package in Bionic: In Progress
Status in linux-hwe source package in Bionic: Confirmed
Status in linux-azure source package in Cosmic: In Progress
Status in linux-hwe source package in Cosmic: Confirmed
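The changelog check described in the reply above can also be scripted. This is a minimal sketch, assuming the changelog text has already been saved locally and that fixed bugs are referenced in the usual Ubuntu changelog form `LP: #NNNNNNN`. The function names and the sample text are made up for illustration; the sample is not taken from the real 4.15.0-45 changelog.

```python
import re

# One "LP: #123" reference, optionally followed by ", #456, ..." in the same group.
LP_GROUP = re.compile(r"LP:\s*#\d+(?:\s*,\s*#\d+)*", re.IGNORECASE)


def bugs_in_changelog(text):
    """Return the set of Launchpad bug numbers referenced in changelog text."""
    bugs = set()
    for group in LP_GROUP.findall(text):
        bugs.update(int(number) for number in re.findall(r"#(\d+)", group))
    return bugs


def changelog_mentions_bug(text, bug_id):
    """True if the changelog text references the given Launchpad bug."""
    return bug_id in bugs_in_changelog(text)


# Hypothetical changelog excerpt, for illustration only.
sample = """\
linux-example (0.0.0-1) xenial; urgency=medium

  * First hypothetical fix (LP: #1000001)
  * Second hypothetical fix (LP: #1000002, #1000003)
"""
print(changelog_mentions_bug(sample, 1802021))  # False
print(sorted(bugs_in_changelog(sample)))        # [1000001, 1000002, 1000003]
```

A `False` result only means the bug number is not referenced in the text that was scanned, so it is worth checking that the whole changelog entry was saved.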
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Mauricio @mfo, is there an easy way to reproduce the issue so that I can test the fix? I was not able to find one. The only way I can tell whether the issue is still there is to apply the patch and wait for the servers to hit the problem (which could take days or weeks). When that happens, in my case, Docker tasks end up in D state and the load average climbs to 100 very quickly. Then a certain kworker also enters D state, possibly followed in the end by init/systemd, and the only recovery action is to restart the box.
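Since there is no quick reproducer, one way to notice the problem while waiting for it to recur is to scan kern.log for the hung-task message shown in the bug description. This is a minimal sketch: the function name `find_hung_tasks` is hypothetical, and the regular expression assumes only the message format seen in the trace quoted in this thread.

```python
import re

# Matches the kernel hung-task report, for example:
# "INFO: task kworker/u16:0:16678 blocked for more than 120 seconds."
HUNG_TASK = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")


def find_hung_tasks(lines):
    """Return (task name, pid, seconds) for every hung-task report in lines."""
    found = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            found.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return found


# Two lines copied from the trace in the bug description.
sample = [
    "2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.",
    "2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace:",
]
print(find_hung_tasks(sample))  # [('kworker/u16:0', 16678, 120)]
```

The task name itself contains a colon here (`kworker/u16:0`), which is why the pattern lets the name match greedily and takes the last colon-separated number before "blocked" as the PID.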
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
@overlord: AFAIK, there is no simple reproducer or test case for this issue. The ideal testing scenario for a bug and fix like this one would be for each user who reported the issue to run a test kernel with the fix in their affected environment and, after some time has passed, report back whether the issue still manifests.
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Thanks Marcelo! However, I still see that the unassigned package is linux-hwe, and I can't add the Xenial tag. I also noticed that I received an update today from 4.15.0-43-generic to 4.15.0-45-generic, but I cannot tell whether this update includes the fix for this bug. Could you give me some directions on this? Where could I check? Thanks!
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Marcelo @mhcerri,

Would you be able to provide a test kernel for bionic/linux-hwe so that @lazamarius1 can provide test results for -generic? I'll be happy to do that as well if you're short on time right now. (I guess the patchset is the same one you posted for linux-azure.)

Thanks,
Mauricio
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hi, @overlord. I changed it to "linux" instead because xenial/linux-hwe is simply a backport of bionic/linux, so we need to apply the fix to bionic/linux first, and it will then be included in xenial/linux-hwe automatically.
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hey guys, I am not sure how to do this, but can you also make a patch for Xenial, 4.15.0-generic, for the linux-hwe package? We have the same problem and we can't upgrade the boxes to a newer release, so this would really help us.

** Package changed: linux (Ubuntu) => linux-hwe (Ubuntu)

** Changed in: linux-hwe (Ubuntu)
       Status: Incomplete => Confirmed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Can't run the logs collection tool since the system is stuck in D state and times out. (Also the tool is not installed on the systems). We just need this fix back-ported on 4.15.0-generic for Xenial on linux-hwe package (we don't use linux-azure)

** Changed in: linux-hwe (Ubuntu Cosmic)
   Status: Incomplete => Confirmed

Status in linux-azure package in Ubuntu: Confirmed
Status in linux-hwe package in Ubuntu: Confirmed
Status in linux-azure source package in Bionic: In Progress
Status in linux-hwe source package in Bionic: Confirmed
Status in linux-azure source package in Cosmic: In Progress
Status in linux-hwe source package in Cosmic: Confirmed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Can't run the logs collection tool since the system is stuck in D state and times out. (Also the tool is not installed on the systems). We just need this fix back-ported on 4.15.0-generic for Xenial on linux-hwe package (we don't use linux-azure)

** Changed in: linux-hwe (Ubuntu Bionic)
   Status: Incomplete => Confirmed

Status in linux-azure package in Ubuntu: Confirmed
Status in linux-hwe package in Ubuntu: Confirmed
Status in linux-azure source package in Bionic: In Progress
Status in linux-hwe source package in Bionic: Confirmed
Status in linux-azure source package in Cosmic: In Progress
Status in linux-hwe source package in Cosmic: Confirmed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
https://lists.ubuntu.com/archives/kernel-team/2019-February/098264.html

Status in linux package in Ubuntu: Incomplete
Status in linux-azure package in Ubuntu: Confirmed
Status in linux source package in Bionic: Incomplete
Status in linux-azure source package in Bionic: In Progress
Status in linux source package in Cosmic: Incomplete
Status in linux-azure source package in Cosmic: In Progress
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

** No longer affects: linux-azure (Ubuntu Bionic)

** No longer affects: linux-azure (Ubuntu Cosmic)

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: linux-azure (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
   Status: New

** Also affects: linux-azure (Ubuntu Cosmic)
   Importance: Undecided
   Status: New

** Changed in: linux-azure (Ubuntu Bionic)
   Status: New => In Progress

** Changed in: linux-azure (Ubuntu Cosmic)
   Status: New => In Progress

** Changed in: linux-azure (Ubuntu Bionic)
   Assignee: (unassigned) => Marcelo Cerri (mhcerri)

** Changed in: linux-azure (Ubuntu Cosmic)
   Assignee: (unassigned) => Marcelo Cerri (mhcerri)

** Changed in: linux-azure (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: linux-azure (Ubuntu Cosmic)
   Importance: Undecided => Medium

Status in linux package in Ubuntu: New
Status in linux-azure package in Ubuntu: Confirmed
Status in linux source package in Bionic: New
Status in linux-azure source package in Bionic: In Progress
Status in linux source package in Cosmic: New
Status in linux-azure source package in Cosmic: In Progress
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** No longer affects: linux-meta-hwe (Ubuntu)

** No longer affects: linux-meta-hwe (Ubuntu Bionic)

** No longer affects: linux-meta-hwe (Ubuntu Cosmic)

Status in linux-azure package in Ubuntu: Confirmed
Status in linux-azure source package in Bionic: Confirmed
Status in linux-azure source package in Cosmic: Confirmed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux-meta-hwe (Ubuntu Cosmic)
   Status: New => Confirmed

Status in linux-azure package in Ubuntu: Confirmed
Status in linux-meta-hwe package in Ubuntu: Confirmed
Status in linux-azure source package in Bionic: Confirmed
Status in linux-meta-hwe source package in Bionic: Confirmed
Status in linux-azure source package in Cosmic: Confirmed
Status in linux-meta-hwe source package in Cosmic: Confirmed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Package changed: linux-hwe (Ubuntu) => linux-meta-hwe (Ubuntu)

https://bugs.launchpad.net/bugs/1802021

Title:
  [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

Status in linux-azure package in Ubuntu:
  Confirmed
Status in linux-meta-hwe package in Ubuntu:
  Confirmed
Status in linux-azure source package in Bionic:
  Confirmed
Status in linux-meta-hwe source package in Bionic:
  Confirmed
Status in linux-azure source package in Cosmic:
  Confirmed
Status in linux-meta-hwe source package in Cosmic:
  Confirmed

Bug description:
  We had a customer seeing traces like the following:

  Stack trace from kern.log:
  2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.
  2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 4.15.0-1023-azure #24~16.04.1-Ubuntu
  2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 0x8000
  2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound fsnotify_mark_destroy_workfn
  2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace:
  2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0
  2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? check_preempt_wakeup+0xfb/0x240
  2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ? sched_clock_local+0x17/0x90
  2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80
  2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: schedule_timeout+0x1db/0x370
  2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? __enqueue_entity+0x5c/0x60
  2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? enqueue_entity+0x112/0x670
  2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: wait_for_completion+0xb4/0x140
  2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70
  2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: __synchronize_srcu.part.13+0x85/0xb0
  2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? trace_raw_output_rcu_utilization+0x50/0x50
  2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: fsnotify_mark_destroy_workfn+0x7c/0xe0
  2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: process_one_work+0x14d/0x410
  2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460
  2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140
  2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? process_one_work+0x410/0x410
  2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? kthread_destroy_worker+0x50/0x50
  2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130
  2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20
  2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40

  Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.

  We are seeing more issues with fsnotify-related callbacks. These are not soft or hard lockups, but they seem to significantly degrade the responsiveness of systemd (and from there everything else).

  The following upstream commit may fix this issue, but it is in Paul's RCU tree and not in linux-next or upstream yet:

  https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev=1a05c0cd2fee234a10362cc8f66057557cbb291f

  srcu: Lock srcu_data structure in srcu_gp_start()

  The srcu_gp_start() function is called with the srcu_struct structure's ->lock held, but not with the srcu_data structure's ->lock. This is problematic because this function accesses and updates the srcu_data structure's ->srcu_cblist, which is protected by that lock. Failing to hold this lock can result in corruption of the SRCU callback lists, which in turn can result in arbitrarily bad results. This commit therefore makes srcu_gp_start() acquire the srcu_data structure's ->lock across the calls to rcu_segcblist_advance() and rcu_segcblist_accelerate(), thus preventing this corruption.

  Please investigate this issue and evaluate the proposed fix.
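The locking rule in the commit message quoted in the bug description can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel patch: the kernel uses spinlocks with interrupts disabled and a segmented callback list, whereas this model uses pthread mutexes and two counters. Every name here (model_sp, model_sdp, model_enqueue, model_gp_start, model_run_demo) is hypothetical. The point it shows is that the grace-period start path, which already holds the outer lock, must also take the inner per-data lock before touching the callback state that enqueuers update under that inner lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for srcu_data: callback state guarded by its own lock. */
struct model_sdp {
    pthread_mutex_t lock;   /* protects pending and ready */
    long pending;           /* callbacks waiting for a grace period */
    long ready;             /* callbacks whose grace period was started */
};

/* Hypothetical stand-in for srcu_struct: grace-period state guarded by the outer lock. */
struct model_sp {
    pthread_mutex_t lock;   /* protects gp_seq */
    unsigned long gp_seq;
    struct model_sdp sda;
};

/* Enqueue path: only the inner lock is held, as for a callback enqueue. */
static void model_enqueue(struct model_sp *sp)
{
    pthread_mutex_lock(&sp->sda.lock);
    sp->sda.pending++;
    pthread_mutex_unlock(&sp->sda.lock);
}

/* Grace-period start: the caller holds sp->lock. The inner lock is taken
 * as well, because the callback state is shared with model_enqueue(). */
static void model_gp_start(struct model_sp *sp)
{
    pthread_mutex_lock(&sp->sda.lock);
    sp->sda.ready += sp->sda.pending;   /* models advance/accelerate */
    sp->sda.pending = 0;
    pthread_mutex_unlock(&sp->sda.lock);
    sp->gp_seq++;
}

struct model_args {
    struct model_sp *sp;
    long n;
};

static void *model_enqueue_thread(void *p)
{
    struct model_args *a = p;
    for (long i = 0; i < a->n; i++)
        model_enqueue(a->sp);
    return NULL;
}

static void *model_gp_thread(void *p)
{
    struct model_args *a = p;
    for (long i = 0; i < a->n; i++) {
        pthread_mutex_lock(&a->sp->lock);
        model_gp_start(a->sp);
        pthread_mutex_unlock(&a->sp->lock);
    }
    return NULL;
}

/* Runs both paths concurrently and returns the number of callbacks still
 * accounted for; with the inner lock held in both paths none are lost. */
long model_run_demo(long n)
{
    struct model_sp sp = { .gp_seq = 0 };
    struct model_args args = { &sp, n };
    pthread_t enq, gp;
    int rc;

    pthread_mutex_init(&sp.lock, NULL);
    pthread_mutex_init(&sp.sda.lock, NULL);

    rc = pthread_create(&enq, NULL, model_enqueue_thread, &args);
    assert(rc == 0);
    rc = pthread_create(&gp, NULL, model_gp_thread, &args);
    assert(rc == 0);
    (void)rc;

    pthread_join(enq, NULL);
    pthread_join(gp, NULL);

    long total = sp.sda.pending + sp.sda.ready;
    pthread_mutex_destroy(&sp.sda.lock);
    pthread_mutex_destroy(&sp.lock);
    return total;
}
```

Calling model_run_demo(n) from a program linked with the pthread library should return n. If the inner lock were dropped from model_gp_start(), its read-modify-write of the counters could race with model_enqueue() and lose updates, which is the userspace analogue of the callback-list corruption the commit message describes.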
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux-meta-hwe (Ubuntu Bionic)
       Status: New => Confirmed

Status in linux-azure package in Ubuntu:
  Confirmed
Status in linux-meta-hwe package in Ubuntu:
  Confirmed
Status in linux-azure source package in Bionic:
  Confirmed
Status in linux-meta-hwe source package in Bionic:
  Confirmed
Status in linux-azure source package in Cosmic:
  Confirmed
Status in linux-meta-hwe source package in Cosmic:
  Confirmed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Changed in: linux-hwe (Ubuntu)
       Status: New => Confirmed

Status in linux-azure package in Ubuntu:
  Confirmed
Status in linux-hwe package in Ubuntu:
  Confirmed
Status in linux-azure source package in Bionic:
  Confirmed
Status in linux-hwe source package in Bionic:
  New
Status in linux-azure source package in Cosmic:
  Confirmed
Status in linux-hwe source package in Cosmic:
  New
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Also affects: linux-hwe (Ubuntu)
   Importance: Undecided
       Status: New

Status in linux-azure package in Ubuntu:
  Confirmed
Status in linux-hwe package in Ubuntu:
  New
Status in linux-azure source package in Bionic:
  Confirmed
Status in linux-hwe source package in Bionic:
  New
Status in linux-azure source package in Cosmic:
  Confirmed
Status in linux-hwe source package in Cosmic:
  New
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Also affects: linux-azure (Ubuntu Cosmic)
   Importance: Undecided
       Status: New

** Also affects: linux-azure (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: linux-azure (Ubuntu Bionic)
     Assignee: (unassigned) => Marcelo Cerri (mhcerri)

** Changed in: linux-azure (Ubuntu Cosmic)
     Assignee: (unassigned) => Marcelo Cerri (mhcerri)

** Changed in: linux-azure (Ubuntu Bionic)
       Status: New => Confirmed

** Changed in: linux-azure (Ubuntu Cosmic)
       Status: New => Confirmed

** Changed in: linux-azure (Ubuntu)
       Status: Triaged => Confirmed

Status in linux-azure package in Ubuntu:
  Confirmed
Status in linux-azure source package in Bionic:
  Confirmed
Status in linux-azure source package in Cosmic:
  Confirmed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Yes, please, per comment #8

Status in linux-azure package in Ubuntu:
  Triaged
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hi, Josh. Should we apply the fixes for the 4.15 and 4.18 linux-azure kernel then?

Status in linux-azure package in Ubuntu:
  Triaged
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
The fix was picked up for upstream stable 4.19.15 and 4.20.2. I would expect the generic kernels to eventually pick up this fix.

Status in linux-azure package in Ubuntu:
  Triaged
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Using Ubuntu 16.04.5 LTS btw.
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Will this fix be available for Linux 4.15.0-generic x86_64, or is it available already? I am currently on Linux 4.15.0-43-generic x86_64. Some servers have this issue and others are fine. I am not sure what triggers the problem, but when it triggers, kworker, dockerd, and systemd go into uninterruptible sleep and I need to reboot the servers to recover. After a while the issue reappears, so I would like to patch the servers as fast as possible. Thanks!
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
No new instances of the problem on the test cluster for many weeks. Let's move forward with this change.
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hi, Josh. Did you have any feedback from the customer regarding the test kernel? How do you want to proceed with that?
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Changed in: linux-azure (Ubuntu) Assignee: Joseph Salisbury (jsalisbury) => Marcelo Cerri (mhcerri)
** Changed in: linux-azure (Ubuntu) Status: In Progress => Triaged
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hi Joshua,

I touched base with Gavin to compare our trees. My test kernel is the current Azure kernel with two commits applied: d633198088bd9 and eb4c2382272ae7. Commit eb4c2382272ae7 is the patch from Dennis Krein in linux-next:

eb4c2382272a ("srcu: Lock srcu_data structure in srcu_gp_start()")

Gavin and I both had the same set of commits.

I can submit an SRU request for this if you don't want to wait for the testing, since it could take a long time. If I submit it this week, it won't land in the Azure kernel until the next SRU cycle in the new year. Just let us know what you think. Here are the dates for the next cycle:

cycle: 14-Jan through 03-Feb
11-Jan Last day for kernel commits for this cycle.
14-Jan - 18-Jan Kernel prep week.
21-Jan - 01-Feb Bug verification & Regression testing.
31-Jan Release 18.04.2 kernels to -updates.
04-Feb Release remaining kernels to -updates.

Looking at these dates, we may want to SRU it this week, due to a company shutdown between 24-Dec and 06-Jan.
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hi Joshua,

I just reviewed the commit Joseph provided on Launchpad. It's the same as the two patches I backported:

eb4c2382272a srcu: Lock srcu_data structure in srcu_gp_start()
d633198088bd srcu: Prohibit call_srcu() use under raw spinlocks

The commit id eb4c2382272a is the latest linux-next commit id. When I backported the above two patches, eb4c2382272a was in Paul's rcu-next tree. So, for the SRU process, Joseph's backports are the formal ones.
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Joe, is your kernel different from the one Gavin Guo built for us here?

https://launchpad.net/~mimi0213kimo/+archive/ubuntu/sf00204509-rcu-backport

We have enabled extra debugging and given that kernel to our internal customer, who is attempting to repro. Since the repro takes a very long time, it is difficult to decide whether the fix is working or not.

Paul McKenney upstream has submitted a pull request for this patch (and others) to go into 4.21. "Burn in" time upstream hasn't really started in earnest yet, but there is no negative discussion about the PR, and I am tempted to get this into the regular based on Paul's comment:

"Lock srcu_data structure in srcu_gp_start(), fixing an extremely rare but also extremely embarrassing concurrency bug, courtesy of Dennis Krein."
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Tags added: bjf

** Tags removed: bjf

** Tags added: bjf-tracking
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
I built a test kernel with commit eb4c2382272ae7 from linux-next. This commit relies on commit d633198088bd9 for the definition of spin_lock_rcu_node and its corresponding unlock. That commit was added to mainline in v4.16-rc1.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1802021

It sounds like this bug is difficult to reproduce, but it would be great if the affected customer was willing to test this kernel.

Note about installing test kernels:
• If the test kernel is prior to 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

Thanks in advance!
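The note about installing test kernels in this message boils down to a version check, which can be sketched as below. This is only an illustration: the function name packages_for_test_kernel and the way the version string is parsed are assumptions made for this example and are not part of any Ubuntu tool. The package family names are the ones listed in the note.

```python
from typing import List

# Hypothetical helper: the name and the parsing approach are illustrative only.
# 4.15 is the Bionic kernel series named in the note above.
BIONIC_SERIES = (4, 15)


def packages_for_test_kernel(kernel_version: str) -> List[str]:
    """Return the .deb package families to install for a given test kernel.

    kernel_version is a string such as "4.15.0-1023-azure"; only the
    leading "major.minor" part is inspected.
    """
    major, minor = (int(part) for part in kernel_version.split(".")[:2])
    if (major, minor) < BIONIC_SERIES:
        # Kernels prior to 4.15 ship as image + image-extra.
        return ["linux-image", "linux-image-extra"]
    # 4.15 and newer split the modules into their own packages.
    return ["linux-modules", "linux-modules-extra", "linux-image-unsigned"]
```

For the kernel named in the bug description (4.15.0-1023-azure), this sketch selects the linux-modules, linux-modules-extra and linux-image-unsigned families.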
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Changed in: linux-azure (Ubuntu)
Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

** Changed in: linux-azure (Ubuntu)
Status: Triaged => In Progress
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
It's a heavy database workload from an online site, so it is difficult to make a simple repro for it.
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hi, Josh. Do you have a specific workload that triggers this issue?
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
** Tags added: kernel-da-key kernel-hyper-v

** Changed in: linux-azure (Ubuntu)
Importance: Undecided => Medium

** Changed in: linux-azure (Ubuntu)
Status: Confirmed => Triaged