** Also affects: linux (Ubuntu)
   Importance: Undecided
       Status: New

** No longer affects: linux-azure (Ubuntu Bionic)

** No longer affects: linux-azure (Ubuntu Cosmic)

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: linux-azure (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
       Status: New

** Also affects: linux-azure (Ubuntu Cosmic)
   Importance: Undecided
       Status: New

** Changed in: linux-azure (Ubuntu Bionic)
       Status: New => In Progress

** Changed in: linux-azure (Ubuntu Cosmic)
       Status: New => In Progress

** Changed in: linux-azure (Ubuntu Bionic)
     Assignee: (unassigned) => Marcelo Cerri (mhcerri)

** Changed in: linux-azure (Ubuntu Cosmic)
     Assignee: (unassigned) => Marcelo Cerri (mhcerri)

** Changed in: linux-azure (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: linux-azure (Ubuntu Cosmic)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1802021

Title:
  [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

Status in linux package in Ubuntu:
  New
Status in linux-azure package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  New
Status in linux-azure source package in Bionic:
  In Progress
Status in linux source package in Cosmic:
  New
Status in linux-azure source package in Cosmic:
  In Progress

Bug description:
  We had a customer seeing traces like the following:

  Stack trace from kern.log:
  2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.
  2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 4.15.0-1023-azure #24~16.04.1-Ubuntu
  2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 0x80000000
  2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound fsnotify_mark_destroy_workfn
  2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace:
  2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0
  2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? check_preempt_wakeup+0xfb/0x240
  2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ? sched_clock_local+0x17/0x90
  2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80
  2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: schedule_timeout+0x1db/0x370
  2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? __enqueue_entity+0x5c/0x60
  2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? enqueue_entity+0x112/0x670
  2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: wait_for_completion+0xb4/0x140
  2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70
  2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: __synchronize_srcu.part.13+0x85/0xb0
  2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? trace_raw_output_rcu_utilization+0x50/0x50
  2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: fsnotify_mark_destroy_workfn+0x7c/0xe0
  2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: process_one_work+0x14d/0x410
  2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460
  2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140
  2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? process_one_work+0x410/0x410
  2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? kthread_destroy_worker+0x50/0x50
  2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130
  2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20
  2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40

  Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.

  We are seeing more issues with fsnotify-related callbacks. These are
  not soft/hard lockups, but they seem to significantly degrade the
  responsiveness of systemd (and from there everything else).

  The following commit may fix this issue, but it is in Paul's RCU tree
  and not in linux-next or upstream yet:

  https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=1a05c0cd2fee234a10362cc8f66057557cbb291f

  srcu: Lock srcu_data structure in srcu_gp_start()

  The srcu_gp_start() function is called with the srcu_struct structure's
  ->lock held, but not with the srcu_data structure's ->lock. This is
  problematic because this function accesses and updates the srcu_data
  structure's ->srcu_cblist, which is protected by that lock. Failing to
  hold this lock can result in corruption of the SRCU callback lists,
  which in turn can result in arbitrarily bad results.

  This commit therefore makes srcu_gp_start() acquire the srcu_data
  structure's ->lock across the calls to rcu_segcblist_advance() and
  rcu_segcblist_accelerate(), thus preventing this corruption.

  Please investigate this issue and evaluate the proposed fix.
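
  For readers less familiar with the locking pattern the quoted commit
  message describes, here is a minimal userspace sketch in C using
  pthread mutexes. It is an analogy only and not the kernel patch: every
  toy_* name, the structure layouts, and the counter-based callback list
  are hypothetical stand-ins for srcu_struct, srcu_data,
  rcu_segcblist_advance() and rcu_segcblist_accelerate(). It shows only
  that the grace-period start path, entered with the outer lock held,
  should also take the inner lock that protects the callback list.

```c
#include <pthread.h>

/* Hypothetical toy callback list: counters instead of real segments. */
struct toy_cblist {
    unsigned long wait_seq; /* grace period the waiting callbacks need */
    int n_next;             /* newly queued, no grace period assigned yet */
    int n_wait;             /* waiting for grace period wait_seq */
    int n_done;             /* grace period elapsed, ready to invoke */
};

/* Stand-in for the per-CPU structure; its lock protects cblist. */
struct toy_srcu_data {
    pthread_mutex_t lock;
    struct toy_cblist cblist;
};

/* Stand-in for the top-level structure; its lock protects gp_seq. */
struct toy_srcu_struct {
    pthread_mutex_t lock;
    unsigned long gp_seq;
    struct toy_srcu_data sda;
};

static void toy_srcu_init(struct toy_srcu_struct *sp)
{
    pthread_mutex_init(&sp->lock, NULL);
    pthread_mutex_init(&sp->sda.lock, NULL);
    sp->gp_seq = 0;
    sp->sda.cblist = (struct toy_cblist){0};
}

/* Queue one callback. Takes only the inner lock, so it can run
 * concurrently with a thread that holds just the outer lock. */
static void toy_enqueue(struct toy_srcu_data *sdp)
{
    pthread_mutex_lock(&sdp->lock);
    sdp->cblist.n_next++;
    pthread_mutex_unlock(&sdp->lock);
}

/* Move callbacks whose grace period has completed to the done count. */
static void toy_advance(struct toy_cblist *cl, unsigned long seq)
{
    if (cl->n_wait > 0 && cl->wait_seq <= seq) {
        cl->n_done += cl->n_wait;
        cl->n_wait = 0;
    }
}

/* Assign a grace period to newly queued callbacks (simplified). */
static void toy_accelerate(struct toy_cblist *cl, unsigned long seq)
{
    if (cl->n_next > 0) {
        cl->wait_seq = seq;
        cl->n_wait += cl->n_next;
        cl->n_next = 0;
    }
}

/* Caller must hold sp->lock. The inner lock is taken here because
 * the outer lock does not protect the callback list. */
static void toy_srcu_gp_start(struct toy_srcu_struct *sp)
{
    struct toy_srcu_data *sdp = &sp->sda;

    pthread_mutex_lock(&sdp->lock);
    toy_advance(&sdp->cblist, sp->gp_seq);
    toy_accelerate(&sdp->cblist, sp->gp_seq + 1);
    pthread_mutex_unlock(&sdp->lock);

    sp->gp_seq++; /* begin the next toy grace period */
}
```

  In this sketch, dropping the inner lock from toy_srcu_gp_start() would
  let it race with toy_enqueue() on the same counters, which is the toy
  equivalent of the list corruption the commit message warns about. A
  caller would take sp->lock, call toy_srcu_gp_start(), and release
  sp->lock.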