Hi,

Adam Zabrocki, CC'ed here, informed me of the following:
There is an undesirable situation on SMP Linux machines when sending an IPI via the smp_call_function_single() API:

/*
 * smp_call_function_single - Run a function on a specific CPU
 * @func: The function to run. This must be fast and non-blocking.
 * @info: An arbitrary pointer to pass to the function.
 * @wait: If true, wait until function has completed on other CPUs.
 *
 * Returns 0 on success, else a negative status code.
 */
int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
			     int wait)

It runs "func" on a specific CPU and, if the "wait" flag is true, waits until "func" has completed its work before returning to the caller. This is useful for controlling when it is safe to proceed and, for example, consume data generated by the other CPU. Unfortunately, Linux can kill the task waiting in smp_call_function_single(): the NMI watchdog detects a "soft lockup" on the CPU that called smp_call_function_single() with the "wait" flag set.

In practice this is not always the desirable behavior. It is possible that the target CPU of the IPI was non-responsive and would never (or not in time) pick up the IPI sent to it. From the perspective of the task sending the IPI, the target CPU never finished the work, so smp_call_function_single() continues its busy-wait loop. In the meantime, the NMI watchdog can kill the busy-waiting task instead of somehow "unlocking" the CPU that did not handle the IPI in time. This is often visible in logs such as:

Jul 18 00:43:06 ubuntu kernel: [1578895.078397] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 25s! [thermald:714]
...
Jul 18 00:43:06 ubuntu kernel: [1578895.078470] CPU: 1 PID: 714 Comm: thermald Tainted: G W OEL 4.2.0-30-generic #36-Ubuntu
Jul 18 00:43:06 ubuntu kernel: [1578895.078472] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
Jul 18 00:43:06 ubuntu kernel: [1578895.078475] task: ffff88003776e040 ti: ffff880038390000 task.ti: ffff880038390000
Jul 18 00:43:06 ubuntu kernel: [1578895.078477] RIP: 0010:[<ffffffff810fc9d8>] [<ffffffff810fc9d8>] smp_call_function_single+0xd8/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078486] RSP: 0018:ffff880038393c98 EFLAGS: 00000202
Jul 18 00:43:06 ubuntu kernel: [1578895.078487] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Jul 18 00:43:06 ubuntu kernel: [1578895.078489] RDX: 0000000000000001 RSI: 0000000000000100 RDI: 0000000000000292
Jul 18 00:43:06 ubuntu kernel: [1578895.078490] RBP: ffff880038393ce8 R08: 0000000000000000 R09: 0000000000000001
Jul 18 00:43:06 ubuntu kernel: [1578895.078492] R10: ffff8800175de000 R11: ffff880035123180 R12: ffff880034c1c2d0
Jul 18 00:43:06 ubuntu kernel: [1578895.078493] R13: 0000000000000292 R14: 0000000000000292 R15: ffff880038393c38
Jul 18 00:43:06 ubuntu kernel: [1578895.078495] FS: 00007fa118895700(0000) GS:ffff88003d640000(0000) knlGS:0000000000000000
Jul 18 00:43:06 ubuntu kernel: [1578895.078497] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 18 00:43:06 ubuntu kernel: [1578895.078499] CR2: 00007fabda352000 CR3: 0000000034c22000 CR4: 00000000001406e0
Jul 18 00:43:06 ubuntu kernel: [1578895.078508] Stack:
Jul 18 00:43:06 ubuntu kernel: [1578895.078510] 0000000000000000 ffffffff8121e2c4 ffff880038393cd8 0000000000000000
Jul 18 00:43:06 ubuntu kernel: [1578895.078513] ffffffff813f86b0 ffff880038393d00 0000000000000003 000000005a1685d7
Jul 18 00:43:06 ubuntu kernel: [1578895.078515] ffff880038393d4c ffff880038393d48 ffff880038393d38 ffffffff813f884d
Jul 18 00:43:06 ubuntu kernel: [1578895.078518] Call Trace:
Jul 18 00:43:06 ubuntu kernel: [1578895.078526] [<ffffffff8121e2c4>] ? mntput+0x24/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078532] [<ffffffff813f86b0>] ? ucs2_strncmp+0x50/0x50
Jul 18 00:43:06 ubuntu kernel: [1578895.078642] [<ffffffff813f884d>] rdmsr_on_cpu+0x5d/0x90
Jul 18 00:43:06 ubuntu kernel: [1578895.078650] [<ffffffffc030b2e7>] show_temp+0xa7/0xe0 [coretemp]
Jul 18 00:43:06 ubuntu kernel: [1578895.078656] [<ffffffff8151d1d0>] dev_attr_show+0x20/0x50
Jul 18 00:43:06 ubuntu kernel: [1578895.078662] [<ffffffff817f0c16>] ? mutex_lock+0x16/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078667] [<ffffffff8127a3dc>] sysfs_kf_seq_show+0xbc/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078669] [<ffffffff81278c13>] kernfs_seq_show+0x23/0x30
Jul 18 00:43:06 ubuntu kernel: [1578895.078672] [<ffffffff81221c9c>] seq_read+0xec/0x390
Jul 18 00:43:06 ubuntu kernel: [1578895.078675] [<ffffffff8127953a>] kernfs_fop_read+0x10a/0x160
Jul 18 00:43:06 ubuntu kernel: [1578895.078679] [<ffffffff81326a00>] ? security_file_permission+0xa0/0xc0
Jul 18 00:43:06 ubuntu kernel: [1578895.078682] [<ffffffff811fd5b8>] __vfs_read+0x18/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078685] [<ffffffff811fdb87>] vfs_read+0x87/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078687] [<ffffffff811fe905>] SyS_read+0x55/0xc0
Jul 18 00:43:07 ubuntu kernel: [1578895.078690] [<ffffffff817f2d72>] entry_SYSCALL_64_fastpath+0x16/0x75
Jul 18 00:43:07 ubuntu kernel: [1578895.078692] Code: 25 28 00 00 00 75 70 48 83 c4 40 5b 41 5c 5d c3 48 8d 75 c8 48 89 d1 89 df 4c 89 e2 e8 12 fe ff ff 8b 55 e0 83 e2 01 74 cf f3 90 <8b> 55 e0 83 e2 01 75 f6 eb c3 8b 05 f0 6c df 00 85 c0 75 85 80

In this specific case, the "thermald" process, which was trying to send an IPI via smp_call_function_single(), was not the most problematic one, yet it is the one that got killed. This problem can manifest itself in several different ways. Imagine the following situation:

CPU0:                           CPU1:
-- work --                      -- work --
-- work --                      -- STUCK / bug --
-- work --                      ...
-- send IPI --                  ...
-- busy loop --                 ...
...                             ...
-- killed by NMI watchdog --    ...
-- new work --                  ...
-- new work --                  -- killed by watchdog --

In this situation you will see two task dumps in the logs: first for the smp_call_function_single() busy-loop task, and then for the IPI target task, with different "stuck" times for the CPU/core, e.g.:

Jul 12 14:32:32 ubuntu kernel: [1110493.211223] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/u128:1:23379]
...
Jul 12 14:32:32 ubuntu kernel: [1110502.681253] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 172s! [swapper/0:0]

(As an aside, it is unclear to us why both were syslogged with exactly the same timestamp whereas their kernel timestamps differ by 9 seconds. We didn't notice this detail until much later.)

It is also possible to "mask" the problematic task/CPU in the following scenario:

CPU0:                           CPU1:
-- work --                      -- work --
-- work --                      -- STUCK / bug --
-- work --                      ...
-- send IPI --                  ...
-- busy loop --                 ...
...                             ...
-- killed by NMI watchdog --    ...
-- new work --                  ...
-- new work --                  -- CPU unstuck --
-- new work --                  -- continue work --

In this case, the problematic task on CPU1 "recovered" on its own and continued normal work before the watchdog could kill it, while the NMI watchdog did kill the task that had sent the IPI and was waiting for it to finish. The logs will then contain only a dump of the busy-waiting task, and nothing about the original culprit.
In the source code:

int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
			     int wait)
{
	struct call_single_data *csd;
	struct call_single_data csd_stack = {
		.flags = CSD_FLAG_LOCK | CSD_FLAG_SYNCHRONOUS,
	};
	int this_cpu;
	int err;

	/*
	 * prevent preemption and reschedule on another processor,
	 * as well as CPU removal
	 */
	this_cpu = get_cpu();

	/*
	 * Can deadlock when called with interrupts disabled.
	 * We allow cpu's that are not yet online though, as no one else can
	 * send smp call function interrupt to this cpu and as such deadlocks
	 * can't happen.
	 */
	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
		     && !oops_in_progress);

	csd = &csd_stack;
	if (!wait) {
		csd = this_cpu_ptr(&csd_data);
		csd_lock(csd);
	}

	err = generic_exec_single(cpu, csd, func, info);

	if (wait)
		csd_lock_wait(csd);

	put_cpu();

	return err;
}
EXPORT_SYMBOL(smp_call_function_single);

csd_lock_wait() is defined as:

static __always_inline void csd_lock_wait(struct call_single_data *csd)
{
	smp_cond_acquire(!(csd->flags & CSD_FLAG_LOCK));
}

Next:

#define smp_cond_acquire(cond)	do {		\
	while (!(cond))				\
		cpu_relax();			\
	smp_rmb(); /* ctrl + rmb := acquire */	\
} while (0)

so it keeps calling cpu_relax() until the condition is met. On the x86 architecture, cpu_relax() boils down to the following assembly stub:

	asm volatile("rep; nop" ::: "memory");

Note also that smp_call_function_single() calls get_cpu() internally, which disables preemption on the calling CPU, so the busy-waiting task cannot be scheduled away.
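Putting the pieces together: with "wait" set, the tail of smp_call_function_single() effectively reduces to the loop below (my annotated expansion of the macros quoted above, not code that appears verbatim in the kernel):

	/* csd_lock_wait(csd), after expanding smp_cond_acquire(): */
	while (csd->flags & CSD_FLAG_LOCK)	/* cleared by the target CPU */
		cpu_relax();			/* x86: "rep; nop", i.e. PAUSE */
	smp_rmb();				/* ctrl + rmb := acquire */

Preemption is disabled and the loop never schedules, so if the target CPU never clears CSD_FLAG_LOCK, the sending CPU spins here until the soft lockup watchdog fires (around 20 seconds with default settings, consistent with the 22s and 25s values in the dumps quoted in this message).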
During testing, 20 Linux VMs with different distributions / kernel versions were run under the VMware hypervisor. The most common "troublemaker" tasks were:

- swapper:

Jul 12 14:32:32 ubuntu kernel: [1110502.682439] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W OEL 4.2.0-30-generic #36-Ubuntu
Jul 12 14:32:32 ubuntu kernel: [1110502.682440] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
Jul 12 14:32:32 ubuntu kernel: [1110502.682443] task: ffffffff81c13500 ti: ffffffff81c00000 task.ti: ffffffff81c00000
Jul 12 14:32:32 ubuntu kernel: [1110502.682445] RIP: 0010:[<ffffffff817f2785>] [<ffffffff817f2785>] _raw_spin_unlock_irqrestore+0x15/0x20
Jul 12 14:32:32 ubuntu kernel: [1110502.682455] RSP: 0018:ffff88003d603da8 EFLAGS: 00000296
Jul 12 14:32:32 ubuntu kernel: [1110502.682456] RAX: 0000000000000001 RBX: ffff880035cdc2e0 RCX: 0000000000000002
Jul 12 14:32:32 ubuntu kernel: [1110502.682458] RDX: 0000000000002090 RSI: 0000000000000296 RDI: 0000000000000296
Jul 12 14:32:32 ubuntu kernel: [1110502.682459] RBP: ffff88003d603da8 R08: 0000000000000296 R09: 000000000000080a
Jul 12 14:32:32 ubuntu kernel: [1110502.682461] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88003d603d18
Jul 12 14:32:32 ubuntu kernel: [1110502.682462] R13: ffffffff817f3bdb R14: ffff88003d603da8 R15: ffff880035cdc000
Jul 12 14:32:32 ubuntu kernel: [1110502.682464] FS: 0000000000000000(0000) GS:ffff88003d600000(0000) knlGS:0000000000000000
Jul 12 14:32:32 ubuntu kernel: [1110502.682466] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 12 14:32:32 ubuntu kernel: [1110502.682468] CR2: 00007f847a0db000 CR3: 0000000035598000 CR4: 00000000001406f0
Jul 12 14:32:32 ubuntu kernel: [1110502.682609] Stack:
Jul 12 14:32:32 ubuntu kernel: [1110502.682611] ffff88003d603df8 ffffffff81619ddd ffff880035a43200 0000000000000296
Jul 12 14:32:32 ubuntu kernel: [1110502.682615] ffffffff81d2a7c0 ffff880035cdc000 ffff880035cdc0d0 0000000000000100
Jul 12 14:32:32 ubuntu kernel: [1110502.682617] ffffffff815dd7b0 ffff880035cdc000 ffff88003d603e38 ffffffff815dd6ab
Jul 12 14:32:32 ubuntu kernel: [1110502.682619] Call Trace:
Jul 12 14:32:32 ubuntu kernel: [1110502.682621] <IRQ>
Jul 12 14:32:32 ubuntu kernel: [1110502.682629] [<ffffffff81619ddd>] uhci_hub_status_data+0x6d/0x270
Jul 12 14:32:32 ubuntu kernel: [1110502.682636] [<ffffffff815dd7b0>] ? usb_hcd_poll_rh_status+0x160/0x160
Jul 12 14:32:32 ubuntu kernel: [1110502.682638] [<ffffffff815dd6ab>] usb_hcd_poll_rh_status+0x5b/0x160
Jul 12 14:32:32 ubuntu kernel: [1110502.682641] [<ffffffff815dd7b0>] ? usb_hcd_poll_rh_status+0x160/0x160
Jul 12 14:32:32 ubuntu kernel: [1110502.682644] [<ffffffff815dd7be>] rh_timer_func+0xe/0x10
Jul 12 14:32:32 ubuntu kernel: [1110502.682649] [<ffffffff810e59d9>] call_timer_fn+0x39/0xf0
Jul 12 14:32:32 ubuntu kernel: [1110502.682652] [<ffffffff815dd7b0>] ? usb_hcd_poll_rh_status+0x160/0x160
Jul 12 14:32:32 ubuntu kernel: [1110502.682655] [<ffffffff810e6461>] run_timer_softirq+0x221/0x2d0
Jul 12 14:32:32 ubuntu kernel: [1110502.682659] [<ffffffff8107fe46>] __do_softirq+0xf6/0x250
Jul 12 14:32:32 ubuntu kernel: [1110502.682662] [<ffffffff81080113>] irq_exit+0xa3/0xb0
Jul 12 14:32:32 ubuntu kernel: [1110502.682667] [<ffffffff817f5a46>] smp_apic_timer_interrupt+0x46/0x60
Jul 12 14:32:32 ubuntu kernel: [1110502.682670] [<ffffffff817f3bdb>] apic_timer_interrupt+0x6b/0x70
Jul 12 14:32:32 ubuntu kernel: [1110502.682671] <EOI>
Jul 12 14:32:32 ubuntu kernel: [1110502.682677] [<ffffffff81060526>] ? native_safe_halt+0x6/0x10
Jul 12 14:32:32 ubuntu kernel: [1110502.682682] [<ffffffff8101f24e>] default_idle+0x1e/0xa0
Jul 12 14:32:32 ubuntu kernel: [1110502.682685] [<ffffffff8101f9ff>] arch_cpu_idle+0xf/0x20
Jul 12 14:32:32 ubuntu kernel: [1110502.682689] [<ffffffff810bdb1a>] default_idle_call+0x2a/0x40
Jul 12 14:32:32 ubuntu kernel: [1110502.682691] [<ffffffff810bde5e>] cpu_startup_entry+0x2ce/0x330
Jul 12 14:32:32 ubuntu kernel: [1110502.682696] [<ffffffff817e047c>] rest_init+0x7c/0x80
Jul 12 14:32:32 ubuntu kernel: [1110502.682702] [<ffffffff81d50025>] start_kernel+0x48b/0x4ac
Jul 12 14:32:32 ubuntu kernel: [1110502.682705] [<ffffffff81d4f120>] ? early_idt_handler_array+0x120/0x120
Jul 12 14:32:32 ubuntu kernel: [1110502.682708] [<ffffffff81d4f339>] x86_64_start_reservations+0x2a/0x2c
Jul 12 14:32:32 ubuntu kernel: [1110502.682711] [<ffffffff81d4f485>] x86_64_start_kernel+0x14a/0x16d

- e1000_watchdog:

Jul 15 02:12:19 ubuntu kernel: [1325198.227183] INFO: rcu_sched self-detected stall on CPU { 0} (t=79984 jiffies g=22282232 c=22282231 q=0)
Jul 15 02:12:19 ubuntu kernel: [1325198.228218] rcu_sched kthread starved for 79984 jiffies! g22282232 c22282231 f0x0
Jul 15 02:12:19 ubuntu kernel: [1325198.228221] Task dump for CPU 0:
Jul 15 02:12:19 ubuntu kernel: [1325198.228224] kworker/0:2 R running task 0 57658 2 0x00000008
Jul 15 02:12:19 ubuntu kernel: [1325198.228359] Workqueue: events e1000_watchdog [e1000]
Jul 15 02:12:19 ubuntu kernel: [1325198.228363] ffff880032508dc0 000000002c551eb3 ffff88003d603d78 ffffffff810a8799
Jul 15 02:12:19 ubuntu kernel: [1325198.228366] 0000000000000000 ffffffff81c51b80 ffff88003d603d98 ffffffff810aafbb
Jul 15 02:12:19 ubuntu kernel: [1325198.228369] 0000000000000083 0000000000000001 ffff88003d603dc8 ffffffff810ddf38
Jul 15 02:12:19 ubuntu kernel: [1325198.228372] Call Trace:
Jul 15 02:12:19 ubuntu kernel: [1325198.228374] <IRQ> [<ffffffff810a8799>] sched_show_task+0xa9/0x110
Jul 15 02:12:19 ubuntu kernel: [1325198.228390] [<ffffffff810aafbb>] dump_cpu_task+0x3b/0x50
Jul 15 02:12:19 ubuntu kernel: [1325198.228395] [<ffffffff810ddf38>] rcu_dump_cpu_stacks+0x88/0xd0
Jul 15 02:12:19 ubuntu kernel: [1325198.228399] [<ffffffff810e19d5>] rcu_check_callbacks+0x485/0x760
Jul 15 02:12:19 ubuntu kernel: [1325198.228405] [<ffffffff811375ac>] ? acct_account_cputime+0x1c/0x20
Jul 15 02:12:19 ubuntu kernel: [1325198.228407] [<ffffffff810aba71>] ? account_system_time+0x81/0x120
Jul 15 02:12:19 ubuntu kernel: [1325198.228413] [<ffffffff810f7450>] ? tick_sched_handle.isra.14+0x60/0x60
Jul 15 02:12:19 ubuntu kernel: [1325198.228416] [<ffffffff810e7de9>] update_process_times+0x39/0x60
Jul 15 02:12:19 ubuntu kernel: [1325198.228419] [<ffffffff810f7415>] tick_sched_handle.isra.14+0x25/0x60
Jul 15 02:12:19 ubuntu kernel: [1325198.228423] [<ffffffff810f7494>] tick_sched_timer+0x44/0x80
Jul 15 02:12:19 ubuntu kernel: [1325198.228428] [<ffffffff810e8703>] __hrtimer_run_queues+0xf3/0x220
Jul 15 02:12:19 ubuntu kernel: [1325198.228431] [<ffffffff810e8e38>] hrtimer_interrupt+0xa8/0x1a0
Jul 15 02:12:19 ubuntu kernel: [1325198.228435] [<ffffffff8104effc>] local_apic_timer_interrupt+0x3c/0x70
Jul 15 02:12:19 ubuntu kernel: [1325198.228440] [<ffffffff817f5a41>] smp_apic_timer_interrupt+0x41/0x60
Jul 15 02:12:19 ubuntu kernel: [1325198.228443] [<ffffffff817f3bdb>] apic_timer_interrupt+0x6b/0x70
Jul 15 02:12:19 ubuntu kernel: [1325198.228444] <EOI> [<ffffffff817f2785>] ? _raw_spin_unlock_irqrestore+0x15/0x20
Jul 15 02:12:19 ubuntu kernel: [1325198.228454] [<ffffffffc007f742>] e1000_update_stats+0x702/0x740 [e1000]
Jul 15 02:12:19 ubuntu kernel: [1325198.228458] [<ffffffffc007f7ea>] e1000_watchdog+0x6a/0x530 [e1000]
Jul 15 02:12:19 ubuntu kernel: [1325198.228465] [<ffffffff810947aa>] process_one_work+0x1aa/0x440
Jul 15 02:12:19 ubuntu kernel: [1325198.228468] [<ffffffff81094a8b>] worker_thread+0x4b/0x4c0
Jul 15 02:12:19 ubuntu kernel: [1325198.228471] [<ffffffff81094a40>] ? process_one_work+0x440/0x440
Jul 15 02:12:19 ubuntu kernel: [1325198.228475] [<ffffffff81094a40>] ? process_one_work+0x440/0x440
Jul 15 02:12:19 ubuntu kernel: [1325198.228478] [<ffffffff8109ae28>] kthread+0xd8/0xf0
Jul 15 02:12:19 ubuntu kernel: [1325198.228481] [<ffffffff8109ad50>] ? kthread_create_on_node+0x1f0/0x1f0
Jul 15 02:12:19 ubuntu kernel: [1325198.228483] [<ffffffff817f319f>] ret_from_fork+0x3f/0x70
Jul 15 02:12:19 ubuntu kernel: [1325198.228486] [<ffffffff8109ad50>] ? kthread_create_on_node+0x1f0/0x1f0

- disk_events_workfn:

Jul 18 00:41:52 ubuntu kernel: [1578849.085673] CPU: 1 PID: 31576 Comm: kworker/u128:0 Tainted: G W OEL 4.2.0-30-generic #36-Ubuntu
Jul 18 00:41:52 ubuntu kernel: [1578849.085676] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
Jul 18 00:41:52 ubuntu kernel: [1578849.085686] Workqueue: events_freezable_power_ disk_events_workfn
Jul 18 00:41:52 ubuntu kernel: [1578849.085688] task: ffff880036de5280 ti: ffff880006ff0000 task.ti: ffff880006ff0000
Jul 18 00:41:52 ubuntu kernel: [1578849.085690] RIP: 0010:[<ffffffff817f2785>] [<ffffffff817f2785>] _raw_spin_unlock_irqrestore+0x15/0x20
Jul 18 00:41:52 ubuntu kernel: [1578849.085699] RSP: 0018:ffff880006ff3a08 EFLAGS: 00000293
Jul 18 00:41:52 ubuntu kernel: [1578849.085700] RAX: 0000000000000000 RBX: ffffffff81c1abc0 RCX: 000000000000001c
Jul 18 00:41:52 ubuntu kernel: [1578849.085702] RDX: ffff880034781e70 RSI: 0000000000000293 RDI: 0000000000000293
Jul 18 00:41:52 ubuntu kernel: [1578849.085703] RBP: ffff880006ff3a08 R08: 000000100000014a R09: 0000000800000010
Jul 18 00:41:52 ubuntu kernel: [1578849.085705] R10: ffff880034781af8 R11: 0000000000000000 R12: ffffffff8159ce7c
Jul 18 00:41:52 ubuntu kernel: [1578849.085706] R13: ffff880006ff39c8 R14: ffff880034781ab8 R15: 0000000000000002
Jul 18 00:41:52 ubuntu kernel: [1578849.085709] FS: 0000000000000000(0000) GS:ffff88003d640000(0000) knlGS:0000000000000000
Jul 18 00:41:52 ubuntu kernel: [1578849.085710] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 18 00:41:52 ubuntu kernel: [1578849.085712] CR2: 00007fb3a269c000 CR3: 000000003cb8d000 CR4: 00000000001406e0
Jul 18 00:41:52 ubuntu kernel: [1578849.085727] Stack:
Jul 18 00:41:52 ubuntu kernel: [1578849.085729] ffff880006ff3a38 ffffffff815a5901 ffff88003313cf00 ffff88003492a260
Jul 18 00:41:52 ubuntu kernel: [1578849.085732] ffff880035680800 ffff880036d4d050 ffff880006ff3a78 ffffffff8158010a
Jul 18 00:41:52 ubuntu kernel: [1578849.085734] ffff880006ff3a78 ffff880035680800 ffff88003492a260 ffff880035680800
Jul 18 00:41:52 ubuntu kernel: [1578849.085737] Call Trace:
Jul 18 00:41:52 ubuntu kernel: [1578849.085785] [<ffffffff815a5901>] ata_scsi_queuecmd+0xb1/0x250
Jul 18 00:41:52 ubuntu kernel: [1578849.085790] [<ffffffff8158010a>] scsi_dispatch_cmd+0x10a/0x200
Jul 18 00:41:52 ubuntu kernel: [1578849.085794] [<ffffffff815829f2>] scsi_request_fn+0x472/0x610
Jul 18 00:41:52 ubuntu kernel: [1578849.085798] [<ffffffff813951e7>] __blk_run_queue+0x37/0x50
Jul 18 00:41:52 ubuntu kernel: [1578849.085802] [<ffffffff8139eb71>] blk_execute_rq_nowait+0xb1/0x160
Jul 18 00:41:52 ubuntu kernel: [1578849.085808] [<ffffffff811d3a41>] ? alloc_pages_current+0x91/0x100
Jul 18 00:41:52 ubuntu kernel: [1578849.085811] [<ffffffff8139ecab>] blk_execute_rq+0x8b/0x140
Jul 18 00:41:52 ubuntu kernel: [1578849.085814] [<ffffffff8139ab33>] ? blk_rq_bio_prep+0x63/0x80
Jul 18 00:41:52 ubuntu kernel: [1578849.085818] [<ffffffff8139ea36>] ? blk_rq_map_kern+0xe6/0x140
Jul 18 00:41:52 ubuntu kernel: [1578849.085820] [<ffffffff8157f637>] scsi_execute+0x137/0x1d0
Jul 18 00:41:52 ubuntu kernel: [1578849.085823] [<ffffffff81581269>] ? scsi_execute_req_flags+0x49/0xf0
Jul 18 00:41:52 ubuntu kernel: [1578849.085826] [<ffffffff815812ae>] scsi_execute_req_flags+0x8e/0xf0
Jul 18 00:41:52 ubuntu kernel: [1578849.085830] [<ffffffff81591db7>] sr_check_events+0xb7/0x2d0
Jul 18 00:41:52 ubuntu kernel: [1578849.085835] [<ffffffff815cdd5c>] cdrom_check_events+0x1c/0x40
Jul 18 00:41:52 ubuntu kernel: [1578849.085838] [<ffffffff815921ea>] sr_block_check_events+0x2a/0x30
Jul 18 00:41:52 ubuntu kernel: [1578849.085841] [<ffffffff813a9410>] disk_check_events+0x60/0x150
Jul 18 00:41:52 ubuntu kernel: [1578849.085844] [<ffffffff813a9516>] disk_events_workfn+0x16/0x20
Jul 18 00:41:52 ubuntu kernel: [1578849.085850] [<ffffffff810947aa>] process_one_work+0x1aa/0x440
Jul 18 00:41:52 ubuntu kernel: [1578849.085853] [<ffffffff81094a8b>] worker_thread+0x4b/0x4c0
Jul 18 00:41:52 ubuntu kernel: [1578849.085856] [<ffffffff81094a40>] ? process_one_work+0x440/0x440
Jul 18 00:41:52 ubuntu kernel: [1578849.085859] [<ffffffff81094a40>] ? process_one_work+0x440/0x440
Jul 18 00:41:52 ubuntu kernel: [1578849.085863] [<ffffffff8109ae28>] kthread+0xd8/0xf0
Jul 18 00:41:52 ubuntu kernel: [1578849.085866] [<ffffffff8109ad50>] ? kthread_create_on_node+0x1f0/0x1f0
Jul 18 00:41:52 ubuntu kernel: [1578849.085869] [<ffffffff817f319f>] ret_from_fork+0x3f/0x70
Jul 18 00:41:52 ubuntu kernel: [1578849.085872] [<ffffffff8109ad50>] ? kthread_create_on_node+0x1f0/0x1f0
Jul 18 00:41:52 ubuntu kernel: [1578849.085874] Code: 00 00 eb c7 31 c0 eb cb e8 49 90 88 ff 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 40 00 48 89 f7 57 9d <0f> 1f 44 00 00 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 c6

As a result, it was common to see in the logs that another task was killed during the busy loop, e.g.:

Jul 18 00:43:06 ubuntu kernel: [1578895.078397] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 25s! [thermald:714]

(the same thermald dump quoted in full at the beginning of this message)
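The rdmsr_on_cpu() frame in the thermald trace shows how easily an ordinary operation lands in this busy loop: reading a core temperature via coretemp means reading an MSR that lives on another CPU, which is done as a synchronous ("wait") cross-call. Schematically, the pattern looks like this (a simplified sketch, not the verbatim kernel source; struct msr_req, remote_rdmsr() and rdmsr_on_cpu_sketch() are invented names):

#include <linux/smp.h>
#include <linux/types.h>
#include <asm/msr.h>

struct msr_req {
	u32 msr_no;	/* which MSR to read */
	u64 val;	/* result, filled in on the target CPU */
};

/* Runs on the target CPU, in interrupt context. */
static void remote_rdmsr(void *info)
{
	struct msr_req *r = info;

	rdmsrl(r->msr_no, r->val);
}

static int rdmsr_on_cpu_sketch(unsigned int cpu, u32 msr_no, u64 *val)
{
	struct msr_req r = { .msr_no = msr_no };
	int err;

	/* wait == 1: busy-waits in csd_lock_wait() for the target CPU */
	err = smp_call_function_single(cpu, remote_rdmsr, &r, 1);
	*val = r.val;
	return err;
}

If the target CPU is stuck, the caller of such a wrapper spins with preemption disabled until the watchdog intervenes, exactly as in the thermald dump.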
Any other APIs wrapping smp_call_function_* are also affected by the same problem. However, functions like on_each_cpu() have additional logic, making them less visible in the logs than functions using the smp_call_function_single() API directly.

Another aspect of the described situation is that, for the stability of the system as a whole, it might in fact be safer to kill the busy-waiting task rather than the more problematic task that failed to handle the IPI. Unfortunately, the resulting logs are very confusing: they appear to place the problem in the correctly executing context, which is killed and dumped, rather than in the actually problematic context, which might not be dumped at all. This makes the problem hard to root-cause even for someone who is aware of this shortcoming of the killing and the logging.

Maybe the better behavior in that case would be to have the busy loop time out cleanly, with an error return from smp_call_function_single(). At the same time, we could send an NMI to all CPUs to print their backtraces, which would greatly help in diagnosing the problem, although on systems with many logical CPUs this may be impractical (e.g., with Knights Landing packing up to 288 "CPUs" into one chip, such systems may already be surprisingly common).
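To make that concrete, here is a rough sketch of what a timing-out wait could look like (hypothetical, untested code: csd_lock_wait_timeout() is an invented name; trigger_all_cpu_backtrace() is the kernel's existing NMI backtrace helper):

#include <linux/jiffies.h>
#include <linux/nmi.h>
#include <linux/errno.h>

static int csd_lock_wait_timeout(struct call_single_data *csd,
				 unsigned long timeout_ms)
{
	unsigned long deadline = jiffies + msecs_to_jiffies(timeout_ms);

	/* Same busy wait as csd_lock_wait(), but bounded in time. */
	while (csd->flags & CSD_FLAG_LOCK) {
		if (time_after(jiffies, deadline)) {
			/* Ask all CPUs to dump their backtraces via NMI. */
			trigger_all_cpu_backtrace();
			return -ETIMEDOUT;
		}
		cpu_relax();
	}
	smp_rmb();	/* ctrl + rmb := acquire, as in smp_cond_acquire() */
	return 0;
}

smp_call_function_single() would then propagate -ETIMEDOUT to its caller instead of spinning until the watchdog fires. One complication this sketch glosses over: after a timeout, the call_single_data (which for wait == 1 lives on the caller's stack) may still be written by the target CPU later, so it cannot simply be abandoned; a real implementation would need to handle that.

Alexander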