On 26.07.19 20:30, Thomas Gleixner wrote:
> From: Anna-Maria Gleixner <[email protected]>
>
> When PREEMPT_RT is enabled, the soft interrupt thread can be preempted.
> If the soft interrupt thread is preempted in the middle of a timer
> callback, then calling hrtimer_cancel() can lead to two issues:
>
>   - If the caller is on a remote CPU then it has to spin wait for the
>     timer handler to complete. This can result in unbound priority
>     inversion.
>
>   - If the caller originates from the task which preempted the timer
>     handler on the same CPU, then spin waiting for the timer handler
>     to complete is never going to end.
>
> To avoid these issues, add a new lock to the timer base which is held
> around the execution of the timer callbacks. If hrtimer_cancel()
> detects that the timer callback is currently running, it blocks on
> the expiry lock. When the callback is finished, the expiry lock is
> dropped by the softirq thread which wakes up the waiter and the
> system makes progress. This addresses both the priority inversion and
> the live lock issues.
>
> The same issue can happen in virtual machines when the vCPU which
> runs a timer callback is scheduled out. If a second vCPU of the same
> guest calls hrtimer_cancel() it will spin wait for the other vCPU to
> be scheduled back in. The expiry lock mechanism would avoid that.
> It'd be trivial to enable this when paravirt spinlocks are enabled in
> a guest, but it's not clear whether this is an actual problem in the
> wild, so for now it's an RT only mechanism.
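
Condensed, the scheme boils down to something like the following toy
userspace model (pthreads instead of softirq context, names like
toy_timer are made up; the actual patch keeps the "is the callback
running" check and the dequeue under the timer base lock):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

struct timer_base {
        pthread_mutex_t expiry_lock;    /* held across callback execution */
        atomic_int running;             /* callback currently executing? */
};

struct toy_timer {
        struct timer_base *base;
        void (*function)(void);
};

static void slow_callback(void)
{
        usleep(100 * 1000);             /* stand-in for a preempted callback */
        printf("callback done\n");
}

static void *expire_timers(void *arg)   /* the "softirq thread" */
{
        struct toy_timer *t = arg;

        pthread_mutex_lock(&t->base->expiry_lock);
        atomic_store(&t->base->running, 1);
        t->function();
        atomic_store(&t->base->running, 0);
        pthread_mutex_unlock(&t->base->expiry_lock);
        return NULL;
}

static void toy_timer_cancel(struct toy_timer *t)
{
        /*
         * Block on the expiry lock instead of spin waiting for
         * ->running to clear.  The real patch performs this check
         * under the timer base lock to avoid racing with a callback
         * that is just about to start.
         */
        if (atomic_load(&t->base->running)) {
                pthread_mutex_lock(&t->base->expiry_lock);
                pthread_mutex_unlock(&t->base->expiry_lock);
        }
        /* ... dequeue the timer ... */
}

int main(void)
{
        struct timer_base base = { .expiry_lock = PTHREAD_MUTEX_INITIALIZER };
        struct toy_timer t = { .base = &base, .function = slow_callback };
        pthread_t softirq;

        pthread_create(&softirq, NULL, expire_timers, &t);
        usleep(10 * 1000);              /* let the callback start */
        toy_timer_cancel(&t);           /* sleeps, does not spin */
        printf("cancel returned\n");
        pthread_join(softirq, NULL);
        return 0;
}

The key point is that the canceller sleeps on the mutex instead of
spinning, so the thread executing the callback can always run to
completion and release the lock, no matter what preempted it.
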
As soft interrupt thread preemption should not be an issue in virtual
machines, I guess the spinning there is "just" sub-optimal (similar to
not using paravirt spinlocks).

In case we'd want to change that, I'd rather not special-case timers,
but apply a more general solution to the rather large number of
similar cases: I assume the majority of cpu_relax() call sites are
affected, so adding a paravirt op for cpu_relax() might be
appropriate. That could be put under CONFIG_PARAVIRT_SPINLOCKS. When
called in a guest it could ask the hypervisor to give up the physical
CPU voluntarily (in Xen this would be a "yield" hypercall). A rough
sketch of such an op follows below.
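
Purely as illustration (the "relax" pv op, its wiring and the function
names below are invented; only the SCHEDOP_yield hypercall exists
today, and a real implementation would want the usual pv-op patching,
as an indirect call is too costly for native cpu_relax()):

/*
 * Invented sketch, not existing kernel code: pv_ops currently has no
 * "relax" member.  The Xen backend uses the existing yield hypercall.
 */
#ifdef CONFIG_PARAVIRT_SPINLOCKS
static __always_inline void cpu_relax(void)
{
        pv_ops.cpu.relax();             /* native: PAUSE, guest: yield */
}
#else
static __always_inline void cpu_relax(void)
{
        asm volatile("rep; nop" ::: "memory");  /* x86 PAUSE */
}
#endif

/* Native backend: keep today's behaviour. */
static void native_cpu_relax(void)
{
        asm volatile("rep; nop" ::: "memory");
}

/* Xen backend: give the physical CPU up voluntarily. */
static void xen_cpu_relax(void)
{
        HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
}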

Juergen
