[tip:timers/core] itimers: Prepare for PREEMPT_RT
Commit-ID:  c7e6d704a0097e59667495cf52dcc4e1085e620b
Gitweb:     https://git.kernel.org/tip/c7e6d704a0097e59667495cf52dcc4e1085e620b
Author:     Anna-Maria Gleixner
AuthorDate: Wed, 31 Jul 2019 00:33:51 +0200
Committer:  Thomas Gleixner
CommitDate: Thu, 1 Aug 2019 20:51:24 +0200

itimers: Prepare for PREEMPT_RT

Use the hrtimer_cancel_wait_running() synchronization mechanism to
prevent priority inversion and live locks on PREEMPT_RT.

As a benefit the retry loop gains the missing cpu_relax() on !RT.

[ tglx: Split out of combo patch ]

Signed-off-by: Anna-Maria Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20190730223828.690771...@linutronix.de
---
 kernel/time/itimer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/time/itimer.c b/kernel/time/itimer.c
index 02068b2d5862..9d26fd4ba4c0 100644
--- a/kernel/time/itimer.c
+++ b/kernel/time/itimer.c
@@ -213,6 +213,7 @@ again:
 	/* We are sharing ->siglock with it_real_fn() */
 	if (hrtimer_try_to_cancel(timer) < 0) {
 		spin_unlock_irq(&tsk->sighand->siglock);
+		hrtimer_cancel_wait_running(timer);
 		goto again;
 	}
 	expires = timeval_to_ktime(value->it_value);
[tip:timers/core] timerfd: Prepare for PREEMPT_RT
Commit-ID:  a125ecc16453a4fe0ba865c7df87b9c722991fdf
Gitweb:     https://git.kernel.org/tip/a125ecc16453a4fe0ba865c7df87b9c722991fdf
Author:     Anna-Maria Gleixner
AuthorDate: Wed, 31 Jul 2019 00:33:50 +0200
Committer:  Thomas Gleixner
CommitDate: Thu, 1 Aug 2019 20:51:23 +0200

timerfd: Prepare for PREEMPT_RT

Use the hrtimer_cancel_wait_running() synchronization mechanism to
prevent priority inversion and live locks on PREEMPT_RT.

[ tglx: Split out of combo patch ]

Signed-off-by: Anna-Maria Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20190730223828.600085...@linutronix.de
---
 fs/timerfd.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 6a6fc8aa1de7..48305ba41e3c 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -471,7 +471,11 @@ static int do_timerfd_settime(int ufd, int flags,
 			break;
 		}
 		spin_unlock_irq(&ctx->wqh.lock);
-		cpu_relax();
+
+		if (isalarm(ctx))
+			hrtimer_cancel_wait_running(&ctx->t.alarm.timer);
+		else
+			hrtimer_cancel_wait_running(&ctx->t.tmr);
 	}
 
 	/*
[tip:timers/core] alarmtimer: Prepare for PREEMPT_RT
Commit-ID:  51ae33092bb8320497ec75ddc5ab383d8fafd55c
Gitweb:     https://git.kernel.org/tip/51ae33092bb8320497ec75ddc5ab383d8fafd55c
Author:     Anna-Maria Gleixner
AuthorDate: Wed, 31 Jul 2019 00:33:49 +0200
Committer:  Thomas Gleixner
CommitDate: Thu, 1 Aug 2019 20:51:23 +0200

alarmtimer: Prepare for PREEMPT_RT

Use the hrtimer_cancel_wait_running() synchronization mechanism to
prevent priority inversion and live locks on PREEMPT_RT.

[ tglx: Split out of combo patch ]

Signed-off-by: Anna-Maria Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20190730223828.508744...@linutronix.de
---
 kernel/time/alarmtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 57518efc3810..36947449dba2 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -432,7 +432,7 @@ int alarm_cancel(struct alarm *alarm)
 		int ret = alarm_try_to_cancel(alarm);
 		if (ret >= 0)
 			return ret;
-		cpu_relax();
+		hrtimer_cancel_wait_running(&alarm->timer);
 	}
 }
 EXPORT_SYMBOL_GPL(alarm_cancel);
[tip:timers/core] timers: Prepare support for PREEMPT_RT
Commit-ID:  030dcdd197d77374879bb5603d091eee7d8aba80
Gitweb:     https://git.kernel.org/tip/030dcdd197d77374879bb5603d091eee7d8aba80
Author:     Anna-Maria Gleixner
AuthorDate: Fri, 26 Jul 2019 20:31:00 +0200
Committer:  Thomas Gleixner
CommitDate: Thu, 1 Aug 2019 20:51:22 +0200

timers: Prepare support for PREEMPT_RT

When PREEMPT_RT is enabled, the soft interrupt thread can be preempted.
If the soft interrupt thread is preempted in the middle of a timer
callback, then calling del_timer_sync() can lead to two issues:

  - If the caller is on a remote CPU then it has to spin wait for the
    timer handler to complete. This can result in unbounded priority
    inversion.

  - If the caller originates from the task which preempted the timer
    handler on the same CPU, then spin waiting for the timer handler to
    complete is never going to end.

To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If del_timer_sync() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.

This addresses both the priority inversion and the live lock issues.

This mechanism is not used for timers which are marked IRQSAFE, as for
those preemption is disabled across the callback and therefore this
situation cannot happen. The callbacks for such timers need to be
individually audited for RT compliance.

The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest
calls del_timer_sync() it will spin wait for the other vCPU to be
scheduled back in. The expiry lock mechanism would avoid that. It'd be
trivial to enable this when paravirt spinlocks are enabled in a guest,
but it's not clear whether this is an actual problem in the wild, so for
now it's an RT only mechanism.

As the softirq thread can be preempted with PREEMPT_RT=y, the SMP
variant of del_timer_sync() needs to be used on UP as well.

[ tglx: Refactored it for mainline ]

Signed-off-by: Anna-Maria Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20190726185753.832418...@linutronix.de
---
 include/linux/timer.h |   2 +-
 kernel/time/timer.c   | 103 ++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 282e4f2a532a..1e6650ed066d 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -183,7 +183,7 @@ extern void add_timer(struct timer_list *timer);
 
 extern int try_to_del_timer_sync(struct timer_list *timer);
 
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
   extern int del_timer_sync(struct timer_list *timer);
 #else
 # define del_timer_sync(t)		del_timer(t)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 343c7ba33b1c..673c6a0f0c45 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -196,6 +196,10 @@ EXPORT_SYMBOL(jiffies_64);
 struct timer_base {
 	raw_spinlock_t		lock;
 	struct timer_list	*running_timer;
+#ifdef CONFIG_PREEMPT_RT
+	spinlock_t		expiry_lock;
+	atomic_t		timer_waiters;
+#endif
 	unsigned long		clk;
 	unsigned long		next_expiry;
 	unsigned int		cpu;
@@ -1227,7 +1231,78 @@ int try_to_del_timer_sync(struct timer_list *timer)
 }
 EXPORT_SYMBOL(try_to_del_timer_sync);
 
-#ifdef CONFIG_SMP
+#ifdef CONFIG_PREEMPT_RT
+static __init void timer_base_init_expiry_lock(struct timer_base *base)
+{
+	spin_lock_init(&base->expiry_lock);
+}
+
+static inline void timer_base_lock_expiry(struct timer_base *base)
+{
+	spin_lock(&base->expiry_lock);
+}
+
+static inline void timer_base_unlock_expiry(struct timer_base *base)
+{
+	spin_unlock(&base->expiry_lock);
+}
+
+/*
+ * The counterpart to del_timer_wait_running().
+ *
+ * If there is a waiter for base->expiry_lock, then it was waiting for the
+ * timer callback to finish. Drop expiry_lock and reacquire it. That allows
+ * the waiter to acquire the lock and make progress.
+ */
+static void timer_sync_wait_running(struct timer_base *base)
+{
+	if (atomic_read(&base->timer_waiters)) {
+		spin_unlock(&base->expiry_lock);
+		spin_lock(&base->expiry_lock);
+	}
+}
+
+/*
+ * This function is called on PREEMPT_RT kernels when the fast path
+ * deletion of a timer failed because the timer callback function was
+ * running.
+ *
+ * This prevents priority inversion, if the softirq thread on a remote CPU
+ * got preempted, and it prevents a live lock when the task which tries to
+ * delete a timer preempted the softirq thread running the timer callback
+ * function.
+ */
+static void del_timer_wait_running(struct timer_list *timer)
+{
+	u32 tf;
+
+	tf = READ_ONCE(timer->flags);
+	if (!(tf & TIMER_MIGRATING)) {
+		struct timer_base *base = get_timer_base(tf);
+
+		/*
+		 * Mark the base as contended and grab the expiry lock,
+		 * which is held by the softirq across the timer
+		 * callback. Drop the lock immediately so the softirq can
+		 * expire the next timer. In theory the timer could already
+		 * be running again, but that's more than unlikely and
+		 * just causes another waiting round.
+		 */
+		atomic_inc(&base->timer_waiters);
+		spin_lock(&base->expiry_lock);
+		atomic_dec(&base->timer_waiters);
+		spin_unlock(&base->expiry_lock);
+	}
+}
[tip:timers/core] hrtimer: Prepare support for PREEMPT_RT
Commit-ID:  f61eff83cec9cfab31fd30a2ca8856be379cdcd5
Gitweb:     https://git.kernel.org/tip/f61eff83cec9cfab31fd30a2ca8856be379cdcd5
Author:     Anna-Maria Gleixner
AuthorDate: Fri, 26 Jul 2019 20:30:59 +0200
Committer:  Thomas Gleixner
CommitDate: Thu, 1 Aug 2019 20:51:22 +0200

hrtimer: Prepare support for PREEMPT_RT

When PREEMPT_RT is enabled, the soft interrupt thread can be preempted.
If the soft interrupt thread is preempted in the middle of a timer
callback, then calling hrtimer_cancel() can lead to two issues:

  - If the caller is on a remote CPU then it has to spin wait for the
    timer handler to complete. This can result in unbounded priority
    inversion.

  - If the caller originates from the task which preempted the timer
    handler on the same CPU, then spin waiting for the timer handler to
    complete is never going to end.

To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If hrtimer_cancel() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.

This addresses both the priority inversion and the live lock issues.

The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest
calls hrtimer_cancel() it will spin wait for the other vCPU to be
scheduled back in. The expiry lock mechanism would avoid that. It'd be
trivial to enable this when paravirt spinlocks are enabled in a guest,
but it's not clear whether this is an actual problem in the wild, so for
now it's an RT only mechanism.

[ tglx: Refactored it for mainline ]

Signed-off-by: Anna-Maria Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20190726185753.737767...@linutronix.de
---
 include/linux/hrtimer.h | 16 ++++++++
 kernel/time/hrtimer.c   | 95 ++++++++++++++++++++++++++++++++++++++---
 2 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 7d0d0a36a8f4..5df4bcff96d5 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -192,6 +192,10 @@ enum hrtimer_base_type {
  * @nr_retries:		Total number of hrtimer interrupt retries
  * @nr_hangs:		Total number of hrtimer interrupt hangs
  * @max_hang_time:	Maximum time spent in hrtimer_interrupt
+ * @softirq_expiry_lock: Lock which is taken while softirq based hrtimers are
+ *			expired
+ * @timer_waiters:	A hrtimer_cancel() invocation waits for the timer
+ *			callback to finish.
  * @expires_next:	absolute time of the next event, is required for remote
  *			hrtimer enqueue; it is the total first expiry time (hard
  *			and soft hrtimer are taken into account)
@@ -218,6 +222,10 @@ struct hrtimer_cpu_base {
 	unsigned short			nr_retries;
 	unsigned short			nr_hangs;
 	unsigned int			max_hang_time;
+#endif
+#ifdef CONFIG_PREEMPT_RT
+	spinlock_t			softirq_expiry_lock;
+	atomic_t			timer_waiters;
 #endif
 	ktime_t				expires_next;
 	struct hrtimer			*next_timer;
@@ -350,6 +358,14 @@ extern void hrtimers_resume(void);
 
 DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 
+#ifdef CONFIG_PREEMPT_RT
+void hrtimer_cancel_wait_running(const struct hrtimer *timer);
+#else
+static inline void hrtimer_cancel_wait_running(struct hrtimer *timer)
+{
+	cpu_relax();
+}
+#endif
 
 /* Exported timer functions: */
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index c101f88ae8aa..499122752649 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1162,6 +1162,82 @@ int hrtimer_try_to_cancel(struct hrtimer *timer)
 }
 EXPORT_SYMBOL_GPL(hrtimer_try_to_cancel);
 
+#ifdef CONFIG_PREEMPT_RT
+static void hrtimer_cpu_base_init_expiry_lock(struct hrtimer_cpu_base *base)
+{
+	spin_lock_init(&base->softirq_expiry_lock);
+}
+
+static void hrtimer_cpu_base_lock_expiry(struct hrtimer_cpu_base *base)
+{
+	spin_lock(&base->softirq_expiry_lock);
+}
+
+static void hrtimer_cpu_base_unlock_expiry(struct hrtimer_cpu_base *base)
+{
+	spin_unlock(&base->softirq_expiry_lock);
+}
+
+/*
+ * The counterpart to hrtimer_cancel_wait_running().
+ *
+ * If there is a waiter for cpu_base->expiry_lock, then it was waiting for
+ * the timer callback to finish. Drop expiry_lock and reacquire it. That
+ * allows the waiter to acquire the lock and make progress.
+ */
+static void hrtimer_sync_wait_running(struct hrtimer_cpu_base *cpu_base,
+				      unsigned long flags)
+{
+	if (atomic_read(&cpu_base->timer_waiters)) {
+		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+		spin_unlock(&cpu_base->softirq_expiry_lock);
+		spin_lock(&cpu_base->softirq_expiry_lock);
+		raw_spin_lock_irq(&cpu_base->lock);
+	}
+}
[tip:timers/core] posix-timers: Cleanup the flag/flags confusion
Commit-ID:  b0ccc6eb0d7e0b7d346b118ccc8b38bf18e39b7f
Gitweb:     https://git.kernel.org/tip/b0ccc6eb0d7e0b7d346b118ccc8b38bf18e39b7f
Author:     Anna-Maria Gleixner
AuthorDate: Wed, 31 Jul 2019 00:33:52 +0200
Committer:  Thomas Gleixner
CommitDate: Thu, 1 Aug 2019 17:46:42 +0200

posix-timers: Cleanup the flag/flags confusion

do_timer_settime() has a 'flags' argument and uses 'flag' for the
interrupt flags, which is confusing at best.

Rename the argument so 'flags' can be used for interrupt flags as usual.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20190730223828.782664...@linutronix.de
---
 kernel/time/posix-timers.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index d7f2d91acdac..f5aedd2f60df 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -844,13 +844,13 @@ int common_timer_set(struct k_itimer *timr, int flags,
 	return 0;
 }
 
-static int do_timer_settime(timer_t timer_id, int flags,
+static int do_timer_settime(timer_t timer_id, int tmr_flags,
 			    struct itimerspec64 *new_spec64,
 			    struct itimerspec64 *old_spec64)
 {
 	const struct k_clock *kc;
 	struct k_itimer *timr;
-	unsigned long flag;
+	unsigned long flags;
 	int error = 0;
 
 	if (!timespec64_valid(&new_spec64->it_interval) ||
@@ -860,7 +860,7 @@ static int do_timer_settime(timer_t timer_id, int flags,
 	if (old_spec64)
 		memset(old_spec64, 0, sizeof(*old_spec64));
 retry:
-	timr = lock_timer(timer_id, &flag);
+	timr = lock_timer(timer_id, &flags);
 	if (!timr)
 		return -EINVAL;
 
@@ -868,9 +868,9 @@ retry:
 	if (WARN_ON_ONCE(!kc || !kc->timer_set))
 		error = -EINVAL;
 	else
-		error = kc->timer_set(timr, flags, new_spec64, old_spec64);
+		error = kc->timer_set(timr, tmr_flags, new_spec64, old_spec64);
 
-	unlock_timer(timr, flag);
+	unlock_timer(timr, flags);
 	if (error == TIMER_RETRY) {
 		old_spec64 = NULL;	// We already got the old time...
 		goto retry;
[tip:timers/core] itimers: Prepare for PREEMPT_RT
Commit-ID: cab46ec655eec1b5dbb0c17a25e19f67c539f00b Gitweb: https://git.kernel.org/tip/cab46ec655eec1b5dbb0c17a25e19f67c539f00b Author: Anna-Maria Gleixner AuthorDate: Wed, 31 Jul 2019 00:33:51 +0200 Committer: Thomas Gleixner CommitDate: Thu, 1 Aug 2019 17:46:41 +0200 itimers: Prepare for PREEMPT_RT Use the hrtimer_cancel_wait_running() synchronization mechanism to prevent priority inversion and live locks on PREEMPT_RT. As a benefit the retry loop gains the missing cpu_relax() on !RT. [ tglx: Split out of combo patch ] Signed-off-by: Anna-Maria Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20190730223828.690771...@linutronix.de --- kernel/time/itimer.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/time/itimer.c b/kernel/time/itimer.c index 02068b2d5862..9d26fd4ba4c0 100644 --- a/kernel/time/itimer.c +++ b/kernel/time/itimer.c @@ -213,6 +213,7 @@ again: /* We are sharing ->siglock with it_real_fn() */ if (hrtimer_try_to_cancel(timer) < 0) { spin_unlock_irq(>sighand->siglock); + hrtimer_cancel_wait_running(timer); goto again; } expires = timeval_to_ktime(value->it_value);
[tip:timers/core] timerfd: Prepare for PREEMPT_RT
Commit-ID: 4da1306fb920a267b5ea21ee15cd771c7bc09cc6 Gitweb: https://git.kernel.org/tip/4da1306fb920a267b5ea21ee15cd771c7bc09cc6 Author: Anna-Maria Gleixner AuthorDate: Wed, 31 Jul 2019 00:33:50 +0200 Committer: Thomas Gleixner CommitDate: Thu, 1 Aug 2019 17:46:41 +0200 timerfd: Prepare for PREEMPT_RT Use the hrtimer_cancel_wait_running() synchronization mechanism to prevent priority inversion and live locks on PREEMPT_RT. [ tglx: Split out of combo patch ] Signed-off-by: Anna-Maria Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20190730223828.600085...@linutronix.de --- fs/timerfd.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/timerfd.c b/fs/timerfd.c index 6a6fc8aa1de7..48305ba41e3c 100644 --- a/fs/timerfd.c +++ b/fs/timerfd.c @@ -471,7 +471,11 @@ static int do_timerfd_settime(int ufd, int flags, break; } spin_unlock_irq(>wqh.lock); - cpu_relax(); + + if (isalarm(ctx)) + hrtimer_cancel_wait_running(>t.alarm.timer); + else + hrtimer_cancel_wait_running(>t.tmr); } /*
[tip:timers/core] alarmtimer: Prepare for PREEMPT_RT
Commit-ID: 1f8e8bd8b74c8089a43bc5f1f24e4bf0f855d760 Gitweb: https://git.kernel.org/tip/1f8e8bd8b74c8089a43bc5f1f24e4bf0f855d760 Author: Anna-Maria Gleixner AuthorDate: Wed, 31 Jul 2019 00:33:49 +0200 Committer: Thomas Gleixner CommitDate: Thu, 1 Aug 2019 17:46:41 +0200 alarmtimer: Prepare for PREEMPT_RT Use the hrtimer_cancel_wait_running() synchronization mechanism to prevent priority inversion and live locks on PREEMPT_RT. [ tglx: Split out of combo patch ] Signed-off-by: Anna-Maria Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20190730223828.508744...@linutronix.de --- kernel/time/alarmtimer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c index 57518efc3810..36947449dba2 100644 --- a/kernel/time/alarmtimer.c +++ b/kernel/time/alarmtimer.c @@ -432,7 +432,7 @@ int alarm_cancel(struct alarm *alarm) int ret = alarm_try_to_cancel(alarm); if (ret >= 0) return ret; - cpu_relax(); + hrtimer_cancel_wait_running(>timer); } } EXPORT_SYMBOL_GPL(alarm_cancel);
[tip:timers/core] timers: Prepare support for PREEMPT_RT
Commit-ID: 1c2df8ac9292ea1fe6c958c198bf6bc5c768acf5 Gitweb: https://git.kernel.org/tip/1c2df8ac9292ea1fe6c958c198bf6bc5c768acf5 Author: Anna-Maria Gleixner AuthorDate: Fri, 26 Jul 2019 20:31:00 +0200 Committer: Thomas Gleixner CommitDate: Thu, 1 Aug 2019 17:43:20 +0200 timers: Prepare support for PREEMPT_RT When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If the soft interrupt thread is preempted in the middle of a timer callback, then calling del_timer_sync() can lead to two issues: - If the caller is on a remote CPU then it has to spin wait for the timer handler to complete. This can result in unbound priority inversion. - If the caller originates from the task which preempted the timer handler on the same CPU, then spin waiting for the timer handler to complete is never going to end. To avoid these issues, add a new lock to the timer base which is held around the execution of the timer callbacks. If del_timer_sync() detects that the timer callback is currently running, it blocks on the expiry lock. When the callback is finished, the expiry lock is dropped by the softirq thread which wakes up the waiter and the system makes progress. This addresses both the priority inversion and the life lock issues. This mechanism is not used for timers which are marked IRQSAFE as for those preemption is disabled accross the callback and therefore this situation cannot happen. The callbacks for such timers need to be individually audited for RT compliance. The same issue can happen in virtual machines when the vCPU which runs a timer callback is scheduled out. If a second vCPU of the same guest calls del_timer_sync() it will spin wait for the other vCPU to be scheduled back in. The expiry lock mechanism would avoid that. It'd be trivial to enable this when paravirt spinlocks are enabled in a guest, but it's not clear whether this is an actual problem in the wild, so for now it's an RT only mechanism. 
As the softirq thread can be preempted with PREEMPT_RT=y, the SMP variant of del_timer_sync() needs to be used on UP as well. [ tglx: Refactored it for mainline ] Signed-off-by: Anna-Maria Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20190726185753.832418...@linutronix.de --- include/linux/timer.h | 2 +- kernel/time/timer.c | 103 ++ 2 files changed, 96 insertions(+), 9 deletions(-) diff --git a/include/linux/timer.h b/include/linux/timer.h index 282e4f2a532a..1e6650ed066d 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -183,7 +183,7 @@ extern void add_timer(struct timer_list *timer); extern int try_to_del_timer_sync(struct timer_list *timer); -#ifdef CONFIG_SMP +#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT) extern int del_timer_sync(struct timer_list *timer); #else # define del_timer_sync(t) del_timer(t) diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 343c7ba33b1c..673c6a0f0c45 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -196,6 +196,10 @@ EXPORT_SYMBOL(jiffies_64); struct timer_base { raw_spinlock_t lock; struct timer_list *running_timer; +#ifdef CONFIG_PREEMPT_RT + spinlock_t expiry_lock; + atomic_ttimer_waiters; +#endif unsigned long clk; unsigned long next_expiry; unsigned intcpu; @@ -1227,7 +1231,78 @@ int try_to_del_timer_sync(struct timer_list *timer) } EXPORT_SYMBOL(try_to_del_timer_sync); -#ifdef CONFIG_SMP +#ifdef CONFIG_PREEMPT_RT +static __init void timer_base_init_expiry_lock(struct timer_base *base) +{ + spin_lock_init(>expiry_lock); +} + +static inline void timer_base_lock_expiry(struct timer_base *base) +{ + spin_lock(>expiry_lock); +} + +static inline void timer_base_unlock_expiry(struct timer_base *base) +{ + spin_unlock(>expiry_lock); +} + +/* + * The counterpart to del_timer_wait_running(). 
+ * + * If there is a waiter for base->expiry_lock, then it was waiting for the + * timer callback to finish. Drop expiry_lock and reaquire it. That allows + * the waiter to acquire the lock and make progress. + */ +static void timer_sync_wait_running(struct timer_base *base) +{ + if (atomic_read(>timer_waiters)) { + spin_unlock(>expiry_lock); + spin_lock(>expiry_lock); + } +} + +/* + * This function is called on PREEMPT_RT kernels when the fast path + * deletion of a timer failed because the timer callback function was + * running. + * + * This prevents priority inversion, if the softirq thread on a remote CPU + * got preempted, and it prevents a life lock when the task which tries to + * delete a timer preempted the softirq thread running the timer callback + * function. + */ +static void del_timer_wait_runn
[tip:timers/core] hrtimer: Prepare support for PREEMPT_RT
Commit-ID: 37226a1807c5f41537190462362e3e2739e22f13 Gitweb: https://git.kernel.org/tip/37226a1807c5f41537190462362e3e2739e22f13 Author: Anna-Maria Gleixner AuthorDate: Fri, 26 Jul 2019 20:30:59 +0200 Committer: Thomas Gleixner CommitDate: Thu, 1 Aug 2019 17:43:19 +0200 hrtimer: Prepare support for PREEMPT_RT When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If the soft interrupt thread is preempted in the middle of a timer callback, then calling hrtimer_cancel() can lead to two issues: - If the caller is on a remote CPU then it has to spin wait for the timer handler to complete. This can result in unbound priority inversion. - If the caller originates from the task which preempted the timer handler on the same CPU, then spin waiting for the timer handler to complete is never going to end. To avoid these issues, add a new lock to the timer base which is held around the execution of the timer callbacks. If hrtimer_cancel() detects that the timer callback is currently running, it blocks on the expiry lock. When the callback is finished, the expiry lock is dropped by the softirq thread which wakes up the waiter and the system makes progress. This addresses both the priority inversion and the life lock issues. The same issue can happen in virtual machines when the vCPU which runs a timer callback is scheduled out. If a second vCPU of the same guest calls hrtimer_cancel() it will spin wait for the other vCPU to be scheduled back in. The expiry lock mechanism would avoid that. It'd be trivial to enable this when paravirt spinlocks are enabled in a guest, but it's not clear whether this is an actual problem in the wild, so for now it's an RT only mechanism. 
[ tglx: Refactored it for mainline ] Signed-off-by: Anna-Maria Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20190726185753.737767...@linutronix.de --- include/linux/hrtimer.h | 16 + kernel/time/hrtimer.c | 95 + 2 files changed, 105 insertions(+), 6 deletions(-) diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 7d0d0a36a8f4..5df4bcff96d5 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -192,6 +192,10 @@ enum hrtimer_base_type { * @nr_retries:Total number of hrtimer interrupt retries * @nr_hangs: Total number of hrtimer interrupt hangs * @max_hang_time: Maximum time spent in hrtimer_interrupt + * @softirq_expiry_lock: Lock which is taken while softirq based hrtimer are + * expired + * @timer_waiters: A hrtimer_cancel() invocation waits for the timer + * callback to finish. * @expires_next: absolute time of the next event, is required for remote * hrtimer enqueue; it is the total first expiry time (hard * and soft hrtimer are taken into account) @@ -218,6 +222,10 @@ struct hrtimer_cpu_base { unsigned short nr_retries; unsigned short nr_hangs; unsigned intmax_hang_time; +#endif +#ifdef CONFIG_PREEMPT_RT + spinlock_t softirq_expiry_lock; + atomic_ttimer_waiters; #endif ktime_t expires_next; struct hrtimer *next_timer; @@ -350,6 +358,14 @@ extern void hrtimers_resume(void); DECLARE_PER_CPU(struct tick_device, tick_cpu_device); +#ifdef CONFIG_PREEMPT_RT +void hrtimer_cancel_wait_running(const struct hrtimer *timer); +#else +static inline void hrtimer_cancel_wait_running(struct hrtimer *timer) +{ + cpu_relax(); +} +#endif /* Exported timer functions: */ diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index c101f88ae8aa..499122752649 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1162,6 +1162,82 @@ int hrtimer_try_to_cancel(struct hrtimer *timer) } EXPORT_SYMBOL_GPL(hrtimer_try_to_cancel); +#ifdef 
CONFIG_PREEMPT_RT +static void hrtimer_cpu_base_init_expiry_lock(struct hrtimer_cpu_base *base) +{ + spin_lock_init(>softirq_expiry_lock); +} + +static void hrtimer_cpu_base_lock_expiry(struct hrtimer_cpu_base *base) +{ + spin_lock(>softirq_expiry_lock); +} + +static void hrtimer_cpu_base_unlock_expiry(struct hrtimer_cpu_base *base) +{ + spin_unlock(>softirq_expiry_lock); +} + +/* + * The counterpart to hrtimer_cancel_wait_running(). + * + * If there is a waiter for cpu_base->expiry_lock, then it was waiting for + * the timer callback to finish. Drop expiry_lock and reaquire it. That + * allows the waiter to acquire the lock and make progress. + */ +static void hrtimer_sync_wait_running(struct hrtimer_cpu_base *cpu_base, + unsigned long flags) +{ + if (atomic_read(_base-&g
[tip:timers/core] timers: Prepare support for PREEMPT_RT
Commit-ID: 51503dcd6118d627a0c1b5829191d4fa6f16 Gitweb: https://git.kernel.org/tip/51503dcd6118d627a0c1b5829191d4fa6f16 Author: Anna-Maria Gleixner AuthorDate: Fri, 26 Jul 2019 20:31:00 +0200 Committer: Thomas Gleixner CommitDate: Tue, 30 Jul 2019 23:57:57 +0200 timers: Prepare support for PREEMPT_RT When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If the soft interrupt thread is preempted in the middle of a timer callback, then calling del_timer_sync() can lead to two issues: - If the caller is on a remote CPU then it has to spin wait for the timer handler to complete. This can result in unbound priority inversion. - If the caller originates from the task which preempted the timer handler on the same CPU, then spin waiting for the timer handler to complete is never going to end. To avoid these issues, add a new lock to the timer base which is held around the execution of the timer callbacks. If del_timer_sync() detects that the timer callback is currently running, it blocks on the expiry lock. When the callback is finished, the expiry lock is dropped by the softirq thread which wakes up the waiter and the system makes progress. This addresses both the priority inversion and the life lock issues. This mechanism is not used for timers which are marked IRQSAFE as for those preemption is disabled accross the callback and therefore this situation cannot happen. The callbacks for such timers need to be individually audited for RT compliance. The same issue can happen in virtual machines when the vCPU which runs a timer callback is scheduled out. If a second vCPU of the same guest calls del_timer_sync() it will spin wait for the other vCPU to be scheduled back in. The expiry lock mechanism would avoid that. It'd be trivial to enable this when paravirt spinlocks are enabled in a guest, but it's not clear whether this is an actual problem in the wild, so for now it's an RT only mechanism. 
As the softirq thread can be preempted with PREEMPT_RT=y, the SMP variant of del_timer_sync() needs to be used on UP as well. [ tglx: Refactored it for mainline ] Signed-off-by: Anna-Maria Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20190726185753.832418...@linutronix.de --- include/linux/timer.h | 2 +- kernel/time/timer.c | 103 ++ 2 files changed, 96 insertions(+), 9 deletions(-) diff --git a/include/linux/timer.h b/include/linux/timer.h index 282e4f2a532a..1e6650ed066d 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -183,7 +183,7 @@ extern void add_timer(struct timer_list *timer); extern int try_to_del_timer_sync(struct timer_list *timer); -#ifdef CONFIG_SMP +#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT) extern int del_timer_sync(struct timer_list *timer); #else # define del_timer_sync(t) del_timer(t) diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 343c7ba33b1c..673c6a0f0c45 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -196,6 +196,10 @@ EXPORT_SYMBOL(jiffies_64); struct timer_base { raw_spinlock_t lock; struct timer_list *running_timer; +#ifdef CONFIG_PREEMPT_RT + spinlock_t expiry_lock; + atomic_ttimer_waiters; +#endif unsigned long clk; unsigned long next_expiry; unsigned intcpu; @@ -1227,7 +1231,78 @@ int try_to_del_timer_sync(struct timer_list *timer) } EXPORT_SYMBOL(try_to_del_timer_sync); -#ifdef CONFIG_SMP +#ifdef CONFIG_PREEMPT_RT +static __init void timer_base_init_expiry_lock(struct timer_base *base) +{ + spin_lock_init(>expiry_lock); +} + +static inline void timer_base_lock_expiry(struct timer_base *base) +{ + spin_lock(>expiry_lock); +} + +static inline void timer_base_unlock_expiry(struct timer_base *base) +{ + spin_unlock(>expiry_lock); +} + +/* + * The counterpart to del_timer_wait_running(). 
+ * + * If there is a waiter for base->expiry_lock, then it was waiting for the + * timer callback to finish. Drop expiry_lock and reaquire it. That allows + * the waiter to acquire the lock and make progress. + */ +static void timer_sync_wait_running(struct timer_base *base) +{ + if (atomic_read(>timer_waiters)) { + spin_unlock(>expiry_lock); + spin_lock(>expiry_lock); + } +} + +/* + * This function is called on PREEMPT_RT kernels when the fast path + * deletion of a timer failed because the timer callback function was + * running. + * + * This prevents priority inversion, if the softirq thread on a remote CPU + * got preempted, and it prevents a life lock when the task which tries to + * delete a timer preempted the softirq thread running the timer callback + * function. + */ +static void del_timer_wait_runn
[tip:timers/core] hrtimer: Prepare support for PREEMPT_RT
Commit-ID: 10521d890c650472e494cf415f0fa6c29d4f
Gitweb: https://git.kernel.org/tip/10521d890c650472e494cf415f0fa6c29d4f
Author: Anna-Maria Gleixner
AuthorDate: Fri, 26 Jul 2019 20:30:59 +0200
Committer: Thomas Gleixner
CommitDate: Tue, 30 Jul 2019 23:57:57 +0200

hrtimer: Prepare support for PREEMPT_RT

When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If the soft interrupt thread is preempted in the middle of a timer callback, then calling hrtimer_cancel() can lead to two issues:

  - If the caller is on a remote CPU then it has to spin wait for the timer handler to complete. This can result in unbound priority inversion.

  - If the caller originates from the task which preempted the timer handler on the same CPU, then spin waiting for the timer handler to complete is never going to end.

To avoid these issues, add a new lock to the timer base which is held around the execution of the timer callbacks. If hrtimer_cancel() detects that the timer callback is currently running, it blocks on the expiry lock. When the callback is finished, the expiry lock is dropped by the softirq thread which wakes up the waiter and the system makes progress.

This addresses both the priority inversion and the live lock issues.

The same issue can happen in virtual machines when the vCPU which runs a timer callback is scheduled out. If a second vCPU of the same guest calls hrtimer_cancel() it will spin wait for the other vCPU to be scheduled back in. The expiry lock mechanism would avoid that. It'd be trivial to enable this when paravirt spinlocks are enabled in a guest, but it's not clear whether this is an actual problem in the wild, so for now it's an RT only mechanism.
[ tglx: Refactored it for mainline ] Signed-off-by: Anna-Maria Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20190726185753.737767...@linutronix.de --- include/linux/hrtimer.h | 16 + kernel/time/hrtimer.c | 95 + 2 files changed, 105 insertions(+), 6 deletions(-) diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 7d0d0a36a8f4..5df4bcff96d5 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -192,6 +192,10 @@ enum hrtimer_base_type { * @nr_retries:Total number of hrtimer interrupt retries * @nr_hangs: Total number of hrtimer interrupt hangs * @max_hang_time: Maximum time spent in hrtimer_interrupt + * @softirq_expiry_lock: Lock which is taken while softirq based hrtimer are + * expired + * @timer_waiters: A hrtimer_cancel() invocation waits for the timer + * callback to finish. * @expires_next: absolute time of the next event, is required for remote * hrtimer enqueue; it is the total first expiry time (hard * and soft hrtimer are taken into account) @@ -218,6 +222,10 @@ struct hrtimer_cpu_base { unsigned short nr_retries; unsigned short nr_hangs; unsigned intmax_hang_time; +#endif +#ifdef CONFIG_PREEMPT_RT + spinlock_t softirq_expiry_lock; + atomic_ttimer_waiters; #endif ktime_t expires_next; struct hrtimer *next_timer; @@ -350,6 +358,14 @@ extern void hrtimers_resume(void); DECLARE_PER_CPU(struct tick_device, tick_cpu_device); +#ifdef CONFIG_PREEMPT_RT +void hrtimer_cancel_wait_running(const struct hrtimer *timer); +#else +static inline void hrtimer_cancel_wait_running(struct hrtimer *timer) +{ + cpu_relax(); +} +#endif /* Exported timer functions: */ diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index c101f88ae8aa..499122752649 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1162,6 +1162,82 @@ int hrtimer_try_to_cancel(struct hrtimer *timer) } EXPORT_SYMBOL_GPL(hrtimer_try_to_cancel); +#ifdef 
CONFIG_PREEMPT_RT
+static void hrtimer_cpu_base_init_expiry_lock(struct hrtimer_cpu_base *base)
+{
+	spin_lock_init(&base->softirq_expiry_lock);
+}
+
+static void hrtimer_cpu_base_lock_expiry(struct hrtimer_cpu_base *base)
+{
+	spin_lock(&base->softirq_expiry_lock);
+}
+
+static void hrtimer_cpu_base_unlock_expiry(struct hrtimer_cpu_base *base)
+{
+	spin_unlock(&base->softirq_expiry_lock);
+}
+
+/*
+ * The counterpart to hrtimer_cancel_wait_running().
+ *
+ * If there is a waiter for cpu_base->expiry_lock, then it was waiting for
+ * the timer callback to finish. Drop expiry_lock and reacquire it. That
+ * allows the waiter to acquire the lock and make progress.
+ */
+static void hrtimer_sync_wait_running(struct hrtimer_cpu_base *cpu_base,
+				      unsigned long flags)
+{
+	if (atomic_read(&cpu_base->timer_waiters)) {
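The caller-side counterpart this lock enables is the retry loop below — a condensed, kernel-style sketch of where the series ends up, not code quoted from this patch:

```c
/* Sketch: RT-safe cancellation built on hrtimer_cancel_wait_running() */
int hrtimer_cancel(struct hrtimer *timer)
{
	int ret;

	do {
		ret = hrtimer_try_to_cancel(timer);
		if (ret < 0) {
			/*
			 * Callback is running: on PREEMPT_RT block on the
			 * expiry lock; on !RT this is just cpu_relax().
			 */
			hrtimer_cancel_wait_running(timer);
		}
	} while (ret < 0);

	return ret;
}
```

The itimers and timerfd patches above use exactly this shape: retry the fast-path cancel, and wait on the expiry lock whenever the callback is found running.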
Re: [patch 2/3] timers: do not raise softirq unconditionally (spinlockless version)
On Fri, 31 May 2019, Anna-Maria Gleixner wrote:

[...]

> I will think about the problem and your solution a little bit more and
> give you feedback hopefully on monday.

I'm sorry for the delay. But now I'm able to give you detailed feedback:

The general problem is that your solution is customized to a single use-case: preventing the softirq raise, but only if there is _no_ timer pending. To reach this goal without using locks, overhead is added to the formerly optimized add/mod path of a timer. With your code the timer softirq is raised even when there is a pending timer which does not have to be expired right now. But there have been requests in the past for this use case already.

I discussed several approaches with Thomas during the last week on how to solve the unconditional softirq timer raise in a more general way without losing the fast add/mod path of a timer. The approach which seems to be the best has a dependency on a timer code change from a push to a pull model which is still under development (see v2 patchset: http://lkml.kernel.org/r/2017041802.490432...@linutronix.de). The patchset v2 has several problems but we are working on a solution for those problems right now.

When the timer pull model is in place, the approach to solve the unconditional timer softirq raise could look like the following:

---8<---

The next_expiry value of the timer_base struct is used to store the next expiry value even if the timer_base is not idle. Therefore it is updated after adding or modifying a timer and also at the end of the timer softirq. In case the timer softirq does not have to be raised, timer_base->clk is incremented to prevent stale clocks. Checking whether the timer softirq has to be raised cannot be done locklessly.

This code is not compile tested nor boot tested.
--- kernel/time/timer.c | 60 +++- 1 file changed, 36 insertions(+), 24 deletions(-) --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -552,37 +552,32 @@ static void static void trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer) { - if (!is_timers_nohz_active()) - return; - - /* -* TODO: This wants some optimizing similar to the code below, but we -* will do that when we switch from push to pull for deferrable timers. -*/ - if (timer->flags & TIMER_DEFERRABLE) { - if (tick_nohz_full_cpu(base->cpu)) - wake_up_nohz_cpu(base->cpu); - return; + if (is_timers_nohz_active()) { + /* +* TODO: This wants some optimizing similar to the code +* below, but we will do that when we switch from push to +* pull for deferrable timers. +*/ + if (timer->flags & TIMER_DEFERRABLE) { + if (tick_nohz_full_cpu(base->cpu)) + wake_up_nohz_cpu(base->cpu); + return; + } } - /* -* We might have to IPI the remote CPU if the base is idle and the -* timer is not deferrable. If the other CPU is on the way to idle -* then it can't set base->is_idle as we hold the base lock: -*/ - if (!base->is_idle) - return; - /* Check whether this is the new first expiring timer: */ if (time_after_eq(timer->expires, base->next_expiry)) return; + /* Update next expiry time */ + base->next_expiry = timer->expires; /* -* Set the next expiry time and kick the CPU so it can reevaluate the -* wheel: +* We might have to IPI the remote CPU if the base is idle and the +* timer is not deferrable. 
If the other CPU is on the way to idle
+	 * then it can't set base->is_idle as we hold the base lock:
 	 */
-	base->next_expiry = timer->expires;
-	wake_up_nohz_cpu(base->cpu);
+	if (is_timers_nohz_active() && base->is_idle)
+		wake_up_nohz_cpu(base->cpu);
 }

 static void
@@ -1684,6 +1679,7 @@ static inline void __run_timers(struct t
 		while (levels--)
 			expire_timers(base, heads + levels);
 	}
+	base->next_expiry = __next_timer_interrupt(base);
 	base->running_timer = NULL;
 	raw_spin_unlock_irq(&base->lock);
 }
@@ -1716,8 +1712,24 @@ void run_local_timers(void)
 		base++;
 		if (time_before(jiffies, base->clk))
 			return;
+		base--;
 	}
+
+	/*
+	 * check for next expiry
+	 *
+	 * deferrable base is ignored here - it is only usable when
+	 * switching from push to pull model for deferrable timers
+	 */
+	raw_spin_lock_irq(&base->lock);
+	if (base->clk == base->next_expiry) {
+		raw_spin
Re: [patch 2/3] timers: do not raise softirq unconditionally (spinlockless version)
On Thu, 30 May 2019, Marcelo Tosatti wrote: > On Wed, May 29, 2019 at 04:53:26PM +0200, Anna-Maria Gleixner wrote: > > On Mon, 15 Apr 2019, Marcelo Tosatti wrote: > > > > > --- linux-rt-devel.orig/kernel/time/timer.c 2019-04-15 > > > 14:21:02.788704354 -0300 > > > +++ linux-rt-devel/kernel/time/timer.c2019-04-15 14:22:56.755047354 > > > -0300 > > > @@ -1776,6 +1776,24 @@ > > > if (time_before(jiffies, base->clk)) > > > return; > > > } > > > + > > > +#ifdef CONFIG_PREEMPT_RT_FULL > > > +/* On RT, irq work runs from softirq */ > > > + if (irq_work_needs_cpu()) > > > + goto raise; > > > > So with this patch and the change you made in the patch before, timers on > > RT are expired only when there is pending irq work or after modifying a > > timer on a non housekeeping cpu? > > Well, run_timer_softirq execute only if pending_map contains a bit set. > > > With your patches I could create the following problematic situation on RT > > (if I understood everything properly): I add a timer which should expire in > > 50 jiffies to the wheel of a non housekeeping cpu. So it ends up 50 buckets > > away form now in the first wheel. This timer is the only timer in the wheel > > and the next timer softirq raise is required in 50 jiffies. After adding > > the timer, the timer interrupt is raised, and no timer has to be expired, > > because there is no timer pending. > > But the softirq will be raised, because pending_map will be set: > > + if (!bitmap_empty(base->pending_map, WHEEL_SIZE)) > + goto raise; > > No? I'm sorry! I read the #endif of the CONFIG_PREEMPT_RT_FULL section as an #else... This is where my confusion comes from. I will think about the problem and your solution a little bit more and give you feedback hopefully on monday. Thanks, Anna-Maria
Re: [patch 2/3] timers: do not raise softirq unconditionally (spinlockless version)
On Mon, 15 Apr 2019, Marcelo Tosatti wrote:

> Check base->pending_map locklessly and skip raising timer softirq
> if empty.
>
> What allows the lockless (and potentially racy against mod_timer)
> check is that mod_timer will raise another timer softirq after
> modifying base->pending_map.

The raise of the timer softirq after adding the timer is done unconditionally - so timer softirqs are raised which are not required at all, as mentioned before.

This check is implemented for !CONFIG_PREEMPT_RT_FULL only. The commit message totally ignores that you are implementing something CONFIG_PREEMPT_RT_FULL dependent as well.

> Signed-off-by: Marcelo Tosatti
>
> ---
> kernel/time/timer.c | 18 ++
> 1 file changed, 18 insertions(+)
>
> Index: linux-rt-devel/kernel/time/timer.c
> ===
> --- linux-rt-devel.orig/kernel/time/timer.c	2019-04-15 14:21:02.788704354 -0300
> +++ linux-rt-devel/kernel/time/timer.c	2019-04-15 14:22:56.755047354 -0300
> @@ -1776,6 +1776,24 @@
> 		if (time_before(jiffies, base->clk))
> 			return;
> 	}
> +
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +	/* On RT, irq work runs from softirq */
> +	if (irq_work_needs_cpu())
> +		goto raise;

So with this patch and the change you made in the patch before, timers on RT are expired only when there is pending irq work or after modifying a timer on a non housekeeping cpu?

With your patches I could create the following problematic situation on RT (if I understood everything properly): I add a timer which should expire in 50 jiffies to the wheel of a non housekeeping cpu. So it ends up 50 buckets away from now in the first wheel. This timer is the only timer in the wheel and the next timer softirq raise is required in 50 jiffies. After adding the timer, the timer interrupt is raised, and no timer has to be expired, because there is no timer pending. If there is no irq work required during the next 51 jiffies and also no timer changed, the timer I added will not expire in time.

The timer_base will come out of idle but will not forward the base clk. This makes it even worse: when a timer is then added, the timer base is forwarded - but without checking for the next pending timer, so the first added timer will be delayed even more. So your implementation lacks forwarding of timer_base->clk when the timer_base comes out of idle with respect to the next pending timer.

> +#endif
> +	base = this_cpu_ptr(&timer_bases[BASE_STD]);
> +	if (!housekeeping_cpu(base->cpu, HK_FLAG_TIMER)) {
> +		if (!bitmap_empty(base->pending_map, WHEEL_SIZE))
> +			goto raise;
> +		base++;
> +		if (!bitmap_empty(base->pending_map, WHEEL_SIZE))
> +			goto raise;
> +
> +		return;
> +	}
> +
> +raise:
> 	raise_softirq(TIMER_SOFTIRQ);
> }

Thanks,

	Anna-Maria
Re: [patch 1/3] timers: raise timer softirq on __mod_timer/add_timer_on
On Mon, 15 Apr 2019, Marcelo Tosatti wrote:

[...]

> The patch "timers: do not raise softirq unconditionally" from Thomas
> attempts to address that by checking, in the sched tick, whether it's
> necessary to raise the timer softirq. Unfortunately, it attempts to grab
> the tvec base spinlock which generates the issue described in the patch
> "Revert "timers: do not raise softirq unconditionally"".

Both patches are not available in the version your patch set is based on. Better pointers would be helpful.

> tvec_base->lock protects addition of timers to the wheel versus
> timer interrupt execution.

The timer_base->lock (formerly known as tvec_base->lock) synchronizes all accesses to the timer_base, not only addition of timers versus timer interrupt execution. Deletion of timers, getting the next timer interrupt, forwarding the base clock and migration of timers are protected by timer_base->lock as well.

> This patch does not grab the tvec base spinlock from irq context,
> but rather performs a lockless access to base->pending_map.

I cannot see where this patch performs a lockless access to timer_base->pending_map.

> It handles the race between timer addition and timer interrupt
> execution by unconditionally (in case of isolated CPUs) raising the
> timer softirq after making sure the updated bitmap is visible
> on remote CPUs.

So after modifying a timer on a non housekeeping timer base, the timer softirq is raised - even if there is no pending timer in the next bucket. Only with this patch, this shouldn't be a problem - but it is an additional raise of the timer softirq and an overhead when adding a timer, because the normal timer softirq is raised from the sched tick anyway.
> Signed-off-by: Marcelo Tosatti > > --- > kernel/time/timer.c | 38 ++ > 1 file changed, 38 insertions(+) > > Index: linux-rt-devel/kernel/time/timer.c > === > --- linux-rt-devel.orig/kernel/time/timer.c 2019-04-15 13:56:06.974210992 > -0300 > +++ linux-rt-devel/kernel/time/timer.c2019-04-15 14:21:02.788704354 > -0300 > @@ -1056,6 +1063,17 @@ > internal_add_timer(base, timer); > } > > + if (!housekeeping_cpu(base->cpu, HK_FLAG_TIMER) && > + !(timer->flags & TIMER_DEFERRABLE)) { > + call_single_data_t *c; > + > + c = per_cpu_ptr(_timer_csd, base->cpu); > + > + /* Make sure bitmap updates are visible on remote CPUs */ > + smp_wmb(); > + smp_call_function_single_async(base->cpu, c); > + } > + > out_unlock: > raw_spin_unlock_irqrestore(>lock, flags); > Could you please explain me, why you decided to use the above implementation for raising the timer softirq after modifying a timer? Thanks, Anna-Maria
Re: [patch 0/3] do not raise timer softirq unconditionally (spinlockless version)
Hi,

I had a look at the queue and have several questions about your implementation. First of all, I had some trouble understanding your commit messages. So I first had to read the code and then tried to understand the commit messages. It is easier if it works the other way round.

On Mon, 15 Apr 2019, Marcelo Tosatti wrote:

> For isolated CPUs, we'd like to skip awakening ktimersoftd
> (the switch to and then back from ktimersoftd takes 10us in
> virtualized environments, in addition to other OS overhead,
> which exceeds telco requirements for packet forwarding for
> 5G) from the sched tick.

Would you like to prevent raising the timer softirq from the sched tick in general for isolated CPUs? Or would you like to prevent raising the timer softirq if no pending timer is available?

Nevertheless, this change is not PREEMPT_RT specific. It is a NOHZ dependent change. So it would be nice if the queue were against mainline. But please correct me if I'm wrong.

[...]

> This patchset reduces cyclictest latency from 25us to 14us
> on my testbox.

A lot of information is missing: What does your environment look like for this test, what is your workload, ...? Did you also run other tests?

Thanks,

	Anna-Maria
[PATCH v3] hrtimer: Consolidate hrtimer_init() + hrtimer_init_sleeper() calls
From: Sebastian Andrzej Siewior hrtimer_init_sleeper() calls require a prior initialisation of the hrtimer object with hrtimer_init(). Lets make the initialisation of the hrtimer object part of hrtimer_init_sleeper(). To remain consistent consider init_on_stack as well. Beside adapting the hrtimer_init_sleeper[_on_stack]() functions, call sites need to be updated as well. Link: http://lkml.kernel.org/r/20180703092541.2870-1-anna-ma...@linutronix.de [anna-maria: Updating the commit message] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Anna-Maria Gleixner --- v2..v3: Update to current version v1..v2: Fix missing call site in drivers/staging/android/vsoc.c block/blk-mq.c | 3 +-- drivers/staging/android/vsoc.c | 6 ++--- include/linux/hrtimer.h| 19 +++--- include/linux/wait.h | 4 +-- kernel/futex.c | 19 ++ kernel/time/hrtimer.c | 46 ++ net/core/pktgen.c | 4 +-- 7 files changed, 67 insertions(+), 34 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 70b210a308c4..f378e2f8ec2c 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -3365,10 +3365,9 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q, kt = nsecs; mode = HRTIMER_MODE_REL; - hrtimer_init_on_stack(, CLOCK_MONOTONIC, mode); + hrtimer_init_sleeper_on_stack(, CLOCK_MONOTONIC, mode, current); hrtimer_set_expires(, kt); - hrtimer_init_sleeper(, current); do { if (blk_mq_rq_state(rq) == MQ_RQ_COMPLETE) break; diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c index 8a75bd27c413..27daa8ae56a4 100644 --- a/drivers/staging/android/vsoc.c +++ b/drivers/staging/android/vsoc.c @@ -436,12 +436,10 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) return -EINVAL; wake_time = ktime_set(arg->wake_time_sec, arg->wake_time_nsec); - hrtimer_init_on_stack(>timer, CLOCK_MONOTONIC, - HRTIMER_MODE_ABS); + hrtimer_init_sleeper_on_stack(to, CLOCK_MONOTONIC, + HRTIMER_MODE_ABS, current); hrtimer_set_expires_range_ns(>timer, wake_time, 
current->timer_slack_ns); - - hrtimer_init_sleeper(to, current); } while (1) { diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 2e8957eac4d4..f669dc5b63e7 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -361,10 +361,17 @@ DECLARE_PER_CPU(struct tick_device, tick_cpu_device); /* Initialize timers: */ extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock, enum hrtimer_mode mode); +extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, +enum hrtimer_mode mode, +struct task_struct *task); #ifdef CONFIG_DEBUG_OBJECTS_TIMERS extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock, enum hrtimer_mode mode); +extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, + clockid_t clock_id, + enum hrtimer_mode mode, + struct task_struct *task); extern void destroy_hrtimer_on_stack(struct hrtimer *timer); #else @@ -374,6 +381,15 @@ static inline void hrtimer_init_on_stack(struct hrtimer *timer, { hrtimer_init(timer, which_clock, mode); } + +static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, + clockid_t clock_id, + enum hrtimer_mode mode, + struct task_struct *task) +{ + hrtimer_init_sleeper(sl, clock_id, mode, task); +} + static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { } #endif @@ -477,9 +493,6 @@ extern long hrtimer_nanosleep(const struct timespec64 *rqtp, const enum hrtimer_mode mode, const clockid_t clockid); -extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, -struct task_struct *tsk); - extern int schedule_hrtimeout_range(ktime_t *expires, u64 delta, const enum hrtimer_mode mode); extern int schedule_hrtimeout_range_clock(ktime_t *expires, diff --git a/include/linux/wait.h b/include/linux/wait.h index 5f3efabc36f4..671e8ceaac15 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -488
[tip:timers/core] timer/trace: Improve timer tracing
Commit-ID: f28d3d5346e97e60c81f933ac89ccf015430e5cf Gitweb: https://git.kernel.org/tip/f28d3d5346e97e60c81f933ac89ccf015430e5cf Author: Anna-Maria Gleixner AuthorDate: Thu, 21 Mar 2019 13:09:21 +0100 Committer: Thomas Gleixner CommitDate: Sun, 24 Mar 2019 20:29:33 +0100 timer/trace: Improve timer tracing Timers are added to the timer wheel off by one. This is required in case a timer is queued directly before incrementing jiffies to prevent early timer expiry. When reading a timer trace and relying only on the expiry time of the timer in the timer_start trace point and on the now in the timer_expiry_entry trace point, it seems that the timer fires late. With the current timer_expiry_entry trace point information only now=jiffies is printed but not the value of base->clk. This makes it impossible to draw a conclusion to the index of base->clk and makes it impossible to examine timer problems without additional trace points. Therefore add the base->clk value to the timer_expire_entry trace point, to be able to calculate the index the timer base is located at during collecting expired timers. 
Signed-off-by: Anna-Maria Gleixner Signed-off-by: Thomas Gleixner Cc: fweis...@gmail.com Cc: pet...@infradead.org Cc: Steven Rostedt Link: https://lkml.kernel.org/r/20190321120921.16463-5-anna-ma...@linutronix.de --- include/trace/events/timer.h | 11 +++ kernel/time/timer.c | 17 + 2 files changed, 20 insertions(+), 8 deletions(-) diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h index da975d69c453..b7a904825e7d 100644 --- a/include/trace/events/timer.h +++ b/include/trace/events/timer.h @@ -89,24 +89,27 @@ TRACE_EVENT(timer_start, */ TRACE_EVENT(timer_expire_entry, - TP_PROTO(struct timer_list *timer), + TP_PROTO(struct timer_list *timer, unsigned long baseclk), - TP_ARGS(timer), + TP_ARGS(timer, baseclk), TP_STRUCT__entry( __field( void *,timer ) __field( unsigned long, now ) __field( void *,function) + __field( unsigned long, baseclk ) ), TP_fast_assign( __entry->timer = timer; __entry->now= jiffies; __entry->function = timer->function; + __entry->baseclk= baseclk; ), - TP_printk("timer=%p function=%ps now=%lu", - __entry->timer, __entry->function, __entry->now) + TP_printk("timer=%p function=%ps now=%lu baseclk=%lu", + __entry->timer, __entry->function, __entry->now, + __entry->baseclk) ); /** diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 8d7918ae4d0c..a9b1bbc2d88d 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1293,7 +1293,9 @@ int del_timer_sync(struct timer_list *timer) EXPORT_SYMBOL(del_timer_sync); #endif -static void call_timer_fn(struct timer_list *timer, void (*fn)(struct timer_list *)) +static void call_timer_fn(struct timer_list *timer, + void (*fn)(struct timer_list *), + unsigned long baseclk) { int count = preempt_count(); @@ -1316,7 +1318,7 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(struct timer_list */ lock_map_acquire(_map); - trace_timer_expire_entry(timer); + trace_timer_expire_entry(timer, baseclk); fn(timer); trace_timer_expire_exit(timer); @@ -1337,6 +1339,13 @@ 
static void call_timer_fn(struct timer_list *timer, void (*fn)(struct timer_list
 static void expire_timers(struct timer_base *base, struct hlist_head *head)
 {
+	/*
+	 * This value is required only for tracing. base->clk was
+	 * incremented directly before expire_timers was called. But expiry
+	 * is related to the old base->clk value.
+	 */
+	unsigned long baseclk = base->clk - 1;
+
 	while (!hlist_empty(head)) {
 		struct timer_list *timer;
 		void (*fn)(struct timer_list *);
@@ -1350,11 +1359,11 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)

 		if (timer->flags & TIMER_IRQSAFE) {
 			raw_spin_unlock(&base->lock);
-			call_timer_fn(timer, fn);
+			call_timer_fn(timer, fn, baseclk);
 			raw_spin_lock(&base->lock);
 		} else {
 			raw_spin_unlock_irq(&base->lock);
-			call_timer_fn(timer, fn);
+			call_timer_fn(timer, fn, baseclk);
 			raw_spin_lock_irq(&base->lock);
 		}
 	}
[tip:timers/core] timer: Move trace point to get proper index
Commit-ID: dc1e7dc5ac6254ba0502323381a7ec847e408f1d Gitweb: https://git.kernel.org/tip/dc1e7dc5ac6254ba0502323381a7ec847e408f1d Author: Anna-Maria Gleixner AuthorDate: Thu, 21 Mar 2019 13:09:19 +0100 Committer: Thomas Gleixner CommitDate: Sun, 24 Mar 2019 20:29:32 +0100 timer: Move trace point to get proper index When placing the timer_start trace point before the timer wheel bucket index is calculated, the index information in the trace point is useless. It is not possible to simply move the debug_activate() call after the index calculation, because debug_object_activate() needs to be called before touching the object. Therefore split debug_activate() and move the trace point into enqueue_timer() after the new index has been calculated. The debug_object_activate() call remains at the original place. Signed-off-by: Anna-Maria Gleixner Signed-off-by: Thomas Gleixner Cc: fweis...@gmail.com Cc: pet...@infradead.org Cc: Steven Rostedt Link: https://lkml.kernel.org/r/20190321120921.16463-3-anna-ma...@linutronix.de --- kernel/time/timer.c | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 2fce056f8a49..8d7918ae4d0c 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -536,6 +536,8 @@ static void enqueue_timer(struct timer_base *base, struct timer_list *timer, hlist_add_head(>entry, base->vectors + idx); __set_bit(idx, base->pending_map); timer_set_idx(timer, idx); + + trace_timer_start(timer, timer->expires, timer->flags); } static void @@ -757,13 +759,6 @@ static inline void debug_init(struct timer_list *timer) trace_timer_init(timer); } -static inline void -debug_activate(struct timer_list *timer, unsigned long expires) -{ - debug_timer_activate(timer); - trace_timer_start(timer, expires, timer->flags); -} - static inline void debug_deactivate(struct timer_list *timer) { debug_timer_deactivate(timer); @@ -1037,7 +1032,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int 
option
 		}
 	}

-	debug_activate(timer, expires);
+	debug_timer_activate(timer);

 	timer->expires = expires;

 	/*
@@ -1171,7 +1166,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 	}
 	forward_timer_base(base);

-	debug_activate(timer, timer->expires);
+	debug_timer_activate(timer);
 	internal_add_timer(base, timer);
 	raw_spin_unlock_irqrestore(&base->lock, flags);
 }
[tip:timers/core] timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps
Commit-ID: 6849cbb0f9a8dbc1ba56e9abc6955613103e01e3 Gitweb: https://git.kernel.org/tip/6849cbb0f9a8dbc1ba56e9abc6955613103e01e3 Author: Anna-Maria Gleixner AuthorDate: Thu, 21 Mar 2019 13:09:20 +0100 Committer: Thomas Gleixner CommitDate: Sun, 24 Mar 2019 20:29:33 +0100 timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps Since commit 04b8eb7a4ccd ("symbol lookup: introduce dereference_symbol_descriptor()") %pf is deprecated, because %ps is smart enough to handle function pointer dereference on platforms where such a dereference is required. While at it add proper line breaks to stay in the 80 character limit. Signed-off-by: Anna-Maria Gleixner Signed-off-by: Thomas Gleixner Cc: fweis...@gmail.com Cc: pet...@infradead.org Cc: Steven Rostedt Link: https://lkml.kernel.org/r/20190321120921.16463-4-anna-ma...@linutronix.de --- include/trace/events/timer.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h index a57e4ee989d6..da975d69c453 100644 --- a/include/trace/events/timer.h +++ b/include/trace/events/timer.h @@ -73,7 +73,7 @@ TRACE_EVENT(timer_start, __entry->flags = flags; ), - TP_printk("timer=%p function=%pf expires=%lu [timeout=%ld] cpu=%u idx=%u flags=%s", + TP_printk("timer=%p function=%ps expires=%lu [timeout=%ld] cpu=%u idx=%u flags=%s", __entry->timer, __entry->function, __entry->expires, (long)__entry->expires - __entry->now, __entry->flags & TIMER_CPUMASK, @@ -105,7 +105,8 @@ TRACE_EVENT(timer_expire_entry, __entry->function = timer->function; ), - TP_printk("timer=%p function=%pf now=%lu", __entry->timer, __entry->function,__entry->now) + TP_printk("timer=%p function=%ps now=%lu", + __entry->timer, __entry->function, __entry->now) ); /** @@ -210,7 +211,7 @@ TRACE_EVENT(hrtimer_start, __entry->mode = mode; ), - TP_printk("hrtimer=%p function=%pf expires=%llu softexpires=%llu " + TP_printk("hrtimer=%p function=%ps expires=%llu softexpires=%llu " "mode=%s", 
__entry->hrtimer, __entry->function, (unsigned long long) __entry->expires, (unsigned long long) __entry->softexpires, @@ -243,7 +244,8 @@ TRACE_EVENT(hrtimer_expire_entry, __entry->function = hrtimer->function; ), - TP_printk("hrtimer=%p function=%pf now=%llu", __entry->hrtimer, __entry->function, + TP_printk("hrtimer=%p function=%ps now=%llu", + __entry->hrtimer, __entry->function, (unsigned long long) __entry->now) );
[tip:timers/core] tick/sched: Update tick_sched struct documentation
Commit-ID: d6b87eaf10bd061914f6d277d7428b3285d8850e Gitweb: https://git.kernel.org/tip/d6b87eaf10bd061914f6d277d7428b3285d8850e Author: Anna-Maria Gleixner AuthorDate: Thu, 21 Mar 2019 13:09:18 +0100 Committer: Thomas Gleixner CommitDate: Sun, 24 Mar 2019 20:29:32 +0100 tick/sched: Update tick_sched struct documentation Adapt the documentation order of struct members to the effective order of struct members and add missing descriptions. Signed-off-by: Anna-Maria Gleixner Signed-off-by: Thomas Gleixner Cc: fweis...@gmail.com Cc: pet...@infradead.org Link: https://lkml.kernel.org/r/20190321120921.16463-2-anna-ma...@linutronix.de --- kernel/time/tick-sched.h | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h index 6de959a854b2..4fb06527cf64 100644 --- a/kernel/time/tick-sched.h +++ b/kernel/time/tick-sched.h @@ -24,12 +24,19 @@ enum tick_nohz_mode { * struct tick_sched - sched tick emulation and no idle tick control/stats * @sched_timer: hrtimer to schedule the periodic tick in high * resolution mode + * @check_clocks: Notification mechanism about clocksource changes + * @nohz_mode: Mode - one state of tick_nohz_mode + * @inidle:Indicator that the CPU is in the tick idle mode + * @tick_stopped: Indicator that the idle tick has been stopped + * @idle_active: Indicator that the CPU is actively in the tick idle mode; + * it is resetted during irq handling phases. + * @do_timer_lst: CPU was the last one doing do_timer before going idle + * @got_idle_tick: Tick timer function has run with @inidle set * @last_tick: Store the last tick expiry time when the tick * timer is modified for nohz sleeps. This is necessary * to resume the tick timer operation in the timeline * when the CPU returns from nohz sleep. * @next_tick: Next tick to be fired when in dynticks mode. 
- * @tick_stopped: Indicator that the idle tick has been stopped * @idle_jiffies: jiffies at the entry to idle for idle time accounting * @idle_calls:Total number of idle calls * @idle_sleeps: Number of idle calls, where the sched tick was stopped @@ -40,8 +47,8 @@ enum tick_nohz_mode { * @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, with IO outstanding * @timer_expires: Anticipated timer expiration time (in case sched tick is stopped) * @timer_expires_base:Base time clock monotonic for @timer_expires - * @do_timer_lst: CPU was the last one doing do_timer before going idle - * @got_idle_tick: Tick timer function has run with @inidle set + * @next_timer:Expiry time of next expiring timer for debugging purpose only + * @tick_dep_mask: Tick dependency mask - is set, if someone needs the tick */ struct tick_sched { struct hrtimer sched_timer;
[PATCH 3/4] timer: Replace deprecated vsprintf pointer extension %pf by %ps
Since commit 04b8eb7a4ccd ("symbol lookup: introduce dereference_symbol_descriptor()") %pf is deprecated, because %ps is smart enough to handle function pointer dereference on platforms where such dereference is required. While at it shorten touched lines not to contain more than 80 characters. Signed-off-by: Anna-Maria Gleixner --- include/trace/events/timer.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h index a57e4ee989d6..da975d69c453 100644 --- a/include/trace/events/timer.h +++ b/include/trace/events/timer.h @@ -73,7 +73,7 @@ TRACE_EVENT(timer_start, __entry->flags = flags; ), - TP_printk("timer=%p function=%pf expires=%lu [timeout=%ld] cpu=%u idx=%u flags=%s", + TP_printk("timer=%p function=%ps expires=%lu [timeout=%ld] cpu=%u idx=%u flags=%s", __entry->timer, __entry->function, __entry->expires, (long)__entry->expires - __entry->now, __entry->flags & TIMER_CPUMASK, @@ -105,7 +105,8 @@ TRACE_EVENT(timer_expire_entry, __entry->function = timer->function; ), - TP_printk("timer=%p function=%pf now=%lu", __entry->timer, __entry->function,__entry->now) + TP_printk("timer=%p function=%ps now=%lu", + __entry->timer, __entry->function, __entry->now) ); /** @@ -210,7 +211,7 @@ TRACE_EVENT(hrtimer_start, __entry->mode = mode; ), - TP_printk("hrtimer=%p function=%pf expires=%llu softexpires=%llu " + TP_printk("hrtimer=%p function=%ps expires=%llu softexpires=%llu " "mode=%s", __entry->hrtimer, __entry->function, (unsigned long long) __entry->expires, (unsigned long long) __entry->softexpires, @@ -243,7 +244,8 @@ TRACE_EVENT(hrtimer_expire_entry, __entry->function = hrtimer->function; ), - TP_printk("hrtimer=%p function=%pf now=%llu", __entry->hrtimer, __entry->function, + TP_printk("hrtimer=%p function=%ps now=%llu", + __entry->hrtimer, __entry->function, (unsigned long long) __entry->now) ); -- 2.20.1
[PATCH 0/4] timers: Fix and improve tracing and documentation
Hi, this patch series was developed while investigating timer problems and working on timer improvements. It contains a struct documentation fix in tick-sched as well as a fix and an improvement for timer tracing. Thanks, Anna-Maria Anna-Maria Gleixner (4): tick-sched: Update tick_sched struct documentation timer: Move trace point to get proper index timer: Replace deprecated vsprintf pointer extension %pf by %ps trace/timer: Improve timer tracing include/trace/events/timer.h | 17 +++-- kernel/time/tick-sched.h | 13 ++--- kernel/time/timer.c | 30 +- 3 files changed, 38 insertions(+), 22 deletions(-) -- 2.20.1
[PATCH 4/4] trace/timer: Improve timer tracing
Timers are added to the timer wheel off by one. This is required to prevent early timer expiry in case a timer is queued directly before jiffies is incremented. When reading a timer trace and relying only on the expiry time of the timer in the timer_start trace point and on the now value in the timer_expire_entry trace point, it looks as if the timer fires late. The current timer_expire_entry trace point prints only now=jiffies but not the value of base->clk. This makes it impossible to draw a conclusion about the index of base->clk and therefore to examine timer problems without additional trace points. Therefore add the base->clk value to the timer_expire_entry trace point, so that the index the timer base is located at while collecting expired timers can be calculated. Signed-off-by: Anna-Maria Gleixner --- include/trace/events/timer.h | 11 +++ kernel/time/timer.c | 17 + 2 files changed, 20 insertions(+), 8 deletions(-) diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h index da975d69c453..dade735657ef 100644 --- a/include/trace/events/timer.h +++ b/include/trace/events/timer.h @@ -89,24 +89,27 @@ TRACE_EVENT(timer_start, */ TRACE_EVENT(timer_expire_entry, - TP_PROTO(struct timer_list *timer), + TP_PROTO(struct timer_list *timer, unsigned long baseclk), - TP_ARGS(timer), + TP_ARGS(timer, baseclk), TP_STRUCT__entry( __field( void *,timer ) __field( unsigned long, now ) __field( void *,function) + __field( unsigned long, baseclk ) ), TP_fast_assign( __entry->timer = timer; __entry->now= jiffies; __entry->function = timer->function; + __entry->baseclk= baseclk; ), - TP_printk("timer=%p function=%ps now=%lu", - __entry->timer, __entry->function, __entry->now) + TP_printk("timer=%p function=%ps now=%lu base->clk=%lu", + __entry->timer, __entry->function, __entry->now, + __entry->baseclk) ); /** diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 8d7918ae4d0c..c0233c1a4ccb 100644 --- a/kernel/time/timer.c +++ 
b/kernel/time/timer.c @@ -1293,7 +1293,8 @@ int del_timer_sync(struct timer_list *timer) EXPORT_SYMBOL(del_timer_sync); #endif -static void call_timer_fn(struct timer_list *timer, void (*fn)(struct timer_list *)) +static void call_timer_fn(unsigned long baseclk, struct timer_list *timer, + void (*fn)(struct timer_list *)) { int count = preempt_count(); @@ -1316,7 +1317,7 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(struct timer_list */ lock_map_acquire(&lockdep_map); - trace_timer_expire_entry(timer); + trace_timer_expire_entry(timer, baseclk); fn(timer); trace_timer_expire_exit(timer); @@ -1337,6 +1338,14 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(struct timer_list static void expire_timers(struct timer_base *base, struct hlist_head *head) { + /* +* this value is required for tracing only +* +* base->clk was incremented directly before expire_timers was +* called. But expiry is related to the old base->clk value. +*/ + unsigned long baseclk = base->clk - 1; + while (!hlist_empty(head)) { struct timer_list *timer; void (*fn)(struct timer_list *); @@ -1350,11 +1359,11 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head) if (timer->flags & TIMER_IRQSAFE) { raw_spin_unlock(&base->lock); - call_timer_fn(timer, fn); + call_timer_fn(baseclk, timer, fn); raw_spin_lock(&base->lock); } else { raw_spin_unlock_irq(&base->lock); - call_timer_fn(timer, fn); + call_timer_fn(baseclk, timer, fn); raw_spin_lock_irq(&base->lock); } } -- 2.20.1
[PATCH 2/4] timer: Move trace point to get proper index
When the timer_start trace point is placed before the timer wheel bucket index is calculated, the index information in the trace point is useless. It is not possible to simply move the debug_activate() call after the index calculation, because the debug_object_activate() function needs to be called before touching the object. Therefore split the debug_activate() function and move the trace point into the timer enqueue path after the index calculation. The debug_object_activate() call remains at its original place. Signed-off-by: Anna-Maria Gleixner --- kernel/time/timer.c | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 2fce056f8a49..8d7918ae4d0c 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -536,6 +536,8 @@ static void enqueue_timer(struct timer_base *base, struct timer_list *timer, hlist_add_head(&timer->entry, base->vectors + idx); __set_bit(idx, base->pending_map); timer_set_idx(timer, idx); + + trace_timer_start(timer, timer->expires, timer->flags); } static void @@ -757,13 +759,6 @@ static inline void debug_init(struct timer_list *timer) trace_timer_init(timer); } -static inline void -debug_activate(struct timer_list *timer, unsigned long expires) -{ - debug_timer_activate(timer); - trace_timer_start(timer, expires, timer->flags); -} - static inline void debug_deactivate(struct timer_list *timer) { debug_timer_deactivate(timer); @@ -1037,7 +1032,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option } } - debug_activate(timer, expires); + debug_timer_activate(timer); timer->expires = expires; /* @@ -1171,7 +1166,7 @@ void add_timer_on(struct timer_list *timer, int cpu) } forward_timer_base(base); - debug_activate(timer, timer->expires); + debug_timer_activate(timer); internal_add_timer(base, timer); raw_spin_unlock_irqrestore(&base->lock, flags); } -- 2.20.1
[PATCH 1/4] tick-sched: Update tick_sched struct documentation
Adapt the documentation order of struct members to effective order of struct members and add missing descriptions. Signed-off-by: Anna-Maria Gleixner --- kernel/time/tick-sched.h | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h index 6de959a854b2..4fb06527cf64 100644 --- a/kernel/time/tick-sched.h +++ b/kernel/time/tick-sched.h @@ -24,12 +24,19 @@ enum tick_nohz_mode { * struct tick_sched - sched tick emulation and no idle tick control/stats * @sched_timer: hrtimer to schedule the periodic tick in high * resolution mode + * @check_clocks: Notification mechanism about clocksource changes + * @nohz_mode: Mode - one state of tick_nohz_mode + * @inidle:Indicator that the CPU is in the tick idle mode + * @tick_stopped: Indicator that the idle tick has been stopped + * @idle_active: Indicator that the CPU is actively in the tick idle mode; + * it is resetted during irq handling phases. + * @do_timer_lst: CPU was the last one doing do_timer before going idle + * @got_idle_tick: Tick timer function has run with @inidle set * @last_tick: Store the last tick expiry time when the tick * timer is modified for nohz sleeps. This is necessary * to resume the tick timer operation in the timeline * when the CPU returns from nohz sleep. * @next_tick: Next tick to be fired when in dynticks mode. 
- * @tick_stopped: Indicator that the idle tick has been stopped * @idle_jiffies: jiffies at the entry to idle for idle time accounting * @idle_calls:Total number of idle calls * @idle_sleeps: Number of idle calls, where the sched tick was stopped @@ -40,8 +47,8 @@ enum tick_nohz_mode { * @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, with IO outstanding * @timer_expires: Anticipated timer expiration time (in case sched tick is stopped) * @timer_expires_base:Base time clock monotonic for @timer_expires - * @do_timer_lst: CPU was the last one doing do_timer before going idle - * @got_idle_tick: Tick timer function has run with @inidle set + * @next_timer:Expiry time of next expiring timer for debugging purpose only + * @tick_dep_mask: Tick dependency mask - is set, if someone needs the tick */ struct tick_sched { struct hrtimer sched_timer; -- 2.20.1
[PATCH] bitops/find: Fix function description argument ordering
The order of the arguments in the function documentation does not match the implementation. Change the documentation so that it corresponds to the code, which prevents confusing readers. While at it, fix the line breaks between an argument's type and name in the function declarations for better readability. Signed-off-by: Anna-Maria Gleixner --- include/asm-generic/bitops/find.h | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h index 8a1ee10014de..30f0f8d0bd79 100644 --- a/include/asm-generic/bitops/find.h +++ b/include/asm-generic/bitops/find.h @@ -6,14 +6,15 @@ /** * find_next_bit - find the next set bit in a memory region * @addr: The address to base the search on - * @offset: The bitnumber to start searching at * @size: The bitmap size in bits + * @offset: The bitnumber to start searching at * * Returns the bit number for the next set bit * If no bits are set, returns @size. */ -extern unsigned long find_next_bit(const unsigned long *addr, unsigned long - size, unsigned long offset); +extern unsigned long +find_next_bit(const unsigned long *addr, unsigned long size, + unsigned long offset); #endif #ifndef find_next_and_bit @@ -21,29 +22,30 @@ extern unsigned long find_next_bit(const unsigned long *addr, unsigned long * find_next_and_bit - find the next set bit in both memory regions * @addr1: The first address to base the search on * @addr2: The second address to base the search on - * @offset: The bitnumber to start searching at * @size: The bitmap size in bits + * @offset: The bitnumber to start searching at * * Returns the bit number for the next set bit * If no bits are set, returns @size. 
*/ -extern unsigned long find_next_and_bit(const unsigned long *addr1, - const unsigned long *addr2, unsigned long size, - unsigned long offset); +extern unsigned long +find_next_and_bit(const unsigned long *addr1, const unsigned long *addr2, + unsigned long size, unsigned long offset); #endif #ifndef find_next_zero_bit /** * find_next_zero_bit - find the next cleared bit in a memory region * @addr: The address to base the search on - * @offset: The bitnumber to start searching at * @size: The bitmap size in bits + * @offset: The bitnumber to start searching at * * Returns the bit number of the next zero bit * If no bits are zero, returns @size. */ -extern unsigned long find_next_zero_bit(const unsigned long *addr, unsigned - long size, unsigned long offset); +extern unsigned long +find_next_zero_bit(const unsigned long *addr, unsigned long size, + unsigned long offset); #endif #ifdef CONFIG_GENERIC_FIND_FIRST_BIT -- 2.18.0
Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq
On Wed, 1 Aug 2018, Frederic Weisbecker wrote: > Before updating the full nohz tick or the idle time on IRQ exit, we > check first if we are not in a nesting interrupt, whether the inner > interrupt is a hard or a soft IRQ. > > There is a historical reason for that: the dyntick idle mode used to > reprogram the tick on IRQ exit, after softirq processing, and there was > no point in doing that job in the outer nesting interrupt because the > tick update will be performed through the end of the inner interrupt > eventually, with even potential new timer updates. > > One corner case could show up though: if an idle tick interrupts a softirq > executing inline in the idle loop (through a call to local_bh_enable()) > after we entered in dynticks mode, the IRQ won't reprogram the tick > because it assumes the softirq executes on an inner IRQ-tail. As a > result we might put the CPU in sleep mode with the tick completely > stopped whereas a timer can still be enqueued. Indeed there is no tick > reprogramming in local_bh_enable(). We probably assumed there was no bh > disabled section in idle, although there didn't seem to be debug code > ensuring that. > > Nowadays the nesting interrupt optimization still stands but only concerns > full dynticks. The tick is stopped on IRQ exit in full dynticks mode > and we want to wait for the end of the inner IRQ to reprogram the tick. > But in_interrupt() doesn't make a difference between softirqs executing > on IRQ tail and those executing inline. What was to be considered a > corner case in dynticks-idle mode now becomes a serious opportunity for > a bug in full dynticks mode: if a tick interrupts a task executing > softirq inline, the tick reprogramming will be ignored and we may exit > to userspace after local_bh_enable() with an enqueued timer that will > never fire. > > To fix this, simply keep reprogramming the tick if we are in a hardirq > interrupting softirq. 
We can still figure out a way later to restore > this optimization while excluding inline softirq processing. > > Reported-by: Anna-Maria Gleixner > Signed-off-by: Frederic Weisbecker > Cc: Thomas Gleixner > Cc: Ingo Molnar Tested-by: Anna-Maria Gleixner Thanks, Anna-Maria
[tip:timers/urgent] nohz: Fix local_timer_softirq_pending()
Commit-ID: 80d20d35af1edd632a5e7a3b9c0ab7ceff92769e Gitweb: https://git.kernel.org/tip/80d20d35af1edd632a5e7a3b9c0ab7ceff92769e Author: Anna-Maria Gleixner AuthorDate: Tue, 31 Jul 2018 18:13:58 +0200 Committer: Thomas Gleixner CommitDate: Tue, 31 Jul 2018 22:08:44 +0200 nohz: Fix local_timer_softirq_pending() local_timer_softirq_pending() checks whether the timer softirq is pending with: local_softirq_pending() & TIMER_SOFTIRQ. This is wrong because TIMER_SOFTIRQ is the softirq number and not a bitmask. So the test checks for the wrong bit. Use BIT(TIMER_SOFTIRQ) instead. Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()") Signed-off-by: Anna-Maria Gleixner Signed-off-by: Thomas Gleixner Reviewed-by: Paul E. McKenney Reviewed-by: Daniel Bristot de Oliveira Acked-by: Frederic Weisbecker Cc: bige...@linutronix.de Cc: pet...@infradead.org Cc: sta...@vger.kernel.org Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-ma...@linutronix.de --- kernel/time/tick-sched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index da9455a6b42b..5b33e2f5c0ed 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -642,7 +642,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now) static inline bool local_timer_softirq_pending(void) { - return local_softirq_pending() & TIMER_SOFTIRQ; + return local_softirq_pending() & BIT(TIMER_SOFTIRQ); } static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
[PATCH] nohz: Fix local_timer_softirq_pending()
local_timer_softirq_pending() checks whether the timer softirq is pending with: local_softirq_pending() & TIMER_SOFTIRQ. This is wrong because TIMER_SOFTIRQ is the softirq number and not a bitmask. So the test checks for the wrong bit. Use BIT(TIMER_SOFTIRQ) instead. Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()") Signed-off-by: Anna-Maria Gleixner --- kernel/time/tick-sched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index da9455a6b42b..5b33e2f5c0ed 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -642,7 +642,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now) static inline bool local_timer_softirq_pending(void) { - return local_softirq_pending() & TIMER_SOFTIRQ; + return local_softirq_pending() & BIT(TIMER_SOFTIRQ); } static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu) -- 2.18.0
Re: [PATCH v5 1/2] timers: Don't wake ktimersoftd on every tick
Hi Haris, On Thu, 28 Jun 2018, Haris Okanovic wrote: > Collect expired timers in interrupt context to avoid overhead of waking > ktimersoftd on every scheduler tick. > > This is implemented by storing lists of expired timers in the timer_base > struct, which is updated by the interrupt routing on each tick in > run_local_timers(). TIMER softirq (ktimersoftd) is then raised only when > one or more expired timers are collected. > > Performance impact on a 2core Intel Atom E3825 system: > * reduction in small latency spikes measured by cyclictest > * ~30% fewer context-switches measured by perf > * run_local_timers() execution time increases by 0.2 measured by TSC > I'm also working on timer improvements at the moment. Once I have fixed all the bugs in my implementation (there is one last horrible one left), I'm very interested in integrating your patches into my testing to be able to give you a Tested-by. Thanks, Anna-Maria
[PATCH v2] hrtimer: consolidate hrtimer_init() + hrtimer_init_sleeper() calls
From: Sebastian Andrzej Siewior hrtimer_init_sleeper() calls require a prior initialisation of the hrtimer object with hrtimer_init(). Lets make the initialisation of the hrtimer object part of hrtimer_init_sleeper(). To remain consistent consider init_on_stack as well. Beside adapting the hrtimer_init_sleeper[_on_stack]() functions, call sites need to be updated as well. Link: http://lkml.kernel.org/r/20170905135719.qsj4h5twhjkmk...@linutronix.de Signed-off-by: Sebastian Andrzej Siewior [anna-maria: Updating the commit message] Signed-off-by: Anna-Maria Gleixner --- v1..v2: Fix missing call site in drivers/staging/android/vsoc.c block/blk-mq.c | 3 +-- drivers/staging/android/vsoc.c | 6 ++--- include/linux/hrtimer.h| 19 +++--- include/linux/wait.h | 4 +-- kernel/futex.c | 19 ++ kernel/time/hrtimer.c | 46 ++ net/core/pktgen.c | 4 +-- 7 files changed, 67 insertions(+), 34 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 95919268564b..f95ad9ede0f6 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2984,10 +2984,9 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q, kt = nsecs; mode = HRTIMER_MODE_REL; - hrtimer_init_on_stack(, CLOCK_MONOTONIC, mode); + hrtimer_init_sleeper_on_stack(, CLOCK_MONOTONIC, mode, current); hrtimer_set_expires(, kt); - hrtimer_init_sleeper(, current); do { if (blk_mq_rq_state(rq) == MQ_RQ_COMPLETE) break; diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c index 806beda1040b..6c7f666c0e33 100644 --- a/drivers/staging/android/vsoc.c +++ b/drivers/staging/android/vsoc.c @@ -438,12 +438,10 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) if (!timespec_valid()) return -EINVAL; - hrtimer_init_on_stack(>timer, CLOCK_MONOTONIC, - HRTIMER_MODE_ABS); + hrtimer_init_sleeper_on_stack(to, CLOCK_MONOTONIC, + HRTIMER_MODE_ABS, current); hrtimer_set_expires_range_ns(>timer, timespec_to_ktime(ts), current->timer_slack_ns); - - hrtimer_init_sleeper(to, current); } while 
(1) { diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 3892e9c8b2de..b8bbaabd5aff 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -364,10 +364,17 @@ DECLARE_PER_CPU(struct tick_device, tick_cpu_device); /* Initialize timers: */ extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock, enum hrtimer_mode mode); +extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, +enum hrtimer_mode mode, +struct task_struct *task); #ifdef CONFIG_DEBUG_OBJECTS_TIMERS extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock, enum hrtimer_mode mode); +extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, + clockid_t clock_id, + enum hrtimer_mode mode, + struct task_struct *task); extern void destroy_hrtimer_on_stack(struct hrtimer *timer); #else @@ -377,6 +384,15 @@ static inline void hrtimer_init_on_stack(struct hrtimer *timer, { hrtimer_init(timer, which_clock, mode); } + +static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, + clockid_t clock_id, + enum hrtimer_mode mode, + struct task_struct *task) +{ + hrtimer_init_sleeper(sl, clock_id, mode, task); +} + static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { } #endif @@ -480,9 +496,6 @@ extern long hrtimer_nanosleep(const struct timespec64 *rqtp, const enum hrtimer_mode mode, const clockid_t clockid); -extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, -struct task_struct *tsk); - extern int schedule_hrtimeout_range(ktime_t *expires, u64 delta, const enum hrtimer_mode mode); extern int schedule_hrtimeout_range_clock(ktime_t *expires, diff --git a/include/linux/wait.h b/include/linux/wait.h index d9f131ecf708..a0938fc8dcdb 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -488,8 +488,8 @@ do { \ int
[PATCH v2] hrtimer: consolidate hrtimer_init() + hrtimer_init_sleeper() calls
From: Sebastian Andrzej Siewior

hrtimer_init_sleeper() calls require a prior initialisation of the hrtimer object with hrtimer_init(). Let's make the initialisation of the hrtimer object part of hrtimer_init_sleeper(). To remain consistent, consider init_on_stack as well.

Besides adapting the hrtimer_init_sleeper[_on_stack]() functions, call sites need to be updated as well.

Link: http://lkml.kernel.org/r/20170905135719.qsj4h5twhjkmk...@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
[anna-maria: Updating the commit message]
Signed-off-by: Anna-Maria Gleixner
---
v1..v2: Fix missing call site in drivers/staging/android/vsoc.c

 block/blk-mq.c                 |  3 +--
 drivers/staging/android/vsoc.c |  6 ++---
 include/linux/hrtimer.h        | 19 +++---
 include/linux/wait.h           |  4 +--
 kernel/futex.c                 | 19 ++
 kernel/time/hrtimer.c          | 46 ++
 net/core/pktgen.c              |  4 +--
 7 files changed, 67 insertions(+), 34 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 95919268564b..f95ad9ede0f6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2984,10 +2984,9 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 		kt = nsecs;

 	mode = HRTIMER_MODE_REL;
-	hrtimer_init_on_stack(&hs.timer, CLOCK_MONOTONIC, mode);
+	hrtimer_init_sleeper_on_stack(&hs, CLOCK_MONOTONIC, mode, current);
 	hrtimer_set_expires(&hs.timer, kt);

-	hrtimer_init_sleeper(&hs, current);
 	do {
 		if (blk_mq_rq_state(rq) == MQ_RQ_COMPLETE)
 			break;
diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c
index 806beda1040b..6c7f666c0e33 100644
--- a/drivers/staging/android/vsoc.c
+++ b/drivers/staging/android/vsoc.c
@@ -438,12 +438,10 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg)
 		if (!timespec_valid(&ts))
 			return -EINVAL;
-		hrtimer_init_on_stack(&to->timer, CLOCK_MONOTONIC,
-				      HRTIMER_MODE_ABS);
+		hrtimer_init_sleeper_on_stack(to, CLOCK_MONOTONIC,
+					      HRTIMER_MODE_ABS, current);
 		hrtimer_set_expires_range_ns(&to->timer, timespec_to_ktime(ts),
 					     current->timer_slack_ns);
-
-		hrtimer_init_sleeper(to, current);
 	}

 	while (1) {
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 3892e9c8b2de..b8bbaabd5aff 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -364,10 +364,17 @@ DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 /* Initialize timers: */
 extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock,
			 enum hrtimer_mode mode);
+extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id,
+				 enum hrtimer_mode mode,
+				 struct task_struct *task);

 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
 extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock,
				  enum hrtimer_mode mode);
+extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
+					  clockid_t clock_id,
+					  enum hrtimer_mode mode,
+					  struct task_struct *task);

 extern void destroy_hrtimer_on_stack(struct hrtimer *timer);
 #else
@@ -377,6 +384,15 @@ static inline void hrtimer_init_on_stack(struct hrtimer *timer,
 {
	hrtimer_init(timer, which_clock, mode);
 }
+
+static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
+						 clockid_t clock_id,
+						 enum hrtimer_mode mode,
+						 struct task_struct *task)
+{
+	hrtimer_init_sleeper(sl, clock_id, mode, task);
+}
+
 static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { }
 #endif
@@ -480,9 +496,6 @@ extern long hrtimer_nanosleep(const struct timespec64 *rqtp,
			      const enum hrtimer_mode mode,
			      const clockid_t clockid);

-extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
-				 struct task_struct *tsk);
-
 extern int schedule_hrtimeout_range(ktime_t *expires, u64 delta,
				    const enum hrtimer_mode mode);
 extern int schedule_hrtimeout_range_clock(ktime_t *expires,
diff --git a/include/linux/wait.h b/include/linux/wait.h
index d9f131ecf708..a0938fc8dcdb 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -488,8 +488,8 @@ do { \
	int
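The consolidation this patch performs — folding the hrtimer_init() step into hrtimer_init_sleeper() so each call site needs one call instead of two — can be sketched outside the kernel with self-contained stub types. All `*_stub` names below are hypothetical stand-ins for illustration, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Stub stand-ins for the kernel's hrtimer objects (illustration only). */
enum hrtimer_mode_stub { HRTIMER_MODE_REL_STUB, HRTIMER_MODE_ABS_STUB };

struct hrtimer_stub {
	int clock;
	enum hrtimer_mode_stub mode;
	int initialized;
};

struct hrtimer_sleeper_stub {
	struct hrtimer_stub timer;	/* embedded timer, as in the kernel */
	void *task;			/* would be struct task_struct * */
};

/* Before the patch: every call site had to call this first... */
void hrtimer_init_stub(struct hrtimer_stub *t, int clock,
		       enum hrtimer_mode_stub mode)
{
	t->clock = clock;
	t->mode = mode;
	t->initialized = 1;
}

/*
 * ...and then a separate sleeper setup. After the patch, the sleeper
 * init also initializes the embedded timer, so one call suffices.
 */
void hrtimer_init_sleeper_stub(struct hrtimer_sleeper_stub *sl, int clock,
			       enum hrtimer_mode_stub mode, void *task)
{
	hrtimer_init_stub(&sl->timer, clock, mode);
	sl->task = task;
}

int consolidated_init_works(void)
{
	struct hrtimer_sleeper_stub sl;
	int fake_task;

	/* Single call replaces the old hrtimer_init() + init_sleeper() pair. */
	hrtimer_init_sleeper_stub(&sl, 1, HRTIMER_MODE_REL_STUB, &fake_task);
	return sl.timer.initialized && sl.task == (void *)&fake_task;
}
```

The point of the kernel change is the same as in the sketch: the embedded timer can no longer be left uninitialized by a call site that forgets the second call.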
[PATCH] hrtimer: consolidate hrtimer_init() + hrtimer_init_sleeper() calls
From: Sebastian Andrzej Siewior

hrtimer_init_sleeper() calls require a prior initialisation of the hrtimer object with hrtimer_init(). Let's make the initialisation of the hrtimer object part of hrtimer_init_sleeper(). To remain consistent, consider init_on_stack as well.

Besides adapting the hrtimer_init_sleeper[_on_stack]() functions, call sites need to be updated as well.

Link: http://lkml.kernel.org/r/20170905135719.qsj4h5twhjkmk...@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
[anna-maria: Updating the commit message]
Signed-off-by: Anna-Maria Gleixner
---
 block/blk-mq.c          |  3 +--
 include/linux/hrtimer.h | 19 ++---
 include/linux/wait.h    |  4 ++--
 kernel/futex.c          | 19 +++--
 kernel/time/hrtimer.c   | 46 -
 net/core/pktgen.c       |  4 ++--
 6 files changed, 65 insertions(+), 30 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 95919268564b..f95ad9ede0f6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2984,10 +2984,9 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 		kt = nsecs;

 	mode = HRTIMER_MODE_REL;
-	hrtimer_init_on_stack(&hs.timer, CLOCK_MONOTONIC, mode);
+	hrtimer_init_sleeper_on_stack(&hs, CLOCK_MONOTONIC, mode, current);
 	hrtimer_set_expires(&hs.timer, kt);

-	hrtimer_init_sleeper(&hs, current);
 	do {
 		if (blk_mq_rq_state(rq) == MQ_RQ_COMPLETE)
 			break;
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 3892e9c8b2de..b8bbaabd5aff 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -364,10 +364,17 @@ DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 /* Initialize timers: */
 extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock,
			 enum hrtimer_mode mode);
+extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id,
+				 enum hrtimer_mode mode,
+				 struct task_struct *task);

 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
 extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock,
				  enum hrtimer_mode mode);
+extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
+					  clockid_t clock_id,
+					  enum hrtimer_mode mode,
+					  struct task_struct *task);

 extern void destroy_hrtimer_on_stack(struct hrtimer *timer);
 #else
@@ -377,6 +384,15 @@ static inline void hrtimer_init_on_stack(struct hrtimer *timer,
 {
	hrtimer_init(timer, which_clock, mode);
 }
+
+static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
+						 clockid_t clock_id,
+						 enum hrtimer_mode mode,
+						 struct task_struct *task)
+{
+	hrtimer_init_sleeper(sl, clock_id, mode, task);
+}
+
 static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { }
 #endif
@@ -480,9 +496,6 @@ extern long hrtimer_nanosleep(const struct timespec64 *rqtp,
			      const enum hrtimer_mode mode,
			      const clockid_t clockid);

-extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
-				 struct task_struct *tsk);
-
 extern int schedule_hrtimeout_range(ktime_t *expires, u64 delta,
				    const enum hrtimer_mode mode);
 extern int schedule_hrtimeout_range_clock(ktime_t *expires,
diff --git a/include/linux/wait.h b/include/linux/wait.h
index d9f131ecf708..a0938fc8dcdb 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -488,8 +488,8 @@ do {									\
	int __ret = 0;							\
	struct hrtimer_sleeper __t;					\
									\
-	hrtimer_init_on_stack(&__t.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); \
-	hrtimer_init_sleeper(&__t, current);				\
+	hrtimer_init_sleeper_on_stack(&__t, CLOCK_MONOTONIC, HRTIMER_MODE_REL, \
+				      current);				\
	if ((timeout) != KTIME_MAX)					\
		hrtimer_start_range_ns(&__t.timer, timeout,		\
				       current->timer_slack_ns,		\
diff --git a/kernel/futex.c b/kernel/futex.c
index 1f450e092c74..146432d78e06 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2624,10 +2624,9 @@ static int futex_wait(u32 __user *uaddr, unsigned
sched/core warning triggers on rcu torture test
Hi,

during rcu torture tests (TREE04 and TREE07) I noticed that a WARN_ON_ONCE() in sched core triggers on a recent 4.18-rc2 based kernel (6f0d349d922b ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")) as well as on a 4.17.3. I'm running the tests on a machine with 144 cores:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 144 --duration 120 --configs "9*TREE07"
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 144 --duration 120 --configs "18*TREE04"

The warning was introduced by commit d84b31313ef8 ("sched/isolation: Offload residual 1Hz scheduler tick"). Output looks similar for all tests I did (this one is the output of the 4.18-rc2 based kernel):

WARNING: CPU: 11 PID: 906 at kernel/sched/core.c:3138 sched_tick_remote+0xb6/0xc0
Modules linked in:
CPU: 11 PID: 906 Comm: kworker/u32:3 Not tainted 4.18.0-rc2+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Workqueue: events_unbound sched_tick_remote
RIP: 0010:sched_tick_remote+0xb6/0xc0
Code: e8 0f 06 b8 00 c6 03 00 fb eb 9d 8b 43 04 85 c0 75 8d 48 8b 83 e0 0a 00 00 48 85 c0 75 81 eb 88 48 89 df e8 bc fe ff ff eb aa <0f> 0b eb c5 66 0f 1f 44 00 00 bf 17 00 00 00 e8 b6 2e fe ff 0f b6
Call Trace:
 process_one_work+0x1df/0x3b0
 worker_thread+0x44/0x3d0
 kthread+0xf3/0x130
 ? set_worker_desc+0xb0/0xb0
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x35/0x40
---[ end trace 7c99b83eb0ec64e8 ]---

Do you need some more information?

Thanks,

Anna-Maria
[tip:core/urgent] signal: Remove no longer required irqsave/restore
Commit-ID:  59dc6f3c6d81c0c4379025c4eb56919391d62b67
Gitweb:     https://git.kernel.org/tip/59dc6f3c6d81c0c4379025c4eb56919391d62b67
Author:     Anna-Maria Gleixner
AuthorDate: Fri, 25 May 2018 11:05:07 +0200
Committer:  Thomas Gleixner
CommitDate: Sun, 10 Jun 2018 06:14:01 +0200

signal: Remove no longer required irqsave/restore

Commit a841796f11c9 ("signal: align __lock_task_sighand() irq disabling and RCU") introduced an RCU read-side critical section with interrupts disabled. The changelog suggested that a better long-term fix would be "to make rt_mutex_unlock() disable irqs when acquiring the rt_mutex structure's ->wait_lock".

This long-term fix has been made in commit b4abf91047cf ("rtmutex: Make wait_lock irq safe") for a different reason.

Therefore revert commit a841796f11c9 ("signal: align __lock_task_sighand() irq disabling and RCU") as the interrupt disable dance is no longer required.

The change was tested on the base of b4abf91047cf ("rtmutex: Make wait_lock irq safe") with a four hour run of rcutorture scenario TREE03 with lockdep enabled as suggested by Paul McKenney.

Signed-off-by: Anna-Maria Gleixner
Signed-off-by: Thomas Gleixner
Acked-by: Paul E. McKenney
Acked-by: "Eric W. Biederman"
Cc: bige...@linutronix.de
Link: https://lkml.kernel.org/r/20180525090507.22248-3-anna-ma...@linutronix.de
---
 kernel/signal.c | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 0f865d67415d..8d8a940422a8 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1244,19 +1244,12 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
 {
	struct sighand_struct *sighand;

+	rcu_read_lock();
	for (;;) {
-		/*
-		 * Disable interrupts early to avoid deadlocks.
-		 * See rcu_read_unlock() comment header for details.
-		 */
-		local_irq_save(*flags);
-		rcu_read_lock();
		sighand = rcu_dereference(tsk->sighand);
-		if (unlikely(sighand == NULL)) {
-			rcu_read_unlock();
-			local_irq_restore(*flags);
+		if (unlikely(sighand == NULL))
			break;
-		}
+
		/*
		 * This sighand can be already freed and even reused, but
		 * we rely on SLAB_TYPESAFE_BY_RCU and sighand_ctor() which
@@ -1268,15 +1261,12 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
		 * __exit_signal(). In the latter case the next iteration
		 * must see ->sighand == NULL.
		 */
-		spin_lock(&sighand->siglock);
-		if (likely(sighand == tsk->sighand)) {
-			rcu_read_unlock();
+		spin_lock_irqsave(&sighand->siglock, *flags);
+		if (likely(sighand == tsk->sighand))
			break;
-		}
-		spin_unlock(&sighand->siglock);
-		rcu_read_unlock();
-		local_irq_restore(*flags);
+		spin_unlock_irqrestore(&sighand->siglock, *flags);
	}
+	rcu_read_unlock();

	return sighand;
 }
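The locking pattern the revert settles on — take the lock, then validate that the sighand pointer is still the task's current one, dropping the lock and retrying on a mismatch — can be sketched in plain user-space C. All names below are hypothetical stubs for illustration; the flags and locks only simulate the kernel's spin_lock_irqsave()/spin_unlock_irqrestore() calls:

```c
#include <assert.h>
#include <stddef.h>

struct sighand_stub {
	int locked;	/* stands in for siglock being held */
};

/* Two sighand objects; the task's pointer may change between loads. */
struct sighand_stub sig_stale, sig_current;
struct sighand_stub *task_sighand = &sig_current;

int retries;

/* First load observes a stale pointer once, mimicking a racing exit. */
static struct sighand_stub *load_sighand(void)
{
	return retries == 0 ? &sig_stale : task_sighand;
}

struct sighand_stub *lock_task_sighand_stub(void)
{
	struct sighand_stub *sighand;

	for (;;) {
		sighand = load_sighand();	/* rcu_dereference() */
		if (sighand == NULL)
			break;
		sighand->locked = 1;		/* spin_lock_irqsave() */
		if (sighand == task_sighand)
			break;			/* still valid: return locked */
		sighand->locked = 0;		/* spin_unlock_irqrestore() */
		retries++;			/* and try again */
	}
	return sighand;
}
```

The validation-under-lock step is what makes the simplified scheme safe: if the pointer changed between the load and the lock acquisition, the loop releases the wrong lock and re-reads.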
[tip:core/urgent] rcu: Update documentation of rcu_read_unlock()
Commit-ID:  ec84b27f9b3b569f9235413d1945a2006b97b0aa
Gitweb:     https://git.kernel.org/tip/ec84b27f9b3b569f9235413d1945a2006b97b0aa
Author:     Anna-Maria Gleixner
AuthorDate: Fri, 25 May 2018 11:05:06 +0200
Committer:  Thomas Gleixner
CommitDate: Sun, 10 Jun 2018 06:14:01 +0200

rcu: Update documentation of rcu_read_unlock()

Since commit b4abf91047cf ("rtmutex: Make wait_lock irq safe") the explanation in the rcu_read_unlock() documentation about the irq unsafe rtmutex wait_lock is no longer valid.

Remove it to prevent kernel developers reading the documentation from relying on it.

Suggested-by: Eric W. Biederman
Signed-off-by: Anna-Maria Gleixner
Signed-off-by: Thomas Gleixner
Reviewed-by: Paul E. McKenney
Acked-by: "Eric W. Biederman"
Cc: bige...@linutronix.de
Link: https://lkml.kernel.org/r/20180525090507.22248-2-anna-ma...@linutronix.de
---
 include/linux/rcupdate.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index e679b175b411..65163aa0bb04 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -652,9 +652,7 @@ static inline void rcu_read_lock(void)
  * Unfortunately, this function acquires the scheduler's runqueue and
  * priority-inheritance spinlocks. This means that deadlock could result
  * if the caller of rcu_read_unlock() already holds one of these locks or
- * any lock that is ever acquired while holding them; or any lock which
- * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
- * does not disable irqs while taking ->wait_lock.
+ * any lock that is ever acquired while holding them.
  *
  * That said, RCU readers are never priority boosted unless they were
  * preempted. Therefore, one way to avoid deadlock is to make sure
[tip:core/urgent] signal: Remove no longer required irqsave/restore
Commit-ID:  e79e0f38083e607da5d7b493e7a0f78ba38d788e
Gitweb:     https://git.kernel.org/tip/e79e0f38083e607da5d7b493e7a0f78ba38d788e
Author:     Anna-Maria Gleixner
AuthorDate: Fri, 4 May 2018 16:40:14 +0200
Committer:  Thomas Gleixner
CommitDate: Thu, 7 Jun 2018 22:18:55 +0200

signal: Remove no longer required irqsave/restore

Commit a841796f11c9 ("signal: align __lock_task_sighand() irq disabling and RCU") introduced an RCU read-side critical section with interrupts disabled. The changelog suggested that a better long-term fix would be "to make rt_mutex_unlock() disable irqs when acquiring the rt_mutex structure's ->wait_lock".

This long-term fix has been made in commit b4abf91047cf ("rtmutex: Make wait_lock irq safe") for a different reason.

Therefore revert commit a841796f11c9 ("signal: align __lock_task_sighand() irq disabling and RCU") as the interrupt disable dance is no longer required.

Testing was done over an extensive period with RCU torture, especially with TREE03 as requested by Paul.

Signed-off-by: Anna-Maria Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Acked-by: "Paul E. McKenney"
Acked-by: "Eric W. Biederman"
Link: https://lkml.kernel.org/r/20180504144014.5378-1-bige...@linutronix.de
---
 kernel/signal.c | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 0f865d67415d..8d8a940422a8 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1244,19 +1244,12 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
 {
	struct sighand_struct *sighand;

+	rcu_read_lock();
	for (;;) {
-		/*
-		 * Disable interrupts early to avoid deadlocks.
-		 * See rcu_read_unlock() comment header for details.
-		 */
-		local_irq_save(*flags);
-		rcu_read_lock();
		sighand = rcu_dereference(tsk->sighand);
-		if (unlikely(sighand == NULL)) {
-			rcu_read_unlock();
-			local_irq_restore(*flags);
+		if (unlikely(sighand == NULL))
			break;
-		}
+
		/*
		 * This sighand can be already freed and even reused, but
		 * we rely on SLAB_TYPESAFE_BY_RCU and sighand_ctor() which
@@ -1268,15 +1261,12 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
		 * __exit_signal(). In the latter case the next iteration
		 * must see ->sighand == NULL.
		 */
-		spin_lock(&sighand->siglock);
-		if (likely(sighand == tsk->sighand)) {
-			rcu_read_unlock();
+		spin_lock_irqsave(&sighand->siglock, *flags);
+		if (likely(sighand == tsk->sighand))
			break;
-		}
-		spin_unlock(&sighand->siglock);
-		rcu_read_unlock();
-		local_irq_restore(*flags);
+		spin_unlock_irqrestore(&sighand->siglock, *flags);
	}
+	rcu_read_unlock();

	return sighand;
 }
Re: [PATCH v6 0/4] enable early printing of hashed pointers
On Tue, 5 Jun 2018, Anna-Maria Gleixner wrote: > On Thu, 31 May 2018, Steven Rostedt wrote: > > > On Mon, 28 May 2018 11:46:38 +1000 > > "Tobin C. Harding" wrote: > > > > > Steve, > > > > Hi Tobin, > > > > Sorry for the late reply, I'm currently at a conference and have had > > little time to read email. > > > > > > > > Could you please take a quick squiz at the final 2 patches if you get a > > > chance. I assumed we are in preemptible context during early_init based > > > on your code (and code comment) and called static_branch_disable() > > > directly if hw RNG returned keying material. It's a pretty simple > > > change but I'd love to get someone else to check I've not noob'ed it. > > > > I can take a look, and perhaps do some tests. But it was Anna-Maria > > that originally triggered the issue. She's on Cc, perhaps she can try > > this and see if it works. > > I'll test it today - sorry for the delay. > I tested it with command line option enabled. This works early enough because it works right after early_trace_init(). Thanks, Anna-Maria
Re: [PATCH v6 0/4] enable early printing of hashed pointers
On Thu, 31 May 2018, Steven Rostedt wrote: > On Mon, 28 May 2018 11:46:38 +1000 > "Tobin C. Harding" wrote: > > > Steve, > > Hi Tobin, > > Sorry for the late reply, I'm currently at a conference and have had > little time to read email. > > > > > Could you please take a quick squiz at the final 2 patches if you get a > > chance. I assumed we are in preemptible context during early_init based > > on your code (and code comment) and called static_branch_disable() > > directly if hw RNG returned keying material. It's a pretty simple > > change but I'd love to get someone else to check I've not noob'ed it. > > I can take a look, and perhaps do some tests. But it was Anna-Maria > that originally triggered the issue. She's on Cc, perhaps she can try > this and see if it works. I'll test it today - sorry for the delay. Anna-Maria
Re: [PATCH v2 1/2] rcu: Update documentation of rcu_read_unlock()
On Fri, 25 May 2018, Paul E. McKenney wrote: > On Fri, May 25, 2018 at 11:05:06AM +0200, Anna-Maria Gleixner wrote: > > Since commit b4abf91047cf ("rtmutex: Make wait_lock irq safe") the > > explanation in rcu_read_unlock() documentation about irq unsafe rtmutex > > wait_lock is no longer valid. > > > > Remove it to prevent kernel developers reading the documentation to rely on > > it. > > > > Suggested-by: Eric W. Biederman <ebied...@xmission.com> > > Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de> > > Reviewed-by: Paul E. McKenney <paul...@linux.vnet.ibm.com> > > Or let me know if you would like me to carry this patch. Either way, > just let me know! > Thanks! Thomas told me he will take both. Anna-Maria > > > --- > > include/linux/rcupdate.h | 4 +--- > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > index 36360d07f25b..64644fda3b22 100644 > > --- a/include/linux/rcupdate.h > > +++ b/include/linux/rcupdate.h > > @@ -653,9 +653,7 @@ static inline void rcu_read_lock(void) > > * Unfortunately, this function acquires the scheduler's runqueue and > > * priority-inheritance spinlocks. This means that deadlock could result > > * if the caller of rcu_read_unlock() already holds one of these locks or > > - * any lock that is ever acquired while holding them; or any lock which > > - * can be taken from interrupt context because rcu_boost()->rt_mutex_lock() > > - * does not disable irqs while taking ->wait_lock. > > + * any lock that is ever acquired while holding them. > > * > > * That said, RCU readers are never priority boosted unless they were > > * preempted. Therefore, one way to avoid deadlock is to make sure > > -- > > 2.15.1 > > > >
[PATCH] afs/server: Remove leftover variable
Variable ret is set twice in afs_install_server() but never read. It is a leftover of a rework of afs_install_server() by commit d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation"). Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de> --- fs/afs/server.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/afs/server.c b/fs/afs/server.c index 3af4625e2f8c..b69a70362bd5 100644 --- a/fs/afs/server.c +++ b/fs/afs/server.c @@ -153,7 +153,7 @@ static struct afs_server *afs_install_server(struct afs_net *net, const struct afs_addr_list *alist; struct afs_server *server; struct rb_node **pp, *p; - int ret = -EEXIST, diff; + int diff; _enter("%p", candidate); @@ -198,7 +198,6 @@ static struct afs_server *afs_install_server(struct afs_net *net, hlist_add_head_rcu(&server->addr6_link, &net->fs_addresses6); write_sequnlock(&net->fs_addr_lock); - ret = 0; exists: afs_get_server(server); -- 2.15.1
[PATCH v2 0/2] rtmutex wait_lock is irq safe
Since commit b4abf91047cf ("rtmutex: Make wait_lock irq safe") the rtmutex wait_lock is irq safe. Therefore the irqsave/restore in kernel/signal is no longer required (see Patch 2/2). During discussions about v1 of this patch, Eric Biederman noticed that the rcu_read_unlock() documentation is no longer valid. Therefore sending a short queue: fixing first the documentation of rcu_read_unlock() and afterwards removing irqsave/restore in kernel/signal. v1..v2: - Add new patch updating rcu documentation as suggested by Eric Biederman - Update commit message of kernel/signal patch Thanks, Anna-Maria Anna-Maria Gleixner (2): rcu: Update documentation of rcu_read_unlock() signal: Remove no longer required irqsave/restore include/linux/rcupdate.h | 4 +--- kernel/signal.c | 24 +++- 2 files changed, 8 insertions(+), 20 deletions(-) -- 2.15.1
[PATCH v2 2/2] signal: Remove no longer required irqsave/restore
Commit a841796f11c9 ("signal: align __lock_task_sighand() irq disabling and RCU") introduced an RCU read side critical section with interrupts disabled. The changelog suggested that a better long-term fix would be "to make rt_mutex_unlock() disable irqs when acquiring the rt_mutex structure's ->wait_lock". This long-term fix has been made in commit b4abf91047cf ("rtmutex: Make wait_lock irq safe") for a different reason. Therefore revert commit a841796f11c9 ("signal: align __lock_task_sighand() irq disabling and RCU") as the interrupt disable dance is no longer required. The change was tested on the base of b4abf91047cf ("rtmutex: Make wait_lock irq safe") with a four hour run of rcutorture scenario TREE03 with lockdep enabled as suggested by Paul McKenney. Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de> Acked-by: Paul E. McKenney <paul...@linux.vnet.ibm.com> --- kernel/signal.c | 24 +++- 1 file changed, 7 insertions(+), 17 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 9c33163a6165..19679ad77aa6 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1244,19 +1244,12 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk, { struct sighand_struct *sighand; + rcu_read_lock(); for (;;) { - /* -* Disable interrupts early to avoid deadlocks. -* See rcu_read_unlock() comment header for details. -*/ - local_irq_save(*flags); - rcu_read_lock(); sighand = rcu_dereference(tsk->sighand); - if (unlikely(sighand == NULL)) { - rcu_read_unlock(); - local_irq_restore(*flags); + if (unlikely(sighand == NULL)) break; - } + /* * This sighand can be already freed and even reused, but * we rely on SLAB_TYPESAFE_BY_RCU and sighand_ctor() which @@ -1268,15 +1261,12 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk, * __exit_signal(). In the latter case the next iteration * must see ->sighand == NULL. 
*/ - spin_lock(&sighand->siglock); - if (likely(sighand == tsk->sighand)) { - rcu_read_unlock(); + spin_lock_irqsave(&sighand->siglock, *flags); + if (likely(sighand == tsk->sighand)) break; - } - spin_unlock(&sighand->siglock); - rcu_read_unlock(); - local_irq_restore(*flags); + spin_unlock_irqrestore(&sighand->siglock, *flags); } + rcu_read_unlock(); return sighand; } -- 2.15.1
[PATCH v2 1/2] rcu: Update documentation of rcu_read_unlock()
Since commit b4abf91047cf ("rtmutex: Make wait_lock irq safe") the explanation in rcu_read_unlock() documentation about irq unsafe rtmutex wait_lock is no longer valid. Remove it to prevent kernel developers who read the documentation from relying on it. Suggested-by: Eric W. Biederman <ebied...@xmission.com> Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de> --- include/linux/rcupdate.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 36360d07f25b..64644fda3b22 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -653,9 +653,7 @@ static inline void rcu_read_lock(void) * Unfortunately, this function acquires the scheduler's runqueue and * priority-inheritance spinlocks. This means that deadlock could result * if the caller of rcu_read_unlock() already holds one of these locks or - * any lock that is ever acquired while holding them; or any lock which - * can be taken from interrupt context because rcu_boost()->rt_mutex_lock() - * does not disable irqs while taking ->wait_lock. + * any lock that is ever acquired while holding them. * * That said, RCU readers are never priority boosted unless they were * preempted. Therefore, one way to avoid deadlock is to make sure -- 2.15.1
Re: [PATCH] kernel/signal: Remove no longer required irqsave/restore
On Tue, 8 May 2018, Paul E. McKenney wrote: > On Tue, May 08, 2018 at 03:42:25PM +0200, Anna-Maria Gleixner wrote: > > On Sat, 5 May 2018, Thomas Gleixner wrote: > > > > > On Fri, 4 May 2018, Paul E. McKenney wrote: > > > > On Fri, May 04, 2018 at 11:38:37PM -0500, Eric W. Biederman wrote: > > > > > > (Me, I would run rcutorture scenario TREE03 for an extended time > > > > > > period > > > > > > on b4abf91047cf with your patch applied. > > > > > > > > And with lockdep enabled, which TREE03 does not do by default. > > > > > > Will run that again to make sure. > > > > I ran the rcutorture scenario TREE03 for 4 hours with the above > > described setup. It was successful and without any lockdep splats. > > Thank you for the testing, Anna-Maria! If you give them a Tested-by, > I will give them an ack. ;-) > If it is ok to give a Tested-by to the patch I wrote, I will do this to get your ack :) Tested-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Re: [PATCH] kernel/signal: Remove no longer required irqsave/restore
On Sat, 5 May 2018, Thomas Gleixner wrote: > On Fri, 4 May 2018, Paul E. McKenney wrote: > > On Fri, May 04, 2018 at 11:38:37PM -0500, Eric W. Biederman wrote: > > > > (Me, I would run rcutorture scenario TREE03 for an extended time period > > > > on b4abf91047cf with your patch applied. > > > > And with lockdep enabled, which TREE03 does not do by default. > > Will run that again to make sure. > I ran the rcutorture scenario TREE03 for 4 hours with the above described setup. It was successful and without any lockdep splats. Anna-Maria
Hashed pointer issues
Hi, I stumbled over an issue with hashed pointers and tracing. I'm using trace points for examination and on error the trace buffers are dumped. The error occurs when entropy has not been set up, so the pointers are not hashed and only (ptrval) is printed instead. The pointers are required to distinguish the different objects in the trace. Beside workarounds like patching lib/vsprintf.c helpers before testing or dumping trace buffers later (given that kernel comes up properly and entropy is set up), is there a possible generic solution for this issue? A commandline option for disabling the pointer obfuscation would be a pretty handy tool. Thanks, Anna-Maria
[tip:timers/core] hrtimer: Implement SOFT/HARD clock base selection
Commit-ID: 42f42da41b54c191ae6a775e84a86c100d66c5e8 Gitweb: https://git.kernel.org/tip/42f42da41b54c191ae6a775e84a86c100d66c5e8 Author: Anna-Maria Gleixner <anna-ma...@linutronix.de> AuthorDate: Thu, 21 Dec 2017 11:41:58 +0100 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Tue, 16 Jan 2018 09:51:22 +0100 hrtimer: Implement SOFT/HARD clock base selection All prerequisites to handle hrtimers for expiry in either hard or soft interrupt context are in place. Add the missing bit in hrtimer_init() which associates the timer to the hard or the softirq clock base. Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de> Cc: Christoph Hellwig <h...@lst.de> Cc: John Stultz <john.stu...@linaro.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: keesc...@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-30-anna-ma...@linutronix.de Signed-off-by: Ingo Molnar <mi...@kernel.org> --- kernel/time/hrtimer.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index d93e3e7..3d20158 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1220,8 +1220,9 @@ static inline int hrtimer_clockid_to_base(clockid_t clock_id) static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, enum hrtimer_mode mode) { + bool softtimer = !!(mode & HRTIMER_MODE_SOFT); + int base = softtimer ? 
HRTIMER_MAX_CLOCK_BASES / 2 : 0; struct hrtimer_cpu_base *cpu_base; - int base; memset(timer, 0, sizeof(struct hrtimer)); @@ -1235,7 +1236,8 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, if (clock_id == CLOCK_REALTIME && mode & HRTIMER_MODE_REL) clock_id = CLOCK_MONOTONIC; - base = hrtimer_clockid_to_base(clock_id); + base += hrtimer_clockid_to_base(clock_id); + timer->is_soft = softtimer; timer->base = &cpu_base->clock_base[base]; timerqueue_init(&timer->node); } @@ -1244,8 +1246,13 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, * hrtimer_init - initialize a timer to the given clock * @timer: the timer to be initialized * @clock_id: the clock to be used - * @mode: timer mode: absolute (HRTIMER_MODE_ABS) or - * relative (HRTIMER_MODE_REL); pinned is not considered here! + * @mode: The modes which are relevant for initialization: + * HRTIMER_MODE_ABS, HRTIMER_MODE_REL, HRTIMER_MODE_ABS_SOFT, + * HRTIMER_MODE_REL_SOFT + * + * The PINNED variants of the above can be handed in, + * but the PINNED bit is ignored as pinning happens + * when the hrtimer is started */ void hrtimer_init(struct hrtimer *timer, clockid_t clock_id, enum hrtimer_mode mode)
[tip:timers/core] hrtimer: Implement support for softirq based hrtimers
Commit-ID: 5da70160462e80b0ab8a6960cdd0cdd476907523 Gitweb: https://git.kernel.org/tip/5da70160462e80b0ab8a6960cdd0cdd476907523 Author: Anna-Maria Gleixner AuthorDate: Thu, 21 Dec 2017 11:41:57 +0100 Committer: Ingo Molnar CommitDate: Tue, 16 Jan 2018 09:51:22 +0100 hrtimer: Implement support for softirq based hrtimers hrtimer callbacks are always invoked in hard interrupt context. Several users in tree require soft interrupt context for their callbacks and achieve this by combining a hrtimer with a tasklet. The hrtimer schedules the tasklet in hard interrupt context and the tasklet callback gets invoked in softirq context later. That's suboptimal and aside of that the real-time patch moves most of the hrtimers into softirq context. So adding native support for hrtimers expiring in softirq context is a valuable extension for both mainline and the RT patch set. Each valid hrtimer clock id has two associated hrtimer clock bases: one for timers expiring in hardirq context and one for timers expiring in softirq context. Implement the functionality to associate a hrtimer with the hard or softirq related clock bases and update the relevant functions to take them into account when the next expiry time needs to be evaluated. Add a check into the hard interrupt context handler functions to check whether the first expiring softirq based timer has expired. If it's expired the softirq is raised and the accounting of softirq based timers to evaluate the next expiry time for programming the timer hardware is skipped until the softirq processing has finished. At the end of the softirq processing the regular processing is resumed. 
Suggested-by: Thomas Gleixner Suggested-by: Peter Zijlstra Signed-off-by: Anna-Maria Gleixner Cc: Christoph Hellwig Cc: John Stultz Cc: Linus Torvalds Cc: keesc...@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-29-anna-ma...@linutronix.de Signed-off-by: Ingo Molnar --- include/linux/hrtimer.h | 21 -- kernel/time/hrtimer.c | 196 ++-- 2 files changed, 188 insertions(+), 29 deletions(-) diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 26ae8a8..c7902ca 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -103,6 +103,7 @@ enum hrtimer_restart { * @base: pointer to the timer base (per cpu and per clock) * @state: state information (See bit values above) * @is_rel: Set if the timer was armed relative + * @is_soft: Set if hrtimer will be expired in soft interrupt context. * * The hrtimer structure must be initialized by hrtimer_init() */ @@ -113,6 +114,7 @@ struct hrtimer { struct hrtimer_clock_base *base; u8 state; u8 is_rel; + u8 is_soft; }; /** @@ -178,13 +180,18 @@ enum hrtimer_base_type { * @hres_active: State of high resolution mode * @in_hrtirq: hrtimer_interrupt() is currently executing * @hang_detected: The last hrtimer interrupt detected a hang + * @softirq_activated: displays, if the softirq is raised - update of softirq + * related settings is not required then. 
* @nr_events: Total number of hrtimer interrupt events * @nr_retries:Total number of hrtimer interrupt retries * @nr_hangs: Total number of hrtimer interrupt hangs * @max_hang_time: Maximum time spent in hrtimer_interrupt * @expires_next: absolute time of the next event, is required for remote - * hrtimer enqueue + * hrtimer enqueue; it is the total first expiry time (hard + * and soft hrtimer are taken into account) * @next_timer:Pointer to the first expiring timer + * @softirq_expires_next: Time to check, if soft queues needs also to be expired + * @softirq_next_timer: Pointer to the first expiring softirq based timer * @clock_base:array of clock bases for this cpu * * Note: next_timer is just an optimization for __remove_hrtimer(). @@ -196,9 +203,10 @@ struct hrtimer_cpu_base { unsigned intcpu; unsigned intactive_bases; unsigned intclock_was_set_seq; - unsigned inthres_active : 1, - in_hrtirq : 1, - hang_detected : 1; + unsigned inthres_active : 1, + in_hrtirq : 1, + hang_detected : 1, + softirq_activated : 1; #ifdef CONFIG_HIGH_RES_TIMERS unsigned intnr_events; unsigned short nr_retries; @@ -207,6 +215,8 @@ struct
[tip:timers/core] hrtimer: Prepare handling of hard and softirq based hrtimers
Commit-ID: c458b1d102036eaa2c70e03000c959bd491c2037 Gitweb: https://git.kernel.org/tip/c458b1d102036eaa2c70e03000c959bd491c2037 Author: Anna-Maria Gleixner <anna-ma...@linutronix.de> AuthorDate: Thu, 21 Dec 2017 11:41:56 +0100 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Tue, 16 Jan 2018 03:01:20 +0100 hrtimer: Prepare handling of hard and softirq based hrtimers The softirq based hrtimer can utilize most of the existing hrtimers functions, but need to operate on a different data set. Add an 'active_mask' parameter to various functions so the hard and soft bases can be selected. Fixup the existing callers and hand in the ACTIVE_HARD mask. Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de> Cc: Christoph Hellwig <h...@lst.de> Cc: John Stultz <john.stu...@linaro.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: keesc...@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-28-anna-ma...@linutronix.de Signed-off-by: Ingo Molnar <mi...@kernel.org> --- kernel/time/hrtimer.c | 38 +- 1 file changed, 29 insertions(+), 9 deletions(-) diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index e2353f5..ba4674e 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -60,6 +60,15 @@ #include "tick-internal.h" /* + * Masks for selecting the soft and hard context timers from + * cpu_base->active + */ +#define MASK_SHIFT (HRTIMER_BASE_MONOTONIC_SOFT) +#define HRTIMER_ACTIVE_HARD((1U << MASK_SHIFT) - 1) +#define HRTIMER_ACTIVE_SOFT(HRTIMER_ACTIVE_HARD << MASK_SHIFT) +#define HRTIMER_ACTIVE_ALL (HRTIMER_ACTIVE_SOFT | HRTIMER_ACTIVE_HARD) + +/* * The timer bases: * * There are more clockids than hrtimer bases. 
Thus, we index @@ -507,13 +516,24 @@ static ktime_t __hrtimer_next_event_base(struct hrtimer_cpu_base *cpu_base, return expires_next; } -static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base) +/* + * Recomputes cpu_base::*next_timer and returns the earliest expires_next but + * does not set cpu_base::*expires_next, that is done by hrtimer_reprogram. + * + * @active_mask must be one of: + * - HRTIMER_ACTIVE, + * - HRTIMER_ACTIVE_SOFT, or + * - HRTIMER_ACTIVE_HARD. + */ +static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base, + unsigned int active_mask) { - unsigned int active = cpu_base->active_bases; + unsigned int active; ktime_t expires_next = KTIME_MAX; cpu_base->next_timer = NULL; + active = cpu_base->active_bases & active_mask; expires_next = __hrtimer_next_event_base(cpu_base, active, expires_next); return expires_next; @@ -553,7 +573,7 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal) { ktime_t expires_next; - expires_next = __hrtimer_get_next_event(cpu_base); + expires_next = __hrtimer_get_next_event(cpu_base, HRTIMER_ACTIVE_HARD); if (skip_equal && expires_next == cpu_base->expires_next) return; @@ -1074,7 +1094,7 @@ u64 hrtimer_get_next_event(void) raw_spin_lock_irqsave(&cpu_base->lock, flags); if (!__hrtimer_hres_active(cpu_base)) - expires = __hrtimer_get_next_event(cpu_base); + expires = __hrtimer_get_next_event(cpu_base, HRTIMER_ACTIVE_HARD); raw_spin_unlock_irqrestore(&cpu_base->lock, flags); @@ -1248,10 +1268,10 @@ static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base, } static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now, -unsigned long flags) +unsigned long flags, unsigned int active_mask) { struct hrtimer_clock_base *base; - unsigned int active = cpu_base->active_bases; + unsigned int active = cpu_base->active_bases & active_mask; for_each_active_base(base, cpu_base, active) { struct timerqueue_node *node; @@ -1314,10 +1334,10 @@ retry: */
cpu_base->expires_next = KTIME_MAX; - __hrtimer_run_queues(cpu_base, now, flags); + __hrtimer_run_queues(cpu_base, now, flags, HRTIMER_ACTIVE_HARD); /* Reevaluate the clock bases for the next expiry */ - expires_next = __hrtimer_get_next_event(cpu_base); + expires_next = __hrtimer_get_next_event(cpu_base, HRTIMER_ACTIVE_HARD); /* * Store the new expiry value so the migration code can verify * against it. @@ -1421,7 +1441,7 @@ void hrtimer_run_queues(void) raw_spin_lock_irqsave(&cpu_base->lock, flags); now = hrtimer_update_base(cpu_base); - __hrtimer_run_queues(cpu_base, now, flags); + __hrtimer_run_queues(cpu_base, now, flags, HRTIMER_ACTIVE_HARD); raw_spin_unlock_irqrestore(&cpu_base->lock, flags); }
[tip:timers/core] hrtimer: Add clock bases and hrtimer mode for softirq context
Commit-ID: 98ecadd4305d8677ba77162152485798d47dcc85 Gitweb: https://git.kernel.org/tip/98ecadd4305d8677ba77162152485798d47dcc85 Author: Anna-Maria Gleixner <anna-ma...@linutronix.de> AuthorDate: Thu, 21 Dec 2017 11:41:55 +0100 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Tue, 16 Jan 2018 03:00:50 +0100 hrtimer: Add clock bases and hrtimer mode for softirq context Currently hrtimer callback functions are always executed in hard interrupt context. Users of hrtimers, which need their timer function to be executed in soft interrupt context, make use of tasklets to get the proper context. Add additional hrtimer clock bases for timers which must expire in softirq context, so the detour via the tasklet can be avoided. This is also required for RT, where the majority of hrtimer is moved into softirq hrtimer context. The selection of the expiry mode happens via a mode bit. Introduce HRTIMER_MODE_SOFT and the matching combinations with the ABS/REL/PINNED bits and update the decoding of hrtimer_mode in tracepoints. 
Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de> Cc: Christoph Hellwig <h...@lst.de> Cc: John Stultz <john.stu...@linaro.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: keesc...@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-27-anna-ma...@linutronix.de Signed-off-by: Ingo Molnar <mi...@kernel.org> --- include/linux/hrtimer.h | 14 ++ include/trace/events/timer.h | 6 +- kernel/time/hrtimer.c| 20 3 files changed, 39 insertions(+), 1 deletion(-) diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 98ed357..26ae8a8 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -33,14 +33,24 @@ struct hrtimer_cpu_base; * HRTIMER_MODE_REL- Time value is relative to now * HRTIMER_MODE_PINNED - Timer is bound to CPU (is only considered * when starting the timer) + * HRTIMER_MODE_SOFT - Timer callback function will be executed in + * soft irq context */ enum hrtimer_mode { HRTIMER_MODE_ABS= 0x00, HRTIMER_MODE_REL= 0x01, HRTIMER_MODE_PINNED = 0x02, + HRTIMER_MODE_SOFT = 0x04, HRTIMER_MODE_ABS_PINNED = HRTIMER_MODE_ABS | HRTIMER_MODE_PINNED, HRTIMER_MODE_REL_PINNED = HRTIMER_MODE_REL | HRTIMER_MODE_PINNED, + + HRTIMER_MODE_ABS_SOFT = HRTIMER_MODE_ABS | HRTIMER_MODE_SOFT, + HRTIMER_MODE_REL_SOFT = HRTIMER_MODE_REL | HRTIMER_MODE_SOFT, + + HRTIMER_MODE_ABS_PINNED_SOFT = HRTIMER_MODE_ABS_PINNED | HRTIMER_MODE_SOFT, + HRTIMER_MODE_REL_PINNED_SOFT = HRTIMER_MODE_REL_PINNED | HRTIMER_MODE_SOFT, + }; /* @@ -151,6 +161,10 @@ enum hrtimer_base_type { HRTIMER_BASE_REALTIME, HRTIMER_BASE_BOOTTIME, HRTIMER_BASE_TAI, + HRTIMER_BASE_MONOTONIC_SOFT, + HRTIMER_BASE_REALTIME_SOFT, + HRTIMER_BASE_BOOTTIME_SOFT, + HRTIMER_BASE_TAI_SOFT, HRTIMER_MAX_CLOCK_BASES, }; diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h index 744b431..a57e4ee 100644 --- a/include/trace/events/timer.h +++ b/include/trace/events/timer.h @@ -148,7 
+148,11 @@ DEFINE_EVENT(timer_class, timer_cancel, { HRTIMER_MODE_ABS, "ABS" }, \ { HRTIMER_MODE_REL, "REL" }, \ { HRTIMER_MODE_ABS_PINNED, "ABS|PINNED"}, \ - { HRTIMER_MODE_REL_PINNED, "REL|PINNED"}) + { HRTIMER_MODE_REL_PINNED, "REL|PINNED"}, \ + { HRTIMER_MODE_ABS_SOFT,"ABS|SOFT" }, \ + { HRTIMER_MODE_REL_SOFT,"REL|SOFT" }, \ + { HRTIMER_MODE_ABS_PINNED_SOFT, "ABS|PINNED|SOFT" },\ + { HRTIMER_MODE_REL_PINNED_SOFT, "REL|PINNED|SOFT" }) /** * hrtimer_init - called when the hrtimer is initialized diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 31ccd86..e2353f5 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -92,6 +92,26 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) = .clockid = CLOCK_TAI, .get_time = &ktime_get_clocktai, }, + { + .index = HRTIMER_BASE_MONOTONIC_SOFT, + .clockid = CLOCK_MONOTONIC, + .get_time = &ktime_get, + }, + { + .index = HRTIMER_BASE_REALTIME_SOFT, + .clockid = CLOCK_REALTIME, + .get_time = &ktime_get_real, + }, + { + .index = HRTIMER_BASE_BOOTTIME_SOFT, + .clockid = CLOCK_BOOTTIME, + .get_time = &ktime_get_boottime, + }, + { +
[tip:timers/core] hrtimer: Use irqsave/irqrestore around __run_hrtimer()
Commit-ID: dd934aa8ad1fbaab3d916125c7fe42fff75aa7ff Gitweb: https://git.kernel.org/tip/dd934aa8ad1fbaab3d916125c7fe42fff75aa7ff Author: Anna-Maria Gleixner <anna-ma...@linutronix.de> AuthorDate: Thu, 21 Dec 2017 11:41:54 +0100 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Tue, 16 Jan 2018 03:00:47 +0100 hrtimer: Use irqsave/irqrestore around __run_hrtimer() __run_hrtimer() is called with the hrtimer_cpu_base.lock held and interrupts disabled. Before invoking the timer callback the base lock is dropped, but interrupts stay disabled. The upcoming support for softirq based hrtimers requires that interrupts are enabled before the timer callback is invoked. To avoid code duplication, take hrtimer_cpu_base.lock with raw_spin_lock_irqsave(flags) at the call site and hand in the flags as a parameter. So raw_spin_unlock_irqrestore() before the callback invocation will either keep interrupts disabled in interrupt context or restore to interrupt enabled state when called from softirq context. 
Suggested-by: Peter Zijlstra <pet...@infradead.org> Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de> Cc: Christoph Hellwig <h...@lst.de> Cc: John Stultz <john.stu...@linaro.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: keesc...@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-26-anna-ma...@linutronix.de Signed-off-by: Ingo Molnar <mi...@kernel.org> --- kernel/time/hrtimer.c | 31 ++- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 5d9b81d..31ccd86 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1159,7 +1159,8 @@ EXPORT_SYMBOL_GPL(hrtimer_active); static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base, struct hrtimer_clock_base *base, - struct hrtimer *timer, ktime_t *now) + struct hrtimer *timer, ktime_t *now, + unsigned long flags) { enum hrtimer_restart (*fn)(struct hrtimer *); int restart; @@ -1194,11 +1195,11 @@ static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base, * protected against migration to a different CPU even if the lock * is dropped. 
*/ - raw_spin_unlock(&cpu_base->lock); + raw_spin_unlock_irqrestore(&cpu_base->lock, flags); trace_hrtimer_expire_entry(timer, now); restart = fn(timer); trace_hrtimer_expire_exit(timer); - raw_spin_lock(&cpu_base->lock); + raw_spin_lock_irq(&cpu_base->lock); /* * Note: We clear the running state after enqueue_hrtimer and @@ -1226,7 +1227,8 @@ static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base, base->running = NULL; } -static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now) +static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now, +unsigned long flags) { struct hrtimer_clock_base *base; unsigned int active = cpu_base->active_bases; @@ -1257,7 +1259,7 @@ static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now) if (basenow < hrtimer_get_softexpires_tv64(timer)) break; - __run_hrtimer(cpu_base, base, timer, &basenow); + __run_hrtimer(cpu_base, base, timer, &basenow, flags); } } } @@ -1272,13 +1274,14 @@ void hrtimer_interrupt(struct clock_event_device *dev) { struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases); ktime_t expires_next, now, entry_time, delta; + unsigned long flags; int retries = 0; BUG_ON(!cpu_base->hres_active); cpu_base->nr_events++; dev->next_event = KTIME_MAX; - raw_spin_lock(&cpu_base->lock); + raw_spin_lock_irqsave(&cpu_base->lock, flags); entry_time = now = hrtimer_update_base(cpu_base); retry: cpu_base->in_hrtirq = 1; @@ -1291,7 +1294,7 @@ retry: */ cpu_base->expires_next = KTIME_MAX; - __hrtimer_run_queues(cpu_base, now); + __hrtimer_run_queues(cpu_base, now, flags); /* Reevaluate the clock bases for the next expiry */ expires_next = __hrtimer_get_next_event(cpu_base); @@ -1301,7 +1304,7 @@ retry: */ cpu_base->expires_next = expires_next; cpu_base->in_hrtirq = 0; - raw_spin_unlock(&cpu_base->lock); + raw_spin_unlock_irqrestore(&cpu_base->lock, flags); /* Reprogramming necessary ?
*/ if (!tick_program_event(expires_next, 0)) { @@ -1322,7 +1325,7 @@ retry: * Acquire base lock for updating the offsets and retrieving * the current time. */ - raw_spin_lock(&cpu_base->lock); + raw_spin_lock_irqsave(&cpu_base->lock, flags); now = hrtimer_update_base(cpu_base); cpu_base->nr_retries++; if (++retries < 3) @@ -1335,7 +1338,8 @@ retry: */ cpu_base->nr_hangs++;
[tip:timers/core] hrtimer: Factor out __hrtimer_start_range_ns()
Commit-ID: 138a6b7ae4dedde5513678f57b275eee19c41b6a Gitweb: https://git.kernel.org/tip/138a6b7ae4dedde5513678f57b275eee19c41b6a Author: Anna-Maria Gleixner <anna-ma...@linutronix.de> AuthorDate: Thu, 21 Dec 2017 11:41:52 +0100 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Tue, 16 Jan 2018 02:53:59 +0100 hrtimer: Factor out __hrtimer_start_range_ns() Preparatory patch for softirq based hrtimers to avoid code duplication, factor out the __hrtimer_start_range_ns() function from hrtimer_start_range_ns(). No functional change. Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de> Cc: Christoph Hellwig <h...@lst.de> Cc: John Stultz <john.stu...@linaro.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: keesc...@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-24-anna-ma...@linutronix.de Signed-off-by: Ingo Molnar <mi...@kernel.org> --- kernel/time/hrtimer.c | 44 1 file changed, 24 insertions(+), 20 deletions(-) diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 33a6c99..4142e6f 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -905,22 +905,11 @@ static inline ktime_t hrtimer_update_lowres(struct hrtimer *timer, ktime_t tim, return tim; } -/** - * hrtimer_start_range_ns - (re)start an hrtimer - * @timer: the timer to be added - * @tim: expiry time - * @delta_ns: "slack" range for the timer - * @mode: timer mode: absolute (HRTIMER_MODE_ABS) or - * relative (HRTIMER_MODE_REL), and pinned (HRTIMER_MODE_PINNED) - */ -void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, - u64 delta_ns, const enum hrtimer_mode mode) +static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, + u64 delta_ns, const enum hrtimer_mode mode, + struct hrtimer_clock_base *base) { - struct hrtimer_clock_base *base, *new_base; - unsigned long flags; - int leftmost; - - base = lock_hrtimer_base(timer, ); + struct hrtimer_clock_base 
*new_base; /* Remove an active timer from the queue: */ remove_hrtimer(timer, base, true); @@ -935,12 +924,27 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, /* Switch the timer base, if necessary: */ new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED); - leftmost = enqueue_hrtimer(timer, new_base, mode); - if (!leftmost) - goto unlock; + return enqueue_hrtimer(timer, new_base, mode); +} +/** + * hrtimer_start_range_ns - (re)start an hrtimer + * @timer: the timer to be added + * @tim: expiry time + * @delta_ns: "slack" range for the timer + * @mode: timer mode: absolute (HRTIMER_MODE_ABS) or + * relative (HRTIMER_MODE_REL), and pinned (HRTIMER_MODE_PINNED) + */ +void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, + u64 delta_ns, const enum hrtimer_mode mode) +{ + struct hrtimer_clock_base *base; + unsigned long flags; + + base = lock_hrtimer_base(timer, &flags); + + if (__hrtimer_start_range_ns(timer, tim, delta_ns, mode, base)) + hrtimer_reprogram(timer); - hrtimer_reprogram(timer); -unlock: unlock_hrtimer_base(timer, &flags); } EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);
[tip:timers/core] hrtimer: Factor out __hrtimer_next_event_base()
Commit-ID: ad38f596d8e4babc19be8b21a7a49debffb4a7f5 Gitweb: https://git.kernel.org/tip/ad38f596d8e4babc19be8b21a7a49debffb4a7f5 Author: Anna-Maria Gleixner <anna-ma...@linutronix.de> AuthorDate: Thu, 21 Dec 2017 11:41:53 +0100 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Tue, 16 Jan 2018 03:00:43 +0100 hrtimer: Factor out __hrtimer_next_event_base() Preparatory patch for softirq based hrtimers to avoid code duplication. No functional change. Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de> Cc: Christoph Hellwig <h...@lst.de> Cc: John Stultz <john.stu...@linaro.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: keesc...@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-25-anna-ma...@linutronix.de Signed-off-by: Ingo Molnar <mi...@kernel.org> --- kernel/time/hrtimer.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 4142e6f..5d9b81d 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -458,13 +458,13 @@ __next_base(struct hrtimer_cpu_base *cpu_base, unsigned int *active) #define for_each_active_base(base, cpu_base, active) \ while ((base = __next_base((cpu_base), &(active)))) -static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base) +static ktime_t __hrtimer_next_event_base(struct hrtimer_cpu_base *cpu_base, +unsigned int active, +ktime_t expires_next) { struct hrtimer_clock_base *base; - unsigned int active = cpu_base->active_bases; - ktime_t expires, expires_next = KTIME_MAX; + ktime_t expires; - cpu_base->next_timer = NULL; for_each_active_base(base, cpu_base, active) { struct timerqueue_node *next; struct hrtimer *timer; @@ -487,6 +487,18 @@ static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base) return expires_next; } +static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base) +{ + unsigned int active = 
cpu_base->active_bases; + ktime_t expires_next = KTIME_MAX; + + cpu_base->next_timer = NULL; + + expires_next = __hrtimer_next_event_base(cpu_base, active, expires_next); + + return expires_next; +} + static inline ktime_t hrtimer_update_base(struct hrtimer_cpu_base *base) { ktime_t *offs_real = &base->clock_base[HRTIMER_BASE_REALTIME].offset;
[tip:timers/core] hrtimer: Remove the 'base' parameter from hrtimer_reprogram()
Commit-ID: 3ec7a3ee9f15f6dcac1591902d85b94c2a4b520d Gitweb: https://git.kernel.org/tip/3ec7a3ee9f15f6dcac1591902d85b94c2a4b520d Author: Anna-Maria Gleixner <anna-ma...@linutronix.de> AuthorDate: Thu, 21 Dec 2017 11:41:51 +0100 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Tue, 16 Jan 2018 02:53:59 +0100 hrtimer: Remove the 'base' parameter from hrtimer_reprogram() hrtimer_reprogram() must have access to the hrtimer_clock_base of the new first expiring timer to access hrtimer_clock_base.offset for adjusting the expiry time to CLOCK_MONOTONIC. This is required to evaluate whether the new leftmost timer in the hrtimer_clock_base is the first expiring timer of all clock bases in a hrtimer_cpu_base. The only user of hrtimer_reprogram() is hrtimer_start_range_ns(), which already has a pointer to the hrtimer_clock_base and hands it in as a parameter. But hrtimer_start_range_ns() will be split for the upcoming support for softirq based hrtimers to avoid code duplication and will lose the direct access to the clock base pointer. Instead of handing in timer and timer->base as parameters, remove the base parameter from hrtimer_reprogram() and retrieve the clock base internally. 
Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de> Cc: Christoph Hellwig <h...@lst.de> Cc: John Stultz <john.stu...@linaro.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: keesc...@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-23-anna-ma...@linutronix.de Signed-off-by: Ingo Molnar <mi...@kernel.org> --- kernel/time/hrtimer.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index f4a56fb..33a6c99 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -648,10 +648,10 @@ static inline void retrigger_next_event(void *arg) { } * * Called with interrupts disabled and base->cpu_base.lock held */ -static void hrtimer_reprogram(struct hrtimer *timer, - struct hrtimer_clock_base *base) +static void hrtimer_reprogram(struct hrtimer *timer) { struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases); + struct hrtimer_clock_base *base = timer->base; ktime_t expires = ktime_sub(hrtimer_get_expires(timer), base->offset); WARN_ON_ONCE(hrtimer_get_expires_tv64(timer) < 0); @@ -939,7 +939,7 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, if (!leftmost) goto unlock; - hrtimer_reprogram(timer, new_base); + hrtimer_reprogram(timer); unlock: unlock_hrtimer_base(timer, &flags); }
[tip:timers/core] hrtimer: Make remote enqueue decision less restrictive
Commit-ID: 2ac2dccce9d16a7b1a8fddf69a955d249375bce4 Gitweb: https://git.kernel.org/tip/2ac2dccce9d16a7b1a8fddf69a955d249375bce4 Author: Anna-Maria Gleixner <anna-ma...@linutronix.de> AuthorDate: Thu, 21 Dec 2017 11:41:50 +0100 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Tue, 16 Jan 2018 02:53:58 +0100 hrtimer: Make remote enqueue decision less restrictive The current decision whether a timer can be queued on a remote CPU checks for timer->expiry <= remote_cpu_base.expires_next. This is too restrictive because a timer with the same expiry time as an existing timer will be enqueued on the right-hand side of the existing timer inside the rbtree, i.e. behind the first expiring timer. So it's safe to allow enqueuing timers with the same expiry time as the first expiring timer on a remote CPU base. Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de> Cc: Christoph Hellwig <h...@lst.de> Cc: John Stultz <john.stu...@linaro.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: keesc...@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-22-anna-ma...@linutronix.de Signed-off-by: Ingo Molnar <mi...@kernel.org> --- kernel/time/hrtimer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 1c68bf2..f4a56fb 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -168,7 +168,7 @@ hrtimer_check_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base) ktime_t expires; expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset); - return expires <= new_base->cpu_base->expires_next; + return expires < new_base->cpu_base->expires_next; } static inline
[tip:timers/core] hrtimer: Unify remote enqueue handling
Commit-ID: 14c803419de6acba08e143d51813ac5e0f3443b8 Gitweb: https://git.kernel.org/tip/14c803419de6acba08e143d51813ac5e0f3443b8 Author: Anna-Maria Gleixner AuthorDate: Thu, 21 Dec 2017 11:41:49 +0100 Committer: Ingo Molnar CommitDate: Tue, 16 Jan 2018 02:53:58 +0100 hrtimer: Unify remote enqueue handling hrtimer_reprogram() is conditionally invoked from hrtimer_start_range_ns() when hrtimer_cpu_base.hres_active is true. In the !hres_active case there is a special condition for the nohz_active case: If the newly enqueued timer expires before the first expiring timer on a remote CPU then the remote CPU needs to be notified and woken up from a NOHZ idle sleep to take the new first expiring timer into account. Previous changes have already established the prerequisites to make the remote enqueue behaviour the same whether high resolution mode is active or not: If the to be enqueued timer expires before the first expiring timer on a remote CPU, then it cannot be enqueued there. This was done for the high resolution mode because there is no way to access the remote CPU timer hardware. The same is true for NOHZ, but was handled differently by unconditionally enqueuing the timer and waking up the remote CPU so it can reprogram its timer. Again there is no compelling reason for this difference. hrtimer_check_target(), which makes the 'can remote enqueue' decision, is already unconditional, but not yet functional because nothing updates hrtimer_cpu_base.expires_next in the !hres_active case. To unify this, the following changes are required: 1) Make the store of the new first expiry time unconditional in hrtimer_reprogram() and check __hrtimer_hres_active() before proceeding to the actual hardware access. This check also lets the compiler eliminate the rest of the function in case of CONFIG_HIGH_RES_TIMERS=n. 2) Invoke hrtimer_reprogram() unconditionally from hrtimer_start_range_ns() 3) Remove the remote wakeup special case for the !high_res && nohz_active case. 
Confine the timers_nohz_active static key to timer.c which is the only user now. Signed-off-by: Anna-Maria Gleixner Cc: Christoph Hellwig Cc: John Stultz Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: keesc...@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-21-anna-ma...@linutronix.de Signed-off-by: Ingo Molnar --- kernel/time/hrtimer.c | 18 ++ kernel/time/tick-internal.h | 6 -- kernel/time/timer.c | 9 - 3 files changed, 14 insertions(+), 19 deletions(-) diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index e6a78ae..1c68bf2 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -685,21 +685,24 @@ static void hrtimer_reprogram(struct hrtimer *timer, /* Update the pointer to the next expiring timer */ cpu_base->next_timer = timer; + cpu_base->expires_next = expires; /* +* If hres is not active, hardware does not have to be +* programmed yet. +* * If a hang was detected in the last timer interrupt then we * do not schedule a timer which is earlier than the expiry * which we enforced in the hang detection. We want the system * to make progress. */ - if (cpu_base->hang_detected) + if (!__hrtimer_hres_active(cpu_base) || cpu_base->hang_detected) return; /* * Program the timer hardware. We enforce the expiry for * events which are already in the past. */ - cpu_base->expires_next = expires; tick_program_event(expires, 1); } @@ -936,16 +939,7 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, if (!leftmost) goto unlock; - if (!hrtimer_is_hres_active(timer)) { - /* -* Kick to reschedule the next tick to handle the new timer -* on dynticks target. 
-*/ - if (is_timers_nohz_active()) - wake_up_nohz_cpu(new_base->cpu_base->cpu); - } else { - hrtimer_reprogram(timer, new_base); - } + hrtimer_reprogram(timer, new_base); unlock: unlock_hrtimer_base(timer, &flags); } diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h index f690628..e277284 100644 --- a/kernel/time/tick-internal.h +++ b/kernel/time/tick-internal.h @@ -151,18 +151,12 @@ static inline void tick_nohz_init(void) { } #ifdef CONFIG_NO_HZ_COMMON extern unsigned long tick_nohz_active; extern void timers_update_nohz(void); -extern struct static_key_false timers_nohz_active; -static inline bool is_timers_nohz_active(void) -{ - return static_branch_likely(&timers_nohz_active); -} # ifdef CONFIG_SMP extern struct static_key_false timers_migration_enabled; # endif #else /* CONFIG_NO_HZ_COMMON */ static inline void timers_update_nohz(void) { }
[tip:timers/core] hrtimer: Unify hrtimer removal handling
Commit-ID: 61bb4bcb79c7afcd0bf0d20aef4704977172fd60 Gitweb: https://git.kernel.org/tip/61bb4bcb79c7afcd0bf0d20aef4704977172fd60 Author: Anna-Maria Gleixner <anna-ma...@linutronix.de> AuthorDate: Thu, 21 Dec 2017 11:41:48 +0100 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Tue, 16 Jan 2018 02:53:58 +0100 hrtimer: Unify hrtimer removal handling When the first hrtimer on the current CPU is removed, hrtimer_force_reprogram() is invoked but only when CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active is set. hrtimer_force_reprogram() updates hrtimer_cpu_base.expires_next and reprograms the clock event device. When CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active is set, a pointless hrtimer interrupt can be prevented. hrtimer_check_target() makes the 'can remote enqueue' decision. As soon as hrtimer_check_target() is unconditionally available and hrtimer_cpu_base.expires_next is updated by hrtimer_reprogram(), hrtimer_force_reprogram() needs to be available unconditionally as well to prevent the following scenario with CONFIG_HIGH_RES_TIMERS=n: - the first hrtimer on this CPU is removed and hrtimer_force_reprogram() is not executed - CPU goes idle (next timer is calculated and hrtimers are taken into account) - a hrtimer is enqueued remote on the idle CPU: hrtimer_check_target() compares expiry value and hrtimer_cpu_base.expires_next. The expiry value is after expires_next, so the hrtimer is enqueued. This timer will fire late, if it expires before the effective first hrtimer on this CPU and the comparison was with an outdated expires_next value. To prevent this scenario, make hrtimer_force_reprogram() unconditional except the effective reprogramming part, which gets eliminated by the compiler in the CONFIG_HIGH_RES_TIMERS=n case. 
Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de> Cc: Christoph Hellwig <h...@lst.de> Cc: John Stultz <john.stu...@linaro.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: keesc...@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-20-anna-ma...@linutronix.de Signed-off-by: Ingo Molnar <mi...@kernel.org> --- kernel/time/hrtimer.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 2b3222e..e6a78ae 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -521,9 +521,6 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal) { ktime_t expires_next; - if (!__hrtimer_hres_active(cpu_base)) - return; - expires_next = __hrtimer_get_next_event(cpu_base); if (skip_equal && expires_next == cpu_base->expires_next) @@ -532,6 +529,9 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal) cpu_base->expires_next = expires_next; /* +* If hres is not active, hardware does not have to be +* reprogrammed yet. +* * If a hang was detected in the last timer interrupt then we * leave the hang delay active in the hardware. We want the * system to make progress. That also prevents the following @@ -545,7 +545,7 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal) * set. So we'd effectively block all timers until the T2 event * fires. */ - if (cpu_base->hang_detected) + if (!__hrtimer_hres_active(cpu_base) || cpu_base->hang_detected) return; tick_program_event(cpu_base->expires_next, 1); @@ -844,7 +844,6 @@ static void __remove_hrtimer(struct hrtimer *timer, if (!timerqueue_del(&base->active, &timer->node)) cpu_base->active_bases &= ~(1 << base->index); -#ifdef CONFIG_HIGH_RES_TIMERS /* * Note: If reprogram is false we do not update * cpu_base->next_timer. 
This happens when we remove the first @@ -855,7 +854,6 @@ static void __remove_hrtimer(struct hrtimer *timer, */ if (reprogram && timer == cpu_base->next_timer) hrtimer_force_reprogram(cpu_base, 1); -#endif } /*
[tip:timers/core] hrtimer: Make hrtimer_force_reprogramm() unconditionally available
Commit-ID:  ebba2c723f38a766546b2eaf828c522576c791d4
Gitweb:     https://git.kernel.org/tip/ebba2c723f38a766546b2eaf828c522576c791d4
Author:     Anna-Maria Gleixner <anna-ma...@linutronix.de>
AuthorDate: Thu, 21 Dec 2017 11:41:47 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 16 Jan 2018 02:53:28 +0100

hrtimer: Make hrtimer_force_reprogramm() unconditionally available

hrtimer_force_reprogram() needs to be available unconditionally for softirq
based hrtimers. Move the function and all required struct members out of
the CONFIG_HIGH_RES_TIMERS #ifdef.

There is no functional change because hrtimer_force_reprogram() is only
invoked when hrtimer_cpu_base.hres_active is true and
CONFIG_HIGH_RES_TIMERS=y. Making it unconditional increases the text size
for the CONFIG_HIGH_RES_TIMERS=n case slightly, but avoids replication of
that code for the upcoming softirq based hrtimers support. Most of the
code gets eliminated in the CONFIG_HIGH_RES_TIMERS=n case by the compiler.

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Cc: Christoph Hellwig <h...@lst.de>
Cc: John Stultz <john.stu...@linaro.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-19-anna-ma...@linutronix.de
[ Made it build on !CONFIG_HIGH_RES_TIMERS ]
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/time/hrtimer.c | 60 ---
 1 file changed, 28 insertions(+), 32 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 63d804a..2b3222e 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -458,7 +458,6 @@ __next_base(struct hrtimer_cpu_base *cpu_base, unsigned int *active)
 #define for_each_active_base(base, cpu_base, active)	\
 	while ((base = __next_base((cpu_base), &(active))))

-#if defined(CONFIG_NO_HZ_COMMON) || defined(CONFIG_HIGH_RES_TIMERS)
 static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
 {
 	struct hrtimer_clock_base *base;
@@ -487,7 +486,6 @@ static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
 		expires_next = 0;
 	return expires_next;
 }
-#endif

 static inline ktime_t hrtimer_update_base(struct hrtimer_cpu_base *base)
 {
@@ -513,34 +511,6 @@ static inline int hrtimer_hres_active(void)
 	return __hrtimer_hres_active(this_cpu_ptr(&hrtimer_bases));
 }

-/* High resolution timer related functions */
-#ifdef CONFIG_HIGH_RES_TIMERS
-
-/*
- * High resolution timer enabled ?
- */
-static bool hrtimer_hres_enabled __read_mostly = true;
-unsigned int hrtimer_resolution __read_mostly = LOW_RES_NSEC;
-EXPORT_SYMBOL_GPL(hrtimer_resolution);
-
-/*
- * Enable / Disable high resolution mode
- */
-static int __init setup_hrtimer_hres(char *str)
-{
-	return (kstrtobool(str, &hrtimer_hres_enabled) == 0);
-}
-
-__setup("highres=", setup_hrtimer_hres);
-
-/*
- * hrtimer_high_res_enabled - query, if the highres mode is enabled
- */
-static inline int hrtimer_is_hres_enabled(void)
-{
-	return hrtimer_hres_enabled;
-}
-
 /*
  * Reprogram the event source with checking both queues for the
  * next event
@@ -581,6 +551,34 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
 	tick_program_event(cpu_base->expires_next, 1);
 }

+/* High resolution timer related functions */
+#ifdef CONFIG_HIGH_RES_TIMERS
+
+/*
+ * High resolution timer enabled ?
+ */
+static bool hrtimer_hres_enabled __read_mostly = true;
+unsigned int hrtimer_resolution __read_mostly = LOW_RES_NSEC;
+EXPORT_SYMBOL_GPL(hrtimer_resolution);
+
+/*
+ * Enable / Disable high resolution mode
+ */
+static int __init setup_hrtimer_hres(char *str)
+{
+	return (kstrtobool(str, &hrtimer_hres_enabled) == 0);
+}
+
+__setup("highres=", setup_hrtimer_hres);
+
+/*
+ * hrtimer_high_res_enabled - query, if the highres mode is enabled
+ */
+static inline int hrtimer_is_hres_enabled(void)
+{
+	return hrtimer_hres_enabled;
+}
+
 /*
  * Retrigger next event is called after clock was set
  *
@@ -639,8 +637,6 @@ void clock_was_set_delayed(void)
 static inline int hrtimer_is_hres_enabled(void) { return 0; }
 static inline void hrtimer_switch_to_hres(void) { }
-static inline void
-hrtimer_force_reprogram(struct hrtimer_cpu_base *base, int skip_equal) { }
 static inline void retrigger_next_event(void *arg) { }

 #endif /* CONFIG_HIGH_RES_TIMERS */
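The changelog's point is that the patch swaps an #ifdef for a runtime guard: hrtimer_force_reprogram() is compiled unconditionally, but it is only ever reached when hres_active is true, so with high resolution disabled the compiler can discard most of it. A minimal userspace sketch of that call-site-guard pattern (illustrative names, not kernel code):

```c
#include <assert.h>

static int reprogram_calls;	/* stands in for tick_program_event() invocations */

/* Compiled in unconditionally, as after the patch. */
static void force_reprogram(void)
{
	reprogram_calls++;
}

/*
 * The guard lives at the call site instead of around the function
 * definition; hres_active models hrtimer_cpu_base.hres_active.
 */
static void on_timer_removed(int hres_active)
{
	if (!hres_active)
		return;
	force_reprogram();
}
```

When the guard value is a compile-time constant zero (the CONFIG_HIGH_RES_TIMERS=n case), dead-code elimination removes the call, which is why the text-size cost the changelog mentions is small.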
[tip:timers/core] hrtimer: Make hrtimer_reprogramm() unconditional
Commit-ID:  11a9fe069e341ac53bddb8fe1a85ea986cff1a42
Gitweb:     https://git.kernel.org/tip/11a9fe069e341ac53bddb8fe1a85ea986cff1a42
Author:     Anna-Maria Gleixner <anna-ma...@linutronix.de>
AuthorDate: Thu, 21 Dec 2017 11:41:46 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 16 Jan 2018 02:35:47 +0100

hrtimer: Make hrtimer_reprogramm() unconditional

hrtimer_reprogram() needs to be available unconditionally for softirq
based hrtimers. Move the function and all required struct members out of
the CONFIG_HIGH_RES_TIMERS #ifdef.

There is no functional change because hrtimer_reprogram() is only invoked
when hrtimer_cpu_base.hres_active is true. Making it unconditional
increases the text size for the CONFIG_HIGH_RES_TIMERS=n case, but avoids
replication of that code for the upcoming softirq based hrtimers support.

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Cc: Christoph Hellwig <h...@lst.de>
Cc: John Stultz <john.stu...@linaro.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-18-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 include/linux/hrtimer.h |   6 +--
 kernel/time/hrtimer.c   | 129 +++-
 2 files changed, 65 insertions(+), 70 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 2d3e1d6..98ed357 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -182,10 +182,10 @@ struct hrtimer_cpu_base {
 	unsigned int			cpu;
 	unsigned int			active_bases;
 	unsigned int			clock_was_set_seq;
-	unsigned int			hres_active	: 1;
-#ifdef CONFIG_HIGH_RES_TIMERS
-	unsigned int			in_hrtirq	: 1,
+	unsigned int			hres_active	: 1,
+					in_hrtirq	: 1,
 					hang_detected	: 1;
+#ifdef CONFIG_HIGH_RES_TIMERS
 	unsigned int			nr_events;
 	unsigned short			nr_retries;
 	unsigned short			nr_hangs;
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 26abaa7..63d804a 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -582,68 +582,6 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
 }

 /*
- * When a timer is enqueued and expires earlier than the already enqueued
- * timers, we have to check, whether it expires earlier than the timer for
- * which the clock event device was armed.
- *
- * Called with interrupts disabled and base->cpu_base.lock held
- */
-static void hrtimer_reprogram(struct hrtimer *timer,
-			      struct hrtimer_clock_base *base)
-{
-	struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
-	ktime_t expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
-
-	WARN_ON_ONCE(hrtimer_get_expires_tv64(timer) < 0);
-
-	/*
-	 * If the timer is not on the current cpu, we cannot reprogram
-	 * the other cpus clock event device.
-	 */
-	if (base->cpu_base != cpu_base)
-		return;
-
-	/*
-	 * If the hrtimer interrupt is running, then it will
-	 * reevaluate the clock bases and reprogram the clock event
-	 * device. The callbacks are always executed in hard interrupt
-	 * context so we don't need an extra check for a running
-	 * callback.
-	 */
-	if (cpu_base->in_hrtirq)
-		return;
-
-	/*
-	 * CLOCK_REALTIME timer might be requested with an absolute
-	 * expiry time which is less than base->offset. Set it to 0.
-	 */
-	if (expires < 0)
-		expires = 0;
-
-	if (expires >= cpu_base->expires_next)
-		return;
-
-	/* Update the pointer to the next expiring timer */
-	cpu_base->next_timer = timer;
-
-	/*
-	 * If a hang was detected in the last timer interrupt then we
-	 * do not schedule a timer which is earlier than the expiry
-	 * which we enforced in the hang detection. We want the system
-	 * to make progress.
-	 */
-	if (cpu_base->hang_detected)
-		return;
-
-	/*
-	 * Program the timer hardware. We enforce the expiry for
-	 * events which are already in the past.
-	 */
-	cpu_base->expires_next = expires;
-	tick_program_event(expires, 1);
-}
-
-/*
  * Retrigger next event is called after clock was set
  *
  * Called with interrupts disabled via on_each_cpu()
@@ -703,16 +641,73 @@ static inline int hrtimer_is_hres_enabled(void) { return 0; }
 static inline void hrtimer_switch_to_hres(void) { }
 static inline void
 hrtimer_force_reprogram(struct hrtimer_cpu_base *base, int skip_equal) { }
-static inline int hrtimer_reprogram(struct hrtimer *timer,
-				    struct hrtimer_clock_base *base)
-{
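The moved hrtimer_reprogram() shown in the hunk above is essentially a chain of early returns before the hardware is touched. A simplified userspace model of that decision chain (field and function names are reductions for illustration, not the kernel API; ktime_t is modeled as a plain long):

```c
#include <assert.h>

struct sim_cpu_base {
	int  local;		/* timer lives on this CPU's base */
	int  in_hrtirq;		/* hrtimer interrupt currently running */
	int  hang_detected;	/* last interrupt hit the hang limit */
	long expires_next;	/* next event armed in the hardware */
	long programmed;	/* models tick_program_event() */
};

/* Returns 1 if the (simulated) clock event device was reprogrammed. */
static int sim_reprogram(struct sim_cpu_base *cb, long expires)
{
	if (!cb->local)			/* cannot touch a remote clockevent */
		return 0;
	if (cb->in_hrtirq)		/* interrupt will re-evaluate anyway */
		return 0;
	if (expires < 0)		/* CLOCK_REALTIME below base->offset */
		expires = 0;
	if (expires >= cb->expires_next)
		return 0;		/* not the earliest pending timer */
	if (cb->hang_detected)		/* keep the enforced late expiry */
		return 0;
	cb->expires_next = expires;
	cb->programmed = expires;	/* stands in for tick_program_event() */
	return 1;
}
```

The ordering matters: the next_timer/expires_next bookkeeping happens only after the cheap bail-outs, which is why the function can run unconditionally without cost in the common case.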
[tip:timers/core] hrtimer: Make hrtimer_cpu_base.next_timer handling unconditional
Commit-ID:  eb27926ba05233dc4f2052cc9d4f19359ec3cd2c
Gitweb:     https://git.kernel.org/tip/eb27926ba05233dc4f2052cc9d4f19359ec3cd2c
Author:     Anna-Maria Gleixner <anna-ma...@linutronix.de>
AuthorDate: Thu, 21 Dec 2017 11:41:45 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 16 Jan 2018 02:35:47 +0100

hrtimer: Make hrtimer_cpu_base.next_timer handling unconditional

hrtimer_cpu_base.next_timer stores the pointer to the next expiring timer
in a CPU base. This pointer cannot be dereferenced and is solely used to
check whether a hrtimer which is removed is the hrtimer which is the first
to expire in the CPU base. If this is the case, then the timer hardware
needs to be reprogrammed to avoid an extra interrupt for nothing.

Again, this is conditional functionality, but there is no compelling
reason to make this conditional. As a preparation, hrtimer_cpu_base.next_timer
needs to be available unconditionally. Aside of that the upcoming support
for softirq based hrtimers requires access to this pointer unconditionally
as well, so our motivation is not entirely simplicity based.

Make the update of hrtimer_cpu_base.next_timer unconditional and remove
the #ifdef cruft. The impact on CONFIG_HIGH_RES_TIMERS=n && CONFIG_NOHZ=n
is marginal as it's just a store on an already dirtied cacheline.

No functional change.

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Cc: Christoph Hellwig <h...@lst.de>
Cc: John Stultz <john.stu...@linaro.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-17-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 include/linux/hrtimer.h |  4 ++--
 kernel/time/hrtimer.c   | 12 ++--
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index bb7270e..2d3e1d6 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -164,13 +164,13 @@ enum hrtimer_base_type {
  * @hres_active:	State of high resolution mode
  * @in_hrtirq:		hrtimer_interrupt() is currently executing
  * @hang_detected:	The last hrtimer interrupt detected a hang
- * @next_timer:		Pointer to the first expiring timer
  * @nr_events:		Total number of hrtimer interrupt events
  * @nr_retries:		Total number of hrtimer interrupt retries
  * @nr_hangs:		Total number of hrtimer interrupt hangs
  * @max_hang_time:	Maximum time spent in hrtimer_interrupt
  * @expires_next:	absolute time of the next event, is required for remote
  *			hrtimer enqueue
+ * @next_timer:		Pointer to the first expiring timer
  * @clock_base:		array of clock bases for this cpu
  *
  * Note: next_timer is just an optimization for __remove_hrtimer().
@@ -186,13 +186,13 @@ struct hrtimer_cpu_base {
 #ifdef CONFIG_HIGH_RES_TIMERS
 	unsigned int			in_hrtirq	: 1,
 					hang_detected	: 1;
-	struct hrtimer			*next_timer;
 	unsigned int			nr_events;
 	unsigned short			nr_retries;
 	unsigned short			nr_hangs;
 	unsigned int			max_hang_time;
 #endif
 	ktime_t				expires_next;
+	struct hrtimer			*next_timer;
 	struct hrtimer_clock_base	clock_base[HRTIMER_MAX_CLOCK_BASES];
 } ____cacheline_aligned;
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index a9ab67f..26abaa7 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -459,21 +459,13 @@ __next_base(struct hrtimer_cpu_base *cpu_base, unsigned int *active)
 	while ((base = __next_base((cpu_base), &(active))))

 #if defined(CONFIG_NO_HZ_COMMON) || defined(CONFIG_HIGH_RES_TIMERS)
-static inline void hrtimer_update_next_timer(struct hrtimer_cpu_base *cpu_base,
-					     struct hrtimer *timer)
-{
-#ifdef CONFIG_HIGH_RES_TIMERS
-	cpu_base->next_timer = timer;
-#endif
-}
-
 static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
 {
 	struct hrtimer_clock_base *base;
 	unsigned int active = cpu_base->active_bases;
 	ktime_t expires, expires_next = KTIME_MAX;

-	hrtimer_update_next_timer(cpu_base, NULL);
+	cpu_base->next_timer = NULL;
 	for_each_active_base(base, cpu_base, active) {
 		struct timerqueue_node *next;
 		struct hrtimer *timer;
@@ -483,7 +475,7 @@ static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
 		expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
 		if (expires < expires_next) {
 			expires_next = expires;
-			hrtimer_update_next_timer(cpu_base, timer);
+			cpu_base->next_timer = timer;
 		}
 	}
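As the changelog stresses, next_timer is only ever compared, never dereferenced: on removal, reprogramming is needed only when the removed timer is the one the hardware was armed for. A hedged sketch of that pointer-comparison optimization (illustrative types, not kernel code):

```c
#include <assert.h>
#include <stddef.h>

struct sim_timer { long expires; };

struct sim_timer_base {
	struct sim_timer *next_timer;	/* first expiring timer; compared only */
	int reprograms;			/* counts would-be hardware reprograms */
};

/*
 * Mirrors the __remove_hrtimer() check: a plain pointer comparison
 * decides whether the clock event device must be reprogrammed.
 */
static void sim_remove_timer(struct sim_timer_base *b, struct sim_timer *t)
{
	if (t == b->next_timer)
		b->reprograms++;	/* would call hrtimer_force_reprogram() */
}
```

Removing any timer other than the cached first-expiring one costs nothing, which is the "avoid an extra interrupt for nothing" optimization the changelog describes.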
[tip:timers/core] hrtimer: Make the remote enqueue check unconditional
Commit-ID:  07a9a7eae86abb796468b225586086d7c4cb59fc
Gitweb:     https://git.kernel.org/tip/07a9a7eae86abb796468b225586086d7c4cb59fc
Author:     Anna-Maria Gleixner <anna-ma...@linutronix.de>
AuthorDate: Thu, 21 Dec 2017 11:41:44 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 16 Jan 2018 02:35:47 +0100

hrtimer: Make the remote enqueue check unconditional

hrtimer_cpu_base.expires_next is used to cache the next event armed in the
timer hardware. The value is used to check whether an hrtimer can be
enqueued remotely. If the new hrtimer is expiring before expires_next, then
remote enqueue is not possible as the remote hrtimer hardware cannot be
accessed for reprogramming to an earlier expiry time.

The remote enqueue check is currently conditional on
CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active. There is no
compelling reason to make this conditional.

Move hrtimer_cpu_base.expires_next out of the CONFIG_HIGH_RES_TIMERS=y
guarded area and remove the conditionals in hrtimer_check_target().

The check is currently a NOOP for the CONFIG_HIGH_RES_TIMERS=n and the
!hrtimer_cpu_base.hres_active case because in these cases nothing updates
hrtimer_cpu_base.expires_next yet. This will be changed with later patches
which further reduce the #ifdef zoo in this code.

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Cc: Christoph Hellwig <h...@lst.de>
Cc: John Stultz <john.stu...@linaro.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-16-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 include/linux/hrtimer.h |  6 +++---
 kernel/time/hrtimer.c   | 26 ++
 2 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 22627b3..bb7270e 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -164,13 +164,13 @@ enum hrtimer_base_type {
  * @hres_active:	State of high resolution mode
  * @in_hrtirq:		hrtimer_interrupt() is currently executing
  * @hang_detected:	The last hrtimer interrupt detected a hang
- * @expires_next:	absolute time of the next event, is required for remote
- *			hrtimer enqueue
  * @next_timer:		Pointer to the first expiring timer
  * @nr_events:		Total number of hrtimer interrupt events
  * @nr_retries:		Total number of hrtimer interrupt retries
  * @nr_hangs:		Total number of hrtimer interrupt hangs
  * @max_hang_time:	Maximum time spent in hrtimer_interrupt
+ * @expires_next:	absolute time of the next event, is required for remote
+ *			hrtimer enqueue
  * @clock_base:		array of clock bases for this cpu
  *
  * Note: next_timer is just an optimization for __remove_hrtimer().
@@ -186,13 +186,13 @@ struct hrtimer_cpu_base {
 #ifdef CONFIG_HIGH_RES_TIMERS
 	unsigned int			in_hrtirq	: 1,
 					hang_detected	: 1;
-	ktime_t				expires_next;
 	struct hrtimer			*next_timer;
 	unsigned int			nr_events;
 	unsigned short			nr_retries;
 	unsigned short			nr_hangs;
 	unsigned int			max_hang_time;
 #endif
+	ktime_t				expires_next;
 	struct hrtimer_clock_base	clock_base[HRTIMER_MAX_CLOCK_BASES];
 } ____cacheline_aligned;
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 5a624f9..a9ab67f 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -154,26 +154,21 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
 }

 /*
- * With HIGHRES=y we do not migrate the timer when it is expiring
- * before the next event on the target cpu because we cannot reprogram
- * the target cpu hardware and we would cause it to fire late.
+ * We do not migrate the timer when it is expiring before the next
+ * event on the target cpu. When high resolution is enabled, we cannot
+ * reprogram the target cpu hardware and we would cause it to fire
+ * late. To keep it simple, we handle the high resolution enabled and
+ * disabled case similar.
  *
  * Called with cpu_base->lock of target cpu held.
  */
 static int
 hrtimer_check_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base)
 {
-#ifdef CONFIG_HIGH_RES_TIMERS
 	ktime_t expires;

-	if (!new_base->cpu_base->hres_active)
-		return 0;
-
 	expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
 	return expires <= new_base->cpu_base->expires_next;
-#else
-	return 0;
-#endif
 }

 static inline
@@ -657,14 +652,6 @@ static void hrtimer_reprogram(struct hrtimer *timer,
 }

 /*
- * Initialize the high resolution related parts of cpu_base
- */
-static inline void hrtimer_init_hres(struct hrtimer_cpu_base *base)
-{
-	base->expires_next = KTIME_MAX;
-}
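After the patch, hrtimer_check_target() reduces to a single comparison: migration to a target CPU is refused when the timer would expire at or before the event already armed there, since the remote clock event device cannot be reprogrammed from this CPU. A minimal model of that check (plain longs instead of ktime_t, illustrative only):

```c
#include <assert.h>

/*
 * Models the post-patch hrtimer_check_target(): returns nonzero when the
 * timer must NOT be moved to the target base, because it would expire
 * before (or at) the event the target's hardware is already armed for.
 */
static int sim_check_target(long timer_expires, long target_expires_next)
{
	return timer_expires <= target_expires_next;
}
```

A timer expiring after the target's armed event may migrate freely; one expiring earlier must stay where its hardware can still be reprogrammed, which is exactly the firing-late hazard the rewritten comment in the hunk describes.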