[tip:timers/core] itimers: Prepare for PREEMPT_RT

2019-08-01 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  c7e6d704a0097e59667495cf52dcc4e1085e620b
Gitweb: https://git.kernel.org/tip/c7e6d704a0097e59667495cf52dcc4e1085e620b
Author: Anna-Maria Gleixner 
AuthorDate: Wed, 31 Jul 2019 00:33:51 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 1 Aug 2019 20:51:24 +0200

itimers: Prepare for PREEMPT_RT

Use the hrtimer_cancel_wait_running() synchronization mechanism to prevent
priority inversion and live locks on PREEMPT_RT.

As a benefit the retry loop gains the missing cpu_relax() on !RT.
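
For context, the retry loop which gains this call is in do_setitimer(); a
trimmed sketch of the ITIMER_REAL path after this change (based on
kernel/time/itimer.c, details elided):

    struct task_struct *tsk = current;
    struct hrtimer *timer;
    ktime_t expires;

again:
    spin_lock_irq(&tsk->sighand->siglock);
    timer = &tsk->signal->real_timer;
    /* We are sharing ->siglock with it_real_fn() */
    if (hrtimer_try_to_cancel(timer) < 0) {
        spin_unlock_irq(&tsk->sighand->siglock);
        /* cpu_relax() on !RT, blocks on the expiry lock on RT */
        hrtimer_cancel_wait_running(timer);
        goto again;
    }
    expires = timeval_to_ktime(value->it_value);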

[ tglx: Split out of combo patch ]

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20190730223828.690771...@linutronix.de


---
 kernel/time/itimer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/time/itimer.c b/kernel/time/itimer.c
index 02068b2d5862..9d26fd4ba4c0 100644
--- a/kernel/time/itimer.c
+++ b/kernel/time/itimer.c
@@ -213,6 +213,7 @@ again:
/* We are sharing ->siglock with it_real_fn() */
if (hrtimer_try_to_cancel(timer) < 0) {
spin_unlock_irq(&tsk->sighand->siglock);
+   hrtimer_cancel_wait_running(timer);
goto again;
}
expires = timeval_to_ktime(value->it_value);


[tip:timers/core] timerfd: Prepare for PREEMPT_RT

2019-08-01 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  a125ecc16453a4fe0ba865c7df87b9c722991fdf
Gitweb: https://git.kernel.org/tip/a125ecc16453a4fe0ba865c7df87b9c722991fdf
Author: Anna-Maria Gleixner 
AuthorDate: Wed, 31 Jul 2019 00:33:50 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 1 Aug 2019 20:51:23 +0200

timerfd: Prepare for PREEMPT_RT

Use the hrtimer_cancel_wait_running() synchronization mechanism to prevent
priority inversion and live locks on PREEMPT_RT.
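
The branch added below is needed because timerfd keeps the two possible
timer types in a union; a sketch of the relevant parts of struct
timerfd_ctx and isalarm() from fs/timerfd.c:

    struct timerfd_ctx {
        union {
            struct hrtimer tmr;    /* CLOCK_MONOTONIC, CLOCK_REALTIME, ... */
            struct alarm alarm;    /* the *_ALARM clocks */
        } t;
        /* ... */
        int clockid;
    };

    static inline bool isalarm(struct timerfd_ctx *ctx)
    {
        return ctx->clockid == CLOCK_REALTIME_ALARM ||
               ctx->clockid == CLOCK_BOOTTIME_ALARM;
    }

Only the union member which is actually in use may be handed to
hrtimer_cancel_wait_running(), hence the isalarm() check.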

[ tglx: Split out of combo patch ]

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20190730223828.600085...@linutronix.de


---
 fs/timerfd.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 6a6fc8aa1de7..48305ba41e3c 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -471,7 +471,11 @@ static int do_timerfd_settime(int ufd, int flags,
break;
}
spin_unlock_irq(&ctx->wqh.lock);
-   cpu_relax();
+
+   if (isalarm(ctx))
+   hrtimer_cancel_wait_running(&ctx->t.alarm.timer);
+   else
+   hrtimer_cancel_wait_running(&ctx->t.tmr);
}
 
/*


[tip:timers/core] alarmtimer: Prepare for PREEMPT_RT

2019-08-01 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  51ae33092bb8320497ec75ddc5ab383d8fafd55c
Gitweb: https://git.kernel.org/tip/51ae33092bb8320497ec75ddc5ab383d8fafd55c
Author: Anna-Maria Gleixner 
AuthorDate: Wed, 31 Jul 2019 00:33:49 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 1 Aug 2019 20:51:23 +0200

alarmtimer: Prepare for PREEMPT_RT

Use the hrtimer_cancel_wait_running() synchronization mechanism to prevent
priority inversion and live locks on PREEMPT_RT.
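
Callers of alarm_cancel() see no interface change; a minimal, hypothetical
usage sketch (the callback and variable names are made up, the calls are
from include/linux/alarmtimer.h):

    static enum alarmtimer_restart my_alarm_fn(struct alarm *a, ktime_t now)
    {
        /* expiry callback, runs from hrtimer context */
        return ALARMTIMER_NORESTART;
    }

    static struct alarm my_alarm;

    alarm_init(&my_alarm, ALARM_REALTIME, my_alarm_fn);
    alarm_start_relative(&my_alarm, ms_to_ktime(100));
    /*
     * Returns only when the callback is guaranteed not to be running;
     * on PREEMPT_RT it now blocks instead of busy waiting.
     */
    alarm_cancel(&my_alarm);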

[ tglx: Split out of combo patch ]

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20190730223828.508744...@linutronix.de


---
 kernel/time/alarmtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 57518efc3810..36947449dba2 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -432,7 +432,7 @@ int alarm_cancel(struct alarm *alarm)
int ret = alarm_try_to_cancel(alarm);
if (ret >= 0)
return ret;
-   cpu_relax();
+   hrtimer_cancel_wait_running(&alarm->timer);
}
 }
 EXPORT_SYMBOL_GPL(alarm_cancel);


[tip:timers/core] timers: Prepare support for PREEMPT_RT

2019-08-01 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  030dcdd197d77374879bb5603d091eee7d8aba80
Gitweb: https://git.kernel.org/tip/030dcdd197d77374879bb5603d091eee7d8aba80
Author: Anna-Maria Gleixner 
AuthorDate: Fri, 26 Jul 2019 20:31:00 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 1 Aug 2019 20:51:22 +0200

timers: Prepare support for PREEMPT_RT

When PREEMPT_RT is enabled, the soft interrupt thread can be preempted.  If
the soft interrupt thread is preempted in the middle of a timer callback,
then calling del_timer_sync() can lead to two issues:

  - If the caller is on a remote CPU then it has to spin wait for the timer
handler to complete. This can result in unbounded priority inversion.

  - If the caller originates from the task which preempted the timer
handler on the same CPU, then spin waiting for the timer handler to
complete is never going to end.

To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If del_timer_sync() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.

This addresses both the priority inversion and the live lock issues.
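
The resulting handshake can be sketched as follows (heavily simplified
pseudo-C; the real code in the patch below additionally juggles base->lock
and skips the mechanism for IRQSAFE timers):

    /* softirq thread, around the expiry of the timer callbacks: */
    spin_lock(&base->expiry_lock);
    /* ... run expired timer callbacks ... */
    if (atomic_read(&base->timer_waiters)) {
        /* let a blocked del_timer_sync() caller in, then continue */
        spin_unlock(&base->expiry_lock);
        spin_lock(&base->expiry_lock);
    }
    spin_unlock(&base->expiry_lock);

    /* del_timer_sync() caller, after the try-to-delete fast path failed: */
    atomic_inc(&base->timer_waiters);
    spin_lock_bh(&base->expiry_lock);   /* blocks until the callback is done */
    atomic_dec(&base->timer_waiters);
    spin_unlock_bh(&base->expiry_lock);
    /* ... retry the fast path ... */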

This mechanism is not used for timers which are marked IRQSAFE as for those
preemption is disabled across the callback and therefore this situation
cannot happen. The callbacks for such timers need to be individually
audited for RT compliance.
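
For reference, an IRQSAFE timer is one set up with the TIMER_IRQSAFE flag;
a minimal, hypothetical example (the names are made up):

    static void my_watchdog_fn(struct timer_list *t)
    {
        /* runs with interrupts disabled even on RT, must not sleep */
    }

    static struct timer_list my_watchdog_timer;

    timer_setup(&my_watchdog_timer, my_watchdog_fn, TIMER_IRQSAFE);
    mod_timer(&my_watchdog_timer, jiffies + HZ);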

The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest calls
del_timer_sync() it will spin wait for the other vCPU to be scheduled back
in. The expiry lock mechanism would avoid that. It'd be trivial to enable
this when paravirt spinlocks are enabled in a guest, but it's not clear
whether this is an actual problem in the wild, so for now it's an RT only
mechanism.

As the softirq thread can be preempted with PREEMPT_RT=y, the SMP variant
of del_timer_sync() needs to be used on UP as well.

[ tglx: Refactored it for mainline ]

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20190726185753.832418...@linutronix.de



---
 include/linux/timer.h |   2 +-
 kernel/time/timer.c   | 103 ++
 2 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 282e4f2a532a..1e6650ed066d 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -183,7 +183,7 @@ extern void add_timer(struct timer_list *timer);
 
 extern int try_to_del_timer_sync(struct timer_list *timer);
 
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
   extern int del_timer_sync(struct timer_list *timer);
 #else
 # define del_timer_sync(t) del_timer(t)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 343c7ba33b1c..673c6a0f0c45 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -196,6 +196,10 @@ EXPORT_SYMBOL(jiffies_64);
 struct timer_base {
raw_spinlock_t  lock;
struct timer_list   *running_timer;
+#ifdef CONFIG_PREEMPT_RT
+   spinlock_t  expiry_lock;
+   atomic_t        timer_waiters;
+#endif
unsigned long   clk;
unsigned long   next_expiry;
unsigned int    cpu;
@@ -1227,7 +1231,78 @@ int try_to_del_timer_sync(struct timer_list *timer)
 }
 EXPORT_SYMBOL(try_to_del_timer_sync);
 
-#ifdef CONFIG_SMP
+#ifdef CONFIG_PREEMPT_RT
+static __init void timer_base_init_expiry_lock(struct timer_base *base)
+{
+   spin_lock_init(&base->expiry_lock);
+}
+
+static inline void timer_base_lock_expiry(struct timer_base *base)
+{
+   spin_lock(&base->expiry_lock);
+}
+
+static inline void timer_base_unlock_expiry(struct timer_base *base)
+{
+   spin_unlock(&base->expiry_lock);
+}
+
+/*
+ * The counterpart to del_timer_wait_running().
+ *
+ * If there is a waiter for base->expiry_lock, then it was waiting for the
+ * timer callback to finish. Drop expiry_lock and reacquire it. That allows
+ * the waiter to acquire the lock and make progress.
+ */
+static void timer_sync_wait_running(struct timer_base *base)
+{
+   if (atomic_read(&base->timer_waiters)) {
+   spin_unlock(&base->expiry_lock);
+   spin_lock(&base->expiry_lock);
+   }
+}
+
+/*
+ * This function is called on PREEMPT_RT kernels when the fast path
+ * deletion of a timer failed because the timer callback function was
+ * running.
+ *
+ * This prevents priority inversion, if the softirq thread on a remote CPU
+ * got preempted, and it prevents a live lock when the task which tries to
+ * delete a timer preempted the softirq thread running the timer callback
+ * function.
+ */
+static void del_timer_wait_running(struct timer_list *timer)
+{
+   u32 tf = timer->flags;
+
+   if (!(tf & TIMER_MIGRATING)) {
+   struct timer_base *base = get_timer_base(tf);
+
+   /*
+    * Mark the base as contended and grab the expiry lock,
+    * which is held by the softirq across the timer
+    * callback. Drop the lock immediately so the softirq can
+    * expire the next timer. In theory the timer could already
+    * be running again, but that's more than unlikely and
+    * just causes another wait loop.
+    */
+   atomic_inc(&base->timer_waiters);
+   spin_lock_bh(&base->expiry_lock);
+   atomic_dec(&base->timer_waiters);
+   spin_unlock_bh(&base->expiry_lock);
+   }
+}

[tip:timers/core] hrtimer: Prepare support for PREEMPT_RT

2019-08-01 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  f61eff83cec9cfab31fd30a2ca8856be379cdcd5
Gitweb: https://git.kernel.org/tip/f61eff83cec9cfab31fd30a2ca8856be379cdcd5
Author: Anna-Maria Gleixner 
AuthorDate: Fri, 26 Jul 2019 20:30:59 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 1 Aug 2019 20:51:22 +0200

hrtimer: Prepare support for PREEMPT_RT

When PREEMPT_RT is enabled, the soft interrupt thread can be preempted.  If
the soft interrupt thread is preempted in the middle of a timer callback,
then calling hrtimer_cancel() can lead to two issues:

  - If the caller is on a remote CPU then it has to spin wait for the timer
handler to complete. This can result in unbounded priority inversion.

  - If the caller originates from the task which preempted the timer
handler on the same CPU, then spin waiting for the timer handler to
complete is never going to end.

To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If hrtimer_cancel() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.

This addresses both the priority inversion and the live lock issues.

The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest calls
hrtimer_cancel() it will spin wait for the other vCPU to be scheduled back
in. The expiry lock mechanism would avoid that. It'd be trivial to enable
this when paravirt spinlocks are enabled in a guest, but it's not clear
whether this is an actual problem in the wild, so for now it's an RT only
mechanism.
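
With this mechanism in place the cancel path becomes a try-then-wait loop;
the resulting hrtimer_cancel() looks roughly like:

    int hrtimer_cancel(struct hrtimer *timer)
    {
        int ret;

        do {
            ret = hrtimer_try_to_cancel(timer);
            if (ret < 0)
                /* cpu_relax() on !RT, block on the expiry lock on RT */
                hrtimer_cancel_wait_running(timer);
        } while (ret < 0);

        return ret;
    }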

[ tglx: Refactored it for mainline ]

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20190726185753.737767...@linutronix.de



---
 include/linux/hrtimer.h | 16 +
 kernel/time/hrtimer.c   | 95 +
 2 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 7d0d0a36a8f4..5df4bcff96d5 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -192,6 +192,10 @@ enum  hrtimer_base_type {
  * @nr_retries:Total number of hrtimer interrupt retries
  * @nr_hangs:  Total number of hrtimer interrupt hangs
  * @max_hang_time: Maximum time spent in hrtimer_interrupt
+ * @softirq_expiry_lock: Lock which is taken while softirq based hrtimers are
+ *  expired
+ * @timer_waiters: A hrtimer_cancel() invocation waits for the timer
+ * callback to finish.
  * @expires_next:  absolute time of the next event, is required for remote
  * hrtimer enqueue; it is the total first expiry time (hard
  * and soft hrtimer are taken into account)
@@ -218,6 +222,10 @@ struct hrtimer_cpu_base {
unsigned short  nr_retries;
unsigned short  nr_hangs;
unsigned int    max_hang_time;
+#endif
+#ifdef CONFIG_PREEMPT_RT
+   spinlock_t  softirq_expiry_lock;
+   atomic_t        timer_waiters;
 #endif
ktime_t expires_next;
struct hrtimer  *next_timer;
@@ -350,6 +358,14 @@ extern void hrtimers_resume(void);
 
 DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 
+#ifdef CONFIG_PREEMPT_RT
+void hrtimer_cancel_wait_running(const struct hrtimer *timer);
+#else
+static inline void hrtimer_cancel_wait_running(struct hrtimer *timer)
+{
+   cpu_relax();
+}
+#endif
 
 /* Exported timer functions: */
 
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index c101f88ae8aa..499122752649 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1162,6 +1162,82 @@ int hrtimer_try_to_cancel(struct hrtimer *timer)
 }
 EXPORT_SYMBOL_GPL(hrtimer_try_to_cancel);
 
+#ifdef CONFIG_PREEMPT_RT
+static void hrtimer_cpu_base_init_expiry_lock(struct hrtimer_cpu_base *base)
+{
+   spin_lock_init(&base->softirq_expiry_lock);
+}
+
+static void hrtimer_cpu_base_lock_expiry(struct hrtimer_cpu_base *base)
+{
+   spin_lock(&base->softirq_expiry_lock);
+}
+
+static void hrtimer_cpu_base_unlock_expiry(struct hrtimer_cpu_base *base)
+{
+   spin_unlock(&base->softirq_expiry_lock);
+}
+
+/*
+ * The counterpart to hrtimer_cancel_wait_running().
+ *
+ * If there is a waiter for cpu_base->expiry_lock, then it was waiting for
+ * the timer callback to finish. Drop expiry_lock and reacquire it. That
+ * allows the waiter to acquire the lock and make progress.
+ */
+static void hrtimer_sync_wait_running(struct hrtimer_cpu_base *cpu_base,
+ unsigned long flags)
+{
+   if (atomic_read(&cpu_base->timer_waiters)) {
+   raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+   spin_unlock(&cpu_base->softirq_expiry_lock);
+   spin_lock(&cpu_base->softirq_expiry_lock);
+   raw_spin_lock_irq(&cpu_base->lock);
+   }
+}

[tip:timers/core] posix-timers: Cleanup the flag/flags confusion

2019-08-01 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  b0ccc6eb0d7e0b7d346b118ccc8b38bf18e39b7f
Gitweb: https://git.kernel.org/tip/b0ccc6eb0d7e0b7d346b118ccc8b38bf18e39b7f
Author: Anna-Maria Gleixner 
AuthorDate: Wed, 31 Jul 2019 00:33:52 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 1 Aug 2019 17:46:42 +0200

posix-timers: Cleanup the flag/flags confusion

do_timer_settime() has a 'flags' argument and uses 'flag' for the interrupt
flags, which is confusing at best.

Rename the argument so 'flags' can be used for interrupt flags as usual.

Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20190730223828.782664...@linutronix.de

---
 kernel/time/posix-timers.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index d7f2d91acdac..f5aedd2f60df 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -844,13 +844,13 @@ int common_timer_set(struct k_itimer *timr, int flags,
return 0;
 }
 
-static int do_timer_settime(timer_t timer_id, int flags,
+static int do_timer_settime(timer_t timer_id, int tmr_flags,
struct itimerspec64 *new_spec64,
struct itimerspec64 *old_spec64)
 {
const struct k_clock *kc;
struct k_itimer *timr;
-   unsigned long flag;
+   unsigned long flags;
int error = 0;
 
if (!timespec64_valid(&new_spec64->it_interval) ||
@@ -860,7 +860,7 @@ static int do_timer_settime(timer_t timer_id, int flags,
if (old_spec64)
memset(old_spec64, 0, sizeof(*old_spec64));
 retry:
-   timr = lock_timer(timer_id, &flag);
+   timr = lock_timer(timer_id, &flags);
if (!timr)
return -EINVAL;
 
@@ -868,9 +868,9 @@ retry:
if (WARN_ON_ONCE(!kc || !kc->timer_set))
error = -EINVAL;
else
-   error = kc->timer_set(timr, flags, new_spec64, old_spec64);
+   error = kc->timer_set(timr, tmr_flags, new_spec64, old_spec64);
 
-   unlock_timer(timr, flag);
+   unlock_timer(timr, flags);
if (error == TIMER_RETRY) {
old_spec64 = NULL;  // We already got the old time...
goto retry;


[tip:timers/core] itimers: Prepare for PREEMPT_RT

2019-08-01 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  cab46ec655eec1b5dbb0c17a25e19f67c539f00b
Gitweb: https://git.kernel.org/tip/cab46ec655eec1b5dbb0c17a25e19f67c539f00b
Author: Anna-Maria Gleixner 
AuthorDate: Wed, 31 Jul 2019 00:33:51 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 1 Aug 2019 17:46:41 +0200

itimers: Prepare for PREEMPT_RT

Use the hrtimer_cancel_wait_running() synchronization mechanism to prevent
priority inversion and live locks on PREEMPT_RT.

As a benefit the retry loop gains the missing cpu_relax() on !RT.

[ tglx: Split out of combo patch ]

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20190730223828.690771...@linutronix.de

---
 kernel/time/itimer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/time/itimer.c b/kernel/time/itimer.c
index 02068b2d5862..9d26fd4ba4c0 100644
--- a/kernel/time/itimer.c
+++ b/kernel/time/itimer.c
@@ -213,6 +213,7 @@ again:
/* We are sharing ->siglock with it_real_fn() */
if (hrtimer_try_to_cancel(timer) < 0) {
spin_unlock_irq(&tsk->sighand->siglock);
+   hrtimer_cancel_wait_running(timer);
goto again;
}
expires = timeval_to_ktime(value->it_value);


[tip:timers/core] timerfd: Prepare for PREEMPT_RT

2019-08-01 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  4da1306fb920a267b5ea21ee15cd771c7bc09cc6
Gitweb: https://git.kernel.org/tip/4da1306fb920a267b5ea21ee15cd771c7bc09cc6
Author: Anna-Maria Gleixner 
AuthorDate: Wed, 31 Jul 2019 00:33:50 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 1 Aug 2019 17:46:41 +0200

timerfd: Prepare for PREEMPT_RT

Use the hrtimer_cancel_wait_running() synchronization mechanism to prevent
priority inversion and live locks on PREEMPT_RT.

[ tglx: Split out of combo patch ]

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20190730223828.600085...@linutronix.de

---
 fs/timerfd.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 6a6fc8aa1de7..48305ba41e3c 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -471,7 +471,11 @@ static int do_timerfd_settime(int ufd, int flags,
break;
}
spin_unlock_irq(&ctx->wqh.lock);
-   cpu_relax();
+
+   if (isalarm(ctx))
+   hrtimer_cancel_wait_running(&ctx->t.alarm.timer);
+   else
+   hrtimer_cancel_wait_running(&ctx->t.tmr);
}
 
/*


[tip:timers/core] alarmtimer: Prepare for PREEMPT_RT

2019-08-01 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  1f8e8bd8b74c8089a43bc5f1f24e4bf0f855d760
Gitweb: https://git.kernel.org/tip/1f8e8bd8b74c8089a43bc5f1f24e4bf0f855d760
Author: Anna-Maria Gleixner 
AuthorDate: Wed, 31 Jul 2019 00:33:49 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 1 Aug 2019 17:46:41 +0200

alarmtimer: Prepare for PREEMPT_RT

Use the hrtimer_cancel_wait_running() synchronization mechanism to prevent
priority inversion and live locks on PREEMPT_RT.

[ tglx: Split out of combo patch ]

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20190730223828.508744...@linutronix.de

---
 kernel/time/alarmtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 57518efc3810..36947449dba2 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -432,7 +432,7 @@ int alarm_cancel(struct alarm *alarm)
int ret = alarm_try_to_cancel(alarm);
if (ret >= 0)
return ret;
-   cpu_relax();
+   hrtimer_cancel_wait_running(&alarm->timer);
}
 }
 EXPORT_SYMBOL_GPL(alarm_cancel);


[tip:timers/core] timers: Prepare support for PREEMPT_RT

2019-08-01 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  1c2df8ac9292ea1fe6c958c198bf6bc5c768acf5
Gitweb: https://git.kernel.org/tip/1c2df8ac9292ea1fe6c958c198bf6bc5c768acf5
Author: Anna-Maria Gleixner 
AuthorDate: Fri, 26 Jul 2019 20:31:00 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 1 Aug 2019 17:43:20 +0200

timers: Prepare support for PREEMPT_RT

When PREEMPT_RT is enabled, the soft interrupt thread can be preempted.  If
the soft interrupt thread is preempted in the middle of a timer callback,
then calling del_timer_sync() can lead to two issues:

  - If the caller is on a remote CPU then it has to spin wait for the timer
handler to complete. This can result in unbounded priority inversion.

  - If the caller originates from the task which preempted the timer
handler on the same CPU, then spin waiting for the timer handler to
complete is never going to end.

To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If del_timer_sync() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.

This addresses both the priority inversion and the live lock issues.

This mechanism is not used for timers which are marked IRQSAFE as for those
preemption is disabled across the callback and therefore this situation
cannot happen. The callbacks for such timers need to be individually
audited for RT compliance.

The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest calls
del_timer_sync() it will spin wait for the other vCPU to be scheduled back
in. The expiry lock mechanism would avoid that. It'd be trivial to enable
this when paravirt spinlocks are enabled in a guest, but it's not clear
whether this is an actual problem in the wild, so for now it's an RT only
mechanism.

As the softirq thread can be preempted with PREEMPT_RT=y, the SMP variant
of del_timer_sync() needs to be used on UP as well.

[ tglx: Refactored it for mainline ]

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20190726185753.832418...@linutronix.de


---
 include/linux/timer.h |   2 +-
 kernel/time/timer.c   | 103 ++
 2 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 282e4f2a532a..1e6650ed066d 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -183,7 +183,7 @@ extern void add_timer(struct timer_list *timer);
 
 extern int try_to_del_timer_sync(struct timer_list *timer);
 
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
   extern int del_timer_sync(struct timer_list *timer);
 #else
 # define del_timer_sync(t) del_timer(t)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 343c7ba33b1c..673c6a0f0c45 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -196,6 +196,10 @@ EXPORT_SYMBOL(jiffies_64);
 struct timer_base {
raw_spinlock_t  lock;
struct timer_list   *running_timer;
+#ifdef CONFIG_PREEMPT_RT
+   spinlock_t  expiry_lock;
+   atomic_t        timer_waiters;
+#endif
unsigned long   clk;
unsigned long   next_expiry;
unsigned int    cpu;
@@ -1227,7 +1231,78 @@ int try_to_del_timer_sync(struct timer_list *timer)
 }
 EXPORT_SYMBOL(try_to_del_timer_sync);
 
-#ifdef CONFIG_SMP
+#ifdef CONFIG_PREEMPT_RT
+static __init void timer_base_init_expiry_lock(struct timer_base *base)
+{
+   spin_lock_init(&base->expiry_lock);
+}
+
+static inline void timer_base_lock_expiry(struct timer_base *base)
+{
+   spin_lock(&base->expiry_lock);
+}
+
+static inline void timer_base_unlock_expiry(struct timer_base *base)
+{
+   spin_unlock(&base->expiry_lock);
+}
+
+/*
+ * The counterpart to del_timer_wait_running().
+ *
+ * If there is a waiter for base->expiry_lock, then it was waiting for the
+ * timer callback to finish. Drop expiry_lock and reacquire it. That allows
+ * the waiter to acquire the lock and make progress.
+ */
+static void timer_sync_wait_running(struct timer_base *base)
+{
+   if (atomic_read(&base->timer_waiters)) {
+   spin_unlock(&base->expiry_lock);
+   spin_lock(&base->expiry_lock);
+   }
+}
+
+/*
+ * This function is called on PREEMPT_RT kernels when the fast path
+ * deletion of a timer failed because the timer callback function was
+ * running.
+ *
+ * This prevents priority inversion, if the softirq thread on a remote CPU
+ * got preempted, and it prevents a live lock when the task which tries to
+ * delete a timer preempted the softirq thread running the timer callback
+ * function.
+ */
+static void del_timer_wait_running(struct timer_list *timer)
+{
+   u32 tf = timer->flags;
+
+   if (!(tf & TIMER_MIGRATING)) {
+   struct timer_base *base = get_timer_base(tf);
+
+   /*
+    * Mark the base as contended and grab the expiry lock,
+    * which is held by the softirq across the timer
+    * callback. Drop the lock immediately so the softirq can
+    * expire the next timer. In theory the timer could already
+    * be running again, but that's more than unlikely and
+    * just causes another wait loop.
+    */
+   atomic_inc(&base->timer_waiters);
+   spin_lock_bh(&base->expiry_lock);
+   atomic_dec(&base->timer_waiters);
+   spin_unlock_bh(&base->expiry_lock);
+   }
+}

[tip:timers/core] hrtimer: Prepare support for PREEMPT_RT

2019-08-01 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  37226a1807c5f41537190462362e3e2739e22f13
Gitweb: https://git.kernel.org/tip/37226a1807c5f41537190462362e3e2739e22f13
Author: Anna-Maria Gleixner 
AuthorDate: Fri, 26 Jul 2019 20:30:59 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 1 Aug 2019 17:43:19 +0200

hrtimer: Prepare support for PREEMPT_RT

When PREEMPT_RT is enabled, the soft interrupt thread can be preempted.  If
the soft interrupt thread is preempted in the middle of a timer callback,
then calling hrtimer_cancel() can lead to two issues:

  - If the caller is on a remote CPU then it has to spin wait for the timer
handler to complete. This can result in unbounded priority inversion.

  - If the caller originates from the task which preempted the timer
handler on the same CPU, then spin waiting for the timer handler to
complete is never going to end.

To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If hrtimer_cancel() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.

This addresses both the priority inversion and the live lock issues.

The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest calls
hrtimer_cancel() it will spin wait for the other vCPU to be scheduled back
in. The expiry lock mechanism would avoid that. It'd be trivial to enable
this when paravirt spinlocks are enabled in a guest, but it's not clear
whether this is an actual problem in the wild, so for now it's an RT only
mechanism.

[ tglx: Refactored it for mainline ]

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20190726185753.737767...@linutronix.de


---
 include/linux/hrtimer.h | 16 +
 kernel/time/hrtimer.c   | 95 +
 2 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 7d0d0a36a8f4..5df4bcff96d5 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -192,6 +192,10 @@ enum  hrtimer_base_type {
  * @nr_retries:Total number of hrtimer interrupt retries
  * @nr_hangs:  Total number of hrtimer interrupt hangs
  * @max_hang_time: Maximum time spent in hrtimer_interrupt
+ * @softirq_expiry_lock: Lock which is taken while softirq based hrtimers are
+ *  expired
+ * @timer_waiters: A hrtimer_cancel() invocation waits for the timer
+ * callback to finish.
  * @expires_next:  absolute time of the next event, is required for remote
  * hrtimer enqueue; it is the total first expiry time (hard
  * and soft hrtimer are taken into account)
@@ -218,6 +222,10 @@ struct hrtimer_cpu_base {
unsigned short  nr_retries;
unsigned short  nr_hangs;
unsigned int    max_hang_time;
+#endif
+#ifdef CONFIG_PREEMPT_RT
+   spinlock_t  softirq_expiry_lock;
+   atomic_t        timer_waiters;
 #endif
ktime_t expires_next;
struct hrtimer  *next_timer;
@@ -350,6 +358,14 @@ extern void hrtimers_resume(void);
 
 DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 
+#ifdef CONFIG_PREEMPT_RT
+void hrtimer_cancel_wait_running(const struct hrtimer *timer);
+#else
+static inline void hrtimer_cancel_wait_running(struct hrtimer *timer)
+{
+   cpu_relax();
+}
+#endif
 
 /* Exported timer functions: */
 
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index c101f88ae8aa..499122752649 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1162,6 +1162,82 @@ int hrtimer_try_to_cancel(struct hrtimer *timer)
 }
 EXPORT_SYMBOL_GPL(hrtimer_try_to_cancel);
 
+#ifdef CONFIG_PREEMPT_RT
+static void hrtimer_cpu_base_init_expiry_lock(struct hrtimer_cpu_base *base)
+{
+   spin_lock_init(&base->softirq_expiry_lock);
+}
+
+static void hrtimer_cpu_base_lock_expiry(struct hrtimer_cpu_base *base)
+{
+   spin_lock(&base->softirq_expiry_lock);
+}
+
+static void hrtimer_cpu_base_unlock_expiry(struct hrtimer_cpu_base *base)
+{
+   spin_unlock(&base->softirq_expiry_lock);
+}
+
+/*
+ * The counterpart to hrtimer_cancel_wait_running().
+ *
+ * If there is a waiter for cpu_base->expiry_lock, then it was waiting for
+ * the timer callback to finish. Drop expiry_lock and reacquire it. That
+ * allows the waiter to acquire the lock and make progress.
+ */
+static void hrtimer_sync_wait_running(struct hrtimer_cpu_base *cpu_base,
+ unsigned long flags)
+{
+   if (atomic_read(&cpu_base->timer_waiters)) {
+   raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+   spin_unlock(&cpu_base->softirq_expiry_lock);
+   spin_lock(&cpu_base->softirq_expiry_lock);
+   raw_spin_lock_irq(&cpu_base->lock);
+   }
+}

[tip:timers/core] timers: Prepare support for PREEMPT_RT

2019-07-30 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  51503dcd6118d627a0c1b5829191d4fa6f16
Gitweb: https://git.kernel.org/tip/51503dcd6118d627a0c1b5829191d4fa6f16
Author: Anna-Maria Gleixner 
AuthorDate: Fri, 26 Jul 2019 20:31:00 +0200
Committer:  Thomas Gleixner 
CommitDate: Tue, 30 Jul 2019 23:57:57 +0200

timers: Prepare support for PREEMPT_RT

When PREEMPT_RT is enabled, the soft interrupt thread can be preempted.  If
the soft interrupt thread is preempted in the middle of a timer callback,
then calling del_timer_sync() can lead to two issues:

  - If the caller is on a remote CPU then it has to spin wait for the timer
handler to complete. This can result in unbounded priority inversion.

  - If the caller originates from the task which preempted the timer
handler on the same CPU, then spin waiting for the timer handler to
complete is never going to end.

To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If del_timer_sync() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.

This addresses both the priority inversion and the live lock issues.

This mechanism is not used for timers which are marked IRQSAFE as for those
preemption is disabled across the callback and therefore this situation
cannot happen. The callbacks for such timers need to be individually
audited for RT compliance.

The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest calls
del_timer_sync() it will spin wait for the other vCPU to be scheduled back
in. The expiry lock mechanism would avoid that. It'd be trivial to enable
this when paravirt spinlocks are enabled in a guest, but it's not clear
whether this is an actual problem in the wild, so for now it's an RT only
mechanism.

As the softirq thread can be preempted with PREEMPT_RT=y, the SMP variant
of del_timer_sync() needs to be used on UP as well.

[ tglx: Refactored it for mainline ]

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20190726185753.832418...@linutronix.de

---
 include/linux/timer.h |   2 +-
 kernel/time/timer.c   | 103 ++
 2 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 282e4f2a532a..1e6650ed066d 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -183,7 +183,7 @@ extern void add_timer(struct timer_list *timer);
 
 extern int try_to_del_timer_sync(struct timer_list *timer);
 
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
   extern int del_timer_sync(struct timer_list *timer);
 #else
 # define del_timer_sync(t) del_timer(t)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 343c7ba33b1c..673c6a0f0c45 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -196,6 +196,10 @@ EXPORT_SYMBOL(jiffies_64);
 struct timer_base {
raw_spinlock_t  lock;
struct timer_list   *running_timer;
+#ifdef CONFIG_PREEMPT_RT
+   spinlock_t  expiry_lock;
+   atomic_t        timer_waiters;
+#endif
unsigned long   clk;
unsigned long   next_expiry;
unsigned int    cpu;
@@ -1227,7 +1231,78 @@ int try_to_del_timer_sync(struct timer_list *timer)
 }
 EXPORT_SYMBOL(try_to_del_timer_sync);
 
-#ifdef CONFIG_SMP
+#ifdef CONFIG_PREEMPT_RT
+static __init void timer_base_init_expiry_lock(struct timer_base *base)
+{
+   spin_lock_init(&base->expiry_lock);
+}
+
+static inline void timer_base_lock_expiry(struct timer_base *base)
+{
+   spin_lock(&base->expiry_lock);
+}
+
+static inline void timer_base_unlock_expiry(struct timer_base *base)
+{
+   spin_unlock(&base->expiry_lock);
+}
+
+/*
+ * The counterpart to del_timer_wait_running().
+ *
+ * If there is a waiter for base->expiry_lock, then it was waiting for the
+ * timer callback to finish. Drop expiry_lock and reacquire it. That allows
+ * the waiter to acquire the lock and make progress.
+ */
+static void timer_sync_wait_running(struct timer_base *base)
+{
+   if (atomic_read(&base->timer_waiters)) {
+   spin_unlock(&base->expiry_lock);
+   spin_lock(&base->expiry_lock);
+   }
+}
+
+/*
+ * This function is called on PREEMPT_RT kernels when the fast path
+ * deletion of a timer failed because the timer callback function was
+ * running.
+ *
+ * This prevents priority inversion, if the softirq thread on a remote CPU
+ * got preempted, and it prevents a live lock when the task which tries to
+ * delete a timer preempted the softirq thread running the timer callback
+ * function.
+ */
+static void del_timer_wait_running(struct timer_list *timer)
+{
+   u32 tf = timer->flags;
+
+   if (!(tf & TIMER_MIGRATING)) {
+   struct timer_base *base = get_timer_base(tf);
+
+   /*
+    * Mark the base as contended and grab the expiry lock,
+    * which is held by the softirq across the timer
+    * callback. Drop the lock immediately so the softirq can
+    * expire the next timer. In theory the timer could already
+    * be running again, but that's more than unlikely and
+    * just causes another wait loop.
+    */
+   atomic_inc(&base->timer_waiters);
+   spin_lock_bh(&base->expiry_lock);
+   atomic_dec(&base->timer_waiters);
+   spin_unlock_bh(&base->expiry_lock);
+   }
+}

[tip:timers/core] hrtimer: Prepare support for PREEMPT_RT

2019-07-30 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  10521d890c650472e494cf415f0fa6c29d4f
Gitweb: https://git.kernel.org/tip/10521d890c650472e494cf415f0fa6c29d4f
Author: Anna-Maria Gleixner 
AuthorDate: Fri, 26 Jul 2019 20:30:59 +0200
Committer:  Thomas Gleixner 
CommitDate: Tue, 30 Jul 2019 23:57:57 +0200

hrtimer: Prepare support for PREEMPT_RT

When PREEMPT_RT is enabled, the soft interrupt thread can be preempted.  If
the soft interrupt thread is preempted in the middle of a timer callback,
then calling hrtimer_cancel() can lead to two issues:

  - If the caller is on a remote CPU then it has to spin wait for the timer
handler to complete. This can result in unbounded priority inversion.

  - If the caller originates from the task which preempted the timer
handler on the same CPU, then spin waiting for the timer handler to
complete is never going to end.

To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If hrtimer_cancel() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.

This addresses both the priority inversion and the live lock issues.

The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest calls
hrtimer_cancel() it will spin wait for the other vCPU to be scheduled back
in. The expiry lock mechanism would avoid that. It'd be trivial to enable
this when paravirt spinlocks are enabled in a guest, but it's not clear
whether this is an actual problem in the wild, so for now it's an RT only
mechanism.

[ tglx: Refactored it for mainline ]

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20190726185753.737767...@linutronix.de

---
 include/linux/hrtimer.h | 16 +
 kernel/time/hrtimer.c   | 95 +
 2 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 7d0d0a36a8f4..5df4bcff96d5 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -192,6 +192,10 @@ enum  hrtimer_base_type {
  * @nr_retries:Total number of hrtimer interrupt retries
  * @nr_hangs:  Total number of hrtimer interrupt hangs
  * @max_hang_time: Maximum time spent in hrtimer_interrupt
+ * @softirq_expiry_lock: Lock which is taken while softirq based hrtimers are
+ *  expired
+ * @timer_waiters: A hrtimer_cancel() invocation waits for the timer
+ * callback to finish.
  * @expires_next:  absolute time of the next event, is required for remote
  * hrtimer enqueue; it is the total first expiry time (hard
  * and soft hrtimer are taken into account)
@@ -218,6 +222,10 @@ struct hrtimer_cpu_base {
unsigned short  nr_retries;
unsigned short  nr_hangs;
unsigned int    max_hang_time;
+#endif
+#ifdef CONFIG_PREEMPT_RT
+   spinlock_t  softirq_expiry_lock;
+   atomic_t        timer_waiters;
 #endif
ktime_t expires_next;
struct hrtimer  *next_timer;
@@ -350,6 +358,14 @@ extern void hrtimers_resume(void);
 
 DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 
+#ifdef CONFIG_PREEMPT_RT
+void hrtimer_cancel_wait_running(const struct hrtimer *timer);
+#else
+static inline void hrtimer_cancel_wait_running(struct hrtimer *timer)
+{
+   cpu_relax();
+}
+#endif
 
 /* Exported timer functions: */
 
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index c101f88ae8aa..499122752649 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1162,6 +1162,82 @@ int hrtimer_try_to_cancel(struct hrtimer *timer)
 }
 EXPORT_SYMBOL_GPL(hrtimer_try_to_cancel);
 
+#ifdef CONFIG_PREEMPT_RT
+static void hrtimer_cpu_base_init_expiry_lock(struct hrtimer_cpu_base *base)
+{
+   spin_lock_init(&base->softirq_expiry_lock);
+}
+
+static void hrtimer_cpu_base_lock_expiry(struct hrtimer_cpu_base *base)
+{
+   spin_lock(&base->softirq_expiry_lock);
+}
+
+static void hrtimer_cpu_base_unlock_expiry(struct hrtimer_cpu_base *base)
+{
+   spin_unlock(&base->softirq_expiry_lock);
+}
+
+/*
+ * The counterpart to hrtimer_cancel_wait_running().
+ *
+ * If there is a waiter for cpu_base->expiry_lock, then it was waiting for
+ * the timer callback to finish. Drop expiry_lock and reacquire it. That
+ * allows the waiter to acquire the lock and make progress.
+ */
+static void hrtimer_sync_wait_running(struct hrtimer_cpu_base *cpu_base,
+ unsigned long flags)
+{
+   if (atomic_read(&cpu_base->timer_waiters)) {
+   raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+   spin_unlock(&cpu_base->softirq_expiry_lock);
+   spin_lock(&cpu_base->softirq_expiry_lock);
+   raw_spin_lock_irq(&cpu_base->lock);
+   }
+}

Re: [patch 2/3] timers: do not raise softirq unconditionally (spinlockless version)

2019-06-11 Thread Anna-Maria Gleixner
On Fri, 31 May 2019, Anna-Maria Gleixner wrote:

[...]
> I will think about the problem and your solution a little bit more and
> give you feedback hopefully on Monday.

I'm sorry for the delay. But now I'm able to give you detailed feedback:

The general problem is that your solution is customized to a single
use case: preventing the softirq raise, but only if there is _no_ timer
pending. To reach this goal without using locks, overhead is added to the
formerly optimized add/mod path of a timer. With your code the timer
softirq is raised even when there is a pending timer which does not have to
be expired right now. But there have been requests in the past for this use
case already.

I discussed with Thomas several approaches during the last week how to
solve the unconditional softirq timer raise in a more general way without
losing the fast add/mod path of a timer. The approach which seems to be
the best has a dependency on a timer code change from a push to pull model
which is still under development (see v2 patchset:
http://lkml.kernel.org/r/2017041802.490432...@linutronix.de). The
patchset v2 has several problems but we are working on a solution for those
problems right now. When the timer pull model is in place the approach to
solve the unconditional timer softirq raise could look like the following:

---8<---
The next_expiry value of timer_base struct is used to store the next expiry
value even if timer_base is not idle. Therefore it is updated after adding
or modifying a timer and also at the end of timer softirq. In case timer
softirq does not have to be raised, the timer_base->clk is incremented to
prevent stale clocks. Checking whether timer softirq has to be raised
cannot be done locklessly.

This code is not compile tested nor boot tested.

---
 kernel/time/timer.c |   60 +++-
 1 file changed, 36 insertions(+), 24 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -552,37 +552,32 @@ static void
 static void
 trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
 {
-   if (!is_timers_nohz_active())
-   return;
-
-   /*
-* TODO: This wants some optimizing similar to the code below, but we
-* will do that when we switch from push to pull for deferrable timers.
-*/
-   if (timer->flags & TIMER_DEFERRABLE) {
-   if (tick_nohz_full_cpu(base->cpu))
-   wake_up_nohz_cpu(base->cpu);
-   return;
+   if (is_timers_nohz_active()) {
+   /*
+* TODO: This wants some optimizing similar to the code
+* below, but we will do that when we switch from push to
+* pull for deferrable timers.
+*/
+   if (timer->flags & TIMER_DEFERRABLE) {
+   if (tick_nohz_full_cpu(base->cpu))
+   wake_up_nohz_cpu(base->cpu);
+   return;
+   }
}
 
-   /*
-* We might have to IPI the remote CPU if the base is idle and the
-* timer is not deferrable. If the other CPU is on the way to idle
-* then it can't set base->is_idle as we hold the base lock:
-*/
-   if (!base->is_idle)
-   return;
-
/* Check whether this is the new first expiring timer: */
if (time_after_eq(timer->expires, base->next_expiry))
return;
+   /* Update next expiry time */
+   base->next_expiry = timer->expires;
 
/*
-* Set the next expiry time and kick the CPU so it can reevaluate the
-* wheel:
+* We might have to IPI the remote CPU if the base is idle and the
+* timer is not deferrable. If the other CPU is on the way to idle
+* then it can't set base->is_idle as we hold the base lock:
 */
-   base->next_expiry = timer->expires;
-   wake_up_nohz_cpu(base->cpu);
+   if (is_timers_nohz_active() && base->is_idle)
+   wake_up_nohz_cpu(base->cpu);
 }
 
 static void
@@ -1684,6 +1679,7 @@ static inline void __run_timers(struct t
while (levels--)
expire_timers(base, heads + levels);
}
+   base->next_expiry = __next_timer_interrupt(base);
base->running_timer = NULL;
raw_spin_unlock_irq(&base->lock);
 }
@@ -1716,8 +1712,24 @@ void run_local_timers(void)
base++;
if (time_before(jiffies, base->clk))
return;
+   base--;
+   }
+
+   /*
+* check for next expiry
+*
+* deferrable base is ignored here - it is only usable when
+* switching from push to pull model for deferrable timers
+*/
+   raw_spin_lock_irq(&base->lock);
+   if (base->clk == base->next_expiry) {
+   raw_spin

Re: [patch 2/3] timers: do not raise softirq unconditionally (spinlockless version)

2019-05-31 Thread Anna-Maria Gleixner
On Thu, 30 May 2019, Marcelo Tosatti wrote:

> On Wed, May 29, 2019 at 04:53:26PM +0200, Anna-Maria Gleixner wrote:
> > On Mon, 15 Apr 2019, Marcelo Tosatti wrote:
> > 
> > > --- linux-rt-devel.orig/kernel/time/timer.c   2019-04-15 
> > > 14:21:02.788704354 -0300
> > > +++ linux-rt-devel/kernel/time/timer.c2019-04-15 14:22:56.755047354 
> > > -0300
> > > @@ -1776,6 +1776,24 @@
> > >   if (time_before(jiffies, base->clk))
> > >   return;
> > >   }
> > > +
> > > +#ifdef CONFIG_PREEMPT_RT_FULL
> > > +/* On RT, irq work runs from softirq */
> > > + if (irq_work_needs_cpu())
> > > + goto raise;
> > 
> > So with this patch and the change you made in the patch before, timers on
> > RT are expired only when there is pending irq work or after modifying a
> > timer on a non housekeeping cpu?
> 
> Well, run_timer_softirq execute only if pending_map contains a bit set.
> 
> > With your patches I could create the following problematic situation on RT
> > (if I understood everything properly): I add a timer which should expire in
> > 50 jiffies to the wheel of a non housekeeping cpu. So it ends up 50 buckets
> > away from now in the first wheel. This timer is the only timer in the wheel
> > and the next timer softirq raise is required in 50 jiffies. After adding
> > the timer, the timer interrupt is raised, and no timer has to be expired,
> > because there is no timer pending.
> 
> But the softirq will be raised, because pending_map will be set:
> 
> +   if (!bitmap_empty(base->pending_map, WHEEL_SIZE))
> +   goto raise;
> 
> No?

I'm sorry! I read the #endif of the CONFIG_PREEMPT_RT_FULL section as an
#else... This is where my confusion comes from. I will think about the
problem and your solution a little bit more and give you feedback hopefully
on Monday.

Thanks,
Anna-Maria



Re: [patch 2/3] timers: do not raise softirq unconditionally (spinlockless version)

2019-05-29 Thread Anna-Maria Gleixner
On Mon, 15 Apr 2019, Marcelo Tosatti wrote:

> Check base->pending_map locklessly and skip raising timer softirq 
> if empty.
> 
> What allows the lockless (and potentially racy against mod_timer) 
> check is that mod_timer will raise another timer softirq after
> modifying base->pending_map.

The raise of the timer softirq after adding the timer is done
unconditionally - so there are timer softirqs raised which are not required
at all, as mentioned before.

This check is implemented only for !CONFIG_PREEMPT_RT_FULL. The commit
message totally ignores that you are implementing something
CONFIG_PREEMPT_RT_FULL dependent as well.

> Signed-off-by: Marcelo Tosatti 
> 
> ---
>  kernel/time/timer.c |   18 ++
>  1 file changed, 18 insertions(+)
> 
> Index: linux-rt-devel/kernel/time/timer.c
> ===
> --- linux-rt-devel.orig/kernel/time/timer.c   2019-04-15 14:21:02.788704354 
> -0300
> +++ linux-rt-devel/kernel/time/timer.c2019-04-15 14:22:56.755047354 
> -0300
> @@ -1776,6 +1776,24 @@
>   if (time_before(jiffies, base->clk))
>   return;
>   }
> +
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +/* On RT, irq work runs from softirq */
> + if (irq_work_needs_cpu())
> + goto raise;

So with this patch and the change you made in the patch before, timers on
RT are expired only when there is pending irq work or after modifying a
timer on a non housekeeping cpu?

With your patches I could create the following problematic situation on RT
(if I understood everything properly): I add a timer which should expire in
50 jiffies to the wheel of a non housekeeping cpu. So it ends up 50 buckets
away from now in the first wheel. This timer is the only timer in the wheel
and the next timer softirq raise is required in 50 jiffies. After adding
the timer, the timer interrupt is raised, and no timer has to be expired,
because there is no timer pending. If there is no irq work required during
the next 51 jiffies and also no timer changed, the timer I added will not
expire in time. The timer_base will come out of idle but will not forward
the base clk. This makes it even worse: When a timer is then added, the timer
base is forwarded - but without checking for the next pending timer, so the
first added timer will be delayed even more.

So your implementation lacks forwarding the timer_base->clk when timer_base
comes out of idle with respect to the next pending timer.


> +#endif
> + base = this_cpu_ptr(_bases[BASE_STD]);
> + if (!housekeeping_cpu(base->cpu, HK_FLAG_TIMER)) {
> + if (!bitmap_empty(base->pending_map, WHEEL_SIZE))
> + goto raise;
> + base++;
> + if (!bitmap_empty(base->pending_map, WHEEL_SIZE))
> + goto raise;
> +
> + return;
> + }
> +
> +raise:
>   raise_softirq(TIMER_SOFTIRQ);
>  }
>  
>

Thanks,

Anna-Maria



Re: [patch 1/3] timers: raise timer softirq on __mod_timer/add_timer_on

2019-05-29 Thread Anna-Maria Gleixner
On Mon, 15 Apr 2019, Marcelo Tosatti wrote:

[...]

> The patch "timers: do not raise softirq unconditionally" from Thomas
> attempts to address that by checking, in the sched tick, whether its
> necessary to raise the timer softirq. Unfortunately, it attempts to grab
> the tvec base spinlock which generates the issue described in the patch
> "Revert "timers: do not raise softirq unconditionally"".

Both patches are not available in the version your patch set is based
on. Better pointers would be helpful.

> tvec_base->lock protects addition of timers to the wheel versus
> timer interrupt execution.

The timer_base->lock (formerly known as tvec_base->lock) synchronizes all
accesses to timer_base and not only addition of timers versus timer
interrupt execution. Deletion of timers, getting the next timer interrupt,
forwarding the base clock and migration of timers are protected as well by
timer_base->lock.

> This patch does not grab the tvec base spinlock from irq context,
> but rather performs a lockless access to base->pending_map.

I cannot see where this patch performs a lockless access to
timer_base->pending_map.

> It handles the race between timer addition and timer interrupt
> execution by unconditionally (in case of isolated CPUs) raising the
> timer softirq after making sure the updated bitmap is visible 
> on remote CPUs.

So after modifying a timer on a non housekeeping timer base, the timer
softirq is raised - even if there is no pending timer in the next
bucket. Only with this patch, this shouldn't be a problem - but it is an
additional raise of timer softirq and an overhead when adding a timer,
because the normal timer softirq is raised from sched tick anyway.

> Signed-off-by: Marcelo Tosatti 
> 
> ---
>  kernel/time/timer.c |   38 ++
>  1 file changed, 38 insertions(+)
> 
> Index: linux-rt-devel/kernel/time/timer.c
> ===
> --- linux-rt-devel.orig/kernel/time/timer.c   2019-04-15 13:56:06.974210992 
> -0300
> +++ linux-rt-devel/kernel/time/timer.c2019-04-15 14:21:02.788704354 
> -0300
> @@ -1056,6 +1063,17 @@
>   internal_add_timer(base, timer);
>   }
>  
> + if (!housekeeping_cpu(base->cpu, HK_FLAG_TIMER) &&
> + !(timer->flags & TIMER_DEFERRABLE)) {
> + call_single_data_t *c;
> +
> + c = per_cpu_ptr(_timer_csd, base->cpu);
> +
> + /* Make sure bitmap updates are visible on remote CPUs */
> + smp_wmb();
> + smp_call_function_single_async(base->cpu, c);
> + }
> +
>  out_unlock:
raw_spin_unlock_irqrestore(&base->lock, flags);
>

Could you please explain why you decided to use the above
implementation for raising the timer softirq after modifying a timer?

Thanks,

Anna-Maria
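
For reference, the ordering the quoted hunk relies on is a publish/consume
pairing between the CPU which modifies the wheel and the isolated target
CPU; a simplified sketch (not verbatim from the posted patches, the csd
variable name is made up):

    /* CPU adding the timer, under base->lock: */
    __set_bit(idx, base->pending_map);
    smp_wmb();    /* publish the pending_map update before the kick */
    smp_call_function_single_async(base->cpu, csd);

    /* target CPU, from the IPI (or from the sched tick): */
    if (!bitmap_empty(base->pending_map, WHEEL_SIZE))
        raise_softirq(TIMER_SOFTIRQ);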



Re: [patch 0/3] do not raise timer softirq unconditionally (spinlockless version)

2019-05-29 Thread Anna-Maria Gleixner
Hi,

I had a look at the queue and have several questions about your
implementation. 

First of all, I had some trouble understanding your commit messages. So I
first had to read the code and then tried to understand the commit
messages. It is easier if it works the other way round.

On Mon, 15 Apr 2019, Marcelo Tosatti wrote:

> For isolated CPUs, we'd like to skip awakening ktimersoftd
> (the switch to and then back from ktimersoftd takes 10us in
> virtualized environments, in addition to other OS overhead,
> which exceeds telco requirements for packet forwarding for
> 5G) from the sched tick.

You would like to prevent raising the timer softirq in general from the
sched tick for isolated CPUs? Or you would like to prevent raising the
timer softirq if no pending timer is available?

Nevertheless, this change is not PREEMPT_RT specific. It is a NOHZ
dependent change. So it would be nice if the queue was against
mainline. But please correct me if I'm wrong.

[...]

> This patchset reduces cyclictest latency from 25us to 14us
> on my testbox. 
> 

A lot of information is missing: What does your environment look like for
this test, what is your workload, ...?

Did you also run other tests?

Thanks,

Anna-Maria


[PATCH v3] hrtimer: Consolidate hrtimer_init() + hrtimer_init_sleeper() calls

2019-03-25 Thread Anna-Maria Gleixner
From: Sebastian Andrzej Siewior 

hrtimer_init_sleeper() calls require a prior initialisation of the
hrtimer object with hrtimer_init(). Let's make the initialisation of
the hrtimer object part of hrtimer_init_sleeper(). To remain
consistent, consider init_on_stack as well.

Besides adapting the hrtimer_init_sleeper[_on_stack]() functions, call
sites need to be updated as well.
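
Schematically, for a stack-allocated sleeper the consolidation collapses
the former two-step sequence into a single call:

    struct hrtimer_sleeper t;

    /* before: */
    hrtimer_init_on_stack(&t.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
    hrtimer_init_sleeper(&t, current);

    /* after: one call initialises both the hrtimer and the sleeper */
    hrtimer_init_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_REL,
                                  current);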

Link: http://lkml.kernel.org/r/20180703092541.2870-1-anna-ma...@linutronix.de
[anna-maria: Updating the commit message]
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Anna-Maria Gleixner 
---
v2..v3: Update to current version

v1..v2: Fix missing call site in drivers/staging/android/vsoc.c

 block/blk-mq.c |  3 +--
 drivers/staging/android/vsoc.c |  6 ++---
 include/linux/hrtimer.h| 19 +++---
 include/linux/wait.h   |  4 +--
 kernel/futex.c | 19 ++
 kernel/time/hrtimer.c  | 46 ++
 net/core/pktgen.c  |  4 +--
 7 files changed, 67 insertions(+), 34 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 70b210a308c4..f378e2f8ec2c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3365,10 +3365,9 @@ static bool blk_mq_poll_hybrid_sleep(struct 
request_queue *q,
kt = nsecs;
 
mode = HRTIMER_MODE_REL;
-   hrtimer_init_on_stack(&hs.timer, CLOCK_MONOTONIC, mode);
+   hrtimer_init_sleeper_on_stack(&hs, CLOCK_MONOTONIC, mode, current);
hrtimer_set_expires(&hs.timer, kt);
 
-   hrtimer_init_sleeper(&hs, current);
do {
if (blk_mq_rq_state(rq) == MQ_RQ_COMPLETE)
break;
diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c
index 8a75bd27c413..27daa8ae56a4 100644
--- a/drivers/staging/android/vsoc.c
+++ b/drivers/staging/android/vsoc.c
@@ -436,12 +436,10 @@ static int handle_vsoc_cond_wait(struct file *filp, 
struct vsoc_cond_wait *arg)
return -EINVAL;
wake_time = ktime_set(arg->wake_time_sec, arg->wake_time_nsec);
 
-   hrtimer_init_on_stack(&to->timer, CLOCK_MONOTONIC,
- HRTIMER_MODE_ABS);
+   hrtimer_init_sleeper_on_stack(to, CLOCK_MONOTONIC,
+ HRTIMER_MODE_ABS, current);
hrtimer_set_expires_range_ns(&to->timer, wake_time,
 current->timer_slack_ns);
-
-   hrtimer_init_sleeper(to, current);
}
 
while (1) {
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 2e8957eac4d4..f669dc5b63e7 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -361,10 +361,17 @@ DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 /* Initialize timers: */
 extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock,
 enum hrtimer_mode mode);
+extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t 
clock_id,
+enum hrtimer_mode mode,
+struct task_struct *task);
 
 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
 extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock,
  enum hrtimer_mode mode);
+extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
+ clockid_t clock_id,
+ enum hrtimer_mode mode,
+ struct task_struct *task);
 
 extern void destroy_hrtimer_on_stack(struct hrtimer *timer);
 #else
@@ -374,6 +381,15 @@ static inline void hrtimer_init_on_stack(struct hrtimer 
*timer,
 {
hrtimer_init(timer, which_clock, mode);
 }
+
+static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
+   clockid_t clock_id,
+   enum hrtimer_mode mode,
+   struct task_struct *task)
+{
+   hrtimer_init_sleeper(sl, clock_id, mode, task);
+}
+
 static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { }
 #endif
 
@@ -477,9 +493,6 @@ extern long hrtimer_nanosleep(const struct timespec64 *rqtp,
  const enum hrtimer_mode mode,
  const clockid_t clockid);
 
-extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
-struct task_struct *tsk);
-
 extern int schedule_hrtimeout_range(ktime_t *expires, u64 delta,
const enum hrtimer_mode mode);
 extern int schedule_hrtimeout_range_clock(ktime_t *expires,
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 5f3efabc36f4..671e8ceaac15 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -488

[tip:timers/core] timer/trace: Improve timer tracing

2019-03-24 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  f28d3d5346e97e60c81f933ac89ccf015430e5cf
Gitweb: https://git.kernel.org/tip/f28d3d5346e97e60c81f933ac89ccf015430e5cf
Author: Anna-Maria Gleixner 
AuthorDate: Thu, 21 Mar 2019 13:09:21 +0100
Committer:  Thomas Gleixner 
CommitDate: Sun, 24 Mar 2019 20:29:33 +0100

timer/trace: Improve timer tracing

Timers are added to the timer wheel off by one. This is required in
case a timer is queued directly before incrementing jiffies to prevent
early timer expiry.

When reading a timer trace and relying only on the expiry time of the timer
in the timer_start trace point and on the now in the timer_expiry_entry
trace point, it seems that the timer fires late. With the current
timer_expiry_entry trace point information only now=jiffies is printed but
not the value of base->clk. This makes it impossible to draw a conclusion
about the index of base->clk and makes it impossible to examine timer problems
without additional trace points.

Therefore add the base->clk value to the timer_expire_entry trace
point, to be able to calculate the index the timer base is located at
while collecting expired timers.
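
With the new field, a timer_expire_entry event carries base->clk - 1 next
to now, so the bucket index can be recomputed offline. An illustrative
trace line in the format of the TP_printk() below (the pointer and jiffies
values are made up):

    timer_expire_entry: timer=00000000c2b0cfcd function=process_timeout now=4294945826 baseclk=4294945825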

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Thomas Gleixner 
Cc: fweis...@gmail.com
Cc: pet...@infradead.org
Cc: Steven Rostedt 
Link: https://lkml.kernel.org/r/20190321120921.16463-5-anna-ma...@linutronix.de

---
 include/trace/events/timer.h | 11 +++
 kernel/time/timer.c  | 17 +
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index da975d69c453..b7a904825e7d 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -89,24 +89,27 @@ TRACE_EVENT(timer_start,
  */
 TRACE_EVENT(timer_expire_entry,
 
-   TP_PROTO(struct timer_list *timer),
+   TP_PROTO(struct timer_list *timer, unsigned long baseclk),
 
-   TP_ARGS(timer),
+   TP_ARGS(timer, baseclk),
 
TP_STRUCT__entry(
__field( void *,timer   )
__field( unsigned long, now )
__field( void *,function)
+   __field( unsigned long, baseclk )
),
 
TP_fast_assign(
__entry->timer  = timer;
__entry->now= jiffies;
__entry->function   = timer->function;
+   __entry->baseclk= baseclk;
),
 
-   TP_printk("timer=%p function=%ps now=%lu",
- __entry->timer, __entry->function, __entry->now)
+   TP_printk("timer=%p function=%ps now=%lu baseclk=%lu",
+ __entry->timer, __entry->function, __entry->now,
+ __entry->baseclk)
 );
 
 /**
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 8d7918ae4d0c..a9b1bbc2d88d 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1293,7 +1293,9 @@ int del_timer_sync(struct timer_list *timer)
 EXPORT_SYMBOL(del_timer_sync);
 #endif
 
-static void call_timer_fn(struct timer_list *timer, void (*fn)(struct timer_list *))
+static void call_timer_fn(struct timer_list *timer,
+ void (*fn)(struct timer_list *),
+ unsigned long baseclk)
 {
int count = preempt_count();
 
@@ -1316,7 +1318,7 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(struct timer_list
 */
lock_map_acquire(&lockdep_map);
 
-   trace_timer_expire_entry(timer);
+   trace_timer_expire_entry(timer, baseclk);
fn(timer);
trace_timer_expire_exit(timer);
 
@@ -1337,6 +1339,13 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(struct timer_list
 
 static void expire_timers(struct timer_base *base, struct hlist_head *head)
 {
+   /*
+* This value is required only for tracing. base->clk was
+* incremented directly before expire_timers was called. But expiry
+* is related to the old base->clk value.
+*/
+   unsigned long baseclk = base->clk - 1;
+
while (!hlist_empty(head)) {
struct timer_list *timer;
void (*fn)(struct timer_list *);
@@ -1350,11 +1359,11 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 
if (timer->flags & TIMER_IRQSAFE) {
raw_spin_unlock(&base->lock);
-   call_timer_fn(timer, fn);
+   call_timer_fn(timer, fn, baseclk);
raw_spin_lock(&base->lock);
} else {
raw_spin_unlock_irq(&base->lock);
-   call_timer_fn(timer, fn);
+   call_timer_fn(timer, fn, baseclk);
raw_spin_lock_irq(&base->lock);
}
}


[tip:timers/core] timer: Move trace point to get proper index

2019-03-24 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  dc1e7dc5ac6254ba0502323381a7ec847e408f1d
Gitweb: https://git.kernel.org/tip/dc1e7dc5ac6254ba0502323381a7ec847e408f1d
Author: Anna-Maria Gleixner 
AuthorDate: Thu, 21 Mar 2019 13:09:19 +0100
Committer:  Thomas Gleixner 
CommitDate: Sun, 24 Mar 2019 20:29:32 +0100

timer: Move trace point to get proper index

When placing the timer_start trace point before the timer wheel bucket
index is calculated, the index information in the trace point is useless.

It is not possible to simply move the debug_activate() call after the index
calculation, because debug_object_activate() needs to be called before
touching the object.

Therefore split debug_activate() and move the trace point into
enqueue_timer() after the new index has been calculated. The
debug_object_activate() call remains at the original place.
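
A minimal stand-alone sketch of the ordering problem (simplified types
and a made-up calc_wheel_index(); purely illustrative, not the kernel
implementation):

#include <stdio.h>

#define trace_timer_start(idx)  printf("timer_start idx=%u\n", (idx))

struct faketimer { unsigned long expires; unsigned int idx; };

/* Stand-in for the real bucket index calculation. */
static unsigned int calc_wheel_index(unsigned long expires)
{
        return expires & 63;
}

/* Old order: the trace hook fires before idx is computed. */
static void enqueue_old(struct faketimer *t)
{
        trace_timer_start(t->idx);              /* idx is stale here */
        t->idx = calc_wheel_index(t->expires);
}

/* New order: compute idx first, then trace. */
static void enqueue_new(struct faketimer *t)
{
        t->idx = calc_wheel_index(t->expires);
        trace_timer_start(t->idx);              /* idx is now meaningful */
}

int main(void)
{
        struct faketimer t = { .expires = 1000, .idx = 0 };

        enqueue_old(&t);        /* prints idx=0: useless */
        t.idx = 0;
        enqueue_new(&t);        /* prints idx=40 (1000 & 63) */
        return 0;
}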

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Thomas Gleixner 
Cc: fweis...@gmail.com
Cc: pet...@infradead.org
Cc: Steven Rostedt 
Link: https://lkml.kernel.org/r/20190321120921.16463-3-anna-ma...@linutronix.de

---
 kernel/time/timer.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2fce056f8a49..8d7918ae4d0c 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -536,6 +536,8 @@ static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
hlist_add_head(&timer->entry, base->vectors + idx);
__set_bit(idx, base->pending_map);
timer_set_idx(timer, idx);
+
+   trace_timer_start(timer, timer->expires, timer->flags);
 }
 
 static void
@@ -757,13 +759,6 @@ static inline void debug_init(struct timer_list *timer)
trace_timer_init(timer);
 }
 
-static inline void
-debug_activate(struct timer_list *timer, unsigned long expires)
-{
-   debug_timer_activate(timer);
-   trace_timer_start(timer, expires, timer->flags);
-}
-
 static inline void debug_deactivate(struct timer_list *timer)
 {
debug_timer_deactivate(timer);
@@ -1037,7 +1032,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
}
}
 
-   debug_activate(timer, expires);
+   debug_timer_activate(timer);
 
timer->expires = expires;
/*
@@ -1171,7 +1166,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
}
forward_timer_base(base);
 
-   debug_activate(timer, timer->expires);
+   debug_timer_activate(timer);
internal_add_timer(base, timer);
raw_spin_unlock_irqrestore(&base->lock, flags);
 }


[tip:timers/core] timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps

2019-03-24 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  6849cbb0f9a8dbc1ba56e9abc6955613103e01e3
Gitweb: https://git.kernel.org/tip/6849cbb0f9a8dbc1ba56e9abc6955613103e01e3
Author: Anna-Maria Gleixner 
AuthorDate: Thu, 21 Mar 2019 13:09:20 +0100
Committer:  Thomas Gleixner 
CommitDate: Sun, 24 Mar 2019 20:29:33 +0100

timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps

Since commit 04b8eb7a4ccd ("symbol lookup: introduce
dereference_symbol_descriptor()") %pf is deprecated, because %ps is smart
enough to handle function pointer dereference on platforms where such a
dereference is required.

While at it add proper line breaks to stay within the 80-character limit.

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Thomas Gleixner 
Cc: fweis...@gmail.com
Cc: pet...@infradead.org
Cc: Steven Rostedt 
Link: https://lkml.kernel.org/r/20190321120921.16463-4-anna-ma...@linutronix.de

---
 include/trace/events/timer.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index a57e4ee989d6..da975d69c453 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -73,7 +73,7 @@ TRACE_EVENT(timer_start,
__entry->flags  = flags;
),
 
-   TP_printk("timer=%p function=%pf expires=%lu [timeout=%ld] cpu=%u 
idx=%u flags=%s",
+   TP_printk("timer=%p function=%ps expires=%lu [timeout=%ld] cpu=%u 
idx=%u flags=%s",
  __entry->timer, __entry->function, __entry->expires,
  (long)__entry->expires - __entry->now,
  __entry->flags & TIMER_CPUMASK,
@@ -105,7 +105,8 @@ TRACE_EVENT(timer_expire_entry,
__entry->function   = timer->function;
),
 
-   TP_printk("timer=%p function=%pf now=%lu", __entry->timer, 
__entry->function,__entry->now)
+   TP_printk("timer=%p function=%ps now=%lu",
+ __entry->timer, __entry->function, __entry->now)
 );
 
 /**
@@ -210,7 +211,7 @@ TRACE_EVENT(hrtimer_start,
__entry->mode   = mode;
),
 
-   TP_printk("hrtimer=%p function=%pf expires=%llu softexpires=%llu "
+   TP_printk("hrtimer=%p function=%ps expires=%llu softexpires=%llu "
  "mode=%s", __entry->hrtimer, __entry->function,
  (unsigned long long) __entry->expires,
  (unsigned long long) __entry->softexpires,
@@ -243,7 +244,8 @@ TRACE_EVENT(hrtimer_expire_entry,
__entry->function   = hrtimer->function;
),
 
-   TP_printk("hrtimer=%p function=%pf now=%llu", __entry->hrtimer, 
__entry->function,
+   TP_printk("hrtimer=%p function=%ps now=%llu",
+ __entry->hrtimer, __entry->function,
  (unsigned long long) __entry->now)
 );
 


[tip:timers/core] tick/sched: Update tick_sched struct documentation

2019-03-24 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  d6b87eaf10bd061914f6d277d7428b3285d8850e
Gitweb: https://git.kernel.org/tip/d6b87eaf10bd061914f6d277d7428b3285d8850e
Author: Anna-Maria Gleixner 
AuthorDate: Thu, 21 Mar 2019 13:09:18 +0100
Committer:  Thomas Gleixner 
CommitDate: Sun, 24 Mar 2019 20:29:32 +0100

tick/sched: Update tick_sched struct documentation

Adapt the documentation to the effective order of the struct members and
add the missing descriptions.

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Thomas Gleixner 
Cc: fweis...@gmail.com
Cc: pet...@infradead.org
Link: https://lkml.kernel.org/r/20190321120921.16463-2-anna-ma...@linutronix.de

---
 kernel/time/tick-sched.h | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index 6de959a854b2..4fb06527cf64 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -24,12 +24,19 @@ enum tick_nohz_mode {
  * struct tick_sched - sched tick emulation and no idle tick control/stats
  * @sched_timer:   hrtimer to schedule the periodic tick in high
  * resolution mode
+ * @check_clocks:  Notification mechanism about clocksource changes
+ * @nohz_mode: Mode - one state of tick_nohz_mode
+ * @inidle:Indicator that the CPU is in the tick idle mode
+ * @tick_stopped:  Indicator that the idle tick has been stopped
+ * @idle_active:   Indicator that the CPU is actively in the tick idle mode;
+ * it is reset during irq handling phases.
+ * @do_timer_lst:  CPU was the last one doing do_timer before going idle
+ * @got_idle_tick: Tick timer function has run with @inidle set
  * @last_tick: Store the last tick expiry time when the tick
  * timer is modified for nohz sleeps. This is necessary
  * to resume the tick timer operation in the timeline
  * when the CPU returns from nohz sleep.
  * @next_tick: Next tick to be fired when in dynticks mode.
- * @tick_stopped:  Indicator that the idle tick has been stopped
  * @idle_jiffies:  jiffies at the entry to idle for idle time accounting
  * @idle_calls:Total number of idle calls
  * @idle_sleeps:   Number of idle calls, where the sched tick was stopped
@@ -40,8 +47,8 @@ enum tick_nohz_mode {
 * @iowait_sleeptime:  Sum of the time slept in idle with sched tick stopped, with IO outstanding
 * @timer_expires: Anticipated timer expiration time (in case sched tick is stopped)
 * @timer_expires_base:Base time clock monotonic for @timer_expires
- * @do_timer_lst:  CPU was the last one doing do_timer before going idle
- * @got_idle_tick: Tick timer function has run with @inidle set
+ * @next_timer:    Expiry time of next expiring timer for debugging purpose only
+ * @tick_dep_mask: Tick dependency mask - is set, if someone needs the tick
  */
 struct tick_sched {
struct hrtimer  sched_timer;


[PATCH 3/4] timer: Replace deprecated vsprintf pointer extension %pf by %ps

2019-03-21 Thread Anna-Maria Gleixner
Since commit 04b8eb7a4ccd ("symbol lookup: introduce
dereference_symbol_descriptor()") %pf is deprecated, because %ps is smart
enough to handle function pointer dereference on platforms where such a
dereference is required.

While at it shorten the touched lines to at most 80 characters.

Signed-off-by: Anna-Maria Gleixner 
---
 include/trace/events/timer.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index a57e4ee989d6..da975d69c453 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -73,7 +73,7 @@ TRACE_EVENT(timer_start,
__entry->flags  = flags;
),
 
-   TP_printk("timer=%p function=%pf expires=%lu [timeout=%ld] cpu=%u 
idx=%u flags=%s",
+   TP_printk("timer=%p function=%ps expires=%lu [timeout=%ld] cpu=%u 
idx=%u flags=%s",
  __entry->timer, __entry->function, __entry->expires,
  (long)__entry->expires - __entry->now,
  __entry->flags & TIMER_CPUMASK,
@@ -105,7 +105,8 @@ TRACE_EVENT(timer_expire_entry,
__entry->function   = timer->function;
),
 
-   TP_printk("timer=%p function=%pf now=%lu", __entry->timer, 
__entry->function,__entry->now)
+   TP_printk("timer=%p function=%ps now=%lu",
+ __entry->timer, __entry->function, __entry->now)
 );
 
 /**
@@ -210,7 +211,7 @@ TRACE_EVENT(hrtimer_start,
__entry->mode   = mode;
),
 
-   TP_printk("hrtimer=%p function=%pf expires=%llu softexpires=%llu "
+   TP_printk("hrtimer=%p function=%ps expires=%llu softexpires=%llu "
  "mode=%s", __entry->hrtimer, __entry->function,
  (unsigned long long) __entry->expires,
  (unsigned long long) __entry->softexpires,
@@ -243,7 +244,8 @@ TRACE_EVENT(hrtimer_expire_entry,
__entry->function   = hrtimer->function;
),
 
-   TP_printk("hrtimer=%p function=%pf now=%llu", __entry->hrtimer, 
__entry->function,
+   TP_printk("hrtimer=%p function=%ps now=%llu",
+ __entry->hrtimer, __entry->function,
  (unsigned long long) __entry->now)
 );
 
-- 
2.20.1



[PATCH 0/4] timers: Fix and improve tracing and documentation

2019-03-21 Thread Anna-Maria Gleixner
Hi,

the patch series was developed while investigating timer problems and
timer improvements. It contains a struct documentation fix in tick-sched
as well as fixes and an improvement for timer tracing.

Thanks,

Anna-Maria


Anna-Maria Gleixner (4):
  tick-sched: Update tick_sched struct documentation
  timer: Move trace point to get proper index
  timer: Replace deprecated vsprintf pointer extension %pf by %ps
  trace/timer: Improve timer tracing

 include/trace/events/timer.h | 17 +++--
 kernel/time/tick-sched.h | 13 ++---
 kernel/time/timer.c  | 30 +-
 3 files changed, 38 insertions(+), 22 deletions(-)

-- 
2.20.1



[PATCH 4/4] trace/timer: Improve timer tracing

2019-03-21 Thread Anna-Maria Gleixner
Timers are added to the timer wheel off by one. This is required to
prevent early timer expiry in case a timer is queued directly before
jiffies is incremented.

When reading a timer trace and relying only on the expiry time of the
timer in the timer_start trace point and on the now in the
timer_expire_entry trace point, it seems that the timer fires late. The
current timer_expire_entry trace point prints only now=jiffies but not
the value of base->clk. This makes it impossible to infer the wheel
index from base->clk and therefore impossible to examine timer problems
without additional trace points.

Therefore add the base->clk value to the timer_expire_entry trace
point so that the bucket index the timer base is operating on can be
calculated while collecting expired timers.

Signed-off-by: Anna-Maria Gleixner 
---
 include/trace/events/timer.h | 11 +++
 kernel/time/timer.c  | 17 +
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index da975d69c453..dade735657ef 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -89,24 +89,27 @@ TRACE_EVENT(timer_start,
  */
 TRACE_EVENT(timer_expire_entry,
 
-   TP_PROTO(struct timer_list *timer),
+   TP_PROTO(struct timer_list *timer, unsigned long baseclk),
 
-   TP_ARGS(timer),
+   TP_ARGS(timer, baseclk),
 
TP_STRUCT__entry(
__field( void *,timer   )
__field( unsigned long, now )
__field( void *,function)
+   __field( unsigned long, baseclk )
),
 
TP_fast_assign(
__entry->timer  = timer;
__entry->now= jiffies;
__entry->function   = timer->function;
+   __entry->baseclk= baseclk;
),
 
-   TP_printk("timer=%p function=%ps now=%lu",
- __entry->timer, __entry->function, __entry->now)
+   TP_printk("timer=%p function=%ps now=%lu base->clk=%lu",
+ __entry->timer, __entry->function, __entry->now,
+ __entry->baseclk)
 );
 
 /**
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 8d7918ae4d0c..c0233c1a4ccb 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1293,7 +1293,8 @@ int del_timer_sync(struct timer_list *timer)
 EXPORT_SYMBOL(del_timer_sync);
 #endif
 
-static void call_timer_fn(struct timer_list *timer, void (*fn)(struct timer_list *))
+static void call_timer_fn(unsigned long baseclk, struct timer_list *timer,
+ void (*fn)(struct timer_list *))
 {
int count = preempt_count();
 
@@ -1316,7 +1317,7 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(struct timer_list
 */
lock_map_acquire(_map);
 
-   trace_timer_expire_entry(timer);
+   trace_timer_expire_entry(timer, baseclk);
fn(timer);
trace_timer_expire_exit(timer);
 
@@ -1337,6 +1338,14 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(struct timer_list
 
 static void expire_timers(struct timer_base *base, struct hlist_head *head)
 {
+   /*
+* this value is required for tracing only
+*
+* base->clk was incremented directly before expire_timers was
+* called. But expiry is related to the old base->clk value.
+*/
+   unsigned long baseclk = base->clk - 1;
+
while (!hlist_empty(head)) {
struct timer_list *timer;
void (*fn)(struct timer_list *);
@@ -1350,11 +1359,11 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 
if (timer->flags & TIMER_IRQSAFE) {
raw_spin_unlock(&base->lock);
-   call_timer_fn(timer, fn);
+   call_timer_fn(baseclk, timer, fn);
raw_spin_lock(&base->lock);
} else {
raw_spin_unlock_irq(&base->lock);
-   call_timer_fn(timer, fn);
+   call_timer_fn(baseclk, timer, fn);
raw_spin_lock_irq(&base->lock);
}
}
-- 
2.20.1



[PATCH 2/4] timer: Move trace point to get proper index

2019-03-21 Thread Anna-Maria Gleixner
When placing the timer_start trace point before the timer wheel bucket
index is calculated, the index information in the trace point is useless.

It is not possible to simply move the debug_activate() call after the
index calculation, because debug_object_activate() needs to be called
before touching the object.

Therefore split debug_activate() and move the trace point into the timer
enqueue path after the index calculation. The debug_object_activate()
call remains at the original place.

Signed-off-by: Anna-Maria Gleixner 
---
 kernel/time/timer.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2fce056f8a49..8d7918ae4d0c 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -536,6 +536,8 @@ static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
hlist_add_head(&timer->entry, base->vectors + idx);
__set_bit(idx, base->pending_map);
timer_set_idx(timer, idx);
+
+   trace_timer_start(timer, timer->expires, timer->flags);
 }
 
 static void
@@ -757,13 +759,6 @@ static inline void debug_init(struct timer_list *timer)
trace_timer_init(timer);
 }
 
-static inline void
-debug_activate(struct timer_list *timer, unsigned long expires)
-{
-   debug_timer_activate(timer);
-   trace_timer_start(timer, expires, timer->flags);
-}
-
 static inline void debug_deactivate(struct timer_list *timer)
 {
debug_timer_deactivate(timer);
@@ -1037,7 +1032,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
}
}
 
-   debug_activate(timer, expires);
+   debug_timer_activate(timer);
 
timer->expires = expires;
/*
@@ -1171,7 +1166,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
}
forward_timer_base(base);
 
-   debug_activate(timer, timer->expires);
+   debug_timer_activate(timer);
internal_add_timer(base, timer);
raw_spin_unlock_irqrestore(&base->lock, flags);
 }
-- 
2.20.1



[PATCH 1/4] tick-sched: Update tick_sched struct documentation

2019-03-21 Thread Anna-Maria Gleixner
Adapt the documentation to the effective order of the struct members and
add the missing descriptions.

Signed-off-by: Anna-Maria Gleixner 
---
 kernel/time/tick-sched.h | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index 6de959a854b2..4fb06527cf64 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -24,12 +24,19 @@ enum tick_nohz_mode {
  * struct tick_sched - sched tick emulation and no idle tick control/stats
  * @sched_timer:   hrtimer to schedule the periodic tick in high
  * resolution mode
+ * @check_clocks:  Notification mechanism about clocksource changes
+ * @nohz_mode: Mode - one state of tick_nohz_mode
+ * @inidle:Indicator that the CPU is in the tick idle mode
+ * @tick_stopped:  Indicator that the idle tick has been stopped
+ * @idle_active:   Indicator that the CPU is actively in the tick idle mode;
+ * it is reset during irq handling phases.
+ * @do_timer_lst:  CPU was the last one doing do_timer before going idle
+ * @got_idle_tick: Tick timer function has run with @inidle set
  * @last_tick: Store the last tick expiry time when the tick
  * timer is modified for nohz sleeps. This is necessary
  * to resume the tick timer operation in the timeline
  * when the CPU returns from nohz sleep.
  * @next_tick: Next tick to be fired when in dynticks mode.
- * @tick_stopped:  Indicator that the idle tick has been stopped
  * @idle_jiffies:  jiffies at the entry to idle for idle time accounting
  * @idle_calls:Total number of idle calls
  * @idle_sleeps:   Number of idle calls, where the sched tick was stopped
@@ -40,8 +47,8 @@ enum tick_nohz_mode {
 * @iowait_sleeptime:  Sum of the time slept in idle with sched tick stopped, with IO outstanding
 * @timer_expires: Anticipated timer expiration time (in case sched tick is stopped)
 * @timer_expires_base:Base time clock monotonic for @timer_expires
- * @do_timer_lst:  CPU was the last one doing do_timer before going idle
- * @got_idle_tick: Tick timer function has run with @inidle set
+ * @next_timer:    Expiry time of next expiring timer for debugging purpose only
+ * @tick_dep_mask: Tick dependency mask - is set, if someone needs the tick
  */
 struct tick_sched {
struct hrtimer  sched_timer;
-- 
2.20.1



[PATCH] bitops/find: Fix function description argument ordering

2018-08-06 Thread Anna-Maria Gleixner
The order of the arguments in the function documentation doesn't match
the implementation. Change the documentation so that it corresponds to
the code. This prevents readers of the documentation from being confused.

While at it fix the line breaks between the type of an argument and the
argument's name in the function declarations for better readability.
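
As a usage illustration, a stand-alone userspace demo of the documented
argument order -- size before offset -- with a naive stand-in
implementation (not the kernel's optimized version):

#include <stdio.h>

#define BITS_PER_LONG   (8 * sizeof(unsigned long))

static unsigned long find_next_bit(const unsigned long *addr,
                                   unsigned long size, unsigned long offset)
{
        for (unsigned long bit = offset; bit < size; bit++)
                if (addr[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG)))
                        return bit;
        return size;    /* no set bit found */
}

int main(void)
{
        unsigned long map[1] = { 0x90 };        /* bits 4 and 7 set */

        /* size=32, offset=5: bit 4 is skipped, bit 7 is found */
        printf("next set bit: %lu\n", find_next_bit(map, 32, 5));
        return 0;
}

Swapping size and offset here would silently change the search window,
which is exactly the confusion the corrected documentation avoids.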

Signed-off-by: Anna-Maria Gleixner 
---
 include/asm-generic/bitops/find.h | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
index 8a1ee10014de..30f0f8d0bd79 100644
--- a/include/asm-generic/bitops/find.h
+++ b/include/asm-generic/bitops/find.h
@@ -6,14 +6,15 @@
 /**
  * find_next_bit - find the next set bit in a memory region
  * @addr: The address to base the search on
- * @offset: The bitnumber to start searching at
  * @size: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
  *
  * Returns the bit number for the next set bit
  * If no bits are set, returns @size.
  */
-extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
-   size, unsigned long offset);
+extern unsigned long
+find_next_bit(const unsigned long *addr, unsigned long size,
+ unsigned long offset);
 #endif
 
 #ifndef find_next_and_bit
@@ -21,29 +22,30 @@ extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
  * find_next_and_bit - find the next set bit in both memory regions
  * @addr1: The first address to base the search on
  * @addr2: The second address to base the search on
- * @offset: The bitnumber to start searching at
  * @size: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
  *
  * Returns the bit number for the next set bit
  * If no bits are set, returns @size.
  */
-extern unsigned long find_next_and_bit(const unsigned long *addr1,
-   const unsigned long *addr2, unsigned long size,
-   unsigned long offset);
+extern unsigned long
+find_next_and_bit(const unsigned long *addr1, const unsigned long *addr2,
+ unsigned long size, unsigned long offset);
 #endif
 
 #ifndef find_next_zero_bit
 /**
  * find_next_zero_bit - find the next cleared bit in a memory region
  * @addr: The address to base the search on
- * @offset: The bitnumber to start searching at
  * @size: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
  *
  * Returns the bit number of the next zero bit
  * If no bits are zero, returns @size.
  */
-extern unsigned long find_next_zero_bit(const unsigned long *addr, unsigned
-   long size, unsigned long offset);
+extern unsigned long
+find_next_zero_bit(const unsigned long *addr, unsigned long size,
+  unsigned long offset);
 #endif
 
 #ifdef CONFIG_GENERIC_FIND_FIRST_BIT
-- 
2.18.0



Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-01 Thread Anna-Maria Gleixner
On Wed, 1 Aug 2018, Frederic Weisbecker wrote:

> Before updating the full nohz tick or the idle time on IRQ exit, we
> check first if we are not in a nesting interrupt, whether the inner
> interrupt is a hard or a soft IRQ.
> 
> There is a historical reason for that: the dyntick idle mode used to
> reprogram the tick on IRQ exit, after softirq processing, and there was
> no point in doing that job in the outer nesting interrupt because the
> tick update will be performed through the end of the inner interrupt
> eventually, with even potential new timer updates.
> 
> One corner case could show up though: if an idle tick interrupts a softirq
> executing inline in the idle loop (through a call to local_bh_enable())
> after we entered in dynticks mode, the IRQ won't reprogram the tick
> because it assumes the softirq executes on an inner IRQ-tail. As a
> result we might put the CPU in sleep mode with the tick completely
> stopped whereas a timer can still be enqueued. Indeed there is no tick
> reprogramming in local_bh_enable(). We probably assumed there was no bh
> disabled section in idle, although there didn't seem to be debug code
> ensuring that.
> 
> Nowadays the nesting interrupt optimization still stands but only concerns
> full dynticks. The tick is stopped on IRQ exit in full dynticks mode
> and we want to wait for the end of the inner IRQ to reprogram the tick.
> But in_interrupt() doesn't make a difference between softirqs executing
> on IRQ tail and those executing inline. What was to be considered a
> corner case in dynticks-idle mode now becomes a serious opportunity for
> a bug in full dynticks mode: if a tick interrupts a task executing
> softirq inline, the tick reprogramming will be ignored and we may exit
> to userspace after local_bh_enable() with an enqueued timer that will
> never fire.
> 
> To fix this, simply keep reprogramming the tick if we are in a hardirq
> interrupting softirq. We can still figure out a way later to restore
> this optimization while excluding inline softirq processing.
> 
> Reported-by: Anna-Maria Gleixner 
> Signed-off-by: Frederic Weisbecker 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 

Tested-by: Anna-Maria Gleixner 

Thanks,

Anna-Maria


[tip:timers/urgent] nohz: Fix local_timer_softirq_pending()

2018-07-31 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  80d20d35af1edd632a5e7a3b9c0ab7ceff92769e
Gitweb: https://git.kernel.org/tip/80d20d35af1edd632a5e7a3b9c0ab7ceff92769e
Author: Anna-Maria Gleixner 
AuthorDate: Tue, 31 Jul 2018 18:13:58 +0200
Committer:  Thomas Gleixner 
CommitDate: Tue, 31 Jul 2018 22:08:44 +0200

nohz: Fix local_timer_softirq_pending()

local_timer_softirq_pending() checks whether the timer softirq is
pending with: local_softirq_pending() & TIMER_SOFTIRQ.

This is wrong because TIMER_SOFTIRQ is the softirq number and not a
bitmask. So the test checks for the wrong bit.

Use BIT(TIMER_SOFTIRQ) instead.
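
A stand-alone userspace sketch of the difference (the enum values mirror
the kernel's softirq numbering; everything else is illustrative):

#include <stdio.h>

#define BIT(nr) (1UL << (nr))

enum { HI_SOFTIRQ = 0, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ };

int main(void)
{
        /* Timer and NET_RX softirqs pending: mask 0b1010 */
        unsigned long pending = BIT(TIMER_SOFTIRQ) | BIT(NET_RX_SOFTIRQ);

        /* Buggy test: masks with 1, i.e. tests HI_SOFTIRQ's bit */
        printf("buggy:   %lu\n", pending & TIMER_SOFTIRQ);      /* 0: missed */

        /* Correct test: masks with bit 1, the timer softirq */
        printf("correct: %lu\n", pending & BIT(TIMER_SOFTIRQ)); /* 2: found */

        /* False positive case: only HI_SOFTIRQ pending */
        pending = BIT(HI_SOFTIRQ);
        printf("buggy:   %lu\n", pending & TIMER_SOFTIRQ);      /* 1: bogus */
        printf("correct: %lu\n", pending & BIT(TIMER_SOFTIRQ)); /* 0 */
        return 0;
}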

Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()")
Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Paul E. McKenney 
Reviewed-by: Daniel Bristot de Oliveira 
Acked-by: Frederic Weisbecker 
Cc: bige...@linutronix.de
Cc: pet...@infradead.org
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-ma...@linutronix.de

---
 kernel/time/tick-sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index da9455a6b42b..5b33e2f5c0ed 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -642,7 +642,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 
 static inline bool local_timer_softirq_pending(void)
 {
-   return local_softirq_pending() & TIMER_SOFTIRQ;
+   return local_softirq_pending() & BIT(TIMER_SOFTIRQ);
 }
 
 static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)


[PATCH] nohz: Fix local_timer_softirq_pending()

2018-07-31 Thread Anna-Maria Gleixner
local_timer_softirq_pending() checks whether the timer softirq is
pending with: local_softirq_pending() & TIMER_SOFTIRQ.

This is wrong because TIMER_SOFTIRQ is the softirq number and not a
bitmask. So the test checks for the wrong bit.

Use BIT(TIMER_SOFTIRQ) instead.

Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()")
Signed-off-by: Anna-Maria Gleixner 
---
 kernel/time/tick-sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index da9455a6b42b..5b33e2f5c0ed 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -642,7 +642,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 
 static inline bool local_timer_softirq_pending(void)
 {
-   return local_softirq_pending() & TIMER_SOFTIRQ;
+   return local_softirq_pending() & BIT(TIMER_SOFTIRQ);
 }
 
 static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
-- 
2.18.0



Re: [PATCH v5 1/2] timers: Don't wake ktimersoftd on every tick

2018-07-13 Thread Anna-Maria Gleixner
Hi Haris,

On Thu, 28 Jun 2018, Haris Okanovic wrote:

> Collect expired timers in interrupt context to avoid overhead of waking
> ktimersoftd on every scheduler tick.
> 
> This is implemented by storing lists of expired timers in the timer_base
> struct, which is updated by the interrupt routine on each tick in
> run_local_timers(). TIMER softirq (ktimersoftd) is then raised only when
> one or more expired timers are collected.
> 
> Performance impact on a 2core Intel Atom E3825 system:
>  * reduction in small latency spikes measured by cyclictest
>  * ~30% fewer context-switches measured by perf
>  * run_local_timers() execution time increases by 0.2 measured by TSC
> 

I'm also working on timer improvements at the moment. Once I have fixed
all the bugs in my implementation (there is one last horrible one), I'm
very interested in integrating your patches into my testing so that I
can give you a Tested-by.

Thanks,

Anna-Maria


[PATCH v2] hrtimer: consolidate hrtimer_init() + hrtimer_init_sleeper() calls

2018-07-03 Thread Anna-Maria Gleixner
From: Sebastian Andrzej Siewior 

hrtimer_init_sleeper() calls require a prior initialisation of the
hrtimer object with hrtimer_init(). Let's make the initialisation of
the hrtimer object part of hrtimer_init_sleeper(). To remain
consistent, handle the init_on_stack variant as well.

Besides adapting the hrtimer_init_sleeper[_on_stack]() functions, the
call sites need to be updated as well.
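
A stand-alone userspace model of the consolidated initialisation (plain
C; all types and names are simplified stand-ins, not the kernel API):

#include <stdio.h>

struct timer   { int clock; int mode; };
struct sleeper { struct timer timer; const char *task; };

static void timer_init(struct timer *t, int clock, int mode)
{
        t->clock = clock;
        t->mode = mode;
}

/* Consolidated initializer: sets up the embedded timer and the task in
 * one call, so callers can no longer forget one of the two steps. */
static void sleeper_init(struct sleeper *sl, int clock, int mode,
                         const char *task)
{
        timer_init(&sl->timer, clock, mode);
        sl->task = task;
}

int main(void)
{
        struct sleeper sl;

        /* One call replaces the old timer-init + attach-task pair. */
        sleeper_init(&sl, 1, 0, "current");
        printf("clock=%d mode=%d task=%s\n",
               sl.timer.clock, sl.timer.mode, sl.task);
        return 0;
}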

Link: http://lkml.kernel.org/r/20170905135719.qsj4h5twhjkmk...@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior 
[anna-maria: Updating the commit message]
Signed-off-by: Anna-Maria Gleixner 
---

v1..v2: Fix missing call site in drivers/staging/android/vsoc.c

 block/blk-mq.c |  3 +--
 drivers/staging/android/vsoc.c |  6 ++---
 include/linux/hrtimer.h| 19 +++---
 include/linux/wait.h   |  4 +--
 kernel/futex.c | 19 ++
 kernel/time/hrtimer.c  | 46 ++
 net/core/pktgen.c  |  4 +--
 7 files changed, 67 insertions(+), 34 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 95919268564b..f95ad9ede0f6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2984,10 +2984,9 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
kt = nsecs;
 
mode = HRTIMER_MODE_REL;
-   hrtimer_init_on_stack(&hs.timer, CLOCK_MONOTONIC, mode);
+   hrtimer_init_sleeper_on_stack(&hs, CLOCK_MONOTONIC, mode, current);
hrtimer_set_expires(&hs.timer, kt);

-   hrtimer_init_sleeper(&hs, current);
do {
if (blk_mq_rq_state(rq) == MQ_RQ_COMPLETE)
break;
diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c
index 806beda1040b..6c7f666c0e33 100644
--- a/drivers/staging/android/vsoc.c
+++ b/drivers/staging/android/vsoc.c
@@ -438,12 +438,10 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg)
 
if (!timespec_valid(&ts))
return -EINVAL;
-   hrtimer_init_on_stack(&to->timer, CLOCK_MONOTONIC,
- HRTIMER_MODE_ABS);
+   hrtimer_init_sleeper_on_stack(to, CLOCK_MONOTONIC,
+ HRTIMER_MODE_ABS, current);
hrtimer_set_expires_range_ns(&to->timer, timespec_to_ktime(ts),
 current->timer_slack_ns);
-
-   hrtimer_init_sleeper(to, current);
}
 
while (1) {
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 3892e9c8b2de..b8bbaabd5aff 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -364,10 +364,17 @@ DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 /* Initialize timers: */
 extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock,
 enum hrtimer_mode mode);
+extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id,
+enum hrtimer_mode mode,
+struct task_struct *task);
 
 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
 extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock,
  enum hrtimer_mode mode);
+extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
+ clockid_t clock_id,
+ enum hrtimer_mode mode,
+ struct task_struct *task);
 
 extern void destroy_hrtimer_on_stack(struct hrtimer *timer);
 #else
@@ -377,6 +384,15 @@ static inline void hrtimer_init_on_stack(struct hrtimer *timer,
 {
hrtimer_init(timer, which_clock, mode);
 }
+
+static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
+   clockid_t clock_id,
+   enum hrtimer_mode mode,
+   struct task_struct *task)
+{
+   hrtimer_init_sleeper(sl, clock_id, mode, task);
+}
+
 static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { }
 #endif
 
@@ -480,9 +496,6 @@ extern long hrtimer_nanosleep(const struct timespec64 *rqtp,
  const enum hrtimer_mode mode,
  const clockid_t clockid);
 
-extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
-struct task_struct *tsk);
-
 extern int schedule_hrtimeout_range(ktime_t *expires, u64 delta,
const enum hrtimer_mode mode);
 extern int schedule_hrtimeout_range_clock(ktime_t *expires,
diff --git a/include/linux/wait.h b/include/linux/wait.h
index d9f131ecf708..a0938fc8dcdb 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -488,8 +488,8 @@ do {
\
int 

[PATCH] hrtimer: consolidate hrtimer_init() + hrtimer_init_sleeper() calls

2018-07-02 Thread Anna-Maria Gleixner
From: Sebastian Andrzej Siewior 

hrtimer_init_sleeper() calls require a prior initialisation of the
hrtimer object with hrtimer_init(). Let's make the initialisation of
the hrtimer object part of hrtimer_init_sleeper(). To remain
consistent, handle the init_on_stack variant as well.

Besides adapting the hrtimer_init_sleeper[_on_stack]() functions, the
call sites need to be updated as well.

Link: http://lkml.kernel.org/r/20170905135719.qsj4h5twhjkmk...@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior 
[anna-maria: Updating the commit message]
Signed-off-by: Anna-Maria Gleixner 
---
 block/blk-mq.c  |  3 +--
 include/linux/hrtimer.h | 19 ++---
 include/linux/wait.h|  4 ++--
 kernel/futex.c  | 19 +++--
 kernel/time/hrtimer.c   | 46 -
 net/core/pktgen.c   |  4 ++--
 6 files changed, 65 insertions(+), 30 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 95919268564b..f95ad9ede0f6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2984,10 +2984,9 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
kt = nsecs;
 
mode = HRTIMER_MODE_REL;
-   hrtimer_init_on_stack(&hs.timer, CLOCK_MONOTONIC, mode);
+   hrtimer_init_sleeper_on_stack(&hs, CLOCK_MONOTONIC, mode, current);
hrtimer_set_expires(&hs.timer, kt);

-   hrtimer_init_sleeper(&hs, current);
do {
if (blk_mq_rq_state(rq) == MQ_RQ_COMPLETE)
break;
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 3892e9c8b2de..b8bbaabd5aff 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -364,10 +364,17 @@ DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 /* Initialize timers: */
 extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock,
 enum hrtimer_mode mode);
+extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id,
+enum hrtimer_mode mode,
+struct task_struct *task);
 
 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
 extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock,
  enum hrtimer_mode mode);
+extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
+ clockid_t clock_id,
+ enum hrtimer_mode mode,
+ struct task_struct *task);
 
 extern void destroy_hrtimer_on_stack(struct hrtimer *timer);
 #else
@@ -377,6 +384,15 @@ static inline void hrtimer_init_on_stack(struct hrtimer *timer,
 {
hrtimer_init(timer, which_clock, mode);
 }
+
+static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
+   clockid_t clock_id,
+   enum hrtimer_mode mode,
+   struct task_struct *task)
+{
+   hrtimer_init_sleeper(sl, clock_id, mode, task);
+}
+
 static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { }
 #endif
 
@@ -480,9 +496,6 @@ extern long hrtimer_nanosleep(const struct timespec64 *rqtp,
  const enum hrtimer_mode mode,
  const clockid_t clockid);
 
-extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
-struct task_struct *tsk);
-
 extern int schedule_hrtimeout_range(ktime_t *expires, u64 delta,
const enum hrtimer_mode mode);
 extern int schedule_hrtimeout_range_clock(ktime_t *expires,
diff --git a/include/linux/wait.h b/include/linux/wait.h
index d9f131ecf708..a0938fc8dcdb 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -488,8 +488,8 @@ do {                                                    \
        int __ret = 0;                                                  \
        struct hrtimer_sleeper __t;                                     \
                                                                        \
-       hrtimer_init_on_stack(&__t.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); \
-       hrtimer_init_sleeper(&__t, current);                            \
+       hrtimer_init_sleeper_on_stack(&__t, CLOCK_MONOTONIC, HRTIMER_MODE_REL, \
+                                     current);                         \
        if ((timeout) != KTIME_MAX)                                     \
                hrtimer_start_range_ns(&__t.timer, timeout,             \
                                       current->timer_slack_ns,         \
diff --git a/kernel/futex.c b/kernel/futex.c
index 1f450e092c74..146432d78e06 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2624,10 +2624,9 @@ static int futex_wait(u32 __user *uaddr, unsigned 

sched/core warning triggers on rcu torture test

2018-06-26 Thread Anna-Maria Gleixner
Hi,

during rcu torture tests (TREE04 and TREE07) I noticed that a
WARN_ON_ONCE() in the sched core triggers on a recent 4.18-rc2 based
kernel (6f0d349d922b ("Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")) as well as
on 4.17.3.

I'm running the tests on a machine with 144 cores:

  tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 144 --duration 120 --configs "9*TREE07"
  tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 144 --duration 120 --configs "18*TREE04"


The warning was introduced by commit d84b31313ef8 ("sched/isolation:
Offload residual 1Hz scheduler tick").


Output looks similar for all tests I did (this one is the output of
the 4.18-rc2 based kernel):

WARNING: CPU: 11 PID: 906 at kernel/sched/core.c:3138 sched_tick_remote+0xb6/0xc0
Modules linked in:
CPU: 11 PID: 906 Comm: kworker/u32:3 Not tainted 4.18.0-rc2+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Workqueue: events_unbound sched_tick_remote
RIP: 0010:sched_tick_remote+0xb6/0xc0
Code: e8 0f 06 b8 00 c6 03 00 fb eb 9d 8b 43 04 85 c0 75 8d 48 8b 83 e0 0a 00 
00 48 85 c0 75 81 eb 88 48 89 df e8 bc fe ff ff eb aa <0f> 0b eb c5 66 0f 1f 44 
00 00 bf 17 00 00 00 e8 b6 2e fe ff 0f b6
Call Trace:
 process_one_work+0x1df/0x3b0
 worker_thread+0x44/0x3d0
 kthread+0xf3/0x130
 ? set_worker_desc+0xb0/0xb0
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x35/0x40
---[ end trace 7c99b83eb0ec64e8 ]---


Do you need some more information?


Thanks,

Anna-Maria
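
For readers without the source at hand: the check that fires is the
staleness test in sched_tick_remote(). A paraphrased sketch of its
shape (simplified and reconstructed, not a verbatim quote of
kernel/sched/core.c):

/*
 * Paraphrased: the offloaded 1Hz remote tick warns once if more than
 * ~3 seconds of task runtime passed since the tick it accounts for.
 */
static void example_remote_tick_check(struct rq *rq, struct task_struct *curr)
{
	u64 delta = rq_clock_task(rq) - curr->se.exec_start;

	WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
}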


[tip:core/urgent] signal: Remove no longer required irqsave/restore

2018-06-09 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  59dc6f3c6d81c0c4379025c4eb56919391d62b67
Gitweb: https://git.kernel.org/tip/59dc6f3c6d81c0c4379025c4eb56919391d62b67
Author: Anna-Maria Gleixner 
AuthorDate: Fri, 25 May 2018 11:05:07 +0200
Committer:  Thomas Gleixner 
CommitDate: Sun, 10 Jun 2018 06:14:01 +0200

signal: Remove no longer required irqsave/restore

Commit a841796f11c9 ("signal: align __lock_task_sighand() irq disabling and
RCU") introduced a rcu read side critical section with interrupts
disabled. The changelog suggested that a better long-term fix would be "to
make rt_mutex_unlock() disable irqs when acquiring the rt_mutex structure's
->wait_lock".

This long-term fix has been made in commit b4abf91047cf ("rtmutex: Make
wait_lock irq safe") for a different reason.

Therefore revert commit a841796f11c9 ("signal: align
__lock_task_sighand() irq disabling and RCU") as the interrupt disable
dance is no longer required.

The change was tested on the base of b4abf91047cf ("rtmutex: Make wait_lock
irq safe") with a four hour run of rcutorture scenario TREE03 with lockdep
enabled as suggested by Paul McKenney.

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Thomas Gleixner 
Acked-by: Paul E. McKenney 
Acked-by: "Eric W. Biederman" 
Cc: bige...@linutronix.de
Link: https://lkml.kernel.org/r/20180525090507.22248-3-anna-ma...@linutronix.de

---
 kernel/signal.c | 24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 0f865d67415d..8d8a940422a8 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1244,19 +1244,12 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
 {
	struct sighand_struct *sighand;

+	rcu_read_lock();
	for (;;) {
-		/*
-		 * Disable interrupts early to avoid deadlocks.
-		 * See rcu_read_unlock() comment header for details.
-		 */
-		local_irq_save(*flags);
-		rcu_read_lock();
		sighand = rcu_dereference(tsk->sighand);
-		if (unlikely(sighand == NULL)) {
-			rcu_read_unlock();
-			local_irq_restore(*flags);
+		if (unlikely(sighand == NULL))
			break;
-		}
+
		/*
		 * This sighand can be already freed and even reused, but
		 * we rely on SLAB_TYPESAFE_BY_RCU and sighand_ctor() which
@@ -1268,15 +1261,12 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
		 * __exit_signal(). In the latter case the next iteration
		 * must see ->sighand == NULL.
		 */
-		spin_lock(&sighand->siglock);
-		if (likely(sighand == tsk->sighand)) {
-			rcu_read_unlock();
+		spin_lock_irqsave(&sighand->siglock, *flags);
+		if (likely(sighand == tsk->sighand))
			break;
-		}
-		spin_unlock(&sighand->siglock);
-		rcu_read_unlock();
-		local_irq_restore(*flags);
+		spin_unlock_irqrestore(&sighand->siglock, *flags);
	}
+	rcu_read_unlock();

	return sighand;
 }
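
The simplification is invisible to callers: the usual
lock_task_sighand()/unlock_task_sighand() wrappers keep their
contract, the irqsave simply happens inside now. A minimal caller
sketch (hypothetical function):

#include <linux/sched/signal.h>

/* Illustrative only: the caller-side pattern is unchanged by the revert. */
static bool example_has_pending_signals(struct task_struct *tsk)
{
	unsigned long flags;
	bool pending = false;

	if (lock_task_sighand(tsk, &flags)) {
		pending = !sigisemptyset(&tsk->pending.signal);
		unlock_task_sighand(tsk, &flags);
	}
	return pending;
}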


[tip:core/urgent] rcu: Update documentation of rcu_read_unlock()

2018-06-09 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  ec84b27f9b3b569f9235413d1945a2006b97b0aa
Gitweb: https://git.kernel.org/tip/ec84b27f9b3b569f9235413d1945a2006b97b0aa
Author: Anna-Maria Gleixner 
AuthorDate: Fri, 25 May 2018 11:05:06 +0200
Committer:  Thomas Gleixner 
CommitDate: Sun, 10 Jun 2018 06:14:01 +0200

rcu: Update documentation of rcu_read_unlock()

Since commit b4abf91047cf ("rtmutex: Make wait_lock irq safe") the
explanation in rcu_read_unlock() documentation about irq unsafe rtmutex
wait_lock is no longer valid.

Remove it to prevent kernel developers reading the documentation to rely on
it.

Suggested-by: Eric W. Biederman 
Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Paul E. McKenney 
Acked-by: "Eric W. Biederman" 
Cc: bige...@linutronix.de
Link: https://lkml.kernel.org/r/20180525090507.22248-2-anna-ma...@linutronix.de

---
 include/linux/rcupdate.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index e679b175b411..65163aa0bb04 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -652,9 +652,7 @@ static inline void rcu_read_lock(void)
  * Unfortunately, this function acquires the scheduler's runqueue and
  * priority-inheritance spinlocks.  This means that deadlock could result
  * if the caller of rcu_read_unlock() already holds one of these locks or
- * any lock that is ever acquired while holding them; or any lock which
- * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
- * does not disable irqs while taking ->wait_lock.
+ * any lock that is ever acquired while holding them.
  *
  * That said, RCU readers are never priority boosted unless they were
  * preempted.  Therefore, one way to avoid deadlock is to make sure
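
The constraint that remains in the trimmed comment can be illustrated
with a hypothetical lock (a sketch of the anti-pattern, not kernel
code):

/*
 * Hypothetical lock, illustrative only: if example_lock is ever taken
 * while the scheduler holds its runqueue or pi locks, then ending an
 * RCU read-side critical section under example_lock risks deadlock,
 * because rcu_read_unlock() may acquire those locks when it has to
 * deboost a priority-boosted reader.
 */
static DEFINE_SPINLOCK(example_lock);

static void example_anti_pattern(void)
{
	spin_lock(&example_lock);
	rcu_read_lock();
	/* ... read-side work ... */
	rcu_read_unlock();	/* may take rq/pi locks -> potential deadlock */
	spin_unlock(&example_lock);
}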


[tip:core/urgent] signal: Remove no longer required irqsave/restore

2018-06-07 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  e79e0f38083e607da5d7b493e7a0f78ba38d788e
Gitweb: https://git.kernel.org/tip/e79e0f38083e607da5d7b493e7a0f78ba38d788e
Author: Anna-Maria Gleixner 
AuthorDate: Fri, 4 May 2018 16:40:14 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 7 Jun 2018 22:18:55 +0200

signal: Remove no longer required irqsave/restore

Commit a841796f11c9 ("signal: align __lock_task_sighand() irq disabling and
RCU") introduced a rcu read side critical section with interrupts
disabled. The changelog suggested that a better long-term fix would be "to
make rt_mutex_unlock() disable irqs when acquiring the rt_mutex structure's
->wait_lock".

This long-term fix has been made in commit b4abf91047cf ("rtmutex: Make
wait_lock irq safe") for a different reason.

Therefore revert commit a841796f11c9 ("signal: align
__lock_task_sighand() irq disabling and RCU") as the interrupt disable
dance is no longer required.

Testing was done over an extensive period with RCU torture, especially with
TREE03 as requested by Paul.

Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Acked-by: "Paul E . McKenney" 
Acked-by: "Eric W. Biederman" 
Link: https://lkml.kernel.org/r/20180504144014.5378-1-bige...@linutronix.de
---
 kernel/signal.c | 24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 0f865d67415d..8d8a940422a8 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1244,19 +1244,12 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
 {
	struct sighand_struct *sighand;

+	rcu_read_lock();
	for (;;) {
-		/*
-		 * Disable interrupts early to avoid deadlocks.
-		 * See rcu_read_unlock() comment header for details.
-		 */
-		local_irq_save(*flags);
-		rcu_read_lock();
		sighand = rcu_dereference(tsk->sighand);
-		if (unlikely(sighand == NULL)) {
-			rcu_read_unlock();
-			local_irq_restore(*flags);
+		if (unlikely(sighand == NULL))
			break;
-		}
+
		/*
		 * This sighand can be already freed and even reused, but
		 * we rely on SLAB_TYPESAFE_BY_RCU and sighand_ctor() which
@@ -1268,15 +1261,12 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
		 * __exit_signal(). In the latter case the next iteration
		 * must see ->sighand == NULL.
		 */
-		spin_lock(&sighand->siglock);
-		if (likely(sighand == tsk->sighand)) {
-			rcu_read_unlock();
+		spin_lock_irqsave(&sighand->siglock, *flags);
+		if (likely(sighand == tsk->sighand))
			break;
-		}
-		spin_unlock(&sighand->siglock);
-		rcu_read_unlock();
-		local_irq_restore(*flags);
+		spin_unlock_irqrestore(&sighand->siglock, *flags);
	}
+	rcu_read_unlock();

	return sighand;
 }


Re: [PATCH v6 0/4] enable early printing of hashed pointers

2018-06-06 Thread Anna-Maria Gleixner
On Tue, 5 Jun 2018, Anna-Maria Gleixner wrote:

> On Thu, 31 May 2018, Steven Rostedt wrote:
> 
> > On Mon, 28 May 2018 11:46:38 +1000
> > "Tobin C. Harding"  wrote:
> > 
> > > Steve,
> > 
> > Hi Tobin,
> > 
> > Sorry for the late reply, I'm currently at a conference and have had
> > little time to read email.
> > 
> > > 
> > > Could you please take a quick squiz at the final 2 patches if you get a
> > > chance.  I assumed we are in preemptible context during early_init based
> > > on your code (and code comment) and called static_branch_disable()
> > > directly if hw RNG returned keying material.  It's a pretty simple
> > > change but I'd love to get someone else to check I've not noob'ed it.
> > 
> > I can take a look, and perhaps do some tests. But it was Anna-Maria
> > that originally triggered the issue. She's on Cc, perhaps she can try
> > this and see if it works.
> 
> I'll test it today - sorry for the delay.
> 

I tested it with the command line option enabled. It works early
enough, because it takes effect right after early_trace_init().

Thanks,

Anna-Maria


Re: [PATCH v6 0/4] enable early printing of hashed pointers

2018-06-05 Thread Anna-Maria Gleixner
On Thu, 31 May 2018, Steven Rostedt wrote:

> On Mon, 28 May 2018 11:46:38 +1000
> "Tobin C. Harding"  wrote:
> 
> > Steve,
> 
> Hi Tobin,
> 
> Sorry for the late reply, I'm currently at a conference and have had
> little time to read email.
> 
> > 
> > Could you please take a quick squiz at the final 2 patches if you get a
> > chance.  I assumed we are in preemptible context during early_init based
> > on your code (and code comment) and called static_branch_disable()
> > directly if hw RNG returned keying material.  It's a pretty simple
> > change but I'd love to get someone else to check I've not noob'ed it.
> 
> I can take a look, and perhaps do some tests. But it was Anna-Maria
> that originally triggered the issue. She's on Cc, perhaps she can try
> this and see if it works.

I'll test it today - sorry for the delay.

Anna-Maria


Re: [PATCH v2 1/2] rcu: Update documentation of rcu_read_unlock()

2018-05-28 Thread Anna-Maria Gleixner
On Fri, 25 May 2018, Paul E. McKenney wrote:

> On Fri, May 25, 2018 at 11:05:06AM +0200, Anna-Maria Gleixner wrote:
> > Since commit b4abf91047cf ("rtmutex: Make wait_lock irq safe") the
> > explanation in rcu_read_unlock() documentation about irq unsafe rtmutex
> > wait_lock is no longer valid.
> > 
> > Remove it to prevent kernel developers reading the documentation to rely on
> > it.
> > 
> > Suggested-by: Eric W. Biederman <ebied...@xmission.com>
> > Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
> 
> Reviewed-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> 
> Or let me know if you would like me to carry this patch.  Either way,
> just let me know!
> 

Thanks! Thomas told me he will take both.

Anna-Maria


> 
> > ---
> >  include/linux/rcupdate.h | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 36360d07f25b..64644fda3b22 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -653,9 +653,7 @@ static inline void rcu_read_lock(void)
> >   * Unfortunately, this function acquires the scheduler's runqueue and
> >   * priority-inheritance spinlocks.  This means that deadlock could result
> >   * if the caller of rcu_read_unlock() already holds one of these locks or
> > - * any lock that is ever acquired while holding them; or any lock which
> > - * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
> > - * does not disable irqs while taking ->wait_lock.
> > + * any lock that is ever acquired while holding them.
> >   *
> >   * That said, RCU readers are never priority boosted unless they were
> >   * preempted.  Therefore, one way to avoid deadlock is to make sure
> > -- 
> > 2.15.1
> > 
> 
> 


[PATCH] afs/server: Remove leftover variable

2018-05-25 Thread Anna-Maria Gleixner
The variable ret is set twice in afs_install_server() but never
read. It is a leftover from the rework of afs_install_server() in
commit d2ddc776a458 ("afs: Overhaul volume and server record caching and
fileserver rotation").

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
---
 fs/afs/server.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/afs/server.c b/fs/afs/server.c
index 3af4625e2f8c..b69a70362bd5 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -153,7 +153,7 @@ static struct afs_server *afs_install_server(struct afs_net *net,
const struct afs_addr_list *alist;
struct afs_server *server;
struct rb_node **pp, *p;
-   int ret = -EEXIST, diff;
+   int diff;
 
_enter("%p", candidate);
 
@@ -198,7 +198,6 @@ static struct afs_server *afs_install_server(struct afs_net *net,
	hlist_add_head_rcu(&server->addr6_link, &net->fs_addresses6);
 
	write_sequnlock(&net->fs_addr_lock);
-   ret = 0;
 
 exists:
afs_get_server(server);
-- 
2.15.1
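
Reduced to its essence, the removed pattern is a pair of dead stores
(illustrative only, not the actual AFS code):

static int example(void)
{
	int ret = -17;	/* dead store: ret is never read */

	/* ... work that jumps to the exit path ... */

	ret = 0;	/* dead store as well */
	return 1;	/* the function result does not depend on ret */
}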



[PATCH v2 0/2] rtmutex wait_lock is irq safe

2018-05-25 Thread Anna-Maria Gleixner
Since commit b4abf91047cf ("rtmutex: Make wait_lock irq safe") the rtmutex
wait_lock is irq safe. Therefore the irqsave/restore in kernel/signal is no
longer required (see Patch 2/2). During discussions about v1 of this patch,
Eric Biederman noticed that part of the rcu_read_unlock() documentation is
no longer valid.

Therefore I am sending a short queue: first fixing the documentation of
rcu_read_unlock(), then removing the irqsave/restore in kernel/signal.

v1..v2:

 - Add new patch updating rcu documentation as suggested by Eric Biederman
 - Update commit message of kernel/signal patch

Thanks,

Anna-Maria


Anna-Maria Gleixner (2):
  rcu: Update documentation of rcu_read_unlock()
  signal: Remove no longer required irqsave/restore

 include/linux/rcupdate.h |  4 +---
 kernel/signal.c  | 24 +++++++-----------------
 2 files changed, 8 insertions(+), 20 deletions(-)

-- 
2.15.1



[PATCH v2 2/2] signal: Remove no longer required irqsave/restore

2018-05-25 Thread Anna-Maria Gleixner
Commit a841796f11c9 ("signal: align __lock_task_sighand() irq disabling and
RCU") introduced a rcu read side critical section with interrupts
disabled. The changelog suggested that a better long-term fix would be "to
make rt_mutex_unlock() disable irqs when acquiring the rt_mutex structure's
->wait_lock".

This long-term fix has been made in commit b4abf91047cf ("rtmutex: Make
wait_lock irq safe") for a different reason.

Therefore revert commit a841796f11c9 ("signal: align
__lock_task_sighand() irq disabling and RCU") as the interrupt disable
dance is no longer required.

The change was tested on the base of b4abf91047cf ("rtmutex: Make wait_lock
irq safe") with a four hour run of rcutorture scenario TREE03 with lockdep
enabled as suggested by Paul McKenney.

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Acked-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
---
 kernel/signal.c | 24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 9c33163a6165..19679ad77aa6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1244,19 +1244,12 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
 {
	struct sighand_struct *sighand;

+	rcu_read_lock();
	for (;;) {
-		/*
-		 * Disable interrupts early to avoid deadlocks.
-		 * See rcu_read_unlock() comment header for details.
-		 */
-		local_irq_save(*flags);
-		rcu_read_lock();
		sighand = rcu_dereference(tsk->sighand);
-		if (unlikely(sighand == NULL)) {
-			rcu_read_unlock();
-			local_irq_restore(*flags);
+		if (unlikely(sighand == NULL))
			break;
-		}
+
		/*
		 * This sighand can be already freed and even reused, but
		 * we rely on SLAB_TYPESAFE_BY_RCU and sighand_ctor() which
@@ -1268,15 +1261,12 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
		 * __exit_signal(). In the latter case the next iteration
		 * must see ->sighand == NULL.
		 */
-		spin_lock(&sighand->siglock);
-		if (likely(sighand == tsk->sighand)) {
-			rcu_read_unlock();
+		spin_lock_irqsave(&sighand->siglock, *flags);
+		if (likely(sighand == tsk->sighand))
			break;
-		}
-		spin_unlock(&sighand->siglock);
-		rcu_read_unlock();
-		local_irq_restore(*flags);
+		spin_unlock_irqrestore(&sighand->siglock, *flags);
	}
+	rcu_read_unlock();

	return sighand;
 }
-- 
2.15.1



[PATCH v2 1/2] rcu: Update documentation of rcu_read_unlock()

2018-05-25 Thread Anna-Maria Gleixner
Since commit b4abf91047cf ("rtmutex: Make wait_lock irq safe") the
explanation in rcu_read_unlock() documentation about irq unsafe rtmutex
wait_lock is no longer valid.

Remove it to prevent kernel developers reading the documentation to rely on
it.

Suggested-by: Eric W. Biederman <ebied...@xmission.com>
Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
---
 include/linux/rcupdate.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 36360d07f25b..64644fda3b22 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -653,9 +653,7 @@ static inline void rcu_read_lock(void)
  * Unfortunately, this function acquires the scheduler's runqueue and
  * priority-inheritance spinlocks.  This means that deadlock could result
  * if the caller of rcu_read_unlock() already holds one of these locks or
- * any lock that is ever acquired while holding them; or any lock which
- * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
- * does not disable irqs while taking ->wait_lock.
+ * any lock that is ever acquired while holding them.
  *
  * That said, RCU readers are never priority boosted unless they were
  * preempted.  Therefore, one way to avoid deadlock is to make sure
-- 
2.15.1



Re: [PATCH] kernel/signal: Remove no longer required irqsave/restore

2018-05-08 Thread Anna-Maria Gleixner
On Tue, 8 May 2018, Paul E. McKenney wrote:

> On Tue, May 08, 2018 at 03:42:25PM +0200, Anna-Maria Gleixner wrote:
> > On Sat, 5 May 2018, Thomas Gleixner wrote:
> > 
> > > On Fri, 4 May 2018, Paul E. McKenney wrote:
> > > > On Fri, May 04, 2018 at 11:38:37PM -0500, Eric W. Biederman wrote:
> > > > > > (Me, I would run rcutorture scenario TREE03 for an extended time 
> > > > > > period
> > > > > > on b4abf91047cf with your patch applied.
> > > > 
> > > > And with lockdep enabled, which TREE03 does not do by default.
> > > 
> > > Will run that again to make sure.
> > 
> > I ran the rcutorture scenario TREE03 for 4 hours with the above
> > described setup. It was successful and without any lockdep splats.
> 
> Thank you for the testing, Anna-Maria!  If you give them a Tested-by,
> I will give them an ack.  ;-)
> 

If it is ok to give a Tested-by to the patch I wrote, I will do this
to get your ack :)

Tested-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>


Re: [PATCH] kernel/signal: Remove no longer required irqsave/restore

2018-05-08 Thread Anna-Maria Gleixner
On Sat, 5 May 2018, Thomas Gleixner wrote:

> On Fri, 4 May 2018, Paul E. McKenney wrote:
> > On Fri, May 04, 2018 at 11:38:37PM -0500, Eric W. Biederman wrote:
> > > > (Me, I would run rcutorture scenario TREE03 for an extended time period
> > > > on b4abf91047cf with your patch applied.
> > 
> > And with lockdep enabled, which TREE03 does not do by default.
> 
> Will run that again to make sure.
> 

I ran the rcutorture scenario TREE03 for 4 hours with the above
described setup. It was successful and without any lockdep splats.

Anna-Maria


Hashed pointer issues

2018-04-30 Thread Anna-Maria Gleixner
Hi,

I stumbled over an issue with hashed pointers and tracing. 

I'm using trace points for examination and on error the trace buffers
are dumped. The error occurs when entropy has not been set up, so the
pointers are not hashed and only (ptrval) is printed instead. The
pointers are required to distinguish the different objects in the
trace.

Besides workarounds like patching the lib/vsprintf.c helpers before
testing, or dumping the trace buffers later (given that the kernel comes
up properly and entropy is set up), is there a possible generic solution
for this issue? A command line option for disabling the pointer
obfuscation would be a pretty handy tool.


Thanks,

  Anna-Maria
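
For context: the hashing applies to plain %p, while %px (available
since 4.15) bypasses it at the cost of leaking the raw address. A
sketch (hypothetical function):

#include <linux/printk.h>

/* Illustrative only. */
static void example_dump(const void *obj)
{
	pr_info("obj=%p\n", obj);	/* hashed; "(ptrval)" before entropy init */
	pr_info("obj=%px\n", obj);	/* raw address; for debugging only */
}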


[tip:timers/core] hrtimer: Implement SOFT/HARD clock base selection

2018-01-16 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  42f42da41b54c191ae6a775e84a86c100d66c5e8
Gitweb: https://git.kernel.org/tip/42f42da41b54c191ae6a775e84a86c100d66c5e8
Author: Anna-Maria Gleixner <anna-ma...@linutronix.de>
AuthorDate: Thu, 21 Dec 2017 11:41:58 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 16 Jan 2018 09:51:22 +0100

hrtimer: Implement SOFT/HARD clock base selection

All prerequisites to handle hrtimers for expiry in either hard or soft
interrupt context are in place.

Add the missing bit in hrtimer_init() which associates the timer to the
hard or the softirq clock base.

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Cc: Christoph Hellwig <h...@lst.de>
Cc: John Stultz <john.stu...@linaro.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-30-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/time/hrtimer.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index d93e3e7..3d20158 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1220,8 +1220,9 @@ static inline int hrtimer_clockid_to_base(clockid_t clock_id)
 static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
			   enum hrtimer_mode mode)
 {
+	bool softtimer = !!(mode & HRTIMER_MODE_SOFT);
+	int base = softtimer ? HRTIMER_MAX_CLOCK_BASES / 2 : 0;
	struct hrtimer_cpu_base *cpu_base;
-	int base;
 
	memset(timer, 0, sizeof(struct hrtimer));
 
@@ -1235,7 +1236,8 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
	if (clock_id == CLOCK_REALTIME && mode & HRTIMER_MODE_REL)
		clock_id = CLOCK_MONOTONIC;
 
-	base = hrtimer_clockid_to_base(clock_id);
+	base += hrtimer_clockid_to_base(clock_id);
+	timer->is_soft = softtimer;
	timer->base = &cpu_base->clock_base[base];
	timerqueue_init(&timer->node);
 }
@@ -1244,8 +1246,13 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
  * hrtimer_init - initialize a timer to the given clock
  * @timer:	the timer to be initialized
  * @clock_id:	the clock to be used
- * @mode:	timer mode: absolute (HRTIMER_MODE_ABS) or
- *		relative (HRTIMER_MODE_REL); pinned is not considered here!
+ * @mode:       The modes which are relevant for initialization:
+ *              HRTIMER_MODE_ABS, HRTIMER_MODE_REL, HRTIMER_MODE_ABS_SOFT,
+ *              HRTIMER_MODE_REL_SOFT
+ *
+ *              The PINNED variants of the above can be handed in,
+ *              but the PINNED bit is ignored as pinning happens
+ *              when the hrtimer is started
  */
 void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
		  enum hrtimer_mode mode)
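
A minimal usage sketch of the new mode bits (timer and callback names
are hypothetical):

#include <linux/hrtimer.h>
#include <linux/ktime.h>

static struct hrtimer example_timer;

/* Runs in softirq context because the timer uses a _SOFT mode. */
static enum hrtimer_restart example_fn(struct hrtimer *t)
{
	return HRTIMER_NORESTART;
}

static void example_arm(void)
{
	hrtimer_init(&example_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
	example_timer.function = example_fn;
	hrtimer_start(&example_timer, ms_to_ktime(10), HRTIMER_MODE_REL_SOFT);
}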


[tip:timers/core] hrtimer: Implement support for softirq based hrtimers

2018-01-16 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  5da70160462e80b0ab8a6960cdd0cdd476907523
Gitweb: https://git.kernel.org/tip/5da70160462e80b0ab8a6960cdd0cdd476907523
Author: Anna-Maria Gleixner 
AuthorDate: Thu, 21 Dec 2017 11:41:57 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 16 Jan 2018 09:51:22 +0100

hrtimer: Implement support for softirq based hrtimers

hrtimer callbacks are always invoked in hard interrupt context. Several
users in tree require soft interrupt context for their callbacks and
achieve this by combining a hrtimer with a tasklet. The hrtimer schedules
the tasklet in hard interrupt context and the tasklet callback gets invoked
in softirq context later.

That's suboptimal and aside of that the real-time patch moves most of the
hrtimers into softirq context. So adding native support for hrtimers
expiring in softirq context is a valuable extension for both mainline and
the RT patch set.

Each valid hrtimer clock id has two associated hrtimer clock bases: one for
timers expiring in hardirq context and one for timers expiring in softirq
context.

Implement the functionality to associate a hrtimer with the hard or softirq
related clock bases and update the relevant functions to take them into
account when the next expiry time needs to be evaluated.

Add a check into the hard interrupt context handler functions to check
whether the first expiring softirq based timer has expired. If it's expired
the softirq is raised and the accounting of softirq based timers to
evaluate the next expiry time for programming the timer hardware is skipped
until the softirq processing has finished. At the end of the softirq
processing the regular processing is resumed.

Suggested-by: Thomas Gleixner 
Suggested-by: Peter Zijlstra 
Signed-off-by: Anna-Maria Gleixner 
Cc: Christoph Hellwig 
Cc: John Stultz 
Cc: Linus Torvalds 
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-29-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar 
---
 include/linux/hrtimer.h |  21 --
 kernel/time/hrtimer.c   | 196 ++--
 2 files changed, 188 insertions(+), 29 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 26ae8a8..c7902ca 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -103,6 +103,7 @@ enum hrtimer_restart {
  * @base:  pointer to the timer base (per cpu and per clock)
  * @state: state information (See bit values above)
  * @is_rel:Set if the timer was armed relative
+ * @is_soft:   Set if hrtimer will be expired in soft interrupt context.
  *
  * The hrtimer structure must be initialized by hrtimer_init()
  */
@@ -113,6 +114,7 @@ struct hrtimer {
struct hrtimer_clock_base   *base;
u8  state;
u8  is_rel;
+   u8  is_soft;
 };
 
 /**
@@ -178,13 +180,18 @@ enum  hrtimer_base_type {
  * @hres_active:   State of high resolution mode
  * @in_hrtirq: hrtimer_interrupt() is currently executing
  * @hang_detected: The last hrtimer interrupt detected a hang
+ * @softirq_activated: displays, if the softirq is raised - update of softirq
+ * related settings is not required then.
  * @nr_events: Total number of hrtimer interrupt events
  * @nr_retries:Total number of hrtimer interrupt retries
  * @nr_hangs:  Total number of hrtimer interrupt hangs
  * @max_hang_time: Maximum time spent in hrtimer_interrupt
  * @expires_next:  absolute time of the next event, is required for remote
- * hrtimer enqueue
+ * hrtimer enqueue; it is the total first expiry time (hard
+ * and soft hrtimer are taken into account)
  * @next_timer:Pointer to the first expiring timer
+ * @softirq_expires_next: Time to check, if soft queues needs also to be expired
+ * @softirq_next_timer: Pointer to the first expiring softirq based timer
  * @clock_base:array of clock bases for this cpu
  *
  * Note: next_timer is just an optimization for __remove_hrtimer().
@@ -196,9 +203,10 @@ struct hrtimer_cpu_base {
unsigned intcpu;
unsigned intactive_bases;
unsigned intclock_was_set_seq;
-   unsigned inthres_active : 1,
-   in_hrtirq   : 1,
-   hang_detected   : 1;
+   unsigned inthres_active : 1,
+   in_hrtirq   : 1,
+   hang_detected   : 1,
+   softirq_activated   : 1;
 #ifdef CONFIG_HIGH_RES_TIMERS
unsigned intnr_events;
unsigned short  nr_retries;
@@ -207,6 +215,8 @@ struct
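
For comparison, the old combo that native softirq hrtimers replace
looks roughly like this (hypothetical names; a sketch, not code from
any in-tree user):

#include <linux/hrtimer.h>
#include <linux/interrupt.h>

static struct tasklet_struct example_tasklet;
static struct hrtimer example_timer;

/* Softirq context: the actual work. */
static void example_tasklet_fn(unsigned long data)
{
}

/* Hard interrupt context: only schedules the tasklet. */
static enum hrtimer_restart example_timer_fn(struct hrtimer *t)
{
	tasklet_schedule(&example_tasklet);
	return HRTIMER_NORESTART;
}

static void example_setup(void)
{
	tasklet_init(&example_tasklet, example_tasklet_fn, 0);
	hrtimer_init(&example_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	example_timer.function = example_timer_fn;
}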

[tip:timers/core] hrtimer: Prepare handling of hard and softirq based hrtimers

2018-01-15 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  c458b1d102036eaa2c70e03000c959bd491c2037
Gitweb: https://git.kernel.org/tip/c458b1d102036eaa2c70e03000c959bd491c2037
Author: Anna-Maria Gleixner <anna-ma...@linutronix.de>
AuthorDate: Thu, 21 Dec 2017 11:41:56 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 16 Jan 2018 03:01:20 +0100

hrtimer: Prepare handling of hard and softirq based hrtimers

The softirq based hrtimer can utilize most of the existing hrtimers
functions, but need to operate on a different data set.

Add an 'active_mask' parameter to various functions so the hard and soft bases
can be selected. Fixup the existing callers and hand in the ACTIVE_HARD
mask.

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Cc: Christoph Hellwig <h...@lst.de>
Cc: John Stultz <john.stu...@linaro.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-28-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/time/hrtimer.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index e2353f5..ba4674e 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -60,6 +60,15 @@
 #include "tick-internal.h"
 
 /*
+ * Masks for selecting the soft and hard context timers from
+ * cpu_base->active
+ */
+#define MASK_SHIFT (HRTIMER_BASE_MONOTONIC_SOFT)
+#define HRTIMER_ACTIVE_HARD ((1U << MASK_SHIFT) - 1)
+#define HRTIMER_ACTIVE_SOFT (HRTIMER_ACTIVE_HARD << MASK_SHIFT)
+#define HRTIMER_ACTIVE_ALL (HRTIMER_ACTIVE_SOFT | HRTIMER_ACTIVE_HARD)
+
+/*
  * The timer bases:
  *
  * There are more clockids than hrtimer bases. Thus, we index
@@ -507,13 +516,24 @@ static ktime_t __hrtimer_next_event_base(struct 
hrtimer_cpu_base *cpu_base,
return expires_next;
 }
 
-static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
+/*
+ * Recomputes cpu_base::*next_timer and returns the earliest expires_next but
+ * does not set cpu_base::*expires_next, that is done by hrtimer_reprogram.
+ *
+ * @active_mask must be one of:
+ *  - HRTIMER_ACTIVE_ALL,
+ *  - HRTIMER_ACTIVE_SOFT, or
+ *  - HRTIMER_ACTIVE_HARD.
+ */
+static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base,
+   unsigned int active_mask)
 {
-   unsigned int active = cpu_base->active_bases;
+   unsigned int active;
ktime_t expires_next = KTIME_MAX;
 
cpu_base->next_timer = NULL;
 
+   active = cpu_base->active_bases & active_mask;
expires_next = __hrtimer_next_event_base(cpu_base, active, 
expires_next);
 
return expires_next;
@@ -553,7 +573,7 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, 
int skip_equal)
 {
ktime_t expires_next;
 
-   expires_next = __hrtimer_get_next_event(cpu_base);
+   expires_next = __hrtimer_get_next_event(cpu_base, HRTIMER_ACTIVE_HARD);
 
if (skip_equal && expires_next == cpu_base->expires_next)
return;
@@ -1074,7 +1094,7 @@ u64 hrtimer_get_next_event(void)
raw_spin_lock_irqsave(&cpu_base->lock, flags);
 
if (!__hrtimer_hres_active(cpu_base))
-   expires = __hrtimer_get_next_event(cpu_base);
+   expires = __hrtimer_get_next_event(cpu_base, 
HRTIMER_ACTIVE_HARD);
 
raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 
@@ -1248,10 +1268,10 @@ static void __run_hrtimer(struct hrtimer_cpu_base 
*cpu_base,
 }
 
 static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t 
now,
-unsigned long flags)
+unsigned long flags, unsigned int active_mask)
 {
struct hrtimer_clock_base *base;
-   unsigned int active = cpu_base->active_bases;
+   unsigned int active = cpu_base->active_bases & active_mask;
 
for_each_active_base(base, cpu_base, active) {
struct timerqueue_node *node;
@@ -1314,10 +1334,10 @@ retry:
 */
cpu_base->expires_next = KTIME_MAX;
 
-   __hrtimer_run_queues(cpu_base, now, flags);
+   __hrtimer_run_queues(cpu_base, now, flags, HRTIMER_ACTIVE_HARD);
 
/* Reevaluate the clock bases for the next expiry */
-   expires_next = __hrtimer_get_next_event(cpu_base);
+   expires_next = __hrtimer_get_next_event(cpu_base, HRTIMER_ACTIVE_HARD);
/*
 * Store the new expiry value so the migration code can verify
 * against it.
@@ -1421,7 +1441,7 @@ void hrtimer_run_queues(void)
 
raw_spin_lock_irqsave(&cpu_base->lock, flags);
now = hrtimer_update_base(cpu_base);
-   __hrtimer_run_queues(cpu_base, now, flags);
+   __hrtimer_run_queues(cpu_base, now, flags, HRTIMER_ACTIVE_HARD);
raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 }
 


[tip:timers/core] hrtimer: Add clock bases and hrtimer mode for softirq context

2018-01-15 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  98ecadd4305d8677ba77162152485798d47dcc85
Gitweb: https://git.kernel.org/tip/98ecadd4305d8677ba77162152485798d47dcc85
Author: Anna-Maria Gleixner 
AuthorDate: Thu, 21 Dec 2017 11:41:55 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 16 Jan 2018 03:00:50 +0100

hrtimer: Add clock bases and hrtimer mode for softirq context

Currently hrtimer callback functions are always executed in hard interrupt
context. Users of hrtimers, which need their timer function to be executed
in soft interrupt context, make use of tasklets to get the proper context.

Add additional hrtimer clock bases for timers which must expire in softirq
context, so the detour via the tasklet can be avoided. This is also
required for RT, where the majority of hrtimers are moved into softirq
context.

The selection of the expiry mode happens via a mode bit. Introduce
HRTIMER_MODE_SOFT and the matching combinations with the ABS/REL/PINNED
bits and update the decoding of hrtimer_mode in tracepoints.

Signed-off-by: Anna-Maria Gleixner 
Cc: Christoph Hellwig 
Cc: John Stultz 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-27-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar 
---
 include/linux/hrtimer.h  | 14 ++
 include/trace/events/timer.h |  6 +-
 kernel/time/hrtimer.c| 20 
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 98ed357..26ae8a8 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -33,14 +33,24 @@ struct hrtimer_cpu_base;
  * HRTIMER_MODE_REL- Time value is relative to now
  * HRTIMER_MODE_PINNED - Timer is bound to CPU (is only considered
  *   when starting the timer)
+ * HRTIMER_MODE_SOFT   - Timer callback function will be executed in
+ *   soft irq context
  */
 enum hrtimer_mode {
HRTIMER_MODE_ABS= 0x00,
HRTIMER_MODE_REL= 0x01,
HRTIMER_MODE_PINNED = 0x02,
+   HRTIMER_MODE_SOFT   = 0x04,
 
HRTIMER_MODE_ABS_PINNED = HRTIMER_MODE_ABS | HRTIMER_MODE_PINNED,
HRTIMER_MODE_REL_PINNED = HRTIMER_MODE_REL | HRTIMER_MODE_PINNED,
+
+   HRTIMER_MODE_ABS_SOFT   = HRTIMER_MODE_ABS | HRTIMER_MODE_SOFT,
+   HRTIMER_MODE_REL_SOFT   = HRTIMER_MODE_REL | HRTIMER_MODE_SOFT,
+
+   HRTIMER_MODE_ABS_PINNED_SOFT = HRTIMER_MODE_ABS_PINNED | 
HRTIMER_MODE_SOFT,
+   HRTIMER_MODE_REL_PINNED_SOFT = HRTIMER_MODE_REL_PINNED | 
HRTIMER_MODE_SOFT,
+
 };
 
 /*
@@ -151,6 +161,10 @@ enum  hrtimer_base_type {
HRTIMER_BASE_REALTIME,
HRTIMER_BASE_BOOTTIME,
HRTIMER_BASE_TAI,
+   HRTIMER_BASE_MONOTONIC_SOFT,
+   HRTIMER_BASE_REALTIME_SOFT,
+   HRTIMER_BASE_BOOTTIME_SOFT,
+   HRTIMER_BASE_TAI_SOFT,
HRTIMER_MAX_CLOCK_BASES,
 };
 
diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 744b431..a57e4ee 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -148,7 +148,11 @@ DEFINE_EVENT(timer_class, timer_cancel,
{ HRTIMER_MODE_ABS, "ABS"   },  \
{ HRTIMER_MODE_REL, "REL"   },  \
{ HRTIMER_MODE_ABS_PINNED,  "ABS|PINNED"},  \
-   { HRTIMER_MODE_REL_PINNED,  "REL|PINNED"})
+   { HRTIMER_MODE_REL_PINNED,  "REL|PINNED"},  \
+   { HRTIMER_MODE_ABS_SOFT,"ABS|SOFT"  },  \
+   { HRTIMER_MODE_REL_SOFT,"REL|SOFT"  },  \
+   { HRTIMER_MODE_ABS_PINNED_SOFT, "ABS|PINNED|SOFT" },\
+   { HRTIMER_MODE_REL_PINNED_SOFT, "REL|PINNED|SOFT" })
 
 /**
  * hrtimer_init - called when the hrtimer is initialized
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 31ccd86..e2353f5 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -92,6 +92,26 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
.clockid = CLOCK_TAI,
.get_time = &ktime_get_clocktai,
},
+   {
+   .index = HRTIMER_BASE_MONOTONIC_SOFT,
+   .clockid = CLOCK_MONOTONIC,
+   .get_time = &ktime_get,
+   },
+   {
+   .index = HRTIMER_BASE_REALTIME_SOFT,
+   .clockid = CLOCK_REALTIME,
+   .get_time = &ktime_get_real,
+   },
+   {
+   .index = HRTIMER_BASE_BOOTTIME_SOFT,
+   .clockid = CLOCK_BOOTTIME,
+   .get_time = &ktime_get_boottime,
+   },
+   {
+ 
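
A user of the new mode arms a softirq-expiring timer as sketched below.
The timer and callback names are placeholders; this is a minimal editorial
example of the API introduced here, not code from the patch:

	#include <linux/hrtimer.h>
	#include <linux/ktime.h>

	static struct hrtimer demo_timer;	/* placeholder name */

	static enum hrtimer_restart demo_timer_fn(struct hrtimer *t)
	{
		/* Runs in softirq context due to HRTIMER_MODE_REL_SOFT. */
		return HRTIMER_NORESTART;
	}

	static void demo_timer_setup(void)
	{
		hrtimer_init(&demo_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
		demo_timer.function = demo_timer_fn;
		hrtimer_start(&demo_timer, ms_to_ktime(10), HRTIMER_MODE_REL_SOFT);
	}

This replaces the tasklet detour described above: the callback already runs
in softirq context, so no tasklet needs to be scheduled from a hard
interrupt callback.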


[tip:timers/core] hrtimer: Use irqsave/irqrestore around __run_hrtimer()

2018-01-15 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  dd934aa8ad1fbaab3d916125c7fe42fff75aa7ff
Gitweb: https://git.kernel.org/tip/dd934aa8ad1fbaab3d916125c7fe42fff75aa7ff
Author: Anna-Maria Gleixner 
AuthorDate: Thu, 21 Dec 2017 11:41:54 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 16 Jan 2018 03:00:47 +0100

hrtimer: Use irqsave/irqrestore around __run_hrtimer()

__run_hrtimer() is called with the hrtimer_cpu_base.lock held and
interrupts disabled. Before invoking the timer callback the base lock is
dropped, but interrupts stay disabled.

The upcoming support for softirq based hrtimers requires that interrupts
are enabled before the timer callback is invoked.

To avoid code duplication, take hrtimer_cpu_base.lock with
raw_spin_lock_irqsave(flags) at the call site and hand in the flags as
a parameter. The raw_spin_unlock_irqrestore() before the callback invocation
then either keeps interrupts disabled when called from hard interrupt context
or restores the interrupt-enabled state when called from softirq context.

Suggested-by: Peter Zijlstra 
Signed-off-by: Anna-Maria Gleixner 
Cc: Christoph Hellwig 
Cc: John Stultz 
Cc: Linus Torvalds 
Cc: Thomas Gleixner 
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-26-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar 
---
 kernel/time/hrtimer.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 5d9b81d..31ccd86 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1159,7 +1159,8 @@ EXPORT_SYMBOL_GPL(hrtimer_active);
 
 static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base,
  struct hrtimer_clock_base *base,
- struct hrtimer *timer, ktime_t *now)
+ struct hrtimer *timer, ktime_t *now,
+ unsigned long flags)
 {
enum hrtimer_restart (*fn)(struct hrtimer *);
int restart;
@@ -1194,11 +1195,11 @@ static void __run_hrtimer(struct hrtimer_cpu_base 
*cpu_base,
 * protected against migration to a different CPU even if the lock
 * is dropped.
 */
-   raw_spin_unlock(&cpu_base->lock);
+   raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
trace_hrtimer_expire_entry(timer, now);
restart = fn(timer);
trace_hrtimer_expire_exit(timer);
-   raw_spin_lock(&cpu_base->lock);
+   raw_spin_lock_irq(&cpu_base->lock);
 
/*
 * Note: We clear the running state after enqueue_hrtimer and
@@ -1226,7 +1227,8 @@ static void __run_hrtimer(struct hrtimer_cpu_base 
*cpu_base,
base->running = NULL;
 }
 
-static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t 
now)
+static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t 
now,
+unsigned long flags)
 {
struct hrtimer_clock_base *base;
unsigned int active = cpu_base->active_bases;
@@ -1257,7 +1259,7 @@ static void __hrtimer_run_queues(struct hrtimer_cpu_base 
*cpu_base, ktime_t now)
if (basenow < hrtimer_get_softexpires_tv64(timer))
break;
 
-   __run_hrtimer(cpu_base, base, timer, &basenow);
+   __run_hrtimer(cpu_base, base, timer, &basenow, flags);
}
}
 }
@@ -1272,13 +1274,14 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 {
struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
ktime_t expires_next, now, entry_time, delta;
+   unsigned long flags;
int retries = 0;
 
BUG_ON(!cpu_base->hres_active);
cpu_base->nr_events++;
dev->next_event = KTIME_MAX;
 
-   raw_spin_lock(&cpu_base->lock);
+   raw_spin_lock_irqsave(&cpu_base->lock, flags);
entry_time = now = hrtimer_update_base(cpu_base);
 retry:
cpu_base->in_hrtirq = 1;
@@ -1291,7 +1294,7 @@ retry:
 */
cpu_base->expires_next = KTIME_MAX;
 
-   __hrtimer_run_queues(cpu_base, now);
+   __hrtimer_run_queues(cpu_base, now, flags);
 
/* Reevaluate the clock bases for the next expiry */
expires_next = __hrtimer_get_next_event(cpu_base);
@@ -1301,7 +1304,7 @@ retry:
 */
cpu_base->expires_next = expires_next;
cpu_base->in_hrtirq = 0;
-   raw_spin_unlock(&cpu_base->lock);
+   raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 
/* Reprogramming necessary ? */
if (!tick_program_event(expires_next, 0)) {
@@ -1322,7 +1325,7 @@ retry:
 * Acquire base lock for updating the offsets and retrieving
 * the current time.
 */
-   raw_spin_lock(&cpu_base->lock);
+   raw_spin_lock_irqsave(&cpu_base->lock, flags);
now = hrtimer_update_base(cpu_base);
cpu_base->nr_retries++;
if (++retries < 3)
@@ -1335,7 +1338,8 @@ retry:
 */
cpu_base->nr_hangs++;
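
The resulting lock/unlock pattern can be sketched generically; the function
and its names are an editorial illustration of the pattern, not the kernel
code itself:

	/* Editorial sketch of the irqsave/irqrestore pattern described above. */
	static void invoke_callback(raw_spinlock_t *lock, unsigned long flags,
				    void (*fn)(void))
	{
		/*
		 * Restore the caller's saved interrupt state: a hard interrupt
		 * caller saved "disabled", a softirq caller saved "enabled",
		 * so the callback runs with the state appropriate to its
		 * context.
		 */
		raw_spin_unlock_irqrestore(lock, flags);
		fn();
		/* Reacquire with interrupts disabled in either case. */
		raw_spin_lock_irq(lock);
	}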
   

[tip:timers/core] hrtimer: Factor out __hrtimer_start_range_ns()

2018-01-15 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  138a6b7ae4dedde5513678f57b275eee19c41b6a
Gitweb: https://git.kernel.org/tip/138a6b7ae4dedde5513678f57b275eee19c41b6a
Author: Anna-Maria Gleixner <anna-ma...@linutronix.de>
AuthorDate: Thu, 21 Dec 2017 11:41:52 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 16 Jan 2018 02:53:59 +0100

hrtimer: Factor out __hrtimer_start_range_ns()

Preparatory patch for softirq based hrtimers: to avoid code duplication,
factor out the __hrtimer_start_range_ns() helper from
hrtimer_start_range_ns().

No functional change.

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Cc: Christoph Hellwig <h...@lst.de>
Cc: John Stultz <john.stu...@linaro.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-24-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/time/hrtimer.c | 44 
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 33a6c99..4142e6f 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -905,22 +905,11 @@ static inline ktime_t hrtimer_update_lowres(struct 
hrtimer *timer, ktime_t tim,
return tim;
 }
 
-/**
- * hrtimer_start_range_ns - (re)start an hrtimer
- * @timer: the timer to be added
- * @tim:   expiry time
- * @delta_ns:  "slack" range for the timer
- * @mode:  timer mode: absolute (HRTIMER_MODE_ABS) or
- * relative (HRTIMER_MODE_REL), and pinned (HRTIMER_MODE_PINNED)
- */
-void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
-   u64 delta_ns, const enum hrtimer_mode mode)
+static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+   u64 delta_ns, const enum hrtimer_mode mode,
+   struct hrtimer_clock_base *base)
 {
-   struct hrtimer_clock_base *base, *new_base;
-   unsigned long flags;
-   int leftmost;
-
-   base = lock_hrtimer_base(timer, &flags);
+   struct hrtimer_clock_base *new_base;
 
/* Remove an active timer from the queue: */
remove_hrtimer(timer, base, true);
@@ -935,12 +924,27 @@ void hrtimer_start_range_ns(struct hrtimer *timer, 
ktime_t tim,
/* Switch the timer base, if necessary: */
new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
 
-   leftmost = enqueue_hrtimer(timer, new_base, mode);
-   if (!leftmost)
-   goto unlock;
+   return enqueue_hrtimer(timer, new_base, mode);
+}
+/**
+ * hrtimer_start_range_ns - (re)start an hrtimer
+ * @timer: the timer to be added
+ * @tim:   expiry time
+ * @delta_ns:  "slack" range for the timer
+ * @mode:  timer mode: absolute (HRTIMER_MODE_ABS) or
+ * relative (HRTIMER_MODE_REL), and pinned (HRTIMER_MODE_PINNED)
+ */
+void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+   u64 delta_ns, const enum hrtimer_mode mode)
+{
+   struct hrtimer_clock_base *base;
+   unsigned long flags;
+
+   base = lock_hrtimer_base(timer, &flags);
+
+   if (__hrtimer_start_range_ns(timer, tim, delta_ns, mode, base))
+   hrtimer_reprogram(timer);
 
-   hrtimer_reprogram(timer);
-unlock:
unlock_hrtimer_base(timer, &flags);
 }
 EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);
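
Design note (editorial): __hrtimer_start_range_ns() runs entirely under the
base lock taken by the caller and returns the enqueue_hrtimer() result,
i.e. whether the timer became the new leftmost (first expiring) timer. Only
in that case does the caller need to reprogram the clock event device,
which is why the 'leftmost' local variable and the unlock label can be
dropped.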


[tip:timers/core] hrtimer: Factor out __hrtimer_next_event_base()

2018-01-15 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  ad38f596d8e4babc19be8b21a7a49debffb4a7f5
Gitweb: https://git.kernel.org/tip/ad38f596d8e4babc19be8b21a7a49debffb4a7f5
Author: Anna-Maria Gleixner <anna-ma...@linutronix.de>
AuthorDate: Thu, 21 Dec 2017 11:41:53 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 16 Jan 2018 03:00:43 +0100

hrtimer: Factor out __hrtimer_next_event_base()

Preparatory patch for softirq based hrtimers to avoid code duplication.

No functional change.

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Cc: Christoph Hellwig <h...@lst.de>
Cc: John Stultz <john.stu...@linaro.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-25-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/time/hrtimer.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 4142e6f..5d9b81d 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -458,13 +458,13 @@ __next_base(struct hrtimer_cpu_base *cpu_base, unsigned 
int *active)
 #define for_each_active_base(base, cpu_base, active)   \
while ((base = __next_base((cpu_base), &(active))))
 
-static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
+static ktime_t __hrtimer_next_event_base(struct hrtimer_cpu_base *cpu_base,
+unsigned int active,
+ktime_t expires_next)
 {
struct hrtimer_clock_base *base;
-   unsigned int active = cpu_base->active_bases;
-   ktime_t expires, expires_next = KTIME_MAX;
+   ktime_t expires;
 
-   cpu_base->next_timer = NULL;
for_each_active_base(base, cpu_base, active) {
struct timerqueue_node *next;
struct hrtimer *timer;
@@ -487,6 +487,18 @@ static ktime_t __hrtimer_get_next_event(struct 
hrtimer_cpu_base *cpu_base)
return expires_next;
 }
 
+static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
+{
+   unsigned int active = cpu_base->active_bases;
+   ktime_t expires_next = KTIME_MAX;
+
+   cpu_base->next_timer = NULL;
+
+   expires_next = __hrtimer_next_event_base(cpu_base, active, 
expires_next);
+
+   return expires_next;
+}
+
 static inline ktime_t hrtimer_update_base(struct hrtimer_cpu_base *base)
 {
ktime_t *offs_real = &base->clock_base[HRTIMER_BASE_REALTIME].offset;


[tip:timers/core] hrtimer: Remove the 'base' parameter from hrtimer_reprogram()

2018-01-15 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  3ec7a3ee9f15f6dcac1591902d85b94c2a4b520d
Gitweb: https://git.kernel.org/tip/3ec7a3ee9f15f6dcac1591902d85b94c2a4b520d
Author: Anna-Maria Gleixner <anna-ma...@linutronix.de>
AuthorDate: Thu, 21 Dec 2017 11:41:51 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 16 Jan 2018 02:53:59 +0100

hrtimer: Remove the 'base' parameter from hrtimer_reprogram()

hrtimer_reprogram() must have access to the hrtimer_clock_base of the new
first expiring timer to access hrtimer_clock_base.offset for adjusting the
expiry time to CLOCK_MONOTONIC. This is required to evaluate whether the
new left most timer in the hrtimer_clock_base is the first expiring timer
of all clock bases in a hrtimer_cpu_base.

The only user of hrtimer_reprogram() is hrtimer_start_range_ns(), which
already has a pointer to the hrtimer_clock_base and hands it in as a
parameter. But hrtimer_start_range_ns() will be split for the upcoming
softirq based hrtimer support to avoid code duplication and will lose
direct access to the clock base pointer.

Instead of handing in timer and timer->base as parameters, remove the base
parameter from hrtimer_reprogram() and retrieve the clock base internally.

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Cc: Christoph Hellwig <h...@lst.de>
Cc: John Stultz <john.stu...@linaro.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-23-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/time/hrtimer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index f4a56fb..33a6c99 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -648,10 +648,10 @@ static inline void retrigger_next_event(void *arg) { }
  *
  * Called with interrupts disabled and base->cpu_base.lock held
  */
-static void hrtimer_reprogram(struct hrtimer *timer,
- struct hrtimer_clock_base *base)
+static void hrtimer_reprogram(struct hrtimer *timer)
 {
struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
+   struct hrtimer_clock_base *base = timer->base;
ktime_t expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
 
WARN_ON_ONCE(hrtimer_get_expires_tv64(timer) < 0);
@@ -939,7 +939,7 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t 
tim,
if (!leftmost)
goto unlock;
 
-   hrtimer_reprogram(timer, new_base);
+   hrtimer_reprogram(timer);
 unlock:
unlock_hrtimer_base(timer, &flags);
 }
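
Design note (editorial): timer->base is stable while the base lock is held,
and hrtimer_reprogram() is documented above as being called with interrupts
disabled and base->cpu_base.lock held, so deriving the clock base from
timer->base inside the function is equivalent to passing it in.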


[tip:timers/core] hrtimer: Make remote enqueue decision less restrictive

2018-01-15 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  2ac2dccce9d16a7b1a8fddf69a955d249375bce4
Gitweb: https://git.kernel.org/tip/2ac2dccce9d16a7b1a8fddf69a955d249375bce4
Author: Anna-Maria Gleixner <anna-ma...@linutronix.de>
AuthorDate: Thu, 21 Dec 2017 11:41:50 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 16 Jan 2018 02:53:58 +0100

hrtimer: Make remote enqueue decision less restrictive

The current decision whether a timer can be queued on a remote CPU checks
for timer->expiry <= remote_cpu_base.expires_next.

This is too restrictive because a timer with the same expiry time as an
existing timer will be enqueued on the right-hand side of the existing
timer inside the rbtree, i.e. behind the first expiring timer.

So it's safe to allow enqueuing timers with the same expiry time as the
first expiring timer on a remote CPU base.

Signed-off-by: Anna-Maria Gleixner <anna-ma...@linutronix.de>
Cc: Christoph Hellwig <h...@lst.de>
Cc: John Stultz <john.stu...@linaro.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-22-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/time/hrtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 1c68bf2..f4a56fb 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -168,7 +168,7 @@ hrtimer_check_target(struct hrtimer *timer, struct 
hrtimer_clock_base *new_base)
ktime_t expires;
 
expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
-   return expires <= new_base->cpu_base->expires_next;
+   return expires < new_base->cpu_base->expires_next;
 }
 
 static inline
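
The rbtree argument can be made concrete; the following is a hypothetical
editorial illustration, not kernel code:

	/*
	 * Hypothetical sequence: the remote CPU's first timer expires at
	 * time T, i.e. remote cpu_base->expires_next == T.
	 *
	 * Enqueueing a timer with expiry == T inserts it to the right of
	 * the existing node, since the timerqueue places equal keys behind
	 * earlier ones. The remote leftmost timer and expires_next are
	 * unchanged, so no remote reprogramming would be needed.
	 *
	 * Hence the strict "expires < expires_next" is the correct "must
	 * not be enqueued remotely" condition; equality is safe.
	 */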


[tip:timers/core] hrtimer: Unify remote enqueue handling

2018-01-15 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  14c803419de6acba08e143d51813ac5e0f3443b8
Gitweb: https://git.kernel.org/tip/14c803419de6acba08e143d51813ac5e0f3443b8
Author: Anna-Maria Gleixner 
AuthorDate: Thu, 21 Dec 2017 11:41:49 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 16 Jan 2018 02:53:58 +0100

hrtimer: Unify remote enqueue handling

hrtimer_reprogram() is conditionally invoked from hrtimer_start_range_ns()
when hrtimer_cpu_base.hres_active is true.

In the !hres_active case there is a special condition for the nohz_active
case:

  If the newly enqueued timer expires before the first expiring timer on a
  remote CPU then the remote CPU needs to be notified and woken up from a
  NOHZ idle sleep to take the new first expiring timer into account.

Previous changes have already established the prerequisites to make the
remote enqueue behaviour the same whether high resolution mode is active or
not:

  If the to be enqueued timer expires before the first expiring timer on a
  remote CPU, then it cannot be enqueued there.

This was done for the high resolution mode because there is no way to
access the remote CPU timer hardware. The same is true for NOHZ, but was
handled differently by unconditionally enqueuing the timer and waking up
the remote CPU so it can reprogram its timer. Again there is no compelling
reason for this difference.

hrtimer_check_target(), which makes the 'can remote enqueue' decision, is
already unconditional, but not yet functional because nothing updates
hrtimer_cpu_base.expires_next in the !hres_active case.

To unify this the following changes are required:

 1) Make the store of the new first expiry time unconditional in
hrtimer_reprogram() and check __hrtimer_hres_active() before proceeding
to the actual hardware access. This check also lets the compiler
eliminate the rest of the function in case of CONFIG_HIGH_RES_TIMERS=n.

 2) Invoke hrtimer_reprogram() unconditionally from
hrtimer_start_range_ns()

 3) Remove the remote wakeup special case for the !high_res && nohz_active
case.

Confine the timers_nohz_active static key to timer.c which is the only user
now.

Signed-off-by: Anna-Maria Gleixner 
Cc: Christoph Hellwig 
Cc: John Stultz 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-21-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar 
---
 kernel/time/hrtimer.c   | 18 ++
 kernel/time/tick-internal.h |  6 --
 kernel/time/timer.c |  9 -
 3 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index e6a78ae..1c68bf2 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -685,21 +685,24 @@ static void hrtimer_reprogram(struct hrtimer *timer,
 
/* Update the pointer to the next expiring timer */
cpu_base->next_timer = timer;
+   cpu_base->expires_next = expires;
 
/*
+* If hres is not active, hardware does not have to be
+* programmed yet.
+*
 * If a hang was detected in the last timer interrupt then we
 * do not schedule a timer which is earlier than the expiry
 * which we enforced in the hang detection. We want the system
 * to make progress.
 */
-   if (cpu_base->hang_detected)
+   if (!__hrtimer_hres_active(cpu_base) || cpu_base->hang_detected)
return;
 
/*
 * Program the timer hardware. We enforce the expiry for
 * events which are already in the past.
 */
-   cpu_base->expires_next = expires;
tick_program_event(expires, 1);
 }
 
@@ -936,16 +939,7 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t 
tim,
if (!leftmost)
goto unlock;
 
-   if (!hrtimer_is_hres_active(timer)) {
-   /*
-* Kick to reschedule the next tick to handle the new timer
-* on dynticks target.
-*/
-   if (is_timers_nohz_active())
-   wake_up_nohz_cpu(new_base->cpu_base->cpu);
-   } else {
-   hrtimer_reprogram(timer, new_base);
-   }
+   hrtimer_reprogram(timer, new_base);
 unlock:
unlock_hrtimer_base(timer, &flags);
 }
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index f690628..e277284 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -151,18 +151,12 @@ static inline void tick_nohz_init(void) { }
 #ifdef CONFIG_NO_HZ_COMMON
 extern unsigned long tick_nohz_active;
 extern void timers_update_nohz(void);
-extern struct static_key_false timers_nohz_active;
-static inline bool is_timers_nohz_active(void)
-{
-   return static_branch_likely(&timers_nohz_active);
-}
 # ifdef CONFIG_SMP
 extern struct static_key_false timers_migration_enabled;
 # endif
 #else /* CONFIG_NO_HZ_COMMON */
 static inline void timers_update_nohz(void) { }
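
Consequence (editorial note): because hrtimer_reprogram() now keeps
cpu_base->expires_next current in both high resolution and NOHZ low
resolution mode, hrtimer_check_target() can reject a remote base whenever
the new timer would expire before that base's expires_next; such a timer is
simply enqueued locally instead of waking the remote CPU.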


[tip:timers/core] hrtimer: Unify hrtimer removal handling

2018-01-15 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  61bb4bcb79c7afcd0bf0d20aef4704977172fd60
Gitweb: https://git.kernel.org/tip/61bb4bcb79c7afcd0bf0d20aef4704977172fd60
Author: Anna-Maria Gleixner 
AuthorDate: Thu, 21 Dec 2017 11:41:48 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 16 Jan 2018 02:53:58 +0100

hrtimer: Unify hrtimer removal handling

When the first hrtimer on the current CPU is removed,
hrtimer_force_reprogram() is invoked, but only when
CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active is set.

hrtimer_force_reprogram() updates hrtimer_cpu_base.expires_next and
reprograms the clock event device. With CONFIG_HIGH_RES_TIMERS=y and
hrtimer_cpu_base.hres_active set, this prevents a pointless hrtimer
interrupt.

hrtimer_check_target() makes the 'can remote enqueue' decision. Once
hrtimer_check_target() is available unconditionally and
hrtimer_cpu_base.expires_next is updated by hrtimer_reprogram(),
hrtimer_force_reprogram() must be available unconditionally as well to
prevent the following scenario with CONFIG_HIGH_RES_TIMERS=n:

- the first hrtimer on this CPU is removed and hrtimer_force_reprogram() is
  not executed

- CPU goes idle (next timer is calculated and hrtimers are taken into
  account)

- a hrtimer is enqueued remotely on the idle CPU: hrtimer_check_target()
  compares the expiry value against hrtimer_cpu_base.expires_next. The expiry
  value is after expires_next, so the hrtimer is enqueued. This timer will
  fire late if it expires before the effective first hrtimer on this CPU,
  because the comparison was made against an outdated expires_next value.

To prevent this scenario, make hrtimer_force_reprogram() unconditional
except for the actual reprogramming part, which the compiler eliminates in
the CONFIG_HIGH_RES_TIMERS=n case.

Signed-off-by: Anna-Maria Gleixner 
Cc: Christoph Hellwig 
Cc: John Stultz 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-20-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar 
---
 kernel/time/hrtimer.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 2b3222e..e6a78ae 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -521,9 +521,6 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
 {
ktime_t expires_next;
 
-   if (!__hrtimer_hres_active(cpu_base))
-   return;
-
expires_next = __hrtimer_get_next_event(cpu_base);
 
if (skip_equal && expires_next == cpu_base->expires_next)
@@ -532,6 +529,9 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
cpu_base->expires_next = expires_next;
 
/*
+* If hres is not active, hardware does not have to be
+* reprogrammed yet.
+*
 * If a hang was detected in the last timer interrupt then we
 * leave the hang delay active in the hardware. We want the
 * system to make progress. That also prevents the following
@@ -545,7 +545,7 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
 * set. So we'd effectivly block all timers until the T2 event
 * fires.
 */
-   if (cpu_base->hang_detected)
+   if (!__hrtimer_hres_active(cpu_base) || cpu_base->hang_detected)
return;
 
tick_program_event(cpu_base->expires_next, 1);
@@ -844,7 +844,6 @@ static void __remove_hrtimer(struct hrtimer *timer,
	if (!timerqueue_del(&base->active, &timer->node))
cpu_base->active_bases &= ~(1 << base->index);
 
-#ifdef CONFIG_HIGH_RES_TIMERS
/*
 * Note: If reprogram is false we do not update
 * cpu_base->next_timer. This happens when we remove the first
@@ -855,7 +854,6 @@ static void __remove_hrtimer(struct hrtimer *timer,
 */
if (reprogram && timer == cpu_base->next_timer)
hrtimer_force_reprogram(cpu_base, 1);
-#endif
 }
 
 /*

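The reordered guard in the patch above can be modelled in a few lines of
plain C: the cached expires_next is updated unconditionally, and only the
hardware-programming step is skipped when high resolution mode is off or a
hang was detected. Types and helpers below are simplified stand-ins for
illustration, not kernel code.

#include <stdbool.h>
#include <stdio.h>

struct cpu_base { long expires_next; bool hres_active; bool hang_detected; };

static long next_event(struct cpu_base *b) { (void)b; return 42; /* stub */ }
static void tick_program_event(long t)     { printf("program @%ld\n", t); }

static void force_reprogram(struct cpu_base *b, bool skip_equal)
{
	long expires_next = next_event(b);

	if (skip_equal && expires_next == b->expires_next)
		return;

	/* Bookkeeping happens unconditionally, so remote-enqueue checks
	 * never see a stale expires_next. */
	b->expires_next = expires_next;

	/* Only the hardware touch stays conditional. */
	if (!b->hres_active || b->hang_detected)
		return;

	tick_program_event(b->expires_next);
}

int main(void)
{
	struct cpu_base b = { .expires_next = 100, .hres_active = false };

	force_reprogram(&b, true);
	printf("cached expires_next = %ld\n", b.expires_next); /* now 42 */
	return 0;
}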

[tip:timers/core] hrtimer: Make hrtimer_force_reprogramm() unconditionally available

2018-01-15 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  ebba2c723f38a766546b2eaf828c522576c791d4
Gitweb: https://git.kernel.org/tip/ebba2c723f38a766546b2eaf828c522576c791d4
Author: Anna-Maria Gleixner 
AuthorDate: Thu, 21 Dec 2017 11:41:47 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 16 Jan 2018 02:53:28 +0100

hrtimer: Make hrtimer_force_reprogramm() unconditionally available

hrtimer_force_reprogram() needs to be available unconditionally for softirq
based hrtimers. Move the function and all required struct members out of
the CONFIG_HIGH_RES_TIMERS #ifdef.

There is no functional change because hrtimer_force_reprogram() is only
invoked when hrtimer_cpu_base.hres_active is true and
CONFIG_HIGH_RES_TIMERS=y.

Making it unconditional increases the text size for the
CONFIG_HIGH_RES_TIMERS=n case slightly, but avoids replication of that code
for the upcoming softirq based hrtimers support. Most of the code gets
eliminated in the CONFIG_HIGH_RES_TIMERS=n case by the compiler.

Signed-off-by: Anna-Maria Gleixner 
Cc: Christoph Hellwig 
Cc: John Stultz 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-19-anna-ma...@linutronix.de
[ Made it build on !CONFIG_HIGH_RES_TIMERS ]
Signed-off-by: Ingo Molnar 
---
 kernel/time/hrtimer.c | 60 ---
 1 file changed, 28 insertions(+), 32 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 63d804a..2b3222e 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -458,7 +458,6 @@ __next_base(struct hrtimer_cpu_base *cpu_base, unsigned int *active)
 #define for_each_active_base(base, cpu_base, active)	\
	while ((base = __next_base((cpu_base), &(active))))
 
-#if defined(CONFIG_NO_HZ_COMMON) || defined(CONFIG_HIGH_RES_TIMERS)
 static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
 {
struct hrtimer_clock_base *base;
@@ -487,7 +486,6 @@ static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
expires_next = 0;
return expires_next;
 }
-#endif
 
 static inline ktime_t hrtimer_update_base(struct hrtimer_cpu_base *base)
 {
@@ -513,34 +511,6 @@ static inline int hrtimer_hres_active(void)
	return __hrtimer_hres_active(this_cpu_ptr(&hrtimer_bases));
 }
 
-/* High resolution timer related functions */
-#ifdef CONFIG_HIGH_RES_TIMERS
-
-/*
- * High resolution timer enabled ?
- */
-static bool hrtimer_hres_enabled __read_mostly  = true;
-unsigned int hrtimer_resolution __read_mostly = LOW_RES_NSEC;
-EXPORT_SYMBOL_GPL(hrtimer_resolution);
-
-/*
- * Enable / Disable high resolution mode
- */
-static int __init setup_hrtimer_hres(char *str)
-{
-	return (kstrtobool(str, &hrtimer_hres_enabled) == 0);
-}
-
-__setup("highres=", setup_hrtimer_hres);
-
-/*
- * hrtimer_high_res_enabled - query, if the highres mode is enabled
- */
-static inline int hrtimer_is_hres_enabled(void)
-{
-   return hrtimer_hres_enabled;
-}
-
 /*
  * Reprogram the event source with checking both queues for the
  * next event
@@ -581,6 +551,34 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
tick_program_event(cpu_base->expires_next, 1);
 }
 
+/* High resolution timer related functions */
+#ifdef CONFIG_HIGH_RES_TIMERS
+
+/*
+ * High resolution timer enabled ?
+ */
+static bool hrtimer_hres_enabled __read_mostly  = true;
+unsigned int hrtimer_resolution __read_mostly = LOW_RES_NSEC;
+EXPORT_SYMBOL_GPL(hrtimer_resolution);
+
+/*
+ * Enable / Disable high resolution mode
+ */
+static int __init setup_hrtimer_hres(char *str)
+{
+	return (kstrtobool(str, &hrtimer_hres_enabled) == 0);
+}
+
+__setup("highres=", setup_hrtimer_hres);
+
+/*
+ * hrtimer_high_res_enabled - query, if the highres mode is enabled
+ */
+static inline int hrtimer_is_hres_enabled(void)
+{
+   return hrtimer_hres_enabled;
+}
+
 /*
  * Retrigger next event is called after clock was set
  *
@@ -639,8 +637,6 @@ void clock_was_set_delayed(void)
 
 static inline int hrtimer_is_hres_enabled(void) { return 0; }
 static inline void hrtimer_switch_to_hres(void) { }
-static inline void
-hrtimer_force_reprogram(struct hrtimer_cpu_base *base, int skip_equal) { }
 static inline void retrigger_next_event(void *arg) { }
 
 #endif /* CONFIG_HIGH_RES_TIMERS */

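The pattern used by this patch is worth spelling out: instead of keeping two
copies of a function separated by an #ifdef, one unconditional function is
kept, and a constant-false predicate in the CONFIG_HIGH_RES_TIMERS=n
configuration turns the hardware-specific tail into dead code the compiler
drops. A minimal sketch of that pattern follows, with illustrative names only
(compile with -DCONFIG_HIGH_RES_TIMERS to flip the predicate):

#include <stdio.h>

#ifdef CONFIG_HIGH_RES_TIMERS
static inline int hres_active(void) { return 1; }
#else
/* Constant 0: every branch guarded by it is eliminated at compile time. */
static inline int hres_active(void) { return 0; }
#endif

static void force_reprogram(long *cached, long next)
{
	*cached = next;		/* common bookkeeping, always compiled in */

	if (!hres_active())
		return;		/* with =n this makes the rest dead code */

	printf("program hardware @%ld\n", next);
}

int main(void)
{
	long cached = 0;

	force_reprogram(&cached, 7);
	printf("cached = %ld\n", cached);
	return 0;
}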

[tip:timers/core] hrtimer: Make hrtimer_reprogramm() unconditional

2018-01-15 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  11a9fe069e341ac53bddb8fe1a85ea986cff1a42
Gitweb: https://git.kernel.org/tip/11a9fe069e341ac53bddb8fe1a85ea986cff1a42
Author: Anna-Maria Gleixner 
AuthorDate: Thu, 21 Dec 2017 11:41:46 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 16 Jan 2018 02:35:47 +0100

hrtimer: Make hrtimer_reprogramm() unconditional

hrtimer_reprogram() needs to be available unconditionally for softirq based
hrtimers. Move the function and all required struct members out of the
CONFIG_HIGH_RES_TIMERS #ifdef.

There is no functional change because hrtimer_reprogram() is only invoked
when hrtimer_cpu_base.hres_active is true. Making it unconditional
increases the text size for the CONFIG_HIGH_RES_TIMERS=n case, but avoids
replication of that code for the upcoming softirq based hrtimers support.

Signed-off-by: Anna-Maria Gleixner 
Cc: Christoph Hellwig 
Cc: John Stultz 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-18-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar 
---
 include/linux/hrtimer.h |   6 +--
 kernel/time/hrtimer.c   | 129 +++-
 2 files changed, 65 insertions(+), 70 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 2d3e1d6..98ed357 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -182,10 +182,10 @@ struct hrtimer_cpu_base {
 	unsigned int			cpu;
 	unsigned int			active_bases;
 	unsigned int			clock_was_set_seq;
-	unsigned int			hres_active	: 1;
-#ifdef CONFIG_HIGH_RES_TIMERS
-	unsigned int			in_hrtirq	: 1,
+	unsigned int			hres_active	: 1,
+					in_hrtirq	: 1,
 					hang_detected	: 1;
+#ifdef CONFIG_HIGH_RES_TIMERS
 	unsigned int			nr_events;
 	unsigned short			nr_retries;
 	unsigned short			nr_hangs;
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 26abaa7..63d804a 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -582,68 +582,6 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
 }
 
 /*
- * When a timer is enqueued and expires earlier than the already enqueued
- * timers, we have to check, whether it expires earlier than the timer for
- * which the clock event device was armed.
- *
- * Called with interrupts disabled and base->cpu_base.lock held
- */
-static void hrtimer_reprogram(struct hrtimer *timer,
- struct hrtimer_clock_base *base)
-{
-	struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
-   ktime_t expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
-
-   WARN_ON_ONCE(hrtimer_get_expires_tv64(timer) < 0);
-
-   /*
-* If the timer is not on the current cpu, we cannot reprogram
-* the other cpus clock event device.
-*/
-   if (base->cpu_base != cpu_base)
-   return;
-
-   /*
-* If the hrtimer interrupt is running, then it will
-* reevaluate the clock bases and reprogram the clock event
-* device. The callbacks are always executed in hard interrupt
-* context so we don't need an extra check for a running
-* callback.
-*/
-   if (cpu_base->in_hrtirq)
-   return;
-
-   /*
-* CLOCK_REALTIME timer might be requested with an absolute
-* expiry time which is less than base->offset. Set it to 0.
-*/
-   if (expires < 0)
-   expires = 0;
-
-   if (expires >= cpu_base->expires_next)
-   return;
-
-   /* Update the pointer to the next expiring timer */
-   cpu_base->next_timer = timer;
-
-   /*
-* If a hang was detected in the last timer interrupt then we
-* do not schedule a timer which is earlier than the expiry
-* which we enforced in the hang detection. We want the system
-* to make progress.
-*/
-   if (cpu_base->hang_detected)
-   return;
-
-   /*
-* Program the timer hardware. We enforce the expiry for
-* events which are already in the past.
-*/
-   cpu_base->expires_next = expires;
-   tick_program_event(expires, 1);
-}
-
-/*
  * Retrigger next event is called after clock was set
  *
  * Called with interrupts disabled via on_each_cpu()
@@ -703,16 +641,73 @@ static inline int hrtimer_is_hres_enabled(void) { return 0; }
 static inline void hrtimer_switch_to_hres(void) { }
 static inline void
 hrtimer_force_reprogram(struct hrtimer_cpu_base *base, int skip_equal) { }
-static inline int hrtimer_reprogram(struct hrtimer *timer,
-   struct hrtimer_clock_base *base)
-{

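
For reference, the early-return cascade of the function moved above can be
modelled in userspace as follows. The structures are simplified stand-ins;
the point is the ordering of the bail-out conditions (remote base, running
interrupt handler, clamped expiry, already-armed earlier event, detected
hang), not the real kernel types.

#include <stdbool.h>
#include <stdio.h>

struct cpu_base { bool in_hrtirq; bool hang_detected; long expires_next; };

static void tick_program_event(long t) { printf("program @%ld\n", t); }

static void reprogram(struct cpu_base *this_cpu, struct cpu_base *timer_cpu,
		      long expires)
{
	/* Not on this CPU: we cannot touch another CPU's event device. */
	if (timer_cpu != this_cpu)
		return;

	/* The interrupt handler will re-evaluate the bases itself. */
	if (this_cpu->in_hrtirq)
		return;

	/* An absolute expiry may precede the base offset: clamp to now. */
	if (expires < 0)
		expires = 0;

	/* Not earlier than what is already armed: nothing to do. */
	if (expires >= this_cpu->expires_next)
		return;

	/* Honour the expiry enforced by hang recovery. */
	if (this_cpu->hang_detected)
		return;

	this_cpu->expires_next = expires;
	tick_program_event(expires);
}

int main(void)
{
	struct cpu_base cpu = { .expires_next = 100 };

	reprogram(&cpu, &cpu, 50);	/* earlier than armed: programs @50 */
	reprogram(&cpu, &cpu, 80);	/* later than 50: silently ignored */
	return 0;
}
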
[tip:timers/core] hrtimer: Make hrtimer_cpu_base.next_timer handling unconditional

2018-01-15 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  eb27926ba05233dc4f2052cc9d4f19359ec3cd2c
Gitweb: https://git.kernel.org/tip/eb27926ba05233dc4f2052cc9d4f19359ec3cd2c
Author: Anna-Maria Gleixner 
AuthorDate: Thu, 21 Dec 2017 11:41:45 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 16 Jan 2018 02:35:47 +0100

hrtimer: Make hrtimer_cpu_base.next_timer handling unconditional

hrtimer_cpu_base.next_timer stores the pointer to the next expiring timer
in a CPU base.

This pointer cannot be dereferenced and is solely used to check whether a
hrtimer which is removed is the hrtimer which is the first to expire in the
CPU base. If this is the case, then the timer hardware needs to be
reprogrammed to avoid an extra interrupt for nothing.

Again, this is conditional functionality, but there is no compelling reason
to keep it conditional. As a preparation, hrtimer_cpu_base.next_timer
needs to be available unconditionally.

Aside from that, the upcoming support for softirq based hrtimers requires
unconditional access to this pointer as well, so the motivation is not
simplicity alone.

Make the update of hrtimer_cpu_base.next_timer unconditional and remove the
#ifdef cruft. The impact on CONFIG_HIGH_RES_TIMERS=n && CONFIG_NOHZ=n is
marginal as it's just a store on an already dirtied cacheline.

No functional change.

Signed-off-by: Anna-Maria Gleixner 
Cc: Christoph Hellwig 
Cc: John Stultz 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-17-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar 
---
 include/linux/hrtimer.h |  4 ++--
 kernel/time/hrtimer.c   | 12 ++--
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index bb7270e..2d3e1d6 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -164,13 +164,13 @@ enum  hrtimer_base_type {
  * @hres_active:	State of high resolution mode
  * @in_hrtirq:		hrtimer_interrupt() is currently executing
  * @hang_detected:	The last hrtimer interrupt detected a hang
- * @next_timer:		Pointer to the first expiring timer
  * @nr_events:		Total number of hrtimer interrupt events
  * @nr_retries:		Total number of hrtimer interrupt retries
  * @nr_hangs:		Total number of hrtimer interrupt hangs
  * @max_hang_time:	Maximum time spent in hrtimer_interrupt
  * @expires_next:	absolute time of the next event, is required for remote
  *			hrtimer enqueue
+ * @next_timer:		Pointer to the first expiring timer
  * @clock_base:		array of clock bases for this cpu
  *
  * Note: next_timer is just an optimization for __remove_hrtimer().
@@ -186,13 +186,13 @@ struct hrtimer_cpu_base {
 #ifdef CONFIG_HIGH_RES_TIMERS
 	unsigned int			in_hrtirq	: 1,
 					hang_detected	: 1;
-	struct hrtimer			*next_timer;
 	unsigned int			nr_events;
 	unsigned short			nr_retries;
 	unsigned short			nr_hangs;
 	unsigned int			max_hang_time;
 #endif
 	ktime_t				expires_next;
+	struct hrtimer			*next_timer;
 	struct hrtimer_clock_base	clock_base[HRTIMER_MAX_CLOCK_BASES];
 } ____cacheline_aligned;
 
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index a9ab67f..26abaa7 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -459,21 +459,13 @@ __next_base(struct hrtimer_cpu_base *cpu_base, unsigned int *active)
	while ((base = __next_base((cpu_base), &(active))))
 
 #if defined(CONFIG_NO_HZ_COMMON) || defined(CONFIG_HIGH_RES_TIMERS)
-static inline void hrtimer_update_next_timer(struct hrtimer_cpu_base *cpu_base,
-struct hrtimer *timer)
-{
-#ifdef CONFIG_HIGH_RES_TIMERS
-   cpu_base->next_timer = timer;
-#endif
-}
-
 static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
 {
struct hrtimer_clock_base *base;
unsigned int active = cpu_base->active_bases;
ktime_t expires, expires_next = KTIME_MAX;
 
-   hrtimer_update_next_timer(cpu_base, NULL);
+   cpu_base->next_timer = NULL;
for_each_active_base(base, cpu_base, active) {
struct timerqueue_node *next;
struct hrtimer *timer;
@@ -483,7 +475,7 @@ static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
if (expires < expires_next) {
expires_next = expires;
-   hrtimer_update_next_timer(cpu_base, timer);
+   cpu_base->next_timer = timer;
}
}
/*

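The now-unconditional bookkeeping amounts to tracking the pointer to the
earliest-expiring timer while scanning for the next event. A minimal model
of that scan follows, with stand-in types; the real code iterates clock
bases rather than a flat array.

#include <limits.h>
#include <stddef.h>
#include <stdio.h>

struct timer { long expires; const char *name; };
struct cpu_base { struct timer *next_timer; };

static long next_event(struct cpu_base *b, struct timer *t, size_t n)
{
	long expires_next = LONG_MAX;

	b->next_timer = NULL;			/* reset before the scan */
	for (size_t i = 0; i < n; i++) {
		if (t[i].expires < expires_next) {
			expires_next = t[i].expires;
			b->next_timer = &t[i];	/* always tracked now */
		}
	}
	return expires_next;
}

int main(void)
{
	struct timer timers[] = { { 30, "t0" }, { 10, "t1" }, { 20, "t2" } };
	struct cpu_base base;

	long first = next_event(&base, timers, 3);
	printf("first expiry %ld is %s\n", first, base.next_timer->name);
	return 0;
}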

[tip:timers/core] hrtimer: Make the remote enqueue check unconditional

2018-01-15 Thread tip-bot for Anna-Maria Gleixner
Commit-ID:  07a9a7eae86abb796468b225586086d7c4cb59fc
Gitweb: https://git.kernel.org/tip/07a9a7eae86abb796468b225586086d7c4cb59fc
Author: Anna-Maria Gleixner 
AuthorDate: Thu, 21 Dec 2017 11:41:44 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 16 Jan 2018 02:35:47 +0100

hrtimer: Make the remote enqueue check unconditional

hrtimer_cpu_base.expires_next is used to cache the next event armed in the
timer hardware. The value is used to check whether an hrtimer can be
enqueued remotely. If the new hrtimer is expiring before expires_next, then
remote enqueue is not possible as the remote hrtimer hardware cannot be
accessed for reprogramming to an earlier expiry time.

The remote enqueue check is currently conditional on
CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active. There is no
compelling reason to make this conditional.

Move hrtimer_cpu_base.expires_next out of the CONFIG_HIGH_RES_TIMERS=y
guarded area and remove the conditionals in hrtimer_check_target().

The check is currently a no-op for the CONFIG_HIGH_RES_TIMERS=n and the
!hrtimer_cpu_base.hres_active cases, because nothing updates
hrtimer_cpu_base.expires_next there yet. This will change with later
patches, which further reduce the #ifdef zoo in this code.

Signed-off-by: Anna-Maria Gleixner 
Cc: Christoph Hellwig 
Cc: John Stultz 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: keesc...@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-16-anna-ma...@linutronix.de
Signed-off-by: Ingo Molnar 
---
 include/linux/hrtimer.h |  6 +++---
 kernel/time/hrtimer.c   | 26 ++
 2 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 22627b3..bb7270e 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -164,13 +164,13 @@ enum  hrtimer_base_type {
  * @hres_active:	State of high resolution mode
  * @in_hrtirq:		hrtimer_interrupt() is currently executing
  * @hang_detected:	The last hrtimer interrupt detected a hang
- * @expires_next:	absolute time of the next event, is required for remote
- *			hrtimer enqueue
  * @next_timer:		Pointer to the first expiring timer
  * @nr_events:		Total number of hrtimer interrupt events
  * @nr_retries:		Total number of hrtimer interrupt retries
  * @nr_hangs:		Total number of hrtimer interrupt hangs
  * @max_hang_time:	Maximum time spent in hrtimer_interrupt
+ * @expires_next:	absolute time of the next event, is required for remote
+ *			hrtimer enqueue
  * @clock_base:		array of clock bases for this cpu
  *
  * Note: next_timer is just an optimization for __remove_hrtimer().
@@ -186,13 +186,13 @@ struct hrtimer_cpu_base {
 #ifdef CONFIG_HIGH_RES_TIMERS
 	unsigned int			in_hrtirq	: 1,
 					hang_detected	: 1;
-	ktime_t				expires_next;
 	struct hrtimer			*next_timer;
 	unsigned int			nr_events;
 	unsigned short			nr_retries;
 	unsigned short			nr_hangs;
 	unsigned int			max_hang_time;
 #endif
+	ktime_t				expires_next;
 	struct hrtimer_clock_base	clock_base[HRTIMER_MAX_CLOCK_BASES];
 } ____cacheline_aligned;
 
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 5a624f9..a9ab67f 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -154,26 +154,21 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
 }
 
 /*
- * With HIGHRES=y we do not migrate the timer when it is expiring
- * before the next event on the target cpu because we cannot reprogram
- * the target cpu hardware and we would cause it to fire late.
+ * We do not migrate the timer when it is expiring before the next
+ * event on the target cpu. When high resolution is enabled, we cannot
+ * reprogram the target cpu hardware and we would cause it to fire
+ * late. To keep it simple, we handle the high resolution enabled and
+ * disabled case similar.
  *
  * Called with cpu_base->lock of target cpu held.
  */
 static int
 hrtimer_check_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base)
 {
-#ifdef CONFIG_HIGH_RES_TIMERS
ktime_t expires;
 
-   if (!new_base->cpu_base->hres_active)
-   return 0;
-
expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
return expires <= new_base->cpu_base->expires_next;
-#else
-   return 0;
-#endif
 }
 
 static inline
@@ -657,14 +652,6 @@ static void hrtimer_reprogram(struct hrtimer *timer,
 }
 
 /*
- * Initialize the high resolution related parts of cpu_base
- */
-static inline void hrtimer_init_hres(struct hrtimer_cpu_base *base)
-{
-   base->expires_next = KTIME_MAX;
-}
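
The check made unconditional above encodes one rule: a timer may only be
enqueued on a remote CPU if it expires after the event already armed there,
because the remote hardware cannot be reprogrammed to an earlier expiry. A
small stand-alone model of that decision, with illustrative stand-ins for
the kernel structures:

#include <stdbool.h>
#include <stdio.h>

struct cpu_base { long expires_next; };	/* next event armed on that CPU */

/* Returns true when remote enqueue must be rejected. */
static bool check_target(long expires, const struct cpu_base *target)
{
	return expires <= target->expires_next;
}

int main(void)
{
	struct cpu_base remote = { .expires_next = 100 };

	printf("expiry 50:  %s\n", check_target(50, &remote)
	       ? "keep local (would fire late remotely)" : "enqueue remotely");
	printf("expiry 150: %s\n", check_target(150, &remote)
	       ? "keep local (would fire late remotely)" : "enqueue remotely");
	return 0;
}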
  1   2   3   4   5   6   7   8   9   >