[tip:timers/urgent] time: Fix ktime_get_raw() incorrect base accumulation

2017-08-26 Thread tip-bot for John Stultz
Commit-ID:  0bcdc0987cce9880436b70836c6a92bb8e744fd1
Gitweb: http://git.kernel.org/tip/0bcdc0987cce9880436b70836c6a92bb8e744fd1
Author: John Stultz 
AuthorDate: Fri, 25 Aug 2017 15:57:04 -0700
Committer:  Thomas Gleixner 
CommitDate: Sat, 26 Aug 2017 16:06:12 +0200

time: Fix ktime_get_raw() incorrect base accumulation

In commit fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time
handling"), the following code got mistakenly added to the update of the
raw timekeeper:

 /* Update the monotonic raw base */
 seconds = tk->raw_sec;
 nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
 tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

This adds both the raw_sec value and the shifted-down raw xtime_nsec to
the base value.

But the read function adds the shifted-down tk->tkr_raw.xtime_nsec value
a second time. As a result, ktime_get_raw() users (which are all
in-kernel users) see the raw time advance faster than it should, at a
rate that varies with the current size of tkr_raw.xtime_nsec. This has
already caused problems with graphics rendering performance.

The change tried to match the monotonic base update logic:

 seconds = (u64)(tk->xtime_sec + tk->wall_to_monotonic.tv_sec);
 nsec = (u32) tk->wall_to_monotonic.tv_nsec;
 tk->tkr_mono.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

This adds the wall_to_monotonic.tv_nsec value, but not the
tk->tkr_mono.xtime_nsec value, to the base.

To fix this, simplify the tkr_raw.base accumulation to only accumulate the
raw_sec portion, and do not include the tkr_raw.xtime_nsec portion, which
will be added at read time.

Fixes: fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time handling")
Reported-and-tested-by: Chris Wilson 
Signed-off-by: John Stultz 
Signed-off-by: Thomas Gleixner 
Cc: Prarit Bhargava 
Cc: Kevin Brodsky 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Will Deacon 
Cc: Miroslav Lichvar 
Cc: Daniel Mentz 
Link: http://lkml.kernel.org/r/1503701824-1645-1-git-send-email-john.stu...@linaro.org

---
 kernel/time/timekeeping.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cedafa0..7e7e61c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -637,9 +637,7 @@ static inline void tk_update_ktime_data(struct timekeeper *tk)
tk->ktime_sec = seconds;
 
/* Update the monotonic raw base */
-   seconds = tk->raw_sec;
-   nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
-   tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);
+   tk->tkr_raw.base = ns_to_ktime(tk->raw_sec * NSEC_PER_SEC);
 }
 
 /* must hold timekeeper_lock */


[tip:timers/urgent] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting

2017-06-20 Thread tip-bot for John Stultz
Commit-ID:  3d88d56c5873f6eebe23e05c3da701960146b801
Gitweb: http://git.kernel.org/tip/3d88d56c5873f6eebe23e05c3da701960146b801
Author: John Stultz 
AuthorDate: Thu, 8 Jun 2017 16:44:21 -0700
Committer:  Thomas Gleixner 
CommitDate: Tue, 20 Jun 2017 10:41:50 +0200

time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting

Due to how the MONOTONIC_RAW accumulation logic was handled,
there is the potential for a 1ns discontinuity when we do
accumulations. This small discontinuity has for the most part
gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
in their vDSO clock_gettime implementation, we've seen failures
with the inconsistency-check test in kselftest.

This patch addresses the issue by using the same sub-ns
accumulation handling that CLOCK_MONOTONIC uses, which avoids
the issue for in-kernel users.

Since the ARM64 vDSO implementation has its own clock_gettime
calculation logic, this patch reduces the frequency of errors,
but failures are still seen. The ARM64 vDSO will need to be
updated to include the sub-nanosecond xtime_nsec values in its
calculation for this issue to be completely fixed.

Signed-off-by: John Stultz 
Tested-by: Daniel Mentz 
Cc: Prarit Bhargava 
Cc: Kevin Brodsky 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Will Deacon 
Cc: "stable #4.8+" 
Cc: Miroslav Lichvar 
Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 

---
 include/linux/timekeeper_internal.h |  4 ++--
 kernel/time/timekeeping.c   | 19 ++-
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index e9834ad..f7043cc 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -57,7 +57,7 @@ struct tk_read_base {
  * interval.
  * @xtime_remainder:   Shifted nano seconds left over when rounding
  * @cycle_interval
- * @raw_interval:  Raw nano seconds accumulated per NTP interval.
+ * @raw_interval:  Shifted raw nano seconds accumulated per NTP interval.
  * @ntp_error: Difference between accumulated time and NTP time in ntp
  * shifted nano seconds.
  * @ntp_error_shift:   Shift conversion between clock shifted nano seconds and
@@ -99,7 +99,7 @@ struct timekeeper {
u64 cycle_interval;
u64 xtime_interval;
s64 xtime_remainder;
-   u32 raw_interval;
+   u64 raw_interval;
/* The ntp_tick_length() value currently being used.
 * This cached copy ensures we consistently apply the tick
 * length for an entire tick, as ntp_tick_length may change
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index eff94cb..b602c48 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -280,7 +280,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
/* Go back from cycles -> shifted ns */
tk->xtime_interval = interval * clock->mult;
tk->xtime_remainder = ntpinterval - tk->xtime_interval;
-   tk->raw_interval = (interval * clock->mult) >> clock->shift;
+   tk->raw_interval = interval * clock->mult;
 
 /* if changing clocks, convert xtime_nsec shift units */
if (old_clock) {
@@ -1996,7 +1996,7 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
u32 shift, unsigned int *clock_set)
 {
u64 interval = tk->cycle_interval << shift;
-   u64 raw_nsecs;
+   u64 snsec_per_sec;
 
/* If the offset is smaller than a shifted interval, do nothing */
if (offset < interval)
@@ -2011,14 +2011,15 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
*clock_set |= accumulate_nsecs_to_secs(tk);
 
/* Accumulate raw time */
-   raw_nsecs = (u64)tk->raw_interval << shift;
-   raw_nsecs += tk->raw_time.tv_nsec;
-   if (raw_nsecs >= NSEC_PER_SEC) {
-   u64 raw_secs = raw_nsecs;
-   raw_nsecs = do_div(raw_secs, NSEC_PER_SEC);
-   tk->raw_time.tv_sec += raw_secs;
+   tk->tkr_raw.xtime_nsec += (u64)tk->raw_time.tv_nsec << tk->tkr_raw.shift;
+   tk->tkr_raw.xtime_nsec += tk->raw_interval << shift;
+   snsec_per_sec = (u64)NSEC_PER_SEC << tk->tkr_raw.shift;
+   while (tk->tkr_raw.xtime_nsec >= snsec_per_sec) {
+   tk->tkr_raw.xtime_nsec -= snsec_per_sec;
+   tk->raw_time.tv_sec++;
}
-   tk->raw_time.tv_nsec = raw_nsecs;
+   tk->raw_time.tv_nsec = tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift;
+   tk->tkr_raw.xtime_nsec -= (u64)tk->raw_time.tv_nsec << tk->tkr_raw.shift;
 
/* Accumulate error between NTP and clock interval */
tk->ntp_error += tk->ntp_tick << 

[tip:timers/urgent] time: Fix clock->read(clock) race around clocksource changes

2017-06-20 Thread tip-bot for John Stultz
Commit-ID:  ceea5e3771ed2378668455fa21861bead7504df5
Gitweb: http://git.kernel.org/tip/ceea5e3771ed2378668455fa21861bead7504df5
Author: John Stultz 
AuthorDate: Thu, 8 Jun 2017 16:44:20 -0700
Committer:  Thomas Gleixner 
CommitDate: Tue, 20 Jun 2017 10:41:50 +0200

time: Fix clock->read(clock) race around clocksource changes

In tests which exercise switching of clocksources, a NULL
pointer dereference can be observed on ARM64 platforms in the
clocksource read() function:

u64 clocksource_mmio_readl_down(struct clocksource *c)
{
return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
}

This is called from the core timekeeping code via:

cycle_now = tkr->read(tkr->clock);

tkr->read is the cached tkr->clock->read() function pointer.
When the clocksource is changed then tkr->clock and tkr->read
are updated sequentially. The code above results in a sequential
load operation of tkr->read and tkr->clock as well.

If the store to tkr->clock hits between the loads of tkr->read
and tkr->clock, then the old read() function is called with the
new clock pointer. As a consequence the read() function
dereferences a different data structure and the resulting 'reg'
pointer can point anywhere including NULL.

This problem was introduced when the timekeeping code was
switched over to use struct tk_read_base. Before that, it was
theoretically possible as well when the compiler decided to
reload clock in the code sequence:

 now = tk->clock->read(tk->clock);

Add a helper function which avoids the issue by reading
tk_read_base->clock once into a local variable clk and then issue
the read function via clk->read(clk). This guarantees that the
read() function always gets the proper clocksource pointer handed
in.

Since there is now no use for the tkr.read pointer, this patch
also removes it, and to address stopping the fast timekeeper
during suspend/resume, it introduces a dummy clocksource to use
rather than just a dummy read function.

Signed-off-by: John Stultz 
Acked-by: Ingo Molnar 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: stable 
Cc: Miroslav Lichvar 
Cc: Daniel Mentz 
Link: http://lkml.kernel.org/r/1496965462-20003-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 

---
 include/linux/timekeeper_internal.h |  1 -
 kernel/time/timekeeping.c   | 52 +
 2 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 110f453..e9834ad 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -29,7 +29,6 @@
  */
 struct tk_read_base {
struct clocksource  *clock;
-   u64 (*read)(struct clocksource *cs);
u64 mask;
u64 cycle_last;
u32 mult;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 9652bc5..eff94cb 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -118,6 +118,26 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
tk->offs_boot = ktime_add(tk->offs_boot, delta);
 }
 
+/*
+ * tk_clock_read - atomic clocksource read() helper
+ *
+ * This helper is necessary to use in the read paths because, while the
+ * seqlock ensures we don't return a bad value while structures are updated,
+ * it doesn't protect from potential crashes. There is the possibility that
+ * the tkr's clocksource may change between the read reference, and the
+ * clock reference passed to the read function.  This can cause crashes if
+ * the wrong clocksource is passed to the wrong read function.
+ * This isn't necessary to use when holding the timekeeper_lock or doing
+ * a read of the fast-timekeeper tkrs (which is protected by its own locking
+ * and update logic).
+ */
+static inline u64 tk_clock_read(struct tk_read_base *tkr)
+{
+   struct clocksource *clock = READ_ONCE(tkr->clock);
+
+   return clock->read(clock);
+}
+
 #ifdef CONFIG_DEBUG_TIMEKEEPING
 #define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */
 
@@ -175,7 +195,7 @@ static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
 */
do {
seq = read_seqcount_begin(&tk_core.seq);
-   now = tkr->read(tkr->clock);
+   now = tk_clock_read(tkr);
last = tkr->cycle_last;
mask = tkr->mask;
max = tkr->clock->max_cycles;
@@ -209,7 +229,7 @@ static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
u64 cycle_now, delta;
 
/* read clocksource */
-   cycle_now = tkr->read(tkr->clock);
+   cycle_now = tk_clock_read(tkr);
 
/* calculate the delta since the last update_wall_time */
delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
@@ -238,12 +258,10 @@ static void tk_setup_internals(struct 

[tip:timers/urgent] timekeeping: Fix __ktime_get_fast_ns() regression

2016-10-05 Thread tip-bot for John Stultz
Commit-ID:  58bfea9532552d422bde7afa207e1a0f08dffa7d
Gitweb: http://git.kernel.org/tip/58bfea9532552d422bde7afa207e1a0f08dffa7d
Author: John Stultz 
AuthorDate: Tue, 4 Oct 2016 19:55:48 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 5 Oct 2016 15:44:46 +0200

timekeeping: Fix __ktime_get_fast_ns() regression

In commit 27727df240c7 ("Avoid taking lock in NMI path with
CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
the timekeeping_get_ns() function, but I forgot to include
the unit conversion from cycles to nanoseconds, breaking the
function's output, which impacts users like perf.

This results in bogus perf timestamps like:
 swapper 0 [000]   253.427536:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]   254.426573:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]   254.426687:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]   254.426800:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]   254.426905:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]   254.427022:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]   254.427127:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]   254.427239:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]   254.427346:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]   254.427463:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]   255.426572:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Instead of more reasonable expected timestamps like:
 swapper 0 [000]    39.953768:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]    40.064839:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]    40.175956:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]    40.287103:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]    40.398217:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]    40.509324:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]    40.620437:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]    40.731546:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]    40.842654:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]    40.953772:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000]    41.064881:  1 cpu-clock:  810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Add the proper use of timekeeping_delta_to_ns() to convert
the cycle delta to nanoseconds as needed.

Thanks to Brendan and Alexei for finding this quickly after
the v4.8 release. Unfortunately the problematic commit has
landed in some -stable trees so they'll need this fix as
well.

Many apologies for this mistake. I'll be looking to add a
perf-clock sanity test to the kselftest timers tests soon.

Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"
Reported-by: Brendan Gregg 
Reported-by: Alexei Starovoitov 
Tested-and-reviewed-by: Mathieu Desnoyers 
Signed-off-by: John Stultz 
Cc: Peter Zijlstra 
Cc: stable 
Cc: Steven Rostedt 
Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 

---
 kernel/time/timekeeping.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index e07fb09..37dec7e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -403,8 +403,11 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
tkr = tkf->base + (seq & 0x01);
now = ktime_to_ns(tkr->base);
 
-   now += clocksource_delta(tkr->read(tkr->clock),
-tkr->cycle_last, tkr->mask);
+   now += timekeeping_delta_to_ns(tkr,
+   clocksource_delta(
+   tkr->read(tkr->clock),
+   tkr->cycle_last,
+   tkr->mask));
	} while (read_seqcount_retry(&tkf->seq, seq));
 
   

[tip:timers/urgent] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING

2016-08-24 Thread tip-bot for John Stultz
Commit-ID:  27727df240c7cc84f2ba6047c6f18d5addfd25ef
Gitweb: http://git.kernel.org/tip/27727df240c7cc84f2ba6047c6f18d5addfd25ef
Author: John Stultz 
AuthorDate: Tue, 23 Aug 2016 16:08:21 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 24 Aug 2016 09:34:31 +0200

timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING

When I added some extra sanity checking in timekeeping_get_ns() under
CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns()
method was using timekeeping_get_ns().

Thus the locking added to the debug checks broke the NMI-safety of
__ktime_get_fast_ns().

This patch open-codes the timekeeping_get_ns() logic for
__ktime_get_fast_ns(), so it can avoid any deadlocks in NMI context.
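The seqcount-latch read loop that makes __ktime_get_fast_ns() NMI-safe can be sketched single-threaded like this (a simplified model of the kernel's two-bank latch, not the real tk_fast structure):

```c
#include <stdint.h>

/* Minimal sketch of the seqcount-latch read pattern used by
 * __ktime_get_fast_ns(): the writer bumps seq around updates of two
 * data banks, the reader picks bank (seq & 1) and retries if seq
 * moved underneath it -- no lock taken, so safe in NMI context. */
struct latch {
	unsigned int seq;
	uint64_t base[2];
};

static uint64_t latch_read(const struct latch *l)
{
	unsigned int seq;
	uint64_t v;

	do {
		seq = l->seq;          /* kernel: raw_read_seqcount_latch() */
		v = l->base[seq & 1];
	} while (seq != l->seq);       /* kernel: read_seqcount_retry() */
	return v;
}
```

An even seq selects bank 0 (the bank the writer is not touching), an odd seq selects bank 1.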

Fixes: 4ca22c2648f9 "timekeeping: Add warnings when overflows or underflows are observed"
Reported-by: Steven Rostedt 
Reported-by: Peter Zijlstra 
Signed-off-by: John Stultz 
Cc: stable 
Link: 
http://lkml.kernel.org/r/1471993702-29148-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 

---
 kernel/time/timekeeping.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 3b65746..e07fb09 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -401,7 +401,10 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
do {
seq = raw_read_seqcount_latch(&tkf->seq);
tkr = tkf->base + (seq & 0x01);
-   now = ktime_to_ns(tkr->base) + timekeeping_get_ns(tkr);
+   now = ktime_to_ns(tkr->base);
+
+   now += clocksource_delta(tkr->read(tkr->clock),
+tkr->cycle_last, tkr->mask);
	} while (read_seqcount_retry(&tkf->seq, seq));
 
return now;


[tip:timers/urgent] timekeeping: Cap array access in timekeeping_debug

2016-08-24 Thread tip-bot for John Stultz
Commit-ID:  a4f8f6667f099036c88f231dcad4cf233652c824
Gitweb: http://git.kernel.org/tip/a4f8f6667f099036c88f231dcad4cf233652c824
Author: John Stultz 
AuthorDate: Tue, 23 Aug 2016 16:08:22 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 24 Aug 2016 09:34:32 +0200

timekeeping: Cap array access in timekeeping_debug

It was reported that hibernation could fail on the 2nd attempt, where the
system hangs at hibernate() -> syscore_resume() -> i8237A_resume() ->
claim_dma_lock(), because the lock has already been taken.

However, no other process actually tries to grab this lock on the
problematic platform.

Further investigation showed that the problem is triggered by setting
/sys/power/pm_trace to 1 before the 1st hibernation.

Once pm_trace is enabled, the RTC contents become meaningless after suspend,
and some BIOSes additionally reset an 'invalid' RTC (e.g. one earlier than
1970) to the release date of the motherboard during the POST stage. After
resume it can therefore appear that the system slept for an enormously long,
completely meaningless time.

Then in timekeeping_resume() -> tk_debug_account_sleep_time(), if bit 31 of
the sleep time happens to be set, fls() returns 32 and we increment
sleep_time_bin[32], an out-of-bounds array access that overwrites adjacent
memory.

As System.map shows:
0x81c9d080 b sleep_time_bin
0x81c9d100 B dma_spin_lock
the out-of-bounds write lands on dma_spin_lock, setting its value to 1 and
causing this problem.

This patch adds a sanity check in tk_debug_account_sleep_time()
to ensure we don't index past the sleep_time_bin array.
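The capped binning can be sketched in plain C; fls_() below is a portable stand-in for the kernel's fls() helper, added here only so the sketch is self-contained:

```c
#define NUM_BINS 32

/* Portable stand-in for the kernel's fls(): index (1-based) of the
 * most significant set bit, 0 if no bit is set. */
static int fls_(unsigned int x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Sketch of the fixed binning: a sleep time with bit 31 set makes
 * fls() return 32, which would index one past a 32-entry array
 * without the min() clamp to NUM_BINS - 1. */
static int sleep_bin(unsigned int sec)
{
	int bin = fls_(sec);

	return bin < NUM_BINS - 1 ? bin : NUM_BINS - 1;  /* min(bin, 31) */
}
```

With the clamp, a bogus multi-decade sleep time simply lands in the last bin instead of scribbling past the array.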

[jstultz: Problem diagnosed and original patch by Chen Yu. I've solved the
 issue slightly differently, but borrowed his excellent explanation of the
 issue here.]

Fixes: 5c83545f24ab "power: Add option to log time spent in suspend"
Reported-by: Janek Kozicki 
Reported-by: Chen Yu 
Signed-off-by: John Stultz 
Cc: linux...@vger.kernel.org
Cc: Peter Zijlstra 
Cc: Xunlei Pang 
Cc: "Rafael J. Wysocki" 
Cc: stable 
Cc: Zhang Rui 
Link: 
http://lkml.kernel.org/r/1471993702-29148-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 

---
 kernel/time/timekeeping_debug.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
index f6bd652..107310a 100644
--- a/kernel/time/timekeeping_debug.c
+++ b/kernel/time/timekeeping_debug.c
@@ -23,7 +23,9 @@
 
 #include "timekeeping_internal.h"
 
-static unsigned int sleep_time_bin[32] = {0};
+#define NUM_BINS 32
+
+static unsigned int sleep_time_bin[NUM_BINS] = {0};
 
 static int tk_debug_show_sleep_time(struct seq_file *s, void *data)
 {
@@ -69,6 +71,9 @@ late_initcall(tk_debug_sleep_time_init);
 
 void tk_debug_account_sleep_time(struct timespec64 *t)
 {
-   sleep_time_bin[fls(t->tv_sec)]++;
+   /* Cap bin index so we don't overflow the array */
+   int bin = min(fls(t->tv_sec), NUM_BINS-1);
+
+   sleep_time_bin[bin]++;
 }
 


[tip:timers/urgent] time: Make settimeofday error checking work again

2016-06-01 Thread tip-bot for John Stultz
Commit-ID:  dfc2507b26af22b0bbc85251b8545b36d8bc5d72
Gitweb: http://git.kernel.org/tip/dfc2507b26af22b0bbc85251b8545b36d8bc5d72
Author: John Stultz 
AuthorDate: Wed, 1 Jun 2016 11:53:26 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 1 Jun 2016 21:13:43 +0200

time: Make settimeofday error checking work again

In commit 86d3473224b0 some of the checking for a valid timeval
was subtly changed, which caused -EINVAL to be returned whenever
the timeval was NULL.

However, it is possible to set the timezone data while specifying
a NULL timeval, which is usually done to handle systems where the
RTC keeps local time instead of UTC. Thus the patch causes such
systems to have the time incorrectly set.

This patch addresses the issue by handling the error conditionals
in the same way as was done previously.
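The restored ordering of the checks can be modeled as a small pure function (a sketch of the decision logic only, not the kernel's actual do_sys_settimeofday(); the struct and helper names are illustrative):

```c
/* Illustrative stand-in for struct timespec/timeval. */
struct tv {
	long sec;
	long usec;
};

/* Sketch of the fixed check ordering: a NULL timeval is legal (it
 * means "set only the timezone"), so the NULL test must come before
 * the validity test can reject the call. Returns 0 when the call
 * would proceed, -22 (-EINVAL on Linux) when rejected. */
static int settimeofday_check(const struct tv *t)
{
	if (!t)
		return 0;   /* tz-only update: forwarded, not rejected */

	if (t->sec < 0 || t->usec < 0 || t->usec >= 1000000)
		return -22; /* -EINVAL */

	return 0;
}
```

Before the fix the validity test ran first and dereference-style checks treated NULL as invalid, breaking RTC-keeps-local-time setups that call settimeofday(NULL, &tz).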

Fixes: 86d3473224b0 "time: Introduce do_sys_settimeofday64()"
Reported-by: Mika Westerberg 
Signed-off-by: John Stultz 
Tested-by: Mika Westerberg 
Cc: Prarit Bhargava 
Cc: Arnd Bergmann 
Cc: Baolin Wang 
Cc: Richard Cochran 
Cc: Shuah Khan 
Link: 
http://lkml.kernel.org/r/1464807207-16530-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 

---
 include/linux/timekeeping.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 37dbacf..816b754 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -21,6 +21,9 @@ static inline int do_sys_settimeofday(const struct timespec *tv,
struct timespec64 ts64;
 
if (!tv)
+   return do_sys_settimeofday64(NULL, tz);
+
+   if (!timespec_valid(tv))
return -EINVAL;
 
ts64 = timespec_to_timespec64(*tv);



[tip:timers/urgent] kselftests: timers: Add adjtimex SETOFFSET validity tests

2016-01-26 Thread tip-bot for John Stultz
Commit-ID:  e03a58c320e1103ebe97bda8ebdfcc5c9829c53f
Gitweb: http://git.kernel.org/tip/e03a58c320e1103ebe97bda8ebdfcc5c9829c53f
Author: John Stultz 
AuthorDate: Thu, 21 Jan 2016 15:03:35 -0800
Committer:  Thomas Gleixner 
CommitDate: Tue, 26 Jan 2016 16:26:06 +0100

kselftests: timers: Add adjtimex SETOFFSET validity tests

Add some simple tests to check both valid and invalid
offsets when using adjtimex's ADJ_SETOFFSET method.
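The trickiest part of the test's set_offset() helper is normalizing negative offsets, since the sub-second field passed to adjtimex must be non-negative. That arithmetic can be isolated as a pure function (illustrative names, same borrow logic as the patch below):

```c
/* Holds a split offset: whole seconds plus a non-negative
 * sub-second part (nanoseconds or microseconds, per caller). */
struct off {
	long long sec;
	long long sub;
};

/* Sketch of the negative-offset normalization in set_offset():
 * C's division truncates toward zero, so a negative offset with a
 * nonzero remainder must borrow one second to make the sub-second
 * part non-negative. */
static struct off split_offset(long long offset, long long per_sec)
{
	struct off o = { offset / per_sec, offset % per_sec };

	if (offset < 0 && o.sub) {
		o.sec -= 1;
		o.sub += per_sec;
	}
	return o;
}
```

For example, -1.5 s in microseconds splits to sec = -2, sub = 500000, which recombines to exactly -1500000 µs.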

Signed-off-by: John Stultz 
Acked-by: Shuah Khan 
Cc: Sasha Levin 
Cc: Richard Cochran 
Cc: Prarit Bhargava 
Cc: Harald Hoyer 
Cc: Kay Sievers 
Cc: David Herrmann 
Link: 
http://lkml.kernel.org/r/1453417415-19110-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 tools/testing/selftests/timers/valid-adjtimex.c | 139 +++-
 1 file changed, 138 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/timers/valid-adjtimex.c 
b/tools/testing/selftests/timers/valid-adjtimex.c
index e86d937..60fe3c5 100644
--- a/tools/testing/selftests/timers/valid-adjtimex.c
+++ b/tools/testing/selftests/timers/valid-adjtimex.c
@@ -45,7 +45,17 @@ static inline int ksft_exit_fail(void)
 }
 #endif
 
-#define NSEC_PER_SEC 1000000000L
+#define NSEC_PER_SEC 1000000000LL
+#define USEC_PER_SEC 1000000LL
+
+#define ADJ_SETOFFSET 0x0100
+
+#include <sys/syscall.h>
+static int clock_adjtime(clockid_t id, struct timex *tx)
+{
+   return syscall(__NR_clock_adjtime, id, tx);
+}
+
 
 /* clear NTP time_status & time_state */
 int clear_time_state(void)
@@ -193,10 +203,137 @@ out:
 }
 
 
+int set_offset(long long offset, int use_nano)
+{
+   struct timex tmx = {};
+   int ret;
+
+   tmx.modes = ADJ_SETOFFSET;
+   if (use_nano) {
+   tmx.modes |= ADJ_NANO;
+
+   tmx.time.tv_sec = offset / NSEC_PER_SEC;
+   tmx.time.tv_usec = offset % NSEC_PER_SEC;
+
+   if (offset < 0 && tmx.time.tv_usec) {
+   tmx.time.tv_sec -= 1;
+   tmx.time.tv_usec += NSEC_PER_SEC;
+   }
+   } else {
+   tmx.time.tv_sec = offset / USEC_PER_SEC;
+   tmx.time.tv_usec = offset % USEC_PER_SEC;
+
+   if (offset < 0 && tmx.time.tv_usec) {
+   tmx.time.tv_sec -= 1;
+   tmx.time.tv_usec += USEC_PER_SEC;
+   }
+   }
+
+   ret = clock_adjtime(CLOCK_REALTIME, &tmx);
+   if (ret < 0) {
+   printf("(sec: %ld  usec: %ld) ", tmx.time.tv_sec, tmx.time.tv_usec);
+   printf("[FAIL]\n");
+   return -1;
+   }
+   return 0;
+}
+
+int set_bad_offset(long sec, long usec, int use_nano)
+{
+   struct timex tmx = {};
+   int ret;
+
+   tmx.modes = ADJ_SETOFFSET;
+   if (use_nano)
+   tmx.modes |= ADJ_NANO;
+
+   tmx.time.tv_sec = sec;
+   tmx.time.tv_usec = usec;
+   ret = clock_adjtime(CLOCK_REALTIME, &tmx);
+   if (ret >= 0) {
+   printf("Invalid (sec: %ld  usec: %ld) did not fail! ", tmx.time.tv_sec, tmx.time.tv_usec);
+   printf("[FAIL]\n");
+   return -1;
+   }
+   return 0;
+}
+
+int validate_set_offset(void)
+{
+   printf("Testing ADJ_SETOFFSET... ");
+
+   /* Test valid values */
+   if (set_offset(NSEC_PER_SEC - 1, 1))
+   return -1;
+
+   if (set_offset(-NSEC_PER_SEC + 1, 1))
+   return -1;
+
+   if (set_offset(-NSEC_PER_SEC - 1, 1))
+   return -1;
+
+   if (set_offset(5 * NSEC_PER_SEC, 1))
+   return -1;
+
+   if (set_offset(-5 * NSEC_PER_SEC, 1))
+   return -1;
+
+   if (set_offset(5 * NSEC_PER_SEC + NSEC_PER_SEC / 2, 1))
+   return -1;
+
+   if (set_offset(-5 * NSEC_PER_SEC - NSEC_PER_SEC / 2, 1))
+   return -1;
+
+   if (set_offset(USEC_PER_SEC - 1, 0))
+   return -1;
+
+   if (set_offset(-USEC_PER_SEC + 1, 0))
+   return -1;
+
+   if (set_offset(-USEC_PER_SEC - 1, 0))
+   return -1;
+
+   if (set_offset(5 * USEC_PER_SEC, 0))
+   return -1;
+
+   if (set_offset(-5 * USEC_PER_SEC, 0))
+   return -1;
+
+   if (set_offset(5 * USEC_PER_SEC + USEC_PER_SEC / 2, 0))
+   return -1;
+
+   if (set_offset(-5 * USEC_PER_SEC - USEC_PER_SEC / 2, 0))
+   return -1;
+
+   /* Test invalid values */
+   if (set_bad_offset(0, -1, 1))
+   return -1;
+   if (set_bad_offset(0, -1, 0))
+   return -1;
+   if (set_bad_offset(0, 2 * NSEC_PER_SEC, 1))
+   return -1;
+   if (set_bad_offset(0, 2 * USEC_PER_SEC, 0))
+   return -1;
+   if (set_bad_offset(0, NSEC_PER_SEC, 1))
+   return -1;
+   if (set_bad_offset(0, USEC_PER_SEC, 0))
+   return -1;
+   if (set_bad_offset(0, -NSEC_PER_SEC, 1))
+   return -1;
+   if (set_bad_offset(0, -USEC_PER_SEC, 0))
+  

[tip:timers/urgent] ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO

2016-01-22 Thread tip-bot for John Stultz
Commit-ID:  dd4e17ab704269bce71402285f5e8b9ac24b1eff
Gitweb: http://git.kernel.org/tip/dd4e17ab704269bce71402285f5e8b9ac24b1eff
Author: John Stultz 
AuthorDate: Thu, 21 Jan 2016 15:03:34 -0800
Committer:  Thomas Gleixner 
CommitDate: Fri, 22 Jan 2016 12:01:42 +0100

ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO

Recently, in commit 37cf4dc3370f I forgot to check if the timeval being passed
was actually a timespec (as is signaled with ADJ_NANO).

This resulted in that patch breaking ADJ_SETOFFSET users who set
ADJ_NANO, by rejecting valid timespecs because they were checked
against timeval ranges.

This patch addresses this by checking for the ADJ_NANO flag and
using the timespec check instead in that case.
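The core of the bug is that the valid range of the sub-second field depends on the ADJ_NANO flag. A pure-function sketch of the corrected range check (illustrative helper, validating only the sub-second field):

```c
#define NSEC_PER_SEC 1000000000L
#define USEC_PER_SEC 1000000L

/* Sketch of the fixed validation: with ADJ_NANO the timex tv_usec
 * field actually carries nanoseconds, so it must be checked against
 * the timespec range [0, NSEC_PER_SEC) rather than the timeval
 * range [0, USEC_PER_SEC). Returns 1 if the field is in range. */
static int offset_field_valid(long field, int adj_nano)
{
	long limit = adj_nano ? NSEC_PER_SEC : USEC_PER_SEC;

	return field >= 0 && field < limit;
}
```

A value like 500000000 (half a second in nanoseconds) is valid under ADJ_NANO but, before this fix, was rejected because it exceeds the microsecond range.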

Reported-by: Harald Hoyer 
Reported-by: Kay Sievers 
Fixes: 37cf4dc3370f "time: Verify time values in adjtimex ADJ_SETOFFSET to avoid overflow"
Signed-off-by: John Stultz 
Cc: Sasha Levin 
Cc: Richard Cochran 
Cc: Prarit Bhargava 
Cc: David Herrmann 
Link: 
http://lkml.kernel.org/r/1453417415-19110-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 kernel/time/ntp.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 36f2ca0..6df8927 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -685,8 +685,18 @@ int ntp_validate_timex(struct timex *txc)
if (!capable(CAP_SYS_TIME))
return -EPERM;
 
-   if (!timeval_inject_offset_valid(&txc->time))
-   return -EINVAL;
+   if (txc->modes & ADJ_NANO) {
+   struct timespec ts;
+
+   ts.tv_sec = txc->time.tv_sec;
+   ts.tv_nsec = txc->time.tv_usec;
+   if (!timespec_inject_offset_valid(&ts))
+   return -EINVAL;
+
+   } else {
+   if (!timeval_inject_offset_valid(&txc->time))
+   return -EINVAL;
+   }
}
 
/*


[tip:timers/core] timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments

2015-10-12 Thread tip-bot for John Stultz
Commit-ID:  6035519fcf5aa17084b41790cdc584d881d82c03
Gitweb: http://git.kernel.org/tip/6035519fcf5aa17084b41790cdc584d881d82c03
Author: John Stultz 
AuthorDate: Mon, 5 Oct 2015 18:16:57 -0700
Committer:  Ingo Molnar 
CommitDate: Mon, 12 Oct 2015 09:51:34 +0200

timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments

Recently a kernel side NTP bug was fixed via the following commit:

  2619d7e9c92d ("time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()")

When the bug was reported it was difficult to detect, except by
tweaking the adjtimex tick value, and noticing how quickly the
adjustment took:

https://lkml.org/lkml/2015/9/1/488

Thus this patch introduces a new test which manipulates the
adjtimex tick value and validates that the results are what we
expect.
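The expectation the test validates is simple rate arithmetic: setting the adjtimex tick to systick + delta (where systick is the default tick, 1000000/HZ microseconds) speeds the clock up by delta/systick. A sketch of that computation (illustrative helper, values here assume HZ=100 so systick = 10000 µs):

```c
#define MILLION 1000000LL

/* Sketch of the rate arithmetic behind the adjtick test: a tick of
 * (systick + tick_delta) microseconds changes the clock rate by
 * tick_delta / systick, i.e. an expected frequency offset of
 * tick_delta * MILLION / systick parts per million. */
static long long expected_ppm(long long tick_delta, long long systick)
{
	return tick_delta * MILLION / systick;
}
```

So with HZ=100, raising the tick by 100 µs should make the clock run roughly 10000 ppm (1%) fast, and the test checks the observed skew against that prediction.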

Signed-off-by: John Stultz 
Cc: Linus Torvalds 
Cc: Miroslav Lichvar 
Cc: Nuno Gonçalves 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Shuah Khan 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1444094217-20258-1-git-send-email-john.stu...@linaro.org
[ Tidied up the code and the changelog a bit. ]
Signed-off-by: Ingo Molnar 
---
 tools/testing/selftests/timers/Makefile  |   3 +-
 tools/testing/selftests/timers/adjtick.c | 221 +++
 2 files changed, 223 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/timers/Makefile 
b/tools/testing/selftests/timers/Makefile
index 89a3f44..4a1be1b 100644
--- a/tools/testing/selftests/timers/Makefile
+++ b/tools/testing/selftests/timers/Makefile
@@ -8,7 +8,7 @@ LDFLAGS += -lrt -lpthread
 TEST_PROGS = posix_timers nanosleep nsleep-lat set-timer-lat mqueue-lat \
 inconsistency-check raw_skew threadtest rtctest
 
-TEST_PROGS_EXTENDED = alarmtimer-suspend valid-adjtimex change_skew \
+TEST_PROGS_EXTENDED = alarmtimer-suspend valid-adjtimex adjtick change_skew \
  skew_consistency clocksource-switch leap-a-day \
  leapcrash set-tai set-2038
 
@@ -24,6 +24,7 @@ include ../lib.mk
 run_destructive_tests: run_tests
./alarmtimer-suspend
./valid-adjtimex
+   ./adjtick
./change_skew
./skew_consistency
./clocksource-switch
diff --git a/tools/testing/selftests/timers/adjtick.c 
b/tools/testing/selftests/timers/adjtick.c
new file mode 100644
index 0000000..9887fd5
--- /dev/null
+++ b/tools/testing/selftests/timers/adjtick.c
@@ -0,0 +1,221 @@
+/* adjtimex() tick adjustment test
+ * by:   John Stultz 
+ * (C) Copyright Linaro Limited 2015
+ * Licensed under the GPLv2
+ *
+ *  To build:
+ * $ gcc adjtick.c -o adjtick -lrt
+ *
+ *   This program is free software: you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation, either version 2 of the License, or
+ *   (at your option) any later version.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <time.h>
+#include <sys/time.h>
+#include <sys/timex.h>
+
+#ifdef KTEST
+#include "../kselftest.h"
+#else
+static inline int ksft_exit_pass(void)
+{
+   exit(0);
+}
+static inline int ksft_exit_fail(void)
+{
+   exit(1);
+}
+#endif
+
+#define CLOCK_MONOTONIC_RAW 4
+
+#define NSEC_PER_SEC   1000000000LL
+#define USEC_PER_SEC   1000000
+
+#define MILLION 1000000
+
+long systick;
+
+long long llabs(long long val)
+{
+   if (val < 0)
+   val = -val;
+   return val;
+}
+
+unsigned long long ts_to_nsec(struct timespec ts)
+{
+   return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
+}
+
+struct timespec nsec_to_ts(long long ns)
+{
+   struct timespec ts;
+
+   ts.tv_sec = ns/NSEC_PER_SEC;
+   ts.tv_nsec = ns%NSEC_PER_SEC;
+
+   return ts;
+}
+
+long long diff_timespec(struct timespec start, struct timespec end)
+{
+   long long start_ns, end_ns;
+
+   start_ns = ts_to_nsec(start);
+   end_ns = ts_to_nsec(end);
+
+   return end_ns - start_ns;
+}
+
+void get_monotonic_and_raw(struct timespec *mon, struct timespec *raw)
+{
+   struct timespec start, mid, end;
+   long long diff = 0, tmp;
+   int i;
+
+   clock_gettime(CLOCK_MONOTONIC, mon);
+   clock_gettime(CLOCK_MONOTONIC_RAW, raw);
+
+   /* Try to get a more tightly bound pairing */
+   for (i = 0; i < 3; i++) {
+   long long newdiff;
+
+   clock_gettime(CLOCK_MONOTONIC, );
+   clock_gettime(CLOCK_MONOTONIC_RAW, );
+   clock_gettime(CLOCK_MONOTONIC, );
+
+   newdiff = diff_timespec(start, end);
+   if (diff == 0 || newdiff < diff) {
+   diff = newdiff;
+  

+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef KTEST
+#include "../kselftest.h"
+#else
+static inline int ksft_exit_pass(void)
+{
+   exit(0);
+}
+static inline int ksft_exit_fail(void)
+{
+   exit(1);
+}
+#endif
+
+#define CLOCK_MONOTONIC_RAW	4
+
+#define NSEC_PER_SEC		1000000000LL
+#define USEC_PER_SEC		1000000
+
+#define MILLION			1000000
+
+long systick;
+
+long long llabs(long long val)
+{
+   if (val < 0)
+   val = -val;
+   return val;
+}
+
+unsigned long long ts_to_nsec(struct timespec ts)
+{
+   return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
+}
+
+struct timespec nsec_to_ts(long long ns)
+{
+   struct timespec ts;
+
+   ts.tv_sec = ns/NSEC_PER_SEC;
+   ts.tv_nsec = ns%NSEC_PER_SEC;
+
+   return ts;
+}
+
+long long diff_timespec(struct timespec start, struct timespec end)
+{
+   long long start_ns, end_ns;
+
+   start_ns = ts_to_nsec(start);
+   end_ns = ts_to_nsec(end);
+
+   return end_ns - start_ns;
+}
+
+void get_monotonic_and_raw(struct timespec *mon, struct timespec *raw)
+{
+   struct timespec start, mid, end;
+   long long diff = 0, tmp;
+   int i;
+
+   clock_gettime(CLOCK_MONOTONIC, mon);
+   clock_gettime(CLOCK_MONOTONIC_RAW, raw);
+
+   /* Try to get a more tightly bound pairing */
+   for (i = 0; i < 3; i++) {
+   long long newdiff;
+
+   

[tip:timers/urgent] clocksource: Fix abs() usage w/ 64bit values

2015-10-02 Thread tip-bot for John Stultz
Commit-ID:  67dfae0cd72fec5cd158b6e5fb1647b7dbe0834c
Gitweb: http://git.kernel.org/tip/67dfae0cd72fec5cd158b6e5fb1647b7dbe0834c
Author: John Stultz 
AuthorDate: Mon, 14 Sep 2015 18:05:20 -0700
Committer:  Thomas Gleixner 
CommitDate: Fri, 2 Oct 2015 22:53:01 +0200

clocksource: Fix abs() usage w/ 64bit values

This patch fixes one case where abs() was being used with 64-bit
nanosecond values, where the result may be capped at 32 bits.

This potentially could cause watchdog false negatives on 32-bit
systems, so this patch addresses the issue by using abs64().

Signed-off-by: John Stultz 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Ingo Molnar 
Cc: sta...@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1442279124-7309-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 kernel/time/clocksource.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 841b72f..3a38775 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -217,7 +217,7 @@ static void clocksource_watchdog(unsigned long data)
continue;
 
/* Check the deviation from the watchdog clocksource. */
-   if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
+   if (abs64(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
pr_warn("timekeeping watchdog: Marking clocksource '%s' 
as unstable because the skew is too large:\n",
cs->name);
pr_warn("  '%s' wd_now: %llx 
wd_last: %llx mask: %llx\n",
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[tip:timers/urgent] time: Fix timekeeping_freqadjust()' s incorrect use of abs() instead of abs64()

2015-09-13 Thread tip-bot for John Stultz
Commit-ID:  2619d7e9c92d524cb155ec89fd72875321512e5b
Gitweb: http://git.kernel.org/tip/2619d7e9c92d524cb155ec89fd72875321512e5b
Author: John Stultz 
AuthorDate: Wed, 9 Sep 2015 16:07:30 -0700
Committer:  Ingo Molnar 
CommitDate: Sun, 13 Sep 2015 10:30:47 +0200

time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()

The internal clock steering done for fine-grained error
correction uses a logarithmic approximation, so any time
adjtimex() adjusts the clock steering, timekeeping_freqadjust()
quickly approximates the correct clock frequency over a series
of ticks.

Unfortunately, the logic in timekeeping_freqadjust(), introduced
in commit:

  dc491596f639 ("timekeeping: Rework frequency adjustments to work better w/ 
nohz")

used the abs() function with a s64 error value to calculate the
size of the approximated adjustment to be made.

Per include/linux/kernel.h:

  "abs() should not be used for 64-bit types (s64, u64, long long) - use 
abs64()".

Thus on 32-bit platforms, this resulted in the clock steering
taking a quite dampened random walk trying to converge on the
proper frequency, which caused the adjustments to be made much
slower than intended (most easily observed when large
adjustments are made).

This patch fixes the issue by using abs64() instead.

Reported-by: Nuno Gonçalves 
Tested-by: Nuno Goncalves 
Signed-off-by: John Stultz 
Cc:  # v3.17+
Cc: Linus Torvalds 
Cc: Miroslav Lichvar 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/timekeeping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index f6ee2e6..3739ac6 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1614,7 +1614,7 @@ static __always_inline void timekeeping_freqadjust(struct 
timekeeper *tk,
negative = (tick_error < 0);
 
/* Sort out the magnitude of the correction */
-   tick_error = abs(tick_error);
+   tick_error = abs64(tick_error);
for (adj = 0; tick_error > interval; adj++)
tick_error >>= 1;
 
--


[tip:timers/core] selftest: Timers: Avoid signal deadlock in leap-a-day

2015-06-18 Thread tip-bot for John Stultz
Commit-ID:  51a16c1e887a5975ada27a3ae935a4f2783005da
Gitweb: http://git.kernel.org/tip/51a16c1e887a5975ada27a3ae935a4f2783005da
Author: John Stultz 
AuthorDate: Wed, 17 Jun 2015 11:16:43 -0700
Committer:  Thomas Gleixner 
CommitDate: Thu, 18 Jun 2015 15:28:14 +0200

selftest: Timers: Avoid signal deadlock in leap-a-day

In 0c4a5fc95b1df (Add leap-second timer edge testing to
leap-a-day.c), we added a timer to the test which checks to make
sure timers near the leapsecond edge behave correctly.

However, the output generated from the timer uses ctime_r, which
isn't async-signal-safe, and should that signal land while the
main test is using ctime_r to print its output, it's possible for
the test to deadlock on glibc internal locks.

Thus this patch reworks the output to avoid using ctime_r in
the signal handler.

Signed-off-by: John Stultz 
Cc: Prarit Bhargava 
Cc: Daniel Bristot de Oliveira 
Cc: Richard Cochran 
Cc: Jan Kara 
Cc: Jiri Bohac 
Cc: Shuah Khan 
Cc: Ingo Molnar 
Link: 
http://lkml.kernel.org/r/1434565003-3386-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 tools/testing/selftests/timers/leap-a-day.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/timers/leap-a-day.c 
b/tools/testing/selftests/timers/leap-a-day.c
index 331c4f7..fb46ad6 100644
--- a/tools/testing/selftests/timers/leap-a-day.c
+++ b/tools/testing/selftests/timers/leap-a-day.c
@@ -141,27 +141,28 @@ void handler(int unused)
 void sigalarm(int signo)
 {
struct timex tx;
-   char buf[26];
int ret;
 
tx.modes = 0;
	ret = adjtimex(&tx);
 
-   ctime_r(&tx.time.tv_sec, buf);
-   buf[strlen(buf)-1] = 0; /*remove trailing\n */
-   printf("%s + %6ld us (%i)\t%s - TIMER FIRED\n",
-   buf,
+   if (tx.time.tv_sec < next_leap) {
+   printf("Error: Early timer expiration! (Should be %ld)\n", 
next_leap);
+   error_found = 1;
+   printf("adjtimex: %10ld sec + %6ld us (%i)\t%s\n",
+   tx.time.tv_sec,
tx.time.tv_usec,
tx.tai,
time_state_str(ret));
-
-   if (tx.time.tv_sec < next_leap) {
-   printf("Error: Early timer expiration!\n");
-   error_found = 1;
}
if (ret != TIME_WAIT) {
-   printf("Error: Incorrect NTP state?\n");
+   printf("Error: Timer seeing incorrect NTP state? (Should be 
TIME_WAIT)\n");
error_found = 1;
+   printf("adjtimex: %10ld sec + %6ld us (%i)\t%s\n",
+   tx.time.tv_sec,
+   tx.time.tv_usec,
+   tx.tai,
+   time_state_str(ret));
}
 }
 
@@ -297,7 +298,7 @@ int main(int argc, char **argv)
printf("Scheduling leap second for %s", ctime(_leap));
 
/* Set up timer */
-   printf("Setting timer for %s", ctime(_leap));
+   printf("Setting timer for %ld -  %s", next_leap, 
ctime(_leap));
memset(, 0, sizeof(se));
se.sigev_notify = SIGEV_SIGNAL;
se.sigev_signo = signum;
--


[tip:timers/core] timekeeping: Copy the shadow-timekeeper over the real timekeeper last

2015-06-18 Thread tip-bot for John Stultz
Commit-ID:  906c55579a6360dd9ef5a3101bb2e3ae396dfb97
Gitweb: http://git.kernel.org/tip/906c55579a6360dd9ef5a3101bb2e3ae396dfb97
Author: John Stultz 
AuthorDate: Wed, 17 Jun 2015 10:05:53 -0700
Committer:  Thomas Gleixner 
CommitDate: Thu, 18 Jun 2015 09:27:02 +0200

timekeeping: Copy the shadow-timekeeper over the real timekeeper last

The fix in d151832650ed9 (time: Move clock_was_set_seq update
before updating shadow-timekeeper) was unfortunately incomplete.

The main gist of that change was to do the shadow-copy update
last, so that any state changes were properly duplicated, and
we wouldn't accidentally have stale data in the shadow.

Unfortunately in the main update_wall_time() logic, we use
the shadow-timekeeper to calculate the next update values,
then while holding the lock, copy the shadow-timekeeper over,
then call timekeeping_update() to do some additional
bookkeeping (skipping the shadow mirror). The bug with this is
that the additional bookkeeping isn't all read-only, and some
of it changes timekeeper state. Thus we might then overwrite
this state change on the next update.

To avoid this problem, do the timekeeping_update() on the
shadow-timekeeper prior to copying the full state over to
the real-timekeeper.

This avoids problems with both the clock_was_set_seq and
next_leap_ktime being overwritten and possibly the
fast-timekeepers as well.

Many thanks to Prarit for his rigorous testing, which discovered
this problem, along with Prarit and Daniel's work validating this
fix.

Reported-by: Prarit Bhargava 
Tested-by: Prarit Bhargava 
Tested-by: Daniel Bristot de Oliveira 
Signed-off-by: John Stultz 
Cc: Richard Cochran 
Cc: Jan Kara 
Cc: Jiri Bohac 
Cc: Ingo Molnar 
Link: 
http://lkml.kernel.org/r/1434560753-7441-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 kernel/time/timekeeping.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 5d67ffb..30b7a40 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1853,8 +1853,9 @@ void update_wall_time(void)
 * memcpy under the tk_core.seq against one before we start
 * updating.
 */
+   timekeeping_update(tk, clock_set);
memcpy(real_tk, tk, sizeof(*tk));
-   timekeeping_update(real_tk, clock_set);
+   /* The memcpy must come last. Do not put anything here! */
write_seqcount_end(_core.seq);
 out:
raw_spin_unlock_irqrestore(_lock, flags);
--


[tip:timers/core] ntp: Introduce and use SECS_PER_DAY macro instead of 86400

2015-06-12 Thread tip-bot for John Stultz
Commit-ID:  90bf361ceae28dee50a584c3dd4c1a96178d982c
Gitweb: http://git.kernel.org/tip/90bf361ceae28dee50a584c3dd4c1a96178d982c
Author: John Stultz 
AuthorDate: Thu, 11 Jun 2015 15:54:54 -0700
Committer:  Thomas Gleixner 
CommitDate: Fri, 12 Jun 2015 11:15:49 +0200

ntp: Introduce and use SECS_PER_DAY macro instead of 86400

Currently the leapsecond logic uses what looks like magic values.

Improve this by defining SECS_PER_DAY and using that macro
to make the logic more clear.

Signed-off-by: John Stultz 
Cc: Prarit Bhargava 
Cc: Daniel Bristot de Oliveira 
Cc: Richard Cochran 
Cc: Jan Kara 
Cc: Jiri Bohac 
Cc: Ingo Molnar 
Link: 
http://lkml.kernel.org/r/1434063297-28657-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 kernel/time/ntp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 7a68100..7aa2161 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -35,6 +35,7 @@ unsigned long tick_nsec;
 static u64 tick_length;
 static u64 tick_length_base;
 
+#define SECS_PER_DAY   86400
 #define MAX_TICKADJ500LL   /* usecs */
 #define MAX_TICKADJ_SCALED \
(((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ)
@@ -390,7 +391,7 @@ int second_overflow(unsigned long secs)
case TIME_INS:
if (!(time_status & STA_INS))
time_state = TIME_OK;
-   else if (secs % 86400 == 0) {
+   else if (secs % SECS_PER_DAY == 0) {
leap = -1;
time_state = TIME_OOP;
printk(KERN_NOTICE
@@ -400,7 +401,7 @@ int second_overflow(unsigned long secs)
case TIME_DEL:
if (!(time_status & STA_DEL))
time_state = TIME_OK;
-   else if ((secs + 1) % 86400 == 0) {
+   else if ((secs + 1) % SECS_PER_DAY == 0) {
leap = 1;
time_state = TIME_WAIT;
printk(KERN_NOTICE
--


[tip:timers/core] ntp: Do leapsecond adjustment in adjtimex read path

2015-06-12 Thread tip-bot for John Stultz
Commit-ID:  96efdcf2d080687e041b0353c604b708546689fd
Gitweb: http://git.kernel.org/tip/96efdcf2d080687e041b0353c604b708546689fd
Author: John Stultz 
AuthorDate: Thu, 11 Jun 2015 15:54:56 -0700
Committer:  Thomas Gleixner 
CommitDate: Fri, 12 Jun 2015 11:15:49 +0200

ntp: Do leapsecond adjustment in adjtimex read path

Since the leapsecond is applied at tick-time, this means there is a
small window of time at the start of a leap-second where we cross into
the next second before applying the leap.

This patch modifies adjtimex so that the leap-second is applied on the
second edge, providing more correct leapsecond behavior.

This does make it so that adjtimex()'s returned time values can be
inconsistent with time values read from gettimeofday() or
clock_gettime(CLOCK_REALTIME,...)  for a brief period of one tick at
the leapsecond.  However, those other interfaces do not provide the
TIME_OOP time_state return that adjtimex() provides, which allows the
leapsecond to be properly represented. They instead only see a time
discontinuity, and cannot tell the first 23:59:59 from the repeated
23:59:59 leap second.

This seems like a reasonable tradeoff given clock_gettime() /
gettimeofday() cannot properly represent a leapsecond, and users
likely care more about performance, while folks who are using
adjtimex() more likely care about leap-second correctness.

Signed-off-by: John Stultz 
Cc: Prarit Bhargava 
Cc: Daniel Bristot de Oliveira 
Cc: Richard Cochran 
Cc: Jan Kara 
Cc: Jiri Bohac 
Cc: Ingo Molnar 
Link: 
http://lkml.kernel.org/r/1434063297-28657-5-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 kernel/time/ntp.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 033743e..fb4d98c 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -740,6 +740,24 @@ int __do_adjtimex(struct timex *txc, struct timespec64 
*ts, s32 *time_tai)
if (!(time_status & STA_NANO))
txc->time.tv_usec /= NSEC_PER_USEC;
 
+   /* Handle leapsec adjustments */
+   if (unlikely(ts->tv_sec >= ntp_next_leap_sec)) {
+   if ((time_state == TIME_INS) && (time_status & STA_INS)) {
+   result = TIME_OOP;
+   txc->tai++;
+   txc->time.tv_sec--;
+   }
+   if ((time_state == TIME_DEL) && (time_status & STA_DEL)) {
+   result = TIME_WAIT;
+   txc->tai--;
+   txc->time.tv_sec++;
+   }
+   if ((time_state == TIME_OOP) &&
+   (ts->tv_sec == ntp_next_leap_sec)) {
+   result = TIME_WAIT;
+   }
+   }
+
return result;
 }
 
--


[tip:timers/core] time: Prevent early expiry of hrtimers[ CLOCK_REALTIME] at the leap second edge

2015-06-12 Thread tip-bot for John Stultz
Commit-ID:  833f32d763028c1bb371c64f457788b933773b3e
Gitweb: http://git.kernel.org/tip/833f32d763028c1bb371c64f457788b933773b3e
Author: John Stultz 
AuthorDate: Thu, 11 Jun 2015 15:54:55 -0700
Committer:  Thomas Gleixner 
CommitDate: Fri, 12 Jun 2015 11:15:49 +0200

time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge

Currently, leapsecond adjustments are done at tick time. As a result,
the leapsecond was applied at the first timer tick *after* the
leapsecond (~1-10ms late depending on HZ), rather than exactly on the
second edge.

This was in part historical from back when we were always tick based,
but correcting it has since been avoided because it adds extra
conditional checks in the gettime fastpath, which has performance
overhead.

However, it was recently pointed out that ABS_TIME CLOCK_REALTIME
timers set for right after the leapsecond could fire a second early,
since some timers may be expired before we trigger the timekeeping
timer, which then applies the leapsecond.

This isn't quite as bad as it sounds, since behaviorally it is similar
to what is possible with ntpd-made leapsecond adjustments done without
using the kernel discipline, where due to latencies, timers may fire
just prior to the settimeofday call. (Also, one should note that all
applications using CLOCK_REALTIME timers should always be careful,
since they are prone to quirks from settimeofday() disturbances.)

However, the purpose of having the kernel do the leap adjustment is to
avoid such latencies, so I think this is worth fixing.

So in order to properly keep those timers from firing a second early,
this patch modifies the ntp and timekeeping logic so that we keep
enough state so that the update_base_offsets_now accessor, which
provides the hrtimer core the current time, can check and apply the
leapsecond adjustment on the second edge. This prevents the hrtimer
core from expiring timers too early.

This patch does not modify any other time read path, so no additional
overhead is incurred. However, this also means that the leap-second
continues to be applied at tick time for all other read-paths.

Apologies to Richard Cochran, who pushed for similar changes years
ago, which I resisted due to the concerns about the performance
overhead.

While I suspect this isn't extremely critical, folks who care about
strict leap-second correctness will likely want to watch
this. Potentially a -stable candidate eventually.

Originally-suggested-by: Richard Cochran 
Reported-by: Daniel Bristot de Oliveira 
Reported-by: Prarit Bhargava 
Signed-off-by: John Stultz 
Cc: Richard Cochran 
Cc: Jan Kara 
Cc: Jiri Bohac 
Cc: Shuah Khan 
Cc: Ingo Molnar 
Link: 
http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 include/linux/time64.h  |  1 +
 include/linux/timekeeper_internal.h |  2 ++
 kernel/time/ntp.c   | 42 ++---
 kernel/time/ntp_internal.h  |  1 +
 kernel/time/timekeeping.c   | 23 +++-
 5 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/include/linux/time64.h b/include/linux/time64.h
index 12d4e82..77b5df2 100644
--- a/include/linux/time64.h
+++ b/include/linux/time64.h
@@ -29,6 +29,7 @@ struct timespec64 {
 #define FSEC_PER_SEC   1000LL
 
 /* Located here for timespec[64]_valid_strict */
+#define TIME64_MAX ((s64)~((u64)1 << 63))
 #define KTIME_MAX  ((s64)~((u64)1 << 63))
 #define KTIME_SEC_MAX  (KTIME_MAX / NSEC_PER_SEC)
 
diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index e1f5a11..2524722 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -50,6 +50,7 @@ struct tk_read_base {
  * @offs_tai:  Offset clock monotonic -> clock tai
  * @tai_offset:The current UTC to TAI offset in seconds
  * @clock_was_set_seq: The sequence number of clock was set events
+ * @next_leap_ktime:   CLOCK_MONOTONIC time value of a pending leap-second
  * @raw_time:  Monotonic raw base time in timespec64 format
  * @cycle_interval:Number of clock cycles in one NTP interval
  * @xtime_interval:Number of clock shifted nano seconds in one NTP
@@ -90,6 +91,7 @@ struct timekeeper {
ktime_t offs_tai;
s32 tai_offset;
unsigned intclock_was_set_seq;
+   ktime_t next_leap_ktime;
struct timespec64   raw_time;
 
/* The following members are for timekeeping internal use */
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 7aa2161..033743e 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -77,6 +77,9 @@ static long   time_adjust;
 /* constant (boot-param configurable) NTP tick adjustment (upscaled)   */
 static s64 ntp_tick_adj;
 
+/* 

[tip:timers/core] selftests: timers: Add leap-second timer edge testing to leap-a-day.c

2015-06-12 Thread tip-bot for John Stultz
Commit-ID:  0c4a5fc95b1df42651a9b4c1f72d348b3d196ea0
Gitweb: http://git.kernel.org/tip/0c4a5fc95b1df42651a9b4c1f72d348b3d196ea0
Author: John Stultz 
AuthorDate: Thu, 11 Jun 2015 15:54:57 -0700
Committer:  Thomas Gleixner 
CommitDate: Fri, 12 Jun 2015 11:15:50 +0200

selftests: timers: Add leap-second timer edge testing to leap-a-day.c

Prarit reported an issue with timers around the leapsecond, where a
timer set for midnight UTC (00:00:00) might fire a second early, right
before the leapsecond (23:59:60 - though it appears as a repeated
23:59:59) is applied.

So I've updated the leap-a-day.c test to integrate a similar test,
where we set a timer and check whether it triggers at the right time
and whether the ntp state transition is managed properly.
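The test targets the next midnight UTC. A minimal sketch of that
scheduling arithmetic (the helper name is mine, not the test's):

```c
#include <time.h>

#define SECS_PER_DAY 86400

/* The leap is inserted at a UTC day boundary, so the absolute
 * CLOCK_REALTIME timer target is the first second of the next day;
 * a correct kernel must not fire it during the repeated 23:59:59. */
static time_t next_midnight(time_t now)
{
	return now + SECS_PER_DAY - (now % SECS_PER_DAY);
}
```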

Reported-by: Daniel Bristot de Oliveira 
Reported-by: Prarit Bhargava 
Signed-off-by: John Stultz 
Cc: Richard Cochran 
Cc: Jan Kara 
Cc: Jiri Bohac 
Cc: Shuah Khan 
Cc: Ingo Molnar 
Link: 
http://lkml.kernel.org/r/1434063297-28657-6-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 tools/testing/selftests/timers/leap-a-day.c | 76 +++--
 1 file changed, 72 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/timers/leap-a-day.c 
b/tools/testing/selftests/timers/leap-a-day.c
index b8272e6..331c4f7 100644
--- a/tools/testing/selftests/timers/leap-a-day.c
+++ b/tools/testing/selftests/timers/leap-a-day.c
@@ -44,6 +44,7 @@
 #include <time.h>
 #include <sys/time.h>
 #include <sys/timex.h>
+#include <sys/errno.h>
 #include <string.h>
 #include <signal.h>
 #include <unistd.h>
@@ -63,6 +64,9 @@ static inline int ksft_exit_fail(void)
 #define NSEC_PER_SEC 1000000000ULL
 #define CLOCK_TAI 11
 
+time_t next_leap;
+int error_found;
+
 /* returns 1 if a <= b, 0 otherwise */
 static inline int in_order(struct timespec a, struct timespec b)
 {
@@ -134,6 +138,34 @@ void handler(int unused)
exit(0);
 }
 
+void sigalarm(int signo)
+{
+   struct timex tx;
+   char buf[26];
+   int ret;
+
+   tx.modes = 0;
+   ret = adjtimex(&tx);
+
+   ctime_r(&tx.time.tv_sec, buf);
+   buf[strlen(buf)-1] = 0; /* remove trailing \n */
+   printf("%s + %6ld us (%i)\t%s - TIMER FIRED\n",
+   buf,
+   tx.time.tv_usec,
+   tx.tai,
+   time_state_str(ret));
+
+   if (tx.time.tv_sec < next_leap) {
+   printf("Error: Early timer expiration!\n");
+   error_found = 1;
+   }
+   if (ret != TIME_WAIT) {
+   printf("Error: Incorrect NTP state?\n");
+   error_found = 1;
+   }
+}
+
+
 /* Test for known hrtimer failure */
 void test_hrtimer_failure(void)
 {
@@ -144,12 +176,19 @@ void test_hrtimer_failure(void)
	clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &target, NULL);
	clock_gettime(CLOCK_REALTIME, &now);
 
-   if (!in_order(target, now))
+   if (!in_order(target, now)) {
printf("ERROR: hrtimer early expiration failure observed.\n");
+   error_found = 1;
+   }
 }
 
 int main(int argc, char **argv)
 {
+   timer_t tm1;
+   struct itimerspec its1;
+   struct sigevent se;
+   struct sigaction act;
+   int signum = SIGRTMAX;
int settime = 0;
int tai_time = 0;
int insert = 1;
@@ -191,6 +230,12 @@ int main(int argc, char **argv)
signal(SIGINT, handler);
signal(SIGKILL, handler);
 
+   /* Set up timer signal handler: */
+   sigfillset(&act.sa_mask);
+   act.sa_flags = 0;
+   act.sa_handler = sigalarm;
+   sigaction(signum, &act, NULL);
+
if (iterations < 0)
printf("This runs continuously. Press ctrl-c to stop\n");
else
@@ -201,7 +246,7 @@ int main(int argc, char **argv)
int ret;
struct timespec ts;
struct timex tx;
-   time_t now, next_leap;
+   time_t now;
 
/* Get the current time */
	clock_gettime(CLOCK_REALTIME, &ts);
@@ -251,10 +296,27 @@ int main(int argc, char **argv)
 
	printf("Scheduling leap second for %s", ctime(&next_leap));
 
+   /* Set up timer */
+   printf("Setting timer for %s", ctime(&next_leap));
+   memset(&se, 0, sizeof(se));
+   se.sigev_notify = SIGEV_SIGNAL;
+   se.sigev_signo = signum;
+   se.sigev_value.sival_int = 0;
+   if (timer_create(CLOCK_REALTIME, &se, &tm1) == -1) {
+   printf("Error: timer_create failed\n");
+   return ksft_exit_fail();
+   }
+   its1.it_value.tv_sec = next_leap;
+   its1.it_value.tv_nsec = 0;
+   its1.it_interval.tv_sec = 0;
+   its1.it_interval.tv_nsec = 0;
+   timer_settime(tm1, TIMER_ABSTIME, &its1, NULL);
+
/* Wake up 3 seconds before leap */
ts.tv_sec = next_leap - 3;

[tip:timers/core] time: Move clock_was_set_seq update before updating shadow-timekeeper

2015-06-12 Thread tip-bot for John Stultz
Commit-ID:  d151832650ed98961a5650e73e85c349ad7839cb
Gitweb: http://git.kernel.org/tip/d151832650ed98961a5650e73e85c349ad7839cb
Author: John Stultz 
AuthorDate: Thu, 11 Jun 2015 15:54:53 -0700
Committer:  Thomas Gleixner 
CommitDate: Fri, 12 Jun 2015 10:56:20 +0200

time: Move clock_was_set_seq update before updating shadow-timekeeper

It was reported that 868a3e915f7f5eba (hrtimer: Make offset
update smarter) was causing timer problems after suspend/resume.

The problem with that change is the modification to
clock_was_set_seq in timekeeping_update is done prior to
mirroring the time state to the shadow-timekeeper. Thus the
next time we do update_wall_time() the updated sequence is
overwritten by what's in the shadow copy.

This patch moves the shadow-timekeeper mirroring to the end
of the function, after all updates have been made, so all data
is kept in sync.
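The ordering bug can be seen in a toy model (illustrative only, not
the kernel's structures):

```c
#include <string.h>

/* If the live timekeeper is mirrored to the shadow copy before the
 * sequence bump, the bump never reaches the shadow and is reverted
 * the next time the shadow is copied back over the live state. */
struct tk_model { unsigned int clock_was_set_seq; };

static void tk_update(struct tk_model *live, struct tk_model *shadow,
		      int mirror_first)
{
	if (mirror_first)
		memcpy(shadow, live, sizeof(*live));	/* old, buggy order */
	live->clock_was_set_seq++;			/* TK_CLOCK_WAS_SET */
	if (!mirror_first)
		memcpy(shadow, live, sizeof(*live));	/* fixed order */
}
```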

(This patch also affects the update_fast_timekeeper calls which
were also problematically done prior to the mirroring).

Reported-and-tested-by: Jeremiah Mahler 
Signed-off-by: John Stultz 
Cc: Preeti U Murthy 
Cc: Peter Zijlstra 
Cc: Viresh Kumar 
Cc: Marcelo Tosatti 
Cc: Frederic Weisbecker 
Link: 
http://lkml.kernel.org/r/1434063297-28657-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 kernel/time/timekeeping.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 90ed5db..849b932 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -585,15 +585,19 @@ static void timekeeping_update(struct timekeeper *tk, 
unsigned int action)
update_vsyscall(tk);
update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
 
-   if (action & TK_MIRROR)
-   memcpy(&shadow_timekeeper, &tk_core.timekeeper,
-  sizeof(tk_core.timekeeper));
-
	update_fast_timekeeper(&tk->tkr_mono, &tk_fast_mono);
	update_fast_timekeeper(&tk->tkr_raw,  &tk_fast_raw);
 
if (action & TK_CLOCK_WAS_SET)
tk->clock_was_set_seq++;
+   /*
+* The mirroring of the data to the shadow-timekeeper needs
+* to happen last here to ensure we don't over-write the
+* timekeeper structure on the next update with stale data
+*/
+   if (action & TK_MIRROR)
+   memcpy(&shadow_timekeeper, &tk_core.timekeeper,
+  sizeof(tk_core.timekeeper));
 }
 
 /**
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



[tip:timers/core] ntp: Introduce and use SECS_PER_DAY macro instead of 86400

2015-06-12 Thread tip-bot for John Stultz
Commit-ID:  90bf361ceae28dee50a584c3dd4c1a96178d982c
Gitweb: http://git.kernel.org/tip/90bf361ceae28dee50a584c3dd4c1a96178d982c
Author: John Stultz john.stu...@linaro.org
AuthorDate: Thu, 11 Jun 2015 15:54:54 -0700
Committer:  Thomas Gleixner t...@linutronix.de
CommitDate: Fri, 12 Jun 2015 11:15:49 +0200

ntp: Introduce and use SECS_PER_DAY macro instead of 86400

Currently the leapsecond logic uses what looks like magic values.

Improve this by defining SECS_PER_DAY and using that macro
to make the logic more clear.
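For illustration, the two day-boundary conditions second_overflow()
tests can be written as predicates (these function names are mine,
not kernel API):

```c
#define SECS_PER_DAY 86400

/* A leap insertion fires when the UTC second count reaches a day
 * boundary; a deletion fires one second before it. */
static int leap_insert_edge(unsigned long secs)
{
	return secs % SECS_PER_DAY == 0;
}

static int leap_delete_edge(unsigned long secs)
{
	return (secs + 1) % SECS_PER_DAY == 0;
}
```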

Signed-off-by: John Stultz john.stu...@linaro.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Daniel Bristot de Oliveira bris...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Jan Kara j...@suse.cz
Cc: Jiri Bohac jbo...@suse.cz
Cc: Ingo Molnar mi...@kernel.org
Link: 
http://lkml.kernel.org/r/1434063297-28657-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner t...@linutronix.de
---
 kernel/time/ntp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 7a68100..7aa2161 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -35,6 +35,7 @@ unsigned long tick_nsec;
 static u64 tick_length;
 static u64 tick_length_base;
 
+#define SECS_PER_DAY   86400
 #define MAX_TICKADJ500LL   /* usecs */
 #define MAX_TICKADJ_SCALED \
	(((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ)
@@ -390,7 +391,7 @@ int second_overflow(unsigned long secs)
case TIME_INS:
		if (!(time_status & STA_INS))
time_state = TIME_OK;
-   else if (secs % 86400 == 0) {
+   else if (secs % SECS_PER_DAY == 0) {
leap = -1;
time_state = TIME_OOP;
			printk(KERN_NOTICE
				"Clock: inserting leap second 23:59:60 UTC\n");
@@ -400,7 +401,7 @@ int second_overflow(unsigned long secs)
case TIME_DEL:
		if (!(time_status & STA_DEL))
time_state = TIME_OK;
-   else if ((secs + 1) % 86400 == 0) {
+   else if ((secs + 1) % SECS_PER_DAY == 0) {
leap = 1;
time_state = TIME_WAIT;
			printk(KERN_NOTICE
				"Clock: deleting leap second 23:59:59 UTC\n");
--


[tip:timers/core] ntp: Do leapsecond adjustment in adjtimex read path

2015-06-12 Thread tip-bot for John Stultz
Commit-ID:  96efdcf2d080687e041b0353c604b708546689fd
Gitweb: http://git.kernel.org/tip/96efdcf2d080687e041b0353c604b708546689fd
Author: John Stultz john.stu...@linaro.org
AuthorDate: Thu, 11 Jun 2015 15:54:56 -0700
Committer:  Thomas Gleixner t...@linutronix.de
CommitDate: Fri, 12 Jun 2015 11:15:49 +0200

ntp: Do leapsecond adjustment in adjtimex read path

Since the leapsecond is applied at tick-time, this means there is a
small window of time at the start of a leap-second where we cross into
the next second before applying the leap.

This patch modifies adjtimex so that the leap-second is applied on the
second edge, providing more correct leapsecond behavior.

This does make it so that adjtimex()'s returned time values can be
inconsistent with time values read from gettimeofday() or
clock_gettime(CLOCK_REALTIME,...)  for a brief period of one tick at
the leapsecond.  However, those other interfaces do not provide the
TIME_OOP time_state return that adjtimex() provides, which allows the
leapsecond to be properly represented. They instead only see a time
discontinuity, and cannot tell the first 23:59:59 from the repeated
23:59:59 leap second.

This seems like a reasonable tradeoff given clock_gettime() /
gettimeofday() cannot properly represent a leapsecond, and users
likely care more about performance, while folks who are using
adjtimex() more likely care about leap-second correctness.
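The read-path fixup this patch adds can be modeled as a standalone
function. The state names match <sys/timex.h>, but the struct and
helper here are mine, a simplified sketch rather than the kernel code:

```c
enum ntp_state { TIME_OK, TIME_INS, TIME_DEL, TIME_OOP, TIME_WAIT };

struct adj_view { long tv_sec; int tai; enum ntp_state state; };

/* Once the reported second reaches the pending leap, report TIME_OOP,
 * bump TAI, and step tv_sec back so 23:59:59 repeats; when the tick
 * has already applied the leap, report TIME_WAIT. */
static void leap_fixup(struct adj_view *v, long next_leap_sec,
		       enum ntp_state time_state)
{
	if (v->tv_sec < next_leap_sec)
		return;
	if (time_state == TIME_INS) {
		v->state = TIME_OOP;	/* inside the inserted second */
		v->tai++;
		v->tv_sec--;
	} else if (time_state == TIME_OOP && v->tv_sec == next_leap_sec) {
		v->state = TIME_WAIT;	/* leap already applied at tick */
	}
}
```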

Signed-off-by: John Stultz john.stu...@linaro.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Daniel Bristot de Oliveira bris...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Jan Kara j...@suse.cz
Cc: Jiri Bohac jbo...@suse.cz
Cc: Ingo Molnar mi...@kernel.org
Link: 
http://lkml.kernel.org/r/1434063297-28657-5-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner t...@linutronix.de
---
 kernel/time/ntp.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 033743e..fb4d98c 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -740,6 +740,24 @@ int __do_adjtimex(struct timex *txc, struct timespec64 
*ts, s32 *time_tai)
	if (!(time_status & STA_NANO))
		txc->time.tv_usec /= NSEC_PER_USEC;
 
+   /* Handle leapsec adjustments */
+   if (unlikely(ts->tv_sec >= ntp_next_leap_sec)) {
+   if ((time_state == TIME_INS) && (time_status & STA_INS)) {
+   result = TIME_OOP;
+   txc->tai++;
+   txc->time.tv_sec--;
+   }
+   if ((time_state == TIME_DEL) && (time_status & STA_DEL)) {
+   result = TIME_WAIT;
+   txc->tai--;
+   txc->time.tv_sec++;
+   }
+   if ((time_state == TIME_OOP) &&
+   (ts->tv_sec == ntp_next_leap_sec)) {
+   result = TIME_WAIT;
+   }
+   }
+
return result;
 }
 
--


[tip:timers/core] time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge

2015-06-12 Thread tip-bot for John Stultz
Commit-ID:  833f32d763028c1bb371c64f457788b933773b3e
Gitweb: http://git.kernel.org/tip/833f32d763028c1bb371c64f457788b933773b3e
Author: John Stultz john.stu...@linaro.org
AuthorDate: Thu, 11 Jun 2015 15:54:55 -0700
Committer:  Thomas Gleixner t...@linutronix.de
CommitDate: Fri, 12 Jun 2015 11:15:49 +0200

time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge


[tip:timers/urgent] ktime: Fix ktime_divns to do signed division

2015-05-13 Thread tip-bot for John Stultz
Commit-ID:  f7bcb70ebae0dcdb5a2d859b09e4465784d99029
Gitweb: http://git.kernel.org/tip/f7bcb70ebae0dcdb5a2d859b09e4465784d99029
Author: John Stultz 
AuthorDate: Fri, 8 May 2015 13:47:23 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 13 May 2015 10:19:35 +0200

ktime: Fix ktime_divns to do signed division

It was noted that the 32bit implementation of ktime_divns()
was doing unsigned division and didn't properly handle
negative values.

And when a ktime helper was changed to utilize
ktime_divns, it caused a regression on some IR blasters.
See the following bugzilla for details:
  https://bugzilla.redhat.com/show_bug.cgi?id=1200353

This patch fixes the problem in ktime_divns by checking and preserving
the sign bit, then reapplying it if appropriate after the division.
It also changes the return type to s64 to make it more obvious this
is expected.

Nicolas also pointed out that negative divisors would cause infinite
loops on 32-bit systems. Negative divisors are unlikely for users of
this function, but out of caution this patch adds checks for negative
divisors to both the 32-bit (BUG_ON) and 64-bit (WARN_ON) versions to
make sure no such use cases creep in.
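The sign handling the patch introduces can be modeled in plain C
(a sketch, with native division standing in for do_div(); callers
must pass a positive divisor, as the kernel checks enforce):

```c
#include <stdint.h>

/* do_div() can only do unsigned 64-by-32 division, so strip the
 * sign, divide the magnitude, and reapply the sign afterwards
 * (truncating toward zero, like signed 64-bit division). */
static int64_t divns_signed(int64_t ns, int64_t div)
{
	uint64_t tmp = ns < 0 ? (uint64_t)-ns : (uint64_t)ns;
	int sft = 0;

	/* shift the divisor below 2^32, scaling the dividend to match */
	while (div >> 32) {
		sft++;
		div >>= 1;
	}
	tmp >>= sft;
	tmp /= (uint64_t)div;	/* stands in for do_div() */
	return ns < 0 ? -(int64_t)tmp : (int64_t)tmp;
}
```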

[ tglx: Hand an u64 to do_div() to avoid the compiler warning ]

Fixes: 166afb64511e 'ktime: Sanitize ktime_to_us/ms conversion'
Reported-and-tested-by: Trevor Cordes 
Signed-off-by: John Stultz 
Acked-by: Nicolas Pitre 
Cc: Ingo Molnar 
Cc: Josh Boyer 
Cc: One Thousand Gnomes 
Cc: 
Link: 
http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 

---
 include/linux/ktime.h | 27 +--
 kernel/time/hrtimer.c | 14 --
 2 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index 5fc3d10..2b6a204 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -166,19 +166,34 @@ static inline bool ktime_before(const ktime_t cmp1, const 
ktime_t cmp2)
 }
 
 #if BITS_PER_LONG < 64
-extern u64 __ktime_divns(const ktime_t kt, s64 div);
-static inline u64 ktime_divns(const ktime_t kt, s64 div)
+extern s64 __ktime_divns(const ktime_t kt, s64 div);
+static inline s64 ktime_divns(const ktime_t kt, s64 div)
 {
+   /*
+* Negative divisors could cause an inf loop,
+* so bug out here.
+*/
+   BUG_ON(div < 0);
if (__builtin_constant_p(div) && !(div >> 32)) {
-   u64 ns = kt.tv64;
-   do_div(ns, div);
-   return ns;
+   s64 ns = kt.tv64;
+   u64 tmp = ns < 0 ? -ns : ns;
+
+   do_div(tmp, div);
+   return ns < 0 ? -tmp : tmp;
} else {
return __ktime_divns(kt, div);
}
 }
 #else /* BITS_PER_LONG < 64 */
-# define ktime_divns(kt, div)  (u64)((kt).tv64 / (div))
+static inline s64 ktime_divns(const ktime_t kt, s64 div)
+{
+   /*
+* 32-bit implementation cannot handle negative divisors,
+* so catch them on 64bit as well.
+*/
+   WARN_ON(div < 0);
+   return kt.tv64 / div;
+}
 #endif
 
 static inline s64 ktime_to_us(const ktime_t kt)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 76d4bd9..93ef7190 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -266,21 +266,23 @@ lock_hrtimer_base(const struct hrtimer *timer, unsigned 
long *flags)
 /*
  * Divide a ktime value by a nanosecond value
  */
-u64 __ktime_divns(const ktime_t kt, s64 div)
+s64 __ktime_divns(const ktime_t kt, s64 div)
 {
-   u64 dclc;
int sft = 0;
+   s64 dclc;
+   u64 tmp;
 
dclc = ktime_to_ns(kt);
+   tmp = dclc < 0 ? -dclc : dclc;
+
/* Make sure the divisor is less than 2^32: */
while (div >> 32) {
sft++;
div >>= 1;
}
-   dclc >>= sft;
-   do_div(dclc, (unsigned long) div);
-
-   return dclc;
+   tmp >>= sft;
+   do_div(tmp, (unsigned long) div);
+   return dclc < 0 ? -tmp : tmp;
 }
 EXPORT_SYMBOL_GPL(__ktime_divns);
 #endif /* BITS_PER_LONG >= 64 */
--


[tip:timers/urgent] ktime: Fix ktime_divns to do signed division

2015-05-12 Thread tip-bot for John Stultz
Commit-ID:  37e159cccb3121308bf9885530e7b3044d2edec8
Gitweb: http://git.kernel.org/tip/37e159cccb3121308bf9885530e7b3044d2edec8
Author: John Stultz 
AuthorDate: Fri, 8 May 2015 13:47:23 -0700
Committer:  Thomas Gleixner 
CommitDate: Tue, 12 May 2015 09:04:09 +0200

ktime: Fix ktime_divns to do signed division

It was noted that the 32-bit implementation of ktime_divns()
was doing unsigned division and didn't properly handle
negative values.

And when a ktime helper was changed to utilize
ktime_divns, it caused a regression on some IR blasters.
See the following bugzilla for details:
  https://bugzilla.redhat.com/show_bug.cgi?id=1200353

This patch fixes the problem in ktime_divns() by checking
and preserving the sign bit, then reapplying it after the
division. It also changes the return type to s64 to make it
obvious that a signed result is expected.

Nicolas also pointed out that negative divisors would cause
infinite loops on 32-bit systems. Negative divisors are
unlikely for users of this function, but out of caution this
patch adds checks for them in both the 32-bit (BUG_ON) and
64-bit (WARN_ON) versions to make sure no such use cases
creep in.

Fixes: 166afb64511e 'ktime: Sanitize ktime_to_us/ms conversion'
Reported-and-tested-by: Trevor Cordes 
Signed-off-by: John Stultz 
Acked-by: Nicolas Pitre 
Cc: Josh Boyer 
Cc: One Thousand Gnomes 
Cc: Ingo Molnar 
Cc:  # 3.17+
Link: 
http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 include/linux/ktime.h | 27 +++
 kernel/time/hrtimer.c | 11 ---
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index 5fc3d10..ab2de1c7 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -166,19 +166,38 @@ static inline bool ktime_before(const ktime_t cmp1, const ktime_t cmp2)
 }
 
 #if BITS_PER_LONG < 64
-extern u64 __ktime_divns(const ktime_t kt, s64 div);
-static inline u64 ktime_divns(const ktime_t kt, s64 div)
+extern s64 __ktime_divns(const ktime_t kt, s64 div);
+static inline s64 ktime_divns(const ktime_t kt, s64 div)
 {
+   /*
+* Negative divisors could cause an inf loop,
+* so bug out here.
+*/
+   BUG_ON(div < 0);
if (__builtin_constant_p(div) && !(div >> 32)) {
-   u64 ns = kt.tv64;
+   s64 ns = kt.tv64;
+   int neg = (ns < 0);
+
+   if (neg)
+   ns = -ns;
do_div(ns, div);
+   if (neg)
+   ns = -ns;
return ns;
} else {
return __ktime_divns(kt, div);
}
 }
 #else /* BITS_PER_LONG < 64 */
-# define ktime_divns(kt, div)  (u64)((kt).tv64 / (div))
+static inline s64 ktime_divns(const ktime_t kt, s64 div)
+{
+   /*
+* 32-bit implementation cannot handle negative divisors,
+* so catch them on 64bit as well.
+*/
+   WARN_ON(div < 0);
+   return kt.tv64 / div;
+}
 #endif
 
 static inline s64 ktime_to_us(const ktime_t kt)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 76d4bd9..c98ce4d 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -266,12 +266,15 @@ lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
 /*
  * Divide a ktime value by a nanosecond value
  */
-u64 __ktime_divns(const ktime_t kt, s64 div)
+s64 __ktime_divns(const ktime_t kt, s64 div)
 {
-   u64 dclc;
-   int sft = 0;
+   s64 dclc;
+   int neg, sft = 0;
 
dclc = ktime_to_ns(kt);
+   neg = (dclc < 0);
+   if (neg)
+   dclc = -dclc;
/* Make sure the divisor is less than 2^32: */
while (div >> 32) {
sft++;
@@ -279,6 +282,8 @@ u64 __ktime_divns(const ktime_t kt, s64 div)
}
dclc >>= sft;
do_div(dclc, (unsigned long) div);
+   if (neg)
+   dclc = -dclc;
 
return dclc;
 }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
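The infinite-loop hazard Nicolas pointed out can be made concrete with a small standalone model (names are illustrative, and it assumes the usual arithmetic right shift of negative values that gcc/clang implement, which C leaves implementation-defined): for a negative s64, the sign bits stay set under `>>`, so `div >> 32` never reaches zero and the scaling loop in __ktime_divns() would spin forever. That is why the patch adds BUG_ON(div < 0).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the divisor-scaling loop in __ktime_divns(), with the
 * iteration count capped so the non-terminating case is observable
 * instead of hanging. A negative divisor keeps (div >> 32) nonzero
 * forever on compilers with arithmetic right shift.
 */
static int shift_iterations(int64_t div, int cap)
{
    int sft = 0;

    while ((div >> 32) && sft < cap) {
        sft++;
        div >>= 1;
    }
    return sft;
}
```

A small positive divisor needs no scaling, a large one needs a few shifts, and a negative one exhausts the cap: the loop alone can never terminate for it.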


[tip:timers/core] clocksource: Improve comment explaining clocks_calc_max_nsecs()'s 50% safety margin

2015-04-03 Thread tip-bot for John Stultz
Commit-ID:  8e56f33f8439b2f8e7f4ae7f3d0bfe683ecc3b09
Gitweb: http://git.kernel.org/tip/8e56f33f8439b2f8e7f4ae7f3d0bfe683ecc3b09
Author: John Stultz 
AuthorDate: Wed, 1 Apr 2015 20:34:39 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 3 Apr 2015 08:18:35 +0200

clocksource: Improve comment explaining clocks_calc_max_nsecs()'s 50% safety margin

Ingo noted that the description of clocks_calc_max_nsecs()'s
50% safety margin was somewhat circular. So this patch tries
to improve the comment to better explain what we mean by the
50% safety margin and why we need it.

Signed-off-by: John Stultz 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1427945681-29972-20-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/clocksource.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c3be3c7..15facb1 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -472,8 +472,11 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
  * @max_cyc:   maximum cycle value before potential overflow (does not include
  * any safety margin)
  *
- * NOTE: This function includes a safety margin of 50%, so that bad clock values
- * can be detected.
+ * NOTE: This function includes a safety margin of 50%, in other words, we
+ * return half the number of nanoseconds the hardware counter can technically
+ * cover. This is done so that we can potentially detect problems caused by
+ * delayed timers or bad hardware, which might result in time intervals that
+ * are larger then what the math used can handle without overflows.
  */
 u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 *max_cyc)
 {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
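The 50% margin the improved comment describes can be sketched in standalone C. This is a simplification of clocks_calc_max_nsecs() under stated assumptions: the maxadj handling is omitted, and the function name is illustrative. The idea is to find the largest cycle count that neither wraps the counter nor overflows the 64-bit mult/shift conversion, then hand back only half of the corresponding nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of the 50% safety margin: convert the largest
 * safely representable cycle count to nanoseconds with the
 * clocksource's mult/shift pair, then return only half of it, so
 * that delayed timers or bad hardware producing a too-large
 * interval can still be converted without overflowing.
 */
static uint64_t max_nsecs_sketch(uint32_t mult, uint32_t shift, uint64_t mask)
{
    /* largest multiplication that cannot overflow 64 bits */
    uint64_t max_cycles = UINT64_MAX / mult;

    if (max_cycles > mask)      /* the counter wraps first */
        max_cycles = mask;

    uint64_t max_nsecs = (max_cycles * mult) >> shift;
    return max_nsecs >> 1;      /* the 50% safety margin */
}
```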


[tip:timers/core] clocksource: Add some debug info about clocksources being registered

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  8cc8c525ad4e7b581cacf84119e1a28dcb4044db
Gitweb: http://git.kernel.org/tip/8cc8c525ad4e7b581cacf84119e1a28dcb4044db
Author: John Stultz 
AuthorDate: Wed, 11 Mar 2015 21:16:39 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 13 Mar 2015 08:07:07 +0100

clocksource: Add some debug info about clocksources being registered

Print the mask, max_cycles, and max_idle_ns values for
clocksources being registered.

Signed-off-by: John Stultz 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1426133800-29329-12-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/clocksource.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 5cdf17e..1977eba 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -703,6 +703,9 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
cs->name);
 
clocksource_update_max_deferment(cs);
+
+   pr_info("clocksource %s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n",
+   cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns);
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:timers/core] clocksource: Rename __clocksource_updatefreq_*( ) to __clocksource_update_freq_*()

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  fba9e07208c0f9d92d9f73761c99c8612039da44
Gitweb: http://git.kernel.org/tip/fba9e07208c0f9d92d9f73761c99c8612039da44
Author: John Stultz 
AuthorDate: Wed, 11 Mar 2015 21:16:40 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 13 Mar 2015 08:07:08 +0100

clocksource: Rename __clocksource_updatefreq_*() to __clocksource_update_freq_*()

Ingo requested this function be renamed to improve readability,
so I've renamed __clocksource_updatefreq_scale() as well as the
__clocksource_updatefreq_hz/khz() functions to avoid
squishedtogethernames.

This touches some of the sh clocksources, which I've not tested.

The arch/arm/plat-omap change is just a comment change for
consistency.

Signed-off-by: John Stultz 
Cc: Daniel Lezcano 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1426133800-29329-13-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 arch/arm/plat-omap/counter_32k.c |  2 +-
 drivers/clocksource/em_sti.c |  2 +-
 drivers/clocksource/sh_cmt.c |  2 +-
 drivers/clocksource/sh_tmu.c |  2 +-
 include/linux/clocksource.h  | 10 +-
 kernel/time/clocksource.c| 11 ++-
 6 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/arch/arm/plat-omap/counter_32k.c b/arch/arm/plat-omap/counter_32k.c
index 61b4d70..43cf745 100644
--- a/arch/arm/plat-omap/counter_32k.c
+++ b/arch/arm/plat-omap/counter_32k.c
@@ -103,7 +103,7 @@ int __init omap_init_clocksource_32k(void __iomem *vbase)
 
/*
 * 12 rough estimate from the calculations in
-* __clocksource_updatefreq_scale.
+* __clocksource_update_freq_scale.
 */
clocks_calc_mult_shift(_mult, _shift,
32768, NSEC_PER_SEC, 12);
diff --git a/drivers/clocksource/em_sti.c b/drivers/clocksource/em_sti.c
index d0a7bd6..dc3c6ee 100644
--- a/drivers/clocksource/em_sti.c
+++ b/drivers/clocksource/em_sti.c
@@ -210,7 +210,7 @@ static int em_sti_clocksource_enable(struct clocksource *cs)
 
ret = em_sti_start(p, USER_CLOCKSOURCE);
if (!ret)
-   __clocksource_updatefreq_hz(cs, p->rate);
+   __clocksource_update_freq_hz(cs, p->rate);
return ret;
 }
 
diff --git a/drivers/clocksource/sh_cmt.c b/drivers/clocksource/sh_cmt.c
index 2bd13b5..b8ff3c6 100644
--- a/drivers/clocksource/sh_cmt.c
+++ b/drivers/clocksource/sh_cmt.c
@@ -641,7 +641,7 @@ static int sh_cmt_clocksource_enable(struct clocksource *cs)
 
ret = sh_cmt_start(ch, FLAG_CLOCKSOURCE);
if (!ret) {
-   __clocksource_updatefreq_hz(cs, ch->rate);
+   __clocksource_update_freq_hz(cs, ch->rate);
ch->cs_enabled = true;
}
return ret;
diff --git a/drivers/clocksource/sh_tmu.c b/drivers/clocksource/sh_tmu.c
index f150ca82..b6b8fa3 100644
--- a/drivers/clocksource/sh_tmu.c
+++ b/drivers/clocksource/sh_tmu.c
@@ -272,7 +272,7 @@ static int sh_tmu_clocksource_enable(struct clocksource *cs)
 
ret = sh_tmu_enable(ch);
if (!ret) {
-   __clocksource_updatefreq_hz(cs, ch->rate);
+   __clocksource_update_freq_hz(cs, ch->rate);
ch->cs_enabled = true;
}
 
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index bd98eaa..1355098 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -200,7 +200,7 @@ clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec);
 extern int
 __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq);
 extern void
-__clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq);
+__clocksource_update_freq_scale(struct clocksource *cs, u32 scale, u32 freq);
 
 /*
  * Don't call this unless you are a default clocksource
@@ -221,14 +221,14 @@ static inline int clocksource_register_khz(struct clocksource *cs, u32 khz)
return __clocksource_register_scale(cs, 1000, khz);
 }
 
-static inline void __clocksource_updatefreq_hz(struct clocksource *cs, u32 hz)
+static inline void __clocksource_update_freq_hz(struct clocksource *cs, u32 hz)
 {
-   __clocksource_updatefreq_scale(cs, 1, hz);
+   __clocksource_update_freq_scale(cs, 1, hz);
 }
 
-static inline void __clocksource_updatefreq_khz(struct clocksource *cs, u32 khz)
+static inline void __clocksource_update_freq_khz(struct clocksource *cs, u32 khz)
 {
-   __clocksource_updatefreq_scale(cs, 1000, khz);
+   __clocksource_update_freq_scale(cs, 1000, khz);
 }
 
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 1977eba..c3be3c7 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -643,7 +643,7 @@ static void clocksource_enqueue(struct clocksource *cs)
 }
 
 /**
- * __clocksource_updatefreq_scale - Used update clocksource with new freq
+ * __clocksource_update_freq_scale - 

[tip:timers/core] clocksource: Mostly kill clocksource_register()

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  f8935983f110505daa38e8d36ee406807f83a069
Gitweb: http://git.kernel.org/tip/f8935983f110505daa38e8d36ee406807f83a069
Author: John Stultz 
AuthorDate: Wed, 11 Mar 2015 21:16:37 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 13 Mar 2015 08:07:06 +0100

clocksource: Mostly kill clocksource_register()

A long running project has been to clean up remaining uses
of clocksource_register(), replacing it with the simpler
clocksource_register_khz/hz() functions.

However, there are a few cases where we need to self-define
our mult/shift values, so switch the function to a more
obviously internal __clocksource_register() name, and
consolidate much of the internal logic so we don't have
duplication.

Signed-off-by: John Stultz 
Cc: Dave Jones 
Cc: David S. Miller 
Cc: Linus Torvalds 
Cc: Martin Schwidefsky 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1426133800-29329-10-git-send-email-john.stu...@linaro.org
[ Minor cleanups. ]
Signed-off-by: Ingo Molnar 
---
 arch/s390/kernel/time.c |  2 +-
 arch/sparc/kernel/time_32.c |  2 +-
 include/linux/clocksource.h | 10 +-
 kernel/time/clocksource.c   | 81 +++--
 kernel/time/jiffies.c   |  4 +--
 5 files changed, 47 insertions(+), 52 deletions(-)

diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 20660dd..6c273cd 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -283,7 +283,7 @@ void __init time_init(void)
if (register_external_irq(EXT_IRQ_TIMING_ALERT, timing_alert_interrupt))
panic("Couldn't request external interrupt 0x1406");
 
-   if (clocksource_register(&clocksource_tod) != 0)
+   if (__clocksource_register(&clocksource_tod) != 0)
panic("Could not register TOD clock source");
 
/* Enable TOD clock interrupts on the boot cpu. */
diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c
index 2f80d23..a31c0c8 100644
--- a/arch/sparc/kernel/time_32.c
+++ b/arch/sparc/kernel/time_32.c
@@ -191,7 +191,7 @@ static __init int setup_timer_cs(void)
timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate,
timer_cs.shift);
 
-   return clocksource_register(&timer_cs);
+   return __clocksource_register(&timer_cs);
 }
 
 #ifdef CONFIG_SMP
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 16d048c..bd98eaa 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -179,7 +179,6 @@ static inline s64 clocksource_cyc2ns(cycle_t cycles, u32 mult, u32 shift)
 }
 
 
-extern int clocksource_register(struct clocksource*);
 extern int clocksource_unregister(struct clocksource*);
 extern void clocksource_touch_watchdog(void);
 extern struct clocksource* clocksource_get_next(void);
@@ -203,6 +202,15 @@ __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq);
 extern void
 __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq);
 
+/*
+ * Don't call this unless you are a default clocksource
+ * (AKA: jiffies) and absolutely have to.
+ */
+static inline int __clocksource_register(struct clocksource *cs)
+{
+   return __clocksource_register_scale(cs, 1, 0);
+}
+
 static inline int clocksource_register_hz(struct clocksource *cs, u32 hz)
 {
return __clocksource_register_scale(cs, 1, hz);
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c4cc04b..5cdf17e 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -656,38 +656,52 @@ static void clocksource_enqueue(struct clocksource *cs)
 void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 {
u64 sec;
+
/*
-* Calc the maximum number of seconds which we can run before
-* wrapping around. For clocksources which have a mask > 32bit
-* we need to limit the max sleep time to have a good
-* conversion precision. 10 minutes is still a reasonable
-* amount. That results in a shift value of 24 for a
-* clocksource with mask >= 40bit and f >= 4GHz. That maps to
-* ~ 0.06ppm granularity for NTP.
+* Default clocksources are *special* and self-define their mult/shift.
+* But, you're not special, so you should specify a freq value.
 */
-   sec = cs->mask;
-   do_div(sec, freq);
-   do_div(sec, scale);
-   if (!sec)
-   sec = 1;
-   else if (sec > 600 && cs->mask > UINT_MAX)
-   sec = 600;
-
-   clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
-  NSEC_PER_SEC / scale, sec * scale);
-
+   if (freq) {
+   /*
+* Calc the maximum number of seconds which we can run before
+* wrapping around. For clocksources which have a mask > 32-bit
+* we need to limit the max sleep time to have a good
+ 

[tip:timers/core] clocksource, sparc32: Convert to using clocksource_register_hz()

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  3142f76022fe46f6e0a0d3940b23fb6ccb794692
Gitweb: http://git.kernel.org/tip/3142f76022fe46f6e0a0d3940b23fb6ccb794692
Author: John Stultz 
AuthorDate: Wed, 11 Mar 2015 21:16:38 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 13 Mar 2015 08:07:07 +0100

clocksource, sparc32: Convert to using clocksource_register_hz()

While cleaning up some clocksource code, I noticed the
time_32 implementation uses the clocksource_hz2mult()
helper, but doesn't use the clocksource_register_hz()
method.

I don't believe the Sparc clocksource is a default
clocksource, so we shouldn't need to self-define
the mult/shift pair.

So convert the time_32.c implementation to use
clocksource_register_hz().

Untested.

Signed-off-by: John Stultz 
Acked-by: David S. Miller 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1426133800-29329-11-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 arch/sparc/kernel/time_32.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c
index a31c0c8..18147a5 100644
--- a/arch/sparc/kernel/time_32.c
+++ b/arch/sparc/kernel/time_32.c
@@ -181,17 +181,13 @@ static struct clocksource timer_cs = {
.rating = 100,
.read   = timer_cs_read,
.mask   = CLOCKSOURCE_MASK(64),
-   .shift  = 2,
.flags  = CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
 static __init int setup_timer_cs(void)
 {
timer_cs_enabled = 1;
-   timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate,
-   timer_cs.shift);
-
-   return __clocksource_register(&timer_cs);
+   return clocksource_register_hz(&timer_cs, sparc_config.clock_rate);
 }
 
 #ifdef CONFIG_SMP
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
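The conversion this commit delegates to clocksource_register_hz() can be sketched in standalone C under stated assumptions: nanoseconds are derived as `(cycles * mult) >> shift`, with mult chosen so the counter rate maps to NSEC_PER_SEC (the clocksource_hz2mult() idea). The function names are illustrative, and a 32768 Hz counter with shift = 15 is used purely as an example; the kernel picks the shift itself.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* mult such that (rate cycles * mult) >> shift == NSEC_PER_SEC */
static uint32_t hz2mult_sketch(uint32_t hz, uint32_t shift)
{
    return (uint32_t)((NSEC_PER_SEC << shift) / hz);
}

/* the core cycles-to-nanoseconds conversion used by the timekeeping code */
static uint64_t cyc2ns_sketch(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}
```

For the 32768 Hz example, one full second of cycles converts back to exactly NSEC_PER_SEC, and a single cycle to roughly 30.5 us, which illustrates why self-defined mult/shift pairs are best left to the registration helper.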


[tip:timers/core] clocksource: Improve clocksource watchdog reporting

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  0b046b217ad4c64fbbeaaac24d0648cb1fa49ad8
Gitweb: http://git.kernel.org/tip/0b046b217ad4c64fbbeaaac24d0648cb1fa49ad8
Author: John Stultz 
AuthorDate: Wed, 11 Mar 2015 21:16:36 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 13 Mar 2015 08:07:06 +0100

clocksource: Improve clocksource watchdog reporting

The clocksource watchdog reporting has been less helpful
than desired, as it just printed the delta between
the two clocksources. This prevents any useful analysis
of why the skew occurred.

Thus this patch tries to improve the output when we
mark a clocksource as unstable, printing out the cycle
last and now values for both the current clocksource
and the watchdog clocksource. This will allow us to see
if the result was due to a false positive caused by
a problematic watchdog.

Signed-off-by: John Stultz 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1426133800-29329-9-git-send-email-john.stu...@linaro.org
[ Minor cleanups of kernel messages. ]
Signed-off-by: Ingo Molnar 
---
 kernel/time/clocksource.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index fc2a9de..c4cc04b 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -142,13 +142,6 @@ static void __clocksource_unstable(struct clocksource *cs)
schedule_work(&watchdog_work);
 }
 
-static void clocksource_unstable(struct clocksource *cs, int64_t delta)
-{
-   printk(KERN_WARNING "Clocksource %s unstable (delta = %Ld ns)\n",
-  cs->name, delta);
-   __clocksource_unstable(cs);
-}
-
 /**
  * clocksource_mark_unstable - mark clocksource unstable via watchdog
  * @cs:clocksource to be marked unstable
@@ -174,7 +167,7 @@ void clocksource_mark_unstable(struct clocksource *cs)
 static void clocksource_watchdog(unsigned long data)
 {
struct clocksource *cs;
-   cycle_t csnow, wdnow, delta;
+   cycle_t csnow, wdnow, cslast, wdlast, delta;
int64_t wd_nsec, cs_nsec;
int next_cpu, reset_pending;
 
@@ -213,6 +206,8 @@ static void clocksource_watchdog(unsigned long data)
 
delta = clocksource_delta(csnow, cs->cs_last, cs->mask);
cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+   wdlast = cs->wd_last; /* save these in case we print them */
+   cslast = cs->cs_last;
cs->cs_last = csnow;
cs->wd_last = wdnow;
 
@@ -221,7 +216,12 @@ static void clocksource_watchdog(unsigned long data)
 
/* Check the deviation from the watchdog clocksource. */
if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
-   clocksource_unstable(cs, cs_nsec - wd_nsec);
+   pr_warn("timekeeping watchdog: Marking clocksource '%s' as unstable, because the skew is too large:\n", cs->name);
+   pr_warn("   '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
+   watchdog->name, wdnow, wdlast, watchdog->mask);
+   pr_warn("   '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
+   cs->name, csnow, cslast, cs->mask);
+   __clocksource_unstable(cs);
continue;
}
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:timers/core] timekeeping: Try to catch clocksource delta underflows

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  057b87e3161d1194a095718f9918c01b2c389e74
Gitweb: http://git.kernel.org/tip/057b87e3161d1194a095718f9918c01b2c389e74
Author: John Stultz 
AuthorDate: Wed, 11 Mar 2015 21:16:34 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 13 Mar 2015 08:07:05 +0100

timekeeping: Try to catch clocksource delta underflows

In the case of a broken clocksource that is actually
backed by multiple clocks which aren't perfectly
aligned, we may see small "negative" deltas when we
subtract 'cycle_last' from 'now'.

The values are negative with respect to the
clocksource mask value, but not necessarily negative
when cast to an s64. We can detect them by checking
whether the delta is a small negative value
relative to the mask.

If so, we assume we jumped backwards somehow and
instead use zero for our delta.

Signed-off-by: John Stultz 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1426133800-29329-7-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/timekeeping.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 657414c..187149b 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -148,6 +148,13 @@ static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
/* calculate the delta since the last update_wall_time */
delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
 
+   /*
+* Try to catch underflows by checking if we are seeing small
+* mask-relative negative values.
+*/
+   if (unlikely((~delta & tkr->mask) < (tkr->mask >> 3)))
+   delta = 0;
+
/* Cap delta value to the max_cycles values to avoid mult overflows */
if (unlikely(delta > tkr->clock->max_cycles))
delta = tkr->clock->max_cycles;
--


[tip:timers/core] timekeeping: Add warnings when overflows or underflows are observed

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  4ca22c2648f9c1cec0b242f58d7302136f5a4cbb
Gitweb: http://git.kernel.org/tip/4ca22c2648f9c1cec0b242f58d7302136f5a4cbb
Author: John Stultz 
AuthorDate: Wed, 11 Mar 2015 21:16:35 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 13 Mar 2015 08:07:05 +0100

timekeeping: Add warnings when overflows or underflows are observed

It was suggested that the underflow/overflow protection
should probably throw some sort of warning out, rather
than just silently fixing the issue.

So this patch adds some warnings here. The flag variables
used are not protected by locks, but since we can't print
from the reading functions, just being able to say we
saw an issue in the update interval is useful enough,
and can be slightly racy without real consequence.

The big complication is that we're only under a read
seqlock, so the data could shift under us during
our calculation to see if there was a problem. This
patch avoids that issue by nesting another seqlock,
which allows us to snapshot just the required values
atomically. So we shouldn't see false positives.

I also added some basic rate-limiting here, since
on one build machine w/ skewed TSCs it was fairly
noisy at bootup.

Signed-off-by: John Stultz 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1426133800-29329-8-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/timekeeping.c | 64 +--
 1 file changed, 57 insertions(+), 7 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 187149b..892f6cb 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -119,6 +119,20 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
 }
 
 #ifdef CONFIG_DEBUG_TIMEKEEPING
+#define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */
+/*
+ * These simple flag variables are managed
+ * without locks, which is racy, but ok since
+ * we don't really care about being super
+ * precise about how many events were seen,
+ * just that a problem was observed.
+ */
+static int timekeeping_underflow_seen;
+static int timekeeping_overflow_seen;
+
+/* last_warning is only modified under the timekeeping lock */
+static long timekeeping_last_warning;
+
 static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 {
 
@@ -136,28 +150,64 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
printk_deferred("  timekeeping: Your kernel is still fine, but is feeling a bit nervous\n");
}
}
+
+   if (timekeeping_underflow_seen) {
+   if (jiffies - timekeeping_last_warning > WARNING_FREQ) {
+   printk_deferred("WARNING: Underflow in clocksource '%s' observed, time update ignored.\n", name);
+   printk_deferred(" Please report this, consider using a different clocksource, if possible.\n");
+   printk_deferred(" Your kernel is probably still fine.\n");
+   timekeeping_last_warning = jiffies;
+   }
+   timekeeping_underflow_seen = 0;
+   }
+
+   if (timekeeping_overflow_seen) {
+   if (jiffies - timekeeping_last_warning > WARNING_FREQ) {
+   printk_deferred("WARNING: Overflow in clocksource '%s' observed, time update capped.\n", name);
+   printk_deferred(" Please report this, consider using a different clocksource, if possible.\n");
+   printk_deferred(" Your kernel is probably still fine.\n");
+   timekeeping_last_warning = jiffies;
+   }
+   timekeeping_overflow_seen = 0;
+   }
 }
 
 static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
 {
-   cycle_t cycle_now, delta;
+   cycle_t now, last, mask, max, delta;
+   unsigned int seq;
 
-   /* read clocksource */
-   cycle_now = tkr->read(tkr->clock);
+   /*
+* Since we're called holding a seqlock, the data may shift
+* under us while we're doing the calculation. This can cause
+* false positives, since we'd note a problem but throw the
+* results away. So nest another seqlock here to atomically
+* grab the points we are checking with.
+*/
+   do {
+   seq = read_seqcount_begin(&tk_core.seq);
+   now = tkr->read(tkr->clock);
+   last = tkr->cycle_last;
+   mask = tkr->mask;
+   max = tkr->clock->max_cycles;
+   } while (read_seqcount_retry(&tk_core.seq, seq));
 
-   /* calculate the delta since the last update_wall_time */
-   delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+   delta = clocksource_delta(now, last, mask);
 

[tip:timers/core] timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  a558cd021d83b65c47ee5b9bec1fcfe5298a769f
Gitweb: http://git.kernel.org/tip/a558cd021d83b65c47ee5b9bec1fcfe5298a769f
Author: John Stultz 
AuthorDate: Wed, 11 Mar 2015 21:16:33 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 13 Mar 2015 08:07:04 +0100

timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value

When calculating the current delta since the last tick, we
currently have no hard protection to prevent a multiplication
overflow from occurring.

This patch introduces infrastructure to allow a cap that
limits the clocksource read delta value to the 'max_cycles' value,
which is where an overflow would occur.

Since this is in the hotpath, it adds the extra checking under
CONFIG_DEBUG_TIMEKEEPING=y.

There was some concern that capping time like this could cause
problems as we may stop expiring timers, which could go circular
if the timer that triggers time accumulation were mis-scheduled
too far in the future, which would cause time to stop.

However, since the mult overflow would result in a smaller time
value, we would effectively have the same problem there.

Signed-off-by: John Stultz 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1426133800-29329-6-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/timekeeping.c | 49 +--
 1 file changed, 35 insertions(+), 14 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index acf0491..657414c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -126,9 +126,9 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
const char *name = tk->tkr.clock->name;
 
if (offset > max_cycles) {
-   printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow\n",
+   printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow danger\n",
offset, name, max_cycles);
-   printk_deferred(" timekeeping: Your kernel is sick, but tries to cope\n");
+   printk_deferred(" timekeeping: Your kernel is sick, but tries to cope by capping time updates\n");
} else {
if (offset > (max_cycles >> 1)) {
printk_deferred("INFO: timekeeping: Cycle offset (%lld) is larger than the the '%s' clock's 50%% safety margin (%lld)\n",
@@ -137,10 +137,39 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
}
}
 }
+
+static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+{
+   cycle_t cycle_now, delta;
+
+   /* read clocksource */
+   cycle_now = tkr->read(tkr->clock);
+
+   /* calculate the delta since the last update_wall_time */
+   delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+
+   /* Cap delta value to the max_cycles values to avoid mult overflows */
+   if (unlikely(delta > tkr->clock->max_cycles))
+   delta = tkr->clock->max_cycles;
+
+   return delta;
+}
 #else
static inline void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 {
 }
+static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+{
+   cycle_t cycle_now, delta;
+
+   /* read clocksource */
+   cycle_now = tkr->read(tkr->clock);
+
+   /* calculate the delta since the last update_wall_time */
+   delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+
+   return delta;
+}
 #endif
 
 /**
@@ -218,14 +247,10 @@ static inline u32 arch_gettimeoffset(void) { return 0; }
 
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
-   cycle_t cycle_now, delta;
+   cycle_t delta;
s64 nsec;
 
-   /* read clocksource: */
-   cycle_now = tkr->read(tkr->clock);
-
-   /* calculate the delta since the last update_wall_time: */
-   delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+   delta = timekeeping_get_delta(tkr);
 
nsec = delta * tkr->mult + tkr->xtime_nsec;
nsec >>= tkr->shift;
@@ -237,14 +262,10 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk)
 {
struct clocksource *clock = tk->tkr.clock;
-   cycle_t cycle_now, delta;
+   cycle_t delta;
s64 nsec;
 
-   /* read clocksource: */
-   cycle_now = tk->tkr.read(clock);
-
-   /* calculate the delta since the last update_wall_time: */
-   delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask);
+   delta = timekeeping_get_delta(&tk->tkr);
 
/* convert delta to 

[tip:timers/core] clocksource: Simplify the logic around clocksource wrapping safety margins

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  362fde0410377e468ca00ad363fdf3e3ec42eb6a
Gitweb: http://git.kernel.org/tip/362fde0410377e468ca00ad363fdf3e3ec42eb6a
Author: John Stultz 
AuthorDate: Wed, 11 Mar 2015 21:16:30 -0700
Committer:  Ingo Molnar 
CommitDate: Thu, 12 Mar 2015 10:16:38 +0100

clocksource: Simplify the logic around clocksource wrapping safety margins

The clocksource logic has a number of places where we try to
include a safety margin. Most of these are 12% safety margins,
but they are inconsistently applied and sometimes are applied
on top of each other.

Additionally, the previous patch corrected an issue
where we unintentionally created an effective 50% safety
margin, on top of which these 12.5% margins were then added.

So to simplify the logic here, this patch removes the various
12.5% margins, and consolidates adding the margin in one place:
clocks_calc_max_nsecs().

Additionally, Linus prefers a 50% safety margin, as it allows
bad clock values to be caught more easily. This should really
have no net effect, because the issue corrected earlier
caused greater-than-50% margins to be used without issue.

Signed-off-by: John Stultz 
Acked-by: Stephen Boyd  (for the sched_clock.c bit)
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1426133800-29329-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/clocksource.c | 26 --
 kernel/time/sched_clock.c |  4 ++--
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 2148f41..ace9576 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -469,6 +469,9 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
  * @shift: cycle to nanosecond divisor (power of two)
  * @maxadj:maximum adjustment value to mult (~11%)
  * @mask:  bitmask for two's complement subtraction of non 64 bit counters
+ *
+ * NOTE: This function includes a safety margin of 50%, so that bad clock values
+ * can be detected.
  */
 u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 {
@@ -490,11 +493,14 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
max_cycles = min(max_cycles, mask);
max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);
 
+   /* Return 50% of the actual maximum, so we can detect bad values */
+   max_nsecs >>= 1;
+
return max_nsecs;
 }
 
 /**
- * clocksource_max_deferment - Returns max time the clocksource can be deferred
+ * clocksource_max_deferment - Returns max time the clocksource should be deferred
  * @cs: Pointer to clocksource
  *
  */
@@ -504,13 +510,7 @@ static u64 clocksource_max_deferment(struct clocksource *cs)
 
max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj,
  cs->mask);
-   /*
-* To ensure that the clocksource does not wrap whilst we are idle,
-* limit the time the clocksource can be deferred by 12.5%. Please
-* note a margin of 12.5% is used because this can be computed with
-* a shift, versus say 10% which would require division.
-*/
-   return max_nsecs - (max_nsecs >> 3);
+   return max_nsecs;
 }
 
 #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET
@@ -659,10 +659,9 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 * conversion precision. 10 minutes is still a reasonable
 * amount. That results in a shift value of 24 for a
 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
-* ~ 0.06ppm granularity for NTP. We apply the same 12.5%
-* margin as we do in clocksource_max_deferment()
+* ~ 0.06ppm granularity for NTP.
 */
-   sec = (cs->mask - (cs->mask >> 3));
+   sec = cs->mask;
do_div(sec, freq);
do_div(sec, scale);
if (!sec)
@@ -674,9 +673,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
   NSEC_PER_SEC / scale, sec * scale);
 
/*
-* for clocksources that have large mults, to avoid overflow.
-* Since mult may be adjusted by ntp, add an safety extra margin
-*
+* Ensure clocksources that have large 'mult' values don't overflow
+* when adjusted.
 */
cs->maxadj = clocksource_max_adjustment(cs);
while ((cs->mult + cs->maxadj < cs->mult)
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 01d2d15..3b8ae45 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -125,9 +125,9 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 
new_mask = CLOCKSOURCE_MASK(bits);
 
-   /* calculate how many ns until we wrap */
+   /* calculate how many nanosecs until we risk wrapping 

[tip:timers/core] timekeeping: Add debugging checks to warn if we see delays

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  3c17ad19f0697ffe5ef7438cdafc2d2b7757d8a5
Gitweb: http://git.kernel.org/tip/3c17ad19f0697ffe5ef7438cdafc2d2b7757d8a5
Author: John Stultz 
AuthorDate: Wed, 11 Mar 2015 21:16:32 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 13 Mar 2015 08:06:58 +0100

timekeeping: Add debugging checks to warn if we see delays

Recently there have been requests for better sanity
checking in the time code, so that it's more clear
when something is going wrong, since timekeeping issues
could manifest in a large number of strange ways in
various subsystems.

Thus, this patch adds some extra infrastructure to
add a check to update_wall_time() to print two new
warnings:

 1) if we see the call delayed beyond the 'max_cycles'
overflow point,

 2) or if we see the call delayed beyond the clocksource's
'max_idle_ns' value, which is currently 50% of the
overflow point.

This extra infrastructure is conditional on
a new CONFIG_DEBUG_TIMEKEEPING option, also
added in this patch - default off.

Tested this a bit by halting qemu for specified
lengths of time to trigger the warnings.

Signed-off-by: John Stultz 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stu...@linaro.org
[ Improved the changelog and the messages a bit. ]
Signed-off-by: Ingo Molnar 
---
 kernel/time/jiffies.c |  1 +
 kernel/time/timekeeping.c | 28 
 lib/Kconfig.debug | 13 +
 3 files changed, 42 insertions(+)

diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index a6a5bf5..7e41390 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -71,6 +71,7 @@ static struct clocksource clocksource_jiffies = {
.mask   = 0xffffffff, /*32bits*/
.mult   = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
.shift  = JIFFIES_SHIFT,
+   .max_cycles = 10,
 };
 
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 91db941..acf0491 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -118,6 +118,31 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
tk->offs_boot = ktime_add(tk->offs_boot, delta);
 }
 
+#ifdef CONFIG_DEBUG_TIMEKEEPING
+static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
+{
+
+   cycle_t max_cycles = tk->tkr.clock->max_cycles;
+   const char *name = tk->tkr.clock->name;
+
+   if (offset > max_cycles) {
+   printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow\n",
+   offset, name, max_cycles);
+   printk_deferred(" timekeeping: Your kernel is sick, but tries to cope\n");
+   } else {
+   if (offset > (max_cycles >> 1)) {
+   printk_deferred("INFO: timekeeping: Cycle offset (%lld) is larger than the the '%s' clock's 50%% safety margin (%lld)\n",
+   offset, name, max_cycles >> 1);
+   printk_deferred("  timekeeping: Your kernel is still fine, but is feeling a bit nervous\n");
+   }
+   }
+}
+#else
+static inline void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
+{
+}
+#endif
+
 /**
  * tk_setup_internals - Set up internals to use clocksource clock.
  *
@@ -1630,6 +1655,9 @@ void update_wall_time(void)
if (offset < real_tk->cycle_interval)
goto out;
 
+   /* Do some additional sanity checking */
+   timekeeping_check_update(real_tk, offset);
+
/*
 * With NO_HZ we may have to accumulate many cycle_intervals
 * (think "ticks") worth of time at once. To do this efficiently,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c5cefb3..36b6fa8 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -865,6 +865,19 @@ config SCHED_STACK_END_CHECK
  data corruption or a sporadic crash at a later stage once the region
  is examined. The runtime overhead introduced is minimal.
 
+config DEBUG_TIMEKEEPING
+   bool "Enable extra timekeeping sanity checking"
+   help
+ This option will enable additional timekeeping sanity checks
+ which may be helpful when diagnosing issues where timekeeping
+ problems are suspected.
+
+ This may include checks in the timekeeping hotpaths, so this
+ option may have a (very small) performance impact to some
+ workloads.
+
+ If unsure, say N.
+
 config TIMER_STATS
bool "Collect kernel timers statistics"
depends on DEBUG_KERNEL && PROC_FS
--

[tip:timers/core] clocksource: Add 'max_cycles' to 'struct clocksource'

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  fb82fe2fe8588745edd73aa3a6229facac5c1e15
Gitweb: http://git.kernel.org/tip/fb82fe2fe8588745edd73aa3a6229facac5c1e15
Author: John Stultz 
AuthorDate: Wed, 11 Mar 2015 21:16:31 -0700
Committer:  Ingo Molnar 
CommitDate: Thu, 12 Mar 2015 10:16:38 +0100

clocksource: Add 'max_cycles' to 'struct clocksource'

In order to facilitate clocksource validation, add a
'max_cycles' field to the clocksource structure which
will hold the maximum cycle value that can safely be
multiplied without potentially causing an overflow.

Signed-off-by: John Stultz 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1426133800-29329-4-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 include/linux/clocksource.h |  5 +++--
 kernel/time/clocksource.c   | 28 
 kernel/time/sched_clock.c   |  2 +-
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 9c78d15..16d048c 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -56,6 +56,7 @@ struct module;
  * @shift: cycle to nanosecond divisor (power of two)
  * @max_idle_ns:   max idle time permitted by the clocksource (nsecs)
  * @maxadj:maximum adjustment value to mult (~11%)
+ * @max_cycles: maximum safe cycle value which won't overflow on multiplication
  * @flags: flags describing special properties
  * @archdata:  arch-specific data
  * @suspend:   suspend function for the clocksource, if necessary
@@ -76,7 +77,7 @@ struct clocksource {
 #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
struct arch_clocksource_data archdata;
 #endif
-
+   u64 max_cycles;
const char *name;
struct list_head list;
int rating;
@@ -189,7 +190,7 @@ extern struct clocksource * __init clocksource_default_clock(void);
 extern void clocksource_mark_unstable(struct clocksource *cs);
 
 extern u64
-clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask);
+clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 *max_cycles);
 extern void
 clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec);
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index ace9576..fc2a9de 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -469,11 +469,13 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
  * @shift: cycle to nanosecond divisor (power of two)
  * @maxadj:maximum adjustment value to mult (~11%)
  * @mask:  bitmask for two's complement subtraction of non 64 bit counters
+ * @max_cyc:   maximum cycle value before potential overflow (does not include
+ * any safety margin)
  *
 * NOTE: This function includes a safety margin of 50%, so that bad clock values
  * can be detected.
  */
-u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
+u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 *max_cyc)
 {
u64 max_nsecs, max_cycles;
 
@@ -493,6 +495,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
max_cycles = min(max_cycles, mask);
max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);
 
+   /* return the max_cycles value as well if requested */
+   if (max_cyc)
+   *max_cyc = max_cycles;
+
/* Return 50% of the actual maximum, so we can detect bad values */
max_nsecs >>= 1;
 
@@ -500,17 +506,15 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 }
 
 /**
- * clocksource_max_deferment - Returns max time the clocksource should be deferred
- * @cs: Pointer to clocksource
+ * clocksource_update_max_deferment - Updates the clocksource max_idle_ns & max_cycles
+ * @cs: Pointer to clocksource to be updated
  *
  */
-static u64 clocksource_max_deferment(struct clocksource *cs)
+static inline void clocksource_update_max_deferment(struct clocksource *cs)
 {
-   u64 max_nsecs;
-
-   max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj,
- cs->mask);
-   return max_nsecs;
+   cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
+   cs->maxadj, cs->mask,
+   &cs->max_cycles);
 }
 
 #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET
@@ -684,7 +688,7 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
cs->maxadj = clocksource_max_adjustment(cs);
}
 
-   cs->max_idle_ns = clocksource_max_deferment(cs);
+   clocksource_update_max_deferment(cs);
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
 
@@ -730,8 +734,8 @@ int clocksource_register(struct clocksource 

[tip:timers/core] clocksource: Simplify the clocks_calc_max_nsecs() logic

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  6086e346fdea1ae64d974c94c1acacc2605567ae
Gitweb: http://git.kernel.org/tip/6086e346fdea1ae64d974c94c1acacc2605567ae
Author: John Stultz 
AuthorDate: Wed, 11 Mar 2015 21:16:29 -0700
Committer:  Ingo Molnar 
CommitDate: Thu, 12 Mar 2015 10:16:38 +0100

clocksource: Simplify the clocks_calc_max_nsecs() logic

The previous clocks_calc_max_nsecs() code had some unnecessarily
complex bit logic to find the max interval that could cause
multiplication overflows. Since this is not in the hot
path, just do the divide to make it easier to read.

The previous implementation also had a subtle issue:
it avoided overflows with signed 64-bit values, whereas
the intervals are always unsigned. This resulted in
overly conservative intervals, to which other safety
margins were then added, reducing the intended interval length.

Signed-off-by: John Stultz 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1426133800-29329-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/clocksource.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4892352..2148f41 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -476,19 +476,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 
/*
 * Calculate the maximum number of cycles that we can pass to the
-* cyc2ns function without overflowing a 64-bit signed result. The
-* maximum number of cycles is equal to ULLONG_MAX/(mult+maxadj)
-* which is equivalent to the below.
-* max_cycles < (2^63)/(mult + maxadj)
-* max_cycles < 2^(log2((2^63)/(mult + maxadj)))
-* max_cycles < 2^(log2(2^63) - log2(mult + maxadj))
-* max_cycles < 2^(63 - log2(mult + maxadj))
-* max_cycles < 1 << (63 - log2(mult + maxadj))
-* Please note that we add 1 to the result of the log2 to account for
-* any rounding errors, ensure the above inequality is satisfied and
-* no overflow will occur.
+* cyc2ns() function without overflowing a 64-bit result.
 */
-   max_cycles = 1ULL << (63 - (ilog2(mult + maxadj) + 1));
+   max_cycles = ULLONG_MAX;
+   do_div(max_cycles, mult+maxadj);
 
/*
 * The actual maximum number of cycles we can defer the clocksource is
--


[tip:timers/core] timekeeping: Add warnings when overflows or underflows are observed

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  4ca22c2648f9c1cec0b242f58d7302136f5a4cbb
Gitweb: http://git.kernel.org/tip/4ca22c2648f9c1cec0b242f58d7302136f5a4cbb
Author: John Stultz john.stu...@linaro.org
AuthorDate: Wed, 11 Mar 2015 21:16:35 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 13 Mar 2015 08:07:05 +0100

timekeeping: Add warnings when overflows or underflows are observed

It was suggested that the underflow/overflow protection
should probably throw some sort of warning out, rather
than just silently fixing the issue.

So this patch adds some warnings here. The flag variables
used are not protected by locks, but since we can't print
from the reading functions, just being able to say we
saw an issue in the update interval is useful enough,
and can be slightly racy without real consequence.

The big complication is that we're only under a read
seqlock, so the data could shift under us during
our calculation to see if there was a problem. This
patch avoids this issue by nesting another seqlock
which allows us to snapshot the just required values
atomically. So we shouldn't see false positives.

I also added some basic rate-limiting here, since
on one build machine w/ skewed TSCs it was fairly
noisy at bootup.

Signed-off-by: John Stultz john.stu...@linaro.org
Cc: Dave Jones da...@codemonkey.org.uk
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Stephen Boyd sb...@codeaurora.org
Cc: Thomas Gleixner t...@linutronix.de
Link: 
http://lkml.kernel.org/r/1426133800-29329-8-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/time/timekeeping.c | 64 +--
 1 file changed, 57 insertions(+), 7 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 187149b..892f6cb 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -119,6 +119,20 @@ static inline void tk_update_sleep_time(struct timekeeper 
*tk, ktime_t delta)
 }
 
 #ifdef CONFIG_DEBUG_TIMEKEEPING
+#define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */
+/*
+ * These simple flag variables are managed
+ * without locks, which is racy, but ok since
+ * we don't really care about being super
+ * precise about how many events were seen,
+ * just that a problem was observed.
+ */
+static int timekeeping_underflow_seen;
+static int timekeeping_overflow_seen;
+
+/* last_warning is only modified under the timekeeping lock */
+static long timekeeping_last_warning;
+
 static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 {
 
@@ -136,28 +150,64 @@ static void timekeeping_check_update(struct timekeeper 
*tk, cycle_t offset)
printk_deferred(  timekeeping: Your kernel is 
still fine, but is feeling a bit nervous\n);
}
}
+
+	if (timekeeping_underflow_seen) {
+		if (jiffies - timekeeping_last_warning > WARNING_FREQ) {
+			printk_deferred("WARNING: Underflow in clocksource '%s' observed, time update ignored.\n", name);
+			printk_deferred("	 Please report this, consider using a different clocksource, if possible.\n");
+			printk_deferred("	 Your kernel is probably still fine.\n");
+			timekeeping_last_warning = jiffies;
+		}
+		timekeeping_underflow_seen = 0;
+	}
+
+	if (timekeeping_overflow_seen) {
+		if (jiffies - timekeeping_last_warning > WARNING_FREQ) {
+			printk_deferred("WARNING: Overflow in clocksource '%s' observed, time update capped.\n", name);
+			printk_deferred("	 Please report this, consider using a different clocksource, if possible.\n");
+			printk_deferred("	 Your kernel is probably still fine.\n");
+			timekeeping_last_warning = jiffies;
+		}
+		timekeeping_overflow_seen = 0;
+	}
 }
 
 static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
 {
-	cycle_t cycle_now, delta;
+	cycle_t now, last, mask, max, delta;
+	unsigned int seq;
 
-	/* read clocksource */
-	cycle_now = tkr->read(tkr->clock);
+	/*
+	 * Since we're called holding a seqlock, the data may shift
+	 * under us while we're doing the calculation. This can cause
+	 * false positives, since we'd note a problem but throw the
+	 * results away. So nest another seqlock here to atomically
+	 * grab the points we are checking with.
+	 */
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+		now = tkr->read(tkr->clock);
+		last = tkr->cycle_last;
+		mask = tkr->mask;
+		max = tkr->clock->max_cycles;
+	} while (read_seqcount_retry(&tk_core.seq, seq));

[tip:timers/core] timekeeping: Try to catch clocksource delta underflows

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  057b87e3161d1194a095718f9918c01b2c389e74
Gitweb: http://git.kernel.org/tip/057b87e3161d1194a095718f9918c01b2c389e74
Author: John Stultz john.stu...@linaro.org
AuthorDate: Wed, 11 Mar 2015 21:16:34 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 13 Mar 2015 08:07:05 +0100

timekeeping: Try to catch clocksource delta underflows

In the case where there is a broken clocksource
where there are multiple actual clocks that
aren't perfectly aligned, we may see small negative
deltas when we subtract 'now' from 'cycle_last'.

The values are actually negative with respect to the
clocksource mask value, though not necessarily negative
if cast to an s64; we can detect them by checking
whether the delta is a small negative value relative
to the mask.

If so, we assume we jumped backwards somehow and
instead use zero for our delta.

Signed-off-by: John Stultz john.stu...@linaro.org
Cc: Dave Jones da...@codemonkey.org.uk
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Stephen Boyd sb...@codeaurora.org
Cc: Thomas Gleixner t...@linutronix.de
Link: 
http://lkml.kernel.org/r/1426133800-29329-7-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/time/timekeeping.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 657414c..187149b 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -148,6 +148,13 @@ static inline cycle_t timekeeping_get_delta(struct 
tk_read_base *tkr)
/* calculate the delta since the last update_wall_time */
	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
 
+   /*
+* Try to catch underflows by checking if we are seeing small
+* mask-relative negative values.
+*/
+	if (unlikely((~delta & tkr->mask) < (tkr->mask >> 3)))
+		delta = 0;
+
/* Cap delta value to the max_cycles values to avoid mult overflows */
	if (unlikely(delta > tkr->clock->max_cycles))
		delta = tkr->clock->max_cycles;
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:timers/core] clocksource, sparc32: Convert to using clocksource_register_hz()

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  3142f76022fe46f6e0a0d3940b23fb6ccb794692
Gitweb: http://git.kernel.org/tip/3142f76022fe46f6e0a0d3940b23fb6ccb794692
Author: John Stultz john.stu...@linaro.org
AuthorDate: Wed, 11 Mar 2015 21:16:38 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 13 Mar 2015 08:07:07 +0100

clocksource, sparc32: Convert to using clocksource_register_hz()

While cleaning up some clocksource code, I noticed the
time_32 implementation uses the clocksource_hz2mult()
helper, but doesn't use the clocksource_register_hz()
method.

I don't believe the Sparc clocksource is a default
clocksource, so we shouldn't need to self-define
the mult/shift pair.

So convert the time_32.c implementation to use
clocksource_register_hz().

Untested.

Signed-off-by: John Stultz john.stu...@linaro.org
Acked-by: David S. Miller da...@davemloft.net
Cc: Dave Jones da...@codemonkey.org.uk
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Stephen Boyd sb...@codeaurora.org
Cc: Thomas Gleixner t...@linutronix.de
Link: 
http://lkml.kernel.org/r/1426133800-29329-11-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 arch/sparc/kernel/time_32.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c
index a31c0c8..18147a5 100644
--- a/arch/sparc/kernel/time_32.c
+++ b/arch/sparc/kernel/time_32.c
@@ -181,17 +181,13 @@ static struct clocksource timer_cs = {
.rating = 100,
.read   = timer_cs_read,
.mask   = CLOCKSOURCE_MASK(64),
-   .shift  = 2,
.flags  = CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
 static __init int setup_timer_cs(void)
 {
timer_cs_enabled = 1;
-	timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate,
-					    timer_cs.shift);
-
-	return __clocksource_register(&timer_cs);
+	return clocksource_register_hz(&timer_cs, sparc_config.clock_rate);
 }
 
 #ifdef CONFIG_SMP
--


[tip:timers/core] clocksource: Improve clocksource watchdog reporting

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  0b046b217ad4c64fbbeaaac24d0648cb1fa49ad8
Gitweb: http://git.kernel.org/tip/0b046b217ad4c64fbbeaaac24d0648cb1fa49ad8
Author: John Stultz john.stu...@linaro.org
AuthorDate: Wed, 11 Mar 2015 21:16:36 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 13 Mar 2015 08:07:06 +0100

clocksource: Improve clocksource watchdog reporting

The clocksource watchdog reporting has been less helpful
than desired, as it just printed the delta between
the two clocksources. This prevents any useful analysis
of why the skew occurred.

Thus this patch tries to improve the output when we
mark a clocksource as unstable, printing out the cycle
last and now values for both the current clocksource
and the watchdog clocksource. This will allow us to see
if the result was due to a false positive caused by
a problematic watchdog.

Signed-off-by: John Stultz john.stu...@linaro.org
Cc: Dave Jones da...@codemonkey.org.uk
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Stephen Boyd sb...@codeaurora.org
Cc: Thomas Gleixner t...@linutronix.de
Link: 
http://lkml.kernel.org/r/1426133800-29329-9-git-send-email-john.stu...@linaro.org
[ Minor cleanups of kernel messages. ]
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/time/clocksource.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index fc2a9de..c4cc04b 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -142,13 +142,6 @@ static void __clocksource_unstable(struct clocksource *cs)
schedule_work(watchdog_work);
 }
 
-static void clocksource_unstable(struct clocksource *cs, int64_t delta)
-{
-	printk(KERN_WARNING "Clocksource %s unstable (delta = %Ld ns)\n",
-	       cs->name, delta);
-   __clocksource_unstable(cs);
-}
-
 /**
  * clocksource_mark_unstable - mark clocksource unstable via watchdog
  * @cs:clocksource to be marked unstable
@@ -174,7 +167,7 @@ void clocksource_mark_unstable(struct clocksource *cs)
 static void clocksource_watchdog(unsigned long data)
 {
struct clocksource *cs;
-   cycle_t csnow, wdnow, delta;
+   cycle_t csnow, wdnow, cslast, wdlast, delta;
int64_t wd_nsec, cs_nsec;
int next_cpu, reset_pending;
 
@@ -213,6 +206,8 @@ static void clocksource_watchdog(unsigned long data)
 
	delta = clocksource_delta(csnow, cs->cs_last, cs->mask);
	cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+	wdlast = cs->wd_last; /* save these in case we print them */
+	cslast = cs->cs_last;
	cs->cs_last = csnow;
	cs->wd_last = wdnow;
 
@@ -221,7 +216,12 @@ static void clocksource_watchdog(unsigned long data)
 
/* Check the deviation from the watchdog clocksource. */
		if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
-			clocksource_unstable(cs, cs_nsec - wd_nsec);
+			pr_warn("timekeeping watchdog: Marking clocksource '%s' as unstable, because the skew is too large:\n", cs->name);
+			pr_warn("	'%s' wd_now: %llx wd_last: %llx mask: %llx\n",
+				watchdog->name, wdnow, wdlast, watchdog->mask);
+			pr_warn("	'%s' cs_now: %llx cs_last: %llx mask: %llx\n",
+				cs->name, csnow, cslast, cs->mask);
+			__clocksource_unstable(cs);
continue;
}
 
--


[tip:timers/core] timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  a558cd021d83b65c47ee5b9bec1fcfe5298a769f
Gitweb: http://git.kernel.org/tip/a558cd021d83b65c47ee5b9bec1fcfe5298a769f
Author: John Stultz john.stu...@linaro.org
AuthorDate: Wed, 11 Mar 2015 21:16:33 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 13 Mar 2015 08:07:04 +0100

timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value

When calculating the current delta since the last tick, we
currently have no hard protections to prevent a multiplication
overflow from occurring.

This patch introduces infrastructure to allow a cap that
limits the clocksource read delta value to the 'max_cycles' value,
which is where an overflow would occur.

Since this is in the hotpath, it adds the extra checking under
CONFIG_DEBUG_TIMEKEEPING=y.

There was some concern that capping time like this could cause
problems as we may stop expiring timers, which could go circular
if the timer that triggers time accumulation were mis-scheduled
too far in the future, which would cause time to stop.

However, since the mult overflow would result in a smaller time
value, we would effectively have the same problem there.

Signed-off-by: John Stultz john.stu...@linaro.org
Cc: Dave Jones da...@codemonkey.org.uk
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Stephen Boyd sb...@codeaurora.org
Cc: Thomas Gleixner t...@linutronix.de
Link: 
http://lkml.kernel.org/r/1426133800-29329-6-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/time/timekeeping.c | 49 +--
 1 file changed, 35 insertions(+), 14 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index acf0491..657414c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -126,9 +126,9 @@ static void timekeeping_check_update(struct timekeeper *tk, 
cycle_t offset)
	const char *name = tk->tkr.clock->name;
 
	if (offset > max_cycles) {
-		printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow\n",
+		printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow danger\n",
			offset, name, max_cycles);
-		printk_deferred("	 timekeeping: Your kernel is sick, but tries to cope\n");
+		printk_deferred("	 timekeeping: Your kernel is sick, but tries to cope by capping time updates\n");
	} else {
		if (offset > (max_cycles >> 1)) {
			printk_deferred("INFO: timekeeping: Cycle offset (%lld) is larger than the '%s' clock's 50%% safety margin (%lld)\n",
@@ -137,10 +137,39 @@ static void timekeeping_check_update(struct timekeeper 
*tk, cycle_t offset)
}
}
 }
+
+static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+{
+   cycle_t cycle_now, delta;
+
+   /* read clocksource */
+	cycle_now = tkr->read(tkr->clock);
+
+	/* calculate the delta since the last update_wall_time */
+	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+
+	/* Cap delta value to the max_cycles values to avoid mult overflows */
+	if (unlikely(delta > tkr->clock->max_cycles))
+		delta = tkr->clock->max_cycles;
+
+   return delta;
+}
 #else
 static inline void timekeeping_check_update(struct timekeeper *tk, cycle_t 
offset)
 {
 }
+static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+{
+   cycle_t cycle_now, delta;
+
+   /* read clocksource */
+	cycle_now = tkr->read(tkr->clock);
+
+	/* calculate the delta since the last update_wall_time */
+	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+
+   return delta;
+}
 #endif
 
 /**
@@ -218,14 +247,10 @@ static inline u32 arch_gettimeoffset(void) { return 0; }
 
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
-   cycle_t cycle_now, delta;
+   cycle_t delta;
s64 nsec;
 
-	/* read clocksource: */
-	cycle_now = tkr->read(tkr->clock);
-
-	/* calculate the delta since the last update_wall_time: */
-	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+	delta = timekeeping_get_delta(tkr);
 
	nsec = delta * tkr->mult + tkr->xtime_nsec;
	nsec >>= tkr->shift;
@@ -237,14 +262,10 @@ static inline s64 timekeeping_get_ns(struct tk_read_base 
*tkr)
 static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk)
 {
	struct clocksource *clock = tk->tkr.clock;
-	cycle_t cycle_now, delta;
+	cycle_t delta;
	s64 nsec;
 
-	/* read clocksource: */
-	cycle_now = tk->tkr.read(clock);
-
-	/* calculate the delta since the last update_wall_time: */

[tip:timers/core] clocksource: Simplify the logic around clocksource wrapping safety margins

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  362fde0410377e468ca00ad363fdf3e3ec42eb6a
Gitweb: http://git.kernel.org/tip/362fde0410377e468ca00ad363fdf3e3ec42eb6a
Author: John Stultz john.stu...@linaro.org
AuthorDate: Wed, 11 Mar 2015 21:16:30 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Thu, 12 Mar 2015 10:16:38 +0100

clocksource: Simplify the logic around clocksource wrapping safety margins

The clocksource logic has a number of places where we try to
include a safety margin. Most of these are 12% safety margins,
but they are inconsistently applied and sometimes are applied
on top of each other.

Additionally, in the previous patch, we corrected an issue
where we unintentionally created, in effect, a 50% safety
margin, to which these 12.5% margins were then added.

So to simplify the logic here, this patch removes the various
12.5% margins, and consolidates adding the margin in one place:
clocks_calc_max_nsecs().

Additionally, Linus prefers a 50% safety margin, as it allows
bad clock values to be caught more easily. This should really
have no net effect, due to the issue corrected earlier, which
caused greater-than-50% margins to be used without issue.

Signed-off-by: John Stultz john.stu...@linaro.org
Acked-by: Stephen Boyd sb...@codeaurora.org (for the sched_clock.c bit)
Cc: Dave Jones da...@codemonkey.org.uk
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Thomas Gleixner t...@linutronix.de
Link: 
http://lkml.kernel.org/r/1426133800-29329-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/time/clocksource.c | 26 --
 kernel/time/sched_clock.c |  4 ++--
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 2148f41..ace9576 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -469,6 +469,9 @@ static u32 clocksource_max_adjustment(struct clocksource 
*cs)
  * @shift: cycle to nanosecond divisor (power of two)
  * @maxadj:maximum adjustment value to mult (~11%)
  * @mask:  bitmask for two's complement subtraction of non 64 bit counters
+ *
+ * NOTE: This function includes a safety margin of 50%, so that bad clock 
values
+ * can be detected.
  */
 u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 {
@@ -490,11 +493,14 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 
maxadj, u64 mask)
max_cycles = min(max_cycles, mask);
max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);
 
+   /* Return 50% of the actual maximum, so we can detect bad values */
+	max_nsecs >>= 1;
+
return max_nsecs;
 }
 
 /**
- * clocksource_max_deferment - Returns max time the clocksource can be deferred
+ * clocksource_max_deferment - Returns max time the clocksource should be 
deferred
  * @cs: Pointer to clocksource
  *
  */
@@ -504,13 +510,7 @@ static u64 clocksource_max_deferment(struct clocksource 
*cs)
 
	max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj,
					  cs->mask);
-   /*
-* To ensure that the clocksource does not wrap whilst we are idle,
-* limit the time the clocksource can be deferred by 12.5%. Please
-* note a margin of 12.5% is used because this can be computed with
-* a shift, versus say 10% which would require division.
-*/
-	return max_nsecs - (max_nsecs >> 3);
+   return max_nsecs;
 }
 
 #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET
@@ -659,10 +659,9 @@ void __clocksource_updatefreq_scale(struct clocksource 
*cs, u32 scale, u32 freq)
 * conversion precision. 10 minutes is still a reasonable
 * amount. That results in a shift value of 24 for a
 * clocksource with mask = 40bit and f = 4GHz. That maps to
-* ~ 0.06ppm granularity for NTP. We apply the same 12.5%
-* margin as we do in clocksource_max_deferment()
+* ~ 0.06ppm granularity for NTP.
 */
-	sec = (cs->mask - (cs->mask >> 3));
+	sec = cs->mask;
do_div(sec, freq);
do_div(sec, scale);
if (!sec)
@@ -674,9 +673,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, 
u32 scale, u32 freq)
   NSEC_PER_SEC / scale, sec * scale);
 
/*
-* for clocksources that have large mults, to avoid overflow.
-* Since mult may be adjusted by ntp, add an safety extra margin
-*
+* Ensure clocksources that have large 'mult' values don't overflow
+* when adjusted.
 */
cs-maxadj = clocksource_max_adjustment(cs);
	while ((cs->mult + cs->maxadj < cs->mult)
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 01d2d15..3b8ae45 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -125,9 +125,9 @@ void __init 

[tip:timers/core] timekeeping: Add debugging checks to warn if we see delays

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  3c17ad19f0697ffe5ef7438cdafc2d2b7757d8a5
Gitweb: http://git.kernel.org/tip/3c17ad19f0697ffe5ef7438cdafc2d2b7757d8a5
Author: John Stultz john.stu...@linaro.org
AuthorDate: Wed, 11 Mar 2015 21:16:32 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 13 Mar 2015 08:06:58 +0100

timekeeping: Add debugging checks to warn if we see delays

Recently there's been requests for better sanity
checking in the time code, so that it's more clear
when something is going wrong, since timekeeping issues
could manifest in a large number of strange ways in
various subsystems.

Thus, this patch adds some extra infrastructure to
add a check to update_wall_time() to print two new
warnings:

 1) if we see the call delayed beyond the 'max_cycles'
overflow point,

 2) or if we see the call delayed beyond the clocksource's
'max_idle_ns' value, which is currently 50% of the
overflow point.

This extra infrastructure is conditional on
a new CONFIG_DEBUG_TIMEKEEPING option, also
added in this patch - default off.

Tested this a bit by halting qemu for specified
lengths of time to trigger the warnings.

Signed-off-by: John Stultz john.stu...@linaro.org
Cc: Dave Jones da...@codemonkey.org.uk
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Stephen Boyd sb...@codeaurora.org
Cc: Thomas Gleixner t...@linutronix.de
Link: 
http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stu...@linaro.org
[ Improved the changelog and the messages a bit. ]
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/time/jiffies.c |  1 +
 kernel/time/timekeeping.c | 28 
 lib/Kconfig.debug | 13 +
 3 files changed, 42 insertions(+)

diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index a6a5bf5..7e41390 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -71,6 +71,7 @@ static struct clocksource clocksource_jiffies = {
	.mask		= 0xffffffff, /*32bits*/
	.mult		= NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
.shift  = JIFFIES_SHIFT,
+   .max_cycles = 10,
 };
 
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 91db941..acf0491 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -118,6 +118,31 @@ static inline void tk_update_sleep_time(struct timekeeper 
*tk, ktime_t delta)
	tk->offs_boot = ktime_add(tk->offs_boot, delta);
 }
 
+#ifdef CONFIG_DEBUG_TIMEKEEPING
+static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
+{
+
+	cycle_t max_cycles = tk->tkr.clock->max_cycles;
+	const char *name = tk->tkr.clock->name;
+
+	if (offset > max_cycles) {
+		printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow\n",
+			offset, name, max_cycles);
+		printk_deferred("	 timekeeping: Your kernel is sick, but tries to cope\n");
+	} else {
+		if (offset > (max_cycles >> 1)) {
+			printk_deferred("INFO: timekeeping: Cycle offset (%lld) is larger than the '%s' clock's 50%% safety margin (%lld)\n",
+				offset, name, max_cycles >> 1);
+			printk_deferred("	  timekeeping: Your kernel is still fine, but is feeling a bit nervous\n");
+   }
+   }
+}
+#else
+static inline void timekeeping_check_update(struct timekeeper *tk, cycle_t 
offset)
+{
+}
+#endif
+
 /**
  * tk_setup_internals - Set up internals to use clocksource clock.
  *
@@ -1630,6 +1655,9 @@ void update_wall_time(void)
	if (offset < real_tk->cycle_interval)
goto out;
 
+   /* Do some additional sanity checking */
+   timekeeping_check_update(real_tk, offset);
+
/*
 * With NO_HZ we may have to accumulate many cycle_intervals
 * (think ticks) worth of time at once. To do this efficiently,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c5cefb3..36b6fa8 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -865,6 +865,19 @@ config SCHED_STACK_END_CHECK
  data corruption or a sporadic crash at a later stage once the region
  is examined. The runtime overhead introduced is minimal.
 
+config DEBUG_TIMEKEEPING
	bool "Enable extra timekeeping sanity checking"
+   help
+ This option will enable additional timekeeping sanity checks
+ which may be helpful when diagnosing issues where timekeeping
+ problems are suspected.
+
+ This may include checks in the timekeeping hotpaths, so this
+ option may have a (very small) performance impact to some
+ workloads.
+
+ If unsure, say N.
+
 config 

[tip:timers/core] clocksource: Add 'max_cycles' to ' struct clocksource'

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  fb82fe2fe8588745edd73aa3a6229facac5c1e15
Gitweb: http://git.kernel.org/tip/fb82fe2fe8588745edd73aa3a6229facac5c1e15
Author: John Stultz john.stu...@linaro.org
AuthorDate: Wed, 11 Mar 2015 21:16:31 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Thu, 12 Mar 2015 10:16:38 +0100

clocksource: Add 'max_cycles' to 'struct clocksource'

In order to facilitate clocksource validation, add a
'max_cycles' field to the clocksource structure which
will hold the maximum cycle value that can safely be
multiplied without potentially causing an overflow.

Signed-off-by: John Stultz john.stu...@linaro.org
Cc: Dave Jones da...@codemonkey.org.uk
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Stephen Boyd sb...@codeaurora.org
Cc: Thomas Gleixner t...@linutronix.de
Link: 
http://lkml.kernel.org/r/1426133800-29329-4-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 include/linux/clocksource.h |  5 +++--
 kernel/time/clocksource.c   | 28 
 kernel/time/sched_clock.c   |  2 +-
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 9c78d15..16d048c 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -56,6 +56,7 @@ struct module;
  * @shift: cycle to nanosecond divisor (power of two)
  * @max_idle_ns:   max idle time permitted by the clocksource (nsecs)
  * @maxadj:maximum adjustment value to mult (~11%)
+ * @max_cycles:maximum safe cycle value which won't overflow 
on multiplication
  * @flags: flags describing special properties
  * @archdata:  arch-specific data
  * @suspend:   suspend function for the clocksource, if necessary
@@ -76,7 +77,7 @@ struct clocksource {
 #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
struct arch_clocksource_data archdata;
 #endif
-
+   u64 max_cycles;
const char *name;
struct list_head list;
int rating;
@@ -189,7 +190,7 @@ extern struct clocksource * __init 
clocksource_default_clock(void);
 extern void clocksource_mark_unstable(struct clocksource *cs);
 
 extern u64
-clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask);
+clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 
*max_cycles);
 extern void
 clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec);
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index ace9576..fc2a9de 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -469,11 +469,13 @@ static u32 clocksource_max_adjustment(struct clocksource 
*cs)
  * @shift: cycle to nanosecond divisor (power of two)
  * @maxadj:maximum adjustment value to mult (~11%)
  * @mask:  bitmask for two's complement subtraction of non 64 bit counters
+ * @max_cyc:   maximum cycle value before potential overflow (does not include
+ * any safety margin)
  *
  * NOTE: This function includes a safety margin of 50%, so that bad clock 
values
  * can be detected.
  */
-u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
+u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 
*max_cyc)
 {
u64 max_nsecs, max_cycles;
 
@@ -493,6 +495,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, 
u64 mask)
max_cycles = min(max_cycles, mask);
max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);
 
+   /* return the max_cycles value as well if requested */
+   if (max_cyc)
+   *max_cyc = max_cycles;
+
/* Return 50% of the actual maximum, so we can detect bad values */
	max_nsecs >>= 1;
 
@@ -500,17 +506,15 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 
maxadj, u64 mask)
 }
 
 /**
- * clocksource_max_deferment - Returns max time the clocksource should be 
deferred
- * @cs: Pointer to clocksource
+ * clocksource_update_max_deferment - Updates the clocksource max_idle_ns  
max_cycles
+ * @cs: Pointer to clocksource to be updated
  *
  */
-static u64 clocksource_max_deferment(struct clocksource *cs)
+static inline void clocksource_update_max_deferment(struct clocksource *cs)
 {
-   u64 max_nsecs;
-
-	max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj,
-					  cs->mask);
-	return max_nsecs;
+	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
+						cs->maxadj, cs->mask,
+						&cs->max_cycles);
 }
 
 #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET
@@ -684,7 +688,7 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, 
u32 scale, u32 freq)
	cs->maxadj = clocksource_max_adjustment(cs);
}
 
-   

[tip:timers/core] clocksource: Add some debug info about clocksources being registered

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  8cc8c525ad4e7b581cacf84119e1a28dcb4044db
Gitweb: http://git.kernel.org/tip/8cc8c525ad4e7b581cacf84119e1a28dcb4044db
Author: John Stultz john.stu...@linaro.org
AuthorDate: Wed, 11 Mar 2015 21:16:39 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 13 Mar 2015 08:07:07 +0100

clocksource: Add some debug info about clocksources being registered

Print the mask, max_cycles, and max_idle_ns values for
clocksources being registered.

Signed-off-by: John Stultz john.stu...@linaro.org
Cc: Dave Jones da...@codemonkey.org.uk
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Stephen Boyd sb...@codeaurora.org
Cc: Thomas Gleixner t...@linutronix.de
Link: 
http://lkml.kernel.org/r/1426133800-29329-12-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/time/clocksource.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 5cdf17e..1977eba 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -703,6 +703,9 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, 
u32 scale, u32 freq)
			cs->name);
 
clocksource_update_max_deferment(cs);
+
+	pr_info("clocksource %s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n",
+		cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns);
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
 
--


[tip:timers/core] clocksource: Mostly kill clocksource_register()

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  f8935983f110505daa38e8d36ee406807f83a069
Gitweb: http://git.kernel.org/tip/f8935983f110505daa38e8d36ee406807f83a069
Author: John Stultz john.stu...@linaro.org
AuthorDate: Wed, 11 Mar 2015 21:16:37 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 13 Mar 2015 08:07:06 +0100

clocksource: Mostly kill clocksource_register()

A long running project has been to clean up remaining uses
of clocksource_register(), replacing it with the simpler
clocksource_register_khz/hz() functions.

However, there are a few cases where we need to self-define
our mult/shift values, so switch the function to a more
obviously internal __clocksource_register() name, and
consolidate much of the internal logic so we don't have
duplication.

Signed-off-by: John Stultz john.stu...@linaro.org
Cc: Dave Jones da...@codemonkey.org.uk
Cc: David S. Miller da...@davemloft.net
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Martin Schwidefsky schwidef...@de.ibm.com
Cc: Peter Zijlstra pet...@infradead.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Stephen Boyd sb...@codeaurora.org
Cc: Thomas Gleixner t...@linutronix.de
Link: 
http://lkml.kernel.org/r/1426133800-29329-10-git-send-email-john.stu...@linaro.org
[ Minor cleanups. ]
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 arch/s390/kernel/time.c |  2 +-
 arch/sparc/kernel/time_32.c |  2 +-
 include/linux/clocksource.h | 10 +-
 kernel/time/clocksource.c   | 81 +++--
 kernel/time/jiffies.c   |  4 +--
 5 files changed, 47 insertions(+), 52 deletions(-)

diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 20660dd..6c273cd 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -283,7 +283,7 @@ void __init time_init(void)
if (register_external_irq(EXT_IRQ_TIMING_ALERT, timing_alert_interrupt))
		panic("Couldn't request external interrupt 0x1406");
 
-	if (clocksource_register(&clocksource_tod) != 0)
+	if (__clocksource_register(&clocksource_tod) != 0)
		panic("Could not register TOD clock source");
 
/* Enable TOD clock interrupts on the boot cpu. */
diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c
index 2f80d23..a31c0c8 100644
--- a/arch/sparc/kernel/time_32.c
+++ b/arch/sparc/kernel/time_32.c
@@ -191,7 +191,7 @@ static __init int setup_timer_cs(void)
timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate,
timer_cs.shift);
 
-	return clocksource_register(&timer_cs);
+	return __clocksource_register(&timer_cs);
 }
 
 #ifdef CONFIG_SMP
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 16d048c..bd98eaa 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -179,7 +179,6 @@ static inline s64 clocksource_cyc2ns(cycle_t cycles, u32 mult, u32 shift)
 }
 
 
-extern int clocksource_register(struct clocksource*);
 extern int clocksource_unregister(struct clocksource*);
 extern void clocksource_touch_watchdog(void);
 extern struct clocksource* clocksource_get_next(void);
@@ -203,6 +202,15 @@ __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq);
 extern void
 __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq);
 
+/*
+ * Don't call this unless you are a default clocksource
+ * (AKA: jiffies) and absolutely have to.
+ */
+static inline int __clocksource_register(struct clocksource *cs)
+{
+   return __clocksource_register_scale(cs, 1, 0);
+}
+
 static inline int clocksource_register_hz(struct clocksource *cs, u32 hz)
 {
return __clocksource_register_scale(cs, 1, hz);
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c4cc04b..5cdf17e 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -656,38 +656,52 @@ static void clocksource_enqueue(struct clocksource *cs)
 void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 {
 	u64 sec;
+
 	/*
-	 * Calc the maximum number of seconds which we can run before
-	 * wrapping around. For clocksources which have a mask > 32bit
-	 * we need to limit the max sleep time to have a good
-	 * conversion precision. 10 minutes is still a reasonable
-	 * amount. That results in a shift value of 24 for a
-	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
-	 * ~ 0.06ppm granularity for NTP.
+	 * Default clocksources are *special* and self-define their mult/shift.
+	 * But, you're not special, so you should specify a freq value.
 	 */
-	sec = cs->mask;
-	do_div(sec, freq);
-	do_div(sec, scale);
-	if (!sec)
-		sec = 1;
-	else if (sec > 600 && cs->mask > UINT_MAX)
-		sec = 600;
-
-	clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
-			       NSEC_PER_SEC / 

[tip:timers/core] clocksource: Rename __clocksource_updatefreq_*() to __clocksource_update_freq_*()

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  fba9e07208c0f9d92d9f73761c99c8612039da44
Gitweb: http://git.kernel.org/tip/fba9e07208c0f9d92d9f73761c99c8612039da44
Author: John Stultz john.stu...@linaro.org
AuthorDate: Wed, 11 Mar 2015 21:16:40 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 13 Mar 2015 08:07:08 +0100

clocksource: Rename __clocksource_updatefreq_*() to 
__clocksource_update_freq_*()

Ingo requested this function be renamed to improve readability,
so I've renamed __clocksource_updatefreq_scale() as well as the
__clocksource_updatefreq_hz/khz() functions to avoid
squishedtogethernames.

This touches some of the sh clocksources, which I've not tested.

The arch/arm/plat-omap change is just a comment change for
consistency.

Signed-off-by: John Stultz john.stu...@linaro.org
Cc: Daniel Lezcano daniel.lezc...@linaro.org
Cc: Dave Jones da...@codemonkey.org.uk
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Stephen Boyd sb...@codeaurora.org
Cc: Thomas Gleixner t...@linutronix.de
Link: 
http://lkml.kernel.org/r/1426133800-29329-13-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 arch/arm/plat-omap/counter_32k.c |  2 +-
 drivers/clocksource/em_sti.c |  2 +-
 drivers/clocksource/sh_cmt.c |  2 +-
 drivers/clocksource/sh_tmu.c |  2 +-
 include/linux/clocksource.h  | 10 +-
 kernel/time/clocksource.c| 11 ++-
 6 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/arch/arm/plat-omap/counter_32k.c b/arch/arm/plat-omap/counter_32k.c
index 61b4d70..43cf745 100644
--- a/arch/arm/plat-omap/counter_32k.c
+++ b/arch/arm/plat-omap/counter_32k.c
@@ -103,7 +103,7 @@ int __init omap_init_clocksource_32k(void __iomem *vbase)
 
/*
 * 12 rough estimate from the calculations in
-* __clocksource_updatefreq_scale.
+* __clocksource_update_freq_scale.
 */
clocks_calc_mult_shift(persistent_mult, persistent_shift,
32768, NSEC_PER_SEC, 12);
diff --git a/drivers/clocksource/em_sti.c b/drivers/clocksource/em_sti.c
index d0a7bd6..dc3c6ee 100644
--- a/drivers/clocksource/em_sti.c
+++ b/drivers/clocksource/em_sti.c
@@ -210,7 +210,7 @@ static int em_sti_clocksource_enable(struct clocksource *cs)
 
ret = em_sti_start(p, USER_CLOCKSOURCE);
if (!ret)
-		__clocksource_updatefreq_hz(cs, p->rate);
+		__clocksource_update_freq_hz(cs, p->rate);
return ret;
 }
 
diff --git a/drivers/clocksource/sh_cmt.c b/drivers/clocksource/sh_cmt.c
index 2bd13b5..b8ff3c6 100644
--- a/drivers/clocksource/sh_cmt.c
+++ b/drivers/clocksource/sh_cmt.c
@@ -641,7 +641,7 @@ static int sh_cmt_clocksource_enable(struct clocksource *cs)
 
ret = sh_cmt_start(ch, FLAG_CLOCKSOURCE);
if (!ret) {
-		__clocksource_updatefreq_hz(cs, ch->rate);
+		__clocksource_update_freq_hz(cs, ch->rate);
 		ch->cs_enabled = true;
}
return ret;
diff --git a/drivers/clocksource/sh_tmu.c b/drivers/clocksource/sh_tmu.c
index f150ca82..b6b8fa3 100644
--- a/drivers/clocksource/sh_tmu.c
+++ b/drivers/clocksource/sh_tmu.c
@@ -272,7 +272,7 @@ static int sh_tmu_clocksource_enable(struct clocksource *cs)
 
ret = sh_tmu_enable(ch);
if (!ret) {
-		__clocksource_updatefreq_hz(cs, ch->rate);
+		__clocksource_update_freq_hz(cs, ch->rate);
 		ch->cs_enabled = true;
}
 
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index bd98eaa..1355098 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -200,7 +200,7 @@ clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec);
 extern int
 __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq);
 extern void
-__clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq);
+__clocksource_update_freq_scale(struct clocksource *cs, u32 scale, u32 freq);
 
 /*
  * Don't call this unless you are a default clocksource
@@ -221,14 +221,14 @@ static inline int clocksource_register_khz(struct clocksource *cs, u32 khz)
return __clocksource_register_scale(cs, 1000, khz);
 }
 
-static inline void __clocksource_updatefreq_hz(struct clocksource *cs, u32 hz)
+static inline void __clocksource_update_freq_hz(struct clocksource *cs, u32 hz)
 {
-   __clocksource_updatefreq_scale(cs, 1, hz);
+   __clocksource_update_freq_scale(cs, 1, hz);
 }
 
-static inline void __clocksource_updatefreq_khz(struct clocksource *cs, u32 khz)
+static inline void __clocksource_update_freq_khz(struct clocksource *cs, u32 khz)
 {
-   __clocksource_updatefreq_scale(cs, 1000, khz);
+   __clocksource_update_freq_scale(cs, 1000, khz);
 }
 
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 1977eba..c3be3c7 100644

[tip:timers/core] clocksource: Simplify the clocks_calc_max_nsecs() logic

2015-03-13 Thread tip-bot for John Stultz
Commit-ID:  6086e346fdea1ae64d974c94c1acacc2605567ae
Gitweb: http://git.kernel.org/tip/6086e346fdea1ae64d974c94c1acacc2605567ae
Author: John Stultz john.stu...@linaro.org
AuthorDate: Wed, 11 Mar 2015 21:16:29 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Thu, 12 Mar 2015 10:16:38 +0100

clocksource: Simplify the clocks_calc_max_nsecs() logic

The previous clocks_calc_max_nsecs() code had some unnecessarily
complex bit logic to find the max interval that could cause
multiplication overflows. Since this is not in the hot
path, just do the divide to make it easier to read.

The previous implementation also had a subtle issue:
it avoided overflows with signed 64-bit values, whereas
the intervals are always unsigned. This resulted in
overly conservative intervals, which other safety margins
were then added to, reducing the intended interval length.

Signed-off-by: John Stultz john.stu...@linaro.org
Cc: Dave Jones da...@codemonkey.org.uk
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Prarit Bhargava pra...@redhat.com
Cc: Richard Cochran richardcoch...@gmail.com
Cc: Stephen Boyd sb...@codeaurora.org
Cc: Thomas Gleixner t...@linutronix.de
Link: 
http://lkml.kernel.org/r/1426133800-29329-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/time/clocksource.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4892352..2148f41 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -476,19 +476,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 
 	/*
 	 * Calculate the maximum number of cycles that we can pass to the
-	 * cyc2ns function without overflowing a 64-bit signed result. The
-	 * maximum number of cycles is equal to ULLONG_MAX/(mult+maxadj)
-	 * which is equivalent to the below.
-	 * max_cycles < (2^63)/(mult + maxadj)
-	 * max_cycles < 2^(log2((2^63)/(mult + maxadj)))
-	 * max_cycles < 2^(log2(2^63) - log2(mult + maxadj))
-	 * max_cycles < 2^(63 - log2(mult + maxadj))
-	 * max_cycles < 1 << (63 - log2(mult + maxadj))
-	 * Please note that we add 1 to the result of the log2 to account for
-	 * any rounding errors, ensure the above inequality is satisfied and
-	 * no overflow will occur.
+	 * cyc2ns() function without overflowing a 64-bit result.
 	 */
-	max_cycles = 1ULL << (63 - (ilog2(mult + maxadj) + 1));
+	max_cycles = ULLONG_MAX;
+	do_div(max_cycles, mult+maxadj);
 
/*
 * The actual maximum number of cycles we can defer the clocksource is


[tip:timers/urgent] ntp: Fixup adjtimex freq validation on 32-bit systems

2015-02-18 Thread tip-bot for John Stultz
Commit-ID:  29183a70b0b828500816bd794b3fe192fce89f73
Gitweb: http://git.kernel.org/tip/29183a70b0b828500816bd794b3fe192fce89f73
Author: John Stultz 
AuthorDate: Mon, 9 Feb 2015 23:30:36 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 18 Feb 2015 14:50:10 +0100

ntp: Fixup adjtimex freq validation on 32-bit systems

Additional validation of adjtimex freq values to avoid
potential multiplication overflows was added in commit
5e5aeb4367b ("time: adjtimex: Validate the ADJ_FREQUENCY values").

Unfortunately the patch used LONG_MAX/MIN instead of
LLONG_MAX/MIN, which was fine on 64-bit systems, but being
much smaller on 32-bit systems caused false positives
resulting in most direct frequency adjustments to fail w/
EINVAL.

ntpd only does direct frequency adjustments at startup, so
the issue was not as easily observed there, but other time
sync applications like ptpd and chrony were more affected by
the bug.

See bugs:

  https://bugzilla.kernel.org/show_bug.cgi?id=92481
  https://bugzilla.redhat.com/show_bug.cgi?id=1188074

This patch changes the checks to use LLONG_MAX for
clarity, and additionally the checks are disabled
on 32-bit systems since LLONG_MAX/PPM_SCALE is always
larger than the 32-bit long freq value, so multiplication
overflows aren't possible there.

Reported-by: Josh Boyer 
Reported-by: George Joseph 
Tested-by: George Joseph 
Signed-off-by: John Stultz 
Signed-off-by: Peter Zijlstra (Intel) 
Cc:  # v3.19+
Cc: Linus Torvalds 
Cc: Sasha Levin 
Link: 
http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stu...@linaro.org
[ Prettified the changelog and the comments a bit. ]
Signed-off-by: Ingo Molnar 
---
 kernel/time/ntp.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 4b585e0..0f60b08 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -633,10 +633,14 @@ int ntp_validate_timex(struct timex *txc)
if ((txc->modes & ADJ_SETOFFSET) && (!capable(CAP_SYS_TIME)))
return -EPERM;
 
-   if (txc->modes & ADJ_FREQUENCY) {
-   if (LONG_MIN / PPM_SCALE > txc->freq)
+   /*
+* Check for potential multiplication overflows that can
+* only happen on 64-bit systems:
+*/
+   if ((txc->modes & ADJ_FREQUENCY) && (BITS_PER_LONG == 64)) {
+   if (LLONG_MIN / PPM_SCALE > txc->freq)
return -EINVAL;
-   if (LONG_MAX / PPM_SCALE < txc->freq)
+   if (LLONG_MAX / PPM_SCALE < txc->freq)
return -EINVAL;
}
 


[tip:timers/urgent] hrtimer: Fix incorrect tai offset calculation for non high-res timer systems

2015-02-04 Thread tip-bot for John Stultz
Commit-ID:  2d926c15d629a13914ce3e5f26354f6a0ac99e70
Gitweb: http://git.kernel.org/tip/2d926c15d629a13914ce3e5f26354f6a0ac99e70
Author: John Stultz 
AuthorDate: Wed, 4 Feb 2015 16:45:26 -0800
Committer:  Ingo Molnar 
CommitDate: Thu, 5 Feb 2015 08:39:37 +0100

hrtimer: Fix incorrect tai offset calculation for non high-res timer systems

I noticed some CLOCK_TAI timer test failures on one of my
less-frequently used configurations. And after digging in I
found in 76f4108892d9 (Cleanup hrtimer accessors to the
timekeeping state), the hrtimer_get_softirq_time() tai offset
calculation was incorrectly rewritten, as the tai offset we
return should be from CLOCK_MONOTONIC, and not CLOCK_REALTIME.

This results in CLOCK_TAI timers expiring early on non-highres
capable machines.

This patch fixes the issue, calculating the tai time properly
from the monotonic base.

Signed-off-by: John Stultz 
Cc: Thomas Gleixner 
Cc: stable  # 3.17+
Link: 
http://lkml.kernel.org/r/1423097126-10236-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/hrtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 37e50aa..d8c724c 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -122,7 +122,7 @@ static void hrtimer_get_softirq_time(struct hrtimer_cpu_base *base)
 	mono = ktime_get_update_offsets_tick(&off_real, &off_boot, &off_tai);
 	boot = ktime_add(mono, off_boot);
 	xtim = ktime_add(mono, off_real);
-	tai = ktime_add(xtim, off_tai);
+	tai = ktime_add(mono, off_tai);
 
base->clock_base[HRTIMER_BASE_REALTIME].softirq_time = xtim;
base->clock_base[HRTIMER_BASE_MONOTONIC].softirq_time = mono;


[tip:timers/core] time: Fix sign bug in NTP mult overflow warning

2014-11-24 Thread tip-bot for John Stultz
Commit-ID:  cb2aa63469f81426c7406227be70b628b42f7a05
Gitweb: http://git.kernel.org/tip/cb2aa63469f81426c7406227be70b628b42f7a05
Author: John Stultz 
AuthorDate: Mon, 24 Nov 2014 20:35:45 -0800
Committer:  Ingo Molnar 
CommitDate: Tue, 25 Nov 2014 07:18:34 +0100

time: Fix sign bug in NTP mult overflow warning

In commit 6067dc5a8c2b ("time: Avoid possible NTP adjustment
mult overflow") a new check was added to watch for adjustments
that could cause a mult overflow.

Unfortunately the check compares a signed with an unsigned value
and ignores the case where the adjustment is negative, which
causes spurious warn-ons on some systems (and seems like it
would result in problematic time adjustments there as well, due
to the early return).

Thus this patch adds a check to make sure the adjustment is
positive before we check for an overflow, and resolves the issue
in my testing.

Reported-by: Fengguang Wu 
Debugged-by: pang.xunlei 
Signed-off-by: John Stultz 
Link: 
http://lkml.kernel.org/r/1416890145-30048-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/timekeeping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 29a7d67..2dc0646 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1330,7 +1330,7 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 *
 * XXX - TODO: Doc ntp_error calculation.
 */
-   if (tk->tkr.mult + mult_adj < mult_adj) {
+   if ((mult_adj > 0) && (tk->tkr.mult + mult_adj < mult_adj)) {
/* NTP adjustment caused clocksource mult overflow */
WARN_ON_ONCE(1);
return;


[tip:timers/core] timekeeping: Fixup typo in update_vsyscall_old definition

2014-07-30 Thread tip-bot for John Stultz
Commit-ID:  953dec21aed4038464fec02f96a2f1b8701a5bce
Gitweb: http://git.kernel.org/tip/953dec21aed4038464fec02f96a2f1b8701a5bce
Author: John Stultz 
AuthorDate: Fri, 25 Jul 2014 21:37:19 -0700
Committer:  Thomas Gleixner 
CommitDate: Wed, 30 Jul 2014 09:26:25 +0200

timekeeping: Fixup typo in update_vsyscall_old definition

In commit 4a0e637738f0 ("clocksource: Get rid of cycle_last"),
currently in the -tip tree, there was a small typo where cycles_t
was used instead of cycle_t. This broke ppc64 builds.

Fix this by using the proper cycle_t type for this usage, in
both the definition and the ia64 implementation.

Now, having both cycle_t and cycles_t types seems like a very
bad idea just asking for these sorts of issues. But that
will be a cleanup for another day.

Reported-by: Stephen Rothwell 
Signed-off-by: John Stultz 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1406349439-11785-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 arch/ia64/kernel/time.c | 2 +-
 include/linux/timekeeper_internal.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 11dc42d..3e71ef8 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -441,7 +441,7 @@ void update_vsyscall_tz(void)
 }
 
 void update_vsyscall_old(struct timespec *wall, struct timespec *wtm,
-struct clocksource *c, u32 mult, cycles_t cycle_last)
+struct clocksource *c, u32 mult, cycle_t cycle_last)
 {
write_seqcount_begin(_gtod_data.seq);
 
diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index e9660e5..95640dc 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -113,7 +113,7 @@ extern void update_vsyscall_tz(void);
 
 extern void update_vsyscall_old(struct timespec *ts, struct timespec *wtm,
struct clocksource *c, u32 mult,
-   cycles_t cycle_last);
+   cycle_t cycle_last);
 extern void update_vsyscall_tz(void);
 
 #else


[tip:timers/urgent] alarmtimer: Fix bug where relative alarm timers were treated as absolute

2014-07-08 Thread tip-bot for John Stultz
Commit-ID:  16927776ae757d0d132bdbfabbfe2c498342bd59
Gitweb: http://git.kernel.org/tip/16927776ae757d0d132bdbfabbfe2c498342bd59
Author: John Stultz 
AuthorDate: Mon, 7 Jul 2014 14:06:11 -0700
Committer:  Thomas Gleixner 
CommitDate: Tue, 8 Jul 2014 10:49:36 +0200

alarmtimer: Fix bug where relative alarm timers were treated as absolute

Sharvil noticed with the posix timer_settime interface, using the
CLOCK_REALTIME_ALARM or CLOCK_BOOTTIME_ALARM clockid, if the users
tried to specify a relative time timer, it would incorrectly be
treated as absolute regardless of the state of the flags argument.

This patch corrects this, properly checking the absolute/relative flag,
as well as adds further error checking that no invalid flag bits are set.

Reported-by: Sharvil Nanavati 
Signed-off-by: John Stultz 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Prarit Bhargava 
Cc: Sharvil Nanavati 
Cc: stable  #3.0+
Link: 
http://lkml.kernel.org/r/1404767171-6902-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 kernel/time/alarmtimer.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 88c9c65..fe75444 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -585,9 +585,14 @@ static int alarm_timer_set(struct k_itimer *timr, int flags,
struct itimerspec *new_setting,
struct itimerspec *old_setting)
 {
+   ktime_t exp;
+
if (!rtcdev)
return -ENOTSUPP;
 
+   if (flags & ~TIMER_ABSTIME)
+   return -EINVAL;
+
if (old_setting)
alarm_timer_get(timr, old_setting);
 
@@ -597,8 +602,16 @@ static int alarm_timer_set(struct k_itimer *timr, int flags,
 
/* start the timer */
timr->it.alarm.interval = timespec_to_ktime(new_setting->it_interval);
-   alarm_start(&timr->it.alarm.alarmtimer,
-   timespec_to_ktime(new_setting->it_value));
+   exp = timespec_to_ktime(new_setting->it_value);
+   /* Convert (if necessary) to absolute time */
+   if (flags != TIMER_ABSTIME) {
+   ktime_t now;
+
+   now = alarm_bases[timr->it.alarm.alarmtimer.type].gettime();
+   exp = ktime_add(now, exp);
+   }
+
+   alarm_start(&timr->it.alarm.alarmtimer, exp);
return 0;
 }
 
@@ -730,6 +743,9 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,
if (!alarmtimer_get_rtcdev())
return -ENOTSUPP;
 
+   if (flags & ~TIMER_ABSTIME)
+   return -EINVAL;
+
if (!capable(CAP_WAKE_ALARM))
return -EPERM;
 
--


[tip:timers/urgent] time: Revert to calling clock_was_set_delayed () while in irq context

2014-03-28 Thread tip-bot for John Stultz
Commit-ID:  cab5e127eef040399902caa8e1510795583fa03a
Gitweb: http://git.kernel.org/tip/cab5e127eef040399902caa8e1510795583fa03a
Author: John Stultz 
AuthorDate: Thu, 27 Mar 2014 16:30:49 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 28 Mar 2014 08:07:07 +0100

time: Revert to calling clock_was_set_delayed() while in irq context

In commit 47a1b796306356f35 ("tick/timekeeping: Call
update_wall_time outside the jiffies lock"), we moved to calling
clock_was_set() due to the fact that we were no longer holding
the timekeeping or jiffies lock.

However, there is still the problem that clock_was_set()
triggers an IPI, which cannot be done from the timer's hard irq
context, and will generate WARN_ON warnings.

Apparently in my earlier testing, I'm guessing I didn't bump the
dmesg log level, so I somehow missed the WARN_ONs.

Thus we need to revert back to calling clock_was_set_delayed().

Signed-off-by: John Stultz 
Cc: Linus Torvalds 
Link: 
http://lkml.kernel.org/r/1395963049-11923-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/timekeeping.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 0aa4ce8..5b40279 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1435,7 +1435,8 @@ void update_wall_time(void)
 out:
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
if (clock_set)
-   clock_was_set();
+   /* Have to call _delayed version, since in irq context*/
+   clock_was_set_delayed();
 }
 
 /**
--


[tip:core/urgent] seqlock: Use raw_ prefix instead of _no_lockdep

2014-01-12 Thread tip-bot for John Stultz
Commit-ID:  0c3351d451ae2fa438d5d1ed719fc43354fbffbb
Gitweb: http://git.kernel.org/tip/0c3351d451ae2fa438d5d1ed719fc43354fbffbb
Author: John Stultz 
AuthorDate: Thu, 2 Jan 2014 15:11:13 -0800
Committer:  Ingo Molnar 
CommitDate: Sun, 12 Jan 2014 10:13:59 +0100

seqlock: Use raw_ prefix instead of _no_lockdep

Linus disliked the _no_lockdep() naming, so instead
use the more-consistent raw_* prefix to the non-lockdep
enabled seqcount methods.

This also adds raw_ methods for the write operations
as well, which will be utilized in a following patch.

Acked-by: Linus Torvalds 
Reviewed-by: Stephen Boyd 
Signed-off-by: John Stultz 
Signed-off-by: Peter Zijlstra 
Cc: Krzysztof Hałasa 
Cc: Uwe Kleine-König 
Cc: Willy Tarreau 
Link: 
http://lkml.kernel.org/r/1388704274-5278-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/vdso/vclock_gettime.c |  8 
 include/linux/seqlock.h| 27 +++
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 2ada505..eb5d7a5 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -178,7 +178,7 @@ notrace static int __always_inline do_realtime(struct timespec *ts)
 
ts->tv_nsec = 0;
do {
-   seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+   seq = raw_read_seqcount_begin(&gtod->seq);
mode = gtod->clock.vclock_mode;
ts->tv_sec = gtod->wall_time_sec;
ns = gtod->wall_time_snsec;
@@ -198,7 +198,7 @@ notrace static int do_monotonic(struct timespec *ts)
 
ts->tv_nsec = 0;
do {
-   seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+   seq = raw_read_seqcount_begin(&gtod->seq);
mode = gtod->clock.vclock_mode;
ts->tv_sec = gtod->monotonic_time_sec;
ns = gtod->monotonic_time_snsec;
@@ -214,7 +214,7 @@ notrace static int do_realtime_coarse(struct timespec *ts)
 {
unsigned long seq;
do {
-   seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+   seq = raw_read_seqcount_begin(&gtod->seq);
ts->tv_sec = gtod->wall_time_coarse.tv_sec;
ts->tv_nsec = gtod->wall_time_coarse.tv_nsec;
} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
@@ -225,7 +225,7 @@ notrace static int do_monotonic_coarse(struct timespec *ts)
 {
unsigned long seq;
do {
-   seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+   seq = raw_read_seqcount_begin(&gtod->seq);
ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index cf87a24..535f158 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -117,15 +117,15 @@ repeat:
 }
 
 /**
- * read_seqcount_begin_no_lockdep - start seq-read critical section w/o lockdep
+ * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
  * @s: pointer to seqcount_t
  * Returns: count to be passed to read_seqcount_retry
  *
- * read_seqcount_begin_no_lockdep opens a read critical section of the given
+ * raw_read_seqcount_begin opens a read critical section of the given
  * seqcount, but without any lockdep checking. Validity of the critical
  * section is tested by checking read_seqcount_retry function.
  */
-static inline unsigned read_seqcount_begin_no_lockdep(const seqcount_t *s)
+static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 {
unsigned ret = __read_seqcount_begin(s);
smp_rmb();
@@ -144,7 +144,7 @@ static inline unsigned read_seqcount_begin_no_lockdep(const seqcount_t *s)
 static inline unsigned read_seqcount_begin(const seqcount_t *s)
 {
seqcount_lockdep_reader_access(s);
-   return read_seqcount_begin_no_lockdep(s);
+   return raw_read_seqcount_begin(s);
 }
 
 /**
@@ -206,14 +206,26 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 }
 
 
+
+static inline void raw_write_seqcount_begin(seqcount_t *s)
+{
+   s->sequence++;
+   smp_wmb();
+}
+
+static inline void raw_write_seqcount_end(seqcount_t *s)
+{
+   smp_wmb();
+   s->sequence++;
+}
+
 /*
  * Sequence counter only version assumes that callers are using their
  * own mutexing.
  */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
-   s->sequence++;
-   smp_wmb();
+   raw_write_seqcount_begin(s);
seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
@@ -225,8 +237,7 @@ static inline void write_seqcount_begin(seqcount_t *s)
 static inline void write_seqcount_end(seqcount_t *s)
 {
seqcount_release(&s->dep_map, 1, _RET_IP_);
-   smp_wmb();
-   s->sequence++;
+   raw_write_seqcount_end(s);
 }
 
 /**
--

[tip:core/urgent] sched_clock: Disable seqlock lockdep usage in sched_clock()

2014-01-12 Thread tip-bot for John Stultz
Commit-ID:  7a06c41cbec33c6dbe7eec575c61986122617408
Gitweb: http://git.kernel.org/tip/7a06c41cbec33c6dbe7eec575c61986122617408
Author: John Stultz 
AuthorDate: Thu, 2 Jan 2014 15:11:14 -0800
Committer:  Ingo Molnar 
CommitDate: Sun, 12 Jan 2014 10:14:00 +0100

sched_clock: Disable seqlock lockdep usage in sched_clock()

Unfortunately the seqlock lockdep enablement can't be used
in sched_clock(), since the lockdep infrastructure eventually
calls into sched_clock(), which causes a deadlock.

Thus, this patch changes all generic sched_clock() usage
to use the raw_* methods.

Acked-by: Linus Torvalds 
Reviewed-by: Stephen Boyd 
Reported-by: Krzysztof Hałasa 
Signed-off-by: John Stultz 
Cc: Uwe Kleine-König 
Cc: Willy Tarreau 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1388704274-5278-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/sched_clock.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 68b7993..0abb364 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -74,7 +74,7 @@ unsigned long long notrace sched_clock(void)
return cd.epoch_ns;
 
do {
-   seq = read_seqcount_begin(&cd.seq);
+   seq = raw_read_seqcount_begin(&cd.seq);
epoch_cyc = cd.epoch_cyc;
epoch_ns = cd.epoch_ns;
} while (read_seqcount_retry(&cd.seq, seq));
@@ -99,10 +99,10 @@ static void notrace update_sched_clock(void)
  cd.mult, cd.shift);
 
raw_local_irq_save(flags);
-   write_seqcount_begin(&cd.seq);
+   raw_write_seqcount_begin(&cd.seq);
cd.epoch_ns = ns;
cd.epoch_cyc = cyc;
-   write_seqcount_end(&cd.seq);
+   raw_write_seqcount_end(&cd.seq);
raw_local_irq_restore(flags);
 }
 
--


[tip:core/locking] ipv6: Fix possible ipv6 seqlock deadlock

2013-11-06 Thread tip-bot for John Stultz
Commit-ID:  5ac68e7c34a4797aa4ca9615e5a6603bda1abe9b
Gitweb: http://git.kernel.org/tip/5ac68e7c34a4797aa4ca9615e5a6603bda1abe9b
Author: John Stultz 
AuthorDate: Mon, 7 Oct 2013 15:52:01 -0700
Committer:  Ingo Molnar 
CommitDate: Wed, 6 Nov 2013 12:40:28 +0100

ipv6: Fix possible ipv6 seqlock deadlock

While enabling lockdep on seqlocks, I ran across the warning below
caused by the ipv6 stats being updated in both irq and non-irq context.

This patch changes from IP6_INC_STATS_BH to IP6_INC_STATS (suggested
by Eric Dumazet) to resolve this problem.

[   11.120383] =
[   11.121024] [ INFO: inconsistent lock state ]
[   11.121663] 3.12.0-rc1+ #68 Not tainted
[   11.19] -
[   11.122867] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   11.123741] init/4483 [HC0[0]:SC1[3]:HE1:SE0] takes:
[   11.124505]  (>syncp.seq#6){+.?...}, at: [] 
ndisc_send_ns+0xe2/0x130
[   11.125736] {SOFTIRQ-ON-W} state was registered at:
[   11.126447]   [] __lock_acquire+0x5c7/0x1af0
[   11.127222]   [] lock_acquire+0x96/0xd0
[   11.127925]   [] write_seqcount_begin+0x33/0x40
[   11.128766]   [] ip6_dst_lookup_tail+0x3a3/0x460
[   11.129582]   [] ip6_dst_lookup_flow+0x2e/0x80
[   11.130014]   [] ip6_datagram_connect+0x150/0x4e0
[   11.130014]   [] inet_dgram_connect+0x25/0x70
[   11.130014]   [] SYSC_connect+0xa1/0xc0
[   11.130014]   [] SyS_connect+0x11/0x20
[   11.130014]   [] SyS_socketcall+0x12b/0x300
[   11.130014]   [] syscall_call+0x7/0xb
[   11.130014] irq event stamp: 1184
[   11.130014] hardirqs last  enabled at (1184): [] 
local_bh_enable+0x71/0x110
[   11.130014] hardirqs last disabled at (1183): [] 
local_bh_enable+0x3d/0x110
[   11.130014] softirqs last  enabled at (0): [] 
copy_process.part.42+0x45d/0x11a0
[   11.130014] softirqs last disabled at (1147): [] irq_exit+0xa5/0xb0
[   11.130014]
[   11.130014] other info that might help us debug this:
[   11.130014]  Possible unsafe locking scenario:
[   11.130014]
[   11.130014]CPU0
[   11.130014]
[   11.130014]   lock(>syncp.seq#6);
[   11.130014]   
[   11.130014] lock(>syncp.seq#6);
[   11.130014]
[   11.130014]  *** DEADLOCK ***
[   11.130014]
[   11.130014] 3 locks held by init/4483:
[   11.130014]  #0:  (rcu_read_lock){.+.+..}, at: [] 
SyS_setpriority+0x4c/0x620
[   11.130014]  #1:  (((>dad_timer))){+.-...}, at: [] 
call_timer_fn+0x0/0xf0
[   11.130014]  #2:  (rcu_read_lock){.+.+..}, at: [] 
ndisc_send_skb+0x54/0x5d0
[   11.130014]
[   11.130014] stack backtrace:
[   11.130014] CPU: 0 PID: 4483 Comm: init Not tainted 3.12.0-rc1+ #68
[   11.130014] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   11.130014]    c55e5c10 c1bb0e71 c57128b0 c55e5c4c c1badf79 
c1ec1123
[   11.130014]  c1ec1484 1183   0001 0003 0001 

[   11.130014]  c1ec1484 0004 c5712dcc  c55e5c84 c10de492 0004 
c10755f2
[   11.130014] Call Trace:
[   11.130014]  [] dump_stack+0x4b/0x66
[   11.130014]  [] print_usage_bug+0x1d3/0x1dd
[   11.130014]  [] mark_lock+0x282/0x2f0
[   11.130014]  [] ? kvm_clock_read+0x22/0x30
[   11.130014]  [] ? check_usage_backwards+0x150/0x150
[   11.130014]  [] __lock_acquire+0x584/0x1af0
[   11.130014]  [] ? sched_clock_cpu+0xef/0x190
[   11.130014]  [] ? mark_held_locks+0x8c/0xf0
[   11.130014]  [] lock_acquire+0x96/0xd0
[   11.130014]  [] ? ndisc_send_ns+0xe2/0x130
[   11.130014]  [] ndisc_send_skb+0x293/0x5d0
[   11.130014]  [] ? ndisc_send_ns+0xe2/0x130
[   11.130014]  [] ndisc_send_ns+0xe2/0x130
[   11.130014]  [] ? mod_timer+0xf2/0x160
[   11.130014]  [] ? addrconf_dad_timer+0xce/0x150
[   11.130014]  [] addrconf_dad_timer+0x10a/0x150
[   11.130014]  [] ? addrconf_dad_completed+0x1c0/0x1c0
[   11.130014]  [] call_timer_fn+0x73/0xf0
[   11.130014]  [] ? __internal_add_timer+0xb0/0xb0
[   11.130014]  [] ? addrconf_dad_completed+0x1c0/0x1c0
[   11.130014]  [] run_timer_softirq+0x141/0x1e0
[   11.130014]  [] ? __do_softirq+0x70/0x1b0
[   11.130014]  [] __do_softirq+0xc0/0x1b0
[   11.130014]  [] irq_exit+0xa5/0xb0
[   11.130014]  [] smp_apic_timer_interrupt+0x35/0x50
[   11.130014]  [] apic_timer_interrupt+0x32/0x38
[   11.130014]  [] ? SyS_setpriority+0xfd/0x620
[   11.130014]  [] ? lock_release+0x9/0x240
[   11.130014]  [] ? SyS_setpriority+0xe7/0x620
[   11.130014]  [] ? _raw_read_unlock+0x1d/0x30
[   11.130014]  [] SyS_setpriority+0x111/0x620
[   11.130014]  [] ? SyS_setpriority+0x4c/0x620
[   11.130014]  [] syscall_call+0x7/0xb

Signed-off-by: John Stultz 
Acked-by: Eric Dumazet 
Signed-off-by: Peter Zijlstra 
Cc: Alexey Kuznetsov 
Cc: "David S. Miller" 
Cc: Hideaki YOSHIFUJI 
Cc: James Morris 
Cc: Mathieu Desnoyers 
Cc: Patrick McHardy 
Cc: Steven Rostedt 
Cc: net...@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1381186321-4906-5-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 net/ipv6/ip6_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 

[tip:core/locking] net: Explicitly initialize u64_stats_sync structures for lockdep

2013-11-06 Thread tip-bot for John Stultz
Commit-ID:  827da44c61419f29ae3be198c342e2147f1a10cb
Gitweb: http://git.kernel.org/tip/827da44c61419f29ae3be198c342e2147f1a10cb
Author: John Stultz 
AuthorDate: Mon, 7 Oct 2013 15:51:58 -0700
Committer:  Ingo Molnar 
CommitDate: Wed, 6 Nov 2013 12:40:25 +0100

net: Explicitly initialize u64_stats_sync structures for lockdep

In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worthwhile.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz 
Acked-by: Julian Anastasov 
Signed-off-by: Peter Zijlstra 
Cc: Alexey Kuznetsov 
Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: Hideaki YOSHIFUJI 
Cc: James Morris 
Cc: Jesse Gross 
Cc: Mathieu Desnoyers 
Cc: "Michael S. Tsirkin" 
Cc: Mirko Lindner 
Cc: Patrick McHardy 
Cc: Roger Luethi 
Cc: Rusty Russell 
Cc: Simon Horman 
Cc: Stephen Hemminger 
Cc: Steven Rostedt 
Cc: Thomas Petazzoni 
Cc: Wensong Zhang 
Cc: net...@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 drivers/net/dummy.c|  6 ++
 drivers/net/ethernet/emulex/benet/be_main.c|  4 
 drivers/net/ethernet/intel/igb/igb_main.c  |  5 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  4 
 drivers/net/ethernet/marvell/mvneta.c  |  3 +++
 drivers/net/ethernet/marvell/sky2.c|  3 +++
 drivers/net/ethernet/neterion/vxge/vxge-main.c |  4 
 drivers/net/ethernet/nvidia/forcedeth.c|  2 ++
 drivers/net/ethernet/realtek/8139too.c |  3 +++
 drivers/net/ethernet/tile/tilepro.c|  2 ++
 drivers/net/ethernet/via/via-rhine.c   |  3 +++
 drivers/net/ifb.c  |  5 +
 drivers/net/loopback.c |  6 ++
 drivers/net/macvlan.c  |  7 +++
 drivers/net/nlmon.c|  8 
 drivers/net/team/team.c|  6 ++
 drivers/net/team/team_mode_loadbalance.c   |  9 -
 drivers/net/veth.c |  8 
 drivers/net/virtio_net.c   |  8 
 drivers/net/vxlan.c|  8 
 drivers/net/xen-netfront.c |  6 ++
 include/linux/u64_stats_sync.h |  7 +++
 net/8021q/vlan_dev.c   |  9 -
 net/bridge/br_device.c |  7 +++
 net/ipv4/af_inet.c | 14 ++
 net/ipv4/ip_tunnel.c   |  8 +++-
 net/ipv6/addrconf.c| 14 ++
 net/ipv6/af_inet6.c| 14 ++
 net/ipv6/ip6_gre.c | 15 +++
 net/ipv6/ip6_tunnel.c  |  7 +++
 net/ipv6/sit.c | 15 +++
 net/netfilter/ipvs/ip_vs_ctl.c | 25 ++---
 net/openvswitch/datapath.c |  6 ++
 net/openvswitch/vport.c|  8 
 34 files changed, 253 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index b710c6b..bd8f84b 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -88,10 +88,16 @@ static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct net_device *dev)
 
 static int dummy_dev_init(struct net_device *dev)
 {
+   int i;
dev->dstats = alloc_percpu(struct pcpu_dstats);
if (!dev->dstats)
return -ENOMEM;
 
+   for_each_possible_cpu(i) {
+   struct pcpu_dstats *dstats;
+   dstats = per_cpu_ptr(dev->dstats, i);
+   u64_stats_init(&dstats->syncp);
+   }
return 0;
 }
 
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 2c38cc4..edd7595 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2047,6 +2047,9 @@ static int be_tx_qs_create(struct be_adapter *adapter)
if (status)
return status;
 
+   u64_stats_init(&txo->stats.sync);
+   u64_stats_init(&txo->stats.sync_compl);
+
/* If num_evt_qs is less than num_tx_qs, then more than

[tip:core/locking] cpuset: Fix potential deadlock w/ set_mems_allowed

2013-11-06 Thread tip-bot for John Stultz
Commit-ID:  db751fe3ea6880ff5ac5abe60cb7b80deb5a4140
Gitweb: http://git.kernel.org/tip/db751fe3ea6880ff5ac5abe60cb7b80deb5a4140
Author: John Stultz 
AuthorDate: Mon, 7 Oct 2013 15:52:00 -0700
Committer:  Ingo Molnar 
CommitDate: Wed, 6 Nov 2013 12:40:27 +0100

cpuset: Fix potential deadlock w/ set_mems_allowed

After adding lockdep support to seqlock/seqcount structures,
I started seeing the following warning:

[1.070907] ==
[1.072015] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[1.073181] 3.11.0+ #67 Not tainted
[1.073801] --
[1.074882] kworker/u4:2/708 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[1.076088]  (>mems_allowed_seq){+.+...}, at: [] 
new_slab+0x5f/0x280
[1.077572]
[1.077572] and this task is already holding:
[1.078593]  (&(>__queue_lock)->rlock){..-...}, at: [] 
blk_execute_rq_nowait+0x53/0xf0
[1.080042] which would create a new lock dependency:
[1.080042]  (&(>__queue_lock)->rlock){..-...} -> 
(>mems_allowed_seq){+.+...}
[1.080042]
[1.080042] but this new dependency connects a SOFTIRQ-irq-safe lock:
[1.080042]  (&(>__queue_lock)->rlock){..-...}
[1.080042] ... which became SOFTIRQ-irq-safe at:
[1.080042]   [] __lock_acquire+0x5b9/0x1db0
[1.080042]   [] lock_acquire+0x95/0x130
[1.080042]   [] _raw_spin_lock+0x41/0x80
[1.080042]   [] scsi_device_unbusy+0x7e/0xd0
[1.080042]   [] scsi_finish_command+0x32/0xf0
[1.080042]   [] scsi_softirq_done+0xa1/0x130
[1.080042]   [] blk_done_softirq+0x73/0x90
[1.080042]   [] __do_softirq+0x110/0x2f0
[1.080042]   [] run_ksoftirqd+0x2d/0x60
[1.080042]   [] smpboot_thread_fn+0x156/0x1e0
[1.080042]   [] kthread+0xd6/0xe0
[1.080042]   [] ret_from_fork+0x7c/0xb0
[1.080042]
[1.080042] to a SOFTIRQ-irq-unsafe lock:
[1.080042]  (&p->mems_allowed_seq){+.+...}
[1.080042] ... which became SOFTIRQ-irq-unsafe at:
[1.080042] ...  [] __lock_acquire+0x613/0x1db0
[1.080042]   [] lock_acquire+0x95/0x130
[1.080042]   [] kthreadd+0x82/0x180
[1.080042]   [] ret_from_fork+0x7c/0xb0
[1.080042]
[1.080042] other info that might help us debug this:
[1.080042]
[1.080042]  Possible interrupt unsafe locking scenario:
[1.080042]
[1.080042]CPU0CPU1
[1.080042]
[1.080042]   lock(&p->mems_allowed_seq);
[1.080042]local_irq_disable();
[1.080042]lock(&(&q->__queue_lock)->rlock);
[1.080042]lock(&p->mems_allowed_seq);
[1.080042]   <Interrupt>
[1.080042] lock(&(&q->__queue_lock)->rlock);
[1.080042]
[1.080042]  *** DEADLOCK ***

The issue stems from the kthreadd() function calling set_mems_allowed
with irqs enabled. While it is unlikely for the actual deadlock
to trigger in practice, the fix is fairly simple: disable irqs before
taking the mems_allowed_seq lock.
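
The write-side ordering the patch enforces can be illustrated with a
minimal userspace sketch (names suffixed _sketch are illustrative
stand-ins, not the kernel implementations; irq masking is shown only as
comments since it has no userspace equivalent):

```c
static unsigned int seq;        /* stand-in for current->mems_allowed_seq */
static int mems_allowed;        /* the seqcount-protected data */

static void write_seqcount_begin_sketch(unsigned int *s) { (*s)++; } /* odd: writer active */
static void write_seqcount_end_sketch(unsigned int *s)   { (*s)++; } /* even: stable */

static void set_mems_allowed_sketch(int nodemask)
{
    /* In the kernel, the fix inserts local_irq_save(flags) here, so no
     * interrupt handler can spin on the odd seqcount while the write
     * section below is in flight. */
    write_seqcount_begin_sketch(&seq);
    mems_allowed = nodemask;
    write_seqcount_end_sketch(&seq);
    /* ... and local_irq_restore(flags) here, before task_unlock(). */
}
```

After one call the sequence counter has advanced by two and the data is
stable, which is exactly the invariant readers rely on.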

Signed-off-by: John Stultz 
Signed-off-by: Peter Zijlstra 
Acked-by: Li Zefan 
Cc: Mathieu Desnoyers 
Cc: Steven Rostedt 
Cc: "David S. Miller" 
Cc: net...@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1381186321-4906-4-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 include/linux/cpuset.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index cc1b01c..3fe661f 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -110,10 +110,14 @@ static inline bool put_mems_allowed(unsigned int seq)
 
 static inline void set_mems_allowed(nodemask_t nodemask)
 {
+   unsigned long flags;
+
task_lock(current);
+   local_irq_save(flags);
write_seqcount_begin(&current->mems_allowed_seq);
current->mems_allowed = nodemask;
write_seqcount_end(&current->mems_allowed_seq);
+   local_irq_restore(flags);
task_unlock(current);
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:core/locking] seqcount: Add lockdep functionality to seqcount/seqlock structures

2013-11-06 Thread tip-bot for John Stultz
Commit-ID:  1ca7d67cf5d5a2aef26a8d9afd789006fa098347
Gitweb: http://git.kernel.org/tip/1ca7d67cf5d5a2aef26a8d9afd789006fa098347
Author: John Stultz 
AuthorDate: Mon, 7 Oct 2013 15:51:59 -0700
Committer:  Ingo Molnar 
CommitDate: Wed, 6 Nov 2013 12:40:26 +0100

seqcount: Add lockdep functionality to seqcount/seqlock structures

Currently seqlocks and seqcounts don't support lockdep.

After running across a seqcount related deadlock in the timekeeping
code, I used a less-refined and more focused variant of this patch
to narrow down the cause of the issue.

This is a first-pass attempt to properly enable lockdep functionality
on seqlocks and seqcounts.

Since seqcounts are used in the vdso gettimeofday code, I've provided
non-lockdep accessors for those needs.

I've also handled one case where there were nested seqlock writers
and there may be more edge cases.

Comments and feedback would be appreciated!
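
For context, the reader-side protocol that both the lockdep and
no-lockdep read accessors share looks roughly like the following
userspace sketch (memory barriers elided; _sketch names are illustrative
and deliberately not the kernel's implementations):

```c
/* A seqcount pairs an even/odd sequence number with the data it guards. */
struct seqcount_sketch { unsigned int sequence; };

static unsigned int read_seqcount_begin_sketch(const struct seqcount_sketch *s)
{
    unsigned int ret;
    do {
        ret = s->sequence;       /* odd sequence: a writer is mid-update */
    } while (ret & 1);
    return ret;                  /* real code adds smp_rmb() here */
}

static int read_seqcount_retry_sketch(const struct seqcount_sketch *s,
                                      unsigned int start)
{
    return s->sequence != start; /* nonzero: a writer intervened, reread */
}
```

A reader loops: begin, copy the protected data, retry if the sequence
moved. The lockdep variant additionally records an acquire/release pair
so lock ordering violations are caught; the vdso cannot call into
lockdep, hence the _no_lockdep accessors.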

Signed-off-by: John Stultz 
Signed-off-by: Peter Zijlstra 
Cc: Eric Dumazet 
Cc: Li Zefan 
Cc: Mathieu Desnoyers 
Cc: Steven Rostedt 
Cc: "David S. Miller" 
Cc: net...@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1381186321-4906-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/vdso/vclock_gettime.c |  8 ++---
 fs/dcache.c|  4 +--
 fs/fs_struct.c |  2 +-
 include/linux/init_task.h  |  8 ++---
 include/linux/lockdep.h|  8 +++--
 include/linux/seqlock.h| 79 ++
 mm/filemap_xip.c   |  2 +-
 7 files changed, 90 insertions(+), 21 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 72074d5..2ada505 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -178,7 +178,7 @@ notrace static int __always_inline do_realtime(struct 
timespec *ts)
 
ts->tv_nsec = 0;
do {
-   seq = read_seqcount_begin(&gtod->seq);
+   seq = read_seqcount_begin_no_lockdep(&gtod->seq);
mode = gtod->clock.vclock_mode;
ts->tv_sec = gtod->wall_time_sec;
ns = gtod->wall_time_snsec;
@@ -198,7 +198,7 @@ notrace static int do_monotonic(struct timespec *ts)
 
ts->tv_nsec = 0;
do {
-   seq = read_seqcount_begin(&gtod->seq);
+   seq = read_seqcount_begin_no_lockdep(&gtod->seq);
mode = gtod->clock.vclock_mode;
ts->tv_sec = gtod->monotonic_time_sec;
ns = gtod->monotonic_time_snsec;
@@ -214,7 +214,7 @@ notrace static int do_realtime_coarse(struct timespec *ts)
 {
unsigned long seq;
do {
-   seq = read_seqcount_begin(&gtod->seq);
+   seq = read_seqcount_begin_no_lockdep(&gtod->seq);
ts->tv_sec = gtod->wall_time_coarse.tv_sec;
ts->tv_nsec = gtod->wall_time_coarse.tv_nsec;
} while (unlikely(read_seqcount_retry(>seq, seq)));
@@ -225,7 +225,7 @@ notrace static int do_monotonic_coarse(struct timespec *ts)
 {
unsigned long seq;
do {
-   seq = read_seqcount_begin(&gtod->seq);
+   seq = read_seqcount_begin_no_lockdep(&gtod->seq);
ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
} while (unlikely(read_seqcount_retry(>seq, seq)));
diff --git a/fs/dcache.c b/fs/dcache.c
index ae6ebb8..f750be2 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2574,7 +2574,7 @@ static void __d_move(struct dentry * dentry, struct 
dentry * target)
dentry_lock_for_move(dentry, target);
 
write_seqcount_begin(&dentry->d_seq);
-   write_seqcount_begin(&target->d_seq);
+   write_seqcount_begin_nested(&target->d_seq, DENTRY_D_LOCK_NESTED);
 
/* __d_drop does write_seqcount_barrier, but they're OK to nest. */
 
@@ -2706,7 +2706,7 @@ static void __d_materialise_dentry(struct dentry *dentry, 
struct dentry *anon)
dentry_lock_for_move(anon, dentry);
 
write_seqcount_begin(&dentry->d_seq);
-   write_seqcount_begin(&anon->d_seq);
+   write_seqcount_begin_nested(&anon->d_seq, DENTRY_D_LOCK_NESTED);
 
dparent = dentry->d_parent;
 
diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index d8ac61d..7dca743 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -161,6 +161,6 @@ EXPORT_SYMBOL(current_umask);
 struct fs_struct init_fs = {
.users  = 1,
.lock   = __SPIN_LOCK_UNLOCKED(init_fs.lock),
-   .seq= SEQCNT_ZERO,
+   .seq= SEQCNT_ZERO(init_fs.seq),
.umask  = 0022,
 };
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 5cd0f09..b0ed422 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -32,10 +32,10 @@ extern struct fs_struct init_fs;
 #endif
 
 #ifdef CONFIG_CPUSETS
-#define INIT_CPUSET_SEQ  \
-   .mems_allowed_seq = SEQCNT_ZERO,
+#define INIT_CPUSET_SEQ(tsk)  \
+   .mems_allowed_seq = SEQCNT_ZERO(tsk.mems_allowed_seq),

[tip:core/locking] ipv6: Fix possible ipv6 seqlock deadlock

2013-11-06 Thread tip-bot for John Stultz
Commit-ID:  5ac68e7c34a4797aa4ca9615e5a6603bda1abe9b
Gitweb: http://git.kernel.org/tip/5ac68e7c34a4797aa4ca9615e5a6603bda1abe9b
Author: John Stultz john.stu...@linaro.org
AuthorDate: Mon, 7 Oct 2013 15:52:01 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Wed, 6 Nov 2013 12:40:28 +0100

ipv6: Fix possible ipv6 seqlock deadlock

While enabling lockdep on seqlocks, I ran across the warning below
caused by the ipv6 stats being updated in both irq and non-irq context.

This patch changes from IP6_INC_STATS_BH to IP6_INC_STATS (suggested
by Eric Dumazet) to resolve this problem.
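
The discipline the non-_BH variant restores can be sketched in
userspace as follows (hypothetical _sketch names; the bh_disabled flag
only models local_bh_disable(), which has no userspace equivalent):

```c
static unsigned int syncp_seq;              /* embedded stats seqcount */
static unsigned long long out_requests;     /* the protected counter */
static int bh_disabled;                     /* models local_bh_disable() state */

static void local_bh_disable_sketch(void) { bh_disabled = 1; }
static void local_bh_enable_sketch(void)  { bh_disabled = 0; }

/* Process-context increment: because the same counter is also bumped
 * from softirq context, softirqs must be blocked around the write
 * section, otherwise a softirq could interrupt a half-finished write
 * and spin forever on the odd sequence number. */
static void ip6_inc_stats_sketch(void)
{
    local_bh_disable_sketch();
    syncp_seq++;                /* begin write (sequence becomes odd) */
    out_requests++;
    syncp_seq++;                /* end write (sequence even again) */
    local_bh_enable_sketch();
}
```

IP6_INC_STATS performs exactly this bottom-half-safe dance, which is
why it is the correct choice when the caller may run with softirqs
enabled.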

[   11.120383] =
[   11.121024] [ INFO: inconsistent lock state ]
[   11.121663] 3.12.0-rc1+ #68 Not tainted
[   11.19] -
[   11.122867] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   11.123741] init/4483 [HC0[0]:SC1[3]:HE1:SE0] takes:
[   11.124505]  (&stats->syncp.seq#6){+.?...}, at: [c1ab80c2] 
ndisc_send_ns+0xe2/0x130
[   11.125736] {SOFTIRQ-ON-W} state was registered at:
[   11.126447]   [c10e0eb7] __lock_acquire+0x5c7/0x1af0
[   11.127222]   [c10e2996] lock_acquire+0x96/0xd0
[   11.127925]   [c1a9a2c3] write_seqcount_begin+0x33/0x40
[   11.128766]   [c1a9aa03] ip6_dst_lookup_tail+0x3a3/0x460
[   11.129582]   [c1a9e0ce] ip6_dst_lookup_flow+0x2e/0x80
[   11.130014]   [c1ad18e0] ip6_datagram_connect+0x150/0x4e0
[   11.130014]   [c1a4d0b5] inet_dgram_connect+0x25/0x70
[   11.130014]   [c198dd61] SYSC_connect+0xa1/0xc0
[   11.130014]   [c198f571] SyS_connect+0x11/0x20
[   11.130014]   [c198fe6b] SyS_socketcall+0x12b/0x300
[   11.130014]   [c1bbf880] syscall_call+0x7/0xb
[   11.130014] irq event stamp: 1184
[   11.130014] hardirqs last  enabled at (1184): [c1086901] 
local_bh_enable+0x71/0x110
[   11.130014] hardirqs last disabled at (1183): [c10868cd] 
local_bh_enable+0x3d/0x110
[   11.130014] softirqs last  enabled at (0): [c108014d] 
copy_process.part.42+0x45d/0x11a0
[   11.130014] softirqs last disabled at (1147): [c1086e05] irq_exit+0xa5/0xb0
[   11.130014]
[   11.130014] other info that might help us debug this:
[   11.130014]  Possible unsafe locking scenario:
[   11.130014]
[   11.130014]CPU0
[   11.130014]
[   11.130014]   lock(&stats->syncp.seq#6);
[   11.130014]   <Interrupt>
[   11.130014] lock(&stats->syncp.seq#6);
[   11.130014]
[   11.130014]  *** DEADLOCK ***
[   11.130014]
[   11.130014] 3 locks held by init/4483:
[   11.130014]  #0:  (rcu_read_lock){.+.+..}, at: [c109363c] 
SyS_setpriority+0x4c/0x620
[   11.130014]  #1:  ((&ifa->dad_timer)){+.-...}, at: [c108c1c0] 
call_timer_fn+0x0/0xf0
[   11.130014]  #2:  (rcu_read_lock){.+.+..}, at: [c1ab6494] 
ndisc_send_skb+0x54/0x5d0
[   11.130014]
[   11.130014] stack backtrace:
[   11.130014] CPU: 0 PID: 4483 Comm: init Not tainted 3.12.0-rc1+ #68
[   11.130014] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   11.130014]    c55e5c10 c1bb0e71 c57128b0 c55e5c4c c1badf79 
c1ec1123
[   11.130014]  c1ec1484 1183   0001 0003 0001 

[   11.130014]  c1ec1484 0004 c5712dcc  c55e5c84 c10de492 0004 
c10755f2
[   11.130014] Call Trace:
[   11.130014]  [c1bb0e71] dump_stack+0x4b/0x66
[   11.130014]  [c1badf79] print_usage_bug+0x1d3/0x1dd
[   11.130014]  [c10de492] mark_lock+0x282/0x2f0
[   11.130014]  [c10755f2] ? kvm_clock_read+0x22/0x30
[   11.130014]  [c10dd8b0] ? check_usage_backwards+0x150/0x150
[   11.130014]  [c10e0e74] __lock_acquire+0x584/0x1af0
[   11.130014]  [c10b1baf] ? sched_clock_cpu+0xef/0x190
[   11.130014]  [c10de58c] ? mark_held_locks+0x8c/0xf0
[   11.130014]  [c10e2996] lock_acquire+0x96/0xd0
[   11.130014]  [c1ab80c2] ? ndisc_send_ns+0xe2/0x130
[   11.130014]  [c1ab66d3] ndisc_send_skb+0x293/0x5d0
[   11.130014]  [c1ab80c2] ? ndisc_send_ns+0xe2/0x130
[   11.130014]  [c1ab80c2] ndisc_send_ns+0xe2/0x130
[   11.130014]  [c108cc32] ? mod_timer+0xf2/0x160
[   11.130014]  [c1aa706e] ? addrconf_dad_timer+0xce/0x150
[   11.130014]  [c1aa70aa] addrconf_dad_timer+0x10a/0x150
[   11.130014]  [c1aa6fa0] ? addrconf_dad_completed+0x1c0/0x1c0
[   11.130014]  [c108c233] call_timer_fn+0x73/0xf0
[   11.130014]  [c108c1c0] ? __internal_add_timer+0xb0/0xb0
[   11.130014]  [c1aa6fa0] ? addrconf_dad_completed+0x1c0/0x1c0
[   11.130014]  [c108c5b1] run_timer_softirq+0x141/0x1e0
[   11.130014]  [c1086b20] ? __do_softirq+0x70/0x1b0
[   11.130014]  [c1086b70] __do_softirq+0xc0/0x1b0
[   11.130014]  [c1086e05] irq_exit+0xa5/0xb0
[   11.130014]  [c106cfd5] smp_apic_timer_interrupt+0x35/0x50
[   11.130014]  [c1bbfbca] apic_timer_interrupt+0x32/0x38
[   11.130014]  [c10936ed] ? SyS_setpriority+0xfd/0x620
[   11.130014]  [c10e26c9] ? lock_release+0x9/0x240
[   11.130014]  [c10936d7] ? SyS_setpriority+0xe7/0x620
[   11.130014]  [c1bbee6d] ? _raw_read_unlock+0x1d/0x30
[   11.130014]  [c1093701] SyS_setpriority+0x111/0x620
[   11.130014]  [c109363c] ? SyS_setpriority+0x4c/0x620
[   11.130014]  [c1bbf880] 

[tip:core/locking] net: Explicitly initialize u64_stats_sync structures for lockdep

2013-11-06 Thread tip-bot for John Stultz
Commit-ID:  827da44c61419f29ae3be198c342e2147f1a10cb
Gitweb: http://git.kernel.org/tip/827da44c61419f29ae3be198c342e2147f1a10cb
Author: John Stultz john.stu...@linaro.org
AuthorDate: Mon, 7 Oct 2013 15:51:58 -0700
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Wed, 6 Nov 2013 12:40:25 +0100

net: Explicitly initialize u64_stats_sync structures for lockdep

In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worthwhile.
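
The per-driver boilerplate follows one pattern, sketched here in plain
userspace C (an array stands in for per-cpu allocation, and the _sketch
names are illustrative, not the kernel APIs):

```c
struct u64_stats_sync_sketch { unsigned int seq; };   /* embedded seqcount */

struct pcpu_dstats_sketch {
    unsigned long long tx_packets;
    struct u64_stats_sync_sketch syncp;
};

#define NR_CPUS_SKETCH 4
static struct pcpu_dstats_sketch dstats[NR_CPUS_SKETCH];

static void u64_stats_init_sketch(struct u64_stats_sync_sketch *syncp)
{
    /* In the kernel this is where seqcount_init() runs, registering a
     * lockdep class for the embedded seqcount; zeroed memory alone is
     * not enough once lockdep tracks these. */
    syncp->seq = 0;
}

static void dev_init_stats_sketch(void)
{
    for (int i = 0; i < NR_CPUS_SKETCH; i++)   /* mirrors for_each_possible_cpu() */
        u64_stats_init_sketch(&dstats[i].syncp);
}
```

Each driver's init path gains exactly this loop over its per-cpu stats,
which is why the diff is large but mechanical.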

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz john.stu...@linaro.org
Acked-by: Julian Anastasov j...@ssi.bg
Signed-off-by: Peter Zijlstra pet...@infradead.org
Cc: Alexey Kuznetsov kuz...@ms2.inr.ac.ru
Cc: David S. Miller da...@davemloft.net
Cc: Eric Dumazet eric.duma...@gmail.com
Cc: Hideaki YOSHIFUJI yoshf...@linux-ipv6.org
Cc: James Morris jmor...@namei.org
Cc: Jesse Gross je...@nicira.com
Cc: Mathieu Desnoyers mathieu.desnoy...@efficios.com
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Mirko Lindner mlind...@marvell.com
Cc: Patrick McHardy ka...@trash.net
Cc: Roger Luethi r...@hellgate.ch
Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Simon Horman ho...@verge.net.au
Cc: Stephen Hemminger step...@networkplumber.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Thomas Petazzoni thomas.petazz...@free-electrons.com
Cc: Wensong Zhang wens...@linux-vs.org
Cc: net...@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 drivers/net/dummy.c|  6 ++
 drivers/net/ethernet/emulex/benet/be_main.c|  4 
 drivers/net/ethernet/intel/igb/igb_main.c  |  5 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  4 
 drivers/net/ethernet/marvell/mvneta.c  |  3 +++
 drivers/net/ethernet/marvell/sky2.c|  3 +++
 drivers/net/ethernet/neterion/vxge/vxge-main.c |  4 
 drivers/net/ethernet/nvidia/forcedeth.c|  2 ++
 drivers/net/ethernet/realtek/8139too.c |  3 +++
 drivers/net/ethernet/tile/tilepro.c|  2 ++
 drivers/net/ethernet/via/via-rhine.c   |  3 +++
 drivers/net/ifb.c  |  5 +
 drivers/net/loopback.c |  6 ++
 drivers/net/macvlan.c  |  7 +++
 drivers/net/nlmon.c|  8 
 drivers/net/team/team.c|  6 ++
 drivers/net/team/team_mode_loadbalance.c   |  9 -
 drivers/net/veth.c |  8 
 drivers/net/virtio_net.c   |  8 
 drivers/net/vxlan.c|  8 
 drivers/net/xen-netfront.c |  6 ++
 include/linux/u64_stats_sync.h |  7 +++
 net/8021q/vlan_dev.c   |  9 -
 net/bridge/br_device.c |  7 +++
 net/ipv4/af_inet.c | 14 ++
 net/ipv4/ip_tunnel.c   |  8 +++-
 net/ipv6/addrconf.c| 14 ++
 net/ipv6/af_inet6.c| 14 ++
 net/ipv6/ip6_gre.c | 15 +++
 net/ipv6/ip6_tunnel.c  |  7 +++
 net/ipv6/sit.c | 15 +++
 net/netfilter/ipvs/ip_vs_ctl.c | 25 ++---
 net/openvswitch/datapath.c |  6 ++
 net/openvswitch/vport.c|  8 
 34 files changed, 253 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index b710c6b..bd8f84b 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -88,10 +88,16 @@ static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct 
net_device *dev)
 
 static int dummy_dev_init(struct net_device *dev)
 {
+   int i;
dev->dstats = alloc_percpu(struct pcpu_dstats);
if (!dev->dstats)
return -ENOMEM;
 
+   for_each_possible_cpu(i) {
+   struct pcpu_dstats *dstats;
+   dstats = per_cpu_ptr(dev->dstats, i);
+   u64_stats_init(&dstats->syncp);
+   }
return 0;
 }
 
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 

[tip:timers/urgent] timekeeping: Fix HRTICK related deadlock from ntp lock changes

2013-09-12 Thread tip-bot for John Stultz
Commit-ID:  7bd36014460f793c19e7d6c94dab67b0afcfcb7f
Gitweb: http://git.kernel.org/tip/7bd36014460f793c19e7d6c94dab67b0afcfcb7f
Author: John Stultz 
AuthorDate: Wed, 11 Sep 2013 16:50:56 -0700
Committer:  Ingo Molnar 
CommitDate: Thu, 12 Sep 2013 07:49:51 +0200

timekeeping: Fix HRTICK related deadlock from ntp lock changes

Gerlando Falauto reported that when HRTICK is enabled, it is
possible to trigger system deadlocks. These were hard to
reproduce, as HRTICK has been broken in the past, but seemed
to be connected to the timekeeping_seq lock.

Since seqlock/seqcount's aren't supported w/ lockdep, I added
some extra spinlock based locking and triggered the following
lockdep output:

[   15.849182] ntpd/4062 is trying to acquire lock:
[   15.849765]  (&(&pool->lock)->rlock){..-...}, at: [] 
__queue_work+0x145/0x480
[   15.850051]
[   15.850051] but task is already holding lock:
[   15.850051]  (timekeeper_lock){-.-.-.}, at: [] 
do_adjtimex+0x7f/0x100

<snip>

[   15.850051] Chain exists of: &(&pool->lock)->rlock --> &p->pi_lock --> 
timekeeper_lock
[   15.850051]  Possible unsafe locking scenario:
[   15.850051]
[   15.850051]CPU0CPU1
[   15.850051]
[   15.850051]   lock(timekeeper_lock);
[   15.850051]lock(&p->pi_lock);
[   15.850051] lock(timekeeper_lock);
[   15.850051] lock(&(&pool->lock)->rlock);
[   15.850051]
[   15.850051]  *** DEADLOCK ***

The deadlock was introduced by 06c017fdd4dc48451a ("timekeeping:
Hold timekeepering locks in do_adjtimex and hardpps") in 3.10

This patch avoids this deadlock, by moving the call to
schedule_delayed_work() outside of the timekeeper lock
critical section.
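
The shape of the fix is a general pattern: keep only the state update
inside the critical section and issue the side effect after the lock is
dropped. A hedged userspace sketch, with a pthread mutex standing in
for timekeeper_lock and a flag standing in for schedule_delayed_work()
(_sketch names are illustrative):

```c
#include <pthread.h>

static pthread_mutex_t timekeeper_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static int ntp_state;                  /* the lock-protected state */
static int cmos_work_scheduled;        /* models the deferred work */

static void ntp_notify_cmos_timer_sketch(void)
{
    cmos_work_scheduled = 1;           /* stands in for schedule_delayed_work() */
}

static int do_adjtimex_sketch(int new_state)
{
    pthread_mutex_lock(&timekeeper_lock_sketch);
    ntp_state = new_state;             /* only state updates under the lock */
    pthread_mutex_unlock(&timekeeper_lock_sketch);

    /* Scheduling work can itself take locks (e.g. the workqueue pool
     * lock), so it must happen after timekeeper_lock is released to
     * avoid the inverted ordering lockdep reported. */
    ntp_notify_cmos_timer_sketch();
    return 0;
}
```

This is why the patch exports ntp_notify_cmos_timer() and moves the
call into do_adjtimex() after the unlock, rather than leaving it inside
__do_adjtimex().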

Reported-by: Gerlando Falauto 
Tested-by: Lin Ming 
Signed-off-by: John Stultz 
Cc: Mathieu Desnoyers 
Cc: stable  #3.11, 3.10
Link: 
http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 include/linux/timex.h | 1 +
 kernel/time/ntp.c | 6 ++
 kernel/time/timekeeping.c | 2 ++
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/timex.h b/include/linux/timex.h
index b3726e6..dd3edd7 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -141,6 +141,7 @@ extern int do_adjtimex(struct timex *);
 extern void hardpps(const struct timespec *, const struct timespec *);
 
 int read_current_timer(unsigned long *timer_val);
+void ntp_notify_cmos_timer(void);
 
 /* The clock frequency of the i8253/i8254 PIT */
 #define PIT_TICK_RATE 1193182ul
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 8f5b3b9..bb22151 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -516,13 +516,13 @@ static void sync_cmos_clock(struct work_struct *work)
schedule_delayed_work(&sync_cmos_work, timespec_to_jiffies(&next));
 }
 
-static void notify_cmos_timer(void)
+void ntp_notify_cmos_timer(void)
 {
schedule_delayed_work(&sync_cmos_work, 0);
 }
 
 #else
-static inline void notify_cmos_timer(void) { }
+void ntp_notify_cmos_timer(void) { }
 #endif
 
 
@@ -687,8 +687,6 @@ int __do_adjtimex(struct timex *txc, struct timespec *ts, 
s32 *time_tai)
if (!(time_status & STA_NANO))
txc->time.tv_usec /= NSEC_PER_USEC;
 
-   notify_cmos_timer();
-
return result;
 }
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 48b9fff..947ba25 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1703,6 +1703,8 @@ int do_adjtimex(struct timex *txc)
write_seqcount_end(&timekeeper_seq);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
+   ntp_notify_cmos_timer();
+
return ret;
 }
 
--



[tip:timers/urgent] time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons

2013-05-14 Thread tip-bot for John Stultz
Commit-ID:  b4f711ee03d28f776fd2324fd0bd999cc428e4d2
Gitweb: http://git.kernel.org/tip/b4f711ee03d28f776fd2324fd0bd999cc428e4d2
Author: John Stultz 
AuthorDate: Wed, 24 Apr 2013 11:32:56 -0700
Committer:  Thomas Gleixner 
CommitDate: Tue, 14 May 2013 20:54:06 +0200

time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons

Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
which enables some minor compile-time optimizations to avoid
unnecessary code, mostly in the suspend/resume path, could cause
problems for userland.

In particular, the dependency for RTC_HCTOSYS on
!ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
twice and simplifies suspend/resume, has the side effect
of causing the /sys/class/rtc/rtcN/hctosys flag to always be
zero, and this flag is commonly used by udev to setup the
/dev/rtc symlink to /dev/rtcN, which can cause pain for
older applications.

While the udev rules could use some work to be less fragile,
breaking userland should be strongly avoided. Additionally,
the compile-time optimizations are fairly minor, and the code
being optimized is likely to be reworked in the future, so
let's revert this change.

Reported-by: Kay Sievers 
Signed-off-by: John Stultz 
Cc: stable  #3.9
Cc: Feng Tang 
Cc: Jason Gunthorpe 
Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 arch/x86/Kconfig | 1 -
 drivers/rtc/Kconfig  | 2 --
 include/linux/time.h | 4 ----
 kernel/time/Kconfig  | 5 -----
 4 files changed, 12 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5db2117..45c4124 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -108,7 +108,6 @@ config X86
	select GENERIC_CLOCKEVENTS_BROADCAST if X86_64 || (X86_32 && X86_LOCAL_APIC)
select GENERIC_TIME_VSYSCALL if X86_64
select KTIME_SCALAR if X86_32
-   select ALWAYS_USE_PERSISTENT_CLOCK
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
select HAVE_CONTEXT_TRACKING if X86_64
diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
index 0c81915..b983813 100644
--- a/drivers/rtc/Kconfig
+++ b/drivers/rtc/Kconfig
@@ -20,7 +20,6 @@ if RTC_CLASS
 config RTC_HCTOSYS
bool "Set system time from RTC on startup and resume"
default y
-   depends on !ALWAYS_USE_PERSISTENT_CLOCK
help
  If you say yes here, the system time (wall clock) will be set using
  the value read from a specified RTC device. This is useful to avoid
@@ -29,7 +28,6 @@ config RTC_HCTOSYS
 config RTC_SYSTOHC
bool "Set the RTC time based on NTP synchronization"
default y
-   depends on !ALWAYS_USE_PERSISTENT_CLOCK
help
  If you say yes here, the system time (wall clock) will be stored
  in the RTC specified by RTC_HCTOSYS_DEVICE approximately every 11
diff --git a/include/linux/time.h b/include/linux/time.h
index 22d81b3..d5d229b 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -117,14 +117,10 @@ static inline bool timespec_valid_strict(const struct timespec *ts)
 
 extern bool persistent_clock_exist;
 
-#ifdef ALWAYS_USE_PERSISTENT_CLOCK
-#define has_persistent_clock() true
-#else
 static inline bool has_persistent_clock(void)
 {
return persistent_clock_exist;
 }
-#endif
 
 extern void read_persistent_clock(struct timespec *ts);
 extern void read_boot_clock(struct timespec *ts);
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 24510d8..b696922 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -12,11 +12,6 @@ config CLOCKSOURCE_WATCHDOG
 config ARCH_CLOCKSOURCE_DATA
bool
 
-# Platforms has a persistent clock
-config ALWAYS_USE_PERSISTENT_CLOCK
-   bool
-   default n
-
 # Timekeeping vsyscall support
 config GENERIC_TIME_VSYSCALL
bool
--


[tip:timers/core] timekeeping: Make sure to notify hrtimers when TAI offset changes

2013-04-11 Thread tip-bot for John Stultz
Commit-ID:  4e8f8b34b92b6514cc070aeb94d317cadd5071d7
Gitweb: http://git.kernel.org/tip/4e8f8b34b92b6514cc070aeb94d317cadd5071d7
Author: John Stultz 
AuthorDate: Wed, 10 Apr 2013 12:41:49 -0700
Committer:  Thomas Gleixner 
CommitDate: Thu, 11 Apr 2013 10:19:44 +0200

timekeeping: Make sure to notify hrtimers when TAI offset changes

Now that we have CLOCK_TAI timers, make sure we notify hrtimer
code when TAI offset is changed.

Signed-off-by: John Stultz 
Link: http://lkml.kernel.org/r/1365622909-953-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
---
 kernel/time/timekeeping.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index c4d2a87..675f720 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -607,6 +607,7 @@ void timekeeping_set_tai_offset(s32 tai_offset)
__timekeeping_set_tai_offset(tk, tai_offset);
	write_seqcount_end(&timekeeper_seq);
	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+   clock_was_set();
 }
 
 /**
@@ -1639,7 +1640,7 @@ int do_adjtimex(struct timex *txc)
	struct timekeeper *tk = &timekeeper;
	unsigned long flags;
struct timespec ts;
-   s32 tai;
+   s32 orig_tai, tai;
int ret;
 
/* Validate the data before disabling interrupts */
@@ -1663,10 +1664,13 @@ int do_adjtimex(struct timex *txc)
	raw_spin_lock_irqsave(&timekeeper_lock, flags);
	write_seqcount_begin(&timekeeper_seq);
 
-   tai = tk->tai_offset;
+   orig_tai = tai = tk->tai_offset;
	ret = __do_adjtimex(txc, &ts, &tai);
 
-   __timekeeping_set_tai_offset(tk, tai);
+   if (tai != orig_tai) {
+   __timekeeping_set_tai_offset(tk, tai);
+   clock_was_set_delayed();
+   }
	write_seqcount_end(&timekeeper_seq);
	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
--


[tip:timers/urgent] time: Fix timeekeping_get_ns overflow on 32bit systems

2012-09-14 Thread tip-bot for John Stultz
Commit-ID:  ec145babe754f9ea1079034a108104b6001e001c
Gitweb: http://git.kernel.org/tip/ec145babe754f9ea1079034a108104b6001e001c
Author: John Stultz 
AuthorDate: Tue, 11 Sep 2012 19:26:03 -0400
Committer:  Ingo Molnar 
CommitDate: Thu, 13 Sep 2012 17:39:14 +0200

time: Fix timeekeping_get_ns overflow on 32bit systems

Daniel Lezcano reported seeing multi-second stalls from
keyboard input on his T61 laptop when NOHZ and CPU_IDLE
were enabled on a 32bit kernel.

He bisected the problem down to commit
1e75fa8be9fb6 ("time: Condense timekeeper.xtime into xtime_sec").

After reproducing this issue, I narrowed the problem down
to the fact that timekeeping_get_ns() returns a 64bit
nsec value that hasn't been accumulated. In some cases
this value was then being stored in timespec.tv_nsec
(which is a long).

On 32bit systems, with idle times larger than 4 seconds
(or less, depending on the value of xtime_nsec), the
returned nsec value would overflow 32 bits. This kept
time from increasing, causing timers not to expire.

The fix is to make sure we don't directly store the
result of timekeeping_get_ns() into a tv_nsec field,
instead using a 64bit nsec value which can then be
added into the timespec via timespec_add_ns().

Reported-and-bisected-by: Daniel Lezcano 
Tested-by: Daniel Lezcano 
Signed-off-by: John Stultz 
Acked-by: Prarit Bhargava 
Cc: Richard Cochran 
Link: http://lkml.kernel.org/r/1347405963-35715-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/timekeeping.c |   19 ++++++++++++-------
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 34e5eac..d3b91e7 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -303,10 +303,11 @@ void getnstimeofday(struct timespec *ts)
		seq = read_seqbegin(&tk->lock);
 
ts->tv_sec = tk->xtime_sec;
-   ts->tv_nsec = timekeeping_get_ns(tk);
+   nsecs = timekeeping_get_ns(tk);
 
	} while (read_seqretry(&tk->lock, seq));
 
+   ts->tv_nsec = 0;
timespec_add_ns(ts, nsecs);
 }
 EXPORT_SYMBOL(getnstimeofday);
@@ -345,6 +346,7 @@ void ktime_get_ts(struct timespec *ts)
 {
	struct timekeeper *tk = &timekeeper;
struct timespec tomono;
+   s64 nsec;
unsigned int seq;
 
WARN_ON(timekeeping_suspended);
@@ -352,13 +354,14 @@ void ktime_get_ts(struct timespec *ts)
do {
		seq = read_seqbegin(&tk->lock);
ts->tv_sec = tk->xtime_sec;
-   ts->tv_nsec = timekeeping_get_ns(tk);
+   nsec = timekeeping_get_ns(tk);
tomono = tk->wall_to_monotonic;
 
	} while (read_seqretry(&tk->lock, seq));
 
-   set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec,
-   ts->tv_nsec + tomono.tv_nsec);
+   ts->tv_sec += tomono.tv_sec;
+   ts->tv_nsec = 0;
+   timespec_add_ns(ts, nsec + tomono.tv_nsec);
 }
 EXPORT_SYMBOL_GPL(ktime_get_ts);
 
@@ -1244,6 +1247,7 @@ void get_monotonic_boottime(struct timespec *ts)
 {
	struct timekeeper *tk = &timekeeper;
struct timespec tomono, sleep;
+   s64 nsec;
unsigned int seq;
 
WARN_ON(timekeeping_suspended);
@@ -1251,14 +1255,15 @@ void get_monotonic_boottime(struct timespec *ts)
do {
		seq = read_seqbegin(&tk->lock);
ts->tv_sec = tk->xtime_sec;
-   ts->tv_nsec = timekeeping_get_ns(tk);
+   nsec = timekeeping_get_ns(tk);
tomono = tk->wall_to_monotonic;
sleep = tk->total_sleep_time;
 
	} while (read_seqretry(&tk->lock, seq));
 
-   set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec + sleep.tv_sec,
-   ts->tv_nsec + tomono.tv_nsec + sleep.tv_nsec);
+   ts->tv_sec += tomono.tv_sec + sleep.tv_sec;
+   ts->tv_nsec = 0;
+   timespec_add_ns(ts, nsec + tomono.tv_nsec + sleep.tv_nsec);
 }
 EXPORT_SYMBOL_GPL(get_monotonic_boottime);
 
--

