[tip:timers/urgent] time: Fix ktime_get_raw() incorrect base accumulation
Commit-ID:  0bcdc0987cce9880436b70836c6a92bb8e744fd1
Gitweb:     http://git.kernel.org/tip/0bcdc0987cce9880436b70836c6a92bb8e744fd1
Author:     John Stultz
AuthorDate: Fri, 25 Aug 2017 15:57:04 -0700
Committer:  Thomas Gleixner
CommitDate: Sat, 26 Aug 2017 16:06:12 +0200

time: Fix ktime_get_raw() incorrect base accumulation

In commit fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time
handling"), the following code was mistakenly added to the update of the
raw timekeeper:

	/* Update the monotonic raw base */
	seconds = tk->raw_sec;
	nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
	tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

This adds both the raw_sec value and the shifted-down raw xtime_nsec to
the base value. But the read function adds the shifted-down
tk->tkr_raw.xtime_nsec value a second time. As a result, ktime_get_raw()
users (which are all internal users) see the raw time move faster than it
should (at a rate that varies with the current size of
tkr_raw.xtime_nsec), which has caused at least problems with graphics
rendering performance.

The change tried to match the monotonic base update logic:

	seconds = (u64)(tk->xtime_sec + tk->wall_to_monotonic.tv_sec);
	nsec = (u32) tk->wall_to_monotonic.tv_nsec;
	tk->tkr_mono.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

which adds the wall_to_monotonic.tv_nsec value, but not the
tk->tkr_mono.xtime_nsec value, to the base.

To fix this, simplify the tkr_raw.base accumulation to only accumulate
the raw_sec portion, and do not include the tkr_raw.xtime_nsec portion,
which will be added at read time.

Fixes: fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time handling")
Reported-and-tested-by: Chris Wilson
Signed-off-by: John Stultz
Signed-off-by: Thomas Gleixner
Cc: Prarit Bhargava
Cc: Kevin Brodsky
Cc: Richard Cochran
Cc: Stephen Boyd
Cc: Will Deacon
Cc: Miroslav Lichvar
Cc: Daniel Mentz
Link: http://lkml.kernel.org/r/1503701824-1645-1-git-send-email-john.stu...@linaro.org

---
 kernel/time/timekeeping.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cedafa0..7e7e61c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -637,9 +637,7 @@ static inline void tk_update_ktime_data(struct timekeeper *tk)
 	tk->ktime_sec = seconds;
 
 	/* Update the monotonic raw base */
-	seconds = tk->raw_sec;
-	nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
-	tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);
+	tk->tkr_raw.base = ns_to_ktime(tk->raw_sec * NSEC_PER_SEC);
 }
 
 /* must hold timekeeper_lock */
[tip:timers/urgent] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
Commit-ID:  3d88d56c5873f6eebe23e05c3da701960146b801
Gitweb:     http://git.kernel.org/tip/3d88d56c5873f6eebe23e05c3da701960146b801
Author:     John Stultz
AuthorDate: Thu, 8 Jun 2017 16:44:21 -0700
Committer:  Thomas Gleixner
CommitDate: Tue, 20 Jun 2017 10:41:50 +0200

time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting

Due to how the MONOTONIC_RAW accumulation logic was handled, there is the
potential for a 1ns discontinuity when we do accumulations. This small
discontinuity has for the most part gone unnoticed, but since ARM64
enabled CLOCK_MONOTONIC_RAW in their vDSO clock_gettime implementation,
we've seen failures with the inconsistency-check test in kselftest.

This patch addresses the issue by using the same sub-ns accumulation
handling that CLOCK_MONOTONIC uses, which avoids the issue for in-kernel
users.

Since the ARM64 vDSO implementation has its own clock_gettime calculation
logic, this patch reduces the frequency of errors, but failures are still
seen. The ARM64 vDSO will need to be updated to include the
sub-nanosecond xtime_nsec values in its calculation for this issue to be
completely fixed.

Signed-off-by: John Stultz
Tested-by: Daniel Mentz
Cc: Prarit Bhargava
Cc: Kevin Brodsky
Cc: Richard Cochran
Cc: Stephen Boyd
Cc: Will Deacon
Cc: "stable #4 . 8+"
Cc: Miroslav Lichvar
Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner

---
 include/linux/timekeeper_internal.h |  4 ++--
 kernel/time/timekeeping.c           | 19 ++-
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index e9834ad..f7043cc 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -57,7 +57,7 @@ struct tk_read_base {
  *			interval.
  * @xtime_remainder:	Shifted nano seconds left over when rounding
  *			@cycle_interval
- * @raw_interval:	Raw nano seconds accumulated per NTP interval.
+ * @raw_interval:	Shifted raw nano seconds accumulated per NTP interval.
  * @ntp_error:		Difference between accumulated time and NTP time in ntp
  *			shifted nano seconds.
  * @ntp_error_shift:	Shift conversion between clock shifted nano seconds and
@@ -99,7 +99,7 @@ struct timekeeper {
 	u64			cycle_interval;
 	u64			xtime_interval;
 	s64			xtime_remainder;
-	u32			raw_interval;
+	u64			raw_interval;
 	/* The ntp_tick_length() value currently being used.
 	 * This cached copy ensures we consistently apply the tick
 	 * length for an entire tick, as ntp_tick_length may change
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index eff94cb..b602c48 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -280,7 +280,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	/* Go back from cycles -> shifted ns */
 	tk->xtime_interval = interval * clock->mult;
 	tk->xtime_remainder = ntpinterval - tk->xtime_interval;
-	tk->raw_interval = (interval * clock->mult) >> clock->shift;
+	tk->raw_interval = interval * clock->mult;
 
 	 /* if changing clocks, convert xtime_nsec shift units */
 	if (old_clock) {
@@ -1996,7 +1996,7 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
 				    u32 shift, unsigned int *clock_set)
 {
 	u64 interval = tk->cycle_interval << shift;
-	u64 raw_nsecs;
+	u64 snsec_per_sec;
 
 	/* If the offset is smaller than a shifted interval, do nothing */
 	if (offset < interval)
@@ -2011,14 +2011,15 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
 	*clock_set |= accumulate_nsecs_to_secs(tk);
 
 	/* Accumulate raw time */
-	raw_nsecs = (u64)tk->raw_interval << shift;
-	raw_nsecs += tk->raw_time.tv_nsec;
-	if (raw_nsecs >= NSEC_PER_SEC) {
-		u64 raw_secs = raw_nsecs;
-		raw_nsecs = do_div(raw_secs, NSEC_PER_SEC);
-		tk->raw_time.tv_sec += raw_secs;
+	tk->tkr_raw.xtime_nsec += (u64)tk->raw_time.tv_nsec << tk->tkr_raw.shift;
+	tk->tkr_raw.xtime_nsec += tk->raw_interval << shift;
+	snsec_per_sec = (u64)NSEC_PER_SEC << tk->tkr_raw.shift;
+	while (tk->tkr_raw.xtime_nsec >= snsec_per_sec) {
+		tk->tkr_raw.xtime_nsec -= snsec_per_sec;
+		tk->raw_time.tv_sec++;
 	}
-	tk->raw_time.tv_nsec = raw_nsecs;
+	tk->raw_time.tv_nsec = tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift;
+	tk->tkr_raw.xtime_nsec -= (u64)tk->raw_time.tv_nsec << tk->tkr_raw.shift;
 
 	/* Accumulate error between NTP and clock interval */
 	tk->ntp_error += tk->ntp_tick <<
[tip:timers/urgent] time: Fix clock->read(clock) race around clocksource changes
Commit-ID:  ceea5e3771ed2378668455fa21861bead7504df5
Gitweb:     http://git.kernel.org/tip/ceea5e3771ed2378668455fa21861bead7504df5
Author:     John Stultz
AuthorDate: Thu, 8 Jun 2017 16:44:20 -0700
Committer:  Thomas Gleixner
CommitDate: Tue, 20 Jun 2017 10:41:50 +0200

time: Fix clock->read(clock) race around clocksource changes

In tests which exercise switching of clocksources, a NULL pointer
dereference can be observed on ARM64 platforms in the clocksource read()
function:

	u64 clocksource_mmio_readl_down(struct clocksource *c)
	{
		return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
	}

This is called from the core timekeeping code via:

	cycle_now = tkr->read(tkr->clock);

tkr->read is the cached tkr->clock->read() function pointer. When the
clocksource is changed, tkr->clock and tkr->read are updated
sequentially. The code above results in a sequential load operation of
tkr->read and tkr->clock as well.

If the store to tkr->clock hits between the loads of tkr->read and
tkr->clock, then the old read() function is called with the new clock
pointer. As a consequence the read() function dereferences a different
data structure and the resulting 'reg' pointer can point anywhere,
including NULL.

This problem was introduced when the timekeeping code was switched over
to use struct tk_read_base. Before that, it was theoretically possible as
well when the compiler decided to reload clock in the code sequence:

	now = tk->clock->read(tk->clock);

Add a helper function which avoids the issue by reading
tk_read_base->clock once into a local variable clk and then issuing the
read function via clk->read(clk). This guarantees that the read()
function always gets the proper clocksource pointer handed in.

Since there is now no use for the tkr.read pointer, this patch also
removes it, and to address stopping the fast timekeeper during
suspend/resume, it introduces a dummy clocksource to use rather than just
a dummy read function.

Signed-off-by: John Stultz
Acked-by: Ingo Molnar
Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Stephen Boyd
Cc: stable
Cc: Miroslav Lichvar
Cc: Daniel Mentz
Link: http://lkml.kernel.org/r/1496965462-20003-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner

---
 include/linux/timekeeper_internal.h |  1 -
 kernel/time/timekeeping.c           | 52 +
 2 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 110f453..e9834ad 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -29,7 +29,6 @@
  */
 struct tk_read_base {
 	struct clocksource	*clock;
-	u64			(*read)(struct clocksource *cs);
 	u64			mask;
 	u64			cycle_last;
 	u32			mult;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 9652bc5..eff94cb 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -118,6 +118,26 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
 	tk->offs_boot = ktime_add(tk->offs_boot, delta);
 }
 
+/*
+ * tk_clock_read - atomic clocksource read() helper
+ *
+ * This helper is necessary to use in the read paths because, while the
+ * seqlock ensures we don't return a bad value while structures are
+ * updated, it doesn't protect from potential crashes. There is the
+ * possibility that the tkr's clocksource may change between the read
+ * reference, and the clock reference passed to the read function. This
+ * can cause crashes if the wrong clocksource is passed to the wrong
+ * read function.
+ * This isn't necessary to use when holding the timekeeper_lock or doing
+ * a read of the fast-timekeeper tkrs (which is protected by its own
+ * locking and update logic).
+ */
+static inline u64 tk_clock_read(struct tk_read_base *tkr)
+{
+	struct clocksource *clock = READ_ONCE(tkr->clock);
+
+	return clock->read(clock);
+}
+
 #ifdef CONFIG_DEBUG_TIMEKEEPING
 #define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */
 
@@ -175,7 +195,7 @@ static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
 	 */
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		now = tkr->read(tkr->clock);
+		now = tk_clock_read(tkr);
 		last = tkr->cycle_last;
 		mask = tkr->mask;
 		max = tkr->clock->max_cycles;
@@ -209,7 +229,7 @@ static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
 	u64 cycle_now, delta;
 
 	/* read clocksource */
-	cycle_now = tkr->read(tkr->clock);
+	cycle_now = tk_clock_read(tkr);
 
 	/* calculate the delta since the last update_wall_time */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
@@ -238,12 +258,10 @@ static void tk_setup_internals(struct
[tip:timers/urgent] timekeeping: Fix __ktime_get_fast_ns() regression
Commit-ID:  58bfea9532552d422bde7afa207e1a0f08dffa7d
Gitweb:     http://git.kernel.org/tip/58bfea9532552d422bde7afa207e1a0f08dffa7d
Author:     John Stultz
AuthorDate: Tue, 4 Oct 2016 19:55:48 -0700
Committer:  Thomas Gleixner
CommitDate: Wed, 5 Oct 2016 15:44:46 +0200

timekeeping: Fix __ktime_get_fast_ns() regression

In commit 27727df240c7 ("Avoid taking lock in NMI path with
CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code the
timekeeping_get_ns() function, but I forgot to include the unit
conversion from cycles to nanoseconds, breaking the function's output,
which impacts users like perf.

This results in bogus perf timestamps like:

 swapper 0 [000] 253.427536: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 254.426573: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 254.426687: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 254.426800: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 254.426905: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 254.427022: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 254.427127: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 254.427239: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 254.427346: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 254.427463: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 255.426572: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Instead of more reasonable expected timestamps like:

 swapper 0 [000] 39.953768: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 40.064839: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 40.175956: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 40.287103: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 40.398217: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 40.509324: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 40.620437: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 40.731546: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 40.842654: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 40.953772: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper 0 [000] 41.064881: 1 cpu-clock: 810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Add the proper use of timekeeping_delta_to_ns() to convert the cycle
delta to nanoseconds as needed.

Thanks to Brendan and Alexei for finding this quickly after the v4.8
release. Unfortunately the problematic commit has landed in some -stable
trees so they'll need this fix as well.

Many apologies for this mistake. I'll be looking to add a perf-clock
sanity test to the kselftest timers tests soon.

Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"
Reported-by: Brendan Gregg
Reported-by: Alexei Starovoitov
Tested-and-reviewed-by: Mathieu Desnoyers
Signed-off-by: John Stultz
Cc: Peter Zijlstra
Cc: stable
Cc: Steven Rostedt
Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner

---
 kernel/time/timekeeping.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index e07fb09..37dec7e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -403,8 +403,11 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 		tkr = tkf->base + (seq & 0x01);
 		now = ktime_to_ns(tkr->base);
 
-		now += clocksource_delta(tkr->read(tkr->clock),
-					 tkr->cycle_last, tkr->mask);
+		now += timekeeping_delta_to_ns(tkr,
+				clocksource_delta(
+					tkr->read(tkr->clock),
+					tkr->cycle_last,
+					tkr->mask));
 	} while (read_seqcount_retry(&tkf->seq, seq));
[tip:timers/urgent] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING
Commit-ID:  27727df240c7cc84f2ba6047c6f18d5addfd25ef
Gitweb:     http://git.kernel.org/tip/27727df240c7cc84f2ba6047c6f18d5addfd25ef
Author:     John Stultz
AuthorDate: Tue, 23 Aug 2016 16:08:21 -0700
Committer:  Thomas Gleixner
CommitDate: Wed, 24 Aug 2016 09:34:31 +0200

timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING

When I added some extra sanity checking in timekeeping_get_ns() under
CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI-safe
__ktime_get_fast_ns() method was using timekeeping_get_ns(). Thus the
locking added to the debug checks broke the NMI-safety of
__ktime_get_fast_ns().

This patch open-codes the timekeeping_get_ns() logic for
__ktime_get_fast_ns(), so it can avoid any deadlocks in NMI.

Fixes: 4ca22c2648f9 "timekeeping: Add warnings when overflows or underflows are observed"
Reported-by: Steven Rostedt
Reported-by: Peter Zijlstra
Signed-off-by: John Stultz
Cc: stable
Link: http://lkml.kernel.org/r/1471993702-29148-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner

---
 kernel/time/timekeeping.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 3b65746..e07fb09 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -401,7 +401,10 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 	do {
 		seq = raw_read_seqcount_latch(&tkf->seq);
 		tkr = tkf->base + (seq & 0x01);
-		now = ktime_to_ns(tkr->base) + timekeeping_get_ns(tkr);
+		now = ktime_to_ns(tkr->base);
+
+		now += clocksource_delta(tkr->read(tkr->clock),
+					 tkr->cycle_last, tkr->mask);
 	} while (read_seqcount_retry(&tkf->seq, seq));
 
 	return now;
[tip:timers/urgent] timekeeping: Cap array access in timekeeping_debug
Commit-ID: a4f8f6667f099036c88f231dcad4cf233652c824
Gitweb: http://git.kernel.org/tip/a4f8f6667f099036c88f231dcad4cf233652c824
Author: John Stultz
AuthorDate: Tue, 23 Aug 2016 16:08:22 -0700
Committer: Thomas Gleixner
CommitDate: Wed, 24 Aug 2016 09:34:32 +0200

timekeeping: Cap array access in timekeeping_debug

It was reported that hibernation could fail on the 2nd attempt, where the
system hangs at hibernate() -> syscore_resume() -> i8237A_resume() ->
claim_dma_lock(), because the lock has already been taken. However, no
other process on that problematic platform actually wants to grab this
lock.

Further investigation showed that the problem is triggered by setting
/sys/power/pm_trace to 1 before the 1st hibernation. Once pm_trace is
enabled, the RTC becomes meaningless after suspend, and meanwhile some
BIOSes adjust an 'invalid' RTC (e.g. one earlier than 1970) to the release
date of the motherboard during the POST stage. Thus after resume it may
appear that the system had a significantly long sleep time, which is a
completely meaningless value.

Then in timekeeping_resume -> tk_debug_account_sleep_time, if bit 31 of
the sleep time happened to be set to 1, fls() returns 32 and we add 1 to
sleep_time_bin[32], which causes an out-of-bounds array access and
therefore memory being overwritten.

As depicted by System.map:

  0x81c9d080 b sleep_time_bin
  0x81c9d100 B dma_spin_lock

the dma_spin_lock.val is set to 1, which caused this problem.

This patch adds a sanity check in tk_debug_account_sleep_time() to ensure
we don't index past the sleep_time_bin array.

[jstultz: Problem diagnosed and original patch by Chen Yu, I've solved
the issue slightly differently, but borrowed his excellent explanation
of the issue here.]

Fixes: 5c83545f24ab "power: Add option to log time spent in suspend"
Reported-by: Janek Kozicki
Reported-by: Chen Yu
Signed-off-by: John Stultz
Cc: linux...@vger.kernel.org
Cc: Peter Zijlstra
Cc: Xunlei Pang
Cc: "Rafael J. Wysocki"
Cc: stable
Cc: Zhang Rui
Link: http://lkml.kernel.org/r/1471993702-29148-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner
---
 kernel/time/timekeeping_debug.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
index f6bd652..107310a 100644
--- a/kernel/time/timekeeping_debug.c
+++ b/kernel/time/timekeeping_debug.c
@@ -23,7 +23,9 @@
 #include "timekeeping_internal.h"

-static unsigned int sleep_time_bin[32] = {0};
+#define NUM_BINS 32
+
+static unsigned int sleep_time_bin[NUM_BINS] = {0};

 static int tk_debug_show_sleep_time(struct seq_file *s, void *data)
 {
@@ -69,6 +71,9 @@ late_initcall(tk_debug_sleep_time_init);

 void tk_debug_account_sleep_time(struct timespec64 *t)
 {
-	sleep_time_bin[fls(t->tv_sec)]++;
+	/* Cap bin index so we don't overflow the array */
+	int bin = min(fls(t->tv_sec), NUM_BINS-1);
+
+	sleep_time_bin[bin]++;
 }
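The off-by-one is easy to reproduce outside the kernel: fls() returns the 1-based index of the highest set bit, so a value with bit 31 set yields 32, one past the last valid index of a 32-entry array. A small sketch of the bug and the min() cap from the patch (fls_demo is a portable stand-in for the kernel's fls(), not the kernel implementation):

```c
#include <assert.h>

#define NUM_BINS 32

/* Portable stand-in for the kernel's fls(): 1-based index of the highest
 * set bit; returns 0 for 0. */
static int fls_demo(unsigned int x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* The fixed indexing: cap the bin so bit-31 sleep times land in the last
 * bin instead of one element past the end of sleep_time_bin[NUM_BINS]. */
static int sleep_time_bin_index(long tv_sec)
{
	int bin = fls_demo((unsigned int)tv_sec);

	return bin < NUM_BINS - 1 ? bin : NUM_BINS - 1;	/* min(bin, NUM_BINS-1) */
}
```

Without the cap, `sleep_time_bin[32]++` writes into whatever symbol the linker placed after the array, which is how dma_spin_lock ended up set to 1.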
[tip:timers/urgent] time: Make settimeofday error checking work again
Commit-ID: dfc2507b26af22b0bbc85251b8545b36d8bc5d72
Gitweb: http://git.kernel.org/tip/dfc2507b26af22b0bbc85251b8545b36d8bc5d72
Author: John Stultz
AuthorDate: Wed, 1 Jun 2016 11:53:26 -0700
Committer: Thomas Gleixner
CommitDate: Wed, 1 Jun 2016 21:13:43 +0200

time: Make settimeofday error checking work again

In commit 86d3473224b0 some of the checking for a valid timeval was
subtly changed, which caused -EINVAL to be returned whenever the timeval
was NULL.

However, it is possible to set the timezone data while specifying a NULL
timeval, which is usually done to handle systems where the RTC keeps
local time instead of UTC. Thus the patch causes such systems to have
the time incorrectly set.

This patch addresses the issue by handling the error conditionals in the
same way as was done previously.

Fixes: 86d3473224b0 "time: Introduce do_sys_settimeofday64()"
Reported-by: Mika Westerberg
Signed-off-by: John Stultz
Tested-by: Mika Westerberg
Cc: Prarit Bhargava
Cc: Arnd Bergmann
Cc: Baolin Wang
Cc: Richard Cochran
Cc: Shuah Khan
Link: http://lkml.kernel.org/r/1464807207-16530-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner
---
 include/linux/timekeeping.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 37dbacf..816b754 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -21,6 +21,9 @@ static inline int do_sys_settimeofday(const struct timespec *tv,
 	struct timespec64 ts64;

 	if (!tv)
+		return do_sys_settimeofday64(NULL, tz);
+
+	if (!timespec_valid(tv))
 		return -EINVAL;

 	ts64 = timespec_to_timespec64(*tv);
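The essence of the fix is ordering: a NULL time pointer must short-circuit to the timezone-only path before any validity check runs, otherwise a timezone-only settimeofday() call (the RTC-keeps-local-time case) is wrongly rejected with -EINVAL. A standalone sketch of that control flow, with simplified hypothetical types standing in for the kernel's (EINVAL_DEMO mimics errno 22):

```c
#include <assert.h>
#include <stddef.h>

#define EINVAL_DEMO 22

/* Simplified stand-in for struct timeval. */
struct timeval_demo {
	long tv_sec;
	long tv_usec;
};

static int timeval_valid_demo(const struct timeval_demo *tv)
{
	return tv->tv_usec >= 0 && tv->tv_usec < 1000000;
}

/* Sketch of the fixed check ordering in do_sys_settimeofday():
 * NULL tv means "timezone update only" and must succeed; only a
 * non-NULL, invalid tv returns -EINVAL. */
static int do_settimeofday_demo(const struct timeval_demo *tv, const int *tz)
{
	(void)tz;		/* timezone handling elided in this sketch */

	if (!tv)
		return 0;	/* timezone-only update: allowed */

	if (!timeval_valid_demo(tv))
		return -EINVAL_DEMO;

	return 0;		/* would set the clock here */
}
```

The broken version performed the equivalent of the validity check first, so the NULL case never reached the timezone path.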
[tip:timers/urgent] kselftests: timers: Add adjtimex SETOFFSET validity tests
Commit-ID: e03a58c320e1103ebe97bda8ebdfcc5c9829c53f
Gitweb: http://git.kernel.org/tip/e03a58c320e1103ebe97bda8ebdfcc5c9829c53f
Author: John Stultz
AuthorDate: Thu, 21 Jan 2016 15:03:35 -0800
Committer: Thomas Gleixner
CommitDate: Tue, 26 Jan 2016 16:26:06 +0100

kselftests: timers: Add adjtimex SETOFFSET validity tests

Add some simple tests to check both valid and invalid offsets when using
adjtimex's ADJ_SETOFFSET method.

Signed-off-by: John Stultz
Acked-by: Shuah Khan
Cc: Sasha Levin
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Harald Hoyer
Cc: Kay Sievers
Cc: David Herrmann
Link: http://lkml.kernel.org/r/1453417415-19110-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner
---
 tools/testing/selftests/timers/valid-adjtimex.c | 139 +++++++++++++++++++++-
 1 file changed, 138 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/timers/valid-adjtimex.c b/tools/testing/selftests/timers/valid-adjtimex.c
index e86d937..60fe3c5 100644
--- a/tools/testing/selftests/timers/valid-adjtimex.c
+++ b/tools/testing/selftests/timers/valid-adjtimex.c
@@ -45,7 +45,17 @@ static inline int ksft_exit_fail(void)
 }
 #endif

-#define NSEC_PER_SEC 1000000000L
+#define NSEC_PER_SEC 1000000000LL
+#define USEC_PER_SEC 1000000LL
+
+#define ADJ_SETOFFSET 0x0100
+
+#include <sys/syscall.h>
+static int clock_adjtime(clockid_t id, struct timex *tx)
+{
+	return syscall(__NR_clock_adjtime, id, tx);
+}
+

 /* clear NTP time_status & time_state */
 int clear_time_state(void)
@@ -193,10 +203,137 @@ out:
 }

+int set_offset(long long offset, int use_nano)
+{
+	struct timex tmx = {};
+	int ret;
+
+	tmx.modes = ADJ_SETOFFSET;
+	if (use_nano) {
+		tmx.modes |= ADJ_NANO;
+
+		tmx.time.tv_sec = offset / NSEC_PER_SEC;
+		tmx.time.tv_usec = offset % NSEC_PER_SEC;
+
+		if (offset < 0 && tmx.time.tv_usec) {
+			tmx.time.tv_sec -= 1;
+			tmx.time.tv_usec += NSEC_PER_SEC;
+		}
+	} else {
+		tmx.time.tv_sec = offset / USEC_PER_SEC;
+		tmx.time.tv_usec = offset % USEC_PER_SEC;
+
+		if (offset < 0 && tmx.time.tv_usec) {
+			tmx.time.tv_sec -= 1;
+			tmx.time.tv_usec += USEC_PER_SEC;
+		}
+	}
+
+	ret = clock_adjtime(CLOCK_REALTIME, &tmx);
+	if (ret < 0) {
+		printf("(sec: %ld usec: %ld) ", tmx.time.tv_sec, tmx.time.tv_usec);
+		printf("[FAIL]\n");
+		return -1;
+	}
+	return 0;
+}
+
+int set_bad_offset(long sec, long usec, int use_nano)
+{
+	struct timex tmx = {};
+	int ret;
+
+	tmx.modes = ADJ_SETOFFSET;
+	if (use_nano)
+		tmx.modes |= ADJ_NANO;
+
+	tmx.time.tv_sec = sec;
+	tmx.time.tv_usec = usec;
+	ret = clock_adjtime(CLOCK_REALTIME, &tmx);
+	if (ret >= 0) {
+		printf("Invalid (sec: %ld usec: %ld) did not fail! ", tmx.time.tv_sec, tmx.time.tv_usec);
+		printf("[FAIL]\n");
+		return -1;
+	}
+	return 0;
+}
+
+int validate_set_offset(void)
+{
+	printf("Testing ADJ_SETOFFSET... ");
+
+	/* Test valid values */
+	if (set_offset(NSEC_PER_SEC - 1, 1))
+		return -1;
+
+	if (set_offset(-NSEC_PER_SEC + 1, 1))
+		return -1;
+
+	if (set_offset(-NSEC_PER_SEC - 1, 1))
+		return -1;
+
+	if (set_offset(5 * NSEC_PER_SEC, 1))
+		return -1;
+
+	if (set_offset(-5 * NSEC_PER_SEC, 1))
+		return -1;
+
+	if (set_offset(5 * NSEC_PER_SEC + NSEC_PER_SEC / 2, 1))
+		return -1;
+
+	if (set_offset(-5 * NSEC_PER_SEC - NSEC_PER_SEC / 2, 1))
+		return -1;
+
+	if (set_offset(USEC_PER_SEC - 1, 0))
+		return -1;
+
+	if (set_offset(-USEC_PER_SEC + 1, 0))
+		return -1;
+
+	if (set_offset(-USEC_PER_SEC - 1, 0))
+		return -1;
+
+	if (set_offset(5 * USEC_PER_SEC, 0))
+		return -1;
+
+	if (set_offset(-5 * USEC_PER_SEC, 0))
+		return -1;
+
+	if (set_offset(5 * USEC_PER_SEC + USEC_PER_SEC / 2, 0))
+		return -1;
+
+	if (set_offset(-5 * USEC_PER_SEC - USEC_PER_SEC / 2, 0))
+		return -1;
+
+	/* Test invalid values */
+	if (set_bad_offset(0, -1, 1))
+		return -1;
+	if (set_bad_offset(0, -1, 0))
+		return -1;
+	if (set_bad_offset(0, 2 * NSEC_PER_SEC, 1))
+		return -1;
+	if (set_bad_offset(0, 2 * USEC_PER_SEC, 0))
+		return -1;
+	if (set_bad_offset(0, NSEC_PER_SEC, 1))
+		return -1;
+	if (set_bad_offset(0, USEC_PER_SEC, 0))
+		return -1;
+	if (set_bad_offset(0, -NSEC_PER_SEC, 1))
+		return -1;
+	if (set_bad_offset(0, -USEC_PER_SEC, 0))
+
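The set_offset() helper above has to split a signed offset into a (tv_sec, fractional) pair with a non-negative fraction, since ADJ_SETOFFSET rejects negative sub-second fields. That normalization is worth seeing in isolation; the helper below is a hypothetical standalone version, not part of the selftest:

```c
#include <assert.h>

/* Split a signed offset into whole units and a non-negative remainder,
 * mirroring the negative-offset fixup in set_offset(): C division
 * truncates toward zero, so a negative remainder must be folded into
 * the whole-unit count. */
static void split_offset(long long offset, long long unit,
			 long long *sec, long long *frac)
{
	*sec = offset / unit;
	*frac = offset % unit;

	if (offset < 0 && *frac) {
		*sec -= 1;
		*frac += unit;	/* make the fractional part non-negative */
	}
}
```

For example, -1.5 s expressed in nanoseconds becomes sec = -2 and frac = 500000000, which is the canonical form the kernel expects.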
[tip:timers/urgent] ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO
Commit-ID: dd4e17ab704269bce71402285f5e8b9ac24b1eff
Gitweb: http://git.kernel.org/tip/dd4e17ab704269bce71402285f5e8b9ac24b1eff
Author: John Stultz
AuthorDate: Thu, 21 Jan 2016 15:03:34 -0800
Committer: Thomas Gleixner
CommitDate: Fri, 22 Jan 2016 12:01:42 +0100

ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO

Recently, in commit 37cf4dc3370f I forgot to check if the timeval being
passed was actually a timespec (as is signaled with ADJ_NANO). This
resulted in that patch breaking ADJ_SETOFFSET users who set ADJ_NANO,
by rejecting valid timespecs that were compared with valid timeval
ranges.

This patch addresses this by checking for the ADJ_NANO flag and using
the timespec check instead in that case.

Reported-by: Harald Hoyer
Reported-by: Kay Sievers
Fixes: 37cf4dc3370f "time: Verify time values in adjtimex ADJ_SETOFFSET to avoid overflow"
Signed-off-by: John Stultz
Cc: Sasha Levin
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: David Herrmann
Link: http://lkml.kernel.org/r/1453417415-19110-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner
---
 kernel/time/ntp.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 36f2ca0..6df8927 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -685,8 +685,18 @@ int ntp_validate_timex(struct timex *txc)
 		if (!capable(CAP_SYS_TIME))
 			return -EPERM;

-		if (!timeval_inject_offset_valid(&txc->time))
-			return -EINVAL;
+		if (txc->modes & ADJ_NANO) {
+			struct timespec ts;
+
+			ts.tv_sec = txc->time.tv_sec;
+			ts.tv_nsec = txc->time.tv_usec;
+			if (!timespec_inject_offset_valid(&ts))
+				return -EINVAL;
+
+		} else {
+			if (!timeval_inject_offset_valid(&txc->time))
+				return -EINVAL;
+		}
 	}

 /*
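The bug boils down to a units mismatch: with ADJ_NANO set, the tv_usec field of the adjtimex time value actually carries nanoseconds, so validating it against the microsecond range [0, USEC_PER_SEC) rejects any fraction of half a second or more. A compact sketch of the corrected validation (demo constants mirror, but are not, the kernel's; ADJ_NANO_DEMO's value is illustrative):

```c
#include <assert.h>

#define ADJ_NANO_DEMO		0x2000		/* illustrative flag value */
#define NSEC_PER_SEC_DEMO	1000000000L
#define USEC_PER_SEC_DEMO	1000000L

/* Validate the fractional field of an injected offset: when the caller
 * set ADJ_NANO, the field is nanoseconds and must be checked against the
 * timespec range; otherwise it is microseconds (timeval range). */
static int inject_offset_valid_demo(unsigned int modes, long frac)
{
	long limit = (modes & ADJ_NANO_DEMO) ? NSEC_PER_SEC_DEMO
					     : USEC_PER_SEC_DEMO;

	return frac >= 0 && frac < limit;
}
```

Before the fix, a perfectly valid nanosecond fraction like 500000000 was run through the microsecond-range check and bounced with -EINVAL.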
[tip:timers/core] timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments
Commit-ID: 6035519fcf5aa17084b41790cdc584d881d82c03
Gitweb: http://git.kernel.org/tip/6035519fcf5aa17084b41790cdc584d881d82c03
Author: John Stultz
AuthorDate: Mon, 5 Oct 2015 18:16:57 -0700
Committer: Ingo Molnar
CommitDate: Mon, 12 Oct 2015 09:51:34 +0200

timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments

Recently a kernel side NTP bug was fixed via the following commit:

  2619d7e9c92d ("time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()")

When the bug was reported it was difficult to detect, except by tweaking
the adjtimex tick value, and noticing how quickly the adjustment took:
https://lkml.org/lkml/2015/9/1/488

Thus this patch introduces a new test which manipulates the adjtimex
tick value and validates that the results are what we expect.

Signed-off-by: John Stultz
Cc: Linus Torvalds
Cc: Miroslav Lichvar
Cc: Nuno Gonçalves
Cc: Peter Zijlstra
Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Shuah Khan
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1444094217-20258-1-git-send-email-john.stu...@linaro.org
[ Tidied up the code and the changelog a bit. ]
Signed-off-by: Ingo Molnar
---
 tools/testing/selftests/timers/Makefile  |   3 +-
 tools/testing/selftests/timers/adjtick.c | 221 +++++++++++++++++++++++++
 2 files changed, 223 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/timers/Makefile b/tools/testing/selftests/timers/Makefile
index 89a3f44..4a1be1b 100644
--- a/tools/testing/selftests/timers/Makefile
+++ b/tools/testing/selftests/timers/Makefile
@@ -8,7 +8,7 @@ LDFLAGS += -lrt -lpthread
 TEST_PROGS = posix_timers nanosleep nsleep-lat set-timer-lat mqueue-lat \
 	     inconsistency-check raw_skew threadtest rtctest
-TEST_PROGS_EXTENDED = alarmtimer-suspend valid-adjtimex change_skew \
+TEST_PROGS_EXTENDED = alarmtimer-suspend valid-adjtimex adjtick change_skew \
 	      skew_consistency clocksource-switch leap-a-day \
 	      leapcrash set-tai set-2038
@@ -24,6 +24,7 @@ include ../lib.mk
 run_destructive_tests: run_tests
 	./alarmtimer-suspend
 	./valid-adjtimex
+	./adjtick
 	./change_skew
 	./skew_consistency
 	./clocksource-switch
diff --git a/tools/testing/selftests/timers/adjtick.c b/tools/testing/selftests/timers/adjtick.c
new file mode 100644
index 000..9887fd5
--- /dev/null
+++ b/tools/testing/selftests/timers/adjtick.c
@@ -0,0 +1,221 @@
+/* adjtimex() tick adjustment test
+ *		by: John Stultz
+ *		(C) Copyright Linaro Limited 2015
+ *		Licensed under the GPLv2
+ *
+ *  To build:
+ *	$ gcc adjtick.c -o adjtick -lrt
+ *
+ *   This program is free software: you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation, either version 2 of the License, or
+ *   (at your option) any later version.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ */
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <sys/time.h>
+#include <sys/timex.h>
+#include <time.h>
+
+#ifdef KTEST
+#include "../kselftest.h"
+#else
+static inline int ksft_exit_pass(void)
+{
+	exit(0);
+}
+static inline int ksft_exit_fail(void)
+{
+	exit(1);
+}
+#endif
+
+#define CLOCK_MONOTONIC_RAW	4
+
+#define NSEC_PER_SEC		1000000000LL
+#define USEC_PER_SEC		1000000
+
+#define MILLION			1000000
+
+long systick;
+
+long long llabs(long long val)
+{
+	if (val < 0)
+		val = -val;
+	return val;
+}
+
+unsigned long long ts_to_nsec(struct timespec ts)
+{
+	return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
+}
+
+struct timespec nsec_to_ts(long long ns)
+{
+	struct timespec ts;
+
+	ts.tv_sec = ns/NSEC_PER_SEC;
+	ts.tv_nsec = ns%NSEC_PER_SEC;
+
+	return ts;
+}
+
+long long diff_timespec(struct timespec start, struct timespec end)
+{
+	long long start_ns, end_ns;
+
+	start_ns = ts_to_nsec(start);
+	end_ns = ts_to_nsec(end);
+
+	return end_ns - start_ns;
+}
+
+void get_monotonic_and_raw(struct timespec *mon, struct timespec *raw)
+{
+	struct timespec start, mid, end;
+	long long diff = 0, tmp;
+	int i;
+
+	clock_gettime(CLOCK_MONOTONIC, mon);
+	clock_gettime(CLOCK_MONOTONIC_RAW, raw);
+
+	/* Try to get a more tightly bound pairing */
+	for (i = 0; i < 3; i++) {
+		long long newdiff;
+
+		clock_gettime(CLOCK_MONOTONIC, &start);
+		clock_gettime(CLOCK_MONOTONIC_RAW, &mid);
+		clock_gettime(CLOCK_MONOTONIC, &end);
+
+		newdiff = diff_timespec(start, end);
+		if (diff == 0 || newdiff < diff) {
+			diff = newdiff;
+
[tip:timers/urgent] clocksource: Fix abs() usage w/ 64bit values
Commit-ID: 67dfae0cd72fec5cd158b6e5fb1647b7dbe0834c
Gitweb: http://git.kernel.org/tip/67dfae0cd72fec5cd158b6e5fb1647b7dbe0834c
Author: John Stultz
AuthorDate: Mon, 14 Sep 2015 18:05:20 -0700
Committer: Thomas Gleixner
CommitDate: Fri, 2 Oct 2015 22:53:01 +0200

clocksource: Fix abs() usage w/ 64bit values

This patch fixes one case where abs() was being used with 64-bit
nanosecond values, where the result may be capped at 32 bits. This could
potentially cause watchdog false negatives on 32-bit systems, so this
patch addresses the issue by using abs64().

Signed-off-by: John Stultz
Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Ingo Molnar
Cc: sta...@vger.kernel.org
Link: http://lkml.kernel.org/r/1442279124-7309-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner
---
 kernel/time/clocksource.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 841b72f..3a38775 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -217,7 +217,7 @@ static void clocksource_watchdog(unsigned long data)
 			continue;

 		/* Check the deviation from the watchdog clocksource. */
-		if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
+		if (abs64(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
 			pr_warn("timekeeping watchdog: Marking clocksource '%s' as unstable because the skew is too large:\n",
 				cs->name);
 			pr_warn("	'%s' wd_now: %llx wd_last: %llx mask: %llx\n",
[tip:timers/urgent] time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()
Commit-ID: 2619d7e9c92d524cb155ec89fd72875321512e5b Gitweb: http://git.kernel.org/tip/2619d7e9c92d524cb155ec89fd72875321512e5b Author: John Stultz AuthorDate: Wed, 9 Sep 2015 16:07:30 -0700 Committer: Ingo Molnar CommitDate: Sun, 13 Sep 2015 10:30:47 +0200 time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() The internal clock steering done for fine-grained error correction uses a logarithmic approximation, so any time adjtimex() adjusts the clock steering, timekeeping_freqadjust() quickly approximates the correct clock frequency over a series of ticks. Unfortunately, the logic in timekeeping_freqadjust(), introduced in commit: dc491596f639 ("timekeeping: Rework frequency adjustments to work better w/ nohz") used the abs() function with an s64 error value to calculate the size of the approximated adjustment to be made. Per include/linux/kernel.h: "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()". Thus on 32-bit platforms, this resulted in the clock steering taking a quite dampened random walk trying to converge on the proper frequency, which caused the adjustments to be made much slower than intended (most easily observed when large adjustments are made). This patch fixes the issue by using abs64() instead. 
Reported-by: Nuno Gonçalves Tested-by: Nuno Goncalves Signed-off-by: John Stultz Cc: # v3.17+ Cc: Linus Torvalds Cc: Miroslav Lichvar Cc: Peter Zijlstra Cc: Prarit Bhargava Cc: Richard Cochran Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar --- kernel/time/timekeeping.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index f6ee2e6..3739ac6 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1614,7 +1614,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk, negative = (tick_error < 0); /* Sort out the magnitude of the correction */ - tick_error = abs(tick_error); + tick_error = abs64(tick_error); for (adj = 0; tick_error > interval; adj++) tick_error >>= 1;
[tip:timers/core] selftest: Timers: Avoid signal deadlock in leap-a-day
Commit-ID: 51a16c1e887a5975ada27a3ae935a4f2783005da Gitweb: http://git.kernel.org/tip/51a16c1e887a5975ada27a3ae935a4f2783005da Author: John Stultz AuthorDate: Wed, 17 Jun 2015 11:16:43 -0700 Committer: Thomas Gleixner CommitDate: Thu, 18 Jun 2015 15:28:14 +0200 selftest: Timers: Avoid signal deadlock in leap-a-day In 0c4a5fc95b1df ("Add leap-second timer edge testing to leap-a-day.c"), we added a timer to the test which checks to make sure timers near the leapsecond edge behave correctly. However, the output generated from the timer uses ctime_r, which isn't async-signal safe, and should that signal land while the main test is using ctime_r to print its output, it's possible for the test to deadlock on glibc internal locks. Thus this patch reworks the output to avoid using ctime_r in the signal handler. Signed-off-by: John Stultz Cc: Prarit Bhargava Cc: Daniel Bristot de Oliveira Cc: Richard Cochran Cc: Jan Kara Cc: Jiri Bohac Cc: Shuah Khan Cc: Ingo Molnar Link: http://lkml.kernel.org/r/1434565003-3386-1-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner --- tools/testing/selftests/timers/leap-a-day.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/tools/testing/selftests/timers/leap-a-day.c b/tools/testing/selftests/timers/leap-a-day.c index 331c4f7..fb46ad6 100644 --- a/tools/testing/selftests/timers/leap-a-day.c +++ b/tools/testing/selftests/timers/leap-a-day.c @@ -141,27 +141,28 @@ void handler(int unused) void sigalarm(int signo) { struct timex tx; - char buf[26]; int ret; tx.modes = 0; ret = adjtimex(&tx); - ctime_r(&tx.time.tv_sec, buf); - buf[strlen(buf)-1] = 0; /* remove trailing \n */ - printf("%s + %6ld us (%i)\t%s - TIMER FIRED\n", - buf, + if (tx.time.tv_sec < next_leap) { + printf("Error: Early timer expiration! (Should be %ld)\n", next_leap); + error_found = 1; + printf("adjtimex: %10ld sec + %6ld us (%i)\t%s\n", + tx.time.tv_sec, tx.time.tv_usec, tx.tai, time_state_str(ret)); - - if (tx.time.tv_sec < next_leap) { - printf("Error: Early timer expiration!\n"); - error_found = 1; } if (ret != TIME_WAIT) { - printf("Error: Incorrect NTP state?\n"); + printf("Error: Timer seeing incorrect NTP state? (Should be TIME_WAIT)\n"); error_found = 1; + printf("adjtimex: %10ld sec + %6ld us (%i)\t%s\n", + tx.time.tv_sec, + tx.time.tv_usec, + tx.tai, + time_state_str(ret)); } } @@ -297,7 +298,7 @@ int main(int argc, char **argv) printf("Scheduling leap second for %s", ctime(&next_leap)); /* Set up timer */ - printf("Setting timer for %s", ctime(&next_leap)); + printf("Setting timer for %ld - %s", next_leap, ctime(&next_leap)); memset(&se, 0, sizeof(se)); se.sigev_notify = SIGEV_SIGNAL; se.sigev_signo = signum;
[tip:timers/core] timekeeping: Copy the shadow-timekeeper over the real timekeeper last
Commit-ID: 906c55579a6360dd9ef5a3101bb2e3ae396dfb97 Gitweb: http://git.kernel.org/tip/906c55579a6360dd9ef5a3101bb2e3ae396dfb97 Author: John Stultz AuthorDate: Wed, 17 Jun 2015 10:05:53 -0700 Committer: Thomas Gleixner CommitDate: Thu, 18 Jun 2015 09:27:02 +0200 timekeeping: Copy the shadow-timekeeper over the real timekeeper last The fix in d151832650ed9 ("time: Move clock_was_set_seq update before updating shadow-timekeeper") was unfortunately incomplete. The main gist of that change was to do the shadow-copy update last, so that any state changes were properly duplicated, and we wouldn't accidentally have stale data in the shadow. Unfortunately in the main update_wall_time() logic, we use the shadow-timekeeper to calculate the next update values, then while holding the lock, copy the shadow-timekeeper over, then call timekeeping_update() to do some additional bookkeeping (skipping the shadow mirror). The problem with this is that the additional bookkeeping isn't all read-only, and some of it changes timekeeper state. Thus we might then overwrite this state change on the next update. To avoid this problem, do the timekeeping_update() on the shadow-timekeeper prior to copying the full state over to the real-timekeeper. This avoids problems with both the clock_was_set_seq and next_leap_ktime being overwritten, and possibly the fast-timekeepers as well. Many thanks to Prarit for his rigorous testing, which discovered this problem, along with Prarit and Daniel's work validating this fix. 
Reported-by: Prarit Bhargava Tested-by: Prarit Bhargava Tested-by: Daniel Bristot de Oliveira Signed-off-by: John Stultz Cc: Richard Cochran Cc: Jan Kara Cc: Jiri Bohac Cc: Ingo Molnar Link: http://lkml.kernel.org/r/1434560753-7441-1-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner --- kernel/time/timekeeping.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 5d67ffb..30b7a40 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1853,8 +1853,9 @@ void update_wall_time(void) * memcpy under the tk_core.seq against one before we start * updating. */ + timekeeping_update(tk, clock_set); memcpy(real_tk, tk, sizeof(*tk)); - timekeeping_update(real_tk, clock_set); + /* The memcpy must come last. Do not put anything here! */ write_seqcount_end(&tk_core.seq); out: raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
[tip:timers/core] ntp: Introduce and use SECS_PER_DAY macro instead of 86400
Commit-ID: 90bf361ceae28dee50a584c3dd4c1a96178d982c Gitweb: http://git.kernel.org/tip/90bf361ceae28dee50a584c3dd4c1a96178d982c Author: John Stultz AuthorDate: Thu, 11 Jun 2015 15:54:54 -0700 Committer: Thomas Gleixner CommitDate: Fri, 12 Jun 2015 11:15:49 +0200 ntp: Introduce and use SECS_PER_DAY macro instead of 86400 Currently the leapsecond logic uses what look like magic values. Improve this by defining SECS_PER_DAY and using that macro to make the logic more clear. Signed-off-by: John Stultz Cc: Prarit Bhargava Cc: Daniel Bristot de Oliveira Cc: Richard Cochran Cc: Jan Kara Cc: Jiri Bohac Cc: Ingo Molnar Link: http://lkml.kernel.org/r/1434063297-28657-3-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner --- kernel/time/ntp.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 7a68100..7aa2161 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -35,6 +35,7 @@ unsigned long tick_nsec; static u64 tick_length; static u64 tick_length_base; +#define SECS_PER_DAY 86400 #define MAX_TICKADJ 500LL /* usecs */ #define MAX_TICKADJ_SCALED \ (((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ) @@ -390,7 +391,7 @@ int second_overflow(unsigned long secs) case TIME_INS: if (!(time_status & STA_INS)) time_state = TIME_OK; - else if (secs % 86400 == 0) { + else if (secs % SECS_PER_DAY == 0) { leap = -1; time_state = TIME_OOP; printk(KERN_NOTICE @@ -400,7 +401,7 @@ int second_overflow(unsigned long secs) case TIME_DEL: if (!(time_status & STA_DEL)) time_state = TIME_OK; - else if ((secs + 1) % 86400 == 0) { + else if ((secs + 1) % SECS_PER_DAY == 0) { leap = 1; time_state = TIME_WAIT; printk(KERN_NOTICE
[tip:timers/core] ntp: Do leapsecond adjustment in adjtimex read path
Commit-ID: 96efdcf2d080687e041b0353c604b708546689fd Gitweb: http://git.kernel.org/tip/96efdcf2d080687e041b0353c604b708546689fd Author: John Stultz AuthorDate: Thu, 11 Jun 2015 15:54:56 -0700 Committer: Thomas Gleixner CommitDate: Fri, 12 Jun 2015 11:15:49 +0200 ntp: Do leapsecond adjustment in adjtimex read path Since the leapsecond is applied at tick-time, this means there is a small window of time at the start of a leap-second where we cross into the next second before applying the leap. This patch modifies adjtimex so that the leap-second is applied on the second edge, providing more correct leapsecond behavior. This does make it so that adjtimex()'s returned time values can be inconsistent with time values read from gettimeofday() or clock_gettime(CLOCK_REALTIME,...) for a brief period of one tick at the leapsecond. However, those other interfaces do not provide the TIME_OOP time_state return that adjtimex() provides, which allows the leapsecond to be properly represented. They instead only see a time discontinuity, and cannot tell the first 23:59:59 from the repeated 23:59:59 leap second. This seems like a reasonable tradeoff given clock_gettime() / gettimeofday() cannot properly represent a leapsecond, and users likely care more about performance, while folks who are using adjtimex() more likely care about leap-second correctness. 
Signed-off-by: John Stultz Cc: Prarit Bhargava Cc: Daniel Bristot de Oliveira Cc: Richard Cochran Cc: Jan Kara Cc: Jiri Bohac Cc: Ingo Molnar Link: http://lkml.kernel.org/r/1434063297-28657-5-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner --- kernel/time/ntp.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 033743e..fb4d98c 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -740,6 +740,24 @@ int __do_adjtimex(struct timex *txc, struct timespec64 *ts, s32 *time_tai) if (!(time_status & STA_NANO)) txc->time.tv_usec /= NSEC_PER_USEC; + /* Handle leapsec adjustments */ + if (unlikely(ts->tv_sec >= ntp_next_leap_sec)) { + if ((time_state == TIME_INS) && (time_status & STA_INS)) { + result = TIME_OOP; + txc->tai++; + txc->time.tv_sec--; + } + if ((time_state == TIME_DEL) && (time_status & STA_DEL)) { + result = TIME_WAIT; + txc->tai--; + txc->time.tv_sec++; + } + if ((time_state == TIME_OOP) && + (ts->tv_sec == ntp_next_leap_sec)) { + result = TIME_WAIT; + } + } + return result; }
[tip:timers/core] time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
Commit-ID: 833f32d763028c1bb371c64f457788b933773b3e Gitweb: http://git.kernel.org/tip/833f32d763028c1bb371c64f457788b933773b3e Author: John Stultz AuthorDate: Thu, 11 Jun 2015 15:54:55 -0700 Committer: Thomas Gleixner CommitDate: Fri, 12 Jun 2015 11:15:49 +0200 time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge Currently, leapsecond adjustments are done at tick time. As a result, the leapsecond was applied at the first timer tick *after* the leapsecond (~1-10ms late depending on HZ), rather than exactly on the second edge. This was in part historical from back when we were always tick based, but correcting it has since been avoided because it adds extra conditional checks in the gettime fastpath, which has performance overhead. However, it was recently pointed out that ABS_TIME CLOCK_REALTIME timers set for right after the leapsecond could fire a second early, since some timers may be expired before we trigger the timekeeping timer, which then applies the leapsecond. This isn't quite as bad as it sounds, since behaviorally it is similar to what is possible when ntpd makes leapsecond adjustments without using the kernel discipline: due to latencies, timers may fire just prior to the settimeofday call. (Also, one should note that all applications using CLOCK_REALTIME timers should always be careful, since they are prone to quirks from settimeofday() disturbances.) However, the purpose of having the kernel do the leap adjustment is to avoid such latencies, so I think this is worth fixing. So in order to properly keep those timers from firing a second early, this patch modifies the ntp and timekeeping logic so that we keep enough state for the update_base_offsets_now accessor, which provides the hrtimer core the current time, to check and apply the leapsecond adjustment on the second edge. This prevents the hrtimer core from expiring timers too early. 
This patch does not modify any other time read path, so no additional overhead is incurred. However, this also means that the leap-second continues to be applied at tick time for all other read-paths. Apologies to Richard Cochran, who pushed for similar changes years ago, which I resisted due to the concerns about the performance overhead. While I suspect this isn't extremely critical, folks who care about strict leap-second correctness will likely want to watch this. Potentially a -stable candidate eventually. Originally-suggested-by: Richard Cochran Reported-by: Daniel Bristot de Oliveira Reported-by: Prarit Bhargava Signed-off-by: John Stultz Cc: Richard Cochran Cc: Jan Kara Cc: Jiri Bohac Cc: Shuah Khan Cc: Ingo Molnar Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner --- include/linux/time64.h | 1 + include/linux/timekeeper_internal.h | 2 ++ kernel/time/ntp.c | 42 ++--- kernel/time/ntp_internal.h | 1 + kernel/time/timekeeping.c | 23 +++- 5 files changed, 61 insertions(+), 8 deletions(-) diff --git a/include/linux/time64.h b/include/linux/time64.h index 12d4e82..77b5df2 100644 --- a/include/linux/time64.h +++ b/include/linux/time64.h @@ -29,6 +29,7 @@ struct timespec64 { #define FSEC_PER_SEC 1000000000000000LL /* Located here for timespec[64]_valid_strict */ +#define TIME64_MAX ((s64)~((u64)1 << 63)) #define KTIME_MAX ((s64)~((u64)1 << 63)) #define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC) diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h index e1f5a11..2524722 100644 --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -50,6 +50,7 @@ struct tk_read_base { * @offs_tai: Offset clock monotonic -> clock tai * @tai_offset: The current UTC to TAI offset in seconds * @clock_was_set_seq: The sequence number of clock was set events + * @next_leap_ktime: CLOCK_MONOTONIC time value of a pending leap-second * @raw_time: Monotonic raw base time in 
timespec64 format * @cycle_interval: Number of clock cycles in one NTP interval * @xtime_interval: Number of clock shifted nano seconds in one NTP @@ -90,6 +91,7 @@ struct timekeeper { ktime_t offs_tai; s32 tai_offset; unsigned int clock_was_set_seq; + ktime_t next_leap_ktime; struct timespec64 raw_time; /* The following members are for timekeeping internal use */ diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 7aa2161..033743e 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -77,6 +77,9 @@ static long time_adjust; /* constant (boot-param configurable) NTP tick adjustment (upscaled) */ static s64 ntp_tick_adj; +/*
[tip:timers/core] selftests: timers: Add leap-second timer edge testing to leap-a-day.c
Commit-ID: 0c4a5fc95b1df42651a9b4c1f72d348b3d196ea0 Gitweb: http://git.kernel.org/tip/0c4a5fc95b1df42651a9b4c1f72d348b3d196ea0 Author: John Stultz AuthorDate: Thu, 11 Jun 2015 15:54:57 -0700 Committer: Thomas Gleixner CommitDate: Fri, 12 Jun 2015 11:15:50 +0200 selftests: timers: Add leap-second timer edge testing to leap-a-day.c Prarit reported an issue w/ timers around the leapsecond, where a timer set for Midnight UTC (00:00:00) might fire a second early right before the leapsecond (23:59:60 - though it appears as a repeated 23:59:59) is applied. So I've updated the leap-a-day.c test to integrate a similar test, where we set a timer and check if it triggers at the right time, and if the ntp state transition is managed properly. Reported-by: Daniel Bristot de Oliveira Reported-by: Prarit Bhargava Signed-off-by: John Stultz Cc: Richard Cochran Cc: Jan Kara Cc: Jiri Bohac Cc: Shuah Khan Cc: Ingo Molnar Link: http://lkml.kernel.org/r/1434063297-28657-6-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner --- tools/testing/selftests/timers/leap-a-day.c | 76 +++++++++++++++++++++++++++-- 1 file changed, 72 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/timers/leap-a-day.c b/tools/testing/selftests/timers/leap-a-day.c index b8272e6..331c4f7 100644 --- a/tools/testing/selftests/timers/leap-a-day.c +++ b/tools/testing/selftests/timers/leap-a-day.c @@ -44,6 +44,7 @@ #include #include #include +#include #include #include #include @@ -63,6 +64,9 @@ static inline int ksft_exit_fail(void) #define NSEC_PER_SEC 1000000000ULL #define CLOCK_TAI 11 +time_t next_leap; +int error_found; + /* returns 1 if a <= b, 0 otherwise */ static inline int in_order(struct timespec a, struct timespec b) { @@ -134,6 +138,34 @@ void handler(int unused) exit(0); } +void sigalarm(int signo) +{ + struct timex tx; + char buf[26]; + int ret; + + tx.modes = 0; + ret = adjtimex(&tx); + + ctime_r(&tx.time.tv_sec, buf); + buf[strlen(buf)-1] = 0; /* remove trailing \n */ + printf("%s + %6ld us (%i)\t%s - TIMER FIRED\n", + buf, + tx.time.tv_usec, + tx.tai, + time_state_str(ret)); + + if (tx.time.tv_sec < next_leap) { + printf("Error: Early timer expiration!\n"); + error_found = 1; + } + if (ret != TIME_WAIT) { + printf("Error: Incorrect NTP state?\n"); + error_found = 1; + } +} + + /* Test for known hrtimer failure */ void test_hrtimer_failure(void) { @@ -144,12 +176,19 @@ void test_hrtimer_failure(void) clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &target, NULL); clock_gettime(CLOCK_REALTIME, &now); - if (!in_order(target, now)) + if (!in_order(target, now)) { printf("ERROR: hrtimer early expiration failure observed.\n"); + error_found = 1; + } } int main(int argc, char **argv) { + timer_t tm1; + struct itimerspec its1; + struct sigevent se; + struct sigaction act; + int signum = SIGRTMAX; int settime = 0; int tai_time = 0; int insert = 1; @@ -191,6 +230,12 @@ int main(int argc, char **argv) signal(SIGINT, handler); signal(SIGKILL, handler); + /* Set up timer signal handler: */ + sigfillset(&act.sa_mask); + act.sa_flags = 0; + act.sa_handler = sigalarm; + sigaction(signum, &act, NULL); + if (iterations < 0) printf("This runs continuously. Press ctrl-c to stop\n"); else @@ -201,7 +246,7 @@ int main(int argc, char **argv) int ret; struct timespec ts; struct timex tx; - time_t now, next_leap; + time_t now; /* Get the current time */ clock_gettime(CLOCK_REALTIME, &ts); @@ -251,10 +296,27 @@ int main(int argc, char **argv) printf("Scheduling leap second for %s", ctime(&next_leap)); + /* Set up timer */ + printf("Setting timer for %s", ctime(&next_leap)); + memset(&se, 0, sizeof(se)); + se.sigev_notify = SIGEV_SIGNAL; + se.sigev_signo = signum; + se.sigev_value.sival_int = 0; + if (timer_create(CLOCK_REALTIME, &se, &tm1) == -1) { + printf("Error: timer_create failed\n"); + return ksft_exit_fail(); + } + its1.it_value.tv_sec = next_leap; + its1.it_value.tv_nsec = 0; + its1.it_interval.tv_sec = 0; + its1.it_interval.tv_nsec = 0; + timer_settime(tm1, TIMER_ABSTIME, &its1, NULL); + /* Wake up 3 seconds before leap */ ts.tv_sec = next_leap - 3;
[tip:timers/core] time: Move clock_was_set_seq update before updating shadow-timekeeper
Commit-ID: d151832650ed98961a5650e73e85c349ad7839cb Gitweb: http://git.kernel.org/tip/d151832650ed98961a5650e73e85c349ad7839cb Author: John Stultz AuthorDate: Thu, 11 Jun 2015 15:54:53 -0700 Committer: Thomas Gleixner CommitDate: Fri, 12 Jun 2015 10:56:20 +0200 time: Move clock_was_set_seq update before updating shadow-timekeeper It was reported that 868a3e915f7f5eba (hrtimer: Make offset update smarter) was causing timer problems after suspend/resume. The problem with that change is that the modification of clock_was_set_seq in timekeeping_update is done prior to mirroring the time state to the shadow-timekeeper. Thus the next time we do update_wall_time() the updated sequence is overwritten by what's in the shadow copy. This patch moves the shadow-timekeeper mirroring to the end of the function, after all updates have been made, so all data is kept in sync. (This patch also affects the update_fast_timekeeper calls which were also problematically done prior to the mirroring). Reported-and-tested-by: Jeremiah Mahler Signed-off-by: John Stultz Cc: Preeti U Murthy Cc: Peter Zijlstra Cc: Viresh Kumar Cc: Marcelo Tosatti Cc: Frederic Weisbecker Link: http://lkml.kernel.org/r/1434063297-28657-2-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner --- kernel/time/timekeeping.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 90ed5db..849b932 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -585,15 +585,19 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action) update_vsyscall(tk); update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET); - if (action & TK_MIRROR) - memcpy(&shadow_timekeeper, &tk_core.timekeeper, - sizeof(tk_core.timekeeper)); - update_fast_timekeeper(&tk->tkr_mono, &tk_fast_mono); update_fast_timekeeper(&tk->tkr_raw, &tk_fast_raw); if (action & TK_CLOCK_WAS_SET) tk->clock_was_set_seq++; + /* + * The mirroring of the data to the shadow-timekeeper needs
+ * to happen last here to ensure we don't over-write the + * timekeeper structure on the next update with stale data + */ + if (action & TK_MIRROR) + memcpy(&shadow_timekeeper, &tk_core.timekeeper, + sizeof(tk_core.timekeeper)); } /** -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:timers/core] ntp: Introduce and use SECS_PER_DAY macro instead of 86400
Commit-ID: 90bf361ceae28dee50a584c3dd4c1a96178d982c Gitweb: http://git.kernel.org/tip/90bf361ceae28dee50a584c3dd4c1a96178d982c Author: John Stultz john.stu...@linaro.org AuthorDate: Thu, 11 Jun 2015 15:54:54 -0700 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Fri, 12 Jun 2015 11:15:49 +0200 ntp: Introduce and use SECS_PER_DAY macro instead of 86400 Currently the leapsecond logic uses what look like magic values. Improve this by defining SECS_PER_DAY and using that macro to make the logic clearer. Signed-off-by: John Stultz john.stu...@linaro.org Cc: Prarit Bhargava pra...@redhat.com Cc: Daniel Bristot de Oliveira bris...@redhat.com Cc: Richard Cochran richardcoch...@gmail.com Cc: Jan Kara j...@suse.cz Cc: Jiri Bohac jbo...@suse.cz Cc: Ingo Molnar mi...@kernel.org Link: http://lkml.kernel.org/r/1434063297-28657-3-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner t...@linutronix.de --- kernel/time/ntp.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 7a68100..7aa2161 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -35,6 +35,7 @@ unsigned long tick_nsec; static u64 tick_length; static u64 tick_length_base; +#define SECS_PER_DAY 86400 #define MAX_TICKADJ 500LL /* usecs */ #define MAX_TICKADJ_SCALED \ (((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ) @@ -390,7 +391,7 @@ int second_overflow(unsigned long secs) case TIME_INS: if (!(time_status & STA_INS)) time_state = TIME_OK; - else if (secs % 86400 == 0) { + else if (secs % SECS_PER_DAY == 0) { leap = -1; time_state = TIME_OOP; printk(KERN_NOTICE @@ -400,7 +401,7 @@ case TIME_DEL: if (!(time_status & STA_DEL)) time_state = TIME_OK; - else if ((secs + 1) % 86400 == 0) { + else if ((secs + 1) % SECS_PER_DAY == 0) { leap = 1; time_state = TIME_WAIT; printk(KERN_NOTICE
[tip:timers/core] ntp: Do leapsecond adjustment in adjtimex read path
Commit-ID: 96efdcf2d080687e041b0353c604b708546689fd Gitweb: http://git.kernel.org/tip/96efdcf2d080687e041b0353c604b708546689fd Author: John Stultz john.stu...@linaro.org AuthorDate: Thu, 11 Jun 2015 15:54:56 -0700 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Fri, 12 Jun 2015 11:15:49 +0200 ntp: Do leapsecond adjustment in adjtimex read path Since the leapsecond is applied at tick-time, there is a small window of time at the start of a leap-second where we cross into the next second before applying the leap. This patch modifies adjtimex so that the leap-second is applied on the second edge, providing more correct leapsecond behavior. This does make it so that adjtimex()'s returned time values can be inconsistent with time values read from gettimeofday() or clock_gettime(CLOCK_REALTIME,...) for a brief period of one tick at the leapsecond. However, those other interfaces do not provide the TIME_OOP time_state return that adjtimex() provides, which allows the leapsecond to be properly represented. They instead only see a time discontinuity, and cannot tell the first 23:59:59 from the repeated 23:59:59 leap second. This seems like a reasonable tradeoff given that clock_gettime() / gettimeofday() cannot properly represent a leapsecond, and their users likely care more about performance, while folks who are using adjtimex() more likely care about leap-second correctness.
Signed-off-by: John Stultz john.stu...@linaro.org Cc: Prarit Bhargava pra...@redhat.com Cc: Daniel Bristot de Oliveira bris...@redhat.com Cc: Richard Cochran richardcoch...@gmail.com Cc: Jan Kara j...@suse.cz Cc: Jiri Bohac jbo...@suse.cz Cc: Ingo Molnar mi...@kernel.org Link: http://lkml.kernel.org/r/1434063297-28657-5-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner t...@linutronix.de --- kernel/time/ntp.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 033743e..fb4d98c 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -740,6 +740,24 @@ int __do_adjtimex(struct timex *txc, struct timespec64 *ts, s32 *time_tai) if (!(time_status & STA_NANO)) txc->time.tv_usec /= NSEC_PER_USEC; + /* Handle leapsec adjustments */ + if (unlikely(ts->tv_sec >= ntp_next_leap_sec)) { + if ((time_state == TIME_INS) && (time_status & STA_INS)) { + result = TIME_OOP; + txc->tai++; + txc->time.tv_sec--; + } + if ((time_state == TIME_DEL) && (time_status & STA_DEL)) { + result = TIME_WAIT; + txc->tai--; + txc->time.tv_sec++; + } + if ((time_state == TIME_OOP) && + (ts->tv_sec == ntp_next_leap_sec)) { + result = TIME_WAIT; + } + } + return result; }
[tip:timers/core] time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
Commit-ID: 833f32d763028c1bb371c64f457788b933773b3e Gitweb: http://git.kernel.org/tip/833f32d763028c1bb371c64f457788b933773b3e Author: John Stultz john.stu...@linaro.org AuthorDate: Thu, 11 Jun 2015 15:54:55 -0700 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Fri, 12 Jun 2015 11:15:49 +0200 time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge Currently, leapsecond adjustments are done at tick time. As a result, the leapsecond was applied at the first timer tick *after* the leapsecond (~1-10ms late depending on HZ), rather than exactly on the second edge. This was in part historical from back when we were always tick based, but correcting it has since been avoided because it adds extra conditional checks in the gettime fastpath, which has performance overhead. However, it was recently pointed out that ABS_TIME CLOCK_REALTIME timers set for right after the leapsecond could fire a second early, since some timers may be expired before we trigger the timekeeping timer, which then applies the leapsecond. This isn't quite as bad as it sounds, since behaviorally it is similar to what is possible with ntpd-made leapsecond adjustments done without using the kernel discipline, where, due to latencies, timers may fire just prior to the settimeofday call. (Also, one should note that all applications using CLOCK_REALTIME timers should always be careful, since they are prone to quirks from settimeofday() disturbances.) However, the purpose of having the kernel do the leap adjustment is to avoid such latencies, so I think this is worth fixing. So in order to properly keep those timers from firing a second early, this patch modifies the ntp and timekeeping logic so that we keep enough state for the update_base_offsets_now accessor, which provides the hrtimer core the current time, to check and apply the leapsecond adjustment on the second edge. This prevents the hrtimer core from expiring timers too early.
This patch does not modify any other time read path, so no additional overhead is incurred. However, this also means that the leap-second continues to be applied at tick time for all other read-paths. Apologies to Richard Cochran, who pushed for similar changes years ago, which I resisted due to concerns about the performance overhead. While I suspect this isn't extremely critical, folks who care about strict leap-second correctness will likely want to watch this. Potentially a -stable candidate eventually. Originally-suggested-by: Richard Cochran richardcoch...@gmail.com Reported-by: Daniel Bristot de Oliveira bris...@redhat.com Reported-by: Prarit Bhargava pra...@redhat.com Signed-off-by: John Stultz john.stu...@linaro.org Cc: Richard Cochran richardcoch...@gmail.com Cc: Jan Kara j...@suse.cz Cc: Jiri Bohac jbo...@suse.cz Cc: Shuah Khan shua...@osg.samsung.com Cc: Ingo Molnar mi...@kernel.org Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner t...@linutronix.de --- include/linux/time64.h | 1 + include/linux/timekeeper_internal.h | 2 ++ kernel/time/ntp.c | 42 ++--- kernel/time/ntp_internal.h | 1 + kernel/time/timekeeping.c | 23 +++- 5 files changed, 61 insertions(+), 8 deletions(-) diff --git a/include/linux/time64.h b/include/linux/time64.h index 12d4e82..77b5df2 100644 --- a/include/linux/time64.h +++ b/include/linux/time64.h @@ -29,6 +29,7 @@ struct timespec64 { #define FSEC_PER_SEC 1000000000000000LL /* Located here for timespec[64]_valid_strict */ +#define TIME64_MAX ((s64)~((u64)1 << 63)) #define KTIME_MAX ((s64)~((u64)1 << 63)) #define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC) diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h index e1f5a11..2524722 100644 --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -50,6 +50,7 @@ struct tk_read_base { * @offs_tai: Offset clock monotonic - clock tai * @tai_offset: The current UTC to TAI offset
in seconds * @clock_was_set_seq: The sequence number of clock was set events + * @next_leap_ktime: CLOCK_MONOTONIC time value of a pending leap-second * @raw_time: Monotonic raw base time in timespec64 format * @cycle_interval: Number of clock cycles in one NTP interval * @xtime_interval: Number of clock shifted nano seconds in one NTP @@ -90,6 +91,7 @@ struct timekeeper { ktime_t offs_tai; s32 tai_offset; unsigned int clock_was_set_seq; + ktime_t next_leap_ktime; struct timespec64 raw_time; /* The following members are for timekeeping internal use */ diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 7aa2161..033743e 100644 ---
[tip:timers/core] selftests: timers: Add leap-second timer edge testing to leap-a-day.c
Commit-ID: 0c4a5fc95b1df42651a9b4c1f72d348b3d196ea0 Gitweb: http://git.kernel.org/tip/0c4a5fc95b1df42651a9b4c1f72d348b3d196ea0 Author: John Stultz john.stu...@linaro.org AuthorDate: Thu, 11 Jun 2015 15:54:57 -0700 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Fri, 12 Jun 2015 11:15:50 +0200 selftests: timers: Add leap-second timer edge testing to leap-a-day.c Prarit reported an issue with timers around the leapsecond, where a timer set for Midnight UTC (00:00:00) might fire a second early right before the leapsecond (23:59:60 - though it appears as a repeated 23:59:59) is applied. So I've updated the leap-a-day.c test to integrate a similar test, where we set a timer and check if it triggers at the right time, and if the ntp state transition is managed properly. Reported-by: Daniel Bristot de Oliveira bris...@redhat.com Reported-by: Prarit Bhargava pra...@redhat.com Signed-off-by: John Stultz john.stu...@linaro.org Cc: Richard Cochran richardcoch...@gmail.com Cc: Jan Kara j...@suse.cz Cc: Jiri Bohac jbo...@suse.cz Cc: Shuah Khan shua...@osg.samsung.com Cc: Ingo Molnar mi...@kernel.org Link: http://lkml.kernel.org/r/1434063297-28657-6-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner t...@linutronix.de --- tools/testing/selftests/timers/leap-a-day.c | 76 +++-- 1 file changed, 72 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/timers/leap-a-day.c b/tools/testing/selftests/timers/leap-a-day.c index b8272e6..331c4f7 100644 --- a/tools/testing/selftests/timers/leap-a-day.c +++ b/tools/testing/selftests/timers/leap-a-day.c @@ -44,6 +44,7 @@ #include <time.h> #include <sys/time.h> #include <sys/timex.h> +#include <sys/errno.h> #include <string.h> #include <signal.h> #include <unistd.h> @@ -63,6 +64,9 @@ static inline int ksft_exit_fail(void) #define NSEC_PER_SEC 1000000000ULL #define CLOCK_TAI 11 +time_t next_leap; +int error_found; + /* returns 1 if a <= b, 0 otherwise */ static inline int in_order(struct timespec a, struct timespec b) { @@
-134,6 +138,34 @@ void handler(int unused) exit(0); } +void sigalarm(int signo) +{ + struct timex tx; + char buf[26]; + int ret; + + tx.modes = 0; + ret = adjtimex(&tx); + + ctime_r(&tx.time.tv_sec, buf); + buf[strlen(buf)-1] = 0; /* remove trailing \n */ + printf("%s + %6ld us (%i)\t%s - TIMER FIRED\n", + buf, + tx.time.tv_usec, + tx.tai, + time_state_str(ret)); + + if (tx.time.tv_sec < next_leap) { + printf("Error: Early timer expiration!\n"); + error_found = 1; + } + if (ret != TIME_WAIT) { + printf("Error: Incorrect NTP state?\n"); + error_found = 1; + } +} + + /* Test for known hrtimer failure */ void test_hrtimer_failure(void) { @@ -144,12 +176,19 @@ void test_hrtimer_failure(void) clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &target, NULL); clock_gettime(CLOCK_REALTIME, &now); - if (!in_order(target, now)) + if (!in_order(target, now)) { printf("ERROR: hrtimer early expiration failure observed.\n"); + error_found = 1; + } } int main(int argc, char **argv) { + timer_t tm1; + struct itimerspec its1; + struct sigevent se; + struct sigaction act; + int signum = SIGRTMAX; int settime = 0; int tai_time = 0; int insert = 1; @@ -191,6 +230,12 @@ int main(int argc, char **argv) signal(SIGINT, handler); signal(SIGKILL, handler); + /* Set up timer signal handler: */ + sigfillset(&act.sa_mask); + act.sa_flags = 0; + act.sa_handler = sigalarm; + sigaction(signum, &act, NULL); + if (iterations < 0) printf("This runs continuously.
Press ctrl-c to stop\n"); else @@ -201,7 +246,7 @@ int main(int argc, char **argv) int ret; struct timespec ts; struct timex tx; - time_t now, next_leap; + time_t now; /* Get the current time */ clock_gettime(CLOCK_REALTIME, &ts); @@ -251,10 +296,27 @@ int main(int argc, char **argv) printf("Scheduling leap second for %s", ctime(&next_leap)); + /* Set up timer */ + printf("Setting timer for %s", ctime(&next_leap)); + memset(&se, 0, sizeof(se)); + se.sigev_notify = SIGEV_SIGNAL; + se.sigev_signo = signum; + se.sigev_value.sival_int = 0; + if (timer_create(CLOCK_REALTIME, &se, &tm1) == -1) { + printf("Error: timer_create failed\n"); + return ksft_exit_fail(); + } + its1.it_value.tv_sec = next_leap; +
[tip:timers/urgent] ktime: Fix ktime_divns to do signed division
Commit-ID: f7bcb70ebae0dcdb5a2d859b09e4465784d99029 Gitweb: http://git.kernel.org/tip/f7bcb70ebae0dcdb5a2d859b09e4465784d99029 Author: John Stultz AuthorDate: Fri, 8 May 2015 13:47:23 -0700 Committer: Thomas Gleixner CommitDate: Wed, 13 May 2015 10:19:35 +0200 ktime: Fix ktime_divns to do signed division It was noted that the 32bit implementation of ktime_divns() was doing unsigned division and didn't properly handle negative values. And when a ktime helper was changed to utilize ktime_divns, it caused a regression on some IR blasters. See the following bugzilla for details: https://bugzilla.redhat.com/show_bug.cgi?id=1200353 This patch fixes the problem in ktime_divns by checking and preserving the sign bit, then reapplying it if appropriate after the division. It also changes the return type to s64 to make it more obvious that a signed result is expected. Nicolas also pointed out that negative divisors would cause infinite loops on 32bit systems. Negative divisors are unlikely for users of this function, but out of caution this patch adds checks for negative divisors in both the 32-bit (BUG_ON) and 64-bit (WARN_ON) versions to make sure no such use cases creep in.
[ tglx: Hand an u64 to do_div() to avoid the compiler warning ] Fixes: 166afb64511e 'ktime: Sanitize ktime_to_us/ms conversion' Reported-and-tested-by: Trevor Cordes Signed-off-by: John Stultz Acked-by: Nicolas Pitre Cc: Ingo Molnar Cc: Josh Boyer Cc: One Thousand Gnomes Cc: Link: http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner --- include/linux/ktime.h | 27 +-- kernel/time/hrtimer.c | 14 -- 2 files changed, 29 insertions(+), 12 deletions(-) diff --git a/include/linux/ktime.h b/include/linux/ktime.h index 5fc3d10..2b6a204 100644 --- a/include/linux/ktime.h +++ b/include/linux/ktime.h @@ -166,19 +166,34 @@ static inline bool ktime_before(const ktime_t cmp1, const ktime_t cmp2) } #if BITS_PER_LONG < 64 -extern u64 __ktime_divns(const ktime_t kt, s64 div); -static inline u64 ktime_divns(const ktime_t kt, s64 div) +extern s64 __ktime_divns(const ktime_t kt, s64 div); +static inline s64 ktime_divns(const ktime_t kt, s64 div) { + /* +* Negative divisors could cause an inf loop, +* so bug out here. +*/ + BUG_ON(div < 0); if (__builtin_constant_p(div) && !(div >> 32)) { - u64 ns = kt.tv64; - do_div(ns, div); - return ns; + s64 ns = kt.tv64; + u64 tmp = ns < 0 ? -ns : ns; + + do_div(tmp, div); + return ns < 0 ? -tmp : tmp; } else { return __ktime_divns(kt, div); } } #else /* BITS_PER_LONG < 64 */ -# define ktime_divns(kt, div) (u64)((kt).tv64 / (div)) +static inline s64 ktime_divns(const ktime_t kt, s64 div) +{ + /* +* 32-bit implementation cannot handle negative divisors, +* so catch them on 64bit as well. 
+*/ + WARN_ON(div < 0); + return kt.tv64 / div; +} #endif static inline s64 ktime_to_us(const ktime_t kt) diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 76d4bd9..93ef7190 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -266,21 +266,23 @@ lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags) /* * Divide a ktime value by a nanosecond value */ -u64 __ktime_divns(const ktime_t kt, s64 div) +s64 __ktime_divns(const ktime_t kt, s64 div) { - u64 dclc; int sft = 0; + s64 dclc; + u64 tmp; dclc = ktime_to_ns(kt); + tmp = dclc < 0 ? -dclc : dclc; + /* Make sure the divisor is less than 2^32: */ while (div >> 32) { sft++; div >>= 1; } - dclc >>= sft; - do_div(dclc, (unsigned long) div); - - return dclc; + tmp >>= sft; + do_div(tmp, (unsigned long) div); + return dclc < 0 ? -tmp : tmp; } EXPORT_SYMBOL_GPL(__ktime_divns); #endif /* BITS_PER_LONG >= 64 */
[tip:timers/urgent] ktime: Fix ktime_divns to do signed division
Commit-ID: 37e159cccb3121308bf9885530e7b3044d2edec8 Gitweb: http://git.kernel.org/tip/37e159cccb3121308bf9885530e7b3044d2edec8 Author: John Stultz AuthorDate: Fri, 8 May 2015 13:47:23 -0700 Committer: Thomas Gleixner CommitDate: Tue, 12 May 2015 09:04:09 +0200 ktime: Fix ktime_divns to do signed division It was noted that the 32bit implementation of ktime_divns() was doing unsigned division and didn't properly handle negative values. And when a ktime helper was changed to utilize ktime_divns, it caused a regression on some IR blasters. See the following bugzilla for details: https://bugzilla.redhat.com/show_bug.cgi?id=1200353 This patch fixes the problem in ktime_divns by checking and preserving the sign bit, then reapplying it if appropriate after the division. It also changes the return type to s64 to make it more obvious that a signed result is expected. Nicolas also pointed out that negative divisors would cause infinite loops on 32bit systems. Negative divisors are unlikely for users of this function, but out of caution this patch adds checks for negative divisors in both the 32-bit (BUG_ON) and 64-bit (WARN_ON) versions to make sure no such use cases creep in.
Fixes: 166afb64511e 'ktime: Sanitize ktime_to_us/ms conversion' Reported-and-tested-by: Trevor Cordes Signed-off-by: John Stultz Acked-by: Nicolas Pitre Cc: Josh Boyer Cc: One Thousand Gnomes Cc: Ingo Molnar Cc: # 3.17+ Link: http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner --- include/linux/ktime.h | 27 +++ kernel/time/hrtimer.c | 11 --- 2 files changed, 31 insertions(+), 7 deletions(-) diff --git a/include/linux/ktime.h b/include/linux/ktime.h index 5fc3d10..ab2de1c7 100644 --- a/include/linux/ktime.h +++ b/include/linux/ktime.h @@ -166,19 +166,38 @@ static inline bool ktime_before(const ktime_t cmp1, const ktime_t cmp2) } #if BITS_PER_LONG < 64 -extern u64 __ktime_divns(const ktime_t kt, s64 div); -static inline u64 ktime_divns(const ktime_t kt, s64 div) +extern s64 __ktime_divns(const ktime_t kt, s64 div); +static inline s64 ktime_divns(const ktime_t kt, s64 div) { + /* +* Negative divisors could cause an inf loop, +* so bug out here. +*/ + BUG_ON(div < 0); if (__builtin_constant_p(div) && !(div >> 32)) { - u64 ns = kt.tv64; + s64 ns = kt.tv64; + int neg = (ns < 0); + + if (neg) + ns = -ns; do_div(ns, div); + if (neg) + ns = -ns; return ns; } else { return __ktime_divns(kt, div); } } #else /* BITS_PER_LONG < 64 */ -# define ktime_divns(kt, div) (u64)((kt).tv64 / (div)) +static inline s64 ktime_divns(const ktime_t kt, s64 div) +{ + /* +* 32-bit implementation cannot handle negative divisors, +* so catch them on 64bit as well. 
+*/ + WARN_ON(div < 0); + return kt.tv64 / div; +} #endif static inline s64 ktime_to_us(const ktime_t kt) diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 76d4bd9..c98ce4d 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -266,12 +266,15 @@ lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags) /* * Divide a ktime value by a nanosecond value */ -u64 __ktime_divns(const ktime_t kt, s64 div) +s64 __ktime_divns(const ktime_t kt, s64 div) { - u64 dclc; - int sft = 0; + s64 dclc; + int neg, sft = 0; dclc = ktime_to_ns(kt); + neg = (dclc < 0); + if (neg) + dclc = -dclc; /* Make sure the divisor is less than 2^32: */ while (div >> 32) { sft++; @@ -279,6 +282,8 @@ u64 __ktime_divns(const ktime_t kt, s64 div) } dclc >>= sft; do_div(dclc, (unsigned long) div); + if (neg) + dclc = -dclc; return dclc; }
[tip:timers/urgent] ktime: Fix ktime_divns to do signed division
Commit-ID: 37e159cccb3121308bf9885530e7b3044d2edec8 Gitweb: http://git.kernel.org/tip/37e159cccb3121308bf9885530e7b3044d2edec8 Author: John Stultz john.stu...@linaro.org AuthorDate: Fri, 8 May 2015 13:47:23 -0700 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Tue, 12 May 2015 09:04:09 +0200 ktime: Fix ktime_divns to do signed division It was noted that the 32bit implementation of ktime_divns() was doing unsigned division and didn't properly handle negative values. And when a ktime helper was changed to utilize ktime_divns, it caused a regression on some IR blasters. See the following bugzilla for details: https://bugzilla.redhat.com/show_bug.cgi?id=1200353 This patch fixes the problem in ktime_divns by checking and preserving the sign bit, and then reapplying it if appropriate after the division, it also changes the return type to a s64 to make it more obvious this is expected. Nicolas also pointed out that negative dividers would cause infinite loops on 32bit systems, negative dividers is unlikely for users of this function, but out of caution this patch adds checks for negative dividers for both 32-bit (BUG_ON) and 64-bit(WARN_ON) versions to make sure no such use cases creep in. 
Fixes: 166afb64511e 'ktime: Sanitize ktime_to_us/ms conversion' Reported-and-tested-by: Trevor Cordes tre...@tecnopolis.ca Signed-off-by: John Stultz john.stu...@linaro.org Acked-by: Nicolas Pitre nicolas.pi...@linaro.org Cc: Josh Boyer jwbo...@redhat.com Cc: One Thousand Gnomes gno...@lxorguk.ukuu.org.uk Cc: Ingo Molnar mi...@kernel.org Cc: sta...@vger.kernel.org # 3.17+ Link: http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stu...@linaro.org Signed-off-by: Thomas Gleixner t...@linutronix.de --- include/linux/ktime.h | 27 +++ kernel/time/hrtimer.c | 11 --- 2 files changed, 31 insertions(+), 7 deletions(-) diff --git a/include/linux/ktime.h b/include/linux/ktime.h index 5fc3d10..ab2de1c7 100644 --- a/include/linux/ktime.h +++ b/include/linux/ktime.h @@ -166,19 +166,38 @@ static inline bool ktime_before(const ktime_t cmp1, const ktime_t cmp2) } #if BITS_PER_LONG < 64 -extern u64 __ktime_divns(const ktime_t kt, s64 div); -static inline u64 ktime_divns(const ktime_t kt, s64 div) +extern s64 __ktime_divns(const ktime_t kt, s64 div); +static inline s64 ktime_divns(const ktime_t kt, s64 div) { + /* +* Negative divisors could cause an inf loop, +* so bug out here. +*/ + BUG_ON(div < 0); if (__builtin_constant_p(div) && !(div >> 32)) { - u64 ns = kt.tv64; + s64 ns = kt.tv64; + int neg = (ns < 0); + + if (neg) + ns = -ns; do_div(ns, div); + if (neg) + ns = -ns; return ns; } else { return __ktime_divns(kt, div); } } #else /* BITS_PER_LONG < 64 */ -# define ktime_divns(kt, div) (u64)((kt).tv64 / (div)) +static inline s64 ktime_divns(const ktime_t kt, s64 div) +{ + /* +* 32-bit implementation cannot handle negative divisors, +* so catch them on 64bit as well.
+*/ + WARN_ON(div < 0); + return kt.tv64 / div; +} #endif static inline s64 ktime_to_us(const ktime_t kt) diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 76d4bd9..c98ce4d 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -266,12 +266,15 @@ lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags) /* * Divide a ktime value by a nanosecond value */ -u64 __ktime_divns(const ktime_t kt, s64 div) +s64 __ktime_divns(const ktime_t kt, s64 div) { - u64 dclc; - int sft = 0; + s64 dclc; + int neg, sft = 0; dclc = ktime_to_ns(kt); + neg = (dclc < 0); + if (neg) + dclc = -dclc; /* Make sure the divisor is less than 2^32: */ while (div >> 32) { sft++; @@ -279,6 +282,8 @@ u64 __ktime_divns(const ktime_t kt, s64 div) } dclc >>= sft; do_div(dclc, (unsigned long) div); + if (neg) + dclc = -dclc; return dclc; }
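The fix above is easy to sanity-check outside the kernel. The following is an illustrative userspace sketch of the corrected __ktime_divns() logic — `ktime_divns_sketch` is a made-up name, and a plain unsigned divide stands in for do_div():

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the fixed __ktime_divns() logic: strip the sign,
 * divide magnitudes with an unsigned divide (standing in for do_div()),
 * then reapply the sign. Names and types here are illustrative only.
 */
static int64_t ktime_divns_sketch(int64_t ns, int64_t div)
{
	uint64_t dclc;
	int neg = (ns < 0);
	int sft = 0;

	if (neg)
		ns = -ns;
	dclc = (uint64_t)ns;

	/* Make sure the divisor is less than 2^32, as do_div() requires */
	while (div >> 32) {
		sft++;
		div >>= 1;
	}
	dclc >>= sft;		/* scale the dividend by the same amount */
	dclc /= (uint64_t)div;	/* plain divide in place of do_div() */

	return neg ? -(int64_t)dclc : (int64_t)dclc;
}
```

Without the sign handling, -1000000 ns / 1000 would come back as a huge positive number on a 32-bit machine, which is exactly the regression the IR blasters hit.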
[tip:timers/core] clocksource: Improve comment explaining clocks_calc_max_nsecs()'s 50% safety margin
Commit-ID: 8e56f33f8439b2f8e7f4ae7f3d0bfe683ecc3b09 Gitweb: http://git.kernel.org/tip/8e56f33f8439b2f8e7f4ae7f3d0bfe683ecc3b09 Author: John Stultz AuthorDate: Wed, 1 Apr 2015 20:34:39 -0700 Committer: Ingo Molnar CommitDate: Fri, 3 Apr 2015 08:18:35 +0200 clocksource: Improve comment explaining clocks_calc_max_nsecs()'s 50% safety margin Ingo noted that the description of clocks_calc_max_nsecs()'s 50% safety margin was somewhat circular. So this patch tries to improve the comment to better explain what we mean by the 50% safety margin and why we need it. Signed-off-by: John Stultz Cc: Peter Zijlstra Cc: Prarit Bhargava Cc: Richard Cochran Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1427945681-29972-20-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar --- kernel/time/clocksource.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index c3be3c7..15facb1 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -472,8 +472,11 @@ static u32 clocksource_max_adjustment(struct clocksource *cs) * @max_cyc: maximum cycle value before potential overflow (does not include * any safety margin) * - * NOTE: This function includes a safety margin of 50%, so that bad clock values - * can be detected. + * NOTE: This function includes a safety margin of 50%, in other words, we + * return half the number of nanoseconds the hardware counter can technically + * cover. This is done so that we can potentially detect problems caused by + * delayed timers or bad hardware, which might result in time intervals that + * are larger then what the math used can handle without overflows. 
*/ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 *max_cyc) {
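To make the 50% margin concrete, here is a hedged userspace sketch (illustrative name and simplified parameters, not the kernel's clocks_calc_max_nsecs() itself): the usable cycle count is bounded both by the counter mask and by the point where the cycles-to-nanoseconds multiply would overflow 64 bits, and the result is then halved:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: the number of cycles is limited both by the
 * counter mask (the hardware wraps there) and by the point where
 * (cycles * mult) would overflow 64 bits; the result is halved as
 * the 50% safety margin so oversized intervals can still be noticed.
 */
static uint64_t max_nsecs_with_margin(uint32_t mult, uint32_t shift, uint64_t mask)
{
	uint64_t max_cycles = UINT64_MAX / mult;	/* multiply-overflow limit */

	if (max_cycles > mask)
		max_cycles = mask;			/* counter-wrap limit */

	return ((max_cycles * mult) >> shift) / 2;	/* keep half in reserve */
}
```

With mult = 1 << 22 and shift = 22 (one nanosecond per cycle) and a 32-bit mask, the counter covers about 4.29 seconds, so the margined value is about 2.15 seconds.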
[tip:timers/core] clocksource: Add some debug info about clocksources being registered
Commit-ID: 8cc8c525ad4e7b581cacf84119e1a28dcb4044db Gitweb: http://git.kernel.org/tip/8cc8c525ad4e7b581cacf84119e1a28dcb4044db Author: John Stultz AuthorDate: Wed, 11 Mar 2015 21:16:39 -0700 Committer: Ingo Molnar CommitDate: Fri, 13 Mar 2015 08:07:07 +0100 clocksource: Add some debug info about clocksources being registered Print the mask, max_cycles, and max_idle_ns values for clocksources being registered. Signed-off-by: John Stultz Cc: Dave Jones Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1426133800-29329-12-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar --- kernel/time/clocksource.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 5cdf17e..1977eba 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -703,6 +703,9 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq) cs->name); clocksource_update_max_deferment(cs); + + pr_info("clocksource %s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n", + cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns); } EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
[tip:timers/core] clocksource: Rename __clocksource_updatefreq_*( ) to __clocksource_update_freq_*()
Commit-ID: fba9e07208c0f9d92d9f73761c99c8612039da44 Gitweb: http://git.kernel.org/tip/fba9e07208c0f9d92d9f73761c99c8612039da44 Author: John Stultz AuthorDate: Wed, 11 Mar 2015 21:16:40 -0700 Committer: Ingo Molnar CommitDate: Fri, 13 Mar 2015 08:07:08 +0100 clocksource: Rename __clocksource_updatefreq_*() to __clocksource_update_freq_*() Ingo requested this function be renamed to improve readability, so I've renamed __clocksource_updatefreq_scale() as well as the __clocksource_updatefreq_hz/khz() functions to avoid squishedtogethernames. This touches some of the sh clocksources, which I've not tested. The arch/arm/plat-omap change is just a comment change for consistency. Signed-off-by: John Stultz Cc: Daniel Lezcano Cc: Dave Jones Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1426133800-29329-13-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar --- arch/arm/plat-omap/counter_32k.c | 2 +- drivers/clocksource/em_sti.c | 2 +- drivers/clocksource/sh_cmt.c | 2 +- drivers/clocksource/sh_tmu.c | 2 +- include/linux/clocksource.h | 10 +- kernel/time/clocksource.c| 11 ++- 6 files changed, 15 insertions(+), 14 deletions(-) diff --git a/arch/arm/plat-omap/counter_32k.c b/arch/arm/plat-omap/counter_32k.c index 61b4d70..43cf745 100644 --- a/arch/arm/plat-omap/counter_32k.c +++ b/arch/arm/plat-omap/counter_32k.c @@ -103,7 +103,7 @@ int __init omap_init_clocksource_32k(void __iomem *vbase) /* * 12 rough estimate from the calculations in -* __clocksource_updatefreq_scale. +* __clocksource_update_freq_scale. 
*/ clocks_calc_mult_shift(&persistent_mult, &persistent_shift, 32768, NSEC_PER_SEC, 12); diff --git a/drivers/clocksource/em_sti.c b/drivers/clocksource/em_sti.c index d0a7bd6..dc3c6ee 100644 --- a/drivers/clocksource/em_sti.c +++ b/drivers/clocksource/em_sti.c @@ -210,7 +210,7 @@ static int em_sti_clocksource_enable(struct clocksource *cs) ret = em_sti_start(p, USER_CLOCKSOURCE); if (!ret) - __clocksource_updatefreq_hz(cs, p->rate); + __clocksource_update_freq_hz(cs, p->rate); return ret; } diff --git a/drivers/clocksource/sh_cmt.c b/drivers/clocksource/sh_cmt.c index 2bd13b5..b8ff3c6 100644 --- a/drivers/clocksource/sh_cmt.c +++ b/drivers/clocksource/sh_cmt.c @@ -641,7 +641,7 @@ static int sh_cmt_clocksource_enable(struct clocksource *cs) ret = sh_cmt_start(ch, FLAG_CLOCKSOURCE); if (!ret) { - __clocksource_updatefreq_hz(cs, ch->rate); + __clocksource_update_freq_hz(cs, ch->rate); ch->cs_enabled = true; } return ret; } diff --git a/drivers/clocksource/sh_tmu.c b/drivers/clocksource/sh_tmu.c index f150ca82..b6b8fa3 100644 --- a/drivers/clocksource/sh_tmu.c +++ b/drivers/clocksource/sh_tmu.c @@ -272,7 +272,7 @@ static int sh_tmu_clocksource_enable(struct clocksource *cs) ret = sh_tmu_enable(ch); if (!ret) { - __clocksource_updatefreq_hz(cs, ch->rate); + __clocksource_update_freq_hz(cs, ch->rate); ch->cs_enabled = true; } diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index bd98eaa..1355098 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -200,7 +200,7 @@ clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec); extern int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq); extern void -__clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq); +__clocksource_update_freq_scale(struct clocksource *cs, u32 scale, u32 freq); /* * Don't call this unless you are a default clocksource @@ -221,14 +221,14 @@ static inline int clocksource_register_khz(struct clocksource *cs, u32 khz)
return __clocksource_register_scale(cs, 1000, khz); } -static inline void __clocksource_updatefreq_hz(struct clocksource *cs, u32 hz) +static inline void __clocksource_update_freq_hz(struct clocksource *cs, u32 hz) { - __clocksource_updatefreq_scale(cs, 1, hz); + __clocksource_update_freq_scale(cs, 1, hz); } -static inline void __clocksource_updatefreq_khz(struct clocksource *cs, u32 khz) +static inline void __clocksource_update_freq_khz(struct clocksource *cs, u32 khz) { - __clocksource_updatefreq_scale(cs, 1000, khz); + __clocksource_update_freq_scale(cs, 1000, khz); } diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 1977eba..c3be3c7 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -643,7 +643,7 @@ static void clocksource_enqueue(struct clocksource *cs) } /** - * __clocksource_updatefreq_scale - Used update clocksource with new freq + * __clocksource_update_freq_scale -
[tip:timers/core] clocksource: Mostly kill clocksource_register()
Commit-ID: f8935983f110505daa38e8d36ee406807f83a069 Gitweb: http://git.kernel.org/tip/f8935983f110505daa38e8d36ee406807f83a069 Author: John Stultz AuthorDate: Wed, 11 Mar 2015 21:16:37 -0700 Committer: Ingo Molnar CommitDate: Fri, 13 Mar 2015 08:07:06 +0100 clocksource: Mostly kill clocksource_register() A long running project has been to clean up remaining uses of clocksource_register(), replacing it with the simpler clocksource_register_khz/hz() functions. However, there are a few cases where we need to self-define our mult/shift values, so switch the function to a more obviously internal __clocksource_register() name, and consolidate much of the internal logic so we don't have duplication. Signed-off-by: John Stultz Cc: Dave Jones Cc: David S. Miller Cc: Linus Torvalds Cc: Martin Schwidefsky Cc: Peter Zijlstra Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1426133800-29329-10-git-send-email-john.stu...@linaro.org [ Minor cleanups. ] Signed-off-by: Ingo Molnar --- arch/s390/kernel/time.c | 2 +- arch/sparc/kernel/time_32.c | 2 +- include/linux/clocksource.h | 10 +- kernel/time/clocksource.c | 81 +++-- kernel/time/jiffies.c | 4 +-- 5 files changed, 47 insertions(+), 52 deletions(-) diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c index 20660dd..6c273cd 100644 --- a/arch/s390/kernel/time.c +++ b/arch/s390/kernel/time.c @@ -283,7 +283,7 @@ void __init time_init(void) if (register_external_irq(EXT_IRQ_TIMING_ALERT, timing_alert_interrupt)) panic("Couldn't request external interrupt 0x1406"); - if (clocksource_register(&clocksource_tod) != 0) + if (__clocksource_register(&clocksource_tod) != 0) panic("Could not register TOD clock source"); /* Enable TOD clock interrupts on the boot cpu.
*/ diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c index 2f80d23..a31c0c8 100644 --- a/arch/sparc/kernel/time_32.c +++ b/arch/sparc/kernel/time_32.c @@ -191,7 +191,7 @@ static __init int setup_timer_cs(void) timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate, timer_cs.shift); - return clocksource_register(&timer_cs); + return __clocksource_register(&timer_cs); } #ifdef CONFIG_SMP diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 16d048c..bd98eaa 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -179,7 +179,6 @@ static inline s64 clocksource_cyc2ns(cycle_t cycles, u32 mult, u32 shift) } -extern int clocksource_register(struct clocksource*); extern int clocksource_unregister(struct clocksource*); extern void clocksource_touch_watchdog(void); extern struct clocksource* clocksource_get_next(void); @@ -203,6 +202,15 @@ __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq); extern void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq); +/* + * Don't call this unless you are a default clocksource + * (AKA: jiffies) and absolutely have to. + */ +static inline int __clocksource_register(struct clocksource *cs) +{ + return __clocksource_register_scale(cs, 1, 0); +} + static inline int clocksource_register_hz(struct clocksource *cs, u32 hz) { return __clocksource_register_scale(cs, 1, hz); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index c4cc04b..5cdf17e 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -656,38 +656,52 @@ static void clocksource_enqueue(struct clocksource *cs) void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq) { u64 sec; + /*
10 minutes is still a reasonable -* amount. That results in a shift value of 24 for a -* clocksource with mask >= 40bit and f >= 4GHz. That maps to -* ~ 0.06ppm granularity for NTP. +* Default clocksources are *special* and self-define their mult/shift. +* But, you're not special, so you should specify a freq value. */ - sec = cs->mask; - do_div(sec, freq); - do_div(sec, scale); - if (!sec) - sec = 1; - else if (sec > 600 && cs->mask > UINT_MAX) - sec = 600; - - clocks_calc_mult_shift(&cs->mult, &cs->shift, freq, - NSEC_PER_SEC / scale, sec * scale); - + if (freq) { + /* +* Calc the maximum number of seconds which we can run before +* wrapping around. For clocksources which have a mask > 32-bit +* we need to limit the max sleep time to have a good +
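The hunk above clamps how many seconds the clocksource may run before wrapping; an illustrative userspace version of that clamp (plain division standing in for do_div(), function name assumed):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative version of the max-seconds clamp from the hunk above. */
static uint64_t max_seconds(uint64_t mask, uint32_t freq, uint32_t scale)
{
	uint64_t sec = mask / freq / scale;	/* seconds until the counter wraps */

	if (!sec)
		sec = 1;			/* very fast counters: keep at least 1s */
	else if (sec > 600 && mask > UINT32_MAX)
		sec = 600;			/* cap wide counters at 10 minutes for precision */
	return sec;
}
```

For a 32.768 kHz counter behind a 32-bit mask the wrap time is about 131071 seconds (uncapped, since the mask is not wider than 32 bits), while a 1 GHz counter behind a 64-bit mask gets capped to 600 seconds.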
[tip:timers/core] clocksource, sparc32: Convert to using clocksource_register_hz()
Commit-ID: 3142f76022fe46f6e0a0d3940b23fb6ccb794692 Gitweb: http://git.kernel.org/tip/3142f76022fe46f6e0a0d3940b23fb6ccb794692 Author: John Stultz AuthorDate: Wed, 11 Mar 2015 21:16:38 -0700 Committer: Ingo Molnar CommitDate: Fri, 13 Mar 2015 08:07:07 +0100 clocksource, sparc32: Convert to using clocksource_register_hz() While cleaning up some clocksource code, I noticed the time_32 implementation uses the clocksource_hz2mult() helper, but doesn't use the clocksource_register_hz() method. I don't believe the Sparc clocksource is a default clocksource, so we shouldn't need to self-define the mult/shift pair. So convert the time_32.c implementation to use clocksource_register_hz(). Untested. Signed-off-by: John Stultz Acked-by: David S. Miller Cc: Dave Jones Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1426133800-29329-11-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar --- arch/sparc/kernel/time_32.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c index a31c0c8..18147a5 100644 --- a/arch/sparc/kernel/time_32.c +++ b/arch/sparc/kernel/time_32.c @@ -181,17 +181,13 @@ static struct clocksource timer_cs = { .rating = 100, .read = timer_cs_read, .mask = CLOCKSOURCE_MASK(64), - .shift = 2, .flags = CLOCK_SOURCE_IS_CONTINUOUS, }; static __init int setup_timer_cs(void) { timer_cs_enabled = 1; - timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate, - timer_cs.shift); - - return __clocksource_register(&timer_cs); + return clocksource_register_hz(&timer_cs, sparc_config.clock_rate); } #ifdef CONFIG_SMP
[tip:timers/core] clocksource: Improve clocksource watchdog reporting
Commit-ID: 0b046b217ad4c64fbbeaaac24d0648cb1fa49ad8 Gitweb: http://git.kernel.org/tip/0b046b217ad4c64fbbeaaac24d0648cb1fa49ad8 Author: John Stultz AuthorDate: Wed, 11 Mar 2015 21:16:36 -0700 Committer: Ingo Molnar CommitDate: Fri, 13 Mar 2015 08:07:06 +0100 clocksource: Improve clocksource watchdog reporting The clocksource watchdog reporting has been less helpful then desired, as it just printed the delta between the two clocksources. This prevents any useful analysis of why the skew occurred. Thus this patch tries to improve the output when we mark a clocksource as unstable, printing out the cycle last and now values for both the current clocksource and the watchdog clocksource. This will allow us to see if the result was due to a false positive caused by a problematic watchdog. Signed-off-by: John Stultz Cc: Dave Jones Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1426133800-29329-9-git-send-email-john.stu...@linaro.org [ Minor cleanups of kernel messages. 
] Signed-off-by: Ingo Molnar --- kernel/time/clocksource.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index fc2a9de..c4cc04b 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -142,13 +142,6 @@ static void __clocksource_unstable(struct clocksource *cs) schedule_work(&watchdog_work); } -static void clocksource_unstable(struct clocksource *cs, int64_t delta) -{ - printk(KERN_WARNING "Clocksource %s unstable (delta = %Ld ns)\n", - cs->name, delta); - __clocksource_unstable(cs); -} - /** * clocksource_mark_unstable - mark clocksource unstable via watchdog * @cs:clocksource to be marked unstable @@ -174,7 +167,7 @@ void clocksource_mark_unstable(struct clocksource *cs) static void clocksource_watchdog(unsigned long data) { struct clocksource *cs; - cycle_t csnow, wdnow, delta; + cycle_t csnow, wdnow, cslast, wdlast, delta; int64_t wd_nsec, cs_nsec; int next_cpu, reset_pending; @@ -213,6 +206,8 @@ static void clocksource_watchdog(unsigned long data) delta = clocksource_delta(csnow, cs->cs_last, cs->mask); cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift); + wdlast = cs->wd_last; /* save these in case we print them */ + cslast = cs->cs_last; cs->cs_last = csnow; cs->wd_last = wdnow; @@ -221,7 +216,12 @@ static void clocksource_watchdog(unsigned long data) /* Check the deviation from the watchdog clocksource.
*/ if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) { - clocksource_unstable(cs, cs_nsec - wd_nsec); + pr_warn("timekeeping watchdog: Marking clocksource '%s' as unstable, because the skew is too large:\n", cs->name); + pr_warn(" '%s' wd_now: %llx wd_last: %llx mask: %llx\n", + watchdog->name, wdnow, wdlast, watchdog->mask); + pr_warn(" '%s' cs_now: %llx cs_last: %llx mask: %llx\n", + cs->name, csnow, cslast, cs->mask); + __clocksource_unstable(cs); continue; }
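The unstable check itself boils down to comparing the two measured intervals against a threshold. A minimal standalone sketch — the threshold constant is an assumption here, mirroring the kernel's WATCHDOG_THRESHOLD of NSEC_PER_SEC >> 4:

```c
#include <assert.h>
#include <stdint.h>

#define WATCHDOG_THRESHOLD_NS (1000000000LL >> 4)	/* assumed: ~62.5 ms */

/*
 * Returns nonzero when the clocksource interval disagrees with the
 * watchdog interval by more than the threshold — the condition under
 * which the watchdog would mark the clocksource unstable.
 */
static int interval_skewed(int64_t cs_nsec, int64_t wd_nsec)
{
	int64_t skew = cs_nsec - wd_nsec;

	if (skew < 0)
		skew = -skew;
	return skew > WATCHDOG_THRESHOLD_NS;
}
```

The patch's point is that when this fires, the raw cycle values on both sides are printed too, so a false positive caused by a misbehaving watchdog clock can be told apart from a genuinely broken clocksource.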
[tip:timers/core] timekeeping: Try to catch clocksource delta underflows
Commit-ID: 057b87e3161d1194a095718f9918c01b2c389e74 Gitweb: http://git.kernel.org/tip/057b87e3161d1194a095718f9918c01b2c389e74 Author: John Stultz AuthorDate: Wed, 11 Mar 2015 21:16:34 -0700 Committer: Ingo Molnar CommitDate: Fri, 13 Mar 2015 08:07:05 +0100 timekeeping: Try to catch clocksource delta underflows In the case where there is a broken clocksource where there are multiple actual clocks that aren't perfectly aligned, we may see small "negative" deltas when we subtract 'now' from 'cycle_last'. The values are actually negative with respect to the clocksource mask value, not necessarily negative if cast to a s64, but we can check by checking the delta to see if it is a small (relative to the mask) negative value (again negative relative to the mask). If so, we assume we jumped backwards somehow and instead use zero for our delta. Signed-off-by: John Stultz Cc: Dave Jones Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1426133800-29329-7-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar --- kernel/time/timekeeping.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 657414c..187149b 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -148,6 +148,13 @@ static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr) /* calculate the delta since the last update_wall_time */ delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); + /* +* Try to catch underflows by checking if we are seeing small +* mask-relative negative values. 
+*/ + if (unlikely((~delta & tkr->mask) < (tkr->mask >> 3))) + delta = 0; + /* Cap delta value to the max_cycles values to avoid mult overflows */ + if (unlikely(delta > tkr->clock->max_cycles)) + delta = tkr->clock->max_cycles;
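The mask-relative underflow test is a compact bit trick; this standalone sketch (illustrative names) shows why a "slightly negative" delta is caught — modulo the mask it lands just below the mask, so its complement is tiny:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the underflow clamp: a small negative delta (mod mask)
 * shows up as a value in the top eighth of the counter range; treat
 * that as a backwards jump between misaligned clocks and use zero.
 */
static uint64_t checked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	uint64_t delta = (now - last) & mask;

	if ((~delta & mask) < (mask >> 3))
		delta = 0;
	return delta;
}
```

Reading `now = 5` after `last = 10` under a 32-bit mask yields delta 0xFFFFFFFB, whose complement (4) is far below mask >> 3, so the delta collapses to zero; a normal forward delta passes through unchanged.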
[tip:timers/core] timekeeping: Add warnings when overflows or underflows are observed
Commit-ID: 4ca22c2648f9c1cec0b242f58d7302136f5a4cbb Gitweb: http://git.kernel.org/tip/4ca22c2648f9c1cec0b242f58d7302136f5a4cbb Author: John Stultz AuthorDate: Wed, 11 Mar 2015 21:16:35 -0700 Committer: Ingo Molnar CommitDate: Fri, 13 Mar 2015 08:07:05 +0100 timekeeping: Add warnings when overflows or underflows are observed It was suggested that the underflow/overflow protection should probably throw some sort of warning out, rather than just silently fixing the issue. So this patch adds some warnings here. The flag variables used are not protected by locks, but since we can't print from the reading functions, just being able to say we saw an issue in the update interval is useful enough, and can be slightly racy without real consequence. The big complication is that we're only under a read seqlock, so the data could shift under us during our calculation to see if there was a problem. This patch avoids this issue by nesting another seqlock which allows us to snapshot the just required values atomically. So we shouldn't see false positives. I also added some basic rate-limiting here, since on one build machine w/ skewed TSCs it was fairly noisy at bootup. 
Signed-off-by: John Stultz Cc: Dave Jones Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1426133800-29329-8-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar --- kernel/time/timekeeping.c | 64 +-- 1 file changed, 57 insertions(+), 7 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 187149b..892f6cb 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -119,6 +119,20 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta) } #ifdef CONFIG_DEBUG_TIMEKEEPING +#define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */ +/* + * These simple flag variables are managed + * without locks, which is racy, but ok since + * we don't really care about being super + * precise about how many events were seen, + * just that a problem was observed. + */ +static int timekeeping_underflow_seen; +static int timekeeping_overflow_seen; + +/* last_warning is only modified under the timekeeping lock */ +static long timekeeping_last_warning; + static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset) { @@ -136,28 +150,64 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset) printk_deferred(" timekeeping: Your kernel is still fine, but is feeling a bit nervous\n"); } } + + if (timekeeping_underflow_seen) { + if (jiffies - timekeeping_last_warning > WARNING_FREQ) { + printk_deferred("WARNING: Underflow in clocksource '%s' observed, time update ignored.\n", name); + printk_deferred(" Please report this, consider using a different clocksource, if possible.\n"); + printk_deferred(" Your kernel is probably still fine.\n"); + timekeeping_last_warning = jiffies; + } + timekeeping_underflow_seen = 0; + } + + if (timekeeping_overflow_seen) { + if (jiffies - timekeeping_last_warning > WARNING_FREQ) { + printk_deferred("WARNING: Overflow in clocksource '%s' observed, 
time update capped.\n", name); + printk_deferred(" Please report this, consider using a different clocksource, if possible.\n"); + printk_deferred(" Your kernel is probably still fine.\n"); + timekeeping_last_warning = jiffies; + } + timekeeping_overflow_seen = 0; + } } static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr) { - cycle_t cycle_now, delta; + cycle_t now, last, mask, max, delta; + unsigned int seq; - /* read clocksource */ - cycle_now = tkr->read(tkr->clock); + /* +* Since we're called holding a seqlock, the data may shift +* under us while we're doing the calculation. This can cause +* false positives, since we'd note a problem but throw the +* results away. So nest another seqlock here to atomically +* grab the points we are checking with. +*/ + do { + seq = read_seqcount_begin(&tk_core.seq); + now = tkr->read(tkr->clock); + last = tkr->cycle_last; + mask = tkr->mask; + max = tkr->clock->max_cycles; + } while (read_seqcount_retry(&tk_core.seq, seq)); - /* calculate the delta since the last update_wall_time */ - delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); + delta = clocksource_delta(now, last, mask);
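The nested-seqcount snapshot pattern can be sketched in userspace with C11 atomics. This is a simplified single-writer illustration of the retry loop only — the kernel's seqcount API additionally handles memory ordering and SMP barriers, which are omitted here:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static atomic_uint seq;			/* odd while a write is in progress */
static uint64_t shared_a, shared_b;

static void writer_update(uint64_t a, uint64_t b)
{
	atomic_fetch_add(&seq, 1);	/* enter write section (seq becomes odd) */
	shared_a = a;
	shared_b = b;
	atomic_fetch_add(&seq, 1);	/* leave write section (seq even again) */
}

/* Snapshot both values consistently: retry if a writer was active. */
static uint64_t reader_sum(void)
{
	unsigned int s;
	uint64_t a, b;

	do {
		s = atomic_load(&seq);
		a = shared_a;
		b = shared_b;
	} while ((s & 1) || atomic_load(&seq) != s);

	return a + b;
}
```

The reader never blocks the writer; it simply discards and retries any snapshot taken while the sequence count was odd or changed mid-read, which is exactly how the patch grabs `now`, `cycle_last`, `mask`, and `max_cycles` as one consistent set.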
[tip:timers/core] timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value
Commit-ID: a558cd021d83b65c47ee5b9bec1fcfe5298a769f Gitweb: http://git.kernel.org/tip/a558cd021d83b65c47ee5b9bec1fcfe5298a769f Author: John Stultz AuthorDate: Wed, 11 Mar 2015 21:16:33 -0700 Committer: Ingo Molnar CommitDate: Fri, 13 Mar 2015 08:07:04 +0100 timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value When calculating the current delta since the last tick, we currently have no hard protections to prevent a multiplication overflow from occurring. This patch introduces infrastructure to allow a cap that limits the clocksource read delta value to the 'max_cycles' value, which is where an overflow would occur. Since this is in the hotpath, it adds the extra checking under CONFIG_DEBUG_TIMEKEEPING=y. There was some concern that capping time like this could cause problems as we may stop expiring timers, which could go circular if the timer that triggers time accumulation were mis-scheduled too far in the future, which would cause time to stop. However, since the mult overflow would result in a smaller time value, we would effectively have the same problem there.
Signed-off-by: John Stultz
Cc: Dave Jones
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Stephen Boyd
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1426133800-29329-6-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar
---
 kernel/time/timekeeping.c | 49 +--
 1 file changed, 35 insertions(+), 14 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index acf0491..657414c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -126,9 +126,9 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 	const char *name = tk->tkr.clock->name;
 
 	if (offset > max_cycles) {
-		printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow\n",
+		printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow danger\n",
 				offset, name, max_cycles);
-		printk_deferred("         timekeeping: Your kernel is sick, but tries to cope\n");
+		printk_deferred("         timekeeping: Your kernel is sick, but tries to cope by capping time updates\n");
 	} else {
 		if (offset > (max_cycles >> 1)) {
 			printk_deferred("INFO: timekeeping: Cycle offset (%lld) is larger than the the '%s' clock's 50%% safety margin (%lld)\n",
@@ -137,10 +137,39 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 		}
 	}
 }
+
+static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+{
+	cycle_t cycle_now, delta;
+
+	/* read clocksource */
+	cycle_now = tkr->read(tkr->clock);
+
+	/* calculate the delta since the last update_wall_time */
+	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+
+	/* Cap delta value to the max_cycles values to avoid mult overflows */
+	if (unlikely(delta > tkr->clock->max_cycles))
+		delta = tkr->clock->max_cycles;
+
+	return delta;
+}
 #else
 static inline void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 {
 }
+static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+{
+	cycle_t cycle_now, delta;
+
+	/* read clocksource */
+	cycle_now = tkr->read(tkr->clock);
+
+	/* calculate the delta since the last update_wall_time */
+	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+
+	return delta;
+}
 #endif
 
 /**
@@ -218,14 +247,10 @@ static inline u32 arch_gettimeoffset(void) { return 0; }
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
-	cycle_t cycle_now, delta;
+	cycle_t delta;
 	s64 nsec;
 
-	/* read clocksource: */
-	cycle_now = tkr->read(tkr->clock);
-
-	/* calculate the delta since the last update_wall_time: */
-	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+	delta = timekeeping_get_delta(tkr);
 
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
 	nsec >>= tkr->shift;
@@ -237,14 +262,10 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk)
 {
 	struct clocksource *clock = tk->tkr.clock;
-	cycle_t cycle_now, delta;
+	cycle_t delta;
 	s64 nsec;
 
-	/* read clocksource: */
-	cycle_now = tk->tkr.read(clock);
-
-	/* calculate the delta since the last update_wall_time: */
-	delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask);
+	delta = timekeeping_get_delta(&tk->tkr);
 
 	/* convert delta to
[tip:timers/core] clocksource: Simplify the logic around clocksource wrapping safety margins
Commit-ID:  362fde0410377e468ca00ad363fdf3e3ec42eb6a
Gitweb:     http://git.kernel.org/tip/362fde0410377e468ca00ad363fdf3e3ec42eb6a
Author:     John Stultz
AuthorDate: Wed, 11 Mar 2015 21:16:30 -0700
Committer:  Ingo Molnar
CommitDate: Thu, 12 Mar 2015 10:16:38 +0100

clocksource: Simplify the logic around clocksource wrapping safety margins

The clocksource logic has a number of places where we try to include a
safety margin. Most of these are 12.5% safety margins, but they are
inconsistently applied and sometimes are applied on top of each other.

Additionally, in the previous patch, we corrected an issue where we
unintentionally in effect created a 50% safety margin, which these
12.5% margins were then added to.

So to simplify the logic here, this patch removes the various 12.5%
margins, and consolidates adding the margin in one place:
clocks_calc_max_nsecs().

Additionally, Linus prefers a 50% safety margin, as it allows bad
clock values to be more easily caught. This should really have no net
effect, due to the corrected issue earlier which caused greater than
50% margins to be used without issue.
Signed-off-by: John Stultz
Acked-by: Stephen Boyd (for the sched_clock.c bit)
Cc: Dave Jones
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1426133800-29329-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar
---
 kernel/time/clocksource.c | 26 --
 kernel/time/sched_clock.c |  4 ++--
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 2148f41..ace9576 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -469,6 +469,9 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
  * @shift:	cycle to nanosecond divisor (power of two)
  * @maxadj:	maximum adjustment value to mult (~11%)
  * @mask:	bitmask for two's complement subtraction of non 64 bit counters
+ *
+ * NOTE: This function includes a safety margin of 50%, so that bad clock values
+ * can be detected.
  */
 u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 {
@@ -490,11 +493,14 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 	max_cycles = min(max_cycles, mask);
 	max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);
 
+	/* Return 50% of the actual maximum, so we can detect bad values */
+	max_nsecs >>= 1;
+
 	return max_nsecs;
 }
 
 /**
- * clocksource_max_deferment - Returns max time the clocksource can be deferred
+ * clocksource_max_deferment - Returns max time the clocksource should be deferred
  * @cs:         Pointer to clocksource
  *
  */
@@ -504,13 +510,7 @@ static u64 clocksource_max_deferment(struct clocksource *cs)
 	max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj,
 					  cs->mask);
-	/*
-	 * To ensure that the clocksource does not wrap whilst we are idle,
-	 * limit the time the clocksource can be deferred by 12.5%. Please
-	 * note a margin of 12.5% is used because this can be computed with
-	 * a shift, versus say 10% which would require division.
-	 */
-	return max_nsecs - (max_nsecs >> 3);
+	return max_nsecs;
 }
 
 #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET
@@ -659,10 +659,9 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	 * conversion precision. 10 minutes is still a reasonable
 	 * amount. That results in a shift value of 24 for a
 	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
-	 * ~ 0.06ppm granularity for NTP. We apply the same 12.5%
-	 * margin as we do in clocksource_max_deferment()
+	 * ~ 0.06ppm granularity for NTP.
 	 */
-	sec = (cs->mask - (cs->mask >> 3));
+	sec = cs->mask;
 	do_div(sec, freq);
 	do_div(sec, scale);
 	if (!sec)
@@ -674,9 +673,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	/*
-	 * for clocksources that have large mults, to avoid overflow.
-	 * Since mult may be adjusted by ntp, add an safety extra margin
-	 *
+	 * Ensure clocksources that have large 'mult' values don't overflow
+	 * when adjusted.
 	 */
 	cs->maxadj = clocksource_max_adjustment(cs);
 	while ((cs->mult + cs->maxadj < cs->mult)
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 01d2d15..3b8ae45 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -125,9 +125,9 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 	new_mask = CLOCKSOURCE_MASK(bits);
 
-	/* calculate how many ns until we wrap */
+	/* calculate how many nanosecs until we risk wrapping
[tip:timers/core] timekeeping: Add debugging checks to warn if we see delays
Commit-ID:  3c17ad19f0697ffe5ef7438cdafc2d2b7757d8a5
Gitweb:     http://git.kernel.org/tip/3c17ad19f0697ffe5ef7438cdafc2d2b7757d8a5
Author:     John Stultz
AuthorDate: Wed, 11 Mar 2015 21:16:32 -0700
Committer:  Ingo Molnar
CommitDate: Fri, 13 Mar 2015 08:06:58 +0100

timekeeping: Add debugging checks to warn if we see delays

Recently there have been requests for better sanity checking in the
time code, so that it's more clear when something is going wrong,
since timekeeping issues could manifest in a large number of strange
ways in various subsystems.

Thus, this patch adds some extra infrastructure to add a check to
update_wall_time() to print two new warnings:

 1) if we see the call delayed beyond the 'max_cycles' overflow point,

 2) or if we see the call delayed beyond the clocksource's
    'max_idle_ns' value, which is currently 50% of the overflow point.

This extra infrastructure is conditional on a new
CONFIG_DEBUG_TIMEKEEPING option, also added in this patch - default
off.

Tested this a bit by halting qemu for specified lengths of time to
trigger the warnings.

Signed-off-by: John Stultz
Cc: Dave Jones
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Stephen Boyd
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stu...@linaro.org
[ Improved the changelog and the messages a bit.
] Signed-off-by: Ingo Molnar
---
 kernel/time/jiffies.c     |  1 +
 kernel/time/timekeeping.c | 28
 lib/Kconfig.debug         | 13 +
 3 files changed, 42 insertions(+)

diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index a6a5bf5..7e41390 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -71,6 +71,7 @@ static struct clocksource clocksource_jiffies = {
 	.mask		= 0xffffffff,			/*32bits*/
 	.mult		= NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
 	.shift		= JIFFIES_SHIFT,
+	.max_cycles	= 10,
 };
 
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 91db941..acf0491 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -118,6 +118,31 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
 	tk->offs_boot = ktime_add(tk->offs_boot, delta);
 }
 
+#ifdef CONFIG_DEBUG_TIMEKEEPING
+static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
+{
+
+	cycle_t max_cycles = tk->tkr.clock->max_cycles;
+	const char *name = tk->tkr.clock->name;
+
+	if (offset > max_cycles) {
+		printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow\n",
+				offset, name, max_cycles);
+		printk_deferred("         timekeeping: Your kernel is sick, but tries to cope\n");
+	} else {
+		if (offset > (max_cycles >> 1)) {
+			printk_deferred("INFO: timekeeping: Cycle offset (%lld) is larger than the the '%s' clock's 50%% safety margin (%lld)\n",
+					offset, name, max_cycles >> 1);
+			printk_deferred("      timekeeping: Your kernel is still fine, but is feeling a bit nervous\n");
+		}
+	}
+}
+#else
+static inline void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
+{
+}
+#endif
+
 /**
  * tk_setup_internals - Set up internals to use clocksource clock.
  *
@@ -1630,6 +1655,9 @@ void update_wall_time(void)
 	if (offset < real_tk->cycle_interval)
 		goto out;
 
+	/* Do some additional sanity checking */
+	timekeeping_check_update(real_tk, offset);
+
 	/*
 	 * With NO_HZ we may have to accumulate many cycle_intervals
 	 * (think "ticks") worth of time at once. To do this efficiently,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c5cefb3..36b6fa8 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -865,6 +865,19 @@ config SCHED_STACK_END_CHECK
 	  data corruption or a sporadic crash at a later stage once the region
 	  is examined. The runtime overhead introduced is minimal.
 
+config DEBUG_TIMEKEEPING
+	bool "Enable extra timekeeping sanity checking"
+	help
+	  This option will enable additional timekeeping sanity checks
+	  which may be helpful when diagnosing issues where timekeeping
+	  problems are suspected.
+
+	  This may include checks in the timekeeping hotpaths, so this
+	  option may have a (very small) performance impact to some
+	  workloads.
+
+	  If unsure, say N.
+
 config TIMER_STATS
 	bool "Collect kernel timers statistics"
 	depends on DEBUG_KERNEL && PROC_FS
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message
[tip:timers/core] clocksource: Add 'max_cycles' to 'struct clocksource'
Commit-ID:  fb82fe2fe8588745edd73aa3a6229facac5c1e15
Gitweb:     http://git.kernel.org/tip/fb82fe2fe8588745edd73aa3a6229facac5c1e15
Author:     John Stultz
AuthorDate: Wed, 11 Mar 2015 21:16:31 -0700
Committer:  Ingo Molnar
CommitDate: Thu, 12 Mar 2015 10:16:38 +0100

clocksource: Add 'max_cycles' to 'struct clocksource'

In order to facilitate clocksource validation, add a 'max_cycles'
field to the clocksource structure which will hold the maximum cycle
value that can safely be multiplied without potentially causing an
overflow.

Signed-off-by: John Stultz
Cc: Dave Jones
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Stephen Boyd
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1426133800-29329-4-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar
---
 include/linux/clocksource.h |  5 +++--
 kernel/time/clocksource.c   | 28
 kernel/time/sched_clock.c   |  2 +-
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 9c78d15..16d048c 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -56,6 +56,7 @@ struct module;
  * @shift:		cycle to nanosecond divisor (power of two)
  * @max_idle_ns:	max idle time permitted by the clocksource (nsecs)
  * @maxadj:		maximum adjustment value to mult (~11%)
+ * @max_cycles:		maximum safe cycle value which won't overflow on multiplication
  * @flags:		flags describing special properties
  * @archdata:		arch-specific data
  * @suspend:		suspend function for the clocksource, if necessary
@@ -76,7 +77,7 @@ struct clocksource {
 #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
 	struct arch_clocksource_data archdata;
 #endif
-
+	u64 max_cycles;
 	const char *name;
 	struct list_head list;
 	int rating;
@@ -189,7 +190,7 @@ extern struct clocksource * __init clocksource_default_clock(void);
 extern void clocksource_mark_unstable(struct clocksource *cs);
 
 extern u64
-clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask);
+clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 *max_cycles);
 extern void
 clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec);
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index ace9576..fc2a9de 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -469,11 +469,13 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
  * @shift:	cycle to nanosecond divisor (power of two)
  * @maxadj:	maximum adjustment value to mult (~11%)
  * @mask:	bitmask for two's complement subtraction of non 64 bit counters
+ * @max_cyc:	maximum cycle value before potential overflow (does not include
+ *		any safety margin)
  *
  * NOTE: This function includes a safety margin of 50%, so that bad clock values
  * can be detected.
  */
-u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
+u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 *max_cyc)
 {
 	u64 max_nsecs, max_cycles;
 
@@ -493,6 +495,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 	max_cycles = min(max_cycles, mask);
 	max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);
 
+	/* return the max_cycles value as well if requested */
+	if (max_cyc)
+		*max_cyc = max_cycles;
+
 	/* Return 50% of the actual maximum, so we can detect bad values */
 	max_nsecs >>= 1;
 
@@ -500,17 +506,15 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 }
 
 /**
- * clocksource_max_deferment - Returns max time the clocksource should be deferred
- * @cs:         Pointer to clocksource
+ * clocksource_update_max_deferment - Updates the clocksource max_idle_ns & max_cycles
+ * @cs:         Pointer to clocksource to be updated
  *
  */
-static u64 clocksource_max_deferment(struct clocksource *cs)
+static inline void clocksource_update_max_deferment(struct clocksource *cs)
 {
-	u64 max_nsecs;
-
-	max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj,
-					  cs->mask);
-	return max_nsecs;
+	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
+						cs->maxadj, cs->mask,
+						&cs->max_cycles);
 }
 
 #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET
@@ -684,7 +688,7 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 		cs->maxadj = clocksource_max_adjustment(cs);
 	}
 
-	cs->max_idle_ns = clocksource_max_deferment(cs);
+	clocksource_update_max_deferment(cs);
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
 
@@ -730,8 +734,8 @@ int clocksource_register(struct clocksource
[tip:timers/core] clocksource: Simplify the clocks_calc_max_nsecs() logic
Commit-ID:  6086e346fdea1ae64d974c94c1acacc2605567ae
Gitweb:     http://git.kernel.org/tip/6086e346fdea1ae64d974c94c1acacc2605567ae
Author:     John Stultz
AuthorDate: Wed, 11 Mar 2015 21:16:29 -0700
Committer:  Ingo Molnar
CommitDate: Thu, 12 Mar 2015 10:16:38 +0100

clocksource: Simplify the clocks_calc_max_nsecs() logic

The previous clocks_calc_max_nsecs() code had some unnecessarily
complex bit logic to find the max interval that could cause
multiplication overflows. Since this is not in the hot path, just do
the divide to make it easier to read.

The previous implementation also had a subtle issue that it avoided
overflows with signed 64-bit values, whereas the intervals are always
unsigned. This resulted in overly conservative intervals, which other
safety margins were then added to, reducing the intended interval
length.

Signed-off-by: John Stultz
Cc: Dave Jones
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Stephen Boyd
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1426133800-29329-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar
---
 kernel/time/clocksource.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4892352..2148f41 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -476,19 +476,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 	/*
 	 * Calculate the maximum number of cycles that we can pass to the
-	 * cyc2ns function without overflowing a 64-bit signed result. The
-	 * maximum number of cycles is equal to ULLONG_MAX/(mult+maxadj)
-	 * which is equivalent to the below.
-	 * max_cycles < (2^63)/(mult + maxadj)
-	 * max_cycles < 2^(log2((2^63)/(mult + maxadj)))
-	 * max_cycles < 2^(log2(2^63) - log2(mult + maxadj))
-	 * max_cycles < 2^(63 - log2(mult + maxadj))
-	 * max_cycles < 1 << (63 - log2(mult + maxadj))
-	 * Please note that we add 1 to the result of the log2 to account for
-	 * any rounding errors, ensure the above inequality is satisfied and
-	 * no overflow will occur.
+	 * cyc2ns() function without overflowing a 64-bit result.
 	 */
-	max_cycles = 1ULL << (63 - (ilog2(mult + maxadj) + 1));
+	max_cycles = ULLONG_MAX;
+	do_div(max_cycles, mult+maxadj);
 
 	/*
 	 * The actual maximum number of cycles we can defer the clocksource is
[tip:timers/core] timekeeping: Add warnings when overflows or underflows are observed
Commit-ID:  4ca22c2648f9c1cec0b242f58d7302136f5a4cbb
Gitweb:     http://git.kernel.org/tip/4ca22c2648f9c1cec0b242f58d7302136f5a4cbb
Author:     John Stultz <john.stu...@linaro.org>
AuthorDate: Wed, 11 Mar 2015 21:16:35 -0700
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 13 Mar 2015 08:07:05 +0100

timekeeping: Add warnings when overflows or underflows are observed

It was suggested that the underflow/overflow protection should
probably throw some sort of warning out, rather than just silently
fixing the issue.

So this patch adds some warnings here. The flag variables used are not
protected by locks, but since we can't print from the reading
functions, just being able to say we saw an issue in the update
interval is useful enough, and can be slightly racy without real
consequence.

The big complication is that we're only under a read seqlock, so the
data could shift under us during our calculation to see if there was a
problem. This patch avoids this issue by nesting another seqlock which
allows us to snapshot just the required values atomically. So we
shouldn't see false positives.

I also added some basic rate-limiting here, since on one build machine
w/ skewed TSCs it was fairly noisy at bootup.
Signed-off-by: John Stultz <john.stu...@linaro.org>
Cc: Dave Jones <da...@codemonkey.org.uk>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Cc: Stephen Boyd <sb...@codeaurora.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-8-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/time/timekeeping.c | 64 +--
 1 file changed, 57 insertions(+), 7 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 187149b..892f6cb 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -119,6 +119,20 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
 }
 
 #ifdef CONFIG_DEBUG_TIMEKEEPING
+#define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */
+/*
+ * These simple flag variables are managed
+ * without locks, which is racy, but ok since
+ * we don't really care about being super
+ * precise about how many events were seen,
+ * just that a problem was observed.
+ */
+static int timekeeping_underflow_seen;
+static int timekeeping_overflow_seen;
+
+/* last_warning is only modified under the timekeeping lock */
+static long timekeeping_last_warning;
+
 static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 {
 
@@ -136,28 +150,64 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 			printk_deferred("      timekeeping: Your kernel is still fine, but is feeling a bit nervous\n");
 		}
 	}
+
+	if (timekeeping_underflow_seen) {
+		if (jiffies - timekeeping_last_warning > WARNING_FREQ) {
+			printk_deferred("WARNING: Underflow in clocksource '%s' observed, time update ignored.\n", name);
+			printk_deferred("         Please report this, consider using a different clocksource, if possible.\n");
+			printk_deferred("         Your kernel is probably still fine.\n");
+			timekeeping_last_warning = jiffies;
+		}
+		timekeeping_underflow_seen = 0;
+	}
+
+	if (timekeeping_overflow_seen) {
+		if (jiffies - timekeeping_last_warning > WARNING_FREQ) {
+			printk_deferred("WARNING: Overflow in clocksource '%s' observed, time update capped.\n", name);
+			printk_deferred("         Please report this, consider using a different clocksource, if possible.\n");
+			printk_deferred("         Your kernel is probably still fine.\n");
+			timekeeping_last_warning = jiffies;
+		}
+		timekeeping_overflow_seen = 0;
+	}
 }
 
 static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
 {
-	cycle_t cycle_now, delta;
+	cycle_t now, last, mask, max, delta;
+	unsigned int seq;
 
-	/* read clocksource */
-	cycle_now = tkr->read(tkr->clock);
+	/*
+	 * Since we're called holding a seqlock, the data may shift
+	 * under us while we're doing the calculation. This can cause
+	 * false positives, since we'd note a problem but throw the
+	 * results away. So nest another seqlock here to atomically
+	 * grab the points we are checking with.
+	 */
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+		now = tkr->read(tkr->clock);
+		last = tkr->cycle_last;
+		mask = tkr->mask;
+		max = tkr->clock->max_cycles;
+	} while (read_seqcount_retry(&tk_core.seq,
[tip:timers/core] timekeeping: Try to catch clocksource delta underflows
Commit-ID:  057b87e3161d1194a095718f9918c01b2c389e74
Gitweb:     http://git.kernel.org/tip/057b87e3161d1194a095718f9918c01b2c389e74
Author:     John Stultz <john.stu...@linaro.org>
AuthorDate: Wed, 11 Mar 2015 21:16:34 -0700
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 13 Mar 2015 08:07:05 +0100

timekeeping: Try to catch clocksource delta underflows

In the case where there is a broken clocksource where there are
multiple actual clocks that aren't perfectly aligned, we may see small
negative deltas when we subtract 'now' from 'cycle_last'.

The values are actually negative with respect to the clocksource mask
value, not necessarily negative if cast to a s64, but we can check by
checking the delta to see if it is a small (relative to the mask)
negative value (again negative relative to the mask).

If so, we assume we jumped backwards somehow and instead use zero for
our delta.

Signed-off-by: John Stultz <john.stu...@linaro.org>
Cc: Dave Jones <da...@codemonkey.org.uk>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Cc: Stephen Boyd <sb...@codeaurora.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-7-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/time/timekeeping.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 657414c..187149b 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -148,6 +148,13 @@ static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
 	/* calculate the delta since the last update_wall_time */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
 
+	/*
+	 * Try to catch underflows by checking if we are seeing small
+	 * mask-relative negative values.
+	 */
+	if (unlikely((~delta & tkr->mask) < (tkr->mask >> 3)))
+		delta = 0;
+
 	/* Cap delta value to the max_cycles values to avoid mult overflows */
 	if (unlikely(delta > tkr->clock->max_cycles))
 		delta = tkr->clock->max_cycles;
[tip:timers/core] clocksource, sparc32: Convert to using clocksource_register_hz()
Commit-ID:  3142f76022fe46f6e0a0d3940b23fb6ccb794692
Gitweb:     http://git.kernel.org/tip/3142f76022fe46f6e0a0d3940b23fb6ccb794692
Author:     John Stultz <john.stu...@linaro.org>
AuthorDate: Wed, 11 Mar 2015 21:16:38 -0700
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 13 Mar 2015 08:07:07 +0100

clocksource, sparc32: Convert to using clocksource_register_hz()

While cleaning up some clocksource code, I noticed the time_32
implementation uses the clocksource_hz2mult() helper, but doesn't use
the clocksource_register_hz() method.

I don't believe the Sparc clocksource is a default clocksource, so we
shouldn't need to self-define the mult/shift pair.

So convert the time_32.c implementation to use
clocksource_register_hz().

Untested.

Signed-off-by: John Stultz <john.stu...@linaro.org>
Acked-by: David S. Miller <da...@davemloft.net>
Cc: Dave Jones <da...@codemonkey.org.uk>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Cc: Stephen Boyd <sb...@codeaurora.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-11-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/sparc/kernel/time_32.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c
index a31c0c8..18147a5 100644
--- a/arch/sparc/kernel/time_32.c
+++ b/arch/sparc/kernel/time_32.c
@@ -181,17 +181,13 @@ static struct clocksource timer_cs = {
 	.rating	= 100,
 	.read	= timer_cs_read,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.shift	= 2,
 	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
 static __init int setup_timer_cs(void)
 {
 	timer_cs_enabled = 1;
-	timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate,
-					    timer_cs.shift);
-
-	return __clocksource_register(&timer_cs);
+	return clocksource_register_hz(&timer_cs, sparc_config.clock_rate);
 }
 
 #ifdef CONFIG_SMP
[tip:timers/core] clocksource: Improve clocksource watchdog reporting
Commit-ID:  0b046b217ad4c64fbbeaaac24d0648cb1fa49ad8
Gitweb:     http://git.kernel.org/tip/0b046b217ad4c64fbbeaaac24d0648cb1fa49ad8
Author:     John Stultz <john.stu...@linaro.org>
AuthorDate: Wed, 11 Mar 2015 21:16:36 -0700
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 13 Mar 2015 08:07:06 +0100

clocksource: Improve clocksource watchdog reporting

The clocksource watchdog reporting has been less helpful than desired,
as it just printed the delta between the two clocksources. This
prevents any useful analysis of why the skew occurred.

Thus this patch tries to improve the output when we mark a clocksource
as unstable, printing out the cycle last and now values for both the
current clocksource and the watchdog clocksource. This will allow us
to see if the result was due to a false positive caused by a
problematic watchdog.

Signed-off-by: John Stultz <john.stu...@linaro.org>
Cc: Dave Jones <da...@codemonkey.org.uk>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Cc: Stephen Boyd <sb...@codeaurora.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-9-git-send-email-john.stu...@linaro.org
[ Minor cleanups of kernel messages. ]
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/time/clocksource.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index fc2a9de..c4cc04b 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -142,13 +142,6 @@ static void __clocksource_unstable(struct clocksource *cs)
 	schedule_work(&watchdog_work);
 }
 
-static void clocksource_unstable(struct clocksource *cs, int64_t delta)
-{
-	printk(KERN_WARNING "Clocksource %s unstable (delta = %Ld ns)\n",
-	       cs->name, delta);
-	__clocksource_unstable(cs);
-}
-
 /**
  * clocksource_mark_unstable - mark clocksource unstable via watchdog
  * @cs:		clocksource to be marked unstable
@@ -174,7 +167,7 @@ void clocksource_mark_unstable(struct clocksource *cs)
 static void clocksource_watchdog(unsigned long data)
 {
 	struct clocksource *cs;
-	cycle_t csnow, wdnow, delta;
+	cycle_t csnow, wdnow, cslast, wdlast, delta;
 	int64_t wd_nsec, cs_nsec;
 	int next_cpu, reset_pending;
 
@@ -213,6 +206,8 @@ static void clocksource_watchdog(unsigned long data)
 		delta = clocksource_delta(csnow, cs->cs_last, cs->mask);
 		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		wdlast = cs->wd_last; /* save these in case we print them */
+		cslast = cs->cs_last;
 		cs->cs_last = csnow;
 		cs->wd_last = wdnow;
 
@@ -221,7 +216,12 @@ static void clocksource_watchdog(unsigned long data)
 		/* Check the deviation from the watchdog clocksource. */
 		if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
-			clocksource_unstable(cs, cs_nsec - wd_nsec);
+			pr_warn("timekeeping watchdog: Marking clocksource '%s' as unstable, because the skew is too large:\n", cs->name);
+			pr_warn("	'%s' wd_now: %llx wd_last: %llx mask: %llx\n",
+				watchdog->name, wdnow, wdlast, watchdog->mask);
+			pr_warn("	'%s' cs_now: %llx cs_last: %llx mask: %llx\n",
+				cs->name, csnow, cslast, cs->mask);
+			__clocksource_unstable(cs);
 			continue;
 		}
[tip:timers/core] clocksource: Simplify the logic around clocksource wrapping safety margins
Commit-ID: 362fde0410377e468ca00ad363fdf3e3ec42eb6a Gitweb: http://git.kernel.org/tip/362fde0410377e468ca00ad363fdf3e3ec42eb6a Author: John Stultz john.stu...@linaro.org AuthorDate: Wed, 11 Mar 2015 21:16:30 -0700 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 12 Mar 2015 10:16:38 +0100 clocksource: Simplify the logic around clocksource wrapping safety margins The clocksource logic has a number of places where we try to include a safety margin. Most of these are 12.5% safety margins, but they are inconsistently applied and sometimes are applied on top of each other. Additionally, in the previous patch, we corrected an issue where we unintentionally in effect created a 50% safety margin, which these 12.5% margins were then added to. So to simplify the logic here, this patch removes the various 12.5% margins, and consolidates adding the margin in one place: clocks_calc_max_nsecs(). Additionally, Linus prefers a 50% safety margin, as it allows bad clock values to be more easily caught. This should really have no net effect, due to the corrected issue earlier which caused greater than 50% margins to be used w/o issue.
Signed-off-by: John Stultz john.stu...@linaro.org Acked-by: Stephen Boyd sb...@codeaurora.org (for the sched_clock.c bit) Cc: Dave Jones da...@codemonkey.org.uk Cc: Linus Torvalds torva...@linux-foundation.org Cc: Peter Zijlstra pet...@infradead.org Cc: Prarit Bhargava pra...@redhat.com Cc: Richard Cochran richardcoch...@gmail.com Cc: Thomas Gleixner t...@linutronix.de Link: http://lkml.kernel.org/r/1426133800-29329-3-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/time/clocksource.c | 26 -- kernel/time/sched_clock.c | 4 ++-- 2 files changed, 14 insertions(+), 16 deletions(-) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 2148f41..ace9576 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -469,6 +469,9 @@ static u32 clocksource_max_adjustment(struct clocksource *cs) * @shift: cycle to nanosecond divisor (power of two) * @maxadj:maximum adjustment value to mult (~11%) * @mask: bitmask for two's complement subtraction of non 64 bit counters + * + * NOTE: This function includes a safety margin of 50%, so that bad clock values + * can be detected. 
*/ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask) { @@ -490,11 +493,14 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask) max_cycles = min(max_cycles, mask); max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift); + /* Return 50% of the actual maximum, so we can detect bad values */ + max_nsecs >>= 1; + return max_nsecs; } /** - * clocksource_max_deferment - Returns max time the clocksource can be deferred + * clocksource_max_deferment - Returns max time the clocksource should be deferred * @cs: Pointer to clocksource * */ @@ -504,13 +510,7 @@ static u64 clocksource_max_deferment(struct clocksource *cs) max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj, cs->mask); - /* -* To ensure that the clocksource does not wrap whilst we are idle, -* limit the time the clocksource can be deferred by 12.5%. Please -* note a margin of 12.5% is used because this can be computed with -* a shift, versus say 10% which would require division. -*/ - return max_nsecs - (max_nsecs >> 3); + return max_nsecs; } #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET @@ -659,10 +659,9 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq) * conversion precision. 10 minutes is still a reasonable * amount. That results in a shift value of 24 for a * clocksource with mask >= 40bit and f >= 4GHz. That maps to -* ~ 0.06ppm granularity for NTP. We apply the same 12.5% -* margin as we do in clocksource_max_deferment() +* ~ 0.06ppm granularity for NTP. */ - sec = (cs->mask - (cs->mask >> 3)); + sec = cs->mask; do_div(sec, freq); do_div(sec, scale); if (!sec) @@ -674,9 +673,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq) NSEC_PER_SEC / scale, sec * scale); /* -* for clocksources that have large mults, to avoid overflow. -* Since mult may be adjusted by ntp, add an safety extra margin -* +* Ensure clocksources that have large 'mult' values don't overflow +* when adjusted.
*/ cs->maxadj = clocksource_max_adjustment(cs); while ((cs->mult + cs->maxadj < cs->mult) diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c index 01d2d15..3b8ae45 100644 --- a/kernel/time/sched_clock.c +++ b/kernel/time/sched_clock.c @@ -125,9 +125,9 @@ void __init
[tip:timers/core] timekeeping: Add debugging checks to warn if we see delays
Commit-ID: 3c17ad19f0697ffe5ef7438cdafc2d2b7757d8a5 Gitweb: http://git.kernel.org/tip/3c17ad19f0697ffe5ef7438cdafc2d2b7757d8a5 Author: John Stultz john.stu...@linaro.org AuthorDate: Wed, 11 Mar 2015 21:16:32 -0700 Committer: Ingo Molnar mi...@kernel.org CommitDate: Fri, 13 Mar 2015 08:06:58 +0100 timekeeping: Add debugging checks to warn if we see delays Recently there's been requests for better sanity checking in the time code, so that it's more clear when something is going wrong, since timekeeping issues could manifest in a large number of strange ways in various subsystems. Thus, this patch adds some extra infrastructure to add a check to update_wall_time() to print two new warnings: 1) if we see the call delayed beyond the 'max_cycles' overflow point, 2) or if we see the call delayed beyond the clocksource's 'max_idle_ns' value, which is currently 50% of the overflow point. This extra infrastructure is conditional on a new CONFIG_DEBUG_TIMEKEEPING option, also added in this patch - default off. Tested this a bit by halting qemu for specified lengths of time to trigger the warnings. Signed-off-by: John Stultz john.stu...@linaro.org Cc: Dave Jones da...@codemonkey.org.uk Cc: Linus Torvalds torva...@linux-foundation.org Cc: Peter Zijlstra pet...@infradead.org Cc: Prarit Bhargava pra...@redhat.com Cc: Richard Cochran richardcoch...@gmail.com Cc: Stephen Boyd sb...@codeaurora.org Cc: Thomas Gleixner t...@linutronix.de Link: http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stu...@linaro.org [ Improved the changelog and the messages a bit. 
] Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/time/jiffies.c | 1 + kernel/time/timekeeping.c | 28 lib/Kconfig.debug | 13 + 3 files changed, 42 insertions(+) diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c index a6a5bf5..7e41390 100644 --- a/kernel/time/jiffies.c +++ b/kernel/time/jiffies.c @@ -71,6 +71,7 @@ static struct clocksource clocksource_jiffies = { .mask = 0xffffffff, /*32bits*/ .mult = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */ .shift = JIFFIES_SHIFT, + .max_cycles = 10, }; __cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 91db941..acf0491 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -118,6 +118,31 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta) tk->offs_boot = ktime_add(tk->offs_boot, delta); } +#ifdef CONFIG_DEBUG_TIMEKEEPING +static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset) +{ + + cycle_t max_cycles = tk->tkr.clock->max_cycles; + const char *name = tk->tkr.clock->name; + + if (offset > max_cycles) { + printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow\n", + offset, name, max_cycles); + printk_deferred(" timekeeping: Your kernel is sick, but tries to cope\n"); + } else { + if (offset > (max_cycles >> 1)) { + printk_deferred("INFO: timekeeping: Cycle offset (%lld) is larger than the the '%s' clock's 50%% safety margin (%lld)\n", + offset, name, max_cycles >> 1); + printk_deferred(" timekeeping: Your kernel is still fine, but is feeling a bit nervous\n"); + } + } +} +#else +static inline void timekeeping_check_update(struct timekeeper *tk, cycle_t offset) +{ +} +#endif + /** * tk_setup_internals - Set up internals to use clocksource clock.
* @@ -1630,6 +1655,9 @@ void update_wall_time(void) if (offset < real_tk->cycle_interval) goto out; + /* Do some additional sanity checking */ + timekeeping_check_update(real_tk, offset); + /* * With NO_HZ we may have to accumulate many cycle_intervals * (think ticks) worth of time at once. To do this efficiently, diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index c5cefb3..36b6fa8 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -865,6 +865,19 @@ config SCHED_STACK_END_CHECK data corruption or a sporadic crash at a later stage once the region is examined. The runtime overhead introduced is minimal. +config DEBUG_TIMEKEEPING + bool "Enable extra timekeeping sanity checking" + help + This option will enable additional timekeeping sanity checks + which may be helpful when diagnosing issues where timekeeping + problems are suspected. + + This may include checks in the timekeeping hotpaths, so this + option may have a (very small) performance impact to some + workloads. + + If unsure, say N. + config
[tip:timers/core] clocksource: Add 'max_cycles' to 'struct clocksource'
Commit-ID: fb82fe2fe8588745edd73aa3a6229facac5c1e15 Gitweb: http://git.kernel.org/tip/fb82fe2fe8588745edd73aa3a6229facac5c1e15 Author: John Stultz john.stu...@linaro.org AuthorDate: Wed, 11 Mar 2015 21:16:31 -0700 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 12 Mar 2015 10:16:38 +0100 clocksource: Add 'max_cycles' to 'struct clocksource' In order to facilitate clocksource validation, add a 'max_cycles' field to the clocksource structure which will hold the maximum cycle value that can safely be multiplied without potentially causing an overflow. Signed-off-by: John Stultz john.stu...@linaro.org Cc: Dave Jones da...@codemonkey.org.uk Cc: Linus Torvalds torva...@linux-foundation.org Cc: Peter Zijlstra pet...@infradead.org Cc: Prarit Bhargava pra...@redhat.com Cc: Richard Cochran richardcoch...@gmail.com Cc: Stephen Boyd sb...@codeaurora.org Cc: Thomas Gleixner t...@linutronix.de Link: http://lkml.kernel.org/r/1426133800-29329-4-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar mi...@kernel.org --- include/linux/clocksource.h | 5 +++-- kernel/time/clocksource.c | 28 kernel/time/sched_clock.c | 2 +- 3 files changed, 20 insertions(+), 15 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 9c78d15..16d048c 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -56,6 +56,7 @@ struct module; * @shift: cycle to nanosecond divisor (power of two) * @max_idle_ns: max idle time permitted by the clocksource (nsecs) * @maxadj:maximum adjustment value to mult (~11%) + * @max_cycles:maximum safe cycle value which won't overflow on multiplication * @flags: flags describing special properties * @archdata: arch-specific data * @suspend: suspend function for the clocksource, if necessary @@ -76,7 +77,7 @@ struct clocksource { #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA struct arch_clocksource_data archdata; #endif - + u64 max_cycles; const char *name; struct list_head list; int rating; @@ -189,7 +190,7 @@ 
extern struct clocksource * __init clocksource_default_clock(void); extern void clocksource_mark_unstable(struct clocksource *cs); extern u64 -clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask); +clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 *max_cycles); extern void clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index ace9576..fc2a9de 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -469,11 +469,13 @@ static u32 clocksource_max_adjustment(struct clocksource *cs) * @shift: cycle to nanosecond divisor (power of two) * @maxadj:maximum adjustment value to mult (~11%) * @mask: bitmask for two's complement subtraction of non 64 bit counters + * @max_cyc: maximum cycle value before potential overflow (does not include + * any safety margin) * * NOTE: This function includes a safety margin of 50%, so that bad clock values * can be detected. 
*/ -u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask) +u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 *max_cyc) { u64 max_nsecs, max_cycles; @@ -493,6 +495,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask) max_cycles = min(max_cycles, mask); max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift); + /* return the max_cycles value as well if requested */ + if (max_cyc) + *max_cyc = max_cycles; + /* Return 50% of the actual maximum, so we can detect bad values */ max_nsecs >>= 1; @@ -500,17 +506,15 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask) } /** - * clocksource_max_deferment - Returns max time the clocksource should be deferred - * @cs: Pointer to clocksource + * clocksource_update_max_deferment - Updates the clocksource max_idle_ns & max_cycles + * @cs: Pointer to clocksource to be updated * */ -static u64 clocksource_max_deferment(struct clocksource *cs) +static inline void clocksource_update_max_deferment(struct clocksource *cs) { - u64 max_nsecs; - - max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj, - cs->mask); - return max_nsecs; + cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift, + cs->maxadj, cs->mask, + &cs->max_cycles); } #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET @@ -684,7 +688,7 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq) cs->maxadj = clocksource_max_adjustment(cs); } -
[tip:timers/core] clocksource: Add some debug info about clocksources being registered
Commit-ID: 8cc8c525ad4e7b581cacf84119e1a28dcb4044db Gitweb: http://git.kernel.org/tip/8cc8c525ad4e7b581cacf84119e1a28dcb4044db Author: John Stultz john.stu...@linaro.org AuthorDate: Wed, 11 Mar 2015 21:16:39 -0700 Committer: Ingo Molnar mi...@kernel.org CommitDate: Fri, 13 Mar 2015 08:07:07 +0100 clocksource: Add some debug info about clocksources being registered Print the mask, max_cycles, and max_idle_ns values for clocksources being registered. Signed-off-by: John Stultz john.stu...@linaro.org Cc: Dave Jones da...@codemonkey.org.uk Cc: Linus Torvalds torva...@linux-foundation.org Cc: Peter Zijlstra pet...@infradead.org Cc: Prarit Bhargava pra...@redhat.com Cc: Richard Cochran richardcoch...@gmail.com Cc: Stephen Boyd sb...@codeaurora.org Cc: Thomas Gleixner t...@linutronix.de Link: http://lkml.kernel.org/r/1426133800-29329-12-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/time/clocksource.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 5cdf17e..1977eba 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -703,6 +703,9 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq) cs->name); clocksource_update_max_deferment(cs); + + pr_info("clocksource %s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n", + cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns); } EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:timers/core] clocksource: Mostly kill clocksource_register()
Commit-ID: f8935983f110505daa38e8d36ee406807f83a069 Gitweb: http://git.kernel.org/tip/f8935983f110505daa38e8d36ee406807f83a069 Author: John Stultz john.stu...@linaro.org AuthorDate: Wed, 11 Mar 2015 21:16:37 -0700 Committer: Ingo Molnar mi...@kernel.org CommitDate: Fri, 13 Mar 2015 08:07:06 +0100 clocksource: Mostly kill clocksource_register() A long running project has been to clean up remaining uses of clocksource_register(), replacing it with the simpler clocksource_register_khz/hz() functions. However, there are a few cases where we need to self-define our mult/shift values, so switch the function to a more obviously internal __clocksource_register() name, and consolidate much of the internal logic so we don't have duplication. Signed-off-by: John Stultz john.stu...@linaro.org Cc: Dave Jones da...@codemonkey.org.uk Cc: David S. Miller da...@davemloft.net Cc: Linus Torvalds torva...@linux-foundation.org Cc: Martin Schwidefsky schwidef...@de.ibm.com Cc: Peter Zijlstra pet...@infradead.org Cc: Prarit Bhargava pra...@redhat.com Cc: Richard Cochran richardcoch...@gmail.com Cc: Stephen Boyd sb...@codeaurora.org Cc: Thomas Gleixner t...@linutronix.de Link: http://lkml.kernel.org/r/1426133800-29329-10-git-send-email-john.stu...@linaro.org [ Minor cleanups. 
] Signed-off-by: Ingo Molnar mi...@kernel.org --- arch/s390/kernel/time.c | 2 +- arch/sparc/kernel/time_32.c | 2 +- include/linux/clocksource.h | 10 +- kernel/time/clocksource.c | 81 +++-- kernel/time/jiffies.c | 4 +-- 5 files changed, 47 insertions(+), 52 deletions(-) diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c index 20660dd..6c273cd 100644 --- a/arch/s390/kernel/time.c +++ b/arch/s390/kernel/time.c @@ -283,7 +283,7 @@ void __init time_init(void) if (register_external_irq(EXT_IRQ_TIMING_ALERT, timing_alert_interrupt)) panic("Couldn't request external interrupt 0x1406"); - if (clocksource_register(clocksource_tod) != 0) + if (__clocksource_register(clocksource_tod) != 0) panic("Could not register TOD clock source"); /* Enable TOD clock interrupts on the boot cpu. */ diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c index 2f80d23..a31c0c8 100644 --- a/arch/sparc/kernel/time_32.c +++ b/arch/sparc/kernel/time_32.c @@ -191,7 +191,7 @@ static __init int setup_timer_cs(void) timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate, timer_cs.shift); - return clocksource_register(timer_cs); + return __clocksource_register(timer_cs); } #ifdef CONFIG_SMP diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 16d048c..bd98eaa 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -179,7 +179,6 @@ static inline s64 clocksource_cyc2ns(cycle_t cycles, u32 mult, u32 shift) } -extern int clocksource_register(struct clocksource*); extern int clocksource_unregister(struct clocksource*); extern void clocksource_touch_watchdog(void); extern struct clocksource* clocksource_get_next(void); @@ -203,6 +202,15 @@ __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq); extern void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq); +/* + * Don't call this unless you are a default clocksource + * (AKA: jiffies) and absolutely have to.
+ */ +static inline int __clocksource_register(struct clocksource *cs) +{ + return __clocksource_register_scale(cs, 1, 0); +} + static inline int clocksource_register_hz(struct clocksource *cs, u32 hz) { return __clocksource_register_scale(cs, 1, hz); diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index c4cc04b..5cdf17e 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -656,38 +656,52 @@ static void clocksource_enqueue(struct clocksource *cs) void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq) { u64 sec; + /* -* Calc the maximum number of seconds which we can run before -* wrapping around. For clocksources which have a mask 32bit -* we need to limit the max sleep time to have a good -* conversion precision. 10 minutes is still a reasonable -* amount. That results in a shift value of 24 for a -* clocksource with mask >= 40bit and f >= 4GHz. That maps to -* ~ 0.06ppm granularity for NTP. +* Default clocksources are *special* and self-define their mult/shift. +* But, you're not special, so you should specify a freq value. */ - sec = cs->mask; - do_div(sec, freq); - do_div(sec, scale); - if (!sec) - sec = 1; - else if (sec > 600 && cs->mask > UINT_MAX) - sec = 600; - - clocks_calc_mult_shift(&cs->mult, &cs->shift, freq, - NSEC_PER_SEC /
[tip:timers/core] clocksource: Rename __clocksource_updatefreq_*() to __clocksource_update_freq_*()
Commit-ID: fba9e07208c0f9d92d9f73761c99c8612039da44 Gitweb: http://git.kernel.org/tip/fba9e07208c0f9d92d9f73761c99c8612039da44 Author: John Stultz john.stu...@linaro.org AuthorDate: Wed, 11 Mar 2015 21:16:40 -0700 Committer: Ingo Molnar mi...@kernel.org CommitDate: Fri, 13 Mar 2015 08:07:08 +0100 clocksource: Rename __clocksource_updatefreq_*() to __clocksource_update_freq_*() Ingo requested this function be renamed to improve readability, so I've renamed __clocksource_updatefreq_scale() as well as the __clocksource_updatefreq_hz/khz() functions to avoid squishedtogethernames. This touches some of the sh clocksources, which I've not tested. The arch/arm/plat-omap change is just a comment change for consistency. Signed-off-by: John Stultz john.stu...@linaro.org Cc: Daniel Lezcano daniel.lezc...@linaro.org Cc: Dave Jones da...@codemonkey.org.uk Cc: Linus Torvalds torva...@linux-foundation.org Cc: Peter Zijlstra pet...@infradead.org Cc: Prarit Bhargava pra...@redhat.com Cc: Richard Cochran richardcoch...@gmail.com Cc: Stephen Boyd sb...@codeaurora.org Cc: Thomas Gleixner t...@linutronix.de Link: http://lkml.kernel.org/r/1426133800-29329-13-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar mi...@kernel.org --- arch/arm/plat-omap/counter_32k.c | 2 +- drivers/clocksource/em_sti.c | 2 +- drivers/clocksource/sh_cmt.c | 2 +- drivers/clocksource/sh_tmu.c | 2 +- include/linux/clocksource.h | 10 +- kernel/time/clocksource.c| 11 ++- 6 files changed, 15 insertions(+), 14 deletions(-) diff --git a/arch/arm/plat-omap/counter_32k.c b/arch/arm/plat-omap/counter_32k.c index 61b4d70..43cf745 100644 --- a/arch/arm/plat-omap/counter_32k.c +++ b/arch/arm/plat-omap/counter_32k.c @@ -103,7 +103,7 @@ int __init omap_init_clocksource_32k(void __iomem *vbase) /* * 12 rough estimate from the calculations in -* __clocksource_updatefreq_scale. +* __clocksource_update_freq_scale. 
*/ clocks_calc_mult_shift(&persistent_mult, &persistent_shift, 32768, NSEC_PER_SEC, 12); diff --git a/drivers/clocksource/em_sti.c b/drivers/clocksource/em_sti.c index d0a7bd6..dc3c6ee 100644 --- a/drivers/clocksource/em_sti.c +++ b/drivers/clocksource/em_sti.c @@ -210,7 +210,7 @@ static int em_sti_clocksource_enable(struct clocksource *cs) ret = em_sti_start(p, USER_CLOCKSOURCE); if (!ret) - __clocksource_updatefreq_hz(cs, p->rate); + __clocksource_update_freq_hz(cs, p->rate); return ret; } diff --git a/drivers/clocksource/sh_cmt.c b/drivers/clocksource/sh_cmt.c index 2bd13b5..b8ff3c6 100644 --- a/drivers/clocksource/sh_cmt.c +++ b/drivers/clocksource/sh_cmt.c @@ -641,7 +641,7 @@ static int sh_cmt_clocksource_enable(struct clocksource *cs) ret = sh_cmt_start(ch, FLAG_CLOCKSOURCE); if (!ret) { - __clocksource_updatefreq_hz(cs, ch->rate); + __clocksource_update_freq_hz(cs, ch->rate); ch->cs_enabled = true; } return ret; } diff --git a/drivers/clocksource/sh_tmu.c b/drivers/clocksource/sh_tmu.c index f150ca82..b6b8fa3 100644 --- a/drivers/clocksource/sh_tmu.c +++ b/drivers/clocksource/sh_tmu.c @@ -272,7 +272,7 @@ static int sh_tmu_clocksource_enable(struct clocksource *cs) ret = sh_tmu_enable(ch); if (!ret) { - __clocksource_updatefreq_hz(cs, ch->rate); + __clocksource_update_freq_hz(cs, ch->rate); ch->cs_enabled = true; } diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index bd98eaa..1355098 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -200,7 +200,7 @@ clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec); extern int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq); extern void -__clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq); +__clocksource_update_freq_scale(struct clocksource *cs, u32 scale, u32 freq); /* * Don't call this unless you are a default clocksource @@ -221,14 +221,14 @@ static inline int clocksource_register_khz(struct clocksource *cs,
u32 khz) return __clocksource_register_scale(cs, 1000, khz); } -static inline void __clocksource_updatefreq_hz(struct clocksource *cs, u32 hz) +static inline void __clocksource_update_freq_hz(struct clocksource *cs, u32 hz) { - __clocksource_updatefreq_scale(cs, 1, hz); + __clocksource_update_freq_scale(cs, 1, hz); } -static inline void __clocksource_updatefreq_khz(struct clocksource *cs, u32 khz) +static inline void __clocksource_update_freq_khz(struct clocksource *cs, u32 khz) { - __clocksource_updatefreq_scale(cs, 1000, khz); + __clocksource_update_freq_scale(cs, 1000, khz); } diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 1977eba..c3be3c7 100644
[tip:timers/core] clocksource: Simplify the clocks_calc_max_nsecs() logic
Commit-ID: 6086e346fdea1ae64d974c94c1acacc2605567ae Gitweb: http://git.kernel.org/tip/6086e346fdea1ae64d974c94c1acacc2605567ae Author: John Stultz john.stu...@linaro.org AuthorDate: Wed, 11 Mar 2015 21:16:29 -0700 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 12 Mar 2015 10:16:38 +0100 clocksource: Simplify the clocks_calc_max_nsecs() logic The previous clocks_calc_max_nsecs() code had some unnecessarily complex bit logic to find the max interval that could cause multiplication overflows. Since this is not in the hot path, just do the divide to make it easier to read. The previous implementation also had a subtle issue that it avoided overflows with signed 64-bit values, whereas the intervals are always unsigned. This resulted in overly conservative intervals, which other safety margins were then added to, reducing the intended interval length. Signed-off-by: John Stultz john.stu...@linaro.org Cc: Dave Jones da...@codemonkey.org.uk Cc: Linus Torvalds torva...@linux-foundation.org Cc: Peter Zijlstra pet...@infradead.org Cc: Prarit Bhargava pra...@redhat.com Cc: Richard Cochran richardcoch...@gmail.com Cc: Stephen Boyd sb...@codeaurora.org Cc: Thomas Gleixner t...@linutronix.de Link: http://lkml.kernel.org/r/1426133800-29329-2-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/time/clocksource.c | 15 +++ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 4892352..2148f41 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -476,19 +476,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask) /* * Calculate the maximum number of cycles that we can pass to the -* cyc2ns function without overflowing a 64-bit signed result. The -* maximum number of cycles is equal to ULLONG_MAX/(mult+maxadj) -* which is equivalent to the below.
-* max_cycles < (2^63)/(mult + maxadj) -* max_cycles < 2^(log2((2^63)/(mult + maxadj))) -* max_cycles < 2^(log2(2^63) - log2(mult + maxadj)) -* max_cycles < 2^(63 - log2(mult + maxadj)) -* max_cycles < 1 << (63 - log2(mult + maxadj)) -* Please note that we add 1 to the result of the log2 to account for -* any rounding errors, ensure the above inequality is satisfied and -* no overflow will occur. +* cyc2ns() function without overflowing a 64-bit result. */ - max_cycles = 1ULL << (63 - (ilog2(mult + maxadj) + 1)); + max_cycles = ULLONG_MAX; + do_div(max_cycles, mult+maxadj); /* * The actual maximum number of cycles we can defer the clocksource is
[tip:timers/urgent] ntp: Fixup adjtimex freq validation on 32-bit systems
Commit-ID: 29183a70b0b828500816bd794b3fe192fce89f73 Gitweb: http://git.kernel.org/tip/29183a70b0b828500816bd794b3fe192fce89f73 Author: John Stultz AuthorDate: Mon, 9 Feb 2015 23:30:36 -0800 Committer: Ingo Molnar CommitDate: Wed, 18 Feb 2015 14:50:10 +0100 ntp: Fixup adjtimex freq validation on 32-bit systems Additional validation of adjtimex freq values to avoid potential multiplication overflows was added in commit 5e5aeb4367b ("time: adjtimex: Validate the ADJ_FREQUENCY values") Unfortunately the patch used LONG_MAX/MIN instead of LLONG_MAX/MIN, which was fine on 64-bit systems, but being much smaller on 32-bit systems caused false positives resulting in most direct frequency adjustments to fail w/ EINVAL. ntpd only does direct frequency adjustments at startup, so the issue was not as easily observed there, but other time sync applications like ptpd and chrony were more affected by the bug. See bugs: https://bugzilla.kernel.org/show_bug.cgi?id=92481 https://bugzilla.redhat.com/show_bug.cgi?id=1188074 This patch changes the checks to use LLONG_MAX for clarity, and additionally the checks are disabled on 32-bit systems since LLONG_MAX/PPM_SCALE is always larger than the 32-bit long freq value, so multiplication overflows aren't possible there. Reported-by: Josh Boyer Reported-by: George Joseph Tested-by: George Joseph Signed-off-by: John Stultz Signed-off-by: Peter Zijlstra (Intel) Cc: # v3.19+ Cc: Linus Torvalds Cc: Sasha Levin Link: http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stu...@linaro.org [ Prettified the changelog and the comments a bit.
] Signed-off-by: Ingo Molnar --- kernel/time/ntp.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 4b585e0..0f60b08 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -633,10 +633,14 @@ int ntp_validate_timex(struct timex *txc) if ((txc->modes & ADJ_SETOFFSET) && (!capable(CAP_SYS_TIME))) return -EPERM; - if (txc->modes & ADJ_FREQUENCY) { - if (LONG_MIN / PPM_SCALE > txc->freq) + /* +* Check for potential multiplication overflows that can +* only happen on 64-bit systems: +*/ + if ((txc->modes & ADJ_FREQUENCY) && (BITS_PER_LONG == 64)) { + if (LLONG_MIN / PPM_SCALE > txc->freq) return -EINVAL; - if (LLONG_MAX / PPM_SCALE < txc->freq) + if (LLONG_MAX / PPM_SCALE < txc->freq) return -EINVAL; }
[tip:timers/urgent] hrtimer: Fix incorrect tai offset calculation for non high-res timer systems
Commit-ID:  2d926c15d629a13914ce3e5f26354f6a0ac99e70
Gitweb:     http://git.kernel.org/tip/2d926c15d629a13914ce3e5f26354f6a0ac99e70
Author:     John Stultz
AuthorDate: Wed, 4 Feb 2015 16:45:26 -0800
Committer:  Ingo Molnar
CommitDate: Thu, 5 Feb 2015 08:39:37 +0100

hrtimer: Fix incorrect tai offset calculation for non high-res timer systems

I noticed some CLOCK_TAI timer test failures on one of my
less-frequently used configurations. And after digging in I found in
76f4108892d9 (Cleanup hrtimer accessors to the timekeeping state), the
hrtimer_get_softirq_time() tai offset calculation was incorrectly
rewritten, as the tai offset we return should be from CLOCK_MONOTONIC,
and not CLOCK_REALTIME.

This results in CLOCK_TAI timers expiring early on non-highres capable
machines.

This patch fixes the issue, calculating the tai time properly from the
monotonic base.

Signed-off-by: John Stultz
Cc: Thomas Gleixner
Cc: stable # 3.17+
Link: http://lkml.kernel.org/r/1423097126-10236-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar
---
 kernel/time/hrtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 37e50aa..d8c724c 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -122,7 +122,7 @@ static void hrtimer_get_softirq_time(struct hrtimer_cpu_base *base)
 	mono = ktime_get_update_offsets_tick(&off_real, &off_boot, &off_tai);
 	boot = ktime_add(mono, off_boot);
 	xtim = ktime_add(mono, off_real);
-	tai = ktime_add(xtim, off_tai);
+	tai = ktime_add(mono, off_tai);
 
 	base->clock_base[HRTIMER_BASE_REALTIME].softirq_time = xtim;
 	base->clock_base[HRTIMER_BASE_MONOTONIC].softirq_time = mono;
[tip:timers/core] time: Fix sign bug in NTP mult overflow warning
Commit-ID:  cb2aa63469f81426c7406227be70b628b42f7a05
Gitweb:     http://git.kernel.org/tip/cb2aa63469f81426c7406227be70b628b42f7a05
Author:     John Stultz
AuthorDate: Mon, 24 Nov 2014 20:35:45 -0800
Committer:  Ingo Molnar
CommitDate: Tue, 25 Nov 2014 07:18:34 +0100

time: Fix sign bug in NTP mult overflow warning

In commit 6067dc5a8c2b ("time: Avoid possible NTP adjustment mult
overflow") a new check was added to watch for adjustments that could
cause a mult overflow.

Unfortunately the check compares a signed with unsigned value and
ignored the case where the adjustment was negative, which causes
spurious warn-ons on some systems (and seems like it would result in
problematic time adjustments there as well, due to the early return).

Thus this patch adds a check to make sure the adjustment is positive
before we check for an overflow, and resolves the issue in my testing.

Reported-by: Fengguang Wu
Debugged-by: pang.xunlei
Signed-off-by: John Stultz
Link: http://lkml.kernel.org/r/1416890145-30048-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar
---
 kernel/time/timekeeping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 29a7d67..2dc0646 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1330,7 +1330,7 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 	 *
 	 * XXX - TODO: Doc ntp_error calculation.
 	 */
-	if (tk->tkr.mult + mult_adj < mult_adj) {
+	if ((mult_adj > 0) && (tk->tkr.mult + mult_adj < mult_adj)) {
 		/* NTP adjustment caused clocksource mult overflow */
 		WARN_ON_ONCE(1);
 		return;
[tip:timers/core] timekeeping: Fixup typo in update_vsyscall_old definition
Commit-ID:  953dec21aed4038464fec02f96a2f1b8701a5bce
Gitweb:     http://git.kernel.org/tip/953dec21aed4038464fec02f96a2f1b8701a5bce
Author:     John Stultz
AuthorDate: Fri, 25 Jul 2014 21:37:19 -0700
Committer:  Thomas Gleixner
CommitDate: Wed, 30 Jul 2014 09:26:25 +0200

timekeeping: Fixup typo in update_vsyscall_old definition

In commit 4a0e637738f0 ("clocksource: Get rid of cycle_last"),
currently in the -tip tree, there was a small typo where cycles_t was
used instead of cycle_t. This broke ppc64 builds.

Fix this by using the proper cycle_t type for this usage, in both the
definition and the ia64 implementation.

Now, having both cycle_t and cycles_t types seems like a very bad idea
just asking for these sorts of issues. But that will be a cleanup for
another day.

Reported-by: Stephen Rothwell
Signed-off-by: John Stultz
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1406349439-11785-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner
---
 arch/ia64/kernel/time.c             | 2 +-
 include/linux/timekeeper_internal.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 11dc42d..3e71ef8 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -441,7 +441,7 @@ void update_vsyscall_tz(void)
 }
 
 void update_vsyscall_old(struct timespec *wall, struct timespec *wtm,
-			 struct clocksource *c, u32 mult, cycles_t cycle_last)
+			 struct clocksource *c, u32 mult, cycle_t cycle_last)
 {
 	write_seqcount_begin(&fsyscall_gtod_data.seq);
diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index e9660e5..95640dc 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -113,7 +113,7 @@ extern void update_vsyscall_tz(void);
 extern void update_vsyscall_old(struct timespec *ts, struct timespec *wtm,
 				struct clocksource *c, u32 mult,
-				cycles_t cycle_last);
+				cycle_t cycle_last);
 extern void update_vsyscall_tz(void);
 #else
[tip:timers/urgent] alarmtimer: Fix bug where relative alarm timers were treated as absolute
Commit-ID:  16927776ae757d0d132bdbfabbfe2c498342bd59
Gitweb:     http://git.kernel.org/tip/16927776ae757d0d132bdbfabbfe2c498342bd59
Author:     John Stultz
AuthorDate: Mon, 7 Jul 2014 14:06:11 -0700
Committer:  Thomas Gleixner
CommitDate: Tue, 8 Jul 2014 10:49:36 +0200

alarmtimer: Fix bug where relative alarm timers were treated as absolute

Sharvil noticed that with the posix timer_settime interface, using the
CLOCK_REALTIME_ALARM or CLOCK_BOOTTIME_ALARM clockid, if the user
tried to specify a relative time timer, it would incorrectly be
treated as absolute regardless of the state of the flags argument.

This patch corrects this, properly checking the absolute/relative
flag, and adds further error checking that no invalid flag bits are
set.

Reported-by: Sharvil Nanavati
Signed-off-by: John Stultz
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Prarit Bhargava
Cc: Sharvil Nanavati
Cc: stable #3.0+
Link: http://lkml.kernel.org/r/1404767171-6902-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner
---
 kernel/time/alarmtimer.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 88c9c65..fe75444 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -585,9 +585,14 @@ static int alarm_timer_set(struct k_itimer *timr, int flags,
 				struct itimerspec *new_setting,
 				struct itimerspec *old_setting)
 {
+	ktime_t exp;
+
 	if (!rtcdev)
 		return -ENOTSUPP;
 
+	if (flags & ~TIMER_ABSTIME)
+		return -EINVAL;
+
 	if (old_setting)
 		alarm_timer_get(timr, old_setting);
@@ -597,8 +602,16 @@ static int alarm_timer_set(struct k_itimer *timr, int flags,
 
 	/* start the timer */
 	timr->it.alarm.interval = timespec_to_ktime(new_setting->it_interval);
-	alarm_start(&timr->it.alarm.alarmtimer,
-		    timespec_to_ktime(new_setting->it_value));
+	exp = timespec_to_ktime(new_setting->it_value);
+	/* Convert (if necessary) to absolute time */
+	if (flags != TIMER_ABSTIME) {
+		ktime_t now;
+
+		now = alarm_bases[timr->it.alarm.alarmtimer.type].gettime();
+		exp = ktime_add(now, exp);
+	}
+
+	alarm_start(&timr->it.alarm.alarmtimer, exp);
 	return 0;
 }
@@ -730,6 +743,9 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,
 	if (!alarmtimer_get_rtcdev())
 		return -ENOTSUPP;
 
+	if (flags & ~TIMER_ABSTIME)
+		return -EINVAL;
+
 	if (!capable(CAP_WAKE_ALARM))
 		return -EPERM;
[tip:timers/urgent] time: Revert to calling clock_was_set_delayed() while in irq context
Commit-ID:  cab5e127eef040399902caa8e1510795583fa03a
Gitweb:     http://git.kernel.org/tip/cab5e127eef040399902caa8e1510795583fa03a
Author:     John Stultz
AuthorDate: Thu, 27 Mar 2014 16:30:49 -0700
Committer:  Ingo Molnar
CommitDate: Fri, 28 Mar 2014 08:07:07 +0100

time: Revert to calling clock_was_set_delayed() while in irq context

In commit 47a1b796306356f35 ("tick/timekeeping: Call update_wall_time
outside the jiffies lock"), we moved to calling clock_was_set() due to
the fact that we were no longer holding the timekeeping or jiffies
lock.

However, there is still the problem that clock_was_set() triggers an
IPI, which cannot be done from the timer's hard irq context, and will
generate WARN_ON warnings.

Apparently in my earlier testing, I'm guessing I didn't bump the dmesg
log level, so I somehow missed the WARN_ONs.

Thus we need to revert back to calling clock_was_set_delayed().

Signed-off-by: John Stultz
Cc: Linus Torvalds
Link: http://lkml.kernel.org/r/1395963049-11923-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar
---
 kernel/time/timekeeping.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 0aa4ce8..5b40279 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1435,7 +1435,8 @@ void update_wall_time(void)
 out:
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 	if (clock_set)
-		clock_was_set();
+		/* Have to call _delayed version, since in irq context*/
+		clock_was_set_delayed();
 }
 
 /**
[tip:core/urgent] seqlock: Use raw_ prefix instead of _no_lockdep
Commit-ID:  0c3351d451ae2fa438d5d1ed719fc43354fbffbb
Gitweb:     http://git.kernel.org/tip/0c3351d451ae2fa438d5d1ed719fc43354fbffbb
Author:     John Stultz
AuthorDate: Thu, 2 Jan 2014 15:11:13 -0800
Committer:  Ingo Molnar
CommitDate: Sun, 12 Jan 2014 10:13:59 +0100

seqlock: Use raw_ prefix instead of _no_lockdep

Linus disliked the _no_lockdep() naming, so instead use the
more-consistent raw_* prefix to the non-lockdep enabled seqcount
methods.

This also adds raw_ methods for the write operations as well, which
will be utilized in a following patch.

Acked-by: Linus Torvalds
Reviewed-by: Stephen Boyd
Signed-off-by: John Stultz
Signed-off-by: Peter Zijlstra
Cc: Krzysztof Hałasa
Cc: Uwe Kleine-König
Cc: Willy Tarreau
Link: http://lkml.kernel.org/r/1388704274-5278-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar
---
 arch/x86/vdso/vclock_gettime.c |  8 ++++----
 include/linux/seqlock.h        | 27 +++++++++++++++++++--------
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 2ada505..eb5d7a5 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -178,7 +178,7 @@ notrace static int __always_inline do_realtime(struct timespec *ts)
 	ts->tv_nsec = 0;
 	do {
-		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+		seq = raw_read_seqcount_begin(&gtod->seq);
 		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->wall_time_sec;
 		ns = gtod->wall_time_snsec;
@@ -198,7 +198,7 @@ notrace static int do_monotonic(struct timespec *ts)
 	ts->tv_nsec = 0;
 	do {
-		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+		seq = raw_read_seqcount_begin(&gtod->seq);
 		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->monotonic_time_sec;
 		ns = gtod->monotonic_time_snsec;
@@ -214,7 +214,7 @@ notrace static int do_realtime_coarse(struct timespec *ts)
 {
 	unsigned long seq;
 	do {
-		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+		seq = raw_read_seqcount_begin(&gtod->seq);
 		ts->tv_sec = gtod->wall_time_coarse.tv_sec;
 		ts->tv_nsec = gtod->wall_time_coarse.tv_nsec;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
@@ -225,7 +225,7 @@ notrace static int do_monotonic_coarse(struct timespec *ts)
 {
 	unsigned long seq;
 	do {
-		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+		seq = raw_read_seqcount_begin(&gtod->seq);
 		ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
 		ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index cf87a24..535f158 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -117,15 +117,15 @@ repeat:
 }
 
 /**
- * read_seqcount_begin_no_lockdep - start seq-read critical section w/o lockdep
+ * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
  * @s: pointer to seqcount_t
  * Returns: count to be passed to read_seqcount_retry
  *
- * read_seqcount_begin_no_lockdep opens a read critical section of the given
+ * raw_read_seqcount_begin opens a read critical section of the given
  * seqcount, but without any lockdep checking. Validity of the critical
  * section is tested by checking read_seqcount_retry function.
  */
-static inline unsigned read_seqcount_begin_no_lockdep(const seqcount_t *s)
+static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 {
 	unsigned ret = __read_seqcount_begin(s);
 	smp_rmb();
@@ -144,7 +144,7 @@ static inline unsigned read_seqcount_begin_no_lockdep(const seqcount_t *s)
 static inline unsigned read_seqcount_begin(const seqcount_t *s)
 {
 	seqcount_lockdep_reader_access(s);
-	return read_seqcount_begin_no_lockdep(s);
+	return raw_read_seqcount_begin(s);
 }
 
 /**
@@ -206,14 +206,26 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 }
 
+
+static inline void raw_write_seqcount_begin(seqcount_t *s)
+{
+	s->sequence++;
+	smp_wmb();
+}
+
+static inline void raw_write_seqcount_end(seqcount_t *s)
+{
+	smp_wmb();
+	s->sequence++;
+}
+
 /*
  * Sequence counter only version assumes that callers are using their
  * own mutexing.
 */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
-	s->sequence++;
-	smp_wmb();
+	raw_write_seqcount_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
@@ -225,8 +237,7 @@ static inline void write_seqcount_begin(seqcount_t *s)
 static inline void write_seqcount_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, 1, _RET_IP_);
-	smp_wmb();
-	s->sequence++;
+	raw_write_seqcount_end(s);
 }
 
 /**
[tip:core/urgent] sched_clock: Disable seqlock lockdep usage in sched_clock()
Commit-ID:  7a06c41cbec33c6dbe7eec575c61986122617408
Gitweb:     http://git.kernel.org/tip/7a06c41cbec33c6dbe7eec575c61986122617408
Author:     John Stultz
AuthorDate: Thu, 2 Jan 2014 15:11:14 -0800
Committer:  Ingo Molnar
CommitDate: Sun, 12 Jan 2014 10:14:00 +0100

sched_clock: Disable seqlock lockdep usage in sched_clock()

Unfortunately the seqlock lockdep enablement can't be used in
sched_clock(), since the lockdep infrastructure eventually calls into
sched_clock(), which causes a deadlock.

Thus, this patch changes all generic sched_clock() usage to use the
raw_* methods.

Acked-by: Linus Torvalds
Reviewed-by: Stephen Boyd
Reported-by: Krzysztof Hałasa
Signed-off-by: John Stultz
Cc: Uwe Kleine-König
Cc: Willy Tarreau
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1388704274-5278-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar
---
 kernel/time/sched_clock.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 68b7993..0abb364 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -74,7 +74,7 @@ unsigned long long notrace sched_clock(void)
 		return cd.epoch_ns;
 
 	do {
-		seq = read_seqcount_begin(&cd.seq);
+		seq = raw_read_seqcount_begin(&cd.seq);
 		epoch_cyc = cd.epoch_cyc;
 		epoch_ns = cd.epoch_ns;
 	} while (read_seqcount_retry(&cd.seq, seq));
@@ -99,10 +99,10 @@ static void notrace update_sched_clock(void)
 			  cd.mult, cd.shift);
 
 	raw_local_irq_save(flags);
-	write_seqcount_begin(&cd.seq);
+	raw_write_seqcount_begin(&cd.seq);
 	cd.epoch_ns = ns;
 	cd.epoch_cyc = cyc;
-	write_seqcount_end(&cd.seq);
+	raw_write_seqcount_end(&cd.seq);
 	raw_local_irq_restore(flags);
 }
[tip:core/urgent] sched_clock: Disable seqlock lockdep usage in sched_clock()
Commit-ID: 7a06c41cbec33c6dbe7eec575c61986122617408 Gitweb: http://git.kernel.org/tip/7a06c41cbec33c6dbe7eec575c61986122617408 Author: John Stultz john.stu...@linaro.org AuthorDate: Thu, 2 Jan 2014 15:11:14 -0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Sun, 12 Jan 2014 10:14:00 +0100 sched_clock: Disable seqlock lockdep usage in sched_clock() Unfortunately the seqlock lockdep enablement can't be used in sched_clock(), since the lockdep infrastructure eventually calls into sched_clock(), which causes a deadlock. Thus, this patch changes all generic sched_clock() usage to use the raw_* methods. Acked-by: Linus Torvalds torva...@linux-foundation.org Reviewed-by: Stephen Boyd sb...@codeaurora.org Reported-by: Krzysztof Hałasa khal...@piap.pl Signed-off-by: John Stultz john.stu...@linaro.org Cc: Uwe Kleine-König u.kleine-koe...@pengutronix.de Cc: Willy Tarreau w...@1wt.eu Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/1388704274-5278-2-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/time/sched_clock.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c index 68b7993..0abb364 100644 --- a/kernel/time/sched_clock.c +++ b/kernel/time/sched_clock.c @@ -74,7 +74,7 @@ unsigned long long notrace sched_clock(void) return cd.epoch_ns; do { - seq = read_seqcount_begin(cd.seq); + seq = raw_read_seqcount_begin(cd.seq); epoch_cyc = cd.epoch_cyc; epoch_ns = cd.epoch_ns; } while (read_seqcount_retry(cd.seq, seq)); @@ -99,10 +99,10 @@ static void notrace update_sched_clock(void) cd.mult, cd.shift); raw_local_irq_save(flags); - write_seqcount_begin(cd.seq); + raw_write_seqcount_begin(cd.seq); cd.epoch_ns = ns; cd.epoch_cyc = cyc; - write_seqcount_end(cd.seq); + raw_write_seqcount_end(cd.seq); raw_local_irq_restore(flags); } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to 
majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:core/urgent] seqlock: Use raw_ prefix instead of _no_lockdep
Commit-ID:  0c3351d451ae2fa438d5d1ed719fc43354fbffbb
Gitweb:     http://git.kernel.org/tip/0c3351d451ae2fa438d5d1ed719fc43354fbffbb
Author:     John Stultz <john.stu...@linaro.org>
AuthorDate: Thu, 2 Jan 2014 15:11:13 -0800
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Sun, 12 Jan 2014 10:13:59 +0100

seqlock: Use raw_ prefix instead of _no_lockdep

Linus disliked the _no_lockdep() naming, so instead use the
more-consistent raw_* prefix to the non-lockdep enabled seqcount
methods. This also adds raw_ methods for the write operations as well,
which will be utilized in a following patch.

Acked-by: Linus Torvalds <torva...@linux-foundation.org>
Reviewed-by: Stephen Boyd <sb...@codeaurora.org>
Signed-off-by: John Stultz <john.stu...@linaro.org>
Signed-off-by: Peter Zijlstra <pet...@infradead.org>
Cc: Krzysztof Hałasa <khal...@piap.pl>
Cc: Uwe Kleine-König <u.kleine-koe...@pengutronix.de>
Cc: Willy Tarreau <w...@1wt.eu>
Link: http://lkml.kernel.org/r/1388704274-5278-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/vdso/vclock_gettime.c |  8 ++++----
 include/linux/seqlock.h        | 27 +++++++++++++++++--------
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 2ada505..eb5d7a5 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -178,7 +178,7 @@ notrace static int __always_inline do_realtime(struct timespec *ts)
 	ts->tv_nsec = 0;
 	do {
-		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+		seq = raw_read_seqcount_begin(&gtod->seq);
 		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->wall_time_sec;
 		ns = gtod->wall_time_snsec;
@@ -198,7 +198,7 @@ notrace static int do_monotonic(struct timespec *ts)
 	ts->tv_nsec = 0;
 	do {
-		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+		seq = raw_read_seqcount_begin(&gtod->seq);
 		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->monotonic_time_sec;
 		ns = gtod->monotonic_time_snsec;
@@ -214,7 +214,7 @@ notrace static int do_realtime_coarse(struct timespec *ts)
 {
 	unsigned long seq;
 	do {
-		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+		seq = raw_read_seqcount_begin(&gtod->seq);
 		ts->tv_sec = gtod->wall_time_coarse.tv_sec;
 		ts->tv_nsec = gtod->wall_time_coarse.tv_nsec;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
@@ -225,7 +225,7 @@ notrace static int do_monotonic_coarse(struct timespec *ts)
 {
 	unsigned long seq;
 	do {
-		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+		seq = raw_read_seqcount_begin(&gtod->seq);
 		ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
 		ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index cf87a24..535f158 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -117,15 +117,15 @@ repeat:
 }
 
 /**
- * read_seqcount_begin_no_lockdep - start seq-read critical section w/o lockdep
+ * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
  * @s: pointer to seqcount_t
  * Returns: count to be passed to read_seqcount_retry
  *
- * read_seqcount_begin_no_lockdep opens a read critical section of the given
+ * raw_read_seqcount_begin opens a read critical section of the given
  * seqcount, but without any lockdep checking. Validity of the critical
  * section is tested by checking read_seqcount_retry function.
  */
-static inline unsigned read_seqcount_begin_no_lockdep(const seqcount_t *s)
+static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 {
 	unsigned ret = __read_seqcount_begin(s);
 	smp_rmb();
@@ -144,7 +144,7 @@ static inline unsigned read_seqcount_begin_no_lockdep(const seqcount_t *s)
 static inline unsigned read_seqcount_begin(const seqcount_t *s)
 {
 	seqcount_lockdep_reader_access(s);
-	return read_seqcount_begin_no_lockdep(s);
+	return raw_read_seqcount_begin(s);
 }
 
 /**
@@ -206,14 +206,26 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 }
 
+
+static inline void raw_write_seqcount_begin(seqcount_t *s)
+{
+	s->sequence++;
+	smp_wmb();
+}
+
+static inline void raw_write_seqcount_end(seqcount_t *s)
+{
+	smp_wmb();
+	s->sequence++;
+}
+
 /*
  * Sequence counter only version assumes that callers are using their
  * own mutexing.
  */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
-	s->sequence++;
-	smp_wmb();
+	raw_write_seqcount_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
@@ -225,8 +237,7 @@ static inline void
[tip:core/locking] ipv6: Fix possible ipv6 seqlock deadlock
Commit-ID:  5ac68e7c34a4797aa4ca9615e5a6603bda1abe9b
Gitweb:     http://git.kernel.org/tip/5ac68e7c34a4797aa4ca9615e5a6603bda1abe9b
Author:     John Stultz
AuthorDate: Mon, 7 Oct 2013 15:52:01 -0700
Committer:  Ingo Molnar
CommitDate: Wed, 6 Nov 2013 12:40:28 +0100

ipv6: Fix possible ipv6 seqlock deadlock

While enabling lockdep on seqlocks, I ran across the warning below
caused by the ipv6 stats being updated in both irq and non-irq context.
This patch changes from IP6_INC_STATS_BH to IP6_INC_STATS (suggested by
Eric Dumazet) to resolve this problem.

[ 11.120383] =================================
[ 11.121024] [ INFO: inconsistent lock state ]
[ 11.121663] 3.12.0-rc1+ #68 Not tainted
[ 11.19] ---------------------------------
[ 11.122867] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 11.123741] init/4483 [HC0[0]:SC1[3]:HE1:SE0] takes:
[ 11.124505] (&stats->syncp.seq#6){+.?...}, at: [c1ab80c2] ndisc_send_ns+0xe2/0x130
[ 11.125736] {SOFTIRQ-ON-W} state was registered at:
[ 11.126447]   [c10e0eb7] __lock_acquire+0x5c7/0x1af0
[ 11.127222]   [c10e2996] lock_acquire+0x96/0xd0
[ 11.127925]   [c1a9a2c3] write_seqcount_begin+0x33/0x40
[ 11.128766]   [c1a9aa03] ip6_dst_lookup_tail+0x3a3/0x460
[ 11.129582]   [c1a9e0ce] ip6_dst_lookup_flow+0x2e/0x80
[ 11.130014]   [c1ad18e0] ip6_datagram_connect+0x150/0x4e0
[ 11.130014]   [c1a4d0b5] inet_dgram_connect+0x25/0x70
[ 11.130014]   [c198dd61] SYSC_connect+0xa1/0xc0
[ 11.130014]   [c198f571] SyS_connect+0x11/0x20
[ 11.130014]   [c198fe6b] SyS_socketcall+0x12b/0x300
[ 11.130014]   [c1bbf880] syscall_call+0x7/0xb
[ 11.130014] irq event stamp: 1184
[ 11.130014] hardirqs last enabled at (1184): [c1086901] local_bh_enable+0x71/0x110
[ 11.130014] hardirqs last disabled at (1183): [c10868cd] local_bh_enable+0x3d/0x110
[ 11.130014] softirqs last enabled at (0): [c108014d] copy_process.part.42+0x45d/0x11a0
[ 11.130014] softirqs last disabled at (1147): [c1086e05] irq_exit+0xa5/0xb0
[ 11.130014]
[ 11.130014] other info that might help us debug this:
[ 11.130014]  Possible unsafe locking scenario:
[ 11.130014]
[ 11.130014]        CPU0
[ 11.130014]        ----
[ 11.130014]   lock(&stats->syncp.seq#6);
[ 11.130014]   <Interrupt>
[ 11.130014]     lock(&stats->syncp.seq#6);
[ 11.130014]
[ 11.130014]  *** DEADLOCK ***
[ 11.130014]
[ 11.130014] 3 locks held by init/4483:
[ 11.130014]  #0:  (rcu_read_lock){.+.+..}, at: [c109363c] SyS_setpriority+0x4c/0x620
[ 11.130014]  #1:  (((&ifa->dad_timer))){+.-...}, at: [c108c1c0] call_timer_fn+0x0/0xf0
[ 11.130014]  #2:  (rcu_read_lock){.+.+..}, at: [c1ab6494] ndisc_send_skb+0x54/0x5d0
[ 11.130014]
[ 11.130014] stack backtrace:
[ 11.130014] CPU: 0 PID: 4483 Comm: init Not tainted 3.12.0-rc1+ #68
[ 11.130014] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 11.130014] c55e5c10 c1bb0e71 c57128b0 c55e5c4c c1badf79 c1ec1123
[ 11.130014] c1ec1484 1183 0001 0003 0001
[ 11.130014] c1ec1484 0004 c5712dcc c55e5c84 c10de492 0004 c10755f2
[ 11.130014] Call Trace:
[ 11.130014]  [c1bb0e71] dump_stack+0x4b/0x66
[ 11.130014]  [c1badf79] print_usage_bug+0x1d3/0x1dd
[ 11.130014]  [c10de492] mark_lock+0x282/0x2f0
[ 11.130014]  [c10755f2] ? kvm_clock_read+0x22/0x30
[ 11.130014]  [c10dd8b0] ? check_usage_backwards+0x150/0x150
[ 11.130014]  [c10e0e74] __lock_acquire+0x584/0x1af0
[ 11.130014]  [c10b1baf] ? sched_clock_cpu+0xef/0x190
[ 11.130014]  [c10de58c] ? mark_held_locks+0x8c/0xf0
[ 11.130014]  [c10e2996] lock_acquire+0x96/0xd0
[ 11.130014]  [c1ab80c2] ? ndisc_send_ns+0xe2/0x130
[ 11.130014]  [c1ab66d3] ndisc_send_skb+0x293/0x5d0
[ 11.130014]  [c1ab80c2] ? ndisc_send_ns+0xe2/0x130
[ 11.130014]  [c1ab80c2] ndisc_send_ns+0xe2/0x130
[ 11.130014]  [c108cc32] ? mod_timer+0xf2/0x160
[ 11.130014]  [c1aa706e] ? addrconf_dad_timer+0xce/0x150
[ 11.130014]  [c1aa70aa] addrconf_dad_timer+0x10a/0x150
[ 11.130014]  [c1aa6fa0] ? addrconf_dad_completed+0x1c0/0x1c0
[ 11.130014]  [c108c233] call_timer_fn+0x73/0xf0
[ 11.130014]  [c108c1c0] ? __internal_add_timer+0xb0/0xb0
[ 11.130014]  [c1aa6fa0] ? addrconf_dad_completed+0x1c0/0x1c0
[ 11.130014]  [c108c5b1] run_timer_softirq+0x141/0x1e0
[ 11.130014]  [c1086b20] ? __do_softirq+0x70/0x1b0
[ 11.130014]  [c1086b70] __do_softirq+0xc0/0x1b0
[ 11.130014]  [c1086e05] irq_exit+0xa5/0xb0
[ 11.130014]  [c106cfd5] smp_apic_timer_interrupt+0x35/0x50
[ 11.130014]  [c1bbfbca] apic_timer_interrupt+0x32/0x38
[ 11.130014]  [c10936ed] ? SyS_setpriority+0xfd/0x620
[ 11.130014]  [c10e26c9] ? lock_release+0x9/0x240
[ 11.130014]  [c10936d7] ? SyS_setpriority+0xe7/0x620
[ 11.130014]  [c1bbee6d] ? _raw_read_unlock+0x1d/0x30
[ 11.130014]  [c1093701] SyS_setpriority+0x111/0x620
[ 11.130014]  [c109363c] ? SyS_setpriority+0x4c/0x620
[ 11.130014]  [c1bbf880] syscall_call+0x7/0xb

Signed-off-by: John Stultz
Acked-by: Eric Dumazet
Signed-off-by: Peter Zijlstra
Cc: Alexey Kuznetsov
Cc: "David S. Miller"
Cc: Hideaki YOSHIFUJI
Cc: James Morris
Cc: Mathieu Desnoyers
Cc: Patrick McHardy
Cc: Steven Rostedt
Cc: net...@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-5-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar
---
 net/ipv6/ip6_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[tip:core/locking] net: Explicitly initialize u64_stats_sync structures for lockdep
Commit-ID:  827da44c61419f29ae3be198c342e2147f1a10cb
Gitweb:     http://git.kernel.org/tip/827da44c61419f29ae3be198c342e2147f1a10cb
Author:     John Stultz
AuthorDate: Mon, 7 Oct 2013 15:51:58 -0700
Committer:  Ingo Molnar
CommitDate: Wed, 6 Nov 2013 12:40:25 +0100

net: Explicitly initialize u64_stats_sync structures for lockdep

In order to enable lockdep on seqcount/seqlock structures, we must
explicitly initialize any locks. The u64_stats_sync structure uses a
seqcount, and thus we need to introduce a u64_stats_init() function and
use it to initialize the structure.

This unfortunately adds a lot of fairly trivial initialization code to
a number of drivers. But the benefit of ensuring correctness makes this
worth while.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into
30-some separate patches, as I figured it would be better to get the
various maintainers thoughts on how to best merge this change along
with the seqcount lockdep enablement. Feedback would be appreciated!

Signed-off-by: John Stultz
Acked-by: Julian Anastasov
Signed-off-by: Peter Zijlstra
Cc: Alexey Kuznetsov
Cc: "David S. Miller"
Cc: Eric Dumazet
Cc: Hideaki YOSHIFUJI
Cc: James Morris
Cc: Jesse Gross
Cc: Mathieu Desnoyers
Cc: "Michael S. Tsirkin"
Cc: Mirko Lindner
Cc: Patrick McHardy
Cc: Roger Luethi
Cc: Rusty Russell
Cc: Simon Horman
Cc: Stephen Hemminger
Cc: Steven Rostedt
Cc: Thomas Petazzoni
Cc: Wensong Zhang
Cc: net...@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar
---
 drivers/net/dummy.c                            |  6
 drivers/net/ethernet/emulex/benet/be_main.c    |  4
 drivers/net/ethernet/intel/igb/igb_main.c      |  5
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  4
 drivers/net/ethernet/marvell/mvneta.c          |  3
 drivers/net/ethernet/marvell/sky2.c            |  3
 drivers/net/ethernet/neterion/vxge/vxge-main.c |  4
 drivers/net/ethernet/nvidia/forcedeth.c        |  2
 drivers/net/ethernet/realtek/8139too.c         |  3
 drivers/net/ethernet/tile/tilepro.c            |  2
 drivers/net/ethernet/via/via-rhine.c           |  3
 drivers/net/ifb.c                              |  5
 drivers/net/loopback.c                         |  6
 drivers/net/macvlan.c                          |  7
 drivers/net/nlmon.c                            |  8
 drivers/net/team/team.c                        |  6
 drivers/net/team/team_mode_loadbalance.c       |  9
 drivers/net/veth.c                             |  8
 drivers/net/virtio_net.c                       |  8
 drivers/net/vxlan.c                            |  8
 drivers/net/xen-netfront.c                     |  6
 include/linux/u64_stats_sync.h                 |  7
 net/8021q/vlan_dev.c                           |  9
 net/bridge/br_device.c                         |  7
 net/ipv4/af_inet.c                             | 14
 net/ipv4/ip_tunnel.c                           |  8
 net/ipv6/addrconf.c                            | 14
 net/ipv6/af_inet6.c                            | 14
 net/ipv6/ip6_gre.c                             | 15
 net/ipv6/ip6_tunnel.c                          |  7
 net/ipv6/sit.c                                 | 15
 net/netfilter/ipvs/ip_vs_ctl.c                 | 25
 net/openvswitch/datapath.c                     |  6
 net/openvswitch/vport.c                        |  8
 34 files changed, 253 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index b710c6b..bd8f84b 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -88,10 +88,16 @@ static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct net_device *dev)
 
 static int dummy_dev_init(struct net_device *dev)
 {
+	int i;
 	dev->dstats = alloc_percpu(struct pcpu_dstats);
 	if (!dev->dstats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct pcpu_dstats *dstats;
+		dstats = per_cpu_ptr(dev->dstats, i);
+		u64_stats_init(&dstats->syncp);
+	}
 	return 0;
 }

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 2c38cc4..edd7595 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2047,6 +2047,9 @@ static int be_tx_qs_create(struct be_adapter *adapter)
 		if (status)
 			return status;
 
+		u64_stats_init(&txo->stats.sync);
+		u64_stats_init(&txo->stats.sync_compl);
+
 		/* If num_evt_qs is less than num_tx_qs, then more than
[tip:core/locking] cpuset: Fix potential deadlock w/ set_mems_allowed
Commit-ID:  db751fe3ea6880ff5ac5abe60cb7b80deb5a4140
Gitweb:     http://git.kernel.org/tip/db751fe3ea6880ff5ac5abe60cb7b80deb5a4140
Author:     John Stultz
AuthorDate: Mon, 7 Oct 2013 15:52:00 -0700
Committer:  Ingo Molnar
CommitDate: Wed, 6 Nov 2013 12:40:27 +0100

cpuset: Fix potential deadlock w/ set_mems_allowed

After adding lockdep support to seqlock/seqcount structures, I started
seeing the following warning:

[    1.070907] ==============================================
[    1.072015] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[    1.073181] 3.11.0+ #67 Not tainted
[    1.073801] ----------------------------------------------
[    1.074882] kworker/u4:2/708 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[    1.076088]  (&p->mems_allowed_seq){+.+...}, at: [81187d7f] new_slab+0x5f/0x280
[    1.077572]
[    1.077572] and this task is already holding:
[    1.078593]  (&(&q->__queue_lock)->rlock){..-...}, at: [81339f03] blk_execute_rq_nowait+0x53/0xf0
[    1.080042] which would create a new lock dependency:
[    1.080042]  (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...}
[    1.080042]
[    1.080042] but this new dependency connects a SOFTIRQ-irq-safe lock:
[    1.080042]  (&(&q->__queue_lock)->rlock){..-...}
[    1.080042] ... which became SOFTIRQ-irq-safe at:
[    1.080042]   [810ec179] __lock_acquire+0x5b9/0x1db0
[    1.080042]   [810edfe5] lock_acquire+0x95/0x130
[    1.080042]   [818968a1] _raw_spin_lock+0x41/0x80
[    1.080042]   [81560c9e] scsi_device_unbusy+0x7e/0xd0
[    1.080042]   [8155a612] scsi_finish_command+0x32/0xf0
[    1.080042]   [81560e91] scsi_softirq_done+0xa1/0x130
[    1.080042]   [8133b0f3] blk_done_softirq+0x73/0x90
[    1.080042]   [81095dc0] __do_softirq+0x110/0x2f0
[    1.080042]   [81095fcd] run_ksoftirqd+0x2d/0x60
[    1.080042]   [810bc506] smpboot_thread_fn+0x156/0x1e0
[    1.080042]   [810b3916] kthread+0xd6/0xe0
[    1.080042]   [818980ac] ret_from_fork+0x7c/0xb0
[    1.080042]
[    1.080042] to a SOFTIRQ-irq-unsafe lock:
[    1.080042]  (&p->mems_allowed_seq){+.+...}
[    1.080042] ... which became SOFTIRQ-irq-unsafe at:
[    1.080042] ...
[    1.080042]   [810ec1d3] __lock_acquire+0x613/0x1db0
[    1.080042]   [810edfe5] lock_acquire+0x95/0x130
[    1.080042]   [810b3df2] kthreadd+0x82/0x180
[    1.080042]   [818980ac] ret_from_fork+0x7c/0xb0
[    1.080042]
[    1.080042] other info that might help us debug this:
[    1.080042]
[    1.080042]  Possible interrupt unsafe locking scenario:
[    1.080042]
[    1.080042]        CPU0                    CPU1
[    1.080042]        ----                    ----
[    1.080042]   lock(&p->mems_allowed_seq);
[    1.080042]                                local_irq_disable();
[    1.080042]                                lock(&(&q->__queue_lock)->rlock);
[    1.080042]                                lock(&p->mems_allowed_seq);
[    1.080042]   <Interrupt>
[    1.080042]     lock(&(&q->__queue_lock)->rlock);
[    1.080042]
[    1.080042]  *** DEADLOCK ***

The issue stems from the kthreadd() function calling set_mems_allowed
with irqs enabled. While it's possibly unlikely for the actual deadlock
to trigger, a fix is fairly simple: disable irqs before taking the
mems_allowed_seq lock.

Signed-off-by: John Stultz
Signed-off-by: Peter Zijlstra
Acked-by: Li Zefan
Cc: Mathieu Desnoyers
Cc: Steven Rostedt
Cc: "David S. Miller"
Cc: net...@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-4-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar
---
 include/linux/cpuset.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index cc1b01c..3fe661f 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -110,10 +110,14 @@ static inline bool put_mems_allowed(unsigned int seq)
 
 static inline void set_mems_allowed(nodemask_t nodemask)
 {
+	unsigned long flags;
+
 	task_lock(current);
+	local_irq_save(flags);
 	write_seqcount_begin(&current->mems_allowed_seq);
 	current->mems_allowed = nodemask;
 	write_seqcount_end(&current->mems_allowed_seq);
+	local_irq_restore(flags);
 	task_unlock(current);
 }
[tip:core/locking] seqcount: Add lockdep functionality to seqcount/seqlock structures
Commit-ID:  1ca7d67cf5d5a2aef26a8d9afd789006fa098347
Gitweb:     http://git.kernel.org/tip/1ca7d67cf5d5a2aef26a8d9afd789006fa098347
Author:     John Stultz
AuthorDate: Mon, 7 Oct 2013 15:51:59 -0700
Committer:  Ingo Molnar
CommitDate: Wed, 6 Nov 2013 12:40:26 +0100

seqcount: Add lockdep functionality to seqcount/seqlock structures

Currently seqlocks and seqcounts don't support lockdep. After running
across a seqcount related deadlock in the timekeeping code, I used a
less-refined and more focused variant of this patch to narrow down the
cause of the issue.

This is a first-pass attempt to properly enable lockdep functionality
on seqlocks and seqcounts. Since seqcounts are used in the vdso
gettimeofday code, I've provided non-lockdep accessors for those needs.

I've also handled one case where there were nested seqlock writers and
there may be more edge cases. Comments and feedback would be
appreciated!

Signed-off-by: John Stultz
Signed-off-by: Peter Zijlstra
Cc: Eric Dumazet
Cc: Li Zefan
Cc: Mathieu Desnoyers
Cc: Steven Rostedt
Cc: "David S. Miller"
Cc: net...@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar
---
 arch/x86/vdso/vclock_gettime.c |  8 ++---
 fs/dcache.c                    |  4 +--
 fs/fs_struct.c                 |  2 +-
 include/linux/init_task.h      |  8 ++---
 include/linux/lockdep.h        |  8 +++--
 include/linux/seqlock.h        | 79 ++++++++++++++++++++++++++++++++----------
 mm/filemap_xip.c               |  2 +-
 7 files changed, 90 insertions(+), 21 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 72074d5..2ada505 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -178,7 +178,7 @@ notrace static int __always_inline do_realtime(struct timespec *ts)
 	ts->tv_nsec = 0;
 	do {
-		seq = read_seqcount_begin(&gtod->seq);
+		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
 		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->wall_time_sec;
 		ns = gtod->wall_time_snsec;
@@ -198,7 +198,7 @@ notrace static int do_monotonic(struct timespec *ts)
 	ts->tv_nsec = 0;
 	do {
-		seq = read_seqcount_begin(&gtod->seq);
+		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
 		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->monotonic_time_sec;
 		ns = gtod->monotonic_time_snsec;
@@ -214,7 +214,7 @@ notrace static int do_realtime_coarse(struct timespec *ts)
 {
 	unsigned long seq;
 	do {
-		seq = read_seqcount_begin(&gtod->seq);
+		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
 		ts->tv_sec = gtod->wall_time_coarse.tv_sec;
 		ts->tv_nsec = gtod->wall_time_coarse.tv_nsec;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
@@ -225,7 +225,7 @@ notrace static int do_monotonic_coarse(struct timespec *ts)
 {
 	unsigned long seq;
 	do {
-		seq = read_seqcount_begin(&gtod->seq);
+		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
 		ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
 		ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
diff --git a/fs/dcache.c b/fs/dcache.c
index ae6ebb8..f750be2 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2574,7 +2574,7 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
 	dentry_lock_for_move(dentry, target);
 
 	write_seqcount_begin(&dentry->d_seq);
-	write_seqcount_begin(&target->d_seq);
+	write_seqcount_begin_nested(&target->d_seq, DENTRY_D_LOCK_NESTED);
 
 	/* __d_drop does write_seqcount_barrier, but they're OK to nest. */
 
@@ -2706,7 +2706,7 @@ static void __d_materialise_dentry(struct dentry *dentry, struct dentry *anon)
 	dentry_lock_for_move(anon, dentry);
 
 	write_seqcount_begin(&dentry->d_seq);
-	write_seqcount_begin(&anon->d_seq);
+	write_seqcount_begin_nested(&anon->d_seq, DENTRY_D_LOCK_NESTED);
 
 	dparent = dentry->d_parent;
diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index d8ac61d..7dca743 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -161,6 +161,6 @@ EXPORT_SYMBOL(current_umask);
 struct fs_struct init_fs = {
 	.users		= 1,
 	.lock		= __SPIN_LOCK_UNLOCKED(init_fs.lock),
-	.seq		= SEQCNT_ZERO,
+	.seq		= SEQCNT_ZERO(init_fs.seq),
 	.umask		= 0022,
 };
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 5cd0f09..b0ed422 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -32,10 +32,10 @@ extern struct fs_struct init_fs;
 #endif
 
 #ifdef CONFIG_CPUSETS
-#define INIT_CPUSET_SEQ \
-	.mems_allowed_seq = SEQCNT_ZERO,
+#define INIT_CPUSET_SEQ(tsk)
[tip:core/locking] seqcount: Add lockdep functionality to seqcount/seqlock structures
Commit-ID: 1ca7d67cf5d5a2aef26a8d9afd789006fa098347 Gitweb: http://git.kernel.org/tip/1ca7d67cf5d5a2aef26a8d9afd789006fa098347 Author: John Stultz john.stu...@linaro.org AuthorDate: Mon, 7 Oct 2013 15:51:59 -0700 Committer: Ingo Molnar mi...@kernel.org CommitDate: Wed, 6 Nov 2013 12:40:26 +0100 seqcount: Add lockdep functionality to seqcount/seqlock structures Currently seqlocks and seqcounts don't support lockdep. After running across a seqcount related deadlock in the timekeeping code, I used a less-refined and more focused variant of this patch to narrow down the cause of the issue. This is a first-pass attempt to properly enable lockdep functionality on seqlocks and seqcounts. Since seqcounts are used in the vdso gettimeofday code, I've provided non-lockdep accessors for those needs. I've also handled one case where there were nested seqlock writers and there may be more edge cases. Comments and feedback would be appreciated! Signed-off-by: John Stultz john.stu...@linaro.org Signed-off-by: Peter Zijlstra pet...@infradead.org Cc: Eric Dumazet eric.duma...@gmail.com Cc: Li Zefan lize...@huawei.com Cc: Mathieu Desnoyers mathieu.desnoy...@efficios.com Cc: Steven Rostedt rost...@goodmis.org Cc: David S. 
Miller da...@davemloft.net Cc: net...@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-3-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar mi...@kernel.org --- arch/x86/vdso/vclock_gettime.c | 8 ++--- fs/dcache.c| 4 +-- fs/fs_struct.c | 2 +- include/linux/init_task.h | 8 ++--- include/linux/lockdep.h| 8 +++-- include/linux/seqlock.h| 79 ++ mm/filemap_xip.c | 2 +- 7 files changed, 90 insertions(+), 21 deletions(-) diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c index 72074d5..2ada505 100644 --- a/arch/x86/vdso/vclock_gettime.c +++ b/arch/x86/vdso/vclock_gettime.c @@ -178,7 +178,7 @@ notrace static int __always_inline do_realtime(struct timespec *ts) ts-tv_nsec = 0; do { - seq = read_seqcount_begin(gtod-seq); + seq = read_seqcount_begin_no_lockdep(gtod-seq); mode = gtod-clock.vclock_mode; ts-tv_sec = gtod-wall_time_sec; ns = gtod-wall_time_snsec; @@ -198,7 +198,7 @@ notrace static int do_monotonic(struct timespec *ts) ts-tv_nsec = 0; do { - seq = read_seqcount_begin(gtod-seq); + seq = read_seqcount_begin_no_lockdep(gtod-seq); mode = gtod-clock.vclock_mode; ts-tv_sec = gtod-monotonic_time_sec; ns = gtod-monotonic_time_snsec; @@ -214,7 +214,7 @@ notrace static int do_realtime_coarse(struct timespec *ts) { unsigned long seq; do { - seq = read_seqcount_begin(gtod-seq); + seq = read_seqcount_begin_no_lockdep(gtod-seq); ts-tv_sec = gtod-wall_time_coarse.tv_sec; ts-tv_nsec = gtod-wall_time_coarse.tv_nsec; } while (unlikely(read_seqcount_retry(gtod-seq, seq))); @@ -225,7 +225,7 @@ notrace static int do_monotonic_coarse(struct timespec *ts) { unsigned long seq; do { - seq = read_seqcount_begin(gtod-seq); + seq = read_seqcount_begin_no_lockdep(gtod-seq); ts-tv_sec = gtod-monotonic_time_coarse.tv_sec; ts-tv_nsec = gtod-monotonic_time_coarse.tv_nsec; } while (unlikely(read_seqcount_retry(gtod-seq, seq))); diff --git a/fs/dcache.c b/fs/dcache.c index ae6ebb8..f750be2 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -2574,7 
+2574,7 @@ static void __d_move(struct dentry * dentry, struct dentry * target) dentry_lock_for_move(dentry, target); write_seqcount_begin(dentry-d_seq); - write_seqcount_begin(target-d_seq); + write_seqcount_begin_nested(target-d_seq, DENTRY_D_LOCK_NESTED); /* __d_drop does write_seqcount_barrier, but they're OK to nest. */ @@ -2706,7 +2706,7 @@ static void __d_materialise_dentry(struct dentry *dentry, struct dentry *anon) dentry_lock_for_move(anon, dentry); write_seqcount_begin(dentry-d_seq); - write_seqcount_begin(anon-d_seq); + write_seqcount_begin_nested(anon-d_seq, DENTRY_D_LOCK_NESTED); dparent = dentry-d_parent; diff --git a/fs/fs_struct.c b/fs/fs_struct.c index d8ac61d..7dca743 100644 --- a/fs/fs_struct.c +++ b/fs/fs_struct.c @@ -161,6 +161,6 @@ EXPORT_SYMBOL(current_umask); struct fs_struct init_fs = { .users = 1, .lock = __SPIN_LOCK_UNLOCKED(init_fs.lock), - .seq= SEQCNT_ZERO, + .seq= SEQCNT_ZERO(init_fs.seq), .umask = 0022, }; diff --git a/include/linux/init_task.h b/include/linux/init_task.h index 5cd0f09..b0ed422 100644 --- a/include/linux/init_task.h +++
[tip:core/locking] cpuset: Fix potential deadlock w/ set_mems_allowed
Commit-ID: db751fe3ea6880ff5ac5abe60cb7b80deb5a4140 Gitweb: http://git.kernel.org/tip/db751fe3ea6880ff5ac5abe60cb7b80deb5a4140 Author: John Stultz john.stu...@linaro.org AuthorDate: Mon, 7 Oct 2013 15:52:00 -0700 Committer: Ingo Molnar mi...@kernel.org CommitDate: Wed, 6 Nov 2013 12:40:27 +0100 cpuset: Fix potential deadlock w/ set_mems_allowed After adding lockdep support to seqlock/seqcount structures, I started seeing the following warning: [1.070907] == [1.072015] [ INFO: SOFTIRQ-safe - SOFTIRQ-unsafe lock order detected ] [1.073181] 3.11.0+ #67 Not tainted [1.073801] -- [1.074882] kworker/u4:2/708 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [1.076088] (p-mems_allowed_seq){+.+...}, at: [81187d7f] new_slab+0x5f/0x280 [1.077572] [1.077572] and this task is already holding: [1.078593] ((q-__queue_lock)-rlock){..-...}, at: [81339f03] blk_execute_rq_nowait+0x53/0xf0 [1.080042] which would create a new lock dependency: [1.080042] ((q-__queue_lock)-rlock){..-...} - (p-mems_allowed_seq){+.+...} [1.080042] [1.080042] but this new dependency connects a SOFTIRQ-irq-safe lock: [1.080042] ((q-__queue_lock)-rlock){..-...} [1.080042] ... which became SOFTIRQ-irq-safe at: [1.080042] [810ec179] __lock_acquire+0x5b9/0x1db0 [1.080042] [810edfe5] lock_acquire+0x95/0x130 [1.080042] [818968a1] _raw_spin_lock+0x41/0x80 [1.080042] [81560c9e] scsi_device_unbusy+0x7e/0xd0 [1.080042] [8155a612] scsi_finish_command+0x32/0xf0 [1.080042] [81560e91] scsi_softirq_done+0xa1/0x130 [1.080042] [8133b0f3] blk_done_softirq+0x73/0x90 [1.080042] [81095dc0] __do_softirq+0x110/0x2f0 [1.080042] [81095fcd] run_ksoftirqd+0x2d/0x60 [1.080042] [810bc506] smpboot_thread_fn+0x156/0x1e0 [1.080042] [810b3916] kthread+0xd6/0xe0 [1.080042] [818980ac] ret_from_fork+0x7c/0xb0 [1.080042] [1.080042] to a SOFTIRQ-irq-unsafe lock: [1.080042] (p-mems_allowed_seq){+.+...} [1.080042] ... which became SOFTIRQ-irq-unsafe at: [1.080042] ... 
[810ec1d3] __lock_acquire+0x613/0x1db0 [1.080042] [810edfe5] lock_acquire+0x95/0x130 [1.080042] [810b3df2] kthreadd+0x82/0x180 [1.080042] [818980ac] ret_from_fork+0x7c/0xb0 [1.080042] [1.080042] other info that might help us debug this: [1.080042] [1.080042] Possible interrupt unsafe locking scenario: [1.080042] [1.080042]CPU0CPU1 [1.080042] [1.080042] lock(p-mems_allowed_seq); [1.080042]local_irq_disable(); [1.080042]lock((q-__queue_lock)-rlock); [1.080042]lock(p-mems_allowed_seq); [1.080042] Interrupt [1.080042] lock((q-__queue_lock)-rlock); [1.080042] [1.080042] *** DEADLOCK *** The issue stems from the kthreadd() function calling set_mems_allowed with irqs enabled. While its possibly unlikely for the actual deadlock to trigger, a fix is fairly simple: disable irqs before taking the mems_allowed_seq lock. Signed-off-by: John Stultz john.stu...@linaro.org Signed-off-by: Peter Zijlstra pet...@infradead.org Acked-by: Li Zefan lize...@huawei.com Cc: Mathieu Desnoyers mathieu.desnoy...@efficios.com Cc: Steven Rostedt rost...@goodmis.org Cc: David S. 
Miller da...@davemloft.net Cc: net...@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-4-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar mi...@kernel.org --- include/linux/cpuset.h | 4 1 file changed, 4 insertions(+) diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h index cc1b01c..3fe661f 100644 --- a/include/linux/cpuset.h +++ b/include/linux/cpuset.h @@ -110,10 +110,14 @@ static inline bool put_mems_allowed(unsigned int seq) static inline void set_mems_allowed(nodemask_t nodemask) { + unsigned long flags; + task_lock(current); + local_irq_save(flags); write_seqcount_begin(current-mems_allowed_seq); current-mems_allowed = nodemask; write_seqcount_end(current-mems_allowed_seq); + local_irq_restore(flags); task_unlock(current); } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:core/locking] ipv6: Fix possible ipv6 seqlock deadlock
Commit-ID:  5ac68e7c34a4797aa4ca9615e5a6603bda1abe9b
Gitweb:     http://git.kernel.org/tip/5ac68e7c34a4797aa4ca9615e5a6603bda1abe9b
Author:     John Stultz <john.stu...@linaro.org>
AuthorDate: Mon, 7 Oct 2013 15:52:01 -0700
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Wed, 6 Nov 2013 12:40:28 +0100

ipv6: Fix possible ipv6 seqlock deadlock

While enabling lockdep on seqlocks, I ran across the warning below
caused by the ipv6 stats being updated in both irq and non-irq context.

This patch changes from IP6_INC_STATS_BH to IP6_INC_STATS (suggested
by Eric Dumazet) to resolve this problem.

[   11.120383] =
[   11.121024] [ INFO: inconsistent lock state ]
[   11.121663] 3.12.0-rc1+ #68 Not tainted
[   11.19] -
[   11.122867] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   11.123741] init/4483 [HC0[0]:SC1[3]:HE1:SE0] takes:
[   11.124505]  (&stats->syncp.seq#6){+.?...}, at: [<c1ab80c2>] ndisc_send_ns+0xe2/0x130
[   11.125736] {SOFTIRQ-ON-W} state was registered at:
[   11.126447]   [<c10e0eb7>] __lock_acquire+0x5c7/0x1af0
[   11.127222]   [<c10e2996>] lock_acquire+0x96/0xd0
[   11.127925]   [<c1a9a2c3>] write_seqcount_begin+0x33/0x40
[   11.128766]   [<c1a9aa03>] ip6_dst_lookup_tail+0x3a3/0x460
[   11.129582]   [<c1a9e0ce>] ip6_dst_lookup_flow+0x2e/0x80
[   11.130014]   [<c1ad18e0>] ip6_datagram_connect+0x150/0x4e0
[   11.130014]   [<c1a4d0b5>] inet_dgram_connect+0x25/0x70
[   11.130014]   [<c198dd61>] SYSC_connect+0xa1/0xc0
[   11.130014]   [<c198f571>] SyS_connect+0x11/0x20
[   11.130014]   [<c198fe6b>] SyS_socketcall+0x12b/0x300
[   11.130014]   [<c1bbf880>] syscall_call+0x7/0xb
[   11.130014] irq event stamp: 1184
[   11.130014] hardirqs last enabled at (1184): [<c1086901>] local_bh_enable+0x71/0x110
[   11.130014] hardirqs last disabled at (1183): [<c10868cd>] local_bh_enable+0x3d/0x110
[   11.130014] softirqs last enabled at (0): [<c108014d>] copy_process.part.42+0x45d/0x11a0
[   11.130014] softirqs last disabled at (1147): [<c1086e05>] irq_exit+0xa5/0xb0
[   11.130014]
[   11.130014] other info that might help us debug this:
[   11.130014]  Possible unsafe locking scenario:
[   11.130014]
[   11.130014]        CPU0
[   11.130014]   lock(&stats->syncp.seq#6);
[   11.130014]   <Interrupt>
[   11.130014]     lock(&stats->syncp.seq#6);
[   11.130014]
[   11.130014]  *** DEADLOCK ***
[   11.130014]
[   11.130014] 3 locks held by init/4483:
[   11.130014]  #0:  (rcu_read_lock){.+.+..}, at: [<c109363c>] SyS_setpriority+0x4c/0x620
[   11.130014]  #1:  (((&ifa->dad_timer))){+.-...}, at: [<c108c1c0>] call_timer_fn+0x0/0xf0
[   11.130014]  #2:  (rcu_read_lock){.+.+..}, at: [<c1ab6494>] ndisc_send_skb+0x54/0x5d0
[   11.130014]
[   11.130014] stack backtrace:
[   11.130014] CPU: 0 PID: 4483 Comm: init Not tainted 3.12.0-rc1+ #68
[   11.130014] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   11.130014]  c55e5c10 c1bb0e71 c57128b0 c55e5c4c c1badf79 c1ec1123
[   11.130014]  c1ec1484 1183 0001 0003 0001
[   11.130014]  c1ec1484 0004 c5712dcc c55e5c84 c10de492 0004 c10755f2
[   11.130014] Call Trace:
[   11.130014]  [<c1bb0e71>] dump_stack+0x4b/0x66
[   11.130014]  [<c1badf79>] print_usage_bug+0x1d3/0x1dd
[   11.130014]  [<c10de492>] mark_lock+0x282/0x2f0
[   11.130014]  [<c10755f2>] ? kvm_clock_read+0x22/0x30
[   11.130014]  [<c10dd8b0>] ? check_usage_backwards+0x150/0x150
[   11.130014]  [<c10e0e74>] __lock_acquire+0x584/0x1af0
[   11.130014]  [<c10b1baf>] ? sched_clock_cpu+0xef/0x190
[   11.130014]  [<c10de58c>] ? mark_held_locks+0x8c/0xf0
[   11.130014]  [<c10e2996>] lock_acquire+0x96/0xd0
[   11.130014]  [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130
[   11.130014]  [<c1ab66d3>] ndisc_send_skb+0x293/0x5d0
[   11.130014]  [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130
[   11.130014]  [<c1ab80c2>] ndisc_send_ns+0xe2/0x130
[   11.130014]  [<c108cc32>] ? mod_timer+0xf2/0x160
[   11.130014]  [<c1aa706e>] ? addrconf_dad_timer+0xce/0x150
[   11.130014]  [<c1aa70aa>] addrconf_dad_timer+0x10a/0x150
[   11.130014]  [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0
[   11.130014]  [<c108c233>] call_timer_fn+0x73/0xf0
[   11.130014]  [<c108c1c0>] ? __internal_add_timer+0xb0/0xb0
[   11.130014]  [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0
[   11.130014]  [<c108c5b1>] run_timer_softirq+0x141/0x1e0
[   11.130014]  [<c1086b20>] ? __do_softirq+0x70/0x1b0
[   11.130014]  [<c1086b70>] __do_softirq+0xc0/0x1b0
[   11.130014]  [<c1086e05>] irq_exit+0xa5/0xb0
[   11.130014]  [<c106cfd5>] smp_apic_timer_interrupt+0x35/0x50
[   11.130014]  [<c1bbfbca>] apic_timer_interrupt+0x32/0x38
[   11.130014]  [<c10936ed>] ? SyS_setpriority+0xfd/0x620
[   11.130014]  [<c10e26c9>] ? lock_release+0x9/0x240
[   11.130014]  [<c10936d7>] ? SyS_setpriority+0xe7/0x620
[   11.130014]  [<c1bbee6d>] ? _raw_read_unlock+0x1d/0x30
[   11.130014]  [<c1093701>] SyS_setpriority+0x111/0x620
[   11.130014]  [<c109363c>] ? SyS_setpriority+0x4c/0x620
[   11.130014]  [<c1bbf880>]
[tip:core/locking] net: Explicitly initialize u64_stats_sync structures for lockdep
Commit-ID:  827da44c61419f29ae3be198c342e2147f1a10cb
Gitweb:     http://git.kernel.org/tip/827da44c61419f29ae3be198c342e2147f1a10cb
Author:     John Stultz <john.stu...@linaro.org>
AuthorDate: Mon, 7 Oct 2013 15:51:58 -0700
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Wed, 6 Nov 2013 12:40:25 +0100

net: Explicitly initialize u64_stats_sync structures for lockdep

In order to enable lockdep on seqcount/seqlock structures, we must
explicitly initialize any locks. The u64_stats_sync structure uses a
seqcount, so we need to introduce a u64_stats_init() function and use
it to initialize the structure.

This unfortunately adds a lot of fairly trivial initialization code to
a number of drivers. But the benefit of ensuring correctness makes this
worthwhile.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into
30-some separate patches, as I figured it would be better to get the
various maintainers' thoughts on how to best merge this change along
with the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz <john.stu...@linaro.org>
Acked-by: Julian Anastasov <j...@ssi.bg>
Signed-off-by: Peter Zijlstra <pet...@infradead.org>
Cc: Alexey Kuznetsov <kuz...@ms2.inr.ac.ru>
Cc: David S. Miller <da...@davemloft.net>
Cc: Eric Dumazet <eric.duma...@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshf...@linux-ipv6.org>
Cc: James Morris <jmor...@namei.org>
Cc: Jesse Gross <je...@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Mirko Lindner <mlind...@marvell.com>
Cc: Patrick McHardy <ka...@trash.net>
Cc: Roger Luethi <r...@hellgate.ch>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Simon Horman <ho...@verge.net.au>
Cc: Stephen Hemminger <step...@networkplumber.org>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazz...@free-electrons.com>
Cc: Wensong Zhang <wens...@linux-vs.org>
Cc: net...@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 drivers/net/dummy.c                            |  6 ++
 drivers/net/ethernet/emulex/benet/be_main.c    |  4
 drivers/net/ethernet/intel/igb/igb_main.c      |  5 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  4
 drivers/net/ethernet/marvell/mvneta.c          |  3 +++
 drivers/net/ethernet/marvell/sky2.c            |  3 +++
 drivers/net/ethernet/neterion/vxge/vxge-main.c |  4
 drivers/net/ethernet/nvidia/forcedeth.c        |  2 ++
 drivers/net/ethernet/realtek/8139too.c         |  3 +++
 drivers/net/ethernet/tile/tilepro.c            |  2 ++
 drivers/net/ethernet/via/via-rhine.c           |  3 +++
 drivers/net/ifb.c                              |  5 +
 drivers/net/loopback.c                         |  6 ++
 drivers/net/macvlan.c                          |  7 +++
 drivers/net/nlmon.c                            |  8
 drivers/net/team/team.c                        |  6 ++
 drivers/net/team/team_mode_loadbalance.c       |  9 -
 drivers/net/veth.c                             |  8
 drivers/net/virtio_net.c                       |  8
 drivers/net/vxlan.c                            |  8
 drivers/net/xen-netfront.c                     |  6 ++
 include/linux/u64_stats_sync.h                 |  7 +++
 net/8021q/vlan_dev.c                           |  9 -
 net/bridge/br_device.c                         |  7 +++
 net/ipv4/af_inet.c                             | 14 ++
 net/ipv4/ip_tunnel.c                           |  8 +++-
 net/ipv6/addrconf.c                            | 14 ++
 net/ipv6/af_inet6.c                            | 14 ++
 net/ipv6/ip6_gre.c                             | 15 +++
 net/ipv6/ip6_tunnel.c                          |  7 +++
 net/ipv6/sit.c                                 | 15 +++
 net/netfilter/ipvs/ip_vs_ctl.c                 | 25 ++---
 net/openvswitch/datapath.c                     |  6 ++
 net/openvswitch/vport.c                        |  8
 34 files changed, 253 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index b710c6b..bd8f84b 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -88,10 +88,16 @@ static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct net_device *dev)
 static int dummy_dev_init(struct net_device *dev)
 {
+	int i;
+
 	dev->dstats = alloc_percpu(struct pcpu_dstats);
 	if (!dev->dstats)
 		return -ENOMEM;
 
+	for_each_possible_cpu(i) {
+		struct pcpu_dstats *dstats;
+		dstats = per_cpu_ptr(dev->dstats, i);
+		u64_stats_init(&dstats->syncp);
+	}
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c
[tip:timers/urgent] timekeeping: Fix HRTICK related deadlock from ntp lock changes
Commit-ID:  7bd36014460f793c19e7d6c94dab67b0afcfcb7f
Gitweb:     http://git.kernel.org/tip/7bd36014460f793c19e7d6c94dab67b0afcfcb7f
Author:     John Stultz <john.stu...@linaro.org>
AuthorDate: Wed, 11 Sep 2013 16:50:56 -0700
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Thu, 12 Sep 2013 07:49:51 +0200

timekeeping: Fix HRTICK related deadlock from ntp lock changes

Gerlando Falauto reported that when HRTICK is enabled, it is possible
to trigger system deadlocks. These were hard to reproduce, as HRTICK
has been broken in the past, but seemed to be connected to the
timekeeping_seq lock.

Since seqlock/seqcount's aren't supported w/ lockdep, I added some
extra spinlock based locking and triggered the following lockdep
output:

[   15.849182] ntpd/4062 is trying to acquire lock:
[   15.849765]  (&(&pool->lock)->rlock){..-...}, at: [<810aa9b5>] __queue_work+0x145/0x480
[   15.850051]
[   15.850051] but task is already holding lock:
[   15.850051]  (timekeeper_lock){-.-.-.}, at: [<810df6df>] do_adjtimex+0x7f/0x100

<snip>

[   15.850051] Chain exists of: &(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock
[   15.850051]  Possible unsafe locking scenario:
[   15.850051]
[   15.850051]        CPU0                    CPU1
[   15.850051]   lock(timekeeper_lock);
[   15.850051]                            lock(&p->pi_lock);
[   15.850051]                            lock(timekeeper_lock);
[   15.850051]   lock(&(&pool->lock)->rlock);
[   15.850051]
[   15.850051]  *** DEADLOCK ***

The deadlock was introduced by 06c017fdd4dc48451a ("timekeeping: Hold
timekeepering locks in do_adjtimex and hardpps") in 3.10.

This patch avoids this deadlock, by moving the call to
schedule_delayed_work() outside of the timekeeper lock critical
section.

Reported-by: Gerlando Falauto <gerlando.fala...@keymile.com>
Tested-by: Lin Ming <min...@gmail.com>
Signed-off-by: John Stultz <john.stu...@linaro.org>
Cc: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
Cc: stable <sta...@vger.kernel.org> #3.11, 3.10
Link: http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 include/linux/timex.h     | 1 +
 kernel/time/ntp.c         | 6 ++
 kernel/time/timekeeping.c | 2 ++
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/timex.h b/include/linux/timex.h
index b3726e6..dd3edd7 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -141,6 +141,7 @@ extern int do_adjtimex(struct timex *);
 extern void hardpps(const struct timespec *, const struct timespec *);
 int read_current_timer(unsigned long *timer_val);
+void ntp_notify_cmos_timer(void);
 
 /* The clock frequency of the i8253/i8254 PIT */
 #define PIT_TICK_RATE 1193182ul

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 8f5b3b9..bb22151 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -516,13 +516,13 @@ static void sync_cmos_clock(struct work_struct *work)
 	schedule_delayed_work(&sync_cmos_work, timespec_to_jiffies(&next));
 }
 
-static void notify_cmos_timer(void)
+void ntp_notify_cmos_timer(void)
 {
 	schedule_delayed_work(&sync_cmos_work, 0);
 }
 
 #else
-static inline void notify_cmos_timer(void) { }
+void ntp_notify_cmos_timer(void) { }
 #endif
 
@@ -687,8 +687,6 @@ int __do_adjtimex(struct timex *txc, struct timespec *ts, s32 *time_tai)
 	if (!(time_status & STA_NANO))
 		txc->time.tv_usec /= NSEC_PER_USEC;
 
-	notify_cmos_timer();
-
 	return result;
 }

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 48b9fff..947ba25 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1703,6 +1703,8 @@ int do_adjtimex(struct timex *txc)
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
+	ntp_notify_cmos_timer();
+
 	return ret;
 }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[tip:timers/urgent] time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons
Commit-ID:  b4f711ee03d28f776fd2324fd0bd999cc428e4d2
Gitweb:     http://git.kernel.org/tip/b4f711ee03d28f776fd2324fd0bd999cc428e4d2
Author:     John Stultz <john.stu...@linaro.org>
AuthorDate: Wed, 24 Apr 2013 11:32:56 -0700
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Tue, 14 May 2013 20:54:06 +0200

time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons

Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config, which
enables some minor compile time optimization to avoid uncessary code in
mostly the suspend/resume path, could cause problems for userland.

In particular, the dependency for RTC_HCTOSYS on
!ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time twice and
simplifies suspend/resume, has the side effect of causing the
/sys/class/rtc/rtcN/hctosys flag to always be zero, and this flag is
commonly used by udev to setup the /dev/rtc symlink to /dev/rtcN,
which can cause pain for older applications.

While the udev rules could use some work to be less fragile, breaking
userland should strongly be avoided. Additionally the compile time
optimizations are fairly minor, and the code being optimized is likely
to be reworked in the future, so lets revert this change.

Reported-by: Kay Sievers <k...@vrfy.org>
Signed-off-by: John Stultz <john.stu...@linaro.org>
Cc: stable <sta...@vger.kernel.org> #3.9
Cc: Feng Tang <feng.t...@intel.com>
Cc: Jason Gunthorpe <jguntho...@obsidianresearch.com>
Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 arch/x86/Kconfig     | 1 -
 drivers/rtc/Kconfig  | 2 --
 include/linux/time.h | 4
 kernel/time/Kconfig  | 5 -
 4 files changed, 12 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5db2117..45c4124 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -108,7 +108,6 @@ config X86
 	select GENERIC_CLOCKEVENTS_BROADCAST if X86_64 || (X86_32 && X86_LOCAL_APIC)
 	select GENERIC_TIME_VSYSCALL if X86_64
 	select KTIME_SCALAR if X86_32
-	select ALWAYS_USE_PERSISTENT_CLOCK
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
 	select HAVE_CONTEXT_TRACKING if X86_64

diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
index 0c81915..b983813 100644
--- a/drivers/rtc/Kconfig
+++ b/drivers/rtc/Kconfig
@@ -20,7 +20,6 @@ if RTC_CLASS
 config RTC_HCTOSYS
 	bool "Set system time from RTC on startup and resume"
 	default y
-	depends on !ALWAYS_USE_PERSISTENT_CLOCK
 	help
 	  If you say yes here, the system time (wall clock) will be set using
 	  the value read from a specified RTC device. This is useful to avoid
@@ -29,7 +28,6 @@ config RTC_HCTOSYS
 config RTC_SYSTOHC
 	bool "Set the RTC time based on NTP synchronization"
 	default y
-	depends on !ALWAYS_USE_PERSISTENT_CLOCK
 	help
 	  If you say yes here, the system time (wall clock) will be stored in
 	  the RTC specified by RTC_HCTOSYS_DEVICE approximately every 11

diff --git a/include/linux/time.h b/include/linux/time.h
index 22d81b3..d5d229b 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -117,14 +117,10 @@ static inline bool timespec_valid_strict(const struct timespec *ts)
 
 extern bool persistent_clock_exist;
 
-#ifdef ALWAYS_USE_PERSISTENT_CLOCK
-#define has_persistent_clock() true
-#else
 static inline bool has_persistent_clock(void)
 {
 	return persistent_clock_exist;
 }
-#endif
 
 extern void read_persistent_clock(struct timespec *ts);
 extern void read_boot_clock(struct timespec *ts);

diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 24510d8..b696922 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -12,11 +12,6 @@ config CLOCKSOURCE_WATCHDOG
 config ARCH_CLOCKSOURCE_DATA
 	bool
 
-# Platforms has a persistent clock
-config ALWAYS_USE_PERSISTENT_CLOCK
-	bool
-	default n
-
 # Timekeeping vsyscall support
 config GENERIC_TIME_VSYSCALL
 	bool
--
[tip:timers/core] timekeeping: Make sure to notify hrtimers when TAI offset changes
Commit-ID:  4e8f8b34b92b6514cc070aeb94d317cadd5071d7
Gitweb:     http://git.kernel.org/tip/4e8f8b34b92b6514cc070aeb94d317cadd5071d7
Author:     John Stultz <john.stu...@linaro.org>
AuthorDate: Wed, 10 Apr 2013 12:41:49 -0700
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Thu, 11 Apr 2013 10:19:44 +0200

timekeeping: Make sure to notify hrtimers when TAI offset changes

Now that we have CLOCK_TAI timers, make sure we notify hrtimer code
when the TAI offset is changed.

Signed-off-by: John Stultz <john.stu...@linaro.org>
Link: http://lkml.kernel.org/r/1365622909-953-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 kernel/time/timekeeping.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index c4d2a87..675f720 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -607,6 +607,7 @@ void timekeeping_set_tai_offset(s32 tai_offset)
 	__timekeeping_set_tai_offset(tk, tai_offset);
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	clock_was_set();
 }
 
 /**
@@ -1639,7 +1640,7 @@ int do_adjtimex(struct timex *txc)
 	struct timekeeper *tk = &timekeeper;
 	unsigned long flags;
 	struct timespec ts;
-	s32 tai;
+	s32 orig_tai, tai;
 	int ret;
 
 	/* Validate the data before disabling interrupts */
@@ -1663,10 +1664,13 @@ int do_adjtimex(struct timex *txc)
 	raw_spin_lock_irqsave(&timekeeper_lock, flags);
 	write_seqcount_begin(&timekeeper_seq);
 
-	tai = tk->tai_offset;
+	orig_tai = tai = tk->tai_offset;
 	ret = __do_adjtimex(txc, &ts, &tai);
 
-	__timekeeping_set_tai_offset(tk, tai);
+	if (tai != orig_tai) {
+		__timekeeping_set_tai_offset(tk, tai);
+		clock_was_set_delayed();
+	}
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
--
[tip:timers/urgent] time: Fix timeekeping_get_ns overflow on 32bit systems
Commit-ID:  ec145babe754f9ea1079034a108104b6001e001c
Gitweb:     http://git.kernel.org/tip/ec145babe754f9ea1079034a108104b6001e001c
Author:     John Stultz <john.stu...@linaro.org>
AuthorDate: Tue, 11 Sep 2012 19:26:03 -0400
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Thu, 13 Sep 2012 17:39:14 +0200

time: Fix timeekeping_get_ns overflow on 32bit systems

Daniel Lezcano reported seeing multi-second stalls from keyboard input
on his T61 laptop when NOHZ and CPU_IDLE were enabled on a 32bit
kernel.

He bisected the problem down to commit 1e75fa8be9fb6 ("time: Condense
timekeeper.xtime into xtime_sec").

After reproducing this issue, I narrowed the problem down to the fact
that timekeeping_get_ns() returns a 64bit nsec value that hasn't been
accumulated. In some cases this value was then being stored in
timespec.tv_nsec (which is a long).

On 32bit systems, with idle times larger than 4 seconds (or less,
depending on the value of xtime_nsec), the returned nsec value would
overflow 32bits. This kept time from increasing, causing timers to not
expire.

The fix is to make sure we don't directly store the result of
timekeeping_get_ns() into a tv_nsec field, instead using a 64bit nsec
value which can then be added into the timespec via timespec_add_ns().

Reported-and-bisected-by: Daniel Lezcano <daniel.lezc...@linaro.org>
Tested-by: Daniel Lezcano <daniel.lezc...@linaro.org>
Signed-off-by: John Stultz <john.stu...@linaro.org>
Acked-by: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Link: http://lkml.kernel.org/r/1347405963-35715-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/time/timekeeping.c | 19 ---
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 34e5eac..d3b91e7 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -303,10 +303,11 @@ void getnstimeofday(struct timespec *ts)
 		seq = read_seqbegin(&tk->lock);
 
 		ts->tv_sec = tk->xtime_sec;
-		ts->tv_nsec = timekeeping_get_ns(tk);
+		nsecs = timekeeping_get_ns(tk);
 
 	} while (read_seqretry(&tk->lock, seq));
 
+	ts->tv_nsec = 0;
 	timespec_add_ns(ts, nsecs);
 }
 EXPORT_SYMBOL(getnstimeofday);
@@ -345,6 +346,7 @@ void ktime_get_ts(struct timespec *ts)
 {
 	struct timekeeper *tk = &timekeeper;
 	struct timespec tomono;
+	s64 nsec;
 	unsigned int seq;
 
 	WARN_ON(timekeeping_suspended);
@@ -352,13 +354,14 @@ void ktime_get_ts(struct timespec *ts)
 	do {
 		seq = read_seqbegin(&tk->lock);
 		ts->tv_sec = tk->xtime_sec;
-		ts->tv_nsec = timekeeping_get_ns(tk);
+		nsec = timekeeping_get_ns(tk);
 		tomono = tk->wall_to_monotonic;
 
 	} while (read_seqretry(&tk->lock, seq));
 
-	set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec,
-				ts->tv_nsec + tomono.tv_nsec);
+	ts->tv_sec += tomono.tv_sec;
+	ts->tv_nsec = 0;
+	timespec_add_ns(ts, nsec + tomono.tv_nsec);
 }
 EXPORT_SYMBOL_GPL(ktime_get_ts);
@@ -1244,6 +1247,7 @@ void get_monotonic_boottime(struct timespec *ts)
 {
 	struct timekeeper *tk = &timekeeper;
 	struct timespec tomono, sleep;
+	s64 nsec;
 	unsigned int seq;
 
 	WARN_ON(timekeeping_suspended);
@@ -1251,14 +1255,15 @@ void get_monotonic_boottime(struct timespec *ts)
 	do {
 		seq = read_seqbegin(&tk->lock);
 		ts->tv_sec = tk->xtime_sec;
-		ts->tv_nsec = timekeeping_get_ns(tk);
+		nsec = timekeeping_get_ns(tk);
 		tomono = tk->wall_to_monotonic;
 		sleep = tk->total_sleep_time;
 
 	} while (read_seqretry(&tk->lock, seq));
 
-	set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec + sleep.tv_sec,
-				ts->tv_nsec + tomono.tv_nsec + sleep.tv_nsec);
+	ts->tv_sec += tomono.tv_sec + sleep.tv_sec;
+	ts->tv_nsec = 0;
+	timespec_add_ns(ts, nsec + tomono.tv_nsec + sleep.tv_nsec);
 }
 EXPORT_SYMBOL_GPL(get_monotonic_boottime);
--