Re: [Bug 212265] New: clock_gettime(CLOCK_TAI, ...) should return an error when TAI has not been configured
On Fri, Mar 26, 2021 at 08:28:59PM -0700, Richard Cochran wrote: > Using ntpd on Debian, the service will set the offset, but only after > synchronization with the upstream server has been established, and > this takes about five minutes, IIRC. With the iburst option it shouldn't take more than 10 seconds. There might be an issue wrt stepping the clock when the initial offset is large. In Fedora and derived distros using chrony by default the TAI-UTC offset should be set right on the first update of the clock as expected. > Getting back to the original point of the kernel returning an error, > I don't see a need for this. Applications that require correct leap > seconds can simply call adjtimex() and wait until the initial zero > value is changed by ntpd/etc to the correct offset. That isn't > fundamentally harder than calling clock_gettime() and waiting until > the error would go away. There are at least two issues with handling a zero offset as a special value. One is that zero could potentially be a valid value in distant future. The other is that the kernel updates the offset when a leap second is inserted/deleted even if the original offset is zero, so checking for zero (in the kernel or an application) works only until the first leap second after boot. The kernel would need to set a flag that the offset was set. Returning an error in clock_gettime() until the offset is set sounds reasonable to me, but I have no idea how many of the existing applications it would break. -- Miroslav Lichvar
Re: [patch 5/8] ntp: Make the RTC synchronization more reliable
On Sun, Dec 06, 2020 at 10:46:18PM +0100, Thomas Gleixner wrote: > Switch it to an hrtimer instead which schedules the actual update work. The > hrtimer will expire precisely (max 1 jiffie delay when high resolution > timers are not available). The actual scheduling delay of the work is the > same as before. It works well in my tests. > This becomes now: > > if (ntp_synced() && !hrtimer_is_queued(_hrtimer)) > queue_work(system_power_efficient_wq, _work, 0); > > which is racy when the hrtimer has expired and the work is currently > executed and has not yet managed to rearm the hrtimer. > > Not a big problem as it just schedules work for nothing. No more unexpected updates of the RTC observed. Tested-by: Miroslav Lichvar Thanks, -- Miroslav Lichvar
Re: [PATCH] rtc: adapt allowed RTC update error
On Wed, Dec 02, 2020 at 04:07:28PM +0100, Miroslav Lichvar wrote: > On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote: > > Something like the completely untested below should make this reliable > > and only needs to retry when the work is running late (busy machine), > > but the wakeup will be on time or at max 1 jiffie off when high > > resolution timers are not available or disabled. > > It seems to work nicely. In my test most of the updates succeeded on > the first attempt hitting the right tick, the rest succeeding on the > second attempt. Only when the clock was set to run 10% faster, it > needed few more attempts to converge to the target time. I noticed an observable change wrt adjtimex() calls though. It seems it now reschedules the RTC update, i.e. there can be more than one update per 11 minutes. Was this intended? > @@ -629,7 +618,7 @@ void ntp_notify_cmos_timer(void) > > if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) || > IS_ENABLED(CONFIG_RTC_SYSTOHC)) > - queue_delayed_work(system_power_efficient_wq, _work, 0); > + queue_work(system_power_efficient_wq, _work); > } -- Miroslav Lichvar
Re: [PATCH] rtc: adapt allowed RTC update error
On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote: > Something like the completely untested below should make this reliable > and only needs to retry when the work is running late (busy machine), > but the wakeup will be on time or at max 1 jiffie off when high > resolution timers are not available or disabled. It seems to work nicely. In my test most of the updates succeeded on the first attempt hitting the right tick, the rest succeeding on the second attempt. Only when the clock was set to run 10% faster, it needed few more attempts to converge to the target time. Thanks, -- Miroslav Lichvar
[PATCHv2] rtc: adapt allowed RTC update error
When the system clock is marked as synchronized via adjtimex(), the kernel is expected to copy the system time to the RTC every 11 minutes. There are reports that it doesn't always work reliably. It seems the current requirement for the RTC update to happen within 5 ticks of the target time in some cases can consistently fail for hours or even days. It is better to set the RTC with a larger error than let it drift for too long. Instead of increasing the constant again, use a static variable to count the checks and with each failed check increase the allowed error by one jiffie. Reset the counter when the check finally succeeds. This will allow the RTC update to keep good accuracy if it can happen in the first few attempts and it will not take more than a minute if the timing is consistently bad for any reason. Signed-off-by: Miroslav Lichvar Cc: Thomas Gleixner Cc: John Stultz Cc: Prarit Bhargava Cc: Jason Gunthorpe --- Notes: v2: - moved the static variable to callers in ntp.c drivers/rtc/systohc.c | 6 -- include/linux/rtc.h | 14 +- kernel/time/ntp.c | 9 +++-- 3 files changed, 20 insertions(+), 9 deletions(-) diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c index 8b70f0520e13..0777f590cdae 100644 --- a/drivers/rtc/systohc.c +++ b/drivers/rtc/systohc.c @@ -5,6 +5,7 @@ /** * rtc_set_ntp_time - Save NTP synchronized time to the RTC * @now: Current time of day + * @attempt: Number of previous failures used to adjust allowed error * @target_nsec: pointer for desired now->tv_nsec value * * Replacement for the NTP platform function update_persistent_clock64 @@ -18,7 +19,8 @@ * * If temporary failure is indicated the caller should try again 'soon' */ -int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec) +int rtc_set_ntp_time(struct timespec64 now, unsigned int attempt, +unsigned long *target_nsec) { struct rtc_device *rtc; struct rtc_time tm; @@ -44,7 +46,7 @@ int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec) * it does not we update target_nsec and return EPROTO to make the ntp * code try again later. */ - ok = rtc_tv_nsec_ok(rtc->set_offset_nsec, _set, ); + ok = rtc_tv_nsec_ok(rtc->set_offset_nsec, attempt, _set, ); if (!ok) { err = -EPROTO; goto out_close; diff --git a/include/linux/rtc.h b/include/linux/rtc.h index 22d1575e4991..9f3326b43620 100644 --- a/include/linux/rtc.h +++ b/include/linux/rtc.h @@ -165,7 +165,8 @@ int __rtc_register_device(struct module *owner, struct rtc_device *rtc); extern int rtc_read_time(struct rtc_device *rtc, struct rtc_time *tm); extern int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm); -extern int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec); +extern int rtc_set_ntp_time(struct timespec64 now, unsigned int attempt, + unsigned long *target_nsec); int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm); extern int rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alrm); @@ -213,24 +214,27 @@ static inline bool is_leap_year(unsigned int year) * a zero in tv_nsecs, such that: *to_set - set_delay_nsec == now +/- FUZZ * + * The allowed error starts at 5 jiffies on the first attempt and is increased + * with each failed attempt to make sure the RTC will be set at some point, + * even if the timing is consistently inaccurate. */ static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec, + unsigned int attempt, struct timespec64 *to_set, const struct timespec64 *now) { - /* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */ - const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5; + unsigned long time_set_nsec_fuzz = (5 + attempt) * TICK_NSEC; struct timespec64 delay = {.tv_sec = 0, .tv_nsec = set_offset_nsec}; *to_set = timespec64_add(*now, delay); - if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) { + if (to_set->tv_nsec < time_set_nsec_fuzz) { to_set->tv_nsec = 0; return true; } - if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) { + if (to_set->tv_nsec > NSEC_PER_SEC - time_set_nsec_fuzz) { to_set->tv_sec++; to_set->tv_nsec = 0; return true; diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 069ca78fb0bf..893bc7ed7845 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -531,6 +531,7 @@ static void sched_sync_hw_clock(struct timespec64 now, static void sync_rtc_clock(void) { + static unsigned int attempt; unsigned long target_nsec; struct timespe
Re: [PATCH] rtc: adapt allowed RTC update error
On Tue, Dec 01, 2020 at 12:12:24PM -0400, Jason Gunthorpe wrote: > On Tue, Dec 01, 2020 at 03:38:35PM +0100, Miroslav Lichvar wrote: > > + unsigned long time_set_nsec_fuzz; > > + static unsigned int attempt; > > Adding a static value instide a static inline should not be done Well, grepping through the other header files in include/linux, this would not be the first case. > I'm not sure using a static like this is the best idea anyhow, if you > want something like this it should be per-rtc, not global If there are multiple RTCs, are they all updated in this 11-minute sync? > Did you look at why time has become so in-accurate in your system? 5 > jiffies is usually a pretty long time? I found no good explanation. It seems to depend on what system is doing, if it's idle, etc. I suspect it's a property of the workqueues that they cannot generally guarantee the jobs to run exactly on time. I tried switching to the non-power-efficient and high priority workqueues and it didn't seem to make a big difference. -- Miroslav Lichvar
[PATCH] rtc: adapt allowed RTC update error
When the system clock is marked as synchronized via adjtimex(), the kernel is expected to copy the system time to the RTC every 11 minutes. There are reports that it doesn't always work reliably. It seems the current requirement for the RTC update to happen within 5 ticks of the target time in some cases can consistently fail for hours or even days. It is better to set the RTC with a larger error than let it drift for too long. Add a static variable to rtc_tv_nsec_ok() to count the checks. With each failed check, relax the requirement by one jiffie, and reset the counter when it finally succeeds. This should allow the RTC update to happen in a minute at most. Signed-off-by: Miroslav Lichvar Cc: Thomas Gleixner Cc: John Stultz Cc: Prarit Bhargava Cc: Jason Gunthorpe --- include/linux/rtc.h | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/include/linux/rtc.h b/include/linux/rtc.h index 22d1575e4991..8d105f10ef6a 100644 --- a/include/linux/rtc.h +++ b/include/linux/rtc.h @@ -218,21 +218,30 @@ static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec, struct timespec64 *to_set, const struct timespec64 *now) { - /* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */ - const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5; struct timespec64 delay = {.tv_sec = 0, .tv_nsec = set_offset_nsec}; + unsigned long time_set_nsec_fuzz; + static unsigned int attempt; *to_set = timespec64_add(*now, delay); - if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) { + /* +* Determine allowed error in tv_nsec. Start at 5 jiffies and add a +* jiffie with each failed attempt to make sure the RTC will be set at +* some point, even if the update cannot be scheduled very accurately. +*/ + time_set_nsec_fuzz = (5 + attempt++) * TICK_NSEC; + + if (to_set->tv_nsec < time_set_nsec_fuzz) { to_set->tv_nsec = 0; + attempt = 0; return true; } - if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) { + if (to_set->tv_nsec > NSEC_PER_SEC - time_set_nsec_fuzz) { to_set->tv_sec++; to_set->tv_nsec = 0; + attempt = 0; return true; } return false; -- 2.26.2
Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
On Wed, Aug 21, 2019 at 11:53:12AM +0200, Hubert Feurstein wrote: > Am Mi., 21. Aug. 2019 um 10:07 Uhr schrieb Miroslav Lichvar > > Because those reports/statistics are important in calculation of > > maximum error. If someone had a requirement for a clock to be accurate > > to 1.5 microseconds and the ioctl returned a delay indicating a > > sufficient accuracy when in reality it could be worse, that would be a > > problem. > > > Ok, I understand your point. But including the MDIO completion into > delay calculation > will indicate a much wore precision as it actually is. That's ok. It's the same with PCIe devices. It takes about 500 ns to read a PCI register, so we know in the worst case the offset is accurate to 250 ns. It's probably much better, maybe to 50 ns, but we don't really know. We don't know how much asymmetry there is in the PCIe delay (it certainly is not zero), or how much time the NIC actually needs to respond to the command and when exactly it reads the clock. > When the MDIO > driver implements > the PTP system timestamping as follows ... > > ptp_read_system_prets(bus->ptp_sts); > writel(value, mdio-reg) > ptp_read_system_postts(bus->ptp_sts); > > ... then we catch already the error caused by interrupts which hit the > pre/post_ts section. > Now we only have the additional error of one MDIO clock cycle > (~400ns). Because I expect > the MDIO controller to shift out the MDIO frame on the next MDIO clock > cycle. Is this always the case? > So if I subtract > one MDIO clock cycle from pre_ts and add one MDIO clock cycle to > post_ts the error indication > would be sufficiently corrected IMHO. If I understand it correctly, this ignores the time needed for the frame to be received, decoded and the clock to be read. -- Miroslav Lichvar
Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
On Tue, Aug 20, 2019 at 06:56:56PM +0200, Hubert Feurstein wrote: > Am Di., 20. Aug. 2019 um 17:40 Uhr schrieb Miroslav Lichvar > > I think a large jitter is ok in this case. We just need to timestamp > > something that we know for sure happened after the PHC timestamp. It > > should have no impact on the offset and its stability, just the > > reported delay. A test with phc2sys should be able to confirm that. > > phc2sys selects the measurement with the shortest delay, which has > > least uncertainty. I'd say that applies to both interrupt and polling. > > > > If it is difficult to specify the minimum interrupt delay, I'd still > > prefer an overly pessimistic interval assuming a zero delay. > > > Currently I do not see the benefit from this. The original intention was to > compensate for the remaining offset as good as possible. That's ok, but IMHO the change should not break the assumptions of existing application and users. > The current code > of phc2sys uses the delay only for the filtering of the measurement record > with the shortest delay and for reporting and statistics. Why not simple shift > the timestamps with the offset to the point where we expect the PHC timestamp > to be captured, and we have a very good result compared to where we came > from. Because those reports/statistics are important in calculation of maximum error. If someone had a requirement for a clock to be accurate to 1.5 microseconds and the ioctl returned a delay indicating a sufficient accuracy when in reality it could be worse, that would be a problem. BTW, phc2sys is not the only user of the ioctl. -- Miroslav Lichvar
Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
On Tue, Aug 20, 2019 at 05:23:06PM +0200, Andrew Lunn wrote: > > - take a second "post" system timestamp after the completion > > For this hardware, completion is an interrupt, which has a lot of > jitter on it. But this hardware is odd, in that it uses an > interrupt. Every other MDIO bus controller uses polled IO, with an > mdelay(10) or similar between each poll. So the jitter is going to be > much larger. I think a large jitter is ok in this case. We just need to timestamp something that we know for sure happened after the PHC timestamp. It should have no impact on the offset and its stability, just the reported delay. A test with phc2sys should be able to confirm that. phc2sys selects the measurement with the shortest delay, which has least uncertainty. I'd say that applies to both interrupt and polling. If it is difficult to specify the minimum interrupt delay, I'd still prefer an overly pessimistic interval assuming a zero delay. -- Miroslav Lichvar
Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
On Tue, Aug 20, 2019 at 02:29:27PM +0200, Hubert Feurstein wrote: > Am Di., 20. Aug. 2019 um 11:49 Uhr schrieb Miroslav Lichvar > > This is important to not break the estimation of maximum error in the > > measured offset. Applications using the ioctl may assume that the > > maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by > > phc2sys). That would not work if the delay could be occasionally 50 > > microseconds for instance, i.e. the post_ts timestamp would be earlier > > than the PHC timestamp. > > > If the timestamps are taken in the MDIO driver (imx-fec in my case), then > everything is quite deterministic (see results in the cover letter). Of > course, > it still can be improved slightly, by splitting up the writel into iowmb and > write_relaxed and disable the interrupts while capturing the timestamps > (as I did in my original RFC patch). But phc2sys takes by default 5 > measurements > and uses the one with the smallest delay, so this shouldn't be necessary. The delay that phc2sys sees is the difference between post_ts and pre_ts, which doesn't contain the actual delay, right? So, I'm not sure how well the phc2sys filtering actually works. Even if the measured delay was related to the write delay, would be 5 measurements always enough to get a "correct" sample? > Although, by adding 2 * ptp_sts_offset the system timestamp to post_ts > the timestamp is aligned with the PHC timestamp, but this also increases > the delay which is reported by phc2sys (~26us). But the maximum error > which must be expected is definitely much less (< 1us). So maybe it is better > to shift both timestamps pre_ts and post_ts by ptp_sts_offset. If you could guarantee that [pre_ts + ptp_sts_offset, post_ts + ptp_sts_offset] always contains the PHC timestamps, then that would be great. From what Andrew is writing, this doesn't seem to be the case. I'd suggest a different approach: - specify a minimum delay for the write and a minimum delay for the completion (if they are not equal) - take a second "post" system timestamp after the completion - adjust pre_ts and post_ts so that the middle point is equal to what you have now, but the interval contains both pre_ts + min_write_delay and post2_ts - min_completion_delay This way you get the best stability and also a delay that correctly estimates the maximum error. -- Miroslav Lichvar
Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
On Tue, Aug 20, 2019 at 10:48:31AM +0200, Hubert Feurstein wrote: > + /* PTP offset compensation: > + * After the MDIO access is completed (from the chip perspective), the > + * switch chip will snapshot the PHC timestamp. To make sure our system > + * timestamp corresponds to the PHC timestamp, we have to add the > + * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys > + * takes the average of pre_ts and post_ts to calculate the final > + * system timestamp. With this in mind, we have to add ptp_sts_offset > + * twice to post_ts, in order to not introduce an constant time offset. > + */ > + if (sts) > + timespec64_add_ns(>post_ts, 2 * bus->ptp_sts_offset); This correction looks good to me. Is the MDIO write delay constant in reality, or does it at least have an upper bound? That is, is it always true that the post_ts timestamp does not point to a time before the PHC timestamp was actually taken? This is important to not break the estimation of maximum error in the measured offset. Applications using the ioctl may assume that the maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by phc2sys). That would not work if the delay could be occasionally 50 microseconds for instance, i.e. the post_ts timestamp would be earlier than the PHC timestamp. -- Miroslav Lichvar
Re: [RFC] net: dsa: mv88e6xxx: ptp: improve phc2sys precision for mv88e6xxx switch in combination with imx6-fec
On Mon, Aug 05, 2019 at 12:54:49PM +0300, Vladimir Oltean wrote: > - Along the lines of the above, I wonder how badly would the > maintainers shout at the proposal of adding a struct > ptp_system_timestamp pointer and its associated lock in struct device. > That way at least you don't have to change the API, like you did with > mdiobus_write_nested_ptp. Relatively speaking, this is the least > amount of intrusion (although, again, far from beautiful). That would make sense to me, if there are other drivers that could use it. > I also added Miroslav Lichvar, who originally created the > PTP_SYS_OFFSET_EXTENDED ioctl, maybe he can share some ideas on how it > is best served. The idea behind that ioctl was to allow drivers to take the system timestamps as close to the reading of the HW clock as possible, excluding delays (and jitter) due to other operations that are necessary to get that timestamp. In the ethernet drivers that was a single PCI read. If in this case it's a "write" operation that triggers the reading of the HW clock, then I think the patch is using is the ptp_read_system_*ts() calls correctly. > > --- a/drivers/net/ethernet/freescale/fec_main.c > > +++ b/drivers/net/ethernet/freescale/fec_main.c > > @@ -1814,11 +1814,25 @@ static int fec_enet_mdio_write(struct mii_bus *bus, > > int mii_id, int regnum, > > > > reinit_completion(>mdio_done); > > > > - /* start a write op */ > > - writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE | > > - FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) | > > - FEC_MMFR_TA | FEC_MMFR_DATA(value), > > - fep->hwp + FEC_MII_DATA); > > + if (bus->ptp_sts) { > > + unsigned long flags = 0; > > + local_irq_save(flags); > > + __iowmb(); > > + /* >Take the timestamp *after* the memory barrier */ > > + ptp_read_system_prets(bus->ptp_sts); > > + writel_relaxed(FEC_MMFR_ST | FEC_MMFR_OP_WRITE | > > + FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) | > > + FEC_MMFR_TA | FEC_MMFR_DATA(value), > > + fep->hwp + FEC_MII_DATA); > > + ptp_read_system_postts(bus->ptp_sts); > > + local_irq_restore(flags); > > + } else { > > + /* start a write op */ > > + writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE | > > + FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) | > > + FEC_MMFR_TA | FEC_MMFR_DATA(value), > > + fep->hwp + FEC_MII_DATA); > > + } -- Miroslav Lichvar
[tip:timers/core] kselftests: timers: freq-step: Update maximum acceptable precision and errors
Commit-ID: d21e43f2ef32ba3242687dbedb3c4b9a76b3eebc Gitweb: https://git.kernel.org/tip/d21e43f2ef32ba3242687dbedb3c4b9a76b3eebc Author: Miroslav Lichvar AuthorDate: Tue, 18 Jun 2019 18:06:12 +0200 Committer: Thomas Gleixner CommitDate: Sat, 22 Jun 2019 11:28:53 +0200 kselftests: timers: freq-step: Update maximum acceptable precision and errors PTI has a significant impact on precision of the MONOTONIC_RAW clock, which prevents a lot of computers from running the freq-step test. Increase the maximum acceptable precision for the test to not be skipped to 500 nanoseconds. After commit 78b98e3c5a66 ("timekeeping/ntp: Determine the multiplier directly from NTP tick length") the frequency and time errors should be much smaller. Reduce the maximum acceptable values for the test to pass to 0.02 ppm and 50 nanoseconds respectively. Signed-off-by: Miroslav Lichvar Signed-off-by: Thomas Gleixner Cc: John Stultz Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Link: https://lkml.kernel.org/r/20190618160612.21957-1-mlich...@redhat.com --- tools/testing/selftests/timers/freq-step.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/timers/freq-step.c b/tools/testing/selftests/timers/freq-step.c index 8cd10662ffba..4b76450d78d1 100644 --- a/tools/testing/selftests/timers/freq-step.c +++ b/tools/testing/selftests/timers/freq-step.c @@ -21,9 +21,9 @@ #define SAMPLE_READINGS 10 #define MEAN_SAMPLE_INTERVAL 0.1 #define STEP_INTERVAL 1.0 -#define MAX_PRECISION 100e-9 -#define MAX_FREQ_ERROR 10e-6 -#define MAX_STDDEV 1000e-9 +#define MAX_PRECISION 500e-9 +#define MAX_FREQ_ERROR 0.02e-6 +#define MAX_STDDEV 50e-9 #ifndef ADJ_SETOFFSET #define ADJ_SETOFFSET 0x0100
[tip:timers/core] ntp: Limit TAI-UTC offset
Commit-ID: d897a4ab11dc8a9fda50d2eccc081a96a6385998 Gitweb: https://git.kernel.org/tip/d897a4ab11dc8a9fda50d2eccc081a96a6385998 Author: Miroslav Lichvar AuthorDate: Tue, 18 Jun 2019 17:47:13 +0200 Committer: Thomas Gleixner CommitDate: Sat, 22 Jun 2019 11:28:53 +0200 ntp: Limit TAI-UTC offset Don't allow the TAI-UTC offset of the system clock to be set by adjtimex() to a value larger than 10 seconds. This prevents an overflow in the conversion to int, prevents the CLOCK_TAI clock from getting too far ahead of the CLOCK_REALTIME clock, and it is still large enough to allow leap seconds to be inserted at the maximum rate currently supported by the kernel (once per day) for the next ~270 years, however unlikely it is that someone can survive a catastrophic event which slowed down the rotation of the Earth so much. Reported-by: Weikang shi Signed-off-by: Miroslav Lichvar Signed-off-by: Thomas Gleixner Cc: John Stultz Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Link: https://lkml.kernel.org/r/20190618154713.20929-1-mlich...@redhat.com --- kernel/time/ntp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 8de4f789dc1b..65eb796610dc 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -43,6 +43,7 @@ static u64tick_length_base; #define MAX_TICKADJ500LL /* usecs */ #define MAX_TICKADJ_SCALED \ (((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ) +#define MAX_TAI_OFFSET 10 /* * phase-lock loop variables @@ -691,7 +692,8 @@ static inline void process_adjtimex_modes(const struct __kernel_timex *txc, time_constant = max(time_constant, 0l); } - if (txc->modes & ADJ_TAI && txc->constant >= 0) + if (txc->modes & ADJ_TAI && + txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET) *time_tai = txc->constant; if (txc->modes & ADJ_OFFSET)
[PATCH repost] kselftests: timers: freq-step: Update maximum acceptable precision and errors
PTI has a significant impact on precision of the MONOTONIC_RAW clock, which prevents a lot of computers from running the freq-step test. Increase the maximum acceptable precision for the test to not be skipped to 500 nanoseconds. After commit 78b98e3c5a66 ("timekeeping/ntp: Determine the multiplier directly from NTP tick length") the frequency and time errors should be much smaller. Reduce the maximum acceptable values for the test to pass to 0.02 ppm and 50 nanoseconds respectively. Cc: John Stultz Cc: Thomas Gleixner Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Signed-off-by: Miroslav Lichvar --- tools/testing/selftests/timers/freq-step.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/timers/freq-step.c b/tools/testing/selftests/timers/freq-step.c index 8cd10662ffba..4b76450d78d1 100644 --- a/tools/testing/selftests/timers/freq-step.c +++ b/tools/testing/selftests/timers/freq-step.c @@ -21,9 +21,9 @@ #define SAMPLE_READINGS 10 #define MEAN_SAMPLE_INTERVAL 0.1 #define STEP_INTERVAL 1.0 -#define MAX_PRECISION 100e-9 -#define MAX_FREQ_ERROR 10e-6 -#define MAX_STDDEV 1000e-9 +#define MAX_PRECISION 500e-9 +#define MAX_FREQ_ERROR 0.02e-6 +#define MAX_STDDEV 50e-9 #ifndef ADJ_SETOFFSET #define ADJ_SETOFFSET 0x0100 -- 2.17.2
[PATCH] ntp: Limit TAI-UTC offset
Don't allow the TAI-UTC offset of the system clock to be set by adjtimex() to a value larger than 10 seconds. This prevents an overflow in the conversion to int, prevents the CLOCK_TAI clock from getting too far ahead of the CLOCK_REALTIME clock, and it is still large enough to allow leap seconds to be inserted at the maximum rate currently supported by the kernel (once per day) for the next ~270 years, however unlikely it is that someone can survive a catastrophic event which slowed down the rotation of the Earth so much. Cc: John Stultz Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Cc: Thomas Gleixner Signed-off-by: Miroslav Lichvar --- kernel/time/ntp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 8de4f789dc1b..65eb796610dc 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -43,6 +43,7 @@ static u64tick_length_base; #define MAX_TICKADJ500LL /* usecs */ #define MAX_TICKADJ_SCALED \ (((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ) +#define MAX_TAI_OFFSET 10 /* * phase-lock loop variables @@ -691,7 +692,8 @@ static inline void process_adjtimex_modes(const struct __kernel_timex *txc, time_constant = max(time_constant, 0l); } - if (txc->modes & ADJ_TAI && txc->constant >= 0) + if (txc->modes & ADJ_TAI && + txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET) *time_tai = txc->constant; if (txc->modes & ADJ_OFFSET) -- 2.17.2
Re: [PATCH] time: fix a assignment error in ntp module
On Mon, Jun 17, 2019 at 02:14:57PM +0200, Thomas Gleixner wrote: > On Mon, 17 Jun 2019, 维康石 wrote: > > Yes,the >UINT_MAX value can be passed by > > syscall adjtimex->do_adjtimex->__do_adjtimex->process_adjtimex_modes by the > > proper arugments. > > So there is clearly some sanity check missing, but surely not that > type cast. As the offset is saved in an int (and returned via adjtimex() in the tai field), should be the maximum INT_MAX? We probably also want to avoid overflow in the offset on a leap second and the CLOCK_TAI clock itself, so maybe it would make sense to specify a much smaller maximum like 100? Even 1000 should be good enough for near future. Negative values are not allowed anyway. If the Earth's rotation changed significantly (e.g. hitting a very large asteroid), there probably wouldn't be anyone left to care about TAI. -- Miroslav Lichvar
[tip:timers/urgent] ntp: Allow TAI-UTC offset to be set to zero
Commit-ID: fdc6bae940ee9eb869e493990540098b8c0fd6ab Gitweb: https://git.kernel.org/tip/fdc6bae940ee9eb869e493990540098b8c0fd6ab Author: Miroslav Lichvar AuthorDate: Wed, 17 Apr 2019 10:48:33 +0200 Committer: Thomas Gleixner CommitDate: Thu, 9 May 2019 10:46:58 +0200 ntp: Allow TAI-UTC offset to be set to zero The ADJ_TAI adjtimex mode sets the TAI-UTC offset of the system clock. It is typically set by NTP/PTP implementations and it is automatically updated by the kernel on leap seconds. The initial value is zero (which applications may interpret as unknown), but this value cannot be set by adjtimex. This limitation seems to go back to the original "nanokernel" implementation by David Mills. Change the ADJ_TAI check to accept zero as a valid TAI-UTC offset in order to allow setting it back to the initial value. Fixes: 153b5d054ac2 ("ntp: support for TAI") Suggested-by: Ondrej Mosnacek Signed-off-by: Miroslav Lichvar Signed-off-by: Thomas Gleixner Cc: John Stultz Cc: Richard Cochran Cc: Prarit Bhargava Link: https://lkml.kernel.org/r/20190417084833.7401-1-mlich...@redhat.com --- kernel/time/ntp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 92a90014a925..f43d47c8c3b6 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -690,7 +690,7 @@ static inline void process_adjtimex_modes(const struct __kernel_timex *txc, time_constant = max(time_constant, 0l); } - if (txc->modes & ADJ_TAI && txc->constant > 0) + if (txc->modes & ADJ_TAI && txc->constant >= 0) *time_tai = txc->constant; if (txc->modes & ADJ_OFFSET)
Re: [PATCH] ntp: Allow TAI-UTC offset to be set to zero
On Wed, Apr 17, 2019 at 11:00:23AM +0200, Ondrej Mosnacek wrote: > On Wed, Apr 17, 2019 at 10:48 AM Miroslav Lichvar wrote: > > Change the ADJ_TAI check to accept zero as a valid TAI-UTC offset in > > order to allow setting it back to the initial value. > Thanks for sending the patch! Maybe you (or the committer) could > consider adding: > > Fixes: 153b5d054ac2 ("ntp: support for TAI") To me the change looks more like an extension of the API, rather than a bug fix, so I'd leave that up to the committer. -- Miroslav Lichvar
[PATCH] ntp: Allow TAI-UTC offset to be set to zero
The ADJ_TAI adjtimex mode sets the TAI-UTC offset of the system clock. It is typically set by NTP/PTP implementations and it is automatically updated by the kernel on leap seconds. The initial value is zero (which applications may interpret as unknown), but this value cannot be set by adjtimex. This limitation seems to go back to the original "nanokernel" implementation by David Mills. Change the ADJ_TAI check to accept zero as a valid TAI-UTC offset in order to allow setting it back to the initial value. Cc: Thomas Gleixner Cc: John Stultz Cc: Richard Cochran Cc: Prarit Bhargava Suggested-by: Ondrej Mosnacek Signed-off-by: Miroslav Lichvar --- kernel/time/ntp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 92a90014a925..f43d47c8c3b6 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -690,7 +690,7 @@ static inline void process_adjtimex_modes(const struct __kernel_timex *txc, time_constant = max(time_constant, 0l); } - if (txc->modes & ADJ_TAI && txc->constant > 0) + if (txc->modes & ADJ_TAI && txc->constant >= 0) *time_tai = txc->constant; if (txc->modes & ADJ_OFFSET) -- 2.17.2
Re: kernel/time/ntp.c: Possible off-by-one error in TAI range check?
On Mon, Apr 15, 2019 at 10:09:43AM +0200, Thomas Gleixner wrote: > > On Mon, Apr 8, 2019 at 10:47 AM Ondrej Mosnacek wrote: > > > Commit 153b5d054ac2 ("ntp: support for TAI") added a possibility to > > > change the TAI offset from userspace via adjtimex(2). The code checks > > > if the input value (txc->constant) is greater than 0 and if it is not, > > > then it doesn't modify the value. Ignoring the fact that this check > > > should probably be in timekeeping_validate_timex() and cause -EINVAL > > > to be returned when false, I find it strange that the check doesn't > > > allow to set the value to 0, which seems to be the default value... > > > > > > Was this behavior intended or should the code actually check for > > > txc->constant >= 0 instead of txc->constant > 0? I guess zero here means "unknown" and maybe the intention was to not allow setting the offset to an unknown value once it has been set to a valid value. The trouble is that after inserting a leap second the offset may change from zero to one. I think it should be changed to allow setting the offset to zero. -- Miroslav Lichvar
Re: [PATCH] timekeeping: Force upper bound for setting CLOCK_REALTIME
On Sat, Mar 23, 2019 at 11:36:19AM +0100, Thomas Gleixner wrote: > It is reasonable to force an upper bound for the various methods of setting > CLOCK_REALTIME. Year 2262 is the absolute upper bound. Assume a maximum > uptime of 30 years which is plenty enough even for esoteric embedded > systems. That results in an upper bound of year 2232 for setting the time. The patch looks good to me. I like this approach better than using a larger value closer to the overflow (e.g. one week) and stepping the clock back automatically when the clock reaches that time, but I suspect it might possibly break more tests (or any unusual applications messing with time) as a much larger interval is now EINVAL. Thanks, -- Miroslav Lichvar
Re: [RFC PATCH] ntp: Avoid undefined behaviour in second_overflow()
On Tue, Mar 05, 2019 at 05:42:25PM -0800, John Stultz wrote: > > --- a/kernel/time/ntp.c > > +++ b/kernel/time/ntp.c > > @@ -677,6 +677,8 @@ static inline void process_adjtimex_modes(const struct > > timex *txc, s32 *time_tai > > > > if (txc->modes & ADJ_MAXERROR) > > time_maxerror = txc->maxerror; > > + if (time_maxerror > NTP_PHASE_LIMIT) > > + time_maxerror = NTP_PHASE_LIMIT; > > This looks sane to me. > Acked-by: John Stultz > > Though it makes me wonder a bit more about the sanity checking on the > other parameters passed via adjtimex(), tick_usec for instance looks > like it could be similarly problematic. The tick length is checked earlier in timekeeping_validate_timex(), so that should be ok. What I'd like to see clamped is the system time itself. ktime_t overflows on Apr 11 2262. clock_settime() and adjtimex(ADJ_SETOFFSET) can set the time close to the overflow and let everything break. Boot a VM and try this: # date -s 'Apr 11 23:47:15 UTC 2262' There was a patch submitted couple years ago that prevented overflows in 32-bit time_t and ktime_t. http://lkml.iu.edu/hypermail/linux/kernel/1510.0/04719.html -- Miroslav Lichvar
Re: TSC to Mono-raw Drift
On Fri, Nov 02, 2018 at 12:25:56PM +0100, Thomas Gleixner wrote: > On Fri, 2 Nov 2018, Miroslav Lichvar wrote: > > clocksource -> MONOTONIC_RAW -> MONOTONIC/REALTIME > > > > or > > > > clocksource -> ? -> MONOTONIC_RAW > > -> MONOTONIC/REALTIME > > That's what we have now. At least I don't see how the raw thing is coupled > in NTP. Oh, right. So, for the first approach when the raw clock is stepped, the same step needs to be applied to the mono clock. That means the mono clock will have two large independent corrections applied to it, which will make it less stable, e.g. steps of up to 8 ns at 100 Hz instead of 4 ns. Otherwise they would drift from each other when not synchronized by NTP/PTP. In the second approach, the corrections of the two clocks are independent, but the NTP tick length needs to be adjusted before the multiplier is calculated to match the frequency of the raw clock. If the correction was applied later, I suspect it would require a +2 mult adjustment or larger steps as in the first approach. That tick adjustment generally won't be perfectly accurate, so an extra small correction would be needed to keep the clocks in sync over long intervals. This could be a +1 adjustment of the NTP tick length, or it could be a small forward step. I'm not sure what would be easier to implement. Does that make sense? -- Miroslav Lichvar
Re: TSC to Mono-raw Drift
On Fri, Nov 02, 2018 at 12:25:56PM +0100, Thomas Gleixner wrote: > On Fri, 2 Nov 2018, Miroslav Lichvar wrote: > > clocksource -> MONOTONIC_RAW -> MONOTONIC/REALTIME > > > > or > > > > clocksource -> ? -> MONOTONIC_RAW > > -> MONOTONIC/REALTIME > > That's what we have now. At least I don't see how the raw thing is coupled > in NTP. Oh, right. So, for the first approach when the raw clock is stepped, the same step needs to be applied to the mono clock. That means the mono clock will have two large independent corrections applied to it, which will make it less stable, e.g. steps of up to 8 ns at 100 Hz instead of 4 ns. Otherwise they would drift from each other when not synchronized by NTP/PTP. In the second approach, the corrections of the two clocks are independent, but the NTP tick length needs to be adjusted before the multiplier is calculated to match the frequency of the raw clock. If the correction was applied later, I suspect it would require a +2 mult adjustment or larger steps as in the first approach. That tick adjustment generally won't be perfectly accurate, so an extra small correction would be needed to keep the clocks in sync over long intervals. This could be a +1 adjustment of the NTP tick length, or it could be a small forward step. I'm not sure what would be easier to implement. Does that make sense? -- Miroslav Lichvar
Re: TSC to Mono-raw Drift
On Thu, Nov 01, 2018 at 07:03:37PM +0100, Thomas Gleixner wrote: > On Thu, 1 Nov 2018, John Stultz wrote: > > On Thu, Nov 1, 2018 at 10:44 AM, Thomas Gleixner wrote: > > > On Tue, 23 Oct 2018, John Stultz wrote: > > >> However, to be correct, the ntp adjustments made would have to be made > > >> to both the base interval + error, which mucks the math up a fair bit. > > > > > > Hmm, confused as usual. Why would you need to do anything like that? > > > > Because the NTP adjustment is done off of what is now the raw clock. > > If the raw clock is "corrected" the ppb adjustment has to be done off > > of that corrected rate. > > Sure, but why would that require any change? Right now the raw clock is > slightly off and you correct clock monotonic against NTP. So with that > extra correction you just see a slightly different raw clock slew and work > from there. It makes sense to me. I think there are basically two different ways how it could be done. One is to correct the frequency of the raw clock, on which sits the mono/real clock. The other is to create a new raw clock which is separate from the mono/real clock, and add an offset to the NTP frequency to match the frequencies of the two clocks when not synchronized by NTP/PTP. The latter would provide a more stable mono/real clock. clocksource -> MONOTONIC_RAW -> MONOTONIC/REALTIME or clocksource -> ? -> MONOTONIC_RAW -> MONOTONIC/REALTIME -- Miroslav Lichvar
Re: TSC to Mono-raw Drift
On Thu, Nov 01, 2018 at 07:03:37PM +0100, Thomas Gleixner wrote: > On Thu, 1 Nov 2018, John Stultz wrote: > > On Thu, Nov 1, 2018 at 10:44 AM, Thomas Gleixner wrote: > > > On Tue, 23 Oct 2018, John Stultz wrote: > > >> However, to be correct, the ntp adjustments made would have to be made > > >> to both the base interval + error, which mucks the math up a fair bit. > > > > > > Hmm, confused as usual. Why would you need to do anything like that? > > > > Because the NTP adjustment is done off of what is now the raw clock. > > If the raw clock is "corrected" the ppb adjustment has to be done off > > of that corrected rate. > > Sure, but why would that require any change? Right now the raw clock is > slightly off and you correct clock monotonic against NTP. So with that > extra correction you just see a slightly different raw clock slew and work > from there. It makes sense to me. I think there are basically two different ways how it could be done. One is to correct the frequency of the raw clock, on which sits the mono/real clock. The other is to create a new raw clock which is separate from the mono/real clock, and add an offset to the NTP frequency to match the frequencies of the two clocks when not synchronized by NTP/PTP. The latter would provide a more stable mono/real clock. clocksource -> MONOTONIC_RAW -> MONOTONIC/REALTIME or clocksource -> ? -> MONOTONIC_RAW -> MONOTONIC/REALTIME -- Miroslav Lichvar
Re: TSC to Mono-raw Drift
On Thu, Nov 01, 2018 at 06:41:00PM +0100, Thomas Gleixner wrote: > On Wed, 24 Oct 2018, Miroslav Lichvar wrote: > > The error is too large to be corrected by stepping on clock updates. > > For a typical TSC frequency we have multiplier in the range of few > > millions, so that's a frequency error of up to few hundred ppb. In the > > old days when the clock was updated 1000 times per second that would > > be hidden in the resolution of the clock, but now with tickless > > kernels those steps would be very noticeable. > That only happens when the system was completely idle for a second and in > that case it's a non issue because the clock is updated before it's > used. So nothing will be able to observe the time jumping forward by a few > or even a few hundreds of nanoseconds. That's great news (to me). I think we should do the same with the mono/real clock. A periodic 4ns step would be better than a slew correcting tens or hundreds of nanoseconds. This would be a significant improvement in accuracy on idle systems, in theory identical to running with nohz=off. Maybe I am missing some important detail, but I think we can just drop the +1 mult adjustment and step on each update by the (truncated) amount that has accumulated in the NTP error register. With the changes that have been made earlier this year the clock should never be ahead, so the step would always be forward. > For the regular case, where CPUs are > busy and the update happens 100/250/1000 times per second the jump forward > will not be noticable at all. I think a 4ns jump at 100 Hz might be noticeable with a good reference clock and large number of measurements, but so would be the current +1 mult adjustment. -- Miroslav Lichvar
Re: TSC to Mono-raw Drift
On Thu, Nov 01, 2018 at 06:41:00PM +0100, Thomas Gleixner wrote: > On Wed, 24 Oct 2018, Miroslav Lichvar wrote: > > The error is too large to be corrected by stepping on clock updates. > > For a typical TSC frequency we have multiplier in the range of few > > millions, so that's a frequency error of up to few hundred ppb. In the > > old days when the clock was updated 1000 times per second that would > > be hidden in the resolution of the clock, but now with tickless > > kernels those steps would be very noticeable. > That only happens when the system was completely idle for a second and in > that case it's a non issue because the clock is updated before it's > used. So nothing will be able to observe the time jumping forward by a few > or even a few hundreds of nanoseconds. That's great news (to me). I think we should do the same with the mono/real clock. A periodic 4ns step would be better than a slew correcting tens or hundreds of nanoseconds. This would be a significant improvement in accuracy on idle systems, in theory identical to running with nohz=off. Maybe I am missing some important detail, but I think we can just drop the +1 mult adjustment and step on each update by the (truncated) amount that has accumulated in the NTP error register. With the changes that have been made earlier this year the clock should never be ahead, so the step would always be forward. > For the regular case, where CPUs are > busy and the update happens 100/250/1000 times per second the jump forward > will not be noticable at all. I think a 4ns jump at 100 Hz might be noticeable with a good reference clock and large number of measurements, but so would be the current +1 mult adjustment. -- Miroslav Lichvar
Re: TSC to Mono-raw Drift
On Wed, Oct 24, 2018 at 01:32:48PM -0400, Christopher Hall wrote: > On Wed, Oct 24, 2018 at 04:51:13PM +0200, Miroslav Lichvar wrote: > > The error is too large to be corrected by stepping on clock updates. > > For a typical TSC frequency we have multiplier in the range of few > > millions, so that's a frequency error of up to few hundred ppb. In the > > For this example, a i5-6600 CPU @ 3.30GHz I measured 90 ppb. That is the > largest I've seen since measuring this on a few platforms. 20-40 PPB seems > more typical. It looks like there is rounding in the calculation of the multiplier, so for frequencies in the range of 3-5GHz the error will be up to about 149 ppb. Without rounding it would be twice as much, but always fast or slow. > > A better fix might be to modify the calculation of time to use a > > second multiplier, effectively increasing its resolution. However, > > I'm not sure, I'm understanding. Like cascading transforms? While this > would increase the precision, I think it would still drift over days. We > could probably fix up every second though. I was thinking about using a wider multiplier, but avoiding a full 64x64 bit multiplication. For example, in your case with the 3312MHz clock nsec = (cycles * 5065585) >> 24 could be replaced with nsec = (cycles * 5065584) >> 24 + (cycles * 4538763) >> 47 This would take few hours to drift by one nanosecond. If something like this was implemented for the raw clock, it would probably make sense to switch the other clocks too. > > that would slow down all users of the clock. > > Couldn't some clocksources specify an additional multiplier/precision and > others use lower precision? I guess it could, but I'm not sure if people here would be happy with the extra complexity. -- Miroslav Lichvar
Re: TSC to Mono-raw Drift
On Wed, Oct 24, 2018 at 01:32:48PM -0400, Christopher Hall wrote: > On Wed, Oct 24, 2018 at 04:51:13PM +0200, Miroslav Lichvar wrote: > > The error is too large to be corrected by stepping on clock updates. > > For a typical TSC frequency we have multiplier in the range of few > > millions, so that's a frequency error of up to few hundred ppb. In the > > For this example, a i5-6600 CPU @ 3.30GHz I measured 90 ppb. That is the > largest I've seen since measuring this on a few platforms. 20-40 PPB seems > more typical. It looks like there is rounding in the calculation of the multiplier, so for frequencies in the range of 3-5GHz the error will be up to about 149 ppb. Without rounding it would be twice as much, but always fast or slow. > > A better fix might be to modify the calculation of time to use a > > second multiplier, effectively increasing its resolution. However, > > I'm not sure, I'm understanding. Like cascading transforms? While this > would increase the precision, I think it would still drift over days. We > could probably fix up every second though. I was thinking about using a wider multiplier, but avoiding a full 64x64 bit multiplication. For example, in your case with the 3312MHz clock nsec = (cycles * 5065585) >> 24 could be replaced with nsec = (cycles * 5065584) >> 24 + (cycles * 4538763) >> 47 This would take few hours to drift by one nanosecond. If something like this was implemented for the raw clock, it would probably make sense to switch the other clocks too. > > that would slow down all users of the clock. > > Couldn't some clocksources specify an additional multiplier/precision and > others use lower precision? I guess it could, but I'm not sure if people here would be happy with the extra complexity. -- Miroslav Lichvar
Re: TSC to Mono-raw Drift
On Tue, Oct 23, 2018 at 11:31:00AM -0700, John Stultz wrote: > On Fri, Oct 19, 2018 at 3:36 PM, John Stultz wrote: > > On Fri, Oct 19, 2018 at 1:50 PM, Thomas Gleixner wrote: > >> On Fri, 19 Oct 2018, John Stultz wrote: > >>> We might be able to reduce the degree in this case, but I worry the > >>> extra complexity may only cause problems for others. > >> > >> Is it really that complex to add a fixed correction value periodically? The error is too large to be corrected by stepping on clock updates. For a typical TSC frequency we have multiplier in the range of few millions, so that's a frequency error of up to few hundred ppb. In the old days when the clock was updated 1000 times per second that would be hidden in the resolution of the clock, but now with tickless kernels those steps would be very noticeable. If the multiplier was adjusted in the same way as the non-raw clock, there wouldn't be any steps in time, but there would be steps in frequency and the time error would be proportional to the update interval. For a clock updated once per second that's an error of up to a few hundreds of nanoseconds. I agree with John. I think the raw monotonic clock should be stable. It would help if we better understood the use case. If I needed a clock that ticks in an exact proportion to the tsc, why wouldn't I use the tsc directly? Is this about having a fall back in case the tsc cannot be used (e.g. due to unsynchronized CPUs)? If the frequency error was exported, it could be compensated where necessary. Maybe that would work for the original poster? A better fix might be to modify the calculation of time to use a second multiplier, effectively increasing its resolution. However, that would slow down all users of the clock. -- Miroslav Lichvar
Re: TSC to Mono-raw Drift
On Tue, Oct 23, 2018 at 11:31:00AM -0700, John Stultz wrote: > On Fri, Oct 19, 2018 at 3:36 PM, John Stultz wrote: > > On Fri, Oct 19, 2018 at 1:50 PM, Thomas Gleixner wrote: > >> On Fri, 19 Oct 2018, John Stultz wrote: > >>> We might be able to reduce the degree in this case, but I worry the > >>> extra complexity may only cause problems for others. > >> > >> Is it really that complex to add a fixed correction value periodically? The error is too large to be corrected by stepping on clock updates. For a typical TSC frequency we have multiplier in the range of few millions, so that's a frequency error of up to few hundred ppb. In the old days when the clock was updated 1000 times per second that would be hidden in the resolution of the clock, but now with tickless kernels those steps would be very noticeable. If the multiplier was adjusted in the same way as the non-raw clock, there wouldn't be any steps in time, but there would be steps in frequency and the time error would be proportional to the update interval. For a clock updated once per second that's an error of up to a few hundreds of nanoseconds. I agree with John. I think the raw monotonic clock should be stable. It would help if we better understood the use case. If I needed a clock that ticks in an exact proportion to the tsc, why wouldn't I use the tsc directly? Is this about having a fall back in case the tsc cannot be used (e.g. due to unsynchronized CPUs)? If the frequency error was exported, it could be compensated where necessary. Maybe that would work for the original poster? A better fix might be to modify the calculation of time to use a second multiplier, effectively increasing its resolution. However, that would slow down all users of the clock. -- Miroslav Lichvar
Re: [RFC][PATCH v2] selftest: timers: Tweak raw_skew to SKIP when ADJ_OFFSET/other clock adjustments are in progress
On Thu, Jul 05, 2018 at 10:50:31AM -0700, John Stultz wrote: > In the past we've warned when ADJ_OFFSET was in progress, usually > caused by ntpd or some other time adjusting daemon running in non > steady sate, which can cause the skew calculations to be incorrect > > Thus, this patch sets a flag which we check if the clock was being > adjusted when we fail so that we don't cause false negatives. The commit message might need an update as the code no longer sets a flag. Other than that, it looks good to me. Thanks, > --- a/tools/testing/selftests/timers/raw_skew.c > +++ b/tools/testing/selftests/timers/raw_skew.c > @@ -134,6 +134,11 @@ int main(int argv, char **argc) > printf(" %lld.%i(act)", ppm/1000, abs((int)(ppm%1000))); > > if (llabs(eppm - ppm) > 1000) { > + if (tx1.offset || tx2.offset || > + tx1.freq != tx2.freq || tx1.tick != tx2.tick) { > + printf("[SKIP]\n"); > + return ksft_exit_skip("The clock was adjusted > externally. Shutdown NTPd or other time sync daemons\n"); > + } > printf("[FAILED]\n"); > return ksft_exit_fail(); > } > -- > 2.7.4 > -- Miroslav Lichvar
Re: [RFC][PATCH v2] selftest: timers: Tweak raw_skew to SKIP when ADJ_OFFSET/other clock adjustments are in progress
On Thu, Jul 05, 2018 at 10:50:31AM -0700, John Stultz wrote: > In the past we've warned when ADJ_OFFSET was in progress, usually > caused by ntpd or some other time adjusting daemon running in non > steady sate, which can cause the skew calculations to be incorrect > > Thus, this patch sets a flag which we check if the clock was being > adjusted when we fail so that we don't cause false negatives. The commit message might need an update as the code no longer sets a flag. Other than that, it looks good to me. Thanks, > --- a/tools/testing/selftests/timers/raw_skew.c > +++ b/tools/testing/selftests/timers/raw_skew.c > @@ -134,6 +134,11 @@ int main(int argv, char **argc) > printf(" %lld.%i(act)", ppm/1000, abs((int)(ppm%1000))); > > if (llabs(eppm - ppm) > 1000) { > + if (tx1.offset || tx2.offset || > + tx1.freq != tx2.freq || tx1.tick != tx2.tick) { > + printf("[SKIP]\n"); > + return ksft_exit_skip("The clock was adjusted > externally. Shutdown NTPd or other time sync daemons\n"); > + } > printf("[FAILED]\n"); > return ksft_exit_fail(); > } > -- > 2.7.4 > -- Miroslav Lichvar
Re: [RFC][PATCH] selftest: timers: Tweak raw_skew to SKIP when ADJ_OFFSET is in progress
On Tue, Jul 03, 2018 at 12:36:27PM -0700, John Stultz wrote: > In the past we've warned when ADJ_OFFSET was in progress, usually > caused by ntpd or some other time adjusting daemon running in non > steady sate, which can cause the skew calculations to be incorrect Good idea. > Thus, this patch sets a flag which we check when we fail so that > we don't cause false negatives. I'd suggest to check few more things to better detect when something is adjusting the clock. The offset should be zero also at the end of the test and the frequency and tick should be the same as at the beginning of the test. This will detect daemons that set the frequency of the clock directly, not using adjtime() or the PLL. Something like this: @@ -137,9 +135,10 @@ int main(int argv, char **argc) printf(" %lld.%i(act)", ppm/1000, abs((int)(ppm%1000))); if (llabs(eppm - ppm) > 1000) { - if (adj_offset_bad) { + if (tx1.offset || tx2.offset || + tx1.freq != tx2.freq || tx1.tick != tx2.tick) { printf("[SKIP]\n"); - return ksft_exit_skip("ADJ_OFFSET in progress. Shutdown NTPd or other time steering daemons\n"); + return ksft_exit_skip("The clock was adjusted. Shutdown ntpd or other time steering daemons\n"); } printf(" [FAILED]\n"); return ksft_exit_fail(); -- Miroslav Lichvar
Re: [RFC][PATCH] selftest: timers: Tweak raw_skew to SKIP when ADJ_OFFSET is in progress
On Tue, Jul 03, 2018 at 12:36:27PM -0700, John Stultz wrote: > In the past we've warned when ADJ_OFFSET was in progress, usually > caused by ntpd or some other time adjusting daemon running in non > steady sate, which can cause the skew calculations to be incorrect Good idea. > Thus, this patch sets a flag which we check when we fail so that > we don't cause false negatives. I'd suggest to check few more things to better detect when something is adjusting the clock. The offset should be zero also at the end of the test and the frequency and tick should be the same as at the beginning of the test. This will detect daemons that set the frequency of the clock directly, not using adjtime() or the PLL. Something like this: @@ -137,9 +135,10 @@ int main(int argv, char **argc) printf(" %lld.%i(act)", ppm/1000, abs((int)(ppm%1000))); if (llabs(eppm - ppm) > 1000) { - if (adj_offset_bad) { + if (tx1.offset || tx2.offset || + tx1.freq != tx2.freq || tx1.tick != tx2.tick) { printf("[SKIP]\n"); - return ksft_exit_skip("ADJ_OFFSET in progress. Shutdown NTPd or other time steering daemons\n"); + return ksft_exit_skip("The clock was adjusted. Shutdown ntpd or other time steering daemons\n"); } printf(" [FAILED]\n"); return ksft_exit_fail(); -- Miroslav Lichvar
[PATCHv2] timekeeping: Update multiplier when NTP frequency is set directly
When the NTP frequency is set directly from userspace using the ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the timekeeper's multiplier instead of waiting for the next tick. This removes a hidden non-deterministic delay in setting of the frequency and allows an extremely tight control of the system clock with update rates close to or even exceeding the kernel HZ. The update is limited to archs using modern timekeeping (!ARCH_USES_GETTIMEOFFSET). Cc: Thomas Gleixner Cc: John Stultz Cc: Richard Cochran Cc: Prarit Bhargava Signed-off-by: Miroslav Lichvar --- Notes: RFC->v1: - added a new parameter to force the update of the timekeeper to the current NTP tick length only from adjtimex() - added timekeeping_advance() to keep the parameter local to timekeeping.c v1->v2: - replaced bool parameter with enum - changed timekeeping_advance() to not make any updates for TK_ADV_FREQ with !ARCH_USES_GETTIMEOFFSET (an alternative might be to update with zero offset) kernel/time/timekeeping.c | 36 ++-- 1 file changed, 30 insertions(+), 6 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 49cbceef5deb..a08ef3771ab5 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -34,6 +34,14 @@ #define TK_MIRROR (1 << 1) #define TK_CLOCK_WAS_SET (1 << 2) +enum timekeeping_adv_mode { + /* Update timekeeper when a tick has passed */ + TK_ADV_TICK, + + /* Update timekeeper on a direct frequency change */ + TK_ADV_FREQ +}; + /* * The most important data for readout fits into a single 64 byte * cache line. @@ -2021,11 +2029,11 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset, return offset; } -/** - * update_wall_time - Uses the current clocksource to increment the wall time - * +/* + * timekeeping_advance - Updates the timekeeper to the current time and + * current NTP tick length */ -void update_wall_time(void) +static void timekeeping_advance(enum timekeeping_adv_mode mode) { struct timekeeper *real_tk = _core.timekeeper; struct timekeeper *tk = _timekeeper; @@ -2042,14 +2050,17 @@ void update_wall_time(void) #ifdef CONFIG_ARCH_USES_GETTIMEOFFSET offset = real_tk->cycle_interval; + + if (mode != TK_ADV_TICK) + goto out; #else offset = clocksource_delta(tk_clock_read(>tkr_mono), tk->tkr_mono.cycle_last, tk->tkr_mono.mask); -#endif /* Check if there's really nothing to do */ - if (offset < real_tk->cycle_interval) + if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK) goto out; +#endif /* Do some additional sanity checking */ timekeeping_check_update(tk, offset); @@ -2105,6 +2116,15 @@ void update_wall_time(void) clock_was_set_delayed(); } +/** + * update_wall_time - Uses the current clocksource to increment the wall time + * + */ +void update_wall_time(void) +{ + timekeeping_advance(TK_ADV_TICK); +} + /** * getboottime64 - Return the real time of system boot. * @ts:pointer to the timespec64 to be set @@ -2332,6 +2352,10 @@ int do_adjtimex(struct timex *txc) write_seqcount_end(_core.seq); raw_spin_unlock_irqrestore(_lock, flags); + /* Update the multiplier immediately if frequency was set directly */ + if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK)) + timekeeping_advance(TK_ADV_FREQ); + if (tai != orig_tai) clock_was_set(); -- 2.14.3
[PATCHv2] timekeeping: Update multiplier when NTP frequency is set directly
When the NTP frequency is set directly from userspace using the ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the timekeeper's multiplier instead of waiting for the next tick. This removes a hidden non-deterministic delay in setting of the frequency and allows an extremely tight control of the system clock with update rates close to or even exceeding the kernel HZ. The update is limited to archs using modern timekeeping (!ARCH_USES_GETTIMEOFFSET). Cc: Thomas Gleixner Cc: John Stultz Cc: Richard Cochran Cc: Prarit Bhargava Signed-off-by: Miroslav Lichvar --- Notes: RFC->v1: - added a new parameter to force the update of the timekeeper to the current NTP tick length only from adjtimex() - added timekeeping_advance() to keep the parameter local to timekeeping.c v1->v2: - replaced bool parameter with enum - changed timekeeping_advance() to not make any updates for TK_ADV_FREQ with !ARCH_USES_GETTIMEOFFSET (an alternative might be to update with zero offset) kernel/time/timekeeping.c | 36 ++-- 1 file changed, 30 insertions(+), 6 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 49cbceef5deb..a08ef3771ab5 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -34,6 +34,14 @@ #define TK_MIRROR (1 << 1) #define TK_CLOCK_WAS_SET (1 << 2) +enum timekeeping_adv_mode { + /* Update timekeeper when a tick has passed */ + TK_ADV_TICK, + + /* Update timekeeper on a direct frequency change */ + TK_ADV_FREQ +}; + /* * The most important data for readout fits into a single 64 byte * cache line. @@ -2021,11 +2029,11 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset, return offset; } -/** - * update_wall_time - Uses the current clocksource to increment the wall time - * +/* + * timekeeping_advance - Updates the timekeeper to the current time and + * current NTP tick length */ -void update_wall_time(void) +static void timekeeping_advance(enum timekeeping_adv_mode mode) { struct timekeeper *real_tk = _core.timekeeper; struct timekeeper *tk = _timekeeper; @@ -2042,14 +2050,17 @@ void update_wall_time(void) #ifdef CONFIG_ARCH_USES_GETTIMEOFFSET offset = real_tk->cycle_interval; + + if (mode != TK_ADV_TICK) + goto out; #else offset = clocksource_delta(tk_clock_read(>tkr_mono), tk->tkr_mono.cycle_last, tk->tkr_mono.mask); -#endif /* Check if there's really nothing to do */ - if (offset < real_tk->cycle_interval) + if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK) goto out; +#endif /* Do some additional sanity checking */ timekeeping_check_update(tk, offset); @@ -2105,6 +2116,15 @@ void update_wall_time(void) clock_was_set_delayed(); } +/** + * update_wall_time - Uses the current clocksource to increment the wall time + * + */ +void update_wall_time(void) +{ + timekeeping_advance(TK_ADV_TICK); +} + /** * getboottime64 - Return the real time of system boot. * @ts:pointer to the timespec64 to be set @@ -2332,6 +2352,10 @@ int do_adjtimex(struct timex *txc) write_seqcount_end(_core.seq); raw_spin_unlock_irqrestore(_lock, flags); + /* Update the multiplier immediately if frequency was set directly */ + if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK)) + timekeeping_advance(TK_ADV_FREQ); + if (tai != orig_tai) clock_was_set(); -- 2.14.3
Re: [PATCHv1] timekeeping: Update multiplier when NTP frequency is set directly
On Tue, May 29, 2018 at 10:42:05AM -0700, John Stultz wrote: > On Tue, May 29, 2018 at 3:53 AM, Miroslav Lichvar wrote: > > -void update_wall_time(void) > > +static void timekeeping_advance(bool force_update) > > This is kind of a nit, but mind switching out bool for an enum? Using > something like TK_ADV_NORMAL and TK_ADV_FORCE? > > > +void update_wall_time(void) > > +{ > > + timekeeping_advance(false); > > +} > > The enum makes usage like timekeeping_advance(false) a little less > opaque to the reader ("Wait, don't advance? Let me go look at the > function"). > > We got bitten by this earlier when we had the old > "timekeeping_update(tk, true, false, true)" usage. Ok. That make sense. I'll send a v2. Thanks, -- Miroslav Lichvar
Re: [PATCHv1] timekeeping: Update multiplier when NTP frequency is set directly
On Tue, May 29, 2018 at 10:42:05AM -0700, John Stultz wrote: > On Tue, May 29, 2018 at 3:53 AM, Miroslav Lichvar wrote: > > -void update_wall_time(void) > > +static void timekeeping_advance(bool force_update) > > This is kind of a nit, but mind switching out bool for an enum? Using > something like TK_ADV_NORMAL and TK_ADV_FORCE? > > > +void update_wall_time(void) > > +{ > > + timekeeping_advance(false); > > +} > > The enum makes usage like timekeeping_advance(false) a little less > opaque to the reader ("Wait, don't advance? Let me go look at the > function"). > > We got bitten by this earlier when we had the old > "timekeeping_update(tk, true, false, true)" usage. Ok. That make sense. I'll send a v2. Thanks, -- Miroslav Lichvar
[PATCHv1] timekeeping: Update multiplier when NTP frequency is set directly
When the NTP frequency is set directly from userspace using the ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the timekeeper's multiplier instead of waiting for the next tick. This removes a hidden non-deterministic delay in setting of the frequency and allows an extremely tight control of the system clock with update rates close to or even exceeding the kernel HZ. Cc: Thomas Gleixner Cc: John Stultz Cc: Richard Cochran Cc: Prarit Bhargava Signed-off-by: Miroslav Lichvar --- Notes: RFC->v1: - added a new parameter to force the update of the timekeeper to the current NTP tick length only from adjtimex() - added timekeeping_advance() to keep the parameter local to timekeeping.c kernel/time/timekeeping.c | 23 ++- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 49cbceef5deb..5524c07d43e3 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -2021,11 +2021,11 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset, return offset; } -/** - * update_wall_time - Uses the current clocksource to increment the wall time - * +/* + * timekeeping_advance - Updates the timekeeper to the current time and + * current NTP tick length */ -void update_wall_time(void) +static void timekeeping_advance(bool force_update) { struct timekeeper *real_tk = _core.timekeeper; struct timekeeper *tk = _timekeeper; @@ -2048,7 +2048,7 @@ void update_wall_time(void) #endif /* Check if there's really nothing to do */ - if (offset < real_tk->cycle_interval) + if (offset < real_tk->cycle_interval && !force_update) goto out; /* Do some additional sanity checking */ @@ -2105,6 +2105,15 @@ void update_wall_time(void) clock_was_set_delayed(); } +/** + * update_wall_time - Uses the current clocksource to increment the wall time + * + */ +void update_wall_time(void) +{ + timekeeping_advance(false); +} + /** * getboottime64 - Return the real time of system boot. * @ts:pointer to the timespec64 to be set @@ -2332,6 +2341,10 @@ int do_adjtimex(struct timex *txc) write_seqcount_end(_core.seq); raw_spin_unlock_irqrestore(_lock, flags); + /* Update the multiplier immediately if frequency was set directly */ + if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK)) + timekeeping_advance(true); + if (tai != orig_tai) clock_was_set(); -- 2.14.3
[PATCHv1] timekeeping: Update multiplier when NTP frequency is set directly
When the NTP frequency is set directly from userspace using the ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the timekeeper's multiplier instead of waiting for the next tick. This removes a hidden non-deterministic delay in setting of the frequency and allows an extremely tight control of the system clock with update rates close to or even exceeding the kernel HZ. Cc: Thomas Gleixner Cc: John Stultz Cc: Richard Cochran Cc: Prarit Bhargava Signed-off-by: Miroslav Lichvar --- Notes: RFC->v1: - added a new parameter to force the update of the timekeeper to the current NTP tick length only from adjtimex() - added timekeeping_advance() to keep the parameter local to timekeeping.c kernel/time/timekeeping.c | 23 ++- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 49cbceef5deb..5524c07d43e3 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -2021,11 +2021,11 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset, return offset; } -/** - * update_wall_time - Uses the current clocksource to increment the wall time - * +/* + * timekeeping_advance - Updates the timekeeper to the current time and + * current NTP tick length */ -void update_wall_time(void) +static void timekeeping_advance(bool force_update) { struct timekeeper *real_tk = _core.timekeeper; struct timekeeper *tk = _timekeeper; @@ -2048,7 +2048,7 @@ void update_wall_time(void) #endif /* Check if there's really nothing to do */ - if (offset < real_tk->cycle_interval) + if (offset < real_tk->cycle_interval && !force_update) goto out; /* Do some additional sanity checking */ @@ -2105,6 +2105,15 @@ void update_wall_time(void) clock_was_set_delayed(); } +/** + * update_wall_time - Uses the current clocksource to increment the wall time + * + */ +void update_wall_time(void) +{ + timekeeping_advance(false); +} + /** * getboottime64 - Return the real time of system boot. * @ts:pointer to the timespec64 to be set @@ -2332,6 +2341,10 @@ int do_adjtimex(struct timex *txc) write_seqcount_end(_core.seq); raw_spin_unlock_irqrestore(_lock, flags); + /* Update the multiplier immediately if frequency was set directly */ + if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK)) + timekeeping_advance(true); + if (tai != orig_tai) clock_was_set(); -- 2.14.3
Re: [PATCH RFC] timekeeping: Update multiplier when NTP frequency is set directly
On Wed, May 23, 2018 at 11:05:34AM -0700, John Stultz wrote: > On Wed, May 23, 2018 at 4:33 AM, Miroslav Lichvar <mlich...@redhat.com> wrote: > > This removes a hidden non-deterministic delay in setting of the > > frequency and allows an extremely tight control of the system clock > > with update rates close to or even exceeding the kernel HZ. > I feel like we tried this years back, but had to revert it. But its > been awhile. Am I confusing things? IIRC we only talked about doing this. Before the recent changes in timekeeping, namely c2cda2a5 (Don't align NTP frequency adjustments to ticks), it wouldn't make much sense. There would still be a hidden delay, it would just be negative (from the applications point of view). > > - /* Check if there's really nothing to do */ > > - if (offset < real_tk->cycle_interval) > > - goto out; > > - > > Apologies again, as I don't have a lot of context here these days, but > this could mean we end up doing unnecessary work on every > update_wall_time, no? I'm not sure. If I understand it correctly, update_wall_time() is normally not called more than once per tick. Only when an update is late and it happens to include the next tick, the subsequent call might be unnecessary, right? > Would a "force" flag be better to pass to update_wall_time() to only > avoid the short-cut in the non-adjtimex case? Yes, that makes sense. Thanks, -- Miroslav Lichvar
Re: [PATCH RFC] timekeeping: Update multiplier when NTP frequency is set directly
On Wed, May 23, 2018 at 11:05:34AM -0700, John Stultz wrote: > On Wed, May 23, 2018 at 4:33 AM, Miroslav Lichvar wrote: > > This removes a hidden non-deterministic delay in setting of the > > frequency and allows an extremely tight control of the system clock > > with update rates close to or even exceeding the kernel HZ. > I feel like we tried this years back, but had to revert it. But its > been awhile. Am I confusing things? IIRC we only talked about doing this. Before the recent changes in timekeeping, namely c2cda2a5 (Don't align NTP frequency adjustments to ticks), it wouldn't make much sense. There would still be a hidden delay, it would just be negative (from the applications point of view). > > - /* Check if there's really nothing to do */ > > - if (offset < real_tk->cycle_interval) > > - goto out; > > - > > Apologies again, as I don't have a lot of context here these days, but > this could mean we end up doing unnecessary work on every > update_wall_time, no? I'm not sure. If I understand it correctly, update_wall_time() is normally not called more than once per tick. Only when an update is late and it happens to include the next tick, the subsequent call might be unnecessary, right? > Would a "force" flag be better to pass to update_wall_time() to only > avoid the short-cut in the non-adjtimex case? Yes, that makes sense. Thanks, -- Miroslav Lichvar
[PATCH RFC] timekeeping: Update multiplier when NTP frequency is set directly
When the NTP frequency is set directly from userspace using the ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the timekeeper's multiplier instead of waiting for the next tick. This removes a hidden non-deterministic delay in setting of the frequency and allows an extremely tight control of the system clock with update rates close to or even exceeding the kernel HZ. Cc: Thomas Gleixner <t...@linutronix.de> Cc: John Stultz <john.stu...@linaro.org> Cc: Richard Cochran <richardcoch...@gmail.com> Cc: Prarit Bhargava <pra...@redhat.com> Signed-off-by: Miroslav Lichvar <mlich...@redhat.com> --- kernel/time/timekeeping.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 49cbceef5deb..6922dae7317c 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -2047,10 +2047,6 @@ void update_wall_time(void) tk->tkr_mono.cycle_last, tk->tkr_mono.mask); #endif - /* Check if there's really nothing to do */ - if (offset < real_tk->cycle_interval) - goto out; - /* Do some additional sanity checking */ timekeeping_check_update(tk, offset); @@ -2332,6 +2328,10 @@ int do_adjtimex(struct timex *txc) write_seqcount_end(_core.seq); raw_spin_unlock_irqrestore(_lock, flags); + /* Update the multiplier immediately if frequency was set directly */ + if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK)) + update_wall_time(); + if (tai != orig_tai) clock_was_set(); -- 2.14.3
[PATCH RFC] timekeeping: Update multiplier when NTP frequency is set directly
When the NTP frequency is set directly from userspace using the ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the timekeeper's multiplier instead of waiting for the next tick. This removes a hidden non-deterministic delay in setting of the frequency and allows an extremely tight control of the system clock with update rates close to or even exceeding the kernel HZ. Cc: Thomas Gleixner Cc: John Stultz Cc: Richard Cochran Cc: Prarit Bhargava Signed-off-by: Miroslav Lichvar --- kernel/time/timekeeping.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 49cbceef5deb..6922dae7317c 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -2047,10 +2047,6 @@ void update_wall_time(void) tk->tkr_mono.cycle_last, tk->tkr_mono.mask); #endif - /* Check if there's really nothing to do */ - if (offset < real_tk->cycle_interval) - goto out; - /* Do some additional sanity checking */ timekeeping_check_update(tk, offset); @@ -2332,6 +2328,10 @@ int do_adjtimex(struct timex *txc) write_seqcount_end(_core.seq); raw_spin_unlock_irqrestore(_lock, flags); + /* Update the multiplier immediately if frequency was set directly */ + if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK)) + update_wall_time(); + if (tai != orig_tai) clock_was_set(); -- 2.14.3
Re: possible BUG: selftest about raw_skew failed
On Thu, Apr 26, 2018 at 11:28:29PM +0200, Thomas Gleixner wrote: > On Sat, 21 Apr 2018, Jeffrin Thalakkottoor wrote: > > selftests: raw_skew > > > > WARNING: ADJ_OFFSET in progress, this will cause inaccurate results > > Estimating clock drift: 17.910(est) 16.820(act) [FAILED] Was ntpd, systemd-timesyncd, or some other NTP/PTP daemon running shortly before or during the test? The warning indicates that the clock was slewed by adjtime() or adjtimex(), which makes the measurement of the frequency less accurate and the test may fail. Maybe this test and other tests that measure the frequency of the system clock should be modified to SKIP when adjtimex() returns a non-zero offset (or the frequency changes during the test)? -- Miroslav Lichvar
Re: possible BUG: selftest about raw_skew failed
On Thu, Apr 26, 2018 at 11:28:29PM +0200, Thomas Gleixner wrote: > On Sat, 21 Apr 2018, Jeffrin Thalakkottoor wrote: > > selftests: raw_skew > > > > WARNING: ADJ_OFFSET in progress, this will cause inaccurate results > > Estimating clock drift: 17.910(est) 16.820(act) [FAILED] Was ntpd, systemd-timesyncd, or some other NTP/PTP daemon running shortly before or during the test? The warning indicates that the clock was slewed by adjtime() or adjtimex(), which makes the measurement of the frequency less accurate and the test may fail. Maybe this test and other tests that measure the frequency of the system clock should be modified to SKIP when adjtimex() returns a non-zero offset (or the frequency changes during the test)? -- Miroslav Lichvar
[PATCH] kselftests: timers: freq-step: Update maximum acceptable precision and errors
PTI has a significant impact on precision of the MONOTONIC_RAW clock, which prevents a lot of computers from running the freq-step test. Increase the maximum acceptable precision for the test to not be skipped to 500 nanoseconds. After commit 78b98e3c5a66 ("timekeeping/ntp: Determine the multiplier directly from NTP tick length") the frequency and time errors should be much smaller. Reduce the maximum acceptable values for the test to pass to 0.02 ppm and 50 nanoseconds respectively. Cc: John Stultz <john.stu...@linaro.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: Prarit Bhargava <pra...@redhat.com> Cc: Richard Cochran <richardcoch...@gmail.com> Cc: Stephen Boyd <stephen.b...@linaro.org> Signed-off-by: Miroslav Lichvar <mlich...@redhat.com> --- tools/testing/selftests/timers/freq-step.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/timers/freq-step.c b/tools/testing/selftests/timers/freq-step.c index 14a2b77fd012..cd094b49819f 100644 --- a/tools/testing/selftests/timers/freq-step.c +++ b/tools/testing/selftests/timers/freq-step.c @@ -29,9 +29,9 @@ #define SAMPLE_READINGS 10 #define MEAN_SAMPLE_INTERVAL 0.1 #define STEP_INTERVAL 1.0 -#define MAX_PRECISION 100e-9 -#define MAX_FREQ_ERROR 10e-6 -#define MAX_STDDEV 1000e-9 +#define MAX_PRECISION 500e-9 +#define MAX_FREQ_ERROR 0.02e-6 +#define MAX_STDDEV 50e-9 #ifndef ADJ_SETOFFSET #define ADJ_SETOFFSET 0x0100 -- 2.14.3
[PATCH] kselftests: timers: freq-step: Update maximum acceptable precision and errors
PTI has a significant impact on precision of the MONOTONIC_RAW clock, which prevents a lot of computers from running the freq-step test. Increase the maximum acceptable precision for the test to not be skipped to 500 nanoseconds. After commit 78b98e3c5a66 ("timekeeping/ntp: Determine the multiplier directly from NTP tick length") the frequency and time errors should be much smaller. Reduce the maximum acceptable values for the test to pass to 0.02 ppm and 50 nanoseconds respectively. Cc: John Stultz Cc: Thomas Gleixner Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Signed-off-by: Miroslav Lichvar --- tools/testing/selftests/timers/freq-step.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/timers/freq-step.c b/tools/testing/selftests/timers/freq-step.c index 14a2b77fd012..cd094b49819f 100644 --- a/tools/testing/selftests/timers/freq-step.c +++ b/tools/testing/selftests/timers/freq-step.c @@ -29,9 +29,9 @@ #define SAMPLE_READINGS 10 #define MEAN_SAMPLE_INTERVAL 0.1 #define STEP_INTERVAL 1.0 -#define MAX_PRECISION 100e-9 -#define MAX_FREQ_ERROR 10e-6 -#define MAX_STDDEV 1000e-9 +#define MAX_PRECISION 500e-9 +#define MAX_FREQ_ERROR 0.02e-6 +#define MAX_STDDEV 50e-9 #ifndef ADJ_SETOFFSET #define ADJ_SETOFFSET 0x0100 -- 2.14.3
[tip:timers/core] timekeeping/ntp: Determine the multiplier directly from NTP tick length
Commit-ID: 78b98e3c5a66d569a53b8f57b6a698f912794a43 Gitweb: https://git.kernel.org/tip/78b98e3c5a66d569a53b8f57b6a698f912794a43 Author: Miroslav Lichvar <mlich...@redhat.com> AuthorDate: Fri, 9 Mar 2018 10:42:48 -0800 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Sat, 10 Mar 2018 09:12:41 +0100 timekeeping/ntp: Determine the multiplier directly from NTP tick length When the length of the NTP tick changes significantly, e.g. when an NTP/PTP application is correcting the initial offset of the clock, a large value may accumulate in the NTP error before the multiplier converges to the correct value. It may then take a very long time (hours or even days) before the error is corrected. This causes the clock to have an unstable frequency offset, which has a negative impact on the stability of synchronization with precise time sources (e.g. NTP/PTP using hardware timestamping or the PTP KVM clock). Use division to determine the correct multiplier directly from the NTP tick length and replace the iterative approach. This removes the last major source of the NTP error. The only remaining source is now limited resolution of the multiplier, which is corrected by adding 1 to the multiplier when the system clock is behind the NTP time. Signed-off-by: Miroslav Lichvar <mlich...@redhat.com> Signed-off-by: John Stultz <john.stu...@linaro.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Prarit Bhargava <pra...@redhat.com> Cc: Richard Cochran <richardcoch...@gmail.com> Cc: Stephen Boyd <stephen.b...@linaro.org> Cc: Thomas Gleixner <t...@linutronix.de> Link: http://lkml.kernel.org/r/1520620971-9567-3-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar <mi...@kernel.org> --- include/linux/timekeeper_internal.h | 2 + kernel/time/timekeeping.c | 138 2 files changed, 49 insertions(+), 91 deletions(-) diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h index d315c3d6725c..7acb953298a7 100644 --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -117,6 +117,8 @@ struct timekeeper { s64 ntp_error; u32 ntp_error_shift; u32 ntp_err_mult; + /* Flag used to avoid updating NTP twice with same second */ + u32 skip_second_overflow; #ifdef CONFIG_DEBUG_TIMEKEEPING longlast_warning; /* diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index c1a0ac17336e..e11760121cb2 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -332,6 +332,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock) tk->tkr_mono.mult = clock->mult; tk->tkr_raw.mult = clock->mult; tk->ntp_err_mult = 0; + tk->skip_second_overflow = 0; } /* Timekeeper helper functions. */ @@ -1799,20 +1800,19 @@ device_initcall(timekeeping_init_ops); */ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, s64 offset, -bool negative, -int adj_scale) +s32 mult_adj) { s64 interval = tk->cycle_interval; - s32 mult_adj = 1; - if (negative) { - mult_adj = -mult_adj; + if (mult_adj == 0) { + return; + } else if (mult_adj == -1) { interval = -interval; - offset = -offset; + offset = -offset; + } else if (mult_adj != 1) { + interval *= mult_adj; + offset *= mult_adj; } - mult_adj <<= adj_scale; - interval <<= adj_scale; - offset <<= adj_scale; /* * So the following can be confusing. @@ -1873,85 +1873,35 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, } /* - * Calculate the multiplier adjustment needed to match the frequency - * specified by NTP + * Adjust the timekeeper's multiplier to the correct frequency + * and also to reduce the accumulated error value. */ -static __always_inline void timekeeping_freqadjust(struct timekeeper *tk, - s64 offset) +static void timekeeping_adjust(struct timekeeper *tk, s64 offset) { - s64 interval = tk->cycle_interval; - s64 xinterval = tk->xtime_interval; - u32 base = tk->tkr_mono.clock->mult; - u32 max = tk->tkr_mono.clock->maxadj; - u32 cur_adj = tk->tkr_mono.mult; - s64 tick_error; - bool ne
[tip:timers/core] timekeeping/ntp: Determine the multiplier directly from NTP tick length
Commit-ID: 78b98e3c5a66d569a53b8f57b6a698f912794a43 Gitweb: https://git.kernel.org/tip/78b98e3c5a66d569a53b8f57b6a698f912794a43 Author: Miroslav Lichvar AuthorDate: Fri, 9 Mar 2018 10:42:48 -0800 Committer: Ingo Molnar CommitDate: Sat, 10 Mar 2018 09:12:41 +0100 timekeeping/ntp: Determine the multiplier directly from NTP tick length When the length of the NTP tick changes significantly, e.g. when an NTP/PTP application is correcting the initial offset of the clock, a large value may accumulate in the NTP error before the multiplier converges to the correct value. It may then take a very long time (hours or even days) before the error is corrected. This causes the clock to have an unstable frequency offset, which has a negative impact on the stability of synchronization with precise time sources (e.g. NTP/PTP using hardware timestamping or the PTP KVM clock). Use division to determine the correct multiplier directly from the NTP tick length and replace the iterative approach. This removes the last major source of the NTP error. The only remaining source is now limited resolution of the multiplier, which is corrected by adding 1 to the multiplier when the system clock is behind the NTP time. Signed-off-by: Miroslav Lichvar Signed-off-by: John Stultz Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1520620971-9567-3-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar --- include/linux/timekeeper_internal.h | 2 + kernel/time/timekeeping.c | 138 2 files changed, 49 insertions(+), 91 deletions(-) diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h index d315c3d6725c..7acb953298a7 100644 --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -117,6 +117,8 @@ struct timekeeper { s64 ntp_error; u32 ntp_error_shift; u32 ntp_err_mult; + /* Flag used to avoid updating NTP twice with same second */ + u32 skip_second_overflow; #ifdef CONFIG_DEBUG_TIMEKEEPING longlast_warning; /* diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index c1a0ac17336e..e11760121cb2 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -332,6 +332,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock) tk->tkr_mono.mult = clock->mult; tk->tkr_raw.mult = clock->mult; tk->ntp_err_mult = 0; + tk->skip_second_overflow = 0; } /* Timekeeper helper functions. */ @@ -1799,20 +1800,19 @@ device_initcall(timekeeping_init_ops); */ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, s64 offset, -bool negative, -int adj_scale) +s32 mult_adj) { s64 interval = tk->cycle_interval; - s32 mult_adj = 1; - if (negative) { - mult_adj = -mult_adj; + if (mult_adj == 0) { + return; + } else if (mult_adj == -1) { interval = -interval; - offset = -offset; + offset = -offset; + } else if (mult_adj != 1) { + interval *= mult_adj; + offset *= mult_adj; } - mult_adj <<= adj_scale; - interval <<= adj_scale; - offset <<= adj_scale; /* * So the following can be confusing. @@ -1873,85 +1873,35 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, } /* - * Calculate the multiplier adjustment needed to match the frequency - * specified by NTP + * Adjust the timekeeper's multiplier to the correct frequency + * and also to reduce the accumulated error value. */ -static __always_inline void timekeeping_freqadjust(struct timekeeper *tk, - s64 offset) +static void timekeeping_adjust(struct timekeeper *tk, s64 offset) { - s64 interval = tk->cycle_interval; - s64 xinterval = tk->xtime_interval; - u32 base = tk->tkr_mono.clock->mult; - u32 max = tk->tkr_mono.clock->maxadj; - u32 cur_adj = tk->tkr_mono.mult; - s64 tick_error; - bool negative; - u32 adj_scale; - - /* Remove any current error adj from freq calculation */ - if (tk->ntp_err_mult) - xinterval -= tk->cycle_interval; - - tk->ntp_tick = ntp_tick_length(); - - /* Calculate current error per tick */ - tick_error = ntp_tick_length() >>
[tip:timers/core] timekeeping/ntp: Don't align NTP frequency adjustments to ticks
Commit-ID: c2cda2a5bda9f1369c9d1ab54a20571c13cf2743 Gitweb: https://git.kernel.org/tip/c2cda2a5bda9f1369c9d1ab54a20571c13cf2743 Author: Miroslav Lichvar <mlich...@redhat.com> AuthorDate: Fri, 9 Mar 2018 10:42:47 -0800 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Sat, 10 Mar 2018 09:12:41 +0100 timekeeping/ntp: Don't align NTP frequency adjustments to ticks When the timekeeping multiplier is changed, the NTP error is updated to correct the clock for the delay between the tick and the update of the clock. This error is corrected in later updates and the clock appears as if the frequency was changed exactly on the tick. Remove this correction to keep the point where the frequency is effectively changed at the time of the update. This removes a major source of the NTP error. Signed-off-by: Miroslav Lichvar <mlich...@redhat.com> Signed-off-by: John Stultz <john.stu...@linaro.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Prarit Bhargava <pra...@redhat.com> Cc: Richard Cochran <richardcoch...@gmail.com> Cc: Stephen Boyd <stephen.b...@linaro.org> Cc: Thomas Gleixner <t...@linutronix.de> Link: http://lkml.kernel.org/r/1520620971-9567-2-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar <mi...@kernel.org> --- kernel/time/timekeeping.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index cd03317e7b57..c1a0ac17336e 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1860,8 +1860,6 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, * xtime_nsec_2 = xtime_nsec_1 - offset * Which simplfies to: * xtime_nsec -= offset -* -* XXX - TODO: Doc ntp_error calculation. */ if ((mult_adj > 0) && (tk->tkr_mono.mult + mult_adj < mult_adj)) { /* NTP adjustment caused clocksource mult overflow */ @@ -1872,7 +1870,6 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, tk->tkr_mono.mult += mult_adj; tk->xtime_interval += interval; tk->tkr_mono.xtime_nsec -= offset; - tk->ntp_error -= (interval - offset) << tk->ntp_error_shift; } /*
[tip:timers/core] timekeeping/ntp: Don't align NTP frequency adjustments to ticks
Commit-ID: c2cda2a5bda9f1369c9d1ab54a20571c13cf2743 Gitweb: https://git.kernel.org/tip/c2cda2a5bda9f1369c9d1ab54a20571c13cf2743 Author: Miroslav Lichvar AuthorDate: Fri, 9 Mar 2018 10:42:47 -0800 Committer: Ingo Molnar CommitDate: Sat, 10 Mar 2018 09:12:41 +0100 timekeeping/ntp: Don't align NTP frequency adjustments to ticks When the timekeeping multiplier is changed, the NTP error is updated to correct the clock for the delay between the tick and the update of the clock. This error is corrected in later updates and the clock appears as if the frequency was changed exactly on the tick. Remove this correction to keep the point where the frequency is effectively changed at the time of the update. This removes a major source of the NTP error. Signed-off-by: Miroslav Lichvar Signed-off-by: John Stultz Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Prarit Bhargava Cc: Richard Cochran Cc: Stephen Boyd Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1520620971-9567-2-git-send-email-john.stu...@linaro.org Signed-off-by: Ingo Molnar --- kernel/time/timekeeping.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index cd03317e7b57..c1a0ac17336e 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1860,8 +1860,6 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, * xtime_nsec_2 = xtime_nsec_1 - offset * Which simplfies to: * xtime_nsec -= offset -* -* XXX - TODO: Doc ntp_error calculation. */ if ((mult_adj > 0) && (tk->tkr_mono.mult + mult_adj < mult_adj)) { /* NTP adjustment caused clocksource mult overflow */ @@ -1872,7 +1870,6 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, tk->tkr_mono.mult += mult_adj; tk->xtime_interval += interval; tk->tkr_mono.xtime_nsec -= offset; - tk->ntp_error -= (interval - offset) << tk->ntp_error_shift; } /*
[PATCHv2 0/2] Improve stability of system clock
(resending with fixed CC) v1->v2 - rebased to current code - improved commit messages This patch set improves stability and accuracy of the system clock when synchronized to precise time sources like the new PTP KVM clock or NTP/PTP with hardware timestamping. The biggest difference is expected on mostly idle systems running with NOHZ (i.e. having infrequent updates of the clock). The stability is affected by the error which accumulates in the ntp_error register and its correction, which is performed by adjusting the timekeeping multiplier. After removing the old vsyscalls, which added an error due to nanosecond rounding, there are now three sources of the error: - alignment of frequency adjustments to ticks - iterative correction of the multiplier - limited resolution of the multiplier Instead of trying to correct the error faster, which is difficult to do without causing instability due to irregular updates of the clock, the patches remove the first two sources of the error. With the only remaining source the correction logic can be significantly simplified and the frequency of the clock is much more stable and accurate. The freq-step selftest (with PTI disabled to pass the check of the precision of the MONOTONIC_RAW clock) shows a significant improvement: Before: Checking response to frequency step: Step 1st interval 2nd interval FreqDev Max FreqDev Max 40960 +0.131 22 147 +0.136 1 3 [OK] 40960 +0.012 51 347 +0.005 1 3 [OK] 40960 +3.832 25038 171297 -0.169 1 2 [OK] 40960 +2.151 13165 89429 +0.113 1 3 [OK] 40960 -0.095 1 2 -0.095 1 3 [OK] 640 -0.154 1 6 -0.154 1 3 [OK] 640 +0.003 1 6 +0.003 1 3 [OK] 640 -0.131 2 9 -0.131 1 2 [OK] 640 -0.026 19 125 -0.029 1 3 [OK] 640 -0.175 1 3 -0.175 1 3 [OK] 10 +0.001 3 9 +0.000 3 6 [OK] 10 +0.002 7 41 +0.000 2 5 [OK] 10 -0.000 2 5 +0.000 3 5 [OK] 10 +0.001 5 23 -0.000 2 5 [OK] 10 +0.000 2 5 -0.000 6 15 [OK] After: Checking response to frequency step: Step 1st interval 2nd interval FreqDev Max FreqDev Max 40960 +0.000 2 5 -0.000 2 5 [OK] 40960 -0.000 2 5 -0.000 3 8 [OK] 40960 -0.000 3 7 +0.000 3 6 [OK] 40960 +0.000 3 6 -0.000 2 6 [OK] 40960 -0.000 2 5 +0.000 3 7 [OK] 640 +0.001 2 9 -0.000 2 5 [OK] 640 -0.000 3 6 +0.000 2 5 [OK] 640 -0.001 2 4 +0.000 1 3 [OK] 640 -0.000 2 5 +0.000 2 4 [OK] 640 +0.000 3 6 -0.000 3 6 [OK] 10 -0.000 2 5 +0.000 2 5 [OK] 10 -0.000 2 5 +0.000 2 5 [OK] 10 +0.000 2 5 +0.000 2 5 [OK] 10 +0.000 2 5 +0.000 2 5 [OK] 10 +0.000 3 12 +0.000 4 11 [OK] Miroslav Lichvar (2): timekeeping: Don't align frequency adjustments to ticks timekeeping: Determine multiplier directly from NTP tick length include/linux/timekeeper_internal.h | 2 + kernel/time/timekeeping.c | 140 2 files changed, 48 insertions(+), 94 deletions(-) -- 2.9.5
[PATCHv2 0/2] Improve stability of system clock
(resending with fixed CC) v1->v2 - rebased to current code - improved commit messages This patch set improves stability and accuracy of the system clock when synchronized to precise time sources like the new PTP KVM clock or NTP/PTP with hardware timestamping. The biggest difference is expected on mostly idle systems running with NOHZ (i.e. having infrequent updates of the clock). The stability is affected by the error which accumulates in the ntp_error register and its correction, which is performed by adjusting the timekeeping multiplier. After removing the old vsyscalls, which added an error due to nanosecond rounding, there are now three sources of the error: - alignment of frequency adjustments to ticks - iterative correction of the multiplier - limited resolution of the multiplier Instead of trying to correct the error faster, which is difficult to do without causing instability due to irregular updates of the clock, the patches remove the first two sources of the error. With the only remaining source the correction logic can be significantly simplified and the frequency of the clock is much more stable and accurate. The freq-step selftest (with PTI disabled to pass the check of the precision of the MONOTONIC_RAW clock) shows a significant improvement: Before: Checking response to frequency step: Step 1st interval 2nd interval FreqDev Max FreqDev Max 40960 +0.131 22 147 +0.136 1 3 [OK] 40960 +0.012 51 347 +0.005 1 3 [OK] 40960 +3.832 25038 171297 -0.169 1 2 [OK] 40960 +2.151 13165 89429 +0.113 1 3 [OK] 40960 -0.095 1 2 -0.095 1 3 [OK] 640 -0.154 1 6 -0.154 1 3 [OK] 640 +0.003 1 6 +0.003 1 3 [OK] 640 -0.131 2 9 -0.131 1 2 [OK] 640 -0.026 19 125 -0.029 1 3 [OK] 640 -0.175 1 3 -0.175 1 3 [OK] 10 +0.001 3 9 +0.000 3 6 [OK] 10 +0.002 7 41 +0.000 2 5 [OK] 10 -0.000 2 5 +0.000 3 5 [OK] 10 +0.001 5 23 -0.000 2 5 [OK] 10 +0.000 2 5 -0.000 6 15 [OK] After: Checking response to frequency step: Step 1st interval 2nd interval FreqDev Max FreqDev Max 40960 +0.000 2 5 -0.000 2 5 [OK] 40960 -0.000 2 5 -0.000 3 8 [OK] 40960 -0.000 3 7 +0.000 3 6 [OK] 40960 +0.000 3 6 -0.000 2 6 [OK] 40960 -0.000 2 5 +0.000 3 7 [OK] 640 +0.001 2 9 -0.000 2 5 [OK] 640 -0.000 3 6 +0.000 2 5 [OK] 640 -0.001 2 4 +0.000 1 3 [OK] 640 -0.000 2 5 +0.000 2 4 [OK] 640 +0.000 3 6 -0.000 3 6 [OK] 10 -0.000 2 5 +0.000 2 5 [OK] 10 -0.000 2 5 +0.000 2 5 [OK] 10 +0.000 2 5 +0.000 2 5 [OK] 10 +0.000 2 5 +0.000 2 5 [OK] 10 +0.000 3 12 +0.000 4 11 [OK] Miroslav Lichvar (2): timekeeping: Don't align frequency adjustments to ticks timekeeping: Determine multiplier directly from NTP tick length include/linux/timekeeper_internal.h | 2 + kernel/time/timekeeping.c | 140 2 files changed, 48 insertions(+), 94 deletions(-) -- 2.9.5
[PATCHv2 1/2] timekeeping: Don't align frequency adjustments to ticks
When the timekeeping multiplier is changed, the NTP error is updated to correct the clock for the delay between the tick and the update of the clock. This error is corrected in later updates and the clock appears as if the frequency was changed exactly on the tick. Remove this correction to keep the point where the frequency is effectively changed at the time of the update. This removes a major source of the NTP error. Cc: John Stultz <john.stu...@linaro.org> Cc: Prarit Bhargava <pra...@redhat.com> Cc: Richard Cochran <richardcoch...@gmail.com> Signed-off-by: Miroslav Lichvar <mlich...@redhat.com> --- kernel/time/timekeeping.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index cd03317..c1a0ac1 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1860,8 +1860,6 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, * xtime_nsec_2 = xtime_nsec_1 - offset * Which simplfies to: * xtime_nsec -= offset -* -* XXX - TODO: Doc ntp_error calculation. */ if ((mult_adj > 0) && (tk->tkr_mono.mult + mult_adj < mult_adj)) { /* NTP adjustment caused clocksource mult overflow */ @@ -1872,7 +1870,6 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, tk->tkr_mono.mult += mult_adj; tk->xtime_interval += interval; tk->tkr_mono.xtime_nsec -= offset; - tk->ntp_error -= (interval - offset) << tk->ntp_error_shift; } /* -- 2.9.5
[PATCHv2 2/2] timekeeping: Determine multiplier directly from NTP tick length
When the length of the NTP tick changes significantly, e.g. when an NTP/PTP application is correcting the initial offset of the clock, a large value may accumulate in the NTP error before the multiplier converges to the correct value. It may then take a very long time (hours or even days) before the error is corrected. This causes the clock to have an unstable frequency offset, which has a negative impact on the stability of synchronization with precise time sources (e.g. NTP/PTP using hardware timestamping or the PTP KVM clock). Use division to determine the correct multiplier directly from the NTP tick length and replace the iterative approach. This removes the last major source of the NTP error. The only remaining source is now limited resolution of the multiplier, which is corrected by adding 1 to the multiplier when the system clock is behind the NTP time. Cc: John Stultz <john.stu...@linaro.org> Cc: Prarit Bhargava <pra...@redhat.com> Cc: Richard Cochran <richardcoch...@gmail.com> Signed-off-by: Miroslav Lichvar <mlich...@redhat.com> --- include/linux/timekeeper_internal.h | 2 + kernel/time/timekeeping.c | 138 2 files changed, 49 insertions(+), 91 deletions(-) diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h index d315c3d..7acb953 100644 --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -117,6 +117,8 @@ struct timekeeper { s64 ntp_error; u32 ntp_error_shift; u32 ntp_err_mult; + /* Flag used to avoid updating NTP twice with same second */ + u32 skip_second_overflow; #ifdef CONFIG_DEBUG_TIMEKEEPING longlast_warning; /* diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index c1a0ac1..e117601 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -332,6 +332,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock) tk->tkr_mono.mult = clock->mult; tk->tkr_raw.mult = clock->mult; tk->ntp_err_mult = 0; + tk->skip_second_overflow = 0; } /* Timekeeper helper functions. */ @@ -1799,20 +1800,19 @@ device_initcall(timekeeping_init_ops); */ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, s64 offset, -bool negative, -int adj_scale) +s32 mult_adj) { s64 interval = tk->cycle_interval; - s32 mult_adj = 1; - if (negative) { - mult_adj = -mult_adj; + if (mult_adj == 0) { + return; + } else if (mult_adj == -1) { interval = -interval; - offset = -offset; + offset = -offset; + } else if (mult_adj != 1) { + interval *= mult_adj; + offset *= mult_adj; } - mult_adj <<= adj_scale; - interval <<= adj_scale; - offset <<= adj_scale; /* * So the following can be confusing. @@ -1873,85 +1873,35 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, } /* - * Calculate the multiplier adjustment needed to match the frequency - * specified by NTP + * Adjust the timekeeper's multiplier to the correct frequency + * and also to reduce the accumulated error value. */ -static __always_inline void timekeeping_freqadjust(struct timekeeper *tk, - s64 offset) +static void timekeeping_adjust(struct timekeeper *tk, s64 offset) { - s64 interval = tk->cycle_interval; - s64 xinterval = tk->xtime_interval; - u32 base = tk->tkr_mono.clock->mult; - u32 max = tk->tkr_mono.clock->maxadj; - u32 cur_adj = tk->tkr_mono.mult; - s64 tick_error; - bool negative; - u32 adj_scale; - - /* Remove any current error adj from freq calculation */ - if (tk->ntp_err_mult) - xinterval -= tk->cycle_interval; - - tk->ntp_tick = ntp_tick_length(); - - /* Calculate current error per tick */ - tick_error = ntp_tick_length() >> tk->ntp_error_shift; - tick_error -= (xinterval + tk->xtime_remainder); - - /* Don't worry about correcting it if its small */ - if (likely((tick_error >= 0) && (tick_error <= interval))) - return; - - /* preserve the direction of correction */ - negative = (tick_error < 0); + u32 mult; - /* If any adjustment would pass the max, just return */ - if (negative && (cu
[PATCHv2 1/2] timekeeping: Don't align frequency adjustments to ticks
When the timekeeping multiplier is changed, the NTP error is updated to correct the clock for the delay between the tick and the update of the clock. This error is corrected in later updates and the clock appears as if the frequency was changed exactly on the tick. Remove this correction to keep the point where the frequency is effectively changed at the time of the update. This removes a major source of the NTP error. Cc: John Stultz Cc: Prarit Bhargava Cc: Richard Cochran Signed-off-by: Miroslav Lichvar --- kernel/time/timekeeping.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index cd03317..c1a0ac1 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1860,8 +1860,6 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, * xtime_nsec_2 = xtime_nsec_1 - offset * Which simplfies to: * xtime_nsec -= offset -* -* XXX - TODO: Doc ntp_error calculation. */ if ((mult_adj > 0) && (tk->tkr_mono.mult + mult_adj < mult_adj)) { /* NTP adjustment caused clocksource mult overflow */ @@ -1872,7 +1870,6 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, tk->tkr_mono.mult += mult_adj; tk->xtime_interval += interval; tk->tkr_mono.xtime_nsec -= offset; - tk->ntp_error -= (interval - offset) << tk->ntp_error_shift; } /* -- 2.9.5
[PATCHv2 2/2] timekeeping: Determine multiplier directly from NTP tick length
When the length of the NTP tick changes significantly, e.g. when an NTP/PTP application is correcting the initial offset of the clock, a large value may accumulate in the NTP error before the multiplier converges to the correct value. It may then take a very long time (hours or even days) before the error is corrected. This causes the clock to have an unstable frequency offset, which has a negative impact on the stability of synchronization with precise time sources (e.g. NTP/PTP using hardware timestamping or the PTP KVM clock). Use division to determine the correct multiplier directly from the NTP tick length and replace the iterative approach. This removes the last major source of the NTP error. The only remaining source is now limited resolution of the multiplier, which is corrected by adding 1 to the multiplier when the system clock is behind the NTP time. Cc: John Stultz Cc: Prarit Bhargava Cc: Richard Cochran Signed-off-by: Miroslav Lichvar --- include/linux/timekeeper_internal.h | 2 + kernel/time/timekeeping.c | 138 2 files changed, 49 insertions(+), 91 deletions(-) diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h index d315c3d..7acb953 100644 --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -117,6 +117,8 @@ struct timekeeper { s64 ntp_error; u32 ntp_error_shift; u32 ntp_err_mult; + /* Flag used to avoid updating NTP twice with same second */ + u32 skip_second_overflow; #ifdef CONFIG_DEBUG_TIMEKEEPING longlast_warning; /* diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index c1a0ac1..e117601 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -332,6 +332,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock) tk->tkr_mono.mult = clock->mult; tk->tkr_raw.mult = clock->mult; tk->ntp_err_mult = 0; + tk->skip_second_overflow = 0; } /* Timekeeper helper functions. */ @@ -1799,20 +1800,19 @@ device_initcall(timekeeping_init_ops); */ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, s64 offset, -bool negative, -int adj_scale) +s32 mult_adj) { s64 interval = tk->cycle_interval; - s32 mult_adj = 1; - if (negative) { - mult_adj = -mult_adj; + if (mult_adj == 0) { + return; + } else if (mult_adj == -1) { interval = -interval; - offset = -offset; + offset = -offset; + } else if (mult_adj != 1) { + interval *= mult_adj; + offset *= mult_adj; } - mult_adj <<= adj_scale; - interval <<= adj_scale; - offset <<= adj_scale; /* * So the following can be confusing. @@ -1873,85 +1873,35 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, } /* - * Calculate the multiplier adjustment needed to match the frequency - * specified by NTP + * Adjust the timekeeper's multiplier to the correct frequency + * and also to reduce the accumulated error value. */ -static __always_inline void timekeeping_freqadjust(struct timekeeper *tk, - s64 offset) +static void timekeeping_adjust(struct timekeeper *tk, s64 offset) { - s64 interval = tk->cycle_interval; - s64 xinterval = tk->xtime_interval; - u32 base = tk->tkr_mono.clock->mult; - u32 max = tk->tkr_mono.clock->maxadj; - u32 cur_adj = tk->tkr_mono.mult; - s64 tick_error; - bool negative; - u32 adj_scale; - - /* Remove any current error adj from freq calculation */ - if (tk->ntp_err_mult) - xinterval -= tk->cycle_interval; - - tk->ntp_tick = ntp_tick_length(); - - /* Calculate current error per tick */ - tick_error = ntp_tick_length() >> tk->ntp_error_shift; - tick_error -= (xinterval + tk->xtime_remainder); - - /* Don't worry about correcting it if its small */ - if (likely((tick_error >= 0) && (tick_error <= interval))) - return; - - /* preserve the direction of correction */ - negative = (tick_error < 0); + u32 mult; - /* If any adjustment would pass the max, just return */ - if (negative && (cur_adj - 1) <= (base - max)) - return; - if (!negative && (cur_adj + 1) >= (base + m
[PATCHv2 0/2] Improve stability of system clock
v1->v2 - rebased to current code - improved commit messages This patch set improves stability and accuracy of the system clock when synchronized to precise time sources like the new PTP KVM clock or NTP/PTP with hardware timestamping. The biggest difference is expected on mostly idle systems running with NOHZ (i.e. having infrequent updates of the clock). The stability is affected by the error which accumulates in the ntp_error register and its correction, which is performed by adjusting the timekeeping multiplier. After removing the old vsyscalls, which added an error due to nanosecond rounding, there are now three sources of the error: - alignment of frequency adjustments to ticks - iterative correction of the multiplier - limited resolution of the multiplier Instead of trying to correct the error faster, which is difficult to do without causing instability due to irregular updates of the clock, the patches remove the first two sources of the error. With the only remaining source the correction logic can be significantly simplified and the frequency of the clock is much more stable and accurate. The freq-step selftest (with PTI disabled to pass the check of the precision of the MONOTONIC_RAW clock) shows a significant improvement: Before: Checking response to frequency step: Step 1st interval 2nd interval FreqDev Max FreqDev Max 40960 +0.131 22 147 +0.136 1 3 [OK] 40960 +0.012 51 347 +0.005 1 3 [OK] 40960 +3.832 25038 171297 -0.169 1 2 [OK] 40960 +2.151 13165 89429 +0.113 1 3 [OK] 40960 -0.095 1 2 -0.095 1 3 [OK] 640 -0.154 1 6 -0.154 1 3 [OK] 640 +0.003 1 6 +0.003 1 3 [OK] 640 -0.131 2 9 -0.131 1 2 [OK] 640 -0.026 19 125 -0.029 1 3 [OK] 640 -0.175 1 3 -0.175 1 3 [OK] 10 +0.001 3 9 +0.000 3 6 [OK] 10 +0.002 7 41 +0.000 2 5 [OK] 10 -0.000 2 5 +0.000 3 5 [OK] 10 +0.001 5 23 -0.000 2 5 [OK] 10 +0.000 2 5 -0.000 6 15 [OK] After: Checking response to frequency step: Step 1st interval 2nd interval FreqDev Max FreqDev Max 40960 +0.000 2 5 -0.000 2 5 [OK] 40960 -0.000 2 5 -0.000 3 8 [OK] 40960 -0.000 3 7 +0.000 3 6 [OK] 40960 +0.000 3 6 -0.000 2 6 [OK] 40960 -0.000 2 5 +0.000 3 7 [OK] 640 +0.001 2 9 -0.000 2 5 [OK] 640 -0.000 3 6 +0.000 2 5 [OK] 640 -0.001 2 4 +0.000 1 3 [OK] 640 -0.000 2 5 +0.000 2 4 [OK] 640 +0.000 3 6 -0.000 3 6 [OK] 10 -0.000 2 5 +0.000 2 5 [OK] 10 -0.000 2 5 +0.000 2 5 [OK] 10 +0.000 2 5 +0.000 2 5 [OK] 10 +0.000 2 5 +0.000 2 5 [OK] 10 +0.000 3 12 +0.000 4 11 [OK] Miroslav Lichvar (2): timekeeping: Don't align frequency adjustments to ticks timekeeping: Determine multiplier directly from NTP tick length include/linux/timekeeper_internal.h | 2 + kernel/time/timekeeping.c | 140 2 files changed, 48 insertions(+), 94 deletions(-) -- 2.9.5
[PATCHv2 0/2] Improve stability of system clock
v1->v2 - rebased to current code - improved commit messages This patch set improves stability and accuracy of the system clock when synchronized to precise time sources like the new PTP KVM clock or NTP/PTP with hardware timestamping. The biggest difference is expected on mostly idle systems running with NOHZ (i.e. having infrequent updates of the clock). The stability is affected by the error which accumulates in the ntp_error register and its correction, which is performed by adjusting the timekeeping multiplier. After removing the old vsyscalls, which added an error due to nanosecond rounding, there are now three sources of the error: - alignment of frequency adjustments to ticks - iterative correction of the multiplier - limited resolution of the multiplier Instead of trying to correct the error faster, which is difficult to do without causing instability due to irregular updates of the clock, the patches remove the first two sources of the error. With the only remaining source the correction logic can be significantly simplified and the frequency of the clock is much more stable and accurate. The freq-step selftest (with PTI disabled to pass the check of the precision of the MONOTONIC_RAW clock) shows a significant improvement: Before: Checking response to frequency step: Step 1st interval 2nd interval FreqDev Max FreqDev Max 40960 +0.131 22 147 +0.136 1 3 [OK] 40960 +0.012 51 347 +0.005 1 3 [OK] 40960 +3.832 25038 171297 -0.169 1 2 [OK] 40960 +2.151 13165 89429 +0.113 1 3 [OK] 40960 -0.095 1 2 -0.095 1 3 [OK] 640 -0.154 1 6 -0.154 1 3 [OK] 640 +0.003 1 6 +0.003 1 3 [OK] 640 -0.131 2 9 -0.131 1 2 [OK] 640 -0.026 19 125 -0.029 1 3 [OK] 640 -0.175 1 3 -0.175 1 3 [OK] 10 +0.001 3 9 +0.000 3 6 [OK] 10 +0.002 7 41 +0.000 2 5 [OK] 10 -0.000 2 5 +0.000 3 5 [OK] 10 +0.001 5 23 -0.000 2 5 [OK] 10 +0.000 2 5 -0.000 6 15 [OK] After: Checking response to frequency step: Step 1st interval 2nd interval FreqDev Max FreqDev Max 40960 +0.000 2 5 -0.000 2 5 [OK] 40960 -0.000 2 5 -0.000 3 8 [OK] 40960 -0.000 3 7 +0.000 3 6 [OK] 40960 +0.000 3 6 -0.000 2 6 [OK] 40960 -0.000 2 5 +0.000 3 7 [OK] 640 +0.001 2 9 -0.000 2 5 [OK] 640 -0.000 3 6 +0.000 2 5 [OK] 640 -0.001 2 4 +0.000 1 3 [OK] 640 -0.000 2 5 +0.000 2 4 [OK] 640 +0.000 3 6 -0.000 3 6 [OK] 10 -0.000 2 5 +0.000 2 5 [OK] 10 -0.000 2 5 +0.000 2 5 [OK] 10 +0.000 2 5 +0.000 2 5 [OK] 10 +0.000 2 5 +0.000 2 5 [OK] 10 +0.000 3 12 +0.000 4 11 [OK] Miroslav Lichvar (2): timekeeping: Don't align frequency adjustments to ticks timekeeping: Determine multiplier directly from NTP tick length include/linux/timekeeper_internal.h | 2 + kernel/time/timekeeping.c | 140 2 files changed, 48 insertions(+), 94 deletions(-) -- 2.9.5
[tip:timers/urgent] timekeeping: Remove CONFIG_GENERIC_TIME_VSYSCALL_OLD
Commit-ID: aea3706cfc4d952ed6d32b6d5845b5ecd99ed7f5 Gitweb: https://git.kernel.org/tip/aea3706cfc4d952ed6d32b6d5845b5ecd99ed7f5 Author: Miroslav Lichvar <mlich...@redhat.com> AuthorDate: Mon, 13 Nov 2017 14:51:31 -0800 Committer: Thomas Gleixner <t...@linutronix.de> CommitDate: Tue, 14 Nov 2017 11:20:25 +0100 timekeeping: Remove CONFIG_GENERIC_TIME_VSYSCALL_OLD As of d4d1fc61eb38f (ia64: Update fsyscall gettime to use modern vsyscall_update)the last user of CONFIG_GENERIC_TIME_VSYSCALL_OLD have been updated, the legacy support for old-style vsyscall implementations can be removed from the timekeeping code. (Thanks again to Tony Luck for helping remove the last user!) [jstultz: Commit message rework] Signed-off-by: Miroslav Lichvar <mlich...@redhat.com> Signed-off-by: John Stultz <john.stu...@linaro.org> Signed-off-by: Thomas Gleixner <t...@linutronix.de> Cc: Prarit Bhargava <pra...@redhat.com> Cc: Tony Luck <tony.l...@intel.com> Cc: Richard Cochran <richardcoch...@gmail.com> Cc: Stephen Boyd <stephen.b...@linaro.org> Link: https://lkml.kernel.org/r/1510613491-16695-1-git-send-email-john.stu...@linaro.org --- include/linux/timekeeper_internal.h | 7 -- kernel/time/Kconfig | 4 kernel/time/timekeeping.c | 45 - 3 files changed, 56 deletions(-) diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h index 7e90111..d315c3d 100644 --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -136,13 +136,6 @@ struct timekeeper { extern void update_vsyscall(struct timekeeper *tk); extern void update_vsyscall_tz(void); -#elif defined(CONFIG_GENERIC_TIME_VSYSCALL_OLD) - -extern void update_vsyscall_old(struct timespec *ts, struct timespec *wtm, - struct clocksource *c, u32 mult, - u64 cycle_last); -extern void update_vsyscall_tz(void); - #else static inline void update_vsyscall(struct timekeeper *tk) diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig index d689a95..e776fc8 100644 --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig @@ -21,10 +21,6 @@ config CLOCKSOURCE_VALIDATE_LAST_CYCLE config GENERIC_TIME_VSYSCALL bool -# Timekeeping vsyscall support -config GENERIC_TIME_VSYSCALL_OLD - bool - # Old style timekeeping config ARCH_USES_GETTIMEOFFSET bool diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 198afa7..cd03317 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -557,45 +557,6 @@ static void halt_fast_timekeeper(struct timekeeper *tk) update_fast_timekeeper(_dummy, _fast_raw); } -#ifdef CONFIG_GENERIC_TIME_VSYSCALL_OLD -#warning Please contact your maintainers, as GENERIC_TIME_VSYSCALL_OLD compatibity will disappear soon. - -static inline void update_vsyscall(struct timekeeper *tk) -{ - struct timespec xt, wm; - - xt = timespec64_to_timespec(tk_xtime(tk)); - wm = timespec64_to_timespec(tk->wall_to_monotonic); - update_vsyscall_old(, , tk->tkr_mono.clock, tk->tkr_mono.mult, - tk->tkr_mono.cycle_last); -} - -static inline void old_vsyscall_fixup(struct timekeeper *tk) -{ - s64 remainder; - - /* - * Store only full nanoseconds into xtime_nsec after rounding - * it up and add the remainder to the error difference. - * XXX - This is necessary to avoid small 1ns inconsistnecies caused - * by truncating the remainder in vsyscalls. However, it causes - * additional work to be done in timekeeping_adjust(). Once - * the vsyscall implementations are converted to use xtime_nsec - * (shifted nanoseconds), and CONFIG_GENERIC_TIME_VSYSCALL_OLD - * users are removed, this can be killed. - */ - remainder = tk->tkr_mono.xtime_nsec & ((1ULL << tk->tkr_mono.shift) - 1); - if (remainder != 0) { - tk->tkr_mono.xtime_nsec -= remainder; - tk->tkr_mono.xtime_nsec += 1ULL << tk->tkr_mono.shift; - tk->ntp_error += remainder << tk->ntp_error_shift; - tk->ntp_error -= (1ULL << tk->tkr_mono.shift) << tk->ntp_error_shift; - } -} -#else -#define old_vsyscall_fixup(tk) -#endif - static RAW_NOTIFIER_HEAD(pvclock_gtod_chain); static void update_pvclock_gtod(struct timekeeper *tk, bool was_set) @@ -2164,12 +2125,6 @@ void update_wall_time(void) timekeeping_adjust(tk, offset); /* -* XXX This can be killed once everyone converts -* to the new update_vsyscall. -*/ - old_vsyscall_fixup(tk); - - /* * Finally, make sure that after the rounding * xtime_nsec isn't larger than NSEC_PER_SEC */
[tip:timers/urgent] timekeeping: Remove CONFIG_GENERIC_TIME_VSYSCALL_OLD
Commit-ID: aea3706cfc4d952ed6d32b6d5845b5ecd99ed7f5 Gitweb: https://git.kernel.org/tip/aea3706cfc4d952ed6d32b6d5845b5ecd99ed7f5 Author: Miroslav Lichvar AuthorDate: Mon, 13 Nov 2017 14:51:31 -0800 Committer: Thomas Gleixner CommitDate: Tue, 14 Nov 2017 11:20:25 +0100 timekeeping: Remove CONFIG_GENERIC_TIME_VSYSCALL_OLD As of d4d1fc61eb38f (ia64: Update fsyscall gettime to use modern vsyscall_update)the last user of CONFIG_GENERIC_TIME_VSYSCALL_OLD have been updated, the legacy support for old-style vsyscall implementations can be removed from the timekeeping code. (Thanks again to Tony Luck for helping remove the last user!) [jstultz: Commit message rework] Signed-off-by: Miroslav Lichvar Signed-off-by: John Stultz Signed-off-by: Thomas Gleixner Cc: Prarit Bhargava Cc: Tony Luck Cc: Richard Cochran Cc: Stephen Boyd Link: https://lkml.kernel.org/r/1510613491-16695-1-git-send-email-john.stu...@linaro.org --- include/linux/timekeeper_internal.h | 7 -- kernel/time/Kconfig | 4 kernel/time/timekeeping.c | 45 - 3 files changed, 56 deletions(-) diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h index 7e90111..d315c3d 100644 --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -136,13 +136,6 @@ struct timekeeper { extern void update_vsyscall(struct timekeeper *tk); extern void update_vsyscall_tz(void); -#elif defined(CONFIG_GENERIC_TIME_VSYSCALL_OLD) - -extern void update_vsyscall_old(struct timespec *ts, struct timespec *wtm, - struct clocksource *c, u32 mult, - u64 cycle_last); -extern void update_vsyscall_tz(void); - #else static inline void update_vsyscall(struct timekeeper *tk) diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig index d689a95..e776fc8 100644 --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig @@ -21,10 +21,6 @@ config CLOCKSOURCE_VALIDATE_LAST_CYCLE config GENERIC_TIME_VSYSCALL bool -# Timekeeping vsyscall support -config GENERIC_TIME_VSYSCALL_OLD - bool - # Old style timekeeping config ARCH_USES_GETTIMEOFFSET bool diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 198afa7..cd03317 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -557,45 +557,6 @@ static void halt_fast_timekeeper(struct timekeeper *tk) update_fast_timekeeper(_dummy, _fast_raw); } -#ifdef CONFIG_GENERIC_TIME_VSYSCALL_OLD -#warning Please contact your maintainers, as GENERIC_TIME_VSYSCALL_OLD compatibity will disappear soon. - -static inline void update_vsyscall(struct timekeeper *tk) -{ - struct timespec xt, wm; - - xt = timespec64_to_timespec(tk_xtime(tk)); - wm = timespec64_to_timespec(tk->wall_to_monotonic); - update_vsyscall_old(, , tk->tkr_mono.clock, tk->tkr_mono.mult, - tk->tkr_mono.cycle_last); -} - -static inline void old_vsyscall_fixup(struct timekeeper *tk) -{ - s64 remainder; - - /* - * Store only full nanoseconds into xtime_nsec after rounding - * it up and add the remainder to the error difference. - * XXX - This is necessary to avoid small 1ns inconsistnecies caused - * by truncating the remainder in vsyscalls. However, it causes - * additional work to be done in timekeeping_adjust(). Once - * the vsyscall implementations are converted to use xtime_nsec - * (shifted nanoseconds), and CONFIG_GENERIC_TIME_VSYSCALL_OLD - * users are removed, this can be killed. - */ - remainder = tk->tkr_mono.xtime_nsec & ((1ULL << tk->tkr_mono.shift) - 1); - if (remainder != 0) { - tk->tkr_mono.xtime_nsec -= remainder; - tk->tkr_mono.xtime_nsec += 1ULL << tk->tkr_mono.shift; - tk->ntp_error += remainder << tk->ntp_error_shift; - tk->ntp_error -= (1ULL << tk->tkr_mono.shift) << tk->ntp_error_shift; - } -} -#else -#define old_vsyscall_fixup(tk) -#endif - static RAW_NOTIFIER_HEAD(pvclock_gtod_chain); static void update_pvclock_gtod(struct timekeeper *tk, bool was_set) @@ -2164,12 +2125,6 @@ void update_wall_time(void) timekeeping_adjust(tk, offset); /* -* XXX This can be killed once everyone converts -* to the new update_vsyscall. -*/ - old_vsyscall_fixup(tk); - - /* * Finally, make sure that after the rounding * xtime_nsec isn't larger than NSEC_PER_SEC */
Re: Extreme time jitter with suspend/resume cycles
On Wed, Oct 04, 2017 at 05:16:31PM -0700, John Stultz wrote: > On Wed, Oct 4, 2017 at 9:11 AM, Gabriel Beddingfield <g...@nestlabs.com> > wrote: > > We found that the problem is an interaction between the NTP code and > > what I call the "delta_delta hack." (see [1] and [2]) This code > > allocates a static variable in a function that contains an offset from > > the system time to the persistent/rtc clock. It uses that time to > > fudge the suspend timestamp so that on resume the sleep time will be > > compensated. It's kind of a statistical hack that assumes things will > > average out. It seems to have two main assumptions: > > > > 1. The persistent/rtc clock has only single-second precision > > 2. The system does not frequently suspend/resume. > > 3. If delta_delta is less than 2 seconds, these assumptions are "true" > > > > Because the delta_delta hack is trying to maintain an offset from the > > system time to the persistent/rtc clock, any minor NTP corrections > > that have occurred since the last suspend will be discarded. However, > > the NTP subsystem isn't notified that this is happening -- and so it > > causes some level of instability in its PLL logic. This is interesting. What polling interval was ntpd using? If I understand it correctly, with a high-resolution persistent clock the delta-delta compensation should be very small and shouldn't disrupt ntpd. Does this instability disappear when ntpd is not controlling the clock (i.e. "disable ntp" in ntp.conf)? > We should also figure out how to best handle ntpd in userspace dealing > with frequent suspend/resume cycles. This is problematic, as the > closest analogy is trying driving on the road while frequently falling > asleep. This is not something I think ntpd handles well. I suspect > our options are that any ntp adjustments being made might be made for > far too long (causing potentially massive over-correction) or not at > all, and not at all seems slightly better in my mind. Yeah, controlling the clock in such conditions will be difficult. The kernel/ntp PLL requires periodic updates. There is some code in ntp_update_offset() that reduces the frequency adjustment when PLL updates are missing, but I'm not actually sure if it works correctly with suspend. -- Miroslav Lichvar
Re: Extreme time jitter with suspend/resume cycles
On Wed, Oct 04, 2017 at 05:16:31PM -0700, John Stultz wrote: > On Wed, Oct 4, 2017 at 9:11 AM, Gabriel Beddingfield > wrote: > > We found that the problem is an interaction between the NTP code and > > what I call the "delta_delta hack." (see [1] and [2]) This code > > allocates a static variable in a function that contains an offset from > > the system time to the persistent/rtc clock. It uses that time to > > fudge the suspend timestamp so that on resume the sleep time will be > > compensated. It's kind of a statistical hack that assumes things will > > average out. It seems to have two main assumptions: > > > > 1. The persistent/rtc clock has only single-second precision > > 2. The system does not frequently suspend/resume. > > 3. If delta_delta is less than 2 seconds, these assumptions are "true" > > > > Because the delta_delta hack is trying to maintain an offset from the > > system time to the persistent/rtc clock, any minor NTP corrections > > that have occurred since the last suspend will be discarded. However, > > the NTP subsystem isn't notified that this is happening -- and so it > > causes some level of instability in its PLL logic. This is interesting. What polling interval was ntpd using? If I understand it correctly, with a high-resolution persistent clock the delta-delta compensation should be very small and shouldn't disrupt ntpd. Does this instability disappear when ntpd is not controlling the clock (i.e. "disable ntp" in ntp.conf)? > We should also figure out how to best handle ntpd in userspace dealing > with frequent suspend/resume cycles. This is problematic, as the > closest analogy is trying driving on the road while frequently falling > asleep. This is not something I think ntpd handles well. I suspect > our options are that any ntp adjustments being made might be made for > far too long (causing potentially massive over-correction) or not at > all, and not at all seems slightly better in my mind. Yeah, controlling the clock in such conditions will be difficult. The kernel/ntp PLL requires periodic updates. There is some code in ntp_update_offset() that reduces the frequency adjustment when PLL updates are missing, but I'm not actually sure if it works correctly with suspend. -- Miroslav Lichvar
Re: [PATCH RFC V1 net-next 0/6] Time based packet transmission
On Mon, Sep 18, 2017 at 09:41:15AM +0200, Richard Cochran wrote: > This series is an early RFC that introduces a new socket option > allowing time based transmission of packets. This option will be > useful in implementing various real time protocols over Ethernet, > including but not limited to P802.1Qbv, which is currently finding > its way into 802.1Q. If I understand it correctly, this also allows us to make a PTP/NTP "one-step" clock with HW that doesn't support it directly. > * Open questions about SO_TXTIME semantics > > - What should the kernel do if the dialed Tx time is in the past? > Should the packet be sent ASAP, or should we throw an error? Dropping the packet with an error would make more sense to me. > - What should the timescale be for the dialed Tx time? Should the > kernel select UTC when using the SW Qdisc and the HW time > otherwise? Or should the socket option include a clockid_t? I think for applications that don't (want to) bind their socket to a specific interface it would be useful if the cmsg specified clockid_t or maybe if_index. If the packet would be sent using a different PHC/interface, it should be dropped. > | | plain preempt_rt | so_txtime | txtime @ 250 us | > |-+--+---+-| > | min:|+1.940800e+04 | +4.72e+02 | +4.72e+02 | > | max:|+7.556000e+04 | +5.68e+02 | +5.76e+02 | > | pk-pk: |+5.615200e+04 | +9.60e+01 | +1.04e+02 | > | mean: |+3.292776e+04 | +5.072274e+02 | +5.073602e+02 | > | stddev: |+6.514709e+03 | +1.310849e+01 | +1.507144e+01 | > | count: | 60 |60 | 240 | > > Using so_txtime, the peak to peak jitter is about 100 nanoseconds, Nice! -- Miroslav Lichvar
Re: [PATCH RFC V1 net-next 0/6] Time based packet transmission
On Mon, Sep 18, 2017 at 09:41:15AM +0200, Richard Cochran wrote: > This series is an early RFC that introduces a new socket option > allowing time based transmission of packets. This option will be > useful in implementing various real time protocols over Ethernet, > including but not limited to P802.1Qbv, which is currently finding > its way into 802.1Q. If I understand it correctly, this also allows us to make a PTP/NTP "one-step" clock with HW that doesn't support it directly. > * Open questions about SO_TXTIME semantics > > - What should the kernel do if the dialed Tx time is in the past? > Should the packet be sent ASAP, or should we throw an error? Dropping the packet with an error would make more sense to me. > - What should the timescale be for the dialed Tx time? Should the > kernel select UTC when using the SW Qdisc and the HW time > otherwise? Or should the socket option include a clockid_t? I think for applications that don't (want to) bind their socket to a specific interface it would be useful if the cmsg specified clockid_t or maybe if_index. If the packet would be sent using a different PHC/interface, it should be dropped. > | | plain preempt_rt | so_txtime | txtime @ 250 us | > |-+--+---+-| > | min:|+1.940800e+04 | +4.72e+02 | +4.72e+02 | > | max:|+7.556000e+04 | +5.68e+02 | +5.76e+02 | > | pk-pk: |+5.615200e+04 | +9.60e+01 | +1.04e+02 | > | mean: |+3.292776e+04 | +5.072274e+02 | +5.073602e+02 | > | stddev: |+6.514709e+03 | +1.310849e+01 | +1.507144e+01 | > | count: | 60 |60 | 240 | > > Using so_txtime, the peak to peak jitter is about 100 nanoseconds, Nice! -- Miroslav Lichvar
Re: [RFC][PATCH 2/2] selftests: timers: freq-step: Fix build warning
On Mon, Aug 14, 2017 at 02:01:36PM -0700, John Stultz wrote: > Fixes the following build warning: > freq-step.c: In function ‘main’: > freq-step.c:271:1: warning: control reaches end of non-void function > [-Wreturn-type] > @@ -268,4 +268,6 @@ int main(int argc, char **argv) > ksft_exit_fail(); > > ksft_exit_pass(); > + > + return 0; > } It seems most tests use "return ksft_exit_pass();". Would that be preferred over separate return? I don't have a preference. Both patches in this set look good to me. Thanks, -- Miroslav Lichvar
Re: [RFC][PATCH 2/2] selftests: timers: freq-step: Fix build warning
On Mon, Aug 14, 2017 at 02:01:36PM -0700, John Stultz wrote: > Fixes the following build warning: > freq-step.c: In function ‘main’: > freq-step.c:271:1: warning: control reaches end of non-void function > [-Wreturn-type] > @@ -268,4 +268,6 @@ int main(int argc, char **argv) > ksft_exit_fail(); > > ksft_exit_pass(); > + > + return 0; > } It seems most tests use "return ksft_exit_pass();". Would that be preferred over separate return? I don't have a preference. Both patches in this set look good to me. Thanks, -- Miroslav Lichvar
Re: [PATCH] kselftest/timers: fix build failure
On Tue, Jul 25, 2017 at 01:20:37PM +0530, Santosh Sivaraj wrote: > "message" parameter was not passed to ksft_exit_skip. It looks like this was a bad merge of 76739256 and 54f57baa, which added the new parameter to ksft_exit_skip(). > @@ -231,7 +231,7 @@ static void init_test(void) > > if (precision > MAX_PRECISION) { > printf("[SKIP]\n"); > - ksft_exit_skip(); > + ksft_exit_skip("[SKIP]\n"); To avoid printing "skip" multiple times, I'd suggest to change the argument of ksft_exit_skip to NULL. Thanks, -- Miroslav Lichvar
Re: [PATCH] kselftest/timers: fix build failure
On Tue, Jul 25, 2017 at 01:20:37PM +0530, Santosh Sivaraj wrote: > "message" parameter was not passed to ksft_exit_skip. It looks like this was a bad merge of 76739256 and 54f57baa, which added the new parameter to ksft_exit_skip(). > @@ -231,7 +231,7 @@ static void init_test(void) > > if (precision > MAX_PRECISION) { > printf("[SKIP]\n"); > - ksft_exit_skip(); > + ksft_exit_skip("[SKIP]\n"); To avoid printing "skip" multiple times, I'd suggest to change the argument of ksft_exit_skip to NULL. Thanks, -- Miroslav Lichvar
[PATCH] kselftests: timers: Fix inconsistency-check to not ignore first timestamp
When the first timestamp in the list of clock readings was later than the second timestamp and all other timestamps were in order, the inconsistency was not reported because the index of the out-of-order timestamp was equal to the default value. Cc: John Stultz <john.stu...@linaro.org> Cc: Prarit Bhargava <pra...@redhat.com> Cc: Richard Cochran <richardcoch...@gmail.com> Signed-off-by: Miroslav Lichvar <mlich...@redhat.com> --- tools/testing/selftests/timers/inconsistency-check.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/timers/inconsistency-check.c b/tools/testing/selftests/timers/inconsistency-check.c index caf1bc9..74c60e8 100644 --- a/tools/testing/selftests/timers/inconsistency-check.c +++ b/tools/testing/selftests/timers/inconsistency-check.c @@ -118,7 +118,7 @@ int consistency_test(int clock_type, unsigned long seconds) start_str = ctime(); while (seconds == -1 || now - then < seconds) { - inconsistent = 0; + inconsistent = -1; /* Fill list */ for (i = 0; i < CALLS_PER_LOOP; i++) @@ -130,7 +130,7 @@ int consistency_test(int clock_type, unsigned long seconds) inconsistent = i; /* display inconsistency */ - if (inconsistent) { + if (inconsistent >= 0) { unsigned long long delta; printf("\%s\n", start_str); -- 2.9.3
[PATCH] kselftests: timers: Fix inconsistency-check to not ignore first timestamp
When the first timestamp in the list of clock readings was later than the second timestamp and all other timestamps were in order, the inconsistency was not reported because the index of the out-of-order timestamp was equal to the default value. Cc: John Stultz Cc: Prarit Bhargava Cc: Richard Cochran Signed-off-by: Miroslav Lichvar --- tools/testing/selftests/timers/inconsistency-check.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/timers/inconsistency-check.c b/tools/testing/selftests/timers/inconsistency-check.c index caf1bc9..74c60e8 100644 --- a/tools/testing/selftests/timers/inconsistency-check.c +++ b/tools/testing/selftests/timers/inconsistency-check.c @@ -118,7 +118,7 @@ int consistency_test(int clock_type, unsigned long seconds) start_str = ctime(); while (seconds == -1 || now - then < seconds) { - inconsistent = 0; + inconsistent = -1; /* Fill list */ for (i = 0; i < CALLS_PER_LOOP; i++) @@ -130,7 +130,7 @@ int consistency_test(int clock_type, unsigned long seconds) inconsistent = i; /* display inconsistency */ - if (inconsistent) { + if (inconsistent >= 0) { unsigned long long delta; printf("\%s\n", start_str); -- 2.9.3
Re: [PATCH 2/3 v3] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
On Thu, Jun 08, 2017 at 04:44:21PM -0700, John Stultz wrote: > Due to how the MONOTONIC_RAW accumulation logic was handled, > there is the potential for a 1ns discontinuity when we do > accumulations. This small discontinuity has for the most part > gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW > in their vDSO clock_gettime implementation, we've seen failures > with the inconsistency-check test in kselftest. > > This patch addresses the issue by using the same sub-ns > accumulation handling that CLOCK_MONOTONIC uses, which avoids > the issue for in-kernel users. The patch looks good to me and in testing on x86_64 I didn't see any issues with the CLOCK_MONOTONIC_RAW clock. Thanks, -- Miroslav Lichvar
Re: [PATCH 2/3 v3] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
On Thu, Jun 08, 2017 at 04:44:21PM -0700, John Stultz wrote: > Due to how the MONOTONIC_RAW accumulation logic was handled, > there is the potential for a 1ns discontinuity when we do > accumulations. This small discontinuity has for the most part > gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW > in their vDSO clock_gettime implementation, we've seen failures > with the inconsistency-check test in kselftest. > > This patch addresses the issue by using the same sub-ns > accumulation handling that CLOCK_MONOTONIC uses, which avoids > the issue for in-kernel users. The patch looks good to me and in testing on x86_64 I didn't see any issues with the CLOCK_MONOTONIC_RAW clock. Thanks, -- Miroslav Lichvar
[PATCH] kselftests: timers: Add test for frequency step
This test checks the response of the system clock to frequency steps made with adjtimex(). The frequency error and stability of the CLOCK_MONOTONIC clock relative to the CLOCK_MONOTONIC_RAW clock is measured in two intervals following the step. The test fails if values from the second interval exceed specified limits. Cc: John Stultz <john.stu...@linaro.org> Cc: Richard Cochran <richardcoch...@gmail.com> Cc: Prarit Bhargava <pra...@redhat.com> Signed-off-by: Miroslav Lichvar <mlich...@redhat.com> --- tools/testing/selftests/timers/Makefile| 5 +- tools/testing/selftests/timers/freq-step.c | 268 + 2 files changed, 271 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/timers/freq-step.c diff --git a/tools/testing/selftests/timers/Makefile b/tools/testing/selftests/timers/Makefile index 5fa1d7e9..5801bbe 100644 --- a/tools/testing/selftests/timers/Makefile +++ b/tools/testing/selftests/timers/Makefile @@ -1,6 +1,6 @@ BUILD_FLAGS = -DKTEST CFLAGS += -O3 -Wl,-no-as-needed -Wall $(BUILD_FLAGS) -LDFLAGS += -lrt -lpthread +LDFLAGS += -lrt -lpthread -lm # these are all "safe" tests that don't modify # system time or require escalated privileges @@ -8,7 +8,7 @@ TEST_GEN_PROGS = posix_timers nanosleep nsleep-lat set-timer-lat mqueue-lat \ inconsistency-check raw_skew threadtest rtctest TEST_GEN_PROGS_EXTENDED = alarmtimer-suspend valid-adjtimex adjtick change_skew \ - skew_consistency clocksource-switch leap-a-day \ + skew_consistency clocksource-switch freq-step leap-a-day \ leapcrash set-tai set-2038 set-tz @@ -24,6 +24,7 @@ run_destructive_tests: run_tests ./change_skew ./skew_consistency ./clocksource-switch + ./freq-step ./leap-a-day -s -i 10 ./leapcrash ./set-tz diff --git a/tools/testing/selftests/timers/freq-step.c b/tools/testing/selftests/timers/freq-step.c new file mode 100644 index 000..e8c6183 --- /dev/null +++ b/tools/testing/selftests/timers/freq-step.c @@ -0,0 +1,268 @@ +/* + * This test checks the response of the system clock to frequency + * steps made with adjtimex(). The frequency error and stability of + * the CLOCK_MONOTONIC clock relative to the CLOCK_MONOTONIC_RAW clock + * is measured in two intervals following the step. The test fails if + * values from the second interval exceed specified limits. + * + * Copyright (C) Miroslav Lichvar <mlich...@redhat.com> 2017 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ + +#include +#include +#include +#include +#include + +#include "../kselftest.h" + +#define SAMPLES 100 +#define SAMPLE_READINGS 10 +#define MEAN_SAMPLE_INTERVAL 0.1 +#define STEP_INTERVAL 1.0 +#define MAX_PRECISION 100e-9 +#define MAX_FREQ_ERROR 10e-6 +#define MAX_STDDEV 1000e-9 + +struct sample { + double offset; + double time; +}; + +static time_t mono_raw_base; +static time_t mono_base; +static long user_hz; +static double precision; +static double mono_freq_offset; + +static double diff_timespec(struct timespec *ts1, struct timespec *ts2) +{ + return ts1->tv_sec - ts2->tv_sec + (ts1->tv_nsec - ts2->tv_nsec) / 1e9; +} + +static double get_sample(struct sample *sample) +{ + double delay, mindelay = 0.0; + struct timespec ts1, ts2, ts3; + int i; + + for (i = 0; i < SAMPLE_READINGS; i++) { + clock_gettime(CLOCK_MONOTONIC_RAW, ); + clock_gettime(CLOCK_MONOTONIC, ); + clock_gettime(CLOCK_MONOTONIC_RAW, ); + + ts1.tv_sec -= mono_raw_base; + ts2.tv_sec -= mono_base; + ts3.tv_sec -= mono_raw_base; + + delay = diff_timespec(, ); + if (delay <= 1e-9) { + i--; + continue; + } + + if (!i || delay < mindelay) { + sample->offset = diff_timespec(, ); + sample->offset -= delay / 2.0; + sample->time = ts1.tv_sec + ts1.tv_nsec / 1e9; + mindelay = delay; + } + } + + return mindelay; +} + +static void reset_ntp_error(void) +{ + struct timex txc; + + txc.modes = ADJ_SETOFFSET; + txc.time.tv_sec = 0; + txc.time.tv_usec = 0; + + if (adjtimex() < 0) { + perror("[FAIL] adjtimex"); + ksft_exit_fail(); +
[PATCH] kselftests: timers: Add test for frequency step
This test checks the response of the system clock to frequency steps made with adjtimex(). The frequency error and stability of the CLOCK_MONOTONIC clock relative to the CLOCK_MONOTONIC_RAW clock is measured in two intervals following the step. The test fails if values from the second interval exceed specified limits. Cc: John Stultz Cc: Richard Cochran Cc: Prarit Bhargava Signed-off-by: Miroslav Lichvar --- tools/testing/selftests/timers/Makefile| 5 +- tools/testing/selftests/timers/freq-step.c | 268 + 2 files changed, 271 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/timers/freq-step.c diff --git a/tools/testing/selftests/timers/Makefile b/tools/testing/selftests/timers/Makefile index 5fa1d7e9..5801bbe 100644 --- a/tools/testing/selftests/timers/Makefile +++ b/tools/testing/selftests/timers/Makefile @@ -1,6 +1,6 @@ BUILD_FLAGS = -DKTEST CFLAGS += -O3 -Wl,-no-as-needed -Wall $(BUILD_FLAGS) -LDFLAGS += -lrt -lpthread +LDFLAGS += -lrt -lpthread -lm # these are all "safe" tests that don't modify # system time or require escalated privileges @@ -8,7 +8,7 @@ TEST_GEN_PROGS = posix_timers nanosleep nsleep-lat set-timer-lat mqueue-lat \ inconsistency-check raw_skew threadtest rtctest TEST_GEN_PROGS_EXTENDED = alarmtimer-suspend valid-adjtimex adjtick change_skew \ - skew_consistency clocksource-switch leap-a-day \ + skew_consistency clocksource-switch freq-step leap-a-day \ leapcrash set-tai set-2038 set-tz @@ -24,6 +24,7 @@ run_destructive_tests: run_tests ./change_skew ./skew_consistency ./clocksource-switch + ./freq-step ./leap-a-day -s -i 10 ./leapcrash ./set-tz diff --git a/tools/testing/selftests/timers/freq-step.c b/tools/testing/selftests/timers/freq-step.c new file mode 100644 index 000..e8c6183 --- /dev/null +++ b/tools/testing/selftests/timers/freq-step.c @@ -0,0 +1,268 @@ +/* + * This test checks the response of the system clock to frequency + * steps made with adjtimex(). The frequency error and stability of + * the CLOCK_MONOTONIC clock relative to the CLOCK_MONOTONIC_RAW clock + * is measured in two intervals following the step. The test fails if + * values from the second interval exceed specified limits. + * + * Copyright (C) Miroslav Lichvar 2017 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ + +#include +#include +#include +#include +#include + +#include "../kselftest.h" + +#define SAMPLES 100 +#define SAMPLE_READINGS 10 +#define MEAN_SAMPLE_INTERVAL 0.1 +#define STEP_INTERVAL 1.0 +#define MAX_PRECISION 100e-9 +#define MAX_FREQ_ERROR 10e-6 +#define MAX_STDDEV 1000e-9 + +struct sample { + double offset; + double time; +}; + +static time_t mono_raw_base; +static time_t mono_base; +static long user_hz; +static double precision; +static double mono_freq_offset; + +static double diff_timespec(struct timespec *ts1, struct timespec *ts2) +{ + return ts1->tv_sec - ts2->tv_sec + (ts1->tv_nsec - ts2->tv_nsec) / 1e9; +} + +static double get_sample(struct sample *sample) +{ + double delay, mindelay = 0.0; + struct timespec ts1, ts2, ts3; + int i; + + for (i = 0; i < SAMPLE_READINGS; i++) { + clock_gettime(CLOCK_MONOTONIC_RAW, ); + clock_gettime(CLOCK_MONOTONIC, ); + clock_gettime(CLOCK_MONOTONIC_RAW, ); + + ts1.tv_sec -= mono_raw_base; + ts2.tv_sec -= mono_base; + ts3.tv_sec -= mono_raw_base; + + delay = diff_timespec(, ); + if (delay <= 1e-9) { + i--; + continue; + } + + if (!i || delay < mindelay) { + sample->offset = diff_timespec(, ); + sample->offset -= delay / 2.0; + sample->time = ts1.tv_sec + ts1.tv_nsec / 1e9; + mindelay = delay; + } + } + + return mindelay; +} + +static void reset_ntp_error(void) +{ + struct timex txc; + + txc.modes = ADJ_SETOFFSET; + txc.time.tv_sec = 0; + txc.time.tv_usec = 0; + + if (adjtimex() < 0) { + perror("[FAIL] adjtimex"); + ksft_exit_fail(); + } +} + +static void set_frequency(double freq) +{ + struct timex txc; + int tick_offset; + + tick_offset = 1e6 *
Re: [PATCH RFC 0/3] Improve stability of system clock
On Fri, May 19, 2017 at 05:35:38PM -0700, John Stultz wrote: > >> On Wed, May 17, 2017 at 10:22 AM, Miroslav Lichvar <mlich...@redhat.com> > >> wrote: > >> > Is there a better way to run the timekeeping code in an userspace > >> > application? I suspect it would need something like the Linux Kernel > >> > Library project. > So a few years ago I mentioned this at a testing session at I think > Linux Plubmers' and Rusty (CC'ed) commented that he had some netfilter > (or iptables?) simulator code that never made it upstream. However, > now that kselftests are integrated with the kernel this could change. > At least that's my memory of the discussion. > > Anyway, I still think its worth trying to submit. Worse case its a > huge pain and we pull it back out? I've tried something different. I've reimplemented the simulated test as an ordinary user-space application using the CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW clocks. It's not deterministic, it doesn't give results instantly, and it's not as precise as the original test, but it can clearly show the difference that this patchset makes. Before: Clock precision: 46 ns CLOCK_MONOTONIC_RAW frequency offset: 0.00 ppm base step freqdev max freqdev max 15.24 40960 -0.042731852 +0.01 2 3 228.26 40960 +6.07 33089 225914 -0.09 2 4 55.42 40960 -11.90 46413 232834 +0.01 1 3 237.65 40960 -4.75 25574 173479 +0.05 1 3 240.63 40960 -0.088465758 +0.05 2 3 205.52640 -0.117815231 -0.01 2 4 246.81640 +0.218095486 +0.08 1 3 164.04640 +0.162821920 +0.11 2 3 171.32640 -0.084082756 -0.01 2 3 243.04640 -0.033492377 +0.03 2 3 179.91 10 +0.07 28 62 -0.00 2 6 45.44 10 +0.10 18 119 +0.00 6 29 204.30 10 -0.00 21 122 -0.00 4 9 76.18 10 +0.03 39 85 -0.00 3 5 158.18 10 -0.02 26 94 -0.00 4 9 After: Clock precision: 46 ns CLOCK_MONOTONIC_RAW frequency offset: -0.00 ppm base step freqdev max freqdev max 93.98 40960 +0.00 3 7 +0.00 4 8 117.95 40960 +0.00 3 9 -0.00 3 8 230.44 40960 -0.00 4 9 -0.00 2 5 240.56 40960 +0.00 3 6 -0.00 3 7 228.39 40960 +0.00 3 7 -0.00 3 9 237.85640 -0.00 4 10 +0.00 3 8 250.74640 +0.00 4 11 -0.00 3 7 249.06640 +0.00 4 9 -0.00 3 9 114.98640 -0.00 3 8 +0.00 3 8 120.59640 +0.00 3 7 +0.00 3 7 190.66 10 -0.00 3 7 +0.00 3 7 228.83 10 +0.00 3 7 -0.00 3 6 18.91 10 +0.00 3 8 -0.00 3 8 12.39 10 +0.00 3 8 +0.00 4 8 12.01 10 +0.00 4 9 -0.00 4 9 Each line has statistics from 100 samples collected in 0.1 second interval. The frequency error in the second "freq" column with values up to 0.11 ppm shows the problem with the clock very slowly correcting a large NTP error. I can add some limits for the measured errors and submit it as new a kselftest. If the measured precision is too large (e.g. >100ns), the test can return "skip" in order to avoid false negatives. If adjtimex() had an option to return the NTP error directly, or it was possible to read the two clocks at the same time, the test could be much more sensitive and observe shorter intervals (spanning fewer clock updates). -- Miroslav Lichvar
Re: [PATCH RFC 0/3] Improve stability of system clock
On Fri, May 19, 2017 at 05:35:38PM -0700, John Stultz wrote: > >> On Wed, May 17, 2017 at 10:22 AM, Miroslav Lichvar > >> wrote: > >> > Is there a better way to run the timekeeping code in an userspace > >> > application? I suspect it would need something like the Linux Kernel > >> > Library project. > So a few years ago I mentioned this at a testing session at I think > Linux Plubmers' and Rusty (CC'ed) commented that he had some netfilter > (or iptables?) simulator code that never made it upstream. However, > now that kselftests are integrated with the kernel this could change. > At least that's my memory of the discussion. > > Anyway, I still think its worth trying to submit. Worse case its a > huge pain and we pull it back out? I've tried something different. I've reimplemented the simulated test as an ordinary user-space application using the CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW clocks. It's not deterministic, it doesn't give results instantly, and it's not as precise as the original test, but it can clearly show the difference that this patchset makes. Before: Clock precision: 46 ns CLOCK_MONOTONIC_RAW frequency offset: 0.00 ppm base step freqdev max freqdev max 15.24 40960 -0.042731852 +0.01 2 3 228.26 40960 +6.07 33089 225914 -0.09 2 4 55.42 40960 -11.90 46413 232834 +0.01 1 3 237.65 40960 -4.75 25574 173479 +0.05 1 3 240.63 40960 -0.088465758 +0.05 2 3 205.52640 -0.117815231 -0.01 2 4 246.81640 +0.218095486 +0.08 1 3 164.04640 +0.162821920 +0.11 2 3 171.32640 -0.084082756 -0.01 2 3 243.04640 -0.033492377 +0.03 2 3 179.91 10 +0.07 28 62 -0.00 2 6 45.44 10 +0.10 18 119 +0.00 6 29 204.30 10 -0.00 21 122 -0.00 4 9 76.18 10 +0.03 39 85 -0.00 3 5 158.18 10 -0.02 26 94 -0.00 4 9 After: Clock precision: 46 ns CLOCK_MONOTONIC_RAW frequency offset: -0.00 ppm base step freqdev max freqdev max 93.98 40960 +0.00 3 7 +0.00 4 8 117.95 40960 +0.00 3 9 -0.00 3 8 230.44 40960 -0.00 4 9 -0.00 2 5 240.56 40960 +0.00 3 6 -0.00 3 7 228.39 40960 +0.00 3 7 -0.00 3 9 237.85640 -0.00 4 10 +0.00 3 8 250.74640 +0.00 4 11 -0.00 3 7 249.06640 +0.00 4 9 -0.00 3 9 114.98640 -0.00 3 8 +0.00 3 8 120.59640 +0.00 3 7 +0.00 3 7 190.66 10 -0.00 3 7 +0.00 3 7 228.83 10 +0.00 3 7 -0.00 3 6 18.91 10 +0.00 3 8 -0.00 3 8 12.39 10 +0.00 3 8 +0.00 4 8 12.01 10 +0.00 4 9 -0.00 4 9 Each line has statistics from 100 samples collected in 0.1 second interval. The frequency error in the second "freq" column with values up to 0.11 ppm shows the problem with the clock very slowly correcting a large NTP error. I can add some limits for the measured errors and submit it as new a kselftest. If the measured precision is too large (e.g. >100ns), the test can return "skip" in order to avoid false negatives. If adjtimex() had an option to return the NTP error directly, or it was possible to read the two clocks at the same time, the test could be much more sensitive and observe shorter intervals (spanning fewer clock updates). -- Miroslav Lichvar
Re: [PATCH RFC 0/3] Improve stability of system clock
On Wed, May 17, 2017 at 10:02:00AM -0700, John Stultz wrote: > On Wed, May 17, 2017 at 9:57 AM, Miroslav Lichvar <mlich...@redhat.com> wrote: > > On Wed, May 17, 2017 at 09:30:31AM -0700, John Stultz wrote: > >> Could you submit your linux-tktest infrastructure to the kselftests dir? > > > > I can, but it's a mess that breaks frequently as the timekeeping and > > other kernel code changes. Are you sure you want that in the kernel > > tree? :) > > Being a mess is a slight concern, but as for breaking, if its > in-kernel, then folks can't make changes that break it, right? It duplicates/stubs quite a few kernel functions that are needed to compile and link the timekeeping.c file into an executable. See linux-tktest/missing.c. If their signature changes, or new functions are needed, it will break. Is there a better way to run the timekeeping code in an userspace application? I suspect it would need something like the Linux Kernel Library project. -- Miroslav Lichvar
Re: [PATCH RFC 0/3] Improve stability of system clock
On Wed, May 17, 2017 at 10:02:00AM -0700, John Stultz wrote: > On Wed, May 17, 2017 at 9:57 AM, Miroslav Lichvar wrote: > > On Wed, May 17, 2017 at 09:30:31AM -0700, John Stultz wrote: > >> Could you submit your linux-tktest infrastructure to the kselftests dir? > > > > I can, but it's a mess that breaks frequently as the timekeeping and > > other kernel code changes. Are you sure you want that in the kernel > > tree? :) > > Being a mess is a slight concern, but as for breaking, if its > in-kernel, then folks can't make changes that break it, right? It duplicates/stubs quite a few kernel functions that are needed to compile and link the timekeeping.c file into an executable. See linux-tktest/missing.c. If their signature changes, or new functions are needed, it will break. Is there a better way to run the timekeeping code in an userspace application? I suspect it would need something like the Linux Kernel Library project. -- Miroslav Lichvar
Re: [PATCH RFC 0/3] Improve stability of system clock
On Wed, May 17, 2017 at 09:30:31AM -0700, John Stultz wrote: > So thanks for sending these out. I still need to look them over in > depth, but can I make another ask here? :) > > Could you submit your linux-tktest infrastructure to the kselftests dir? I can, but it's a mess that breaks frequently as the timekeeping and other kernel code changes. Are you sure you want that in the kernel tree? :) -- Miroslav Lichvar
Re: [PATCH RFC 0/3] Improve stability of system clock
On Wed, May 17, 2017 at 09:30:31AM -0700, John Stultz wrote: > So thanks for sending these out. I still need to look them over in > depth, but can I make another ask here? :) > > Could you submit your linux-tktest infrastructure to the kselftests dir? I can, but it's a mess that breaks frequently as the timekeeping and other kernel code changes. Are you sure you want that in the kernel tree? :) -- Miroslav Lichvar
[PATCH RFC 1/3] timekeeping: Remove support for old vsyscalls
As the last users of CONFIG_GENERIC_TIME_VSYSCALL_OLD have been updated, the support can be removed from the timekeeping code. Cc: John Stultz <john.stu...@linaro.org> Cc: Prarit Bhargava <pra...@redhat.com> Cc: Richard Cochran <richardcoch...@gmail.com> Signed-off-by: Miroslav Lichvar <mlich...@redhat.com> --- include/linux/timekeeper_internal.h | 7 -- kernel/time/Kconfig | 4 kernel/time/timekeeping.c | 44 - 3 files changed, 55 deletions(-) diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h index 110f453..b7ae5b0 100644 --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -132,13 +132,6 @@ struct timekeeper { extern void update_vsyscall(struct timekeeper *tk); extern void update_vsyscall_tz(void); -#elif defined(CONFIG_GENERIC_TIME_VSYSCALL_OLD) - -extern void update_vsyscall_old(struct timespec *ts, struct timespec *wtm, - struct clocksource *c, u32 mult, - u64 cycle_last); -extern void update_vsyscall_tz(void); - #else static inline void update_vsyscall(struct timekeeper *tk) diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig index 4008d9f..55d61a3 100644 --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig @@ -21,10 +21,6 @@ config CLOCKSOURCE_VALIDATE_LAST_CYCLE config GENERIC_TIME_VSYSCALL bool -# Timekeeping vsyscall support -config GENERIC_TIME_VSYSCALL_OLD - bool - # Old style timekeeping config ARCH_USES_GETTIMEOFFSET bool diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 8fd77c6..ff542dd 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -487,44 +487,6 @@ static void halt_fast_timekeeper(struct timekeeper *tk) update_fast_timekeeper(_dummy, _fast_raw); } -#ifdef CONFIG_GENERIC_TIME_VSYSCALL_OLD - -static inline void update_vsyscall(struct timekeeper *tk) -{ - struct timespec xt, wm; - - xt = timespec64_to_timespec(tk_xtime(tk)); - wm = timespec64_to_timespec(tk->wall_to_monotonic); - update_vsyscall_old(, , tk->tkr_mono.clock, tk->tkr_mono.mult, - tk->tkr_mono.cycle_last); -} - -static inline void old_vsyscall_fixup(struct timekeeper *tk) -{ - s64 remainder; - - /* - * Store only full nanoseconds into xtime_nsec after rounding - * it up and add the remainder to the error difference. - * XXX - This is necessary to avoid small 1ns inconsistnecies caused - * by truncating the remainder in vsyscalls. However, it causes - * additional work to be done in timekeeping_adjust(). Once - * the vsyscall implementations are converted to use xtime_nsec - * (shifted nanoseconds), and CONFIG_GENERIC_TIME_VSYSCALL_OLD - * users are removed, this can be killed. - */ - remainder = tk->tkr_mono.xtime_nsec & ((1ULL << tk->tkr_mono.shift) - 1); - if (remainder != 0) { - tk->tkr_mono.xtime_nsec -= remainder; - tk->tkr_mono.xtime_nsec += 1ULL << tk->tkr_mono.shift; - tk->ntp_error += remainder << tk->ntp_error_shift; - tk->ntp_error -= (1ULL << tk->tkr_mono.shift) << tk->ntp_error_shift; - } -} -#else -#define old_vsyscall_fixup(tk) -#endif - static RAW_NOTIFIER_HEAD(pvclock_gtod_chain); static void update_pvclock_gtod(struct timekeeper *tk, bool was_set) @@ -2065,12 +2027,6 @@ void update_wall_time(void) timekeeping_adjust(tk, offset); /* -* XXX This can be killed once everyone converts -* to the new update_vsyscall. -*/ - old_vsyscall_fixup(tk); - - /* * Finally, make sure that after the rounding * xtime_nsec isn't larger than NSEC_PER_SEC */ -- 2.9.3
[PATCH RFC 1/3] timekeeping: Remove support for old vsyscalls
As the last users of CONFIG_GENERIC_TIME_VSYSCALL_OLD have been updated, the support can be removed from the timekeeping code. Cc: John Stultz Cc: Prarit Bhargava Cc: Richard Cochran Signed-off-by: Miroslav Lichvar --- include/linux/timekeeper_internal.h | 7 -- kernel/time/Kconfig | 4 kernel/time/timekeeping.c | 44 - 3 files changed, 55 deletions(-) diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h index 110f453..b7ae5b0 100644 --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -132,13 +132,6 @@ struct timekeeper { extern void update_vsyscall(struct timekeeper *tk); extern void update_vsyscall_tz(void); -#elif defined(CONFIG_GENERIC_TIME_VSYSCALL_OLD) - -extern void update_vsyscall_old(struct timespec *ts, struct timespec *wtm, - struct clocksource *c, u32 mult, - u64 cycle_last); -extern void update_vsyscall_tz(void); - #else static inline void update_vsyscall(struct timekeeper *tk) diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig index 4008d9f..55d61a3 100644 --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig @@ -21,10 +21,6 @@ config CLOCKSOURCE_VALIDATE_LAST_CYCLE config GENERIC_TIME_VSYSCALL bool -# Timekeeping vsyscall support -config GENERIC_TIME_VSYSCALL_OLD - bool - # Old style timekeeping config ARCH_USES_GETTIMEOFFSET bool diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 8fd77c6..ff542dd 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -487,44 +487,6 @@ static void halt_fast_timekeeper(struct timekeeper *tk) update_fast_timekeeper(_dummy, _fast_raw); } -#ifdef CONFIG_GENERIC_TIME_VSYSCALL_OLD - -static inline void update_vsyscall(struct timekeeper *tk) -{ - struct timespec xt, wm; - - xt = timespec64_to_timespec(tk_xtime(tk)); - wm = timespec64_to_timespec(tk->wall_to_monotonic); - update_vsyscall_old(, , tk->tkr_mono.clock, tk->tkr_mono.mult, - tk->tkr_mono.cycle_last); -} - -static inline void old_vsyscall_fixup(struct timekeeper *tk) -{ - s64 remainder; - - /* - * Store only full nanoseconds into xtime_nsec after rounding - * it up and add the remainder to the error difference. - * XXX - This is necessary to avoid small 1ns inconsistnecies caused - * by truncating the remainder in vsyscalls. However, it causes - * additional work to be done in timekeeping_adjust(). Once - * the vsyscall implementations are converted to use xtime_nsec - * (shifted nanoseconds), and CONFIG_GENERIC_TIME_VSYSCALL_OLD - * users are removed, this can be killed. - */ - remainder = tk->tkr_mono.xtime_nsec & ((1ULL << tk->tkr_mono.shift) - 1); - if (remainder != 0) { - tk->tkr_mono.xtime_nsec -= remainder; - tk->tkr_mono.xtime_nsec += 1ULL << tk->tkr_mono.shift; - tk->ntp_error += remainder << tk->ntp_error_shift; - tk->ntp_error -= (1ULL << tk->tkr_mono.shift) << tk->ntp_error_shift; - } -} -#else -#define old_vsyscall_fixup(tk) -#endif - static RAW_NOTIFIER_HEAD(pvclock_gtod_chain); static void update_pvclock_gtod(struct timekeeper *tk, bool was_set) @@ -2065,12 +2027,6 @@ void update_wall_time(void) timekeeping_adjust(tk, offset); /* -* XXX This can be killed once everyone converts -* to the new update_vsyscall. -*/ - old_vsyscall_fixup(tk); - - /* * Finally, make sure that after the rounding * xtime_nsec isn't larger than NSEC_PER_SEC */ -- 2.9.3
[PATCH RFC 2/3] timekeeping: Don't align frequency adjustments to ticks
When the timekeeping multiplier is adjusted, the NTP error is adjusted to correct the clock for the misalignment of the update to the start of the tick. This error is corrected in later updates and the clock appears as if the frequency was changed exactly on the tick. Remove this correction to keep the point where the frequency is effectively changed at the time of the update. This removes a major source of the NTP error. Cc: John Stultz <john.stu...@linaro.org> Cc: Prarit Bhargava <pra...@redhat.com> Cc: Richard Cochran <richardcoch...@gmail.com> Signed-off-by: Miroslav Lichvar <mlich...@redhat.com> --- kernel/time/timekeeping.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index ff542dd..5ae6f27 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1760,8 +1760,6 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, * xtime_nsec_2 = xtime_nsec_1 - offset * Which simplfies to: * xtime_nsec -= offset -* -* XXX - TODO: Doc ntp_error calculation. */ if ((mult_adj > 0) && (tk->tkr_mono.mult + mult_adj < mult_adj)) { /* NTP adjustment caused clocksource mult overflow */ @@ -1772,7 +1770,6 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, tk->tkr_mono.mult += mult_adj; tk->xtime_interval += interval; tk->tkr_mono.xtime_nsec -= offset; - tk->ntp_error -= (interval - offset) << tk->ntp_error_shift; } /* -- 2.9.3
[PATCH RFC 2/3] timekeeping: Don't align frequency adjustments to ticks
When the timekeeping multiplier is adjusted, the NTP error is adjusted to correct the clock for the misalignment of the update to the start of the tick. This error is corrected in later updates and the clock appears as if the frequency was changed exactly on the tick. Remove this correction to keep the point where the frequency is effectively changed at the time of the update. This removes a major source of the NTP error. Cc: John Stultz Cc: Prarit Bhargava Cc: Richard Cochran Signed-off-by: Miroslav Lichvar --- kernel/time/timekeeping.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index ff542dd..5ae6f27 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1760,8 +1760,6 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, * xtime_nsec_2 = xtime_nsec_1 - offset * Which simplfies to: * xtime_nsec -= offset -* -* XXX - TODO: Doc ntp_error calculation. */ if ((mult_adj > 0) && (tk->tkr_mono.mult + mult_adj < mult_adj)) { /* NTP adjustment caused clocksource mult overflow */ @@ -1772,7 +1770,6 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, tk->tkr_mono.mult += mult_adj; tk->xtime_interval += interval; tk->tkr_mono.xtime_nsec -= offset; - tk->ntp_error -= (interval - offset) << tk->ntp_error_shift; } /* -- 2.9.3
[PATCH RFC 0/3] Improve stability of system clock
This is an attempt to improve stability and accuracy of the system clock with very accurate time sources like the new PTP KVM clock or NTP/PTP using hardware timestamping. It affects mainly kernels running with NOHZ. It requires updating of the old ia64 and powerpc vsyscalls. The main problem is that the error accumulated in the ntp_error register takes too long to correct and this cannot be easily fixed. There are four sources of the error: - rounding of time for old vsyscalls - alignment of frequency adjustments to ticks - iterative correction of the multiplier - limited resolution of the multipler Instead of trying to correct the error faster, the patches remove the first three sources. With the only remaining source the correction logic can be simplified and the frequency of the clock is much more stable and accurate. Simulations of a frequency step in linux-tktest (values are in ppm and nanoseconds): Before: nohz on [1, samples/2] [samples/2 + 1, samples] samples freq dev max freq dev max 10 1.472221341.32217.8 0.06322 0.2 0.5 30 0.20799 849.52448.7 0.06311 0.2 0.6 100 0.04101 492.12895.2 0.06311 0.2 0.5 300 0.05660 295.53026.1 0.02064 28.3 108.9 1000 0.01994 409.82732.1 0.00355 13.7 52.2 3000 0.00477 469.13238.9 0.00070 11.0 40.9 10.00081 377.33791.6 0.00013 9.4 36.2 30.00016 259.94055.7 0.4 8.9 34.1 10 0.3 159.04177.2 0.0 13.7 58.4 nohz off[1, samples/2] [samples/2 + 1, samples] samples freq dev max freq dev max 10 3.55062 6.2 10.8 0.05730 0.0 0.0 30 0.44672 4.5 14.1 0.05724 0.2 0.5 100 0.03649 2.7 17.4 0.05711 0.2 0.5 300 0.05815 1.7 18.7 0.06313 0.2 0.5 1000 0.06270 1.0 19.1 0.06315 0.2 0.5 3000 0.05720 1.9 19.9 0.02065 1.1 4.1 10.01947 13.5 41.0 0.00339 0.5 1.7 30.00448 17.5 75.9 0.00065 0.3 1.0 10 0.00078 14.2 101.7 0.00012 0.2 0.7 After: nohz on [1, samples/2] [samples/2 + 1, samples] samples freq dev max freq dev max 10 0.01584 9.0 14.2 0.02937 2.7 7.2 30 0.00426 10.9 22.4 0.00481 6.5 19.2 100 0.00077 11.6 26.3 0.00074 9.0 26.9 300 0.00013 12.4 29.9 0.00018 8.7 29.3 1000 0.3 12.6 31.8 0.3 8.7 32.1 3000 0.1 12.6 33.3 0.1 9.1 33.4 10.0 12.9 34.0 0.0 9.0 34.1 30.0 12.8 34.5 0.0 9.0 34.5 10 0.0 16.5 51.2 0.0 13.7 58.5 nohz off[1, samples/2] [samples/2 + 1, samples] samples freq dev max freq dev max 10 0.10309 0.1 0.1 0.12717 0.0 0.1 30 0.04269 0.1 0.3 0.02592 0.1 0.4 100 0.00629 0.3 0.5 0.00521 0.2 0.5 300 0.00109 0.3 0.6 0.00099 0.2 0.5 1000 0.00019 0.3 0.6 0.00022 0.2 0.6 3000 0.2 0.3 0.6 0.2 0.2 0.6 10.0 0.3 0.6 0.0 0.2 0.6 30.0 0.3 0.6 0.0 0.2 0.6 10 0.0 0.3 0.6 0.0 0.2 0.6 Miroslav Lichvar (3): timekeeping: Remove support for old vsyscalls timekeeping: Don't align frequency adjustments to ticks timekeeping: Determine multiplier directly from NTP tick length include/linux/timekeeper_internal.h | 9 +- kernel/time/Kconfig | 4 - kernel/time/timekeeping.c | 184 +--- 3 files changed, 48 insertions(+), 149 deletions(-) -- 2.9.3
[PATCH RFC 0/3] Improve stability of system clock
This is an attempt to improve stability and accuracy of the system clock with very accurate time sources like the new PTP KVM clock or NTP/PTP using hardware timestamping. It affects mainly kernels running with NOHZ. It requires updating of the old ia64 and powerpc vsyscalls. The main problem is that the error accumulated in the ntp_error register takes too long to correct and this cannot be easily fixed. There are four sources of the error: - rounding of time for old vsyscalls - alignment of frequency adjustments to ticks - iterative correction of the multiplier - limited resolution of the multipler Instead of trying to correct the error faster, the patches remove the first three sources. With the only remaining source the correction logic can be simplified and the frequency of the clock is much more stable and accurate. Simulations of a frequency step in linux-tktest (values are in ppm and nanoseconds): Before: nohz on [1, samples/2] [samples/2 + 1, samples] samples freq dev max freq dev max 10 1.472221341.32217.8 0.06322 0.2 0.5 30 0.20799 849.52448.7 0.06311 0.2 0.6 100 0.04101 492.12895.2 0.06311 0.2 0.5 300 0.05660 295.53026.1 0.02064 28.3 108.9 1000 0.01994 409.82732.1 0.00355 13.7 52.2 3000 0.00477 469.13238.9 0.00070 11.0 40.9 10.00081 377.33791.6 0.00013 9.4 36.2 30.00016 259.94055.7 0.4 8.9 34.1 10 0.3 159.04177.2 0.0 13.7 58.4 nohz off[1, samples/2] [samples/2 + 1, samples] samples freq dev max freq dev max 10 3.55062 6.2 10.8 0.05730 0.0 0.0 30 0.44672 4.5 14.1 0.05724 0.2 0.5 100 0.03649 2.7 17.4 0.05711 0.2 0.5 300 0.05815 1.7 18.7 0.06313 0.2 0.5 1000 0.06270 1.0 19.1 0.06315 0.2 0.5 3000 0.05720 1.9 19.9 0.02065 1.1 4.1 10.01947 13.5 41.0 0.00339 0.5 1.7 30.00448 17.5 75.9 0.00065 0.3 1.0 10 0.00078 14.2 101.7 0.00012 0.2 0.7 After: nohz on [1, samples/2] [samples/2 + 1, samples] samples freq dev max freq dev max 10 0.01584 9.0 14.2 0.02937 2.7 7.2 30 0.00426 10.9 22.4 0.00481 6.5 19.2 100 0.00077 11.6 26.3 0.00074 9.0 26.9 300 0.00013 12.4 29.9 0.00018 8.7 29.3 1000 0.3 12.6 31.8 0.3 8.7 32.1 3000 0.1 12.6 33.3 0.1 9.1 33.4 10.0 12.9 34.0 0.0 9.0 34.1 30.0 12.8 34.5 0.0 9.0 34.5 10 0.0 16.5 51.2 0.0 13.7 58.5 nohz off[1, samples/2] [samples/2 + 1, samples] samples freq dev max freq dev max 10 0.10309 0.1 0.1 0.12717 0.0 0.1 30 0.04269 0.1 0.3 0.02592 0.1 0.4 100 0.00629 0.3 0.5 0.00521 0.2 0.5 300 0.00109 0.3 0.6 0.00099 0.2 0.5 1000 0.00019 0.3 0.6 0.00022 0.2 0.6 3000 0.2 0.3 0.6 0.2 0.2 0.6 10.0 0.3 0.6 0.0 0.2 0.6 30.0 0.3 0.6 0.0 0.2 0.6 10 0.0 0.3 0.6 0.0 0.2 0.6 Miroslav Lichvar (3): timekeeping: Remove support for old vsyscalls timekeeping: Don't align frequency adjustments to ticks timekeeping: Determine multiplier directly from NTP tick length include/linux/timekeeper_internal.h | 9 +- kernel/time/Kconfig | 4 - kernel/time/timekeeping.c | 184 +--- 3 files changed, 48 insertions(+), 149 deletions(-) -- 2.9.3
[PATCH RFC 3/3] timekeeping: Determine multiplier directly from NTP tick length
When the length of the NTP tick changes significantly, e.g. when an NTP/PTP implementation corrects the initial offset of the clock, a large value may accumulate in the NTP error before the multiplier converges to the correct value. It may then take a very long time (hours or even days) before the error is corrected. This creates a small unstable frequency offset and prevents stable synchronization of the clock with very stable time sources (e.g. NTP/PTP using hardware timestamping or PTP KVM clock). Use division to determine the correct multiplier directly from the NTP tick length in order to replace the iterative approach and remove the last major source of the NTP error. The only remaining source of the error is now limited resolution of the multiplier, which can be effectively corrected by adding 0 or 1 to the multiplier according to the sign of the error. Cc: John Stultz <john.stu...@linaro.org> Cc: Prarit Bhargava <pra...@redhat.com> Cc: Richard Cochran <richardcoch...@gmail.com> Signed-off-by: Miroslav Lichvar <mlich...@redhat.com> --- include/linux/timekeeper_internal.h | 2 + kernel/time/timekeeping.c | 137 2 files changed, 48 insertions(+), 91 deletions(-) diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h index b7ae5b0..e9a46e5 100644 --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -113,6 +113,8 @@ struct timekeeper { s64 ntp_error; u32 ntp_error_shift; u32 ntp_err_mult; + /* Flag used to avoid updating NTP twice with same second */ + u32 skip_second_overflow; #ifdef CONFIG_DEBUG_TIMEKEEPING longlast_warning; /* diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 5ae6f27..b4cc606 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -289,6 +289,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock) tk->tkr_mono.mult = clock->mult; tk->tkr_raw.mult = clock->mult; tk->ntp_err_mult = 0; + tk->skip_second_overflow = 0; } /* Timekeeper helper functions. */ @@ -1699,20 +1700,19 @@ device_initcall(timekeeping_init_ops); */ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, s64 offset, -bool negative, -int adj_scale) +s32 mult_adj) { s64 interval = tk->cycle_interval; - s32 mult_adj = 1; - if (negative) { - mult_adj = -mult_adj; + if (mult_adj == 0) { + return; + } else if (mult_adj == -1) { interval = -interval; - offset = -offset; + offset = -offset; + } else if (mult_adj != 1) { + interval *= mult_adj; + offset *= mult_adj; } - mult_adj <<= adj_scale; - interval <<= adj_scale; - offset <<= adj_scale; /* * So the following can be confusing. @@ -1773,85 +1773,35 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, } /* - * Calculate the multiplier adjustment needed to match the frequency - * specified by NTP + * Adjust the timekeeper's multiplier to the correct frequency + * and also to reduce the accumulated error value. */ -static __always_inline void timekeeping_freqadjust(struct timekeeper *tk, - s64 offset) +static void timekeeping_adjust(struct timekeeper *tk, s64 offset) { - s64 interval = tk->cycle_interval; - s64 xinterval = tk->xtime_interval; - u32 base = tk->tkr_mono.clock->mult; - u32 max = tk->tkr_mono.clock->maxadj; - u32 cur_adj = tk->tkr_mono.mult; - s64 tick_error; - bool negative; - u32 adj_scale; - - /* Remove any current error adj from freq calculation */ - if (tk->ntp_err_mult) - xinterval -= tk->cycle_interval; - - tk->ntp_tick = ntp_tick_length(); - - /* Calculate current error per tick */ - tick_error = ntp_tick_length() >> tk->ntp_error_shift; - tick_error -= (xinterval + tk->xtime_remainder); - - /* Don't worry about correcting it if its small */ - if (likely((tick_error >= 0) && (tick_error <= interval))) - return; - - /* preserve the direction of correction */ - negative = (tick_error < 0); + u32 mult; - /* If any adjustment would pass the max, just return */ - if (negative && (cu
[PATCH RFC 3/3] timekeeping: Determine multiplier directly from NTP tick length
When the length of the NTP tick changes significantly, e.g. when an NTP/PTP implementation corrects the initial offset of the clock, a large value may accumulate in the NTP error before the multiplier converges to the correct value. It may then take a very long time (hours or even days) before the error is corrected. This creates a small unstable frequency offset and prevents stable synchronization of the clock with very stable time sources (e.g. NTP/PTP using hardware timestamping or PTP KVM clock). Use division to determine the correct multiplier directly from the NTP tick length in order to replace the iterative approach and remove the last major source of the NTP error. The only remaining source of the error is now limited resolution of the multiplier, which can be effectively corrected by adding 0 or 1 to the multiplier according to the sign of the error. Cc: John Stultz Cc: Prarit Bhargava Cc: Richard Cochran Signed-off-by: Miroslav Lichvar --- include/linux/timekeeper_internal.h | 2 + kernel/time/timekeeping.c | 137 2 files changed, 48 insertions(+), 91 deletions(-) diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h index b7ae5b0..e9a46e5 100644 --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -113,6 +113,8 @@ struct timekeeper { s64 ntp_error; u32 ntp_error_shift; u32 ntp_err_mult; + /* Flag used to avoid updating NTP twice with same second */ + u32 skip_second_overflow; #ifdef CONFIG_DEBUG_TIMEKEEPING longlast_warning; /* diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 5ae6f27..b4cc606 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -289,6 +289,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock) tk->tkr_mono.mult = clock->mult; tk->tkr_raw.mult = clock->mult; tk->ntp_err_mult = 0; + tk->skip_second_overflow = 0; } /* Timekeeper helper functions. */ @@ -1699,20 +1700,19 @@ device_initcall(timekeeping_init_ops); */ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, s64 offset, -bool negative, -int adj_scale) +s32 mult_adj) { s64 interval = tk->cycle_interval; - s32 mult_adj = 1; - if (negative) { - mult_adj = -mult_adj; + if (mult_adj == 0) { + return; + } else if (mult_adj == -1) { interval = -interval; - offset = -offset; + offset = -offset; + } else if (mult_adj != 1) { + interval *= mult_adj; + offset *= mult_adj; } - mult_adj <<= adj_scale; - interval <<= adj_scale; - offset <<= adj_scale; /* * So the following can be confusing. @@ -1773,85 +1773,35 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, } /* - * Calculate the multiplier adjustment needed to match the frequency - * specified by NTP + * Adjust the timekeeper's multiplier to the correct frequency + * and also to reduce the accumulated error value. */ -static __always_inline void timekeeping_freqadjust(struct timekeeper *tk, - s64 offset) +static void timekeeping_adjust(struct timekeeper *tk, s64 offset) { - s64 interval = tk->cycle_interval; - s64 xinterval = tk->xtime_interval; - u32 base = tk->tkr_mono.clock->mult; - u32 max = tk->tkr_mono.clock->maxadj; - u32 cur_adj = tk->tkr_mono.mult; - s64 tick_error; - bool negative; - u32 adj_scale; - - /* Remove any current error adj from freq calculation */ - if (tk->ntp_err_mult) - xinterval -= tk->cycle_interval; - - tk->ntp_tick = ntp_tick_length(); - - /* Calculate current error per tick */ - tick_error = ntp_tick_length() >> tk->ntp_error_shift; - tick_error -= (xinterval + tk->xtime_remainder); - - /* Don't worry about correcting it if its small */ - if (likely((tick_error >= 0) && (tick_error <= interval))) - return; - - /* preserve the direction of correction */ - negative = (tick_error < 0); + u32 mult; - /* If any adjustment would pass the max, just return */ - if (negative && (cur_adj - 1) <= (base - max)) - return; - if (!negative && (cur_adj + 1) >= (base + m
Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
On Fri, May 12, 2017 at 10:26:12AM -0700, John Stultz wrote: > On Fri, May 12, 2017 at 8:14 AM, Miroslav Lichvar <mlich...@redhat.com> wrote: > > I see this with real PHCs and PTP/NTP synchronization too. It's very > > confusing when the timekeeping changes so much for no apparent reason. > > If we can't remove the old vsyscalls yet, I was thinking maybe a new > > flag could be added to adjtimex to report the error, so applications > > can at least detect this problem and consider stepping the clock in > > order to reset the error? > > > > Thoughts? > > I'd rather not have short-term hacks that applications have to adapt. > So I think we should drop the old vsyscall method in the near term. > Sorry this sort of fell off my radar. Ok. Sounds good. > Do you have an updated set of patches you want to get ready to address > the issue? We can get those reviewed while we increase the pressure on > dropping the OLD_VSYSCALL implementations. I'll send an RFC series shortly. Thanks, -- Miroslav Lichvar
Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
On Fri, May 12, 2017 at 10:26:12AM -0700, John Stultz wrote: > On Fri, May 12, 2017 at 8:14 AM, Miroslav Lichvar wrote: > > I see this with real PHCs and PTP/NTP synchronization too. It's very > > confusing when the timekeeping changes so much for no apparent reason. > > If we can't remove the old vsyscalls yet, I was thinking maybe a new > > flag could be added to adjtimex to report the error, so applications > > can at least detect this problem and consider stepping the clock in > > order to reset the error? > > > > Thoughts? > > I'd rather not have short-term hacks that applications have to adapt. > So I think we should drop the old vsyscall method in the near term. > Sorry this sort of fell off my radar. Ok. Sounds good. > Do you have an updated set of patches you want to get ready to address > the issue? We can get those reviewed while we increase the pressure on > dropping the OLD_VSYSCALL implementations. I'll send an RFC series shortly. Thanks, -- Miroslav Lichvar
Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
On Tue, Jul 15, 2014 at 09:02:38PM -0700, John Stultz wrote: > On 07/08/2014 04:08 AM, Miroslav Lichvar wrote: > > I spent some time trying to figure out a workaround for the nanosecond > > rounding, but I didn't find anything that wouldn't complicate the mult > > adjustment logic and bring back the problems which the direct division > > approach is supposed to solve. > > > > It seems it may be a while before the old vsyscalls are fixed. How > > about including only the first two patches from this set for now? > So thanks for the ping here. If you're happy with the first two as an > initial step, I'd be willing to try to push those in. The only trouble > is there's a whole lot of timekeeping churn headed for 3.17 that Thomas > has cooked up. While there isn't likely to be direct conflicts in the > changes, I get nervous about mixing too many changes in subtle code at once. I'm sorry for digging up this skeleton. Are we any closer to being able to remove CONFIG_GENERIC_TIME_VSYSCALL_OLD, which prevented simplifying the steering logic of the internal clock? The two patches added in 3.17 helped, but there is still the problem that it takes too long for the kernel clock to converge to the internal NTP time when a large error was accumulated, e.g. a large frequency adjustment was made to correct the initial offset of the clock. It seems the error can reach milliseconds and take hours or even days to be corrected. As the rate at which the error is decreasing is not constant (due to random clock updates), it doesn't seem to be possible to synchronize the clock with better stability than few tens or hundreds of nanoseconds. With the new PTP KVM clock the problem can be easily observed. Here is a graph of the offset and frequency as measured by chronyd when configured to synchronize the guest's clock to the host using the virtual PHC. In the middle is the point when the NTP error reached zero. The apparent frequency jumped by about 50 ppb and the offset improved by an order of magnitude. https://mlichvar.fedorapeople.org/tmp/kvm_phc.png I see this with real PHCs and PTP/NTP synchronization too. It's very confusing when the timekeeping changes so much for no apparent reason. If we can't remove the old vsyscalls yet, I was thinking maybe a new flag could be added to adjtimex to report the error, so applications can at least detect this problem and consider stepping the clock in order to reset the error? Thoughts? -- Miroslav Lichvar
Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
On Tue, Jul 15, 2014 at 09:02:38PM -0700, John Stultz wrote: > On 07/08/2014 04:08 AM, Miroslav Lichvar wrote: > > I spent some time trying to figure out a workaround for the nanosecond > > rounding, but I didn't find anything that wouldn't complicate the mult > > adjustment logic and bring back the problems which the direct division > > approach is supposed to solve. > > > > It seems it may be a while before the old vsyscalls are fixed. How > > about including only the first two patches from this set for now? > So thanks for the ping here. If you're happy with the first two as an > initial step, I'd be willing to try to push those in. The only trouble > is there's a whole lot of timekeeping churn headed for 3.17 that Thomas > has cooked up. While there isn't likely to be direct conflicts in the > changes, I get nervous about mixing too many changes in subtle code at once. I'm sorry for digging up this skeleton. Are we any closer to being able to remove CONFIG_GENERIC_TIME_VSYSCALL_OLD, which prevented simplifying the steering logic of the internal clock? The two patches added in 3.17 helped, but there is still the problem that it takes too long for the kernel clock to converge to the internal NTP time when a large error was accumulated, e.g. a large frequency adjustment was made to correct the initial offset of the clock. It seems the error can reach milliseconds and take hours or even days to be corrected. As the rate at which the error is decreasing is not constant (due to random clock updates), it doesn't seem to be possible to synchronize the clock with better stability than few tens or hundreds of nanoseconds. With the new PTP KVM clock the problem can be easily observed. Here is a graph of the offset and frequency as measured by chronyd when configured to synchronize the guest's clock to the host using the virtual PHC. In the middle is the point when the NTP error reached zero. The apparent frequency jumped by about 50 ppb and the offset improved by an order of magnitude. https://mlichvar.fedorapeople.org/tmp/kvm_phc.png I see this with real PHCs and PTP/NTP synchronization too. It's very confusing when the timekeeping changes so much for no apparent reason. If we can't remove the old vsyscalls yet, I was thinking maybe a new flag could be added to adjtimex to report the error, so applications can at least detect this problem and consider stepping the clock in order to reset the error? Thoughts? -- Miroslav Lichvar
Re: [patch 4/5] PTP: add PTP_SYS_OFFSET emulation via cross timestamps infrastructure
On Tue, Jan 24, 2017 at 06:32:43AM +0100, Richard Cochran wrote: > On Mon, Jan 23, 2017 at 09:06:20PM -0200, Marcelo Tosatti wrote: > > Can you please describe what problem exists with this scheme? > > This new kernel code exists just because chrony doesn't implement the > PRECISE ioctl. Instead of adding new "fake" modes, just teach chrony > about the better method. The latest development code of chrony now supports the PRECISE ioctl. I did some tests with an e1000e NIC (i219) and it seemed the stability was slightly worse than with the non-PRECISE ioctl, but there was a 400-500ns offset between the two, so it should be much more accurate (we finally have something that avoids the asymmetry on the PCI-e bus?). Configuring the refclock with a shorter dpoll should compensate for the decrease in stability. In any case, there is a "nocrossts" option to not use the PRECISE ioctl even if it's supported. -- Miroslav Lichvar
Re: [patch 4/5] PTP: add PTP_SYS_OFFSET emulation via cross timestamps infrastructure
On Tue, Jan 24, 2017 at 06:32:43AM +0100, Richard Cochran wrote: > On Mon, Jan 23, 2017 at 09:06:20PM -0200, Marcelo Tosatti wrote: > > Can you please describe what problem exists with this scheme? > > This new kernel code exists just because chrony doesn't implement the > PRECISE ioctl. Instead of adding new "fake" modes, just teach chrony > about the better method. The latest development code of chrony now supports the PRECISE ioctl. I did some tests with an e1000e NIC (i219) and it seemed the stability was slightly worse than with the non-PRECISE ioctl, but there was a 400-500ns offset between the two, so it should be much more accurate (we finally have something that avoids the asymmetry on the PCI-e bus?). Configuring the refclock with a shorter dpoll should compensate for the decrease in stability. In any case, there is a "nocrossts" option to not use the PRECISE ioctl even if it's supported. -- Miroslav Lichvar