Re: [Bug 212265] New: clock_gettime(CLOCK_TAI, ...) should return an error when TAI has not been configured

2021-03-29 Thread Miroslav Lichvar
On Fri, Mar 26, 2021 at 08:28:59PM -0700, Richard Cochran wrote:
> Using ntpd on Debian, the service will set the offset, but only after
> synchronization with the upstream server has been established, and
> this takes about five minutes, IIRC.

With the iburst option it shouldn't take more than 10 seconds. There
might be an issue wrt stepping the clock when the initial offset is
large. In Fedora and derived distros using chrony by default the
TAI-UTC offset should be set right on the first update of the clock as
expected.

> Getting back to the original point of the kernel returning an error,
> I don't see a need for this.  Applications that require correct leap
> seconds can simply call adjtimex() and wait until the initial zero
> value is changed by ntpd/etc to the correct offset.  That isn't
> fundamentally harder than calling clock_gettime() and waiting until
> the error would go away.

There are at least two issues with handling a zero offset as a special
value. One is that zero could potentially be a valid value in distant
future. The other is that the kernel updates the offset when a leap
second is inserted/deleted even if the original offset is zero, so
checking for zero (in the kernel or an application) works only until
the first leap second after boot.

The kernel would need to set a flag that the offset was set. Returning
an error in clock_gettime() until the offset is set sounds reasonable
to me, but I have no idea how many of the existing applications it
would break.

-- 
Miroslav Lichvar



Re: [patch 5/8] ntp: Make the RTC synchronization more reliable

2020-12-07 Thread Miroslav Lichvar
On Sun, Dec 06, 2020 at 10:46:18PM +0100, Thomas Gleixner wrote:
> Switch it to an hrtimer instead which schedules the actual update work. The
> hrtimer will expire precisely (max 1 jiffie delay when high resolution
> timers are not available). The actual scheduling delay of the work is the
> same as before.

It works well in my tests.

> This becomes now:
> 
>  if (ntp_synced() && !hrtimer_is_queued(_hrtimer))
>   queue_work(system_power_efficient_wq, _work, 0);
> 
> which is racy when the hrtimer has expired and the work is currently
> executed and has not yet managed to rearm the hrtimer.
> 
> Not a big problem as it just schedules work for nothing.

No more unexpected updates of the RTC observed.

Tested-by: Miroslav Lichvar 

Thanks,

-- 
Miroslav Lichvar



Re: [PATCH] rtc: adapt allowed RTC update error

2020-12-02 Thread Miroslav Lichvar
On Wed, Dec 02, 2020 at 04:07:28PM +0100, Miroslav Lichvar wrote:
> On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote:
> > Something like the completely untested below should make this reliable
> > and only needs to retry when the work is running late (busy machine),
> > but the wakeup will be on time or at max 1 jiffie off when high
> > resolution timers are not available or disabled.
> 
> It seems to work nicely. In my test most of the updates succeeded on
> the first attempt hitting the right tick, the rest succeeding on the
> second attempt. Only when the clock was set to run 10% faster, it
> needed few more attempts to converge to the target time.

I noticed an observable change wrt adjtimex() calls though. It seems
it now reschedules the RTC update, i.e. there can be more than one
update per 11 minutes. Was this intended?

> @@ -629,7 +618,7 @@ void ntp_notify_cmos_timer(void)
>  
> if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
> IS_ENABLED(CONFIG_RTC_SYSTOHC))
> -   queue_delayed_work(system_power_efficient_wq, _work, 0);
> +   queue_work(system_power_efficient_wq, _work);
>  }

-- 
Miroslav Lichvar



Re: [PATCH] rtc: adapt allowed RTC update error

2020-12-02 Thread Miroslav Lichvar
On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote:
> Something like the completely untested below should make this reliable
> and only needs to retry when the work is running late (busy machine),
> but the wakeup will be on time or at max 1 jiffie off when high
> resolution timers are not available or disabled.

It seems to work nicely. In my test most of the updates succeeded on
the first attempt hitting the right tick, the rest succeeding on the
second attempt. Only when the clock was set to run 10% faster, it
needed few more attempts to converge to the target time.

Thanks,

-- 
Miroslav Lichvar



[PATCHv2] rtc: adapt allowed RTC update error

2020-12-02 Thread Miroslav Lichvar
When the system clock is marked as synchronized via adjtimex(), the
kernel is expected to copy the system time to the RTC every 11 minutes.

There are reports that it doesn't always work reliably. It seems the
current requirement for the RTC update to happen within 5 ticks of the
target time in some cases can consistently fail for hours or even days.

It is better to set the RTC with a larger error than let it drift for
too long.

Instead of increasing the constant again, use a static variable to count
the checks and with each failed check increase the allowed error by one
jiffie. Reset the counter when the check finally succeeds. This will
allow the RTC update to keep good accuracy if it can happen in the first
few attempts and it will not take more than a minute if the timing is
consistently bad for any reason.

Signed-off-by: Miroslav Lichvar 
Cc: Thomas Gleixner 
Cc: John Stultz 
Cc: Prarit Bhargava 
Cc: Jason Gunthorpe 
---

Notes:
v2:
- moved the static variable to callers in ntp.c

 drivers/rtc/systohc.c |  6 --
 include/linux/rtc.h   | 14 +-
 kernel/time/ntp.c |  9 +++--
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
index 8b70f0520e13..0777f590cdae 100644
--- a/drivers/rtc/systohc.c
+++ b/drivers/rtc/systohc.c
@@ -5,6 +5,7 @@
 /**
  * rtc_set_ntp_time - Save NTP synchronized time to the RTC
  * @now: Current time of day
+ * @attempt: Number of previous failures used to adjust allowed error
  * @target_nsec: pointer for desired now->tv_nsec value
  *
  * Replacement for the NTP platform function update_persistent_clock64
@@ -18,7 +19,8 @@
  *
  * If temporary failure is indicated the caller should try again 'soon'
  */
-int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec)
+int rtc_set_ntp_time(struct timespec64 now, unsigned int attempt,
+unsigned long *target_nsec)
 {
struct rtc_device *rtc;
struct rtc_time tm;
@@ -44,7 +46,7 @@ int rtc_set_ntp_time(struct timespec64 now, unsigned long 
*target_nsec)
 * it does not we update target_nsec and return EPROTO to make the ntp
 * code try again later.
 */
-   ok = rtc_tv_nsec_ok(rtc->set_offset_nsec, _set, );
+   ok = rtc_tv_nsec_ok(rtc->set_offset_nsec, attempt, _set, );
if (!ok) {
err = -EPROTO;
goto out_close;
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index 22d1575e4991..9f3326b43620 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -165,7 +165,8 @@ int __rtc_register_device(struct module *owner, struct 
rtc_device *rtc);
 
 extern int rtc_read_time(struct rtc_device *rtc, struct rtc_time *tm);
 extern int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm);
-extern int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec);
+extern int rtc_set_ntp_time(struct timespec64 now, unsigned int attempt,
+   unsigned long *target_nsec);
 int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm);
 extern int rtc_read_alarm(struct rtc_device *rtc,
struct rtc_wkalrm *alrm);
@@ -213,24 +214,27 @@ static inline bool is_leap_year(unsigned int year)
  * a zero in tv_nsecs, such that:
  *to_set - set_delay_nsec == now +/- FUZZ
  *
+ * The allowed error starts at 5 jiffies on the first attempt and is increased
+ * with each failed attempt to make sure the RTC will be set at some point,
+ * even if the timing is consistently inaccurate.
  */
 static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec,
+ unsigned int attempt,
  struct timespec64 *to_set,
  const struct timespec64 *now)
 {
-   /* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */
-   const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5;
+   unsigned long time_set_nsec_fuzz = (5 + attempt) * TICK_NSEC;
struct timespec64 delay = {.tv_sec = 0,
   .tv_nsec = set_offset_nsec};
 
*to_set = timespec64_add(*now, delay);
 
-   if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) {
+   if (to_set->tv_nsec < time_set_nsec_fuzz) {
to_set->tv_nsec = 0;
return true;
}
 
-   if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) {
+   if (to_set->tv_nsec > NSEC_PER_SEC - time_set_nsec_fuzz) {
to_set->tv_sec++;
to_set->tv_nsec = 0;
return true;
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 069ca78fb0bf..893bc7ed7845 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -531,6 +531,7 @@ static void sched_sync_hw_clock(struct timespec64 now,
 
 static void sync_rtc_clock(void)
 {
+   static unsigned int attempt;
unsigned long target_nsec;
struct timespe

Re: [PATCH] rtc: adapt allowed RTC update error

2020-12-01 Thread Miroslav Lichvar
On Tue, Dec 01, 2020 at 12:12:24PM -0400, Jason Gunthorpe wrote:
> On Tue, Dec 01, 2020 at 03:38:35PM +0100, Miroslav Lichvar wrote:
> > +   unsigned long time_set_nsec_fuzz;
> > +   static unsigned int attempt;
> 
> Adding a static value instide a static inline should not be done

Well, grepping through the other header files in include/linux, this
would not be the first case.

> I'm not sure using a static like this is the best idea anyhow, if you
> want something like this it should be per-rtc, not global

If there are multiple RTCs, are they all updated in this 11-minute
sync?

> Did you look at why time has become so in-accurate in your system? 5
> jiffies is usually a pretty long time?

I found no good explanation. It seems to depend on what system is
doing, if it's idle, etc. I suspect it's a property of the workqueues
that they cannot generally guarantee the jobs to run exactly on time.
I tried switching to the non-power-efficient and high priority
workqueues and it didn't seem to make a big difference.

-- 
Miroslav Lichvar



[PATCH] rtc: adapt allowed RTC update error

2020-12-01 Thread Miroslav Lichvar
When the system clock is marked as synchronized via adjtimex(), the
kernel is expected to copy the system time to the RTC every 11 minutes.

There are reports that it doesn't always work reliably. It seems the
current requirement for the RTC update to happen within 5 ticks of the
target time in some cases can consistently fail for hours or even days.

It is better to set the RTC with a larger error than let it drift for
too long.

Add a static variable to rtc_tv_nsec_ok() to count the checks. With each
failed check, relax the requirement by one jiffie, and reset the counter
when it finally succeeds. This should allow the RTC update to happen in
a minute at most.

Signed-off-by: Miroslav Lichvar 
Cc: Thomas Gleixner 
Cc: John Stultz 
Cc: Prarit Bhargava 
Cc: Jason Gunthorpe 
---
 include/linux/rtc.h | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index 22d1575e4991..8d105f10ef6a 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -218,21 +218,30 @@ static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec,
  struct timespec64 *to_set,
  const struct timespec64 *now)
 {
-   /* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */
-   const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5;
struct timespec64 delay = {.tv_sec = 0,
   .tv_nsec = set_offset_nsec};
+   unsigned long time_set_nsec_fuzz;
+   static unsigned int attempt;
 
*to_set = timespec64_add(*now, delay);
 
-   if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) {
+   /*
+* Determine allowed error in tv_nsec.  Start at 5 jiffies and add a
+* jiffie with each failed attempt to make sure the RTC will be set at
+* some point, even if the update cannot be scheduled very accurately.
+*/
+   time_set_nsec_fuzz = (5 + attempt++) * TICK_NSEC;
+
+   if (to_set->tv_nsec < time_set_nsec_fuzz) {
to_set->tv_nsec = 0;
+   attempt = 0;
return true;
}
 
-   if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) {
+   if (to_set->tv_nsec > NSEC_PER_SEC - time_set_nsec_fuzz) {
to_set->tv_sec++;
to_set->tv_nsec = 0;
+   attempt = 0;
return true;
}
return false;
-- 
2.26.2



Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts

2019-08-21 Thread Miroslav Lichvar
On Wed, Aug 21, 2019 at 11:53:12AM +0200, Hubert Feurstein wrote:
> Am Mi., 21. Aug. 2019 um 10:07 Uhr schrieb Miroslav Lichvar
> > Because those reports/statistics are important in calculation of
> > maximum error. If someone had a requirement for a clock to be accurate
> > to 1.5 microseconds and the ioctl returned a delay indicating a
> > sufficient accuracy when in reality it could be worse, that would be a
> > problem.
> >
> Ok, I understand your point. But including the MDIO completion into
> delay calculation
> will indicate a much wore precision as it actually is.

That's ok. It's the same with PCIe devices. It takes about 500 ns to
read a PCI register, so we know in the worst case the offset is
accurate to 250 ns. It's probably much better, maybe to 50 ns, but we
don't really know. We don't know how much asymmetry there is in the
PCIe delay (it certainly is not zero), or how much time the NIC
actually needs to respond to the command and when exactly it reads the
clock.

> When the MDIO
> driver implements
> the PTP system timestamping as follows ...
> 
>   ptp_read_system_prets(bus->ptp_sts);
>   writel(value, mdio-reg)
>   ptp_read_system_postts(bus->ptp_sts);
> 
> ... then we catch already the error caused by interrupts which hit the
> pre/post_ts section.
> Now we only have the additional error of one MDIO clock cycle
> (~400ns). Because I expect
> the MDIO controller to shift out the MDIO frame on the next MDIO clock
> cycle.

Is this always the case?

> So if I subtract
> one MDIO clock cycle from pre_ts and add one MDIO clock cycle to
> post_ts the error indication
> would be sufficiently corrected IMHO.

If I understand it correctly, this ignores the time needed for the
frame to be received, decoded and the clock to be read.

-- 
Miroslav Lichvar


Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts

2019-08-21 Thread Miroslav Lichvar
On Tue, Aug 20, 2019 at 06:56:56PM +0200, Hubert Feurstein wrote:
> Am Di., 20. Aug. 2019 um 17:40 Uhr schrieb Miroslav Lichvar
> > I think a large jitter is ok in this case. We just need to timestamp
> > something that we know for sure happened after the PHC timestamp. It
> > should have no impact on the offset and its stability, just the
> > reported delay. A test with phc2sys should be able to confirm that.
> > phc2sys selects the measurement with the shortest delay, which has
> > least uncertainty. I'd say that applies to both interrupt and polling.
> >
> > If it is difficult to specify the minimum interrupt delay, I'd still
> > prefer an overly pessimistic interval assuming a zero delay.
> >
> Currently I do not see the benefit from this. The original intention was to
> compensate for the remaining offset as good as possible.

That's ok, but IMHO the change should not break the assumptions of
existing application and users.

> The current code
> of phc2sys uses the delay only for the filtering of the measurement record
> with the shortest delay and for reporting and statistics. Why not simple shift
> the timestamps with the offset to the point where we expect the PHC timestamp
> to be captured, and we have a very good result compared to where we came
> from.

Because those reports/statistics are important in calculation of
maximum error. If someone had a requirement for a clock to be accurate
to 1.5 microseconds and the ioctl returned a delay indicating a
sufficient accuracy when in reality it could be worse, that would be a
problem.

BTW, phc2sys is not the only user of the ioctl.

-- 
Miroslav Lichvar


Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts

2019-08-20 Thread Miroslav Lichvar
On Tue, Aug 20, 2019 at 05:23:06PM +0200, Andrew Lunn wrote:
> > - take a second "post" system timestamp after the completion
> 
> For this hardware, completion is an interrupt, which has a lot of
> jitter on it. But this hardware is odd, in that it uses an
> interrupt. Every other MDIO bus controller uses polled IO, with an
> mdelay(10) or similar between each poll. So the jitter is going to be
> much larger.

I think a large jitter is ok in this case. We just need to timestamp
something that we know for sure happened after the PHC timestamp. It
should have no impact on the offset and its stability, just the
reported delay. A test with phc2sys should be able to confirm that.
phc2sys selects the measurement with the shortest delay, which has
least uncertainty. I'd say that applies to both interrupt and polling.

If it is difficult to specify the minimum interrupt delay, I'd still
prefer an overly pessimistic interval assuming a zero delay.

-- 
Miroslav Lichvar


Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts

2019-08-20 Thread Miroslav Lichvar
On Tue, Aug 20, 2019 at 02:29:27PM +0200, Hubert Feurstein wrote:
> Am Di., 20. Aug. 2019 um 11:49 Uhr schrieb Miroslav Lichvar
> > This is important to not break the estimation of maximum error in the
> > measured offset. Applications using the ioctl may assume that the
> > maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
> > phc2sys). That would not work if the delay could be occasionally 50
> > microseconds for instance, i.e. the post_ts timestamp would be earlier
> > than the PHC timestamp.
> >
> If the timestamps are taken in the MDIO driver (imx-fec in my case), then
> everything is quite deterministic (see results in the cover letter). Of 
> course,
> it still can be improved slightly, by splitting up the writel into iowmb and
> write_relaxed and disable the interrupts while capturing the timestamps
> (as I did in my original RFC patch). But phc2sys takes by default 5 
> measurements
> and uses the one with the smallest delay, so this shouldn't be necessary.

The delay that phc2sys sees is the difference between post_ts and
pre_ts, which doesn't contain the actual delay, right? So, I'm not
sure how well the phc2sys filtering actually works. Even if the
measured delay was related to the write delay, would be 5 measurements
always enough to get a "correct" sample?

> Although, by adding 2 * ptp_sts_offset the system timestamp to post_ts
> the timestamp is aligned with the PHC timestamp, but this also increases
> the delay which is reported by phc2sys (~26us). But the maximum error
> which must be expected is definitely much less (< 1us). So maybe it is better
> to shift both timestamps pre_ts and post_ts by ptp_sts_offset.

If you could guarantee that [pre_ts + ptp_sts_offset, post_ts +
ptp_sts_offset] always contains the PHC timestamps, then that would be
great. From what Andrew is writing, this doesn't seem to be the case.

I'd suggest a different approach:
- specify a minimum delay for the write and a minimum delay for the
  completion (if they are not equal)
- take a second "post" system timestamp after the completion
- adjust pre_ts and post_ts so that the middle point is equal to what
  you have now, but the interval contains both
  pre_ts + min_write_delay and post2_ts - min_completion_delay

This way you get the best stability and also a delay that correctly
estimates the maximum error.

-- 
Miroslav Lichvar


Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts

2019-08-20 Thread Miroslav Lichvar
On Tue, Aug 20, 2019 at 10:48:31AM +0200, Hubert Feurstein wrote:

> + /* PTP offset compensation:
> +  * After the MDIO access is completed (from the chip perspective), the
> +  * switch chip will snapshot the PHC timestamp. To make sure our system
> +  * timestamp corresponds to the PHC timestamp, we have to add the
> +  * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
> +  * takes the average of pre_ts and post_ts to calculate the final
> +  * system timestamp. With this in mind, we have to add ptp_sts_offset
> +  * twice to post_ts, in order to not introduce an constant time offset.
> +  */
> + if (sts)
> + timespec64_add_ns(>post_ts, 2 * bus->ptp_sts_offset);

This correction looks good to me.

Is the MDIO write delay constant in reality, or does it at least have
an upper bound? That is, is it always true that the post_ts timestamp
does not point to a time before the PHC timestamp was actually taken?

This is important to not break the estimation of maximum error in the
measured offset. Applications using the ioctl may assume that the
maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
phc2sys). That would not work if the delay could be occasionally 50
microseconds for instance, i.e. the post_ts timestamp would be earlier
than the PHC timestamp.

-- 
Miroslav Lichvar


Re: [RFC] net: dsa: mv88e6xxx: ptp: improve phc2sys precision for mv88e6xxx switch in combination with imx6-fec

2019-08-05 Thread Miroslav Lichvar
On Mon, Aug 05, 2019 at 12:54:49PM +0300, Vladimir Oltean wrote:
> - Along the lines of the above, I wonder how badly would the
> maintainers shout at the proposal of adding a struct
> ptp_system_timestamp pointer and its associated lock in struct device.
> That way at least you don't have to change the API, like you did with
> mdiobus_write_nested_ptp. Relatively speaking, this is the least
> amount of intrusion (although, again, far from beautiful).

That would make sense to me, if there are other drivers that could use
it.

> I also added Miroslav Lichvar, who originally created the
> PTP_SYS_OFFSET_EXTENDED ioctl, maybe he can share some ideas on how it
> is best served.

The idea behind that ioctl was to allow drivers to take the system
timestamps as close to the reading of the HW clock as possible,
excluding delays (and jitter) due to other operations that are
necessary to get that timestamp. In the ethernet drivers that was a
single PCI read. If in this case it's a "write" operation that
triggers the reading of the HW clock, then I think the patch is
using is the ptp_read_system_*ts() calls correctly.

> > --- a/drivers/net/ethernet/freescale/fec_main.c
> > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > @@ -1814,11 +1814,25 @@ static int fec_enet_mdio_write(struct mii_bus *bus, 
> > int mii_id, int regnum,
> >
> > reinit_completion(>mdio_done);
> >
> > -   /* start a write op */
> > -   writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> > -   FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> > -   FEC_MMFR_TA | FEC_MMFR_DATA(value),
> > -   fep->hwp + FEC_MII_DATA);
> > +   if (bus->ptp_sts) {
> > +   unsigned long flags = 0;
> > +   local_irq_save(flags);
> > +   __iowmb();
> > +   /* >Take the timestamp *after* the memory barrier */
> > +   ptp_read_system_prets(bus->ptp_sts);
> > +   writel_relaxed(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> > +   FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> > +   FEC_MMFR_TA | FEC_MMFR_DATA(value),
> > +   fep->hwp + FEC_MII_DATA);
> > +   ptp_read_system_postts(bus->ptp_sts);
> > +   local_irq_restore(flags);
> > +   } else {
> > +   /* start a write op */
> > +   writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> > +   FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> > +   FEC_MMFR_TA | FEC_MMFR_DATA(value),
> > +   fep->hwp + FEC_MII_DATA);
> > +   }

-- 
Miroslav Lichvar


[tip:timers/core] kselftests: timers: freq-step: Update maximum acceptable precision and errors

2019-06-22 Thread tip-bot for Miroslav Lichvar
Commit-ID:  d21e43f2ef32ba3242687dbedb3c4b9a76b3eebc
Gitweb: https://git.kernel.org/tip/d21e43f2ef32ba3242687dbedb3c4b9a76b3eebc
Author: Miroslav Lichvar 
AuthorDate: Tue, 18 Jun 2019 18:06:12 +0200
Committer:  Thomas Gleixner 
CommitDate: Sat, 22 Jun 2019 11:28:53 +0200

kselftests: timers: freq-step: Update maximum acceptable precision and errors

PTI has a significant impact on precision of the MONOTONIC_RAW clock,
which prevents a lot of computers from running the freq-step test.
Increase the maximum acceptable precision for the test to not be skipped
to 500 nanoseconds.

After commit 78b98e3c5a66 ("timekeeping/ntp: Determine the multiplier
directly from NTP tick length") the frequency and time errors should be
much smaller. Reduce the maximum acceptable values for the test to pass
to 0.02 ppm and 50 nanoseconds respectively.

Signed-off-by: Miroslav Lichvar 
Signed-off-by: Thomas Gleixner 
Cc: John Stultz 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Link: https://lkml.kernel.org/r/20190618160612.21957-1-mlich...@redhat.com

---
 tools/testing/selftests/timers/freq-step.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/timers/freq-step.c 
b/tools/testing/selftests/timers/freq-step.c
index 8cd10662ffba..4b76450d78d1 100644
--- a/tools/testing/selftests/timers/freq-step.c
+++ b/tools/testing/selftests/timers/freq-step.c
@@ -21,9 +21,9 @@
 #define SAMPLE_READINGS 10
 #define MEAN_SAMPLE_INTERVAL 0.1
 #define STEP_INTERVAL 1.0
-#define MAX_PRECISION 100e-9
-#define MAX_FREQ_ERROR 10e-6
-#define MAX_STDDEV 1000e-9
+#define MAX_PRECISION 500e-9
+#define MAX_FREQ_ERROR 0.02e-6
+#define MAX_STDDEV 50e-9
 
 #ifndef ADJ_SETOFFSET
   #define ADJ_SETOFFSET 0x0100


[tip:timers/core] ntp: Limit TAI-UTC offset

2019-06-22 Thread tip-bot for Miroslav Lichvar
Commit-ID:  d897a4ab11dc8a9fda50d2eccc081a96a6385998
Gitweb: https://git.kernel.org/tip/d897a4ab11dc8a9fda50d2eccc081a96a6385998
Author: Miroslav Lichvar 
AuthorDate: Tue, 18 Jun 2019 17:47:13 +0200
Committer:  Thomas Gleixner 
CommitDate: Sat, 22 Jun 2019 11:28:53 +0200

ntp: Limit TAI-UTC offset

Don't allow the TAI-UTC offset of the system clock to be set by adjtimex()
to a value larger than 10 seconds.

This prevents an overflow in the conversion to int, prevents the CLOCK_TAI
clock from getting too far ahead of the CLOCK_REALTIME clock, and it is
still large enough to allow leap seconds to be inserted at the maximum rate
currently supported by the kernel (once per day) for the next ~270 years,
however unlikely it is that someone can survive a catastrophic event which
slowed down the rotation of the Earth so much.

Reported-by: Weikang shi 
Signed-off-by: Miroslav Lichvar 
Signed-off-by: Thomas Gleixner 
Cc: John Stultz 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Link: https://lkml.kernel.org/r/20190618154713.20929-1-mlich...@redhat.com

---
 kernel/time/ntp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 8de4f789dc1b..65eb796610dc 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -43,6 +43,7 @@ static u64tick_length_base;
 #define MAX_TICKADJ500LL   /* usecs */
 #define MAX_TICKADJ_SCALED \
(((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ)
+#define MAX_TAI_OFFSET 10
 
 /*
  * phase-lock loop variables
@@ -691,7 +692,8 @@ static inline void process_adjtimex_modes(const struct 
__kernel_timex *txc,
time_constant = max(time_constant, 0l);
}
 
-   if (txc->modes & ADJ_TAI && txc->constant >= 0)
+   if (txc->modes & ADJ_TAI &&
+   txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET)
*time_tai = txc->constant;
 
if (txc->modes & ADJ_OFFSET)


[PATCH repost] kselftests: timers: freq-step: Update maximum acceptable precision and errors

2019-06-18 Thread Miroslav Lichvar
PTI has a significant impact on precision of the MONOTONIC_RAW clock,
which prevents a lot of computers from running the freq-step test.
Increase the maximum acceptable precision for the test to not be skipped
to 500 nanoseconds.

After commit 78b98e3c5a66 ("timekeeping/ntp: Determine the multiplier
directly from NTP tick length") the frequency and time errors should be
much smaller. Reduce the maximum acceptable values for the test to pass
to 0.02 ppm and 50 nanoseconds respectively.

Cc: John Stultz 
Cc: Thomas Gleixner 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Signed-off-by: Miroslav Lichvar 
---
 tools/testing/selftests/timers/freq-step.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/timers/freq-step.c 
b/tools/testing/selftests/timers/freq-step.c
index 8cd10662ffba..4b76450d78d1 100644
--- a/tools/testing/selftests/timers/freq-step.c
+++ b/tools/testing/selftests/timers/freq-step.c
@@ -21,9 +21,9 @@
 #define SAMPLE_READINGS 10
 #define MEAN_SAMPLE_INTERVAL 0.1
 #define STEP_INTERVAL 1.0
-#define MAX_PRECISION 100e-9
-#define MAX_FREQ_ERROR 10e-6
-#define MAX_STDDEV 1000e-9
+#define MAX_PRECISION 500e-9
+#define MAX_FREQ_ERROR 0.02e-6
+#define MAX_STDDEV 50e-9
 
 #ifndef ADJ_SETOFFSET
   #define ADJ_SETOFFSET 0x0100
-- 
2.17.2



[PATCH] ntp: Limit TAI-UTC offset

2019-06-18 Thread Miroslav Lichvar
Don't allow the TAI-UTC offset of the system clock to be set by
adjtimex() to a value larger than 10 seconds.

This prevents an overflow in the conversion to int, prevents the
CLOCK_TAI clock from getting too far ahead of the CLOCK_REALTIME clock,
and it is still large enough to allow leap seconds to be inserted at the
maximum rate currently supported by the kernel (once per day) for the
next ~270 years, however unlikely it is that someone can survive a
catastrophic event which slowed down the rotation of the Earth so much.

Cc: John Stultz 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Signed-off-by: Miroslav Lichvar 
---
 kernel/time/ntp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 8de4f789dc1b..65eb796610dc 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -43,6 +43,7 @@ static u64tick_length_base;
 #define MAX_TICKADJ500LL   /* usecs */
 #define MAX_TICKADJ_SCALED \
(((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ)
+#define MAX_TAI_OFFSET 10
 
 /*
  * phase-lock loop variables
@@ -691,7 +692,8 @@ static inline void process_adjtimex_modes(const struct 
__kernel_timex *txc,
time_constant = max(time_constant, 0l);
}
 
-   if (txc->modes & ADJ_TAI && txc->constant >= 0)
+   if (txc->modes & ADJ_TAI &&
+   txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET)
*time_tai = txc->constant;
 
if (txc->modes & ADJ_OFFSET)
-- 
2.17.2



Re: [PATCH] time: fix a assignment error in ntp module

2019-06-17 Thread Miroslav Lichvar
On Mon, Jun 17, 2019 at 02:14:57PM +0200, Thomas Gleixner wrote:
> On Mon, 17 Jun 2019, 维康石 wrote:
> > Yes,the  >UINT_MAX value can be passed by
> > syscall adjtimex->do_adjtimex->__do_adjtimex->process_adjtimex_modes by the
> > proper arugments.
> 
> So there is clearly some sanity check missing, but surely not that
> type cast.

As the offset is saved in an int (and returned via adjtimex() in the
tai field), should be the maximum INT_MAX?

We probably also want to avoid overflow in the offset on a leap second
and the CLOCK_TAI clock itself, so maybe it would make sense to
specify a much smaller maximum like 100?

Even 1000 should be good enough for near future. Negative values are
not allowed anyway. If the Earth's rotation changed significantly
(e.g. hitting a very large asteroid), there probably wouldn't be
anyone left to care about TAI. 

-- 
Miroslav Lichvar


[tip:timers/urgent] ntp: Allow TAI-UTC offset to be set to zero

2019-05-09 Thread tip-bot for Miroslav Lichvar
Commit-ID:  fdc6bae940ee9eb869e493990540098b8c0fd6ab
Gitweb: https://git.kernel.org/tip/fdc6bae940ee9eb869e493990540098b8c0fd6ab
Author: Miroslav Lichvar 
AuthorDate: Wed, 17 Apr 2019 10:48:33 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 9 May 2019 10:46:58 +0200

ntp: Allow TAI-UTC offset to be set to zero

The ADJ_TAI adjtimex mode sets the TAI-UTC offset of the system clock.
It is typically set by NTP/PTP implementations and it is automatically
updated by the kernel on leap seconds. The initial value is zero (which
applications may interpret as unknown), but this value cannot be set by
adjtimex. This limitation seems to go back to the original "nanokernel"
implementation by David Mills.

Change the ADJ_TAI check to accept zero as a valid TAI-UTC offset in
order to allow setting it back to the initial value.

Fixes: 153b5d054ac2 ("ntp: support for TAI")
Suggested-by: Ondrej Mosnacek 
Signed-off-by: Miroslav Lichvar 
Signed-off-by: Thomas Gleixner 
Cc: John Stultz 
Cc: Richard Cochran 
Cc: Prarit Bhargava 
Link: https://lkml.kernel.org/r/20190417084833.7401-1-mlich...@redhat.com

---
 kernel/time/ntp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 92a90014a925..f43d47c8c3b6 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -690,7 +690,7 @@ static inline void process_adjtimex_modes(const struct 
__kernel_timex *txc,
time_constant = max(time_constant, 0l);
}
 
-   if (txc->modes & ADJ_TAI && txc->constant > 0)
+   if (txc->modes & ADJ_TAI && txc->constant >= 0)
*time_tai = txc->constant;
 
if (txc->modes & ADJ_OFFSET)


Re: [PATCH] ntp: Allow TAI-UTC offset to be set to zero

2019-04-18 Thread Miroslav Lichvar
On Wed, Apr 17, 2019 at 11:00:23AM +0200, Ondrej Mosnacek wrote:
> On Wed, Apr 17, 2019 at 10:48 AM Miroslav Lichvar  wrote:
> > Change the ADJ_TAI check to accept zero as a valid TAI-UTC offset in
> > order to allow setting it back to the initial value.

> Thanks for sending the patch! Maybe you (or the committer) could
> consider adding:
> 
> Fixes: 153b5d054ac2 ("ntp: support for TAI")

To me the change looks more like an extension of the API, rather than
a bug fix, so I'd leave that up to the committer.

-- 
Miroslav Lichvar


[PATCH] ntp: Allow TAI-UTC offset to be set to zero

2019-04-17 Thread Miroslav Lichvar
The ADJ_TAI adjtimex mode sets the TAI-UTC offset of the system clock.
It is typically set by NTP/PTP implementations and it is automatically
updated by the kernel on leap seconds. The initial value is zero (which
applications may interpret as unknown), but this value cannot be set by
adjtimex. This limitation seems to go back to the original "nanokernel"
implementation by David Mills.

Change the ADJ_TAI check to accept zero as a valid TAI-UTC offset in
order to allow setting it back to the initial value.

Cc: Thomas Gleixner 
Cc: John Stultz 
Cc: Richard Cochran 
Cc: Prarit Bhargava 
Suggested-by: Ondrej Mosnacek 
Signed-off-by: Miroslav Lichvar 
---
 kernel/time/ntp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 92a90014a925..f43d47c8c3b6 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -690,7 +690,7 @@ static inline void process_adjtimex_modes(const struct 
__kernel_timex *txc,
time_constant = max(time_constant, 0l);
}
 
-   if (txc->modes & ADJ_TAI && txc->constant > 0)
+   if (txc->modes & ADJ_TAI && txc->constant >= 0)
*time_tai = txc->constant;
 
if (txc->modes & ADJ_OFFSET)
-- 
2.17.2



Re: kernel/time/ntp.c: Possible off-by-one error in TAI range check?

2019-04-15 Thread Miroslav Lichvar
On Mon, Apr 15, 2019 at 10:09:43AM +0200, Thomas Gleixner wrote:
> > On Mon, Apr 8, 2019 at 10:47 AM Ondrej Mosnacek  wrote:
> > > Commit 153b5d054ac2 ("ntp: support for TAI") added a possibility to
> > > change the TAI offset from userspace via adjtimex(2). The code checks
> > > if the input value (txc->constant) is greater than 0 and if it is not,
> > > then it doesn't modify the value. Ignoring the fact that this check
> > > should probably be in timekeeping_validate_timex() and cause -EINVAL
> > > to be returned when false, I find it strange that the check doesn't
> > > allow to set the value to 0, which seems to be the default value...
> > >
> > > Was this behavior intended or should the code actually check for
> > > txc->constant >= 0 instead of txc->constant > 0?

I guess zero here means "unknown" and maybe the intention was to not
allow setting the offset to an unknown value once it has been set to a
valid value. The trouble is that after inserting a leap second the
offset may change from zero to one.

I think it should be changed to allow setting the offset to zero.

-- 
Miroslav Lichvar


Re: [PATCH] timekeeping: Force upper bound for setting CLOCK_REALTIME

2019-03-26 Thread Miroslav Lichvar
On Sat, Mar 23, 2019 at 11:36:19AM +0100, Thomas Gleixner wrote:
> It is reasonable to force an upper bound for the various methods of setting
> CLOCK_REALTIME. Year 2262 is the absolute upper bound. Assume a maximum
> uptime of 30 years which is plenty enough even for esoteric embedded
> systems. That results in an upper bound of year 2232 for setting the time.

The patch looks good to me.

I like this approach better than using a larger value closer to the
overflow (e.g. one week) and stepping the clock back automatically
when the clock reaches that time, but I suspect it might possibly
break more tests (or any unusual applications messing with time) as a
much larger interval is now EINVAL.

Thanks,

-- 
Miroslav Lichvar


Re: [RFC PATCH] ntp: Avoid undefined behaviour in second_overflow()

2019-03-06 Thread Miroslav Lichvar
On Tue, Mar 05, 2019 at 05:42:25PM -0800, John Stultz wrote:
> > --- a/kernel/time/ntp.c
> > +++ b/kernel/time/ntp.c
> > @@ -677,6 +677,8 @@ static inline void process_adjtimex_modes(const struct 
> > timex *txc, s32 *time_tai
> >
> > if (txc->modes & ADJ_MAXERROR)
> > time_maxerror = txc->maxerror;
> > +   if (time_maxerror > NTP_PHASE_LIMIT)
> > +   time_maxerror = NTP_PHASE_LIMIT;
> 
> This looks sane to me.
> Acked-by: John Stultz 
> 
> Though it makes me wonder a bit more about the sanity checking on the
> other parameters passed via adjtimex(), tick_usec for instance looks
> like it could be similarly problematic.

The tick length is checked earlier in timekeeping_validate_timex(), so
that should be ok.

What I'd like to see clamped is the system time itself. ktime_t
overflows on Apr 11 2262. clock_settime() and adjtimex(ADJ_SETOFFSET)
can set the time close to the overflow and let everything break.

Boot a VM and try this:

# date -s 'Apr 11 23:47:15 UTC 2262'

There was a patch submitted couple years ago that prevented overflows
in 32-bit time_t and ktime_t.

http://lkml.iu.edu/hypermail/linux/kernel/1510.0/04719.html

-- 
Miroslav Lichvar


Re: TSC to Mono-raw Drift

2018-11-02 Thread Miroslav Lichvar
On Fri, Nov 02, 2018 at 12:25:56PM +0100, Thomas Gleixner wrote:
> On Fri, 2 Nov 2018, Miroslav Lichvar wrote:
> > clocksource -> MONOTONIC_RAW -> MONOTONIC/REALTIME
> > 
> > or 
> > 
> > clocksource -> ? -> MONOTONIC_RAW
> >  -> MONOTONIC/REALTIME
> 
> That's what we have now. At least I don't see how the raw thing is coupled
> in NTP.

Oh, right.

So, for the first approach when the raw clock is stepped, the same
step needs to be applied to the mono clock. That means the mono clock
will have two large independent corrections applied to it, which will
make it less stable, e.g. steps of up to 8 ns at 100 Hz instead of 4
ns. Otherwise they would drift from each other when not synchronized
by NTP/PTP.

In the second approach, the corrections of the two clocks are
independent, but the NTP tick length needs to be adjusted before the
multiplier is calculated to match the frequency of the raw clock. If
the correction was applied later, I suspect it would require a +2 mult
adjustment or larger steps as in the first approach.

That tick adjustment generally won't be perfectly accurate, so an
extra small correction would be needed to keep the clocks in sync over
long intervals. This could be a +1 adjustment of the NTP tick length,
or it could be a small forward step. I'm not sure what would be easier
to implement.

Does that make sense?

-- 
Miroslav Lichvar


Re: TSC to Mono-raw Drift

2018-11-02 Thread Miroslav Lichvar
On Fri, Nov 02, 2018 at 12:25:56PM +0100, Thomas Gleixner wrote:
> On Fri, 2 Nov 2018, Miroslav Lichvar wrote:
> > clocksource -> MONOTONIC_RAW -> MONOTONIC/REALTIME
> > 
> > or 
> > 
> > clocksource -> ? -> MONOTONIC_RAW
> >  -> MONOTONIC/REALTIME
> 
> That's what we have now. At least I don't see how the raw thing is coupled
> in NTP.

Oh, right.

So, for the first approach when the raw clock is stepped, the same
step needs to be applied to the mono clock. That means the mono clock
will have two large independent corrections applied to it, which will
make it less stable, e.g. steps of up to 8 ns at 100 Hz instead of 4
ns. Otherwise they would drift from each other when not synchronized
by NTP/PTP.

In the second approach, the corrections of the two clocks are
independent, but the NTP tick length needs to be adjusted before the
multiplier is calculated to match the frequency of the raw clock. If
the correction was applied later, I suspect it would require a +2 mult
adjustment or larger steps as in the first approach.

That tick adjustment generally won't be perfectly accurate, so an
extra small correction would be needed to keep the clocks in sync over
long intervals. This could be a +1 adjustment of the NTP tick length,
or it could be a small forward step. I'm not sure what would be easier
to implement.

Does that make sense?

-- 
Miroslav Lichvar


Re: TSC to Mono-raw Drift

2018-11-02 Thread Miroslav Lichvar
On Thu, Nov 01, 2018 at 07:03:37PM +0100, Thomas Gleixner wrote:
> On Thu, 1 Nov 2018, John Stultz wrote:
> > On Thu, Nov 1, 2018 at 10:44 AM, Thomas Gleixner  wrote:
> > > On Tue, 23 Oct 2018, John Stultz wrote:
> > >> However, to be correct, the ntp adjustments made would have to be made
> > >> to both the base interval + error, which mucks the math up a fair bit.
> > >
> > > Hmm, confused as usual. Why would you need to do anything like that?
> > 
> > Because the NTP adjustment is done off of what is now the raw clock.
> > If the raw clock is "corrected" the ppb adjustment has to be done off
> > of that corrected rate.
> 
> Sure, but why would that require any change? Right now the raw clock is
> slightly off and you correct clock monotonic against NTP. So with that
> extra correction you just see a slightly different raw clock slew and work
> from there.

It makes sense to me.

I think there are basically two different ways how it could be done.
One is to correct the frequency of the raw clock, on which sits the
mono/real clock. The other is to create a new raw clock which is
separate from the mono/real clock, and add an offset to the NTP
frequency to match the frequencies of the two clocks when not
synchronized by NTP/PTP. The latter would provide a more stable
mono/real clock.

clocksource -> MONOTONIC_RAW -> MONOTONIC/REALTIME

or 

clocksource -> ? -> MONOTONIC_RAW
 -> MONOTONIC/REALTIME

-- 
Miroslav Lichvar


Re: TSC to Mono-raw Drift

2018-11-02 Thread Miroslav Lichvar
On Thu, Nov 01, 2018 at 07:03:37PM +0100, Thomas Gleixner wrote:
> On Thu, 1 Nov 2018, John Stultz wrote:
> > On Thu, Nov 1, 2018 at 10:44 AM, Thomas Gleixner  wrote:
> > > On Tue, 23 Oct 2018, John Stultz wrote:
> > >> However, to be correct, the ntp adjustments made would have to be made
> > >> to both the base interval + error, which mucks the math up a fair bit.
> > >
> > > Hmm, confused as usual. Why would you need to do anything like that?
> > 
> > Because the NTP adjustment is done off of what is now the raw clock.
> > If the raw clock is "corrected" the ppb adjustment has to be done off
> > of that corrected rate.
> 
> Sure, but why would that require any change? Right now the raw clock is
> slightly off and you correct clock monotonic against NTP. So with that
> extra correction you just see a slightly different raw clock slew and work
> from there.

It makes sense to me.

I think there are basically two different ways how it could be done.
One is to correct the frequency of the raw clock, on which sits the
mono/real clock. The other is to create a new raw clock which is
separate from the mono/real clock, and add an offset to the NTP
frequency to match the frequencies of the two clocks when not
synchronized by NTP/PTP. The latter would provide a more stable
mono/real clock.

clocksource -> MONOTONIC_RAW -> MONOTONIC/REALTIME

or 

clocksource -> ? -> MONOTONIC_RAW
 -> MONOTONIC/REALTIME

-- 
Miroslav Lichvar


Re: TSC to Mono-raw Drift

2018-11-02 Thread Miroslav Lichvar
On Thu, Nov 01, 2018 at 06:41:00PM +0100, Thomas Gleixner wrote:
> On Wed, 24 Oct 2018, Miroslav Lichvar wrote:
> > The error is too large to be corrected by stepping on clock updates.
> > For a typical TSC frequency we have multiplier in the range of few
> > millions, so that's a frequency error of up to few hundred ppb. In the
> > old days when the clock was updated 1000 times per second that would
> > be hidden in the resolution of the clock, but now with tickless
> > kernels those steps would be very noticeable.

> That only happens when the system was completely idle for a second and in
> that case it's a non issue because the clock is updated before it's
> used. So nothing will be able to observe the time jumping forward by a few
> or even a few hundreds of nanoseconds.

That's great news (to me). I think we should do the same with the
mono/real clock. A periodic 4ns step would be better than a slew
correcting tens or hundreds of nanoseconds. This would be a
significant improvement in accuracy on idle systems, in theory
identical to running with nohz=off.

Maybe I am missing some important detail, but I think we can just drop
the +1 mult adjustment and step on each update by the (truncated)
amount that has accumulated in the NTP error register. With the
changes that have been made earlier this year the clock should never
be ahead, so the step would always be forward.

> For the regular case, where CPUs are
> busy and the update happens 100/250/1000 times per second the jump forward
> will not be noticable at all.

I think a 4ns jump at 100 Hz might be noticeable with a good reference
clock and large number of measurements, but so would be the current +1
mult adjustment.

-- 
Miroslav Lichvar


Re: TSC to Mono-raw Drift

2018-11-02 Thread Miroslav Lichvar
On Thu, Nov 01, 2018 at 06:41:00PM +0100, Thomas Gleixner wrote:
> On Wed, 24 Oct 2018, Miroslav Lichvar wrote:
> > The error is too large to be corrected by stepping on clock updates.
> > For a typical TSC frequency we have multiplier in the range of few
> > millions, so that's a frequency error of up to few hundred ppb. In the
> > old days when the clock was updated 1000 times per second that would
> > be hidden in the resolution of the clock, but now with tickless
> > kernels those steps would be very noticeable.

> That only happens when the system was completely idle for a second and in
> that case it's a non issue because the clock is updated before it's
> used. So nothing will be able to observe the time jumping forward by a few
> or even a few hundreds of nanoseconds.

That's great news (to me). I think we should do the same with the
mono/real clock. A periodic 4ns step would be better than a slew
correcting tens or hundreds of nanoseconds. This would be a
significant improvement in accuracy on idle systems, in theory
identical to running with nohz=off.

Maybe I am missing some important detail, but I think we can just drop
the +1 mult adjustment and step on each update by the (truncated)
amount that has accumulated in the NTP error register. With the
changes that have been made earlier this year the clock should never
be ahead, so the step would always be forward.

> For the regular case, where CPUs are
> busy and the update happens 100/250/1000 times per second the jump forward
> will not be noticable at all.

I think a 4ns jump at 100 Hz might be noticeable with a good reference
clock and large number of measurements, but so would be the current +1
mult adjustment.

-- 
Miroslav Lichvar


Re: TSC to Mono-raw Drift

2018-10-25 Thread Miroslav Lichvar
On Wed, Oct 24, 2018 at 01:32:48PM -0400, Christopher Hall wrote:
> On Wed, Oct 24, 2018 at 04:51:13PM +0200, Miroslav Lichvar wrote:
> > The error is too large to be corrected by stepping on clock updates.
> > For a typical TSC frequency we have multiplier in the range of few
> > millions, so that's a frequency error of up to few hundred ppb. In the
> 
> For this example, a i5-6600 CPU @ 3.30GHz I measured 90 ppb. That is the
> largest I've seen since measuring this on a few platforms. 20-40 PPB seems
> more typical.

It looks like there is rounding in the calculation of the multiplier,
so for frequencies in the range of 3-5GHz the error will be up to
about 149 ppb. Without rounding it would be twice as much, but always
fast or slow.

> > A better fix might be to modify the calculation of time to use a
> > second multiplier, effectively increasing its resolution. However,
> 
> I'm not sure, I'm understanding. Like cascading transforms? While this
> would increase the precision, I think it would still drift over days. We
> could probably fix up every second though.

I was thinking about using a wider multiplier, but avoiding a full
64x64 bit multiplication.

For example, in your case with the 3312MHz clock

nsec = (cycles * 5065585) >> 24

could be replaced with

nsec = (cycles * 5065584) >> 24 + (cycles * 4538763) >> 47

This would take few hours to drift by one nanosecond. If something
like this was implemented for the raw clock, it would probably make
sense to switch the other clocks too.

> > that would slow down all users of the clock.
> 
> Couldn't some clocksources specify an additional multiplier/precision and
> others use lower precision?

I guess it could, but I'm not sure if people here would be happy with
the extra complexity.

-- 
Miroslav Lichvar


Re: TSC to Mono-raw Drift

2018-10-25 Thread Miroslav Lichvar
On Wed, Oct 24, 2018 at 01:32:48PM -0400, Christopher Hall wrote:
> On Wed, Oct 24, 2018 at 04:51:13PM +0200, Miroslav Lichvar wrote:
> > The error is too large to be corrected by stepping on clock updates.
> > For a typical TSC frequency we have multiplier in the range of few
> > millions, so that's a frequency error of up to few hundred ppb. In the
> 
> For this example, a i5-6600 CPU @ 3.30GHz I measured 90 ppb. That is the
> largest I've seen since measuring this on a few platforms. 20-40 PPB seems
> more typical.

It looks like there is rounding in the calculation of the multiplier,
so for frequencies in the range of 3-5GHz the error will be up to
about 149 ppb. Without rounding it would be twice as much, but always
fast or slow.

> > A better fix might be to modify the calculation of time to use a
> > second multiplier, effectively increasing its resolution. However,
> 
> I'm not sure, I'm understanding. Like cascading transforms? While this
> would increase the precision, I think it would still drift over days. We
> could probably fix up every second though.

I was thinking about using a wider multiplier, but avoiding a full
64x64 bit multiplication.

For example, in your case with the 3312MHz clock

nsec = (cycles * 5065585) >> 24

could be replaced with

nsec = (cycles * 5065584) >> 24 + (cycles * 4538763) >> 47

This would take few hours to drift by one nanosecond. If something
like this was implemented for the raw clock, it would probably make
sense to switch the other clocks too.

> > that would slow down all users of the clock.
> 
> Couldn't some clocksources specify an additional multiplier/precision and
> others use lower precision?

I guess it could, but I'm not sure if people here would be happy with
the extra complexity.

-- 
Miroslav Lichvar


Re: TSC to Mono-raw Drift

2018-10-24 Thread Miroslav Lichvar
On Tue, Oct 23, 2018 at 11:31:00AM -0700, John Stultz wrote:
> On Fri, Oct 19, 2018 at 3:36 PM, John Stultz  wrote:
> > On Fri, Oct 19, 2018 at 1:50 PM, Thomas Gleixner  wrote:
> >> On Fri, 19 Oct 2018, John Stultz wrote:
> >>> We might be able to reduce the degree in this case, but I worry the
> >>> extra complexity may only cause problems for others.
> >>
> >> Is it really that complex to add a fixed correction value periodically?

The error is too large to be corrected by stepping on clock updates.
For a typical TSC frequency we have multiplier in the range of few
millions, so that's a frequency error of up to few hundred ppb. In the
old days when the clock was updated 1000 times per second that would
be hidden in the resolution of the clock, but now with tickless
kernels those steps would be very noticeable.

If the multiplier was adjusted in the same way as the non-raw clock,
there wouldn't be any steps in time, but there would be steps in
frequency and the time error would be proportional to the update
interval. For a clock updated once per second that's an error of up to
a few hundreds of nanoseconds.

I agree with John. I think the raw monotonic clock should be stable.

It would help if we better understood the use case. If I needed a
clock that ticks in an exact proportion to the tsc, why wouldn't I use
the tsc directly? Is this about having a fall back in case the tsc
cannot be used (e.g. due to unsynchronized CPUs)?

If the frequency error was exported, it could be compensated where
necessary. Maybe that would work for the original poster?

A better fix might be to modify the calculation of time to use a
second multiplier, effectively increasing its resolution. However,
that would slow down all users of the clock.

-- 
Miroslav Lichvar


Re: TSC to Mono-raw Drift

2018-10-24 Thread Miroslav Lichvar
On Tue, Oct 23, 2018 at 11:31:00AM -0700, John Stultz wrote:
> On Fri, Oct 19, 2018 at 3:36 PM, John Stultz  wrote:
> > On Fri, Oct 19, 2018 at 1:50 PM, Thomas Gleixner  wrote:
> >> On Fri, 19 Oct 2018, John Stultz wrote:
> >>> We might be able to reduce the degree in this case, but I worry the
> >>> extra complexity may only cause problems for others.
> >>
> >> Is it really that complex to add a fixed correction value periodically?

The error is too large to be corrected by stepping on clock updates.
For a typical TSC frequency we have multiplier in the range of few
millions, so that's a frequency error of up to few hundred ppb. In the
old days when the clock was updated 1000 times per second that would
be hidden in the resolution of the clock, but now with tickless
kernels those steps would be very noticeable.

If the multiplier was adjusted in the same way as the non-raw clock,
there wouldn't be any steps in time, but there would be steps in
frequency and the time error would be proportional to the update
interval. For a clock updated once per second that's an error of up to
a few hundreds of nanoseconds.

I agree with John. I think the raw monotonic clock should be stable.

It would help if we better understood the use case. If I needed a
clock that ticks in an exact proportion to the tsc, why wouldn't I use
the tsc directly? Is this about having a fall back in case the tsc
cannot be used (e.g. due to unsynchronized CPUs)?

If the frequency error was exported, it could be compensated where
necessary. Maybe that would work for the original poster?

A better fix might be to modify the calculation of time to use a
second multiplier, effectively increasing its resolution. However,
that would slow down all users of the clock.

-- 
Miroslav Lichvar


Re: [RFC][PATCH v2] selftest: timers: Tweak raw_skew to SKIP when ADJ_OFFSET/other clock adjustments are in progress

2018-07-09 Thread Miroslav Lichvar
On Thu, Jul 05, 2018 at 10:50:31AM -0700, John Stultz wrote:
> In the past we've warned when ADJ_OFFSET was in progress, usually
> caused by ntpd or some other time adjusting daemon running in non
> steady sate, which can cause the skew calculations to be incorrect
> 
> Thus, this patch sets a flag which we check if the clock was being
> adjusted when we fail so that we don't cause false negatives.

The commit message might need an update as the code no longer sets a
flag. Other than that, it looks good to me.

Thanks,

> --- a/tools/testing/selftests/timers/raw_skew.c
> +++ b/tools/testing/selftests/timers/raw_skew.c
> @@ -134,6 +134,11 @@ int main(int argv, char **argc)
>   printf(" %lld.%i(act)", ppm/1000, abs((int)(ppm%1000)));
>  
>   if (llabs(eppm - ppm) > 1000) {
> + if (tx1.offset || tx2.offset ||
> + tx1.freq != tx2.freq || tx1.tick != tx2.tick) {
> + printf("[SKIP]\n");
> + return ksft_exit_skip("The clock was adjusted 
> externally. Shutdown NTPd or other time sync daemons\n");
> + }
>   printf("[FAILED]\n");
>   return ksft_exit_fail();
>   }
> -- 
> 2.7.4
> 

-- 
Miroslav Lichvar


Re: [RFC][PATCH v2] selftest: timers: Tweak raw_skew to SKIP when ADJ_OFFSET/other clock adjustments are in progress

2018-07-09 Thread Miroslav Lichvar
On Thu, Jul 05, 2018 at 10:50:31AM -0700, John Stultz wrote:
> In the past we've warned when ADJ_OFFSET was in progress, usually
> caused by ntpd or some other time adjusting daemon running in non
> steady sate, which can cause the skew calculations to be incorrect
> 
> Thus, this patch sets a flag which we check if the clock was being
> adjusted when we fail so that we don't cause false negatives.

The commit message might need an update as the code no longer sets a
flag. Other than that, it looks good to me.

Thanks,

> --- a/tools/testing/selftests/timers/raw_skew.c
> +++ b/tools/testing/selftests/timers/raw_skew.c
> @@ -134,6 +134,11 @@ int main(int argv, char **argc)
>   printf(" %lld.%i(act)", ppm/1000, abs((int)(ppm%1000)));
>  
>   if (llabs(eppm - ppm) > 1000) {
> + if (tx1.offset || tx2.offset ||
> + tx1.freq != tx2.freq || tx1.tick != tx2.tick) {
> + printf("[SKIP]\n");
> + return ksft_exit_skip("The clock was adjusted 
> externally. Shutdown NTPd or other time sync daemons\n");
> + }
>   printf("[FAILED]\n");
>   return ksft_exit_fail();
>   }
> -- 
> 2.7.4
> 

-- 
Miroslav Lichvar


Re: [RFC][PATCH] selftest: timers: Tweak raw_skew to SKIP when ADJ_OFFSET is in progress

2018-07-04 Thread Miroslav Lichvar
On Tue, Jul 03, 2018 at 12:36:27PM -0700, John Stultz wrote:
> In the past we've warned when ADJ_OFFSET was in progress, usually
> caused by ntpd or some other time adjusting daemon running in non
> steady sate, which can cause the skew calculations to be incorrect

Good idea.

> Thus, this patch sets a flag which we check when we fail so that
> we don't cause false negatives.

I'd suggest to check few more things to better detect when something
is adjusting the clock. The offset should be zero also at the end of
the test and the frequency and tick should be the same as at the
beginning of the test. This will detect daemons that set the frequency
of the clock directly, not using adjtime() or the PLL.

Something like this:

@@ -137,9 +135,10 @@ int main(int argv, char **argc)
printf(" %lld.%i(act)", ppm/1000, abs((int)(ppm%1000)));
 
if (llabs(eppm - ppm) > 1000) {
-   if (adj_offset_bad) {
+   if (tx1.offset || tx2.offset ||
+   tx1.freq != tx2.freq || tx1.tick != tx2.tick) {
printf("[SKIP]\n");
-   return ksft_exit_skip("ADJ_OFFSET in progress. Shutdown 
NTPd or other time steering daemons\n");
+   return ksft_exit_skip("The clock was adjusted. Shutdown 
ntpd or other time steering daemons\n");
}
printf("    [FAILED]\n");
return ksft_exit_fail();


-- 
Miroslav Lichvar


Re: [RFC][PATCH] selftest: timers: Tweak raw_skew to SKIP when ADJ_OFFSET is in progress

2018-07-04 Thread Miroslav Lichvar
On Tue, Jul 03, 2018 at 12:36:27PM -0700, John Stultz wrote:
> In the past we've warned when ADJ_OFFSET was in progress, usually
> caused by ntpd or some other time adjusting daemon running in non
> steady sate, which can cause the skew calculations to be incorrect

Good idea.

> Thus, this patch sets a flag which we check when we fail so that
> we don't cause false negatives.

I'd suggest to check few more things to better detect when something
is adjusting the clock. The offset should be zero also at the end of
the test and the frequency and tick should be the same as at the
beginning of the test. This will detect daemons that set the frequency
of the clock directly, not using adjtime() or the PLL.

Something like this:

@@ -137,9 +135,10 @@ int main(int argv, char **argc)
printf(" %lld.%i(act)", ppm/1000, abs((int)(ppm%1000)));
 
if (llabs(eppm - ppm) > 1000) {
-   if (adj_offset_bad) {
+   if (tx1.offset || tx2.offset ||
+   tx1.freq != tx2.freq || tx1.tick != tx2.tick) {
printf("[SKIP]\n");
-   return ksft_exit_skip("ADJ_OFFSET in progress. Shutdown 
NTPd or other time steering daemons\n");
+   return ksft_exit_skip("The clock was adjusted. Shutdown 
ntpd or other time steering daemons\n");
}
printf("    [FAILED]\n");
return ksft_exit_fail();


-- 
Miroslav Lichvar


[PATCHv2] timekeeping: Update multiplier when NTP frequency is set directly

2018-06-04 Thread Miroslav Lichvar
When the NTP frequency is set directly from userspace using the
ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the
timekeeper's multiplier instead of waiting for the next tick.

This removes a hidden non-deterministic delay in setting of the
frequency and allows an extremely tight control of the system clock
with update rates close to or even exceeding the kernel HZ.

The update is limited to archs using modern timekeeping
(!ARCH_USES_GETTIMEOFFSET).

Cc: Thomas Gleixner 
Cc: John Stultz 
Cc: Richard Cochran 
Cc: Prarit Bhargava 
Signed-off-by: Miroslav Lichvar 
---

Notes:
RFC->v1:
- added a new parameter to force the update of the timekeeper to the current
  NTP tick length only from adjtimex()
- added timekeeping_advance() to keep the parameter local to timekeeping.c
v1->v2:
- replaced bool parameter with enum
- changed timekeeping_advance() to not make any updates for TK_ADV_FREQ with
  !ARCH_USES_GETTIMEOFFSET (an alternative might be to update with zero 
offset)

 kernel/time/timekeeping.c | 36 ++--
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 49cbceef5deb..a08ef3771ab5 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -34,6 +34,14 @@
 #define TK_MIRROR  (1 << 1)
 #define TK_CLOCK_WAS_SET   (1 << 2)
 
+enum timekeeping_adv_mode {
+   /* Update timekeeper when a tick has passed */
+   TK_ADV_TICK,
+
+   /* Update timekeeper on a direct frequency change */
+   TK_ADV_FREQ
+};
+
 /*
  * The most important data for readout fits into a single 64 byte
  * cache line.
@@ -2021,11 +2029,11 @@ static u64 logarithmic_accumulation(struct timekeeper 
*tk, u64 offset,
return offset;
 }
 
-/**
- * update_wall_time - Uses the current clocksource to increment the wall time
- *
+/*
+ * timekeeping_advance - Updates the timekeeper to the current time and
+ * current NTP tick length
  */
-void update_wall_time(void)
+static void timekeeping_advance(enum timekeeping_adv_mode mode)
 {
struct timekeeper *real_tk = _core.timekeeper;
struct timekeeper *tk = _timekeeper;
@@ -2042,14 +2050,17 @@ void update_wall_time(void)
 
 #ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
offset = real_tk->cycle_interval;
+
+   if (mode != TK_ADV_TICK)
+   goto out;
 #else
offset = clocksource_delta(tk_clock_read(>tkr_mono),
   tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
-#endif
 
/* Check if there's really nothing to do */
-   if (offset < real_tk->cycle_interval)
+   if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
goto out;
+#endif
 
/* Do some additional sanity checking */
timekeeping_check_update(tk, offset);
@@ -2105,6 +2116,15 @@ void update_wall_time(void)
clock_was_set_delayed();
 }
 
+/**
+ * update_wall_time - Uses the current clocksource to increment the wall time
+ *
+ */
+void update_wall_time(void)
+{
+   timekeeping_advance(TK_ADV_TICK);
+}
+
 /**
  * getboottime64 - Return the real time of system boot.
  * @ts:pointer to the timespec64 to be set
@@ -2332,6 +2352,10 @@ int do_adjtimex(struct timex *txc)
write_seqcount_end(_core.seq);
raw_spin_unlock_irqrestore(_lock, flags);
 
+   /* Update the multiplier immediately if frequency was set directly */
+   if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK))
+   timekeeping_advance(TK_ADV_FREQ);
+
if (tai != orig_tai)
clock_was_set();
 
-- 
2.14.3



[PATCHv2] timekeeping: Update multiplier when NTP frequency is set directly

2018-06-04 Thread Miroslav Lichvar
When the NTP frequency is set directly from userspace using the
ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the
timekeeper's multiplier instead of waiting for the next tick.

This removes a hidden non-deterministic delay in setting of the
frequency and allows an extremely tight control of the system clock
with update rates close to or even exceeding the kernel HZ.

The update is limited to archs using modern timekeeping
(!ARCH_USES_GETTIMEOFFSET).

Cc: Thomas Gleixner 
Cc: John Stultz 
Cc: Richard Cochran 
Cc: Prarit Bhargava 
Signed-off-by: Miroslav Lichvar 
---

Notes:
RFC->v1:
- added a new parameter to force the update of the timekeeper to the current
  NTP tick length only from adjtimex()
- added timekeeping_advance() to keep the parameter local to timekeeping.c
v1->v2:
- replaced bool parameter with enum
- changed timekeeping_advance() to not make any updates for TK_ADV_FREQ with
  !ARCH_USES_GETTIMEOFFSET (an alternative might be to update with zero 
offset)

 kernel/time/timekeeping.c | 36 ++--
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 49cbceef5deb..a08ef3771ab5 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -34,6 +34,14 @@
 #define TK_MIRROR  (1 << 1)
 #define TK_CLOCK_WAS_SET   (1 << 2)
 
+enum timekeeping_adv_mode {
+   /* Update timekeeper when a tick has passed */
+   TK_ADV_TICK,
+
+   /* Update timekeeper on a direct frequency change */
+   TK_ADV_FREQ
+};
+
 /*
  * The most important data for readout fits into a single 64 byte
  * cache line.
@@ -2021,11 +2029,11 @@ static u64 logarithmic_accumulation(struct timekeeper 
*tk, u64 offset,
return offset;
 }
 
-/**
- * update_wall_time - Uses the current clocksource to increment the wall time
- *
+/*
+ * timekeeping_advance - Updates the timekeeper to the current time and
+ * current NTP tick length
  */
-void update_wall_time(void)
+static void timekeeping_advance(enum timekeeping_adv_mode mode)
 {
struct timekeeper *real_tk = _core.timekeeper;
struct timekeeper *tk = _timekeeper;
@@ -2042,14 +2050,17 @@ void update_wall_time(void)
 
 #ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
offset = real_tk->cycle_interval;
+
+   if (mode != TK_ADV_TICK)
+   goto out;
 #else
offset = clocksource_delta(tk_clock_read(>tkr_mono),
   tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
-#endif
 
/* Check if there's really nothing to do */
-   if (offset < real_tk->cycle_interval)
+   if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
goto out;
+#endif
 
/* Do some additional sanity checking */
timekeeping_check_update(tk, offset);
@@ -2105,6 +2116,15 @@ void update_wall_time(void)
clock_was_set_delayed();
 }
 
+/**
+ * update_wall_time - Uses the current clocksource to increment the wall time
+ *
+ */
+void update_wall_time(void)
+{
+   timekeeping_advance(TK_ADV_TICK);
+}
+
 /**
  * getboottime64 - Return the real time of system boot.
  * @ts:pointer to the timespec64 to be set
@@ -2332,6 +2352,10 @@ int do_adjtimex(struct timex *txc)
write_seqcount_end(_core.seq);
raw_spin_unlock_irqrestore(_lock, flags);
 
+   /* Update the multiplier immediately if frequency was set directly */
+   if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK))
+   timekeeping_advance(TK_ADV_FREQ);
+
if (tai != orig_tai)
clock_was_set();
 
-- 
2.14.3



Re: [PATCHv1] timekeeping: Update multiplier when NTP frequency is set directly

2018-06-04 Thread Miroslav Lichvar
On Tue, May 29, 2018 at 10:42:05AM -0700, John Stultz wrote:
> On Tue, May 29, 2018 at 3:53 AM, Miroslav Lichvar  wrote:
> > -void update_wall_time(void)
> > +static void timekeeping_advance(bool force_update)
> 
> This is kind of a nit, but mind switching out bool for an enum?  Using
> something like TK_ADV_NORMAL and TK_ADV_FORCE?
> 
> > +void update_wall_time(void)
> > +{
> > +   timekeeping_advance(false);
> > +}
> 
> The enum makes usage like timekeeping_advance(false) a little less
> opaque to the reader ("Wait, don't advance? Let me go look at the
> function").
> 
> We got bitten by this earlier when we had the old
> "timekeeping_update(tk, true, false, true)" usage.

Ok. That make sense. I'll send a v2.

Thanks,

-- 
Miroslav Lichvar


Re: [PATCHv1] timekeeping: Update multiplier when NTP frequency is set directly

2018-06-04 Thread Miroslav Lichvar
On Tue, May 29, 2018 at 10:42:05AM -0700, John Stultz wrote:
> On Tue, May 29, 2018 at 3:53 AM, Miroslav Lichvar  wrote:
> > -void update_wall_time(void)
> > +static void timekeeping_advance(bool force_update)
> 
> This is kind of a nit, but mind switching out bool for an enum?  Using
> something like TK_ADV_NORMAL and TK_ADV_FORCE?
> 
> > +void update_wall_time(void)
> > +{
> > +   timekeeping_advance(false);
> > +}
> 
> The enum makes usage like timekeeping_advance(false) a little less
> opaque to the reader ("Wait, don't advance? Let me go look at the
> function").
> 
> We got bitten by this earlier when we had the old
> "timekeeping_update(tk, true, false, true)" usage.

Ok. That make sense. I'll send a v2.

Thanks,

-- 
Miroslav Lichvar


[PATCHv1] timekeeping: Update multiplier when NTP frequency is set directly

2018-05-29 Thread Miroslav Lichvar
When the NTP frequency is set directly from userspace using the
ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the
timekeeper's multiplier instead of waiting for the next tick.

This removes a hidden non-deterministic delay in setting of the
frequency and allows an extremely tight control of the system clock
with update rates close to or even exceeding the kernel HZ.

Cc: Thomas Gleixner 
Cc: John Stultz 
Cc: Richard Cochran 
Cc: Prarit Bhargava 
Signed-off-by: Miroslav Lichvar 
---

Notes:
RFC->v1:
- added a new parameter to force the update of the timekeeper to the current
  NTP tick length only from adjtimex()
- added timekeeping_advance() to keep the parameter local to timekeeping.c

 kernel/time/timekeeping.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 49cbceef5deb..5524c07d43e3 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2021,11 +2021,11 @@ static u64 logarithmic_accumulation(struct timekeeper 
*tk, u64 offset,
return offset;
 }
 
-/**
- * update_wall_time - Uses the current clocksource to increment the wall time
- *
+/*
+ * timekeeping_advance - Updates the timekeeper to the current time and
+ * current NTP tick length
  */
-void update_wall_time(void)
+static void timekeeping_advance(bool force_update)
 {
struct timekeeper *real_tk = _core.timekeeper;
struct timekeeper *tk = _timekeeper;
@@ -2048,7 +2048,7 @@ void update_wall_time(void)
 #endif
 
/* Check if there's really nothing to do */
-   if (offset < real_tk->cycle_interval)
+   if (offset < real_tk->cycle_interval && !force_update)
goto out;
 
/* Do some additional sanity checking */
@@ -2105,6 +2105,15 @@ void update_wall_time(void)
clock_was_set_delayed();
 }
 
+/**
+ * update_wall_time - Uses the current clocksource to increment the wall time
+ *
+ */
+void update_wall_time(void)
+{
+   timekeeping_advance(false);
+}
+
 /**
  * getboottime64 - Return the real time of system boot.
  * @ts:pointer to the timespec64 to be set
@@ -2332,6 +2341,10 @@ int do_adjtimex(struct timex *txc)
write_seqcount_end(_core.seq);
raw_spin_unlock_irqrestore(_lock, flags);
 
+   /* Update the multiplier immediately if frequency was set directly */
+   if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK))
+   timekeeping_advance(true);
+
if (tai != orig_tai)
clock_was_set();
 
-- 
2.14.3



[PATCHv1] timekeeping: Update multiplier when NTP frequency is set directly

2018-05-29 Thread Miroslav Lichvar
When the NTP frequency is set directly from userspace using the
ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the
timekeeper's multiplier instead of waiting for the next tick.

This removes a hidden non-deterministic delay in setting of the
frequency and allows an extremely tight control of the system clock
with update rates close to or even exceeding the kernel HZ.

Cc: Thomas Gleixner 
Cc: John Stultz 
Cc: Richard Cochran 
Cc: Prarit Bhargava 
Signed-off-by: Miroslav Lichvar 
---

Notes:
RFC->v1:
- added a new parameter to force the update of the timekeeper to the current
  NTP tick length only from adjtimex()
- added timekeeping_advance() to keep the parameter local to timekeeping.c

 kernel/time/timekeeping.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 49cbceef5deb..5524c07d43e3 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2021,11 +2021,11 @@ static u64 logarithmic_accumulation(struct timekeeper 
*tk, u64 offset,
return offset;
 }
 
-/**
- * update_wall_time - Uses the current clocksource to increment the wall time
- *
+/*
+ * timekeeping_advance - Updates the timekeeper to the current time and
+ * current NTP tick length
  */
-void update_wall_time(void)
+static void timekeeping_advance(bool force_update)
 {
struct timekeeper *real_tk = _core.timekeeper;
struct timekeeper *tk = _timekeeper;
@@ -2048,7 +2048,7 @@ void update_wall_time(void)
 #endif
 
/* Check if there's really nothing to do */
-   if (offset < real_tk->cycle_interval)
+   if (offset < real_tk->cycle_interval && !force_update)
goto out;
 
/* Do some additional sanity checking */
@@ -2105,6 +2105,15 @@ void update_wall_time(void)
clock_was_set_delayed();
 }
 
+/**
+ * update_wall_time - Uses the current clocksource to increment the wall time
+ *
+ */
+void update_wall_time(void)
+{
+   timekeeping_advance(false);
+}
+
 /**
  * getboottime64 - Return the real time of system boot.
  * @ts:pointer to the timespec64 to be set
@@ -2332,6 +2341,10 @@ int do_adjtimex(struct timex *txc)
write_seqcount_end(_core.seq);
raw_spin_unlock_irqrestore(_lock, flags);
 
+   /* Update the multiplier immediately if frequency was set directly */
+   if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK))
+   timekeeping_advance(true);
+
if (tai != orig_tai)
clock_was_set();
 
-- 
2.14.3



Re: [PATCH RFC] timekeeping: Update multiplier when NTP frequency is set directly

2018-05-24 Thread Miroslav Lichvar
On Wed, May 23, 2018 at 11:05:34AM -0700, John Stultz wrote:
> On Wed, May 23, 2018 at 4:33 AM, Miroslav Lichvar <mlich...@redhat.com> wrote:
> > This removes a hidden non-deterministic delay in setting of the
> > frequency and allows an extremely tight control of the system clock
> > with update rates close to or even exceeding the kernel HZ.

> I feel like we tried this years back, but had to revert it. But its
> been awhile. Am I confusing things?

IIRC we only talked about doing this. Before the recent changes in
timekeeping, namely c2cda2a5 (Don't align NTP frequency adjustments to
ticks), it wouldn't make much sense. There would still be a hidden
delay, it would just be negative (from the applications point of
view).

> > -   /* Check if there's really nothing to do */
> > -   if (offset < real_tk->cycle_interval)
> > -   goto out;
> > -
> 
> Apologies again, as I don't have a lot of context here these days, but
>  this could mean we end up doing unnecessary work on every
> update_wall_time, no?

I'm not sure. If I understand it correctly, update_wall_time() is
normally not called more than once per tick. Only when an update is
late and it happens to include the next tick, the subsequent call
might be unnecessary, right?

> Would a "force" flag be better to pass to update_wall_time() to only
> avoid the short-cut in the non-adjtimex case?

Yes, that makes sense.

Thanks,

-- 
Miroslav Lichvar


Re: [PATCH RFC] timekeeping: Update multiplier when NTP frequency is set directly

2018-05-24 Thread Miroslav Lichvar
On Wed, May 23, 2018 at 11:05:34AM -0700, John Stultz wrote:
> On Wed, May 23, 2018 at 4:33 AM, Miroslav Lichvar  wrote:
> > This removes a hidden non-deterministic delay in setting of the
> > frequency and allows an extremely tight control of the system clock
> > with update rates close to or even exceeding the kernel HZ.

> I feel like we tried this years back, but had to revert it. But its
> been awhile. Am I confusing things?

IIRC we only talked about doing this. Before the recent changes in
timekeeping, namely c2cda2a5 (Don't align NTP frequency adjustments to
ticks), it wouldn't make much sense. There would still be a hidden
delay, it would just be negative (from the applications point of
view).

> > -   /* Check if there's really nothing to do */
> > -   if (offset < real_tk->cycle_interval)
> > -   goto out;
> > -
> 
> Apologies again, as I don't have a lot of context here these days, but
>  this could mean we end up doing unnecessary work on every
> update_wall_time, no?

I'm not sure. If I understand it correctly, update_wall_time() is
normally not called more than once per tick. Only when an update is
late and it happens to include the next tick, the subsequent call
might be unnecessary, right?

> Would a "force" flag be better to pass to update_wall_time() to only
> avoid the short-cut in the non-adjtimex case?

Yes, that makes sense.

Thanks,

-- 
Miroslav Lichvar


[PATCH RFC] timekeeping: Update multiplier when NTP frequency is set directly

2018-05-23 Thread Miroslav Lichvar
When the NTP frequency is set directly from userspace using the
ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the
timekeeper's multiplier instead of waiting for the next tick.

This removes a hidden non-deterministic delay in setting of the
frequency and allows an extremely tight control of the system clock
with update rates close to or even exceeding the kernel HZ.

Cc: Thomas Gleixner <t...@linutronix.de>
Cc: John Stultz <john.stu...@linaro.org>
Cc: Richard Cochran <richardcoch...@gmail.com>
Cc: Prarit Bhargava <pra...@redhat.com>
Signed-off-by: Miroslav Lichvar <mlich...@redhat.com>
---
 kernel/time/timekeeping.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 49cbceef5deb..6922dae7317c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2047,10 +2047,6 @@ void update_wall_time(void)
   tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 #endif
 
-   /* Check if there's really nothing to do */
-   if (offset < real_tk->cycle_interval)
-   goto out;
-
/* Do some additional sanity checking */
timekeeping_check_update(tk, offset);
 
@@ -2332,6 +2328,10 @@ int do_adjtimex(struct timex *txc)
write_seqcount_end(_core.seq);
raw_spin_unlock_irqrestore(_lock, flags);
 
+   /* Update the multiplier immediately if frequency was set directly */
+   if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK))
+   update_wall_time();
+
if (tai != orig_tai)
clock_was_set();
 
-- 
2.14.3



[PATCH RFC] timekeeping: Update multiplier when NTP frequency is set directly

2018-05-23 Thread Miroslav Lichvar
When the NTP frequency is set directly from userspace using the
ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the
timekeeper's multiplier instead of waiting for the next tick.

This removes a hidden non-deterministic delay in setting of the
frequency and allows an extremely tight control of the system clock
with update rates close to or even exceeding the kernel HZ.

Cc: Thomas Gleixner 
Cc: John Stultz 
Cc: Richard Cochran 
Cc: Prarit Bhargava 
Signed-off-by: Miroslav Lichvar 
---
 kernel/time/timekeeping.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 49cbceef5deb..6922dae7317c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2047,10 +2047,6 @@ void update_wall_time(void)
   tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 #endif
 
-   /* Check if there's really nothing to do */
-   if (offset < real_tk->cycle_interval)
-   goto out;
-
/* Do some additional sanity checking */
timekeeping_check_update(tk, offset);
 
@@ -2332,6 +2328,10 @@ int do_adjtimex(struct timex *txc)
write_seqcount_end(_core.seq);
raw_spin_unlock_irqrestore(_lock, flags);
 
+   /* Update the multiplier immediately if frequency was set directly */
+   if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK))
+   update_wall_time();
+
if (tai != orig_tai)
clock_was_set();
 
-- 
2.14.3



Re: possible BUG: selftest about raw_skew failed

2018-04-27 Thread Miroslav Lichvar
On Thu, Apr 26, 2018 at 11:28:29PM +0200, Thomas Gleixner wrote:
> On Sat, 21 Apr 2018, Jeffrin Thalakkottoor wrote:
> > selftests: raw_skew
> > 
> > WARNING: ADJ_OFFSET in progress, this will cause inaccurate results
> > Estimating clock drift: 17.910(est) 16.820(act) [FAILED]

Was ntpd, systemd-timesyncd, or some other NTP/PTP daemon running
shortly before or during the test?

The warning indicates that the clock was slewed by adjtime() or
adjtimex(), which makes the measurement of the frequency less accurate
and the test may fail.

Maybe this test and other tests that measure the frequency of the
system clock should be modified to SKIP when adjtimex() returns a
non-zero offset (or the frequency changes during the test)? 

-- 
Miroslav Lichvar


Re: possible BUG: selftest about raw_skew failed

2018-04-27 Thread Miroslav Lichvar
On Thu, Apr 26, 2018 at 11:28:29PM +0200, Thomas Gleixner wrote:
> On Sat, 21 Apr 2018, Jeffrin Thalakkottoor wrote:
> > selftests: raw_skew
> > 
> > WARNING: ADJ_OFFSET in progress, this will cause inaccurate results
> > Estimating clock drift: 17.910(est) 16.820(act) [FAILED]

Was ntpd, systemd-timesyncd, or some other NTP/PTP daemon running
shortly before or during the test?

The warning indicates that the clock was slewed by adjtime() or
adjtimex(), which makes the measurement of the frequency less accurate
and the test may fail.

Maybe this test and other tests that measure the frequency of the
system clock should be modified to SKIP when adjtimex() returns a
non-zero offset (or the frequency changes during the test)? 

-- 
Miroslav Lichvar


[PATCH] kselftests: timers: freq-step: Update maximum acceptable precision and errors

2018-04-10 Thread Miroslav Lichvar
PTI has a significant impact on precision of the MONOTONIC_RAW clock,
which prevents a lot of computers from running the freq-step test.
Increase the maximum acceptable precision for the test to not be skipped
to 500 nanoseconds.

After commit 78b98e3c5a66 ("timekeeping/ntp: Determine the multiplier
directly from NTP tick length") the frequency and time errors should be
much smaller. Reduce the maximum acceptable values for the test to pass
to 0.02 ppm and 50 nanoseconds respectively.

Cc: John Stultz <john.stu...@linaro.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Cc: Stephen Boyd <stephen.b...@linaro.org>
Signed-off-by: Miroslav Lichvar <mlich...@redhat.com>
---
 tools/testing/selftests/timers/freq-step.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/timers/freq-step.c 
b/tools/testing/selftests/timers/freq-step.c
index 14a2b77fd012..cd094b49819f 100644
--- a/tools/testing/selftests/timers/freq-step.c
+++ b/tools/testing/selftests/timers/freq-step.c
@@ -29,9 +29,9 @@
 #define SAMPLE_READINGS 10
 #define MEAN_SAMPLE_INTERVAL 0.1
 #define STEP_INTERVAL 1.0
-#define MAX_PRECISION 100e-9
-#define MAX_FREQ_ERROR 10e-6
-#define MAX_STDDEV 1000e-9
+#define MAX_PRECISION 500e-9
+#define MAX_FREQ_ERROR 0.02e-6
+#define MAX_STDDEV 50e-9
 
 #ifndef ADJ_SETOFFSET
   #define ADJ_SETOFFSET 0x0100
-- 
2.14.3



[PATCH] kselftests: timers: freq-step: Update maximum acceptable precision and errors

2018-04-10 Thread Miroslav Lichvar
PTI has a significant impact on precision of the MONOTONIC_RAW clock,
which prevents a lot of computers from running the freq-step test.
Increase the maximum acceptable precision for the test to not be skipped
to 500 nanoseconds.

After commit 78b98e3c5a66 ("timekeeping/ntp: Determine the multiplier
directly from NTP tick length") the frequency and time errors should be
much smaller. Reduce the maximum acceptable values for the test to pass
to 0.02 ppm and 50 nanoseconds respectively.

Cc: John Stultz 
Cc: Thomas Gleixner 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Signed-off-by: Miroslav Lichvar 
---
 tools/testing/selftests/timers/freq-step.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/timers/freq-step.c 
b/tools/testing/selftests/timers/freq-step.c
index 14a2b77fd012..cd094b49819f 100644
--- a/tools/testing/selftests/timers/freq-step.c
+++ b/tools/testing/selftests/timers/freq-step.c
@@ -29,9 +29,9 @@
 #define SAMPLE_READINGS 10
 #define MEAN_SAMPLE_INTERVAL 0.1
 #define STEP_INTERVAL 1.0
-#define MAX_PRECISION 100e-9
-#define MAX_FREQ_ERROR 10e-6
-#define MAX_STDDEV 1000e-9
+#define MAX_PRECISION 500e-9
+#define MAX_FREQ_ERROR 0.02e-6
+#define MAX_STDDEV 50e-9
 
 #ifndef ADJ_SETOFFSET
   #define ADJ_SETOFFSET 0x0100
-- 
2.14.3



[tip:timers/core] timekeeping/ntp: Determine the multiplier directly from NTP tick length

2018-03-10 Thread tip-bot for Miroslav Lichvar
Commit-ID:  78b98e3c5a66d569a53b8f57b6a698f912794a43
Gitweb: https://git.kernel.org/tip/78b98e3c5a66d569a53b8f57b6a698f912794a43
Author: Miroslav Lichvar <mlich...@redhat.com>
AuthorDate: Fri, 9 Mar 2018 10:42:48 -0800
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Sat, 10 Mar 2018 09:12:41 +0100

timekeeping/ntp: Determine the multiplier directly from NTP tick length

When the length of the NTP tick changes significantly, e.g. when an
NTP/PTP application is correcting the initial offset of the clock, a
large value may accumulate in the NTP error before the multiplier
converges to the correct value. It may then take a very long time (hours
or even days) before the error is corrected. This causes the clock to
have an unstable frequency offset, which has a negative impact on the
stability of synchronization with precise time sources (e.g. NTP/PTP
using hardware timestamping or the PTP KVM clock).

Use division to determine the correct multiplier directly from the NTP
tick length and replace the iterative approach. This removes the last
major source of the NTP error. The only remaining source is now limited
resolution of the multiplier, which is corrected by adding 1 to the
multiplier when the system clock is behind the NTP time.

Signed-off-by: Miroslav Lichvar <mlich...@redhat.com>
Signed-off-by: John Stultz <john.stu...@linaro.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Cc: Stephen Boyd <stephen.b...@linaro.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Link: 
http://lkml.kernel.org/r/1520620971-9567-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 include/linux/timekeeper_internal.h |   2 +
 kernel/time/timekeeping.c   | 138 
 2 files changed, 49 insertions(+), 91 deletions(-)

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index d315c3d6725c..7acb953298a7 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -117,6 +117,8 @@ struct timekeeper {
s64 ntp_error;
u32 ntp_error_shift;
u32 ntp_err_mult;
+   /* Flag used to avoid updating NTP twice with same second */
+   u32 skip_second_overflow;
 #ifdef CONFIG_DEBUG_TIMEKEEPING
longlast_warning;
/*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index c1a0ac17336e..e11760121cb2 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -332,6 +332,7 @@ static void tk_setup_internals(struct timekeeper *tk, 
struct clocksource *clock)
tk->tkr_mono.mult = clock->mult;
tk->tkr_raw.mult = clock->mult;
tk->ntp_err_mult = 0;
+   tk->skip_second_overflow = 0;
 }
 
 /* Timekeeper helper functions. */
@@ -1799,20 +1800,19 @@ device_initcall(timekeeping_init_ops);
  */
 static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 s64 offset,
-bool negative,
-int adj_scale)
+s32 mult_adj)
 {
s64 interval = tk->cycle_interval;
-   s32 mult_adj = 1;
 
-   if (negative) {
-   mult_adj = -mult_adj;
+   if (mult_adj == 0) {
+   return;
+   } else if (mult_adj == -1) {
interval = -interval;
-   offset  = -offset;
+   offset = -offset;
+   } else if (mult_adj != 1) {
+   interval *= mult_adj;
+   offset *= mult_adj;
}
-   mult_adj <<= adj_scale;
-   interval <<= adj_scale;
-   offset <<= adj_scale;
 
/*
 * So the following can be confusing.
@@ -1873,85 +1873,35 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
 }
 
 /*
- * Calculate the multiplier adjustment needed to match the frequency
- * specified by NTP
+ * Adjust the timekeeper's multiplier to the correct frequency
+ * and also to reduce the accumulated error value.
  */
-static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
-   s64 offset)
+static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 {
-   s64 interval = tk->cycle_interval;
-   s64 xinterval = tk->xtime_interval;
-   u32 base = tk->tkr_mono.clock->mult;
-   u32 max = tk->tkr_mono.clock->maxadj;
-   u32 cur_adj = tk->tkr_mono.mult;
-   s64 tick_error;
-   bool ne

[tip:timers/core] timekeeping/ntp: Determine the multiplier directly from NTP tick length

2018-03-10 Thread tip-bot for Miroslav Lichvar
Commit-ID:  78b98e3c5a66d569a53b8f57b6a698f912794a43
Gitweb: https://git.kernel.org/tip/78b98e3c5a66d569a53b8f57b6a698f912794a43
Author: Miroslav Lichvar 
AuthorDate: Fri, 9 Mar 2018 10:42:48 -0800
Committer:  Ingo Molnar 
CommitDate: Sat, 10 Mar 2018 09:12:41 +0100

timekeeping/ntp: Determine the multiplier directly from NTP tick length

When the length of the NTP tick changes significantly, e.g. when an
NTP/PTP application is correcting the initial offset of the clock, a
large value may accumulate in the NTP error before the multiplier
converges to the correct value. It may then take a very long time (hours
or even days) before the error is corrected. This causes the clock to
have an unstable frequency offset, which has a negative impact on the
stability of synchronization with precise time sources (e.g. NTP/PTP
using hardware timestamping or the PTP KVM clock).

Use division to determine the correct multiplier directly from the NTP
tick length and replace the iterative approach. This removes the last
major source of the NTP error. The only remaining source is now limited
resolution of the multiplier, which is corrected by adding 1 to the
multiplier when the system clock is behind the NTP time.

Signed-off-by: Miroslav Lichvar 
Signed-off-by: John Stultz 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1520620971-9567-3-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 include/linux/timekeeper_internal.h |   2 +
 kernel/time/timekeeping.c   | 138 
 2 files changed, 49 insertions(+), 91 deletions(-)

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index d315c3d6725c..7acb953298a7 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -117,6 +117,8 @@ struct timekeeper {
s64 ntp_error;
u32 ntp_error_shift;
u32 ntp_err_mult;
+   /* Flag used to avoid updating NTP twice with same second */
+   u32 skip_second_overflow;
 #ifdef CONFIG_DEBUG_TIMEKEEPING
longlast_warning;
/*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index c1a0ac17336e..e11760121cb2 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -332,6 +332,7 @@ static void tk_setup_internals(struct timekeeper *tk, 
struct clocksource *clock)
tk->tkr_mono.mult = clock->mult;
tk->tkr_raw.mult = clock->mult;
tk->ntp_err_mult = 0;
+   tk->skip_second_overflow = 0;
 }
 
 /* Timekeeper helper functions. */
@@ -1799,20 +1800,19 @@ device_initcall(timekeeping_init_ops);
  */
 static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 s64 offset,
-bool negative,
-int adj_scale)
+s32 mult_adj)
 {
s64 interval = tk->cycle_interval;
-   s32 mult_adj = 1;
 
-   if (negative) {
-   mult_adj = -mult_adj;
+   if (mult_adj == 0) {
+   return;
+   } else if (mult_adj == -1) {
interval = -interval;
-   offset  = -offset;
+   offset = -offset;
+   } else if (mult_adj != 1) {
+   interval *= mult_adj;
+   offset *= mult_adj;
}
-   mult_adj <<= adj_scale;
-   interval <<= adj_scale;
-   offset <<= adj_scale;
 
/*
 * So the following can be confusing.
@@ -1873,85 +1873,35 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
 }
 
 /*
- * Calculate the multiplier adjustment needed to match the frequency
- * specified by NTP
+ * Adjust the timekeeper's multiplier to the correct frequency
+ * and also to reduce the accumulated error value.
  */
-static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
-   s64 offset)
+static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 {
-   s64 interval = tk->cycle_interval;
-   s64 xinterval = tk->xtime_interval;
-   u32 base = tk->tkr_mono.clock->mult;
-   u32 max = tk->tkr_mono.clock->maxadj;
-   u32 cur_adj = tk->tkr_mono.mult;
-   s64 tick_error;
-   bool negative;
-   u32 adj_scale;
-
-   /* Remove any current error adj from freq calculation */
-   if (tk->ntp_err_mult)
-   xinterval -= tk->cycle_interval;
-
-   tk->ntp_tick = ntp_tick_length();
-
-   /* Calculate current error per tick */
-   tick_error = ntp_tick_length() >> 

[tip:timers/core] timekeeping/ntp: Don't align NTP frequency adjustments to ticks

2018-03-10 Thread tip-bot for Miroslav Lichvar
Commit-ID:  c2cda2a5bda9f1369c9d1ab54a20571c13cf2743
Gitweb: https://git.kernel.org/tip/c2cda2a5bda9f1369c9d1ab54a20571c13cf2743
Author: Miroslav Lichvar <mlich...@redhat.com>
AuthorDate: Fri, 9 Mar 2018 10:42:47 -0800
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Sat, 10 Mar 2018 09:12:41 +0100

timekeeping/ntp: Don't align NTP frequency adjustments to ticks

When the timekeeping multiplier is changed, the NTP error is updated to
correct the clock for the delay between the tick and the update of the
clock. This error is corrected in later updates and the clock appears as
if the frequency was changed exactly on the tick.

Remove this correction to keep the point where the frequency is
effectively changed at the time of the update. This removes a major
source of the NTP error.

Signed-off-by: Miroslav Lichvar <mlich...@redhat.com>
Signed-off-by: John Stultz <john.stu...@linaro.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Cc: Stephen Boyd <stephen.b...@linaro.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Link: 
http://lkml.kernel.org/r/1520620971-9567-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/time/timekeeping.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cd03317e7b57..c1a0ac17336e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1860,8 +1860,6 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
 *  xtime_nsec_2 = xtime_nsec_1 - offset
 * Which simplfies to:
 *  xtime_nsec -= offset
-*
-* XXX - TODO: Doc ntp_error calculation.
 */
if ((mult_adj > 0) && (tk->tkr_mono.mult + mult_adj < mult_adj)) {
/* NTP adjustment caused clocksource mult overflow */
@@ -1872,7 +1870,6 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
tk->tkr_mono.mult += mult_adj;
tk->xtime_interval += interval;
tk->tkr_mono.xtime_nsec -= offset;
-   tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
 }
 
 /*


[tip:timers/core] timekeeping/ntp: Don't align NTP frequency adjustments to ticks

2018-03-10 Thread tip-bot for Miroslav Lichvar
Commit-ID:  c2cda2a5bda9f1369c9d1ab54a20571c13cf2743
Gitweb: https://git.kernel.org/tip/c2cda2a5bda9f1369c9d1ab54a20571c13cf2743
Author: Miroslav Lichvar 
AuthorDate: Fri, 9 Mar 2018 10:42:47 -0800
Committer:  Ingo Molnar 
CommitDate: Sat, 10 Mar 2018 09:12:41 +0100

timekeeping/ntp: Don't align NTP frequency adjustments to ticks

When the timekeeping multiplier is changed, the NTP error is updated to
correct the clock for the delay between the tick and the update of the
clock. This error is corrected in later updates and the clock appears as
if the frequency was changed exactly on the tick.

Remove this correction to keep the point where the frequency is
effectively changed at the time of the update. This removes a major
source of the NTP error.

Signed-off-by: Miroslav Lichvar 
Signed-off-by: John Stultz 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1520620971-9567-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/time/timekeeping.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cd03317e7b57..c1a0ac17336e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1860,8 +1860,6 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
 *  xtime_nsec_2 = xtime_nsec_1 - offset
 * Which simplfies to:
 *  xtime_nsec -= offset
-*
-* XXX - TODO: Doc ntp_error calculation.
 */
if ((mult_adj > 0) && (tk->tkr_mono.mult + mult_adj < mult_adj)) {
/* NTP adjustment caused clocksource mult overflow */
@@ -1872,7 +1870,6 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
tk->tkr_mono.mult += mult_adj;
tk->xtime_interval += interval;
tk->tkr_mono.xtime_nsec -= offset;
-   tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
 }
 
 /*


[PATCHv2 0/2] Improve stability of system clock

2018-01-05 Thread Miroslav Lichvar
(resending with fixed CC)

v1->v2
- rebased to current code
- improved commit messages

This patch set improves stability and accuracy of the system clock
when synchronized to precise time sources like the new PTP KVM clock or
NTP/PTP with hardware timestamping. The biggest difference is expected
on mostly idle systems running with NOHZ (i.e. having infrequent updates
of the clock).

The stability is affected by the error which accumulates in the
ntp_error register and its correction, which is performed by adjusting
the timekeeping multiplier. After removing the old vsyscalls, which
added an error due to nanosecond rounding, there are now three sources
of the error:
- alignment of frequency adjustments to ticks
- iterative correction of the multiplier
- limited resolution of the multiplier

Instead of trying to correct the error faster, which is difficult to do
without causing instability due to irregular updates of the clock, the
patches remove the first two sources of the error. With the only
remaining source the correction logic can be significantly simplified
and the frequency of the clock is much more stable and accurate.

The freq-step selftest (with PTI disabled to pass the check of the
precision of the MONOTONIC_RAW clock) shows a significant improvement:

Before:

Checking response to frequency step:
  Step   1st interval  2nd interval
 FreqDev Max   FreqDev Max
 40960 +0.131 22 147 +0.136  1   3  [OK]
 40960 +0.012 51 347 +0.005  1   3  [OK]
 40960 +3.832  25038  171297 -0.169  1   2  [OK]
 40960 +2.151  13165   89429 +0.113  1   3  [OK]
 40960 -0.095  1   2 -0.095  1   3  [OK]
   640 -0.154  1   6 -0.154  1   3  [OK]
   640 +0.003  1   6 +0.003  1   3  [OK]
   640 -0.131  2   9 -0.131  1   2  [OK]
   640 -0.026 19 125 -0.029  1   3  [OK]
   640 -0.175  1   3 -0.175  1   3  [OK]
10 +0.001  3   9 +0.000  3   6  [OK]
10 +0.002  7  41 +0.000  2   5  [OK]
10 -0.000  2   5 +0.000  3   5  [OK]
10 +0.001  5  23 -0.000  2   5  [OK]
10 +0.000  2   5 -0.000  6  15  [OK]

After:

Checking response to frequency step:
  Step   1st interval  2nd interval
 FreqDev Max   FreqDev Max
 40960 +0.000  2   5 -0.000  2   5  [OK]
 40960 -0.000  2   5 -0.000  3   8  [OK]
 40960 -0.000  3   7 +0.000  3   6  [OK]
 40960 +0.000  3   6 -0.000  2   6  [OK]
 40960 -0.000  2   5 +0.000  3   7  [OK]
   640 +0.001  2   9 -0.000  2   5  [OK]
   640 -0.000  3   6 +0.000  2   5  [OK]
   640 -0.001  2   4 +0.000  1   3  [OK]
   640 -0.000  2   5 +0.000  2   4  [OK]
   640 +0.000  3   6 -0.000  3   6  [OK]
10 -0.000  2   5 +0.000  2   5  [OK]
10 -0.000  2   5 +0.000  2   5  [OK]
10 +0.000  2   5 +0.000  2   5  [OK]
10 +0.000  2   5 +0.000  2   5  [OK]
10 +0.000  3  12 +0.000  4  11  [OK]


Miroslav Lichvar (2):
  timekeeping: Don't align frequency adjustments to ticks
  timekeeping: Determine multiplier directly from NTP tick length

 include/linux/timekeeper_internal.h |   2 +
 kernel/time/timekeeping.c   | 140 
 2 files changed, 48 insertions(+), 94 deletions(-)

-- 
2.9.5



[PATCHv2 0/2] Improve stability of system clock

2018-01-05 Thread Miroslav Lichvar
(resending with fixed CC)

v1->v2
- rebased to current code
- improved commit messages

This patch set improves stability and accuracy of the system clock
when synchronized to precise time sources like the new PTP KVM clock or
NTP/PTP with hardware timestamping. The biggest difference is expected
on mostly idle systems running with NOHZ (i.e. having infrequent updates
of the clock).

The stability is affected by the error which accumulates in the
ntp_error register and its correction, which is performed by adjusting
the timekeeping multiplier. After removing the old vsyscalls, which
added an error due to nanosecond rounding, there are now three sources
of the error:
- alignment of frequency adjustments to ticks
- iterative correction of the multiplier
- limited resolution of the multiplier

Instead of trying to correct the error faster, which is difficult to do
without causing instability due to irregular updates of the clock, the
patches remove the first two sources of the error. With the only
remaining source the correction logic can be significantly simplified
and the frequency of the clock is much more stable and accurate.

The freq-step selftest (with PTI disabled to pass the check of the
precision of the MONOTONIC_RAW clock) shows a significant improvement:

Before:

Checking response to frequency step:
  Step   1st interval  2nd interval
 FreqDev Max   FreqDev Max
 40960 +0.131 22 147 +0.136  1   3  [OK]
 40960 +0.012 51 347 +0.005  1   3  [OK]
 40960 +3.832  25038  171297 -0.169  1   2  [OK]
 40960 +2.151  13165   89429 +0.113  1   3  [OK]
 40960 -0.095  1   2 -0.095  1   3  [OK]
   640 -0.154  1   6 -0.154  1   3  [OK]
   640 +0.003  1   6 +0.003  1   3  [OK]
   640 -0.131  2   9 -0.131  1   2  [OK]
   640 -0.026 19 125 -0.029  1   3  [OK]
   640 -0.175  1   3 -0.175  1   3  [OK]
10 +0.001  3   9 +0.000  3   6  [OK]
10 +0.002  7  41 +0.000  2   5  [OK]
10 -0.000  2   5 +0.000  3   5  [OK]
10 +0.001  5  23 -0.000  2   5  [OK]
10 +0.000  2   5 -0.000  6  15  [OK]

After:

Checking response to frequency step:
  Step   1st interval  2nd interval
 FreqDev Max   FreqDev Max
 40960 +0.000  2   5 -0.000  2   5  [OK]
 40960 -0.000  2   5 -0.000  3   8  [OK]
 40960 -0.000  3   7 +0.000  3   6  [OK]
 40960 +0.000  3   6 -0.000  2   6  [OK]
 40960 -0.000  2   5 +0.000  3   7  [OK]
   640 +0.001  2   9 -0.000  2   5  [OK]
   640 -0.000  3   6 +0.000  2   5  [OK]
   640 -0.001  2   4 +0.000  1   3  [OK]
   640 -0.000  2   5 +0.000  2   4  [OK]
   640 +0.000  3   6 -0.000  3   6  [OK]
10 -0.000  2   5 +0.000  2   5  [OK]
10 -0.000  2   5 +0.000  2   5  [OK]
10 +0.000  2   5 +0.000  2   5  [OK]
10 +0.000  2   5 +0.000  2   5  [OK]
10 +0.000  3  12 +0.000  4  11  [OK]


Miroslav Lichvar (2):
  timekeeping: Don't align frequency adjustments to ticks
  timekeeping: Determine multiplier directly from NTP tick length

 include/linux/timekeeper_internal.h |   2 +
 kernel/time/timekeeping.c   | 140 
 2 files changed, 48 insertions(+), 94 deletions(-)

-- 
2.9.5



[PATCHv2 1/2] timekeeping: Don't align frequency adjustments to ticks

2018-01-05 Thread Miroslav Lichvar
When the timekeeping multiplier is changed, the NTP error is updated to
correct the clock for the delay between the tick and the update of the
clock. This error is corrected in later updates and the clock appears as
if the frequency was changed exactly on the tick.

Remove this correction to keep the point where the frequency is
effectively changed at the time of the update. This removes a major
source of the NTP error.

Cc: John Stultz <john.stu...@linaro.org>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Signed-off-by: Miroslav Lichvar <mlich...@redhat.com>
---
 kernel/time/timekeeping.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cd03317..c1a0ac1 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1860,8 +1860,6 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
 *  xtime_nsec_2 = xtime_nsec_1 - offset
 * Which simplfies to:
 *  xtime_nsec -= offset
-*
-* XXX - TODO: Doc ntp_error calculation.
 */
if ((mult_adj > 0) && (tk->tkr_mono.mult + mult_adj < mult_adj)) {
/* NTP adjustment caused clocksource mult overflow */
@@ -1872,7 +1870,6 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
tk->tkr_mono.mult += mult_adj;
tk->xtime_interval += interval;
tk->tkr_mono.xtime_nsec -= offset;
-   tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
 }
 
 /*
-- 
2.9.5



[PATCHv2 2/2] timekeeping: Determine multiplier directly from NTP tick length

2018-01-05 Thread Miroslav Lichvar
When the length of the NTP tick changes significantly, e.g. when an
NTP/PTP application is correcting the initial offset of the clock, a
large value may accumulate in the NTP error before the multiplier
converges to the correct value. It may then take a very long time (hours
or even days) before the error is corrected. This causes the clock to
have an unstable frequency offset, which has a negative impact on the
stability of synchronization with precise time sources (e.g. NTP/PTP
using hardware timestamping or the PTP KVM clock).

Use division to determine the correct multiplier directly from the NTP
tick length and replace the iterative approach. This removes the last
major source of the NTP error. The only remaining source is now limited
resolution of the multiplier, which is corrected by adding 1 to the
multiplier when the system clock is behind the NTP time.

Cc: John Stultz <john.stu...@linaro.org>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Signed-off-by: Miroslav Lichvar <mlich...@redhat.com>
---
 include/linux/timekeeper_internal.h |   2 +
 kernel/time/timekeeping.c   | 138 
 2 files changed, 49 insertions(+), 91 deletions(-)

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index d315c3d..7acb953 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -117,6 +117,8 @@ struct timekeeper {
s64 ntp_error;
u32 ntp_error_shift;
u32 ntp_err_mult;
+   /* Flag used to avoid updating NTP twice with same second */
+   u32 skip_second_overflow;
 #ifdef CONFIG_DEBUG_TIMEKEEPING
longlast_warning;
/*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index c1a0ac1..e117601 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -332,6 +332,7 @@ static void tk_setup_internals(struct timekeeper *tk, 
struct clocksource *clock)
tk->tkr_mono.mult = clock->mult;
tk->tkr_raw.mult = clock->mult;
tk->ntp_err_mult = 0;
+   tk->skip_second_overflow = 0;
 }
 
 /* Timekeeper helper functions. */
@@ -1799,20 +1800,19 @@ device_initcall(timekeeping_init_ops);
  */
 static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 s64 offset,
-bool negative,
-int adj_scale)
+s32 mult_adj)
 {
s64 interval = tk->cycle_interval;
-   s32 mult_adj = 1;
 
-   if (negative) {
-   mult_adj = -mult_adj;
+   if (mult_adj == 0) {
+   return;
+   } else if (mult_adj == -1) {
interval = -interval;
-   offset  = -offset;
+   offset = -offset;
+   } else if (mult_adj != 1) {
+   interval *= mult_adj;
+   offset *= mult_adj;
}
-   mult_adj <<= adj_scale;
-   interval <<= adj_scale;
-   offset <<= adj_scale;
 
/*
 * So the following can be confusing.
@@ -1873,85 +1873,35 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
 }
 
 /*
- * Calculate the multiplier adjustment needed to match the frequency
- * specified by NTP
+ * Adjust the timekeeper's multiplier to the correct frequency
+ * and also to reduce the accumulated error value.
  */
-static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
-   s64 offset)
+static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 {
-   s64 interval = tk->cycle_interval;
-   s64 xinterval = tk->xtime_interval;
-   u32 base = tk->tkr_mono.clock->mult;
-   u32 max = tk->tkr_mono.clock->maxadj;
-   u32 cur_adj = tk->tkr_mono.mult;
-   s64 tick_error;
-   bool negative;
-   u32 adj_scale;
-
-   /* Remove any current error adj from freq calculation */
-   if (tk->ntp_err_mult)
-   xinterval -= tk->cycle_interval;
-
-   tk->ntp_tick = ntp_tick_length();
-
-   /* Calculate current error per tick */
-   tick_error = ntp_tick_length() >> tk->ntp_error_shift;
-   tick_error -= (xinterval + tk->xtime_remainder);
-
-   /* Don't worry about correcting it if its small */
-   if (likely((tick_error >= 0) && (tick_error <= interval)))
-   return;
-
-   /* preserve the direction of correction */
-   negative = (tick_error < 0);
+   u32 mult;
 
-   /* If any adjustment would pass the max, just return */
-   if (negative && (cu

[PATCHv2 1/2] timekeeping: Don't align frequency adjustments to ticks

2018-01-05 Thread Miroslav Lichvar
When the timekeeping multiplier is changed, the NTP error is updated to
correct the clock for the delay between the tick and the update of the
clock. This error is corrected in later updates and the clock appears as
if the frequency was changed exactly on the tick.

Remove this correction to keep the point where the frequency is
effectively changed at the time of the update. This removes a major
source of the NTP error.

Cc: John Stultz 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Signed-off-by: Miroslav Lichvar 
---
 kernel/time/timekeeping.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cd03317..c1a0ac1 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1860,8 +1860,6 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
 *  xtime_nsec_2 = xtime_nsec_1 - offset
 * Which simplfies to:
 *  xtime_nsec -= offset
-*
-* XXX - TODO: Doc ntp_error calculation.
 */
if ((mult_adj > 0) && (tk->tkr_mono.mult + mult_adj < mult_adj)) {
/* NTP adjustment caused clocksource mult overflow */
@@ -1872,7 +1870,6 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
tk->tkr_mono.mult += mult_adj;
tk->xtime_interval += interval;
tk->tkr_mono.xtime_nsec -= offset;
-   tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
 }
 
 /*
-- 
2.9.5



[PATCHv2 2/2] timekeeping: Determine multiplier directly from NTP tick length

2018-01-05 Thread Miroslav Lichvar
When the length of the NTP tick changes significantly, e.g. when an
NTP/PTP application is correcting the initial offset of the clock, a
large value may accumulate in the NTP error before the multiplier
converges to the correct value. It may then take a very long time (hours
or even days) before the error is corrected. This causes the clock to
have an unstable frequency offset, which has a negative impact on the
stability of synchronization with precise time sources (e.g. NTP/PTP
using hardware timestamping or the PTP KVM clock).

Use division to determine the correct multiplier directly from the NTP
tick length and replace the iterative approach. This removes the last
major source of the NTP error. The only remaining source is now limited
resolution of the multiplier, which is corrected by adding 1 to the
multiplier when the system clock is behind the NTP time.

Cc: John Stultz 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Signed-off-by: Miroslav Lichvar 
---
 include/linux/timekeeper_internal.h |   2 +
 kernel/time/timekeeping.c   | 138 
 2 files changed, 49 insertions(+), 91 deletions(-)

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index d315c3d..7acb953 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -117,6 +117,8 @@ struct timekeeper {
s64 ntp_error;
u32 ntp_error_shift;
u32 ntp_err_mult;
+   /* Flag used to avoid updating NTP twice with same second */
+   u32 skip_second_overflow;
 #ifdef CONFIG_DEBUG_TIMEKEEPING
longlast_warning;
/*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index c1a0ac1..e117601 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -332,6 +332,7 @@ static void tk_setup_internals(struct timekeeper *tk, 
struct clocksource *clock)
tk->tkr_mono.mult = clock->mult;
tk->tkr_raw.mult = clock->mult;
tk->ntp_err_mult = 0;
+   tk->skip_second_overflow = 0;
 }
 
 /* Timekeeper helper functions. */
@@ -1799,20 +1800,19 @@ device_initcall(timekeeping_init_ops);
  */
 static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 s64 offset,
-bool negative,
-int adj_scale)
+s32 mult_adj)
 {
s64 interval = tk->cycle_interval;
-   s32 mult_adj = 1;
 
-   if (negative) {
-   mult_adj = -mult_adj;
+   if (mult_adj == 0) {
+   return;
+   } else if (mult_adj == -1) {
interval = -interval;
-   offset  = -offset;
+   offset = -offset;
+   } else if (mult_adj != 1) {
+   interval *= mult_adj;
+   offset *= mult_adj;
}
-   mult_adj <<= adj_scale;
-   interval <<= adj_scale;
-   offset <<= adj_scale;
 
/*
 * So the following can be confusing.
@@ -1873,85 +1873,35 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
 }
 
 /*
- * Calculate the multiplier adjustment needed to match the frequency
- * specified by NTP
+ * Adjust the timekeeper's multiplier to the correct frequency
+ * and also to reduce the accumulated error value.
  */
-static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
-   s64 offset)
+static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 {
-   s64 interval = tk->cycle_interval;
-   s64 xinterval = tk->xtime_interval;
-   u32 base = tk->tkr_mono.clock->mult;
-   u32 max = tk->tkr_mono.clock->maxadj;
-   u32 cur_adj = tk->tkr_mono.mult;
-   s64 tick_error;
-   bool negative;
-   u32 adj_scale;
-
-   /* Remove any current error adj from freq calculation */
-   if (tk->ntp_err_mult)
-   xinterval -= tk->cycle_interval;
-
-   tk->ntp_tick = ntp_tick_length();
-
-   /* Calculate current error per tick */
-   tick_error = ntp_tick_length() >> tk->ntp_error_shift;
-   tick_error -= (xinterval + tk->xtime_remainder);
-
-   /* Don't worry about correcting it if its small */
-   if (likely((tick_error >= 0) && (tick_error <= interval)))
-   return;
-
-   /* preserve the direction of correction */
-   negative = (tick_error < 0);
+   u32 mult;
 
-   /* If any adjustment would pass the max, just return */
-   if (negative && (cur_adj - 1) <= (base - max))
-   return;
-   if (!negative && (cur_adj + 1) >= (base + m

[PATCHv2 0/2] Improve stability of system clock

2018-01-05 Thread Miroslav Lichvar
v1->v2
- rebased to current code
- improved commit messages

This patch set improves stability and accuracy of the system clock
when synchronized to precise time sources like the new PTP KVM clock or
NTP/PTP with hardware timestamping. The biggest difference is expected
on mostly idle systems running with NOHZ (i.e. having infrequent updates
of the clock).

The stability is affected by the error which accumulates in the
ntp_error register and its correction, which is performed by adjusting
the timekeeping multiplier. After removing the old vsyscalls, which
added an error due to nanosecond rounding, there are now three sources
of the error:
- alignment of frequency adjustments to ticks
- iterative correction of the multiplier
- limited resolution of the multiplier

Instead of trying to correct the error faster, which is difficult to do
without causing instability due to irregular updates of the clock, the
patches remove the first two sources of the error. With the only
remaining source the correction logic can be significantly simplified
and the frequency of the clock is much more stable and accurate.

The freq-step selftest (with PTI disabled to pass the check of the
precision of the MONOTONIC_RAW clock) shows a significant improvement:

Before:

Checking response to frequency step:
  Step   1st interval  2nd interval
 FreqDev Max   FreqDev Max
 40960 +0.131 22 147 +0.136  1   3  [OK]
 40960 +0.012 51 347 +0.005  1   3  [OK]
 40960 +3.832  25038  171297 -0.169  1   2  [OK]
 40960 +2.151  13165   89429 +0.113  1   3  [OK]
 40960 -0.095  1   2 -0.095  1   3  [OK]
   640 -0.154  1   6 -0.154  1   3  [OK]
   640 +0.003  1   6 +0.003  1   3  [OK]
   640 -0.131  2   9 -0.131  1   2  [OK]
   640 -0.026 19 125 -0.029  1   3  [OK]
   640 -0.175  1   3 -0.175  1   3  [OK]
10 +0.001  3   9 +0.000  3   6  [OK]
10 +0.002  7  41 +0.000  2   5  [OK]
10 -0.000  2   5 +0.000  3   5  [OK]
10 +0.001  5  23 -0.000  2   5  [OK]
10 +0.000  2   5 -0.000  6  15  [OK]

After:

Checking response to frequency step:
  Step   1st interval  2nd interval
 FreqDev Max   FreqDev Max
 40960 +0.000  2   5 -0.000  2   5  [OK]
 40960 -0.000  2   5 -0.000  3   8  [OK]
 40960 -0.000  3   7 +0.000  3   6  [OK]
 40960 +0.000  3   6 -0.000  2   6  [OK]
 40960 -0.000  2   5 +0.000  3   7  [OK]
   640 +0.001  2   9 -0.000  2   5  [OK]
   640 -0.000  3   6 +0.000  2   5  [OK]
   640 -0.001  2   4 +0.000  1   3  [OK]
   640 -0.000  2   5 +0.000  2   4  [OK]
   640 +0.000  3   6 -0.000  3   6  [OK]
10 -0.000  2   5 +0.000  2   5  [OK]
10 -0.000  2   5 +0.000  2   5  [OK]
10 +0.000  2   5 +0.000  2   5  [OK]
10 +0.000  2   5 +0.000  2   5  [OK]
10 +0.000  3  12 +0.000  4  11  [OK]


Miroslav Lichvar (2):
  timekeeping: Don't align frequency adjustments to ticks
  timekeeping: Determine multiplier directly from NTP tick length

 include/linux/timekeeper_internal.h |   2 +
 kernel/time/timekeeping.c   | 140 
 2 files changed, 48 insertions(+), 94 deletions(-)

-- 
2.9.5



[PATCHv2 0/2] Improve stability of system clock

2018-01-05 Thread Miroslav Lichvar
v1->v2
- rebased to current code
- improved commit messages

This patch set improves stability and accuracy of the system clock
when synchronized to precise time sources like the new PTP KVM clock or
NTP/PTP with hardware timestamping. The biggest difference is expected
on mostly idle systems running with NOHZ (i.e. having infrequent updates
of the clock).

The stability is affected by the error which accumulates in the
ntp_error register and its correction, which is performed by adjusting
the timekeeping multiplier. After removing the old vsyscalls, which
added an error due to nanosecond rounding, there are now three sources
of the error:
- alignment of frequency adjustments to ticks
- iterative correction of the multiplier
- limited resolution of the multiplier

Instead of trying to correct the error faster, which is difficult to do
without causing instability due to irregular updates of the clock, the
patches remove the first two sources of the error. With the only
remaining source the correction logic can be significantly simplified
and the frequency of the clock is much more stable and accurate.

The freq-step selftest (with PTI disabled to pass the check of the
precision of the MONOTONIC_RAW clock) shows a significant improvement:

Before:

Checking response to frequency step:
  Step   1st interval  2nd interval
 FreqDev Max   FreqDev Max
 40960 +0.131 22 147 +0.136  1   3  [OK]
 40960 +0.012 51 347 +0.005  1   3  [OK]
 40960 +3.832  25038  171297 -0.169  1   2  [OK]
 40960 +2.151  13165   89429 +0.113  1   3  [OK]
 40960 -0.095  1   2 -0.095  1   3  [OK]
   640 -0.154  1   6 -0.154  1   3  [OK]
   640 +0.003  1   6 +0.003  1   3  [OK]
   640 -0.131  2   9 -0.131  1   2  [OK]
   640 -0.026 19 125 -0.029  1   3  [OK]
   640 -0.175  1   3 -0.175  1   3  [OK]
10 +0.001  3   9 +0.000  3   6  [OK]
10 +0.002  7  41 +0.000  2   5  [OK]
10 -0.000  2   5 +0.000  3   5  [OK]
10 +0.001  5  23 -0.000  2   5  [OK]
10 +0.000  2   5 -0.000  6  15  [OK]

After:

Checking response to frequency step:
  Step   1st interval  2nd interval
 FreqDev Max   FreqDev Max
 40960 +0.000  2   5 -0.000  2   5  [OK]
 40960 -0.000  2   5 -0.000  3   8  [OK]
 40960 -0.000  3   7 +0.000  3   6  [OK]
 40960 +0.000  3   6 -0.000  2   6  [OK]
 40960 -0.000  2   5 +0.000  3   7  [OK]
   640 +0.001  2   9 -0.000  2   5  [OK]
   640 -0.000  3   6 +0.000  2   5  [OK]
   640 -0.001  2   4 +0.000  1   3  [OK]
   640 -0.000  2   5 +0.000  2   4  [OK]
   640 +0.000  3   6 -0.000  3   6  [OK]
10 -0.000  2   5 +0.000  2   5  [OK]
10 -0.000  2   5 +0.000  2   5  [OK]
10 +0.000  2   5 +0.000  2   5  [OK]
10 +0.000  2   5 +0.000  2   5  [OK]
10 +0.000  3  12 +0.000  4  11  [OK]


Miroslav Lichvar (2):
  timekeeping: Don't align frequency adjustments to ticks
  timekeeping: Determine multiplier directly from NTP tick length

 include/linux/timekeeper_internal.h |   2 +
 kernel/time/timekeeping.c   | 140 
 2 files changed, 48 insertions(+), 94 deletions(-)

-- 
2.9.5



[tip:timers/urgent] timekeeping: Remove CONFIG_GENERIC_TIME_VSYSCALL_OLD

2017-11-14 Thread tip-bot for Miroslav Lichvar
Commit-ID:  aea3706cfc4d952ed6d32b6d5845b5ecd99ed7f5
Gitweb: https://git.kernel.org/tip/aea3706cfc4d952ed6d32b6d5845b5ecd99ed7f5
Author: Miroslav Lichvar <mlich...@redhat.com>
AuthorDate: Mon, 13 Nov 2017 14:51:31 -0800
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Tue, 14 Nov 2017 11:20:25 +0100

timekeeping: Remove CONFIG_GENERIC_TIME_VSYSCALL_OLD

As of d4d1fc61eb38f (ia64: Update fsyscall gettime to use modern
vsyscall_update)the last user of CONFIG_GENERIC_TIME_VSYSCALL_OLD
have been updated, the legacy support for old-style vsyscall
implementations can be removed from the timekeeping code.

(Thanks again to Tony Luck for helping remove the last user!)

[jstultz: Commit message rework]

Signed-off-by: Miroslav Lichvar <mlich...@redhat.com>
Signed-off-by: John Stultz <john.stu...@linaro.org>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Cc: Stephen Boyd <stephen.b...@linaro.org>
Link: 
https://lkml.kernel.org/r/1510613491-16695-1-git-send-email-john.stu...@linaro.org

---
 include/linux/timekeeper_internal.h |  7 --
 kernel/time/Kconfig |  4 
 kernel/time/timekeeping.c   | 45 -
 3 files changed, 56 deletions(-)

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index 7e90111..d315c3d 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -136,13 +136,6 @@ struct timekeeper {
 extern void update_vsyscall(struct timekeeper *tk);
 extern void update_vsyscall_tz(void);
 
-#elif defined(CONFIG_GENERIC_TIME_VSYSCALL_OLD)
-
-extern void update_vsyscall_old(struct timespec *ts, struct timespec *wtm,
-   struct clocksource *c, u32 mult,
-   u64 cycle_last);
-extern void update_vsyscall_tz(void);
-
 #else
 
 static inline void update_vsyscall(struct timekeeper *tk)
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index d689a95..e776fc8 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -21,10 +21,6 @@ config CLOCKSOURCE_VALIDATE_LAST_CYCLE
 config GENERIC_TIME_VSYSCALL
bool
 
-# Timekeeping vsyscall support
-config GENERIC_TIME_VSYSCALL_OLD
-   bool
-
 # Old style timekeeping
 config ARCH_USES_GETTIMEOFFSET
bool
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 198afa7..cd03317 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -557,45 +557,6 @@ static void halt_fast_timekeeper(struct timekeeper *tk)
update_fast_timekeeper(_dummy, _fast_raw);
 }
 
-#ifdef CONFIG_GENERIC_TIME_VSYSCALL_OLD
-#warning Please contact your maintainers, as GENERIC_TIME_VSYSCALL_OLD 
compatibity will disappear soon.
-
-static inline void update_vsyscall(struct timekeeper *tk)
-{
-   struct timespec xt, wm;
-
-   xt = timespec64_to_timespec(tk_xtime(tk));
-   wm = timespec64_to_timespec(tk->wall_to_monotonic);
-   update_vsyscall_old(, , tk->tkr_mono.clock, tk->tkr_mono.mult,
-   tk->tkr_mono.cycle_last);
-}
-
-static inline void old_vsyscall_fixup(struct timekeeper *tk)
-{
-   s64 remainder;
-
-   /*
-   * Store only full nanoseconds into xtime_nsec after rounding
-   * it up and add the remainder to the error difference.
-   * XXX - This is necessary to avoid small 1ns inconsistnecies caused
-   * by truncating the remainder in vsyscalls. However, it causes
-   * additional work to be done in timekeeping_adjust(). Once
-   * the vsyscall implementations are converted to use xtime_nsec
-   * (shifted nanoseconds), and CONFIG_GENERIC_TIME_VSYSCALL_OLD
-   * users are removed, this can be killed.
-   */
-   remainder = tk->tkr_mono.xtime_nsec & ((1ULL << tk->tkr_mono.shift) - 
1);
-   if (remainder != 0) {
-   tk->tkr_mono.xtime_nsec -= remainder;
-   tk->tkr_mono.xtime_nsec += 1ULL << tk->tkr_mono.shift;
-   tk->ntp_error += remainder << tk->ntp_error_shift;
-   tk->ntp_error -= (1ULL << tk->tkr_mono.shift) << 
tk->ntp_error_shift;
-   }
-}
-#else
-#define old_vsyscall_fixup(tk)
-#endif
-
 static RAW_NOTIFIER_HEAD(pvclock_gtod_chain);
 
 static void update_pvclock_gtod(struct timekeeper *tk, bool was_set)
@@ -2164,12 +2125,6 @@ void update_wall_time(void)
timekeeping_adjust(tk, offset);
 
/*
-* XXX This can be killed once everyone converts
-* to the new update_vsyscall.
-*/
-   old_vsyscall_fixup(tk);
-
-   /*
 * Finally, make sure that after the rounding
 * xtime_nsec isn't larger than NSEC_PER_SEC
 */


[tip:timers/urgent] timekeeping: Remove CONFIG_GENERIC_TIME_VSYSCALL_OLD

2017-11-14 Thread tip-bot for Miroslav Lichvar
Commit-ID:  aea3706cfc4d952ed6d32b6d5845b5ecd99ed7f5
Gitweb: https://git.kernel.org/tip/aea3706cfc4d952ed6d32b6d5845b5ecd99ed7f5
Author: Miroslav Lichvar 
AuthorDate: Mon, 13 Nov 2017 14:51:31 -0800
Committer:  Thomas Gleixner 
CommitDate: Tue, 14 Nov 2017 11:20:25 +0100

timekeeping: Remove CONFIG_GENERIC_TIME_VSYSCALL_OLD

As of d4d1fc61eb38f (ia64: Update fsyscall gettime to use modern
vsyscall_update)the last user of CONFIG_GENERIC_TIME_VSYSCALL_OLD
have been updated, the legacy support for old-style vsyscall
implementations can be removed from the timekeeping code.

(Thanks again to Tony Luck for helping remove the last user!)

[jstultz: Commit message rework]

Signed-off-by: Miroslav Lichvar 
Signed-off-by: John Stultz 
Signed-off-by: Thomas Gleixner 
Cc: Prarit Bhargava 
Cc: Tony Luck 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Link: 
https://lkml.kernel.org/r/1510613491-16695-1-git-send-email-john.stu...@linaro.org

---
 include/linux/timekeeper_internal.h |  7 --
 kernel/time/Kconfig |  4 
 kernel/time/timekeeping.c   | 45 -
 3 files changed, 56 deletions(-)

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index 7e90111..d315c3d 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -136,13 +136,6 @@ struct timekeeper {
 extern void update_vsyscall(struct timekeeper *tk);
 extern void update_vsyscall_tz(void);
 
-#elif defined(CONFIG_GENERIC_TIME_VSYSCALL_OLD)
-
-extern void update_vsyscall_old(struct timespec *ts, struct timespec *wtm,
-   struct clocksource *c, u32 mult,
-   u64 cycle_last);
-extern void update_vsyscall_tz(void);
-
 #else
 
 static inline void update_vsyscall(struct timekeeper *tk)
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index d689a95..e776fc8 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -21,10 +21,6 @@ config CLOCKSOURCE_VALIDATE_LAST_CYCLE
 config GENERIC_TIME_VSYSCALL
bool
 
-# Timekeeping vsyscall support
-config GENERIC_TIME_VSYSCALL_OLD
-   bool
-
 # Old style timekeeping
 config ARCH_USES_GETTIMEOFFSET
bool
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 198afa7..cd03317 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -557,45 +557,6 @@ static void halt_fast_timekeeper(struct timekeeper *tk)
update_fast_timekeeper(_dummy, _fast_raw);
 }
 
-#ifdef CONFIG_GENERIC_TIME_VSYSCALL_OLD
-#warning Please contact your maintainers, as GENERIC_TIME_VSYSCALL_OLD 
compatibity will disappear soon.
-
-static inline void update_vsyscall(struct timekeeper *tk)
-{
-   struct timespec xt, wm;
-
-   xt = timespec64_to_timespec(tk_xtime(tk));
-   wm = timespec64_to_timespec(tk->wall_to_monotonic);
-   update_vsyscall_old(, , tk->tkr_mono.clock, tk->tkr_mono.mult,
-   tk->tkr_mono.cycle_last);
-}
-
-static inline void old_vsyscall_fixup(struct timekeeper *tk)
-{
-   s64 remainder;
-
-   /*
-   * Store only full nanoseconds into xtime_nsec after rounding
-   * it up and add the remainder to the error difference.
-   * XXX - This is necessary to avoid small 1ns inconsistnecies caused
-   * by truncating the remainder in vsyscalls. However, it causes
-   * additional work to be done in timekeeping_adjust(). Once
-   * the vsyscall implementations are converted to use xtime_nsec
-   * (shifted nanoseconds), and CONFIG_GENERIC_TIME_VSYSCALL_OLD
-   * users are removed, this can be killed.
-   */
-   remainder = tk->tkr_mono.xtime_nsec & ((1ULL << tk->tkr_mono.shift) - 
1);
-   if (remainder != 0) {
-   tk->tkr_mono.xtime_nsec -= remainder;
-   tk->tkr_mono.xtime_nsec += 1ULL << tk->tkr_mono.shift;
-   tk->ntp_error += remainder << tk->ntp_error_shift;
-   tk->ntp_error -= (1ULL << tk->tkr_mono.shift) << 
tk->ntp_error_shift;
-   }
-}
-#else
-#define old_vsyscall_fixup(tk)
-#endif
-
 static RAW_NOTIFIER_HEAD(pvclock_gtod_chain);
 
 static void update_pvclock_gtod(struct timekeeper *tk, bool was_set)
@@ -2164,12 +2125,6 @@ void update_wall_time(void)
timekeeping_adjust(tk, offset);
 
/*
-* XXX This can be killed once everyone converts
-* to the new update_vsyscall.
-*/
-   old_vsyscall_fixup(tk);
-
-   /*
 * Finally, make sure that after the rounding
 * xtime_nsec isn't larger than NSEC_PER_SEC
 */


Re: Extreme time jitter with suspend/resume cycles

2017-10-05 Thread Miroslav Lichvar
On Wed, Oct 04, 2017 at 05:16:31PM -0700, John Stultz wrote:
> On Wed, Oct 4, 2017 at 9:11 AM, Gabriel Beddingfield <g...@nestlabs.com> 
> wrote:
> > We found that the problem is an interaction between the NTP code and
> > what I call the "delta_delta hack." (see [1] and [2]) This code
> > allocates a static variable in a function that contains an offset from
> > the system time to the persistent/rtc clock. It uses that time to
> > fudge the suspend timestamp so that on resume the sleep time will be
> > compensated. It's kind of a statistical hack that assumes things will
> > average out. It seems to have two main assumptions:
> >
> >   1. The persistent/rtc clock has only single-second precision
> >   2. The system does not frequently suspend/resume.
> >   3. If delta_delta is less than 2 seconds, these assumptions are "true"
> >
> > Because the delta_delta hack is trying to maintain an offset from the
> > system time to the persistent/rtc clock, any minor NTP corrections
> > that have occurred since the last suspend will be discarded. However,
> > the NTP subsystem isn't notified that this is happening -- and so it
> > causes some level of instability in its PLL logic.

This is interesting. What polling interval was ntpd using? If I
understand it correctly, with a high-resolution persistent clock the
delta-delta compensation should be very small and shouldn't disrupt
ntpd. Does this instability disappear when ntpd is not controlling the
clock (i.e. "disable ntp" in ntp.conf)?

> We should also figure out how to best handle ntpd in userspace dealing
> with frequent suspend/resume cycles. This is problematic, as the
> closest analogy is trying driving on the road while frequently falling
> asleep.  This is not something I think ntpd handles well.  I suspect
> our options are that any ntp adjustments being made might be made for
> far too long (causing potentially massive over-correction) or not at
> all, and not at all seems slightly better in my mind.

Yeah, controlling the clock in such conditions will be difficult. The
kernel/ntp PLL requires periodic updates. There is some code in
ntp_update_offset() that reduces the frequency adjustment when PLL
updates are missing, but I'm not actually sure if it works correctly
with suspend.

-- 
Miroslav Lichvar


Re: Extreme time jitter with suspend/resume cycles

2017-10-05 Thread Miroslav Lichvar
On Wed, Oct 04, 2017 at 05:16:31PM -0700, John Stultz wrote:
> On Wed, Oct 4, 2017 at 9:11 AM, Gabriel Beddingfield  
> wrote:
> > We found that the problem is an interaction between the NTP code and
> > what I call the "delta_delta hack." (see [1] and [2]) This code
> > allocates a static variable in a function that contains an offset from
> > the system time to the persistent/rtc clock. It uses that time to
> > fudge the suspend timestamp so that on resume the sleep time will be
> > compensated. It's kind of a statistical hack that assumes things will
> > average out. It seems to have two main assumptions:
> >
> >   1. The persistent/rtc clock has only single-second precision
> >   2. The system does not frequently suspend/resume.
> >   3. If delta_delta is less than 2 seconds, these assumptions are "true"
> >
> > Because the delta_delta hack is trying to maintain an offset from the
> > system time to the persistent/rtc clock, any minor NTP corrections
> > that have occurred since the last suspend will be discarded. However,
> > the NTP subsystem isn't notified that this is happening -- and so it
> > causes some level of instability in its PLL logic.

This is interesting. What polling interval was ntpd using? If I
understand it correctly, with a high-resolution persistent clock the
delta-delta compensation should be very small and shouldn't disrupt
ntpd. Does this instability disappear when ntpd is not controlling the
clock (i.e. "disable ntp" in ntp.conf)?

> We should also figure out how to best handle ntpd in userspace dealing
> with frequent suspend/resume cycles. This is problematic, as the
> closest analogy is trying driving on the road while frequently falling
> asleep.  This is not something I think ntpd handles well.  I suspect
> our options are that any ntp adjustments being made might be made for
> far too long (causing potentially massive over-correction) or not at
> all, and not at all seems slightly better in my mind.

Yeah, controlling the clock in such conditions will be difficult. The
kernel/ntp PLL requires periodic updates. There is some code in
ntp_update_offset() that reduces the frequency adjustment when PLL
updates are missing, but I'm not actually sure if it works correctly
with suspend.

-- 
Miroslav Lichvar


Re: [PATCH RFC V1 net-next 0/6] Time based packet transmission

2017-09-19 Thread Miroslav Lichvar
On Mon, Sep 18, 2017 at 09:41:15AM +0200, Richard Cochran wrote:
> This series is an early RFC that introduces a new socket option
> allowing time based transmission of packets.  This option will be
> useful in implementing various real time protocols over Ethernet,
> including but not limited to P802.1Qbv, which is currently finding
> its way into 802.1Q.

If I understand it correctly, this also allows us to make a PTP/NTP
"one-step" clock with HW that doesn't support it directly.

> * Open questions about SO_TXTIME semantics
> 
>   - What should the kernel do if the dialed Tx time is in the past?
> Should the packet be sent ASAP, or should we throw an error?

Dropping the packet with an error would make more sense to me.

>   - What should the timescale be for the dialed Tx time?  Should the
> kernel select UTC when using the SW Qdisc and the HW time
> otherwise?  Or should the socket option include a clockid_t?

I think for applications that don't (want to) bind their socket to a
specific interface it would be useful if the cmsg specified clockid_t
or maybe if_index. If the packet would be sent using a different
PHC/interface, it should be dropped.

>   | | plain preempt_rt | so_txtime | txtime @ 250 us |
>   |-+--+---+-|
>   | min:|+1.940800e+04 | +4.72e+02 |   +4.72e+02 |
>   | max:|+7.556000e+04 | +5.68e+02 |   +5.76e+02 |
>   | pk-pk:  |+5.615200e+04 | +9.60e+01 |   +1.04e+02 |
>   | mean:   |+3.292776e+04 | +5.072274e+02 |   +5.073602e+02 |
>   | stddev: |+6.514709e+03 | +1.310849e+01 |   +1.507144e+01 |
>   | count:  |   60 |60 | 240 |
> 
>   Using so_txtime, the peak to peak jitter is about 100 nanoseconds,

Nice!

-- 
Miroslav Lichvar


Re: [PATCH RFC V1 net-next 0/6] Time based packet transmission

2017-09-19 Thread Miroslav Lichvar
On Mon, Sep 18, 2017 at 09:41:15AM +0200, Richard Cochran wrote:
> This series is an early RFC that introduces a new socket option
> allowing time based transmission of packets.  This option will be
> useful in implementing various real time protocols over Ethernet,
> including but not limited to P802.1Qbv, which is currently finding
> its way into 802.1Q.

If I understand it correctly, this also allows us to make a PTP/NTP
"one-step" clock with HW that doesn't support it directly.

> * Open questions about SO_TXTIME semantics
> 
>   - What should the kernel do if the dialed Tx time is in the past?
> Should the packet be sent ASAP, or should we throw an error?

Dropping the packet with an error would make more sense to me.

>   - What should the timescale be for the dialed Tx time?  Should the
> kernel select UTC when using the SW Qdisc and the HW time
> otherwise?  Or should the socket option include a clockid_t?

I think for applications that don't (want to) bind their socket to a
specific interface it would be useful if the cmsg specified clockid_t
or maybe if_index. If the packet would be sent using a different
PHC/interface, it should be dropped.

>   | | plain preempt_rt | so_txtime | txtime @ 250 us |
>   |-+--+---+-|
>   | min:|+1.940800e+04 | +4.72e+02 |   +4.72e+02 |
>   | max:|+7.556000e+04 | +5.68e+02 |   +5.76e+02 |
>   | pk-pk:  |+5.615200e+04 | +9.60e+01 |   +1.04e+02 |
>   | mean:   |+3.292776e+04 | +5.072274e+02 |   +5.073602e+02 |
>   | stddev: |+6.514709e+03 | +1.310849e+01 |   +1.507144e+01 |
>   | count:  |   60 |60 | 240 |
> 
>   Using so_txtime, the peak to peak jitter is about 100 nanoseconds,

Nice!

-- 
Miroslav Lichvar


Re: [RFC][PATCH 2/2] selftests: timers: freq-step: Fix build warning

2017-08-15 Thread Miroslav Lichvar
On Mon, Aug 14, 2017 at 02:01:36PM -0700, John Stultz wrote:
> Fixes the following build warning:
> freq-step.c: In function ‘main’:
> freq-step.c:271:1: warning: control reaches end of non-void function 
> [-Wreturn-type]

> @@ -268,4 +268,6 @@ int main(int argc, char **argv)
>   ksft_exit_fail();
>  
>   ksft_exit_pass();
> +
> + return 0;
>  }

It seems most tests use "return ksft_exit_pass();". Would that be
preferred over separate return? I don't have a preference.

Both patches in this set look good to me.

Thanks,

-- 
Miroslav Lichvar


Re: [RFC][PATCH 2/2] selftests: timers: freq-step: Fix build warning

2017-08-15 Thread Miroslav Lichvar
On Mon, Aug 14, 2017 at 02:01:36PM -0700, John Stultz wrote:
> Fixes the following build warning:
> freq-step.c: In function ‘main’:
> freq-step.c:271:1: warning: control reaches end of non-void function 
> [-Wreturn-type]

> @@ -268,4 +268,6 @@ int main(int argc, char **argv)
>   ksft_exit_fail();
>  
>   ksft_exit_pass();
> +
> + return 0;
>  }

It seems most tests use "return ksft_exit_pass();". Would that be
preferred over separate return? I don't have a preference.

Both patches in this set look good to me.

Thanks,

-- 
Miroslav Lichvar


Re: [PATCH] kselftest/timers: fix build failure

2017-07-25 Thread Miroslav Lichvar
On Tue, Jul 25, 2017 at 01:20:37PM +0530, Santosh Sivaraj wrote:
> "message" parameter was not passed to ksft_exit_skip.

It looks like this was a bad merge of 76739256 and 54f57baa, which
added the new parameter to ksft_exit_skip().

> @@ -231,7 +231,7 @@ static void init_test(void)
>  
>   if (precision > MAX_PRECISION) {
>   printf("[SKIP]\n");
> - ksft_exit_skip();
> + ksft_exit_skip("[SKIP]\n");

To avoid printing "skip" multiple times, I'd suggest to change the
argument of ksft_exit_skip to NULL.

Thanks,

-- 
Miroslav Lichvar


Re: [PATCH] kselftest/timers: fix build failure

2017-07-25 Thread Miroslav Lichvar
On Tue, Jul 25, 2017 at 01:20:37PM +0530, Santosh Sivaraj wrote:
> "message" parameter was not passed to ksft_exit_skip.

It looks like this was a bad merge of 76739256 and 54f57baa, which
added the new parameter to ksft_exit_skip().

> @@ -231,7 +231,7 @@ static void init_test(void)
>  
>   if (precision > MAX_PRECISION) {
>   printf("[SKIP]\n");
> - ksft_exit_skip();
> + ksft_exit_skip("[SKIP]\n");

To avoid printing "skip" multiple times, I'd suggest to change the
argument of ksft_exit_skip to NULL.

Thanks,

-- 
Miroslav Lichvar


[PATCH] kselftests: timers: Fix inconsistency-check to not ignore first timestamp

2017-06-13 Thread Miroslav Lichvar
When the first timestamp in the list of clock readings was later than
the second timestamp and all other timestamps were in order, the
inconsistency was not reported because the index of the out-of-order
timestamp was equal to the default value.

Cc: John Stultz <john.stu...@linaro.org>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Signed-off-by: Miroslav Lichvar <mlich...@redhat.com>
---
 tools/testing/selftests/timers/inconsistency-check.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/timers/inconsistency-check.c 
b/tools/testing/selftests/timers/inconsistency-check.c
index caf1bc9..74c60e8 100644
--- a/tools/testing/selftests/timers/inconsistency-check.c
+++ b/tools/testing/selftests/timers/inconsistency-check.c
@@ -118,7 +118,7 @@ int consistency_test(int clock_type, unsigned long seconds)
start_str = ctime();
 
while (seconds == -1 || now - then < seconds) {
-   inconsistent = 0;
+   inconsistent = -1;
 
/* Fill list */
for (i = 0; i < CALLS_PER_LOOP; i++)
@@ -130,7 +130,7 @@ int consistency_test(int clock_type, unsigned long seconds)
inconsistent = i;
 
/* display inconsistency */
-   if (inconsistent) {
+   if (inconsistent >= 0) {
unsigned long long delta;
 
printf("\%s\n", start_str);
-- 
2.9.3



[PATCH] kselftests: timers: Fix inconsistency-check to not ignore first timestamp

2017-06-13 Thread Miroslav Lichvar
When the first timestamp in the list of clock readings was later than
the second timestamp and all other timestamps were in order, the
inconsistency was not reported because the index of the out-of-order
timestamp was equal to the default value.

Cc: John Stultz 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Signed-off-by: Miroslav Lichvar 
---
 tools/testing/selftests/timers/inconsistency-check.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/timers/inconsistency-check.c 
b/tools/testing/selftests/timers/inconsistency-check.c
index caf1bc9..74c60e8 100644
--- a/tools/testing/selftests/timers/inconsistency-check.c
+++ b/tools/testing/selftests/timers/inconsistency-check.c
@@ -118,7 +118,7 @@ int consistency_test(int clock_type, unsigned long seconds)
start_str = ctime();
 
while (seconds == -1 || now - then < seconds) {
-   inconsistent = 0;
+   inconsistent = -1;
 
/* Fill list */
for (i = 0; i < CALLS_PER_LOOP; i++)
@@ -130,7 +130,7 @@ int consistency_test(int clock_type, unsigned long seconds)
inconsistent = i;
 
/* display inconsistency */
-   if (inconsistent) {
+   if (inconsistent >= 0) {
unsigned long long delta;
 
printf("\%s\n", start_str);
-- 
2.9.3



Re: [PATCH 2/3 v3] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting

2017-06-13 Thread Miroslav Lichvar
On Thu, Jun 08, 2017 at 04:44:21PM -0700, John Stultz wrote:
> Due to how the MONOTONIC_RAW accumulation logic was handled,
> there is the potential for a 1ns discontinuity when we do
> accumulations. This small discontinuity has for the most part
> gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
> in their vDSO clock_gettime implementation, we've seen failures
> with the inconsistency-check test in kselftest.
> 
> This patch addresses the issue by using the same sub-ns
> accumulation handling that CLOCK_MONOTONIC uses, which avoids
> the issue for in-kernel users.

The patch looks good to me and in testing on x86_64 I didn't see any
issues with the CLOCK_MONOTONIC_RAW clock.

Thanks,

-- 
Miroslav Lichvar


Re: [PATCH 2/3 v3] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting

2017-06-13 Thread Miroslav Lichvar
On Thu, Jun 08, 2017 at 04:44:21PM -0700, John Stultz wrote:
> Due to how the MONOTONIC_RAW accumulation logic was handled,
> there is the potential for a 1ns discontinuity when we do
> accumulations. This small discontinuity has for the most part
> gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
> in their vDSO clock_gettime implementation, we've seen failures
> with the inconsistency-check test in kselftest.
> 
> This patch addresses the issue by using the same sub-ns
> accumulation handling that CLOCK_MONOTONIC uses, which avoids
> the issue for in-kernel users.

The patch looks good to me and in testing on x86_64 I didn't see any
issues with the CLOCK_MONOTONIC_RAW clock.

Thanks,

-- 
Miroslav Lichvar


[PATCH] kselftests: timers: Add test for frequency step

2017-06-09 Thread Miroslav Lichvar
This test checks the response of the system clock to frequency
steps made with adjtimex(). The frequency error and stability of
the CLOCK_MONOTONIC clock relative to the CLOCK_MONOTONIC_RAW clock
is measured in two intervals following the step. The test fails if
values from the second interval exceed specified limits.

Cc: John Stultz <john.stu...@linaro.org>
Cc: Richard Cochran <richardcoch...@gmail.com>
Cc: Prarit Bhargava <pra...@redhat.com>
Signed-off-by: Miroslav Lichvar <mlich...@redhat.com>
---
 tools/testing/selftests/timers/Makefile|   5 +-
 tools/testing/selftests/timers/freq-step.c | 268 +
 2 files changed, 271 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/timers/freq-step.c

diff --git a/tools/testing/selftests/timers/Makefile 
b/tools/testing/selftests/timers/Makefile
index 5fa1d7e9..5801bbe 100644
--- a/tools/testing/selftests/timers/Makefile
+++ b/tools/testing/selftests/timers/Makefile
@@ -1,6 +1,6 @@
 BUILD_FLAGS = -DKTEST
 CFLAGS += -O3 -Wl,-no-as-needed -Wall $(BUILD_FLAGS)
-LDFLAGS += -lrt -lpthread
+LDFLAGS += -lrt -lpthread -lm
 
 # these are all "safe" tests that don't modify
 # system time or require escalated privileges
@@ -8,7 +8,7 @@ TEST_GEN_PROGS = posix_timers nanosleep nsleep-lat 
set-timer-lat mqueue-lat \
 inconsistency-check raw_skew threadtest rtctest
 
 TEST_GEN_PROGS_EXTENDED = alarmtimer-suspend valid-adjtimex adjtick 
change_skew \
- skew_consistency clocksource-switch leap-a-day \
+ skew_consistency clocksource-switch freq-step leap-a-day \
  leapcrash set-tai set-2038 set-tz
 
 
@@ -24,6 +24,7 @@ run_destructive_tests: run_tests
./change_skew
./skew_consistency
./clocksource-switch
+   ./freq-step
./leap-a-day -s -i 10
./leapcrash
./set-tz
diff --git a/tools/testing/selftests/timers/freq-step.c 
b/tools/testing/selftests/timers/freq-step.c
new file mode 100644
index 000..e8c6183
--- /dev/null
+++ b/tools/testing/selftests/timers/freq-step.c
@@ -0,0 +1,268 @@
+/*
+ * This test checks the response of the system clock to frequency
+ * steps made with adjtimex(). The frequency error and stability of
+ * the CLOCK_MONOTONIC clock relative to the CLOCK_MONOTONIC_RAW clock
+ * is measured in two intervals following the step. The test fails if
+ * values from the second interval exceed specified limits.
+ *
+ * Copyright (C) Miroslav Lichvar <mlich...@redhat.com>  2017
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../kselftest.h"
+
+#define SAMPLES 100
+#define SAMPLE_READINGS 10
+#define MEAN_SAMPLE_INTERVAL 0.1
+#define STEP_INTERVAL 1.0
+#define MAX_PRECISION 100e-9
+#define MAX_FREQ_ERROR 10e-6
+#define MAX_STDDEV 1000e-9
+
+struct sample {
+   double offset;
+   double time;
+};
+
+static time_t mono_raw_base;
+static time_t mono_base;
+static long user_hz;
+static double precision;
+static double mono_freq_offset;
+
+static double diff_timespec(struct timespec *ts1, struct timespec *ts2)
+{
+   return ts1->tv_sec - ts2->tv_sec + (ts1->tv_nsec - ts2->tv_nsec) / 1e9;
+}
+
+static double get_sample(struct sample *sample)
+{
+   double delay, mindelay = 0.0;
+   struct timespec ts1, ts2, ts3;
+   int i;
+
+   for (i = 0; i < SAMPLE_READINGS; i++) {
+   clock_gettime(CLOCK_MONOTONIC_RAW, );
+   clock_gettime(CLOCK_MONOTONIC, );
+   clock_gettime(CLOCK_MONOTONIC_RAW, );
+
+   ts1.tv_sec -= mono_raw_base;
+   ts2.tv_sec -= mono_base;
+   ts3.tv_sec -= mono_raw_base;
+
+   delay = diff_timespec(, );
+   if (delay <= 1e-9) {
+   i--;
+   continue;
+   }
+
+   if (!i || delay < mindelay) {
+   sample->offset = diff_timespec(, );
+   sample->offset -= delay / 2.0;
+   sample->time = ts1.tv_sec + ts1.tv_nsec / 1e9;
+   mindelay = delay;
+   }
+   }
+
+   return mindelay;
+}
+
+static void reset_ntp_error(void)
+{
+   struct timex txc;
+
+   txc.modes = ADJ_SETOFFSET;
+   txc.time.tv_sec = 0;
+   txc.time.tv_usec = 0;
+
+   if (adjtimex() < 0) {
+   perror("[FAIL] adjtimex");
+   ksft_exit_fail();
+ 

[PATCH] kselftests: timers: Add test for frequency step

2017-06-09 Thread Miroslav Lichvar
This test checks the response of the system clock to frequency
steps made with adjtimex(). The frequency error and stability of
the CLOCK_MONOTONIC clock relative to the CLOCK_MONOTONIC_RAW clock
is measured in two intervals following the step. The test fails if
values from the second interval exceed specified limits.

Cc: John Stultz 
Cc: Richard Cochran 
Cc: Prarit Bhargava 
Signed-off-by: Miroslav Lichvar 
---
 tools/testing/selftests/timers/Makefile|   5 +-
 tools/testing/selftests/timers/freq-step.c | 268 +
 2 files changed, 271 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/timers/freq-step.c

diff --git a/tools/testing/selftests/timers/Makefile 
b/tools/testing/selftests/timers/Makefile
index 5fa1d7e9..5801bbe 100644
--- a/tools/testing/selftests/timers/Makefile
+++ b/tools/testing/selftests/timers/Makefile
@@ -1,6 +1,6 @@
 BUILD_FLAGS = -DKTEST
 CFLAGS += -O3 -Wl,-no-as-needed -Wall $(BUILD_FLAGS)
-LDFLAGS += -lrt -lpthread
+LDFLAGS += -lrt -lpthread -lm
 
 # these are all "safe" tests that don't modify
 # system time or require escalated privileges
@@ -8,7 +8,7 @@ TEST_GEN_PROGS = posix_timers nanosleep nsleep-lat 
set-timer-lat mqueue-lat \
 inconsistency-check raw_skew threadtest rtctest
 
 TEST_GEN_PROGS_EXTENDED = alarmtimer-suspend valid-adjtimex adjtick 
change_skew \
- skew_consistency clocksource-switch leap-a-day \
+ skew_consistency clocksource-switch freq-step leap-a-day \
  leapcrash set-tai set-2038 set-tz
 
 
@@ -24,6 +24,7 @@ run_destructive_tests: run_tests
./change_skew
./skew_consistency
./clocksource-switch
+   ./freq-step
./leap-a-day -s -i 10
./leapcrash
./set-tz
diff --git a/tools/testing/selftests/timers/freq-step.c 
b/tools/testing/selftests/timers/freq-step.c
new file mode 100644
index 000..e8c6183
--- /dev/null
+++ b/tools/testing/selftests/timers/freq-step.c
@@ -0,0 +1,268 @@
+/*
+ * This test checks the response of the system clock to frequency
+ * steps made with adjtimex(). The frequency error and stability of
+ * the CLOCK_MONOTONIC clock relative to the CLOCK_MONOTONIC_RAW clock
+ * is measured in two intervals following the step. The test fails if
+ * values from the second interval exceed specified limits.
+ *
+ * Copyright (C) Miroslav Lichvar   2017
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../kselftest.h"
+
+#define SAMPLES 100
+#define SAMPLE_READINGS 10
+#define MEAN_SAMPLE_INTERVAL 0.1
+#define STEP_INTERVAL 1.0
+#define MAX_PRECISION 100e-9
+#define MAX_FREQ_ERROR 10e-6
+#define MAX_STDDEV 1000e-9
+
+struct sample {
+   double offset;
+   double time;
+};
+
+static time_t mono_raw_base;
+static time_t mono_base;
+static long user_hz;
+static double precision;
+static double mono_freq_offset;
+
+static double diff_timespec(struct timespec *ts1, struct timespec *ts2)
+{
+   return ts1->tv_sec - ts2->tv_sec + (ts1->tv_nsec - ts2->tv_nsec) / 1e9;
+}
+
+static double get_sample(struct sample *sample)
+{
+   double delay, mindelay = 0.0;
+   struct timespec ts1, ts2, ts3;
+   int i;
+
+   for (i = 0; i < SAMPLE_READINGS; i++) {
+   clock_gettime(CLOCK_MONOTONIC_RAW, );
+   clock_gettime(CLOCK_MONOTONIC, );
+   clock_gettime(CLOCK_MONOTONIC_RAW, );
+
+   ts1.tv_sec -= mono_raw_base;
+   ts2.tv_sec -= mono_base;
+   ts3.tv_sec -= mono_raw_base;
+
+   delay = diff_timespec(, );
+   if (delay <= 1e-9) {
+   i--;
+   continue;
+   }
+
+   if (!i || delay < mindelay) {
+   sample->offset = diff_timespec(, );
+   sample->offset -= delay / 2.0;
+   sample->time = ts1.tv_sec + ts1.tv_nsec / 1e9;
+   mindelay = delay;
+   }
+   }
+
+   return mindelay;
+}
+
+static void reset_ntp_error(void)
+{
+   struct timex txc;
+
+   txc.modes = ADJ_SETOFFSET;
+   txc.time.tv_sec = 0;
+   txc.time.tv_usec = 0;
+
+   if (adjtimex() < 0) {
+   perror("[FAIL] adjtimex");
+   ksft_exit_fail();
+   }
+}
+
+static void set_frequency(double freq)
+{
+   struct timex txc;
+   int tick_offset;
+
+   tick_offset = 1e6 *

Re: [PATCH RFC 0/3] Improve stability of system clock

2017-06-08 Thread Miroslav Lichvar
On Fri, May 19, 2017 at 05:35:38PM -0700, John Stultz wrote:
> >> On Wed, May 17, 2017 at 10:22 AM, Miroslav Lichvar <mlich...@redhat.com> 
> >> wrote:
> >> > Is there a better way to run the timekeeping code in an userspace
> >> > application? I suspect it would need something like the Linux Kernel
> >> > Library project.

> So a few years ago I mentioned this at a testing session at I think
> Linux Plubmers' and Rusty (CC'ed) commented that he had some netfilter
> (or iptables?) simulator code that never made it upstream. However,
> now that kselftests are integrated with the kernel this could change.
> At least that's my memory of the discussion.
> 
> Anyway, I still think its worth trying to submit. Worse case its a
> huge pain and we pull it back out?

I've tried something different. I've reimplemented the simulated test
as an ordinary user-space application using the CLOCK_MONOTONIC and
CLOCK_MONOTONIC_RAW clocks. It's not deterministic, it doesn't give
results instantly, and it's not as precise as the original test, but
it can clearly show the difference that this patchset makes.

Before:
Clock precision: 46 ns
CLOCK_MONOTONIC_RAW frequency offset: 0.00 ppm
   base   step   freqdev max   freqdev max
  15.24  40960  -0.042731852  +0.01  2   3
 228.26  40960  +6.07  33089  225914  -0.09  2   4
  55.42  40960 -11.90  46413  232834  +0.01  1   3
 237.65  40960  -4.75  25574  173479  +0.05  1   3
 240.63  40960  -0.088465758  +0.05  2   3
 205.52640  -0.117815231  -0.01  2   4
 246.81640  +0.218095486  +0.08  1   3
 164.04640  +0.162821920  +0.11  2   3
 171.32640  -0.084082756  -0.01  2   3
 243.04640  -0.033492377  +0.03  2   3
 179.91 10  +0.07 28  62  -0.00  2   6
  45.44 10  +0.10 18 119  +0.00  6  29
 204.30 10  -0.00 21 122  -0.00  4   9
  76.18 10  +0.03 39  85  -0.00  3   5
 158.18 10  -0.02 26  94  -0.00  4   9

After:
Clock precision: 46 ns
CLOCK_MONOTONIC_RAW frequency offset: -0.00 ppm
   base   step   freqdev max   freqdev max
  93.98  40960  +0.00  3   7  +0.00  4   8
 117.95  40960  +0.00  3   9  -0.00  3   8
 230.44  40960  -0.00  4   9  -0.00  2   5
 240.56  40960  +0.00  3   6  -0.00  3   7
 228.39  40960  +0.00  3   7  -0.00  3   9
 237.85640  -0.00  4  10  +0.00  3   8
 250.74640  +0.00  4  11  -0.00  3   7
 249.06640  +0.00  4   9  -0.00  3   9
 114.98640  -0.00  3   8  +0.00  3   8
 120.59640  +0.00  3   7  +0.00  3   7
 190.66 10  -0.00  3   7  +0.00  3   7
 228.83 10  +0.00  3   7  -0.00  3   6
  18.91 10  +0.00  3   8  -0.00  3   8
  12.39 10  +0.00  3   8  +0.00  4   8
  12.01 10  +0.00  4   9  -0.00  4   9

Each line has statistics from 100 samples collected in 0.1 second
interval.

The frequency error in the second "freq" column with values up to 0.11
ppm shows the problem with the clock very slowly correcting a large
NTP error.

I can add some limits for the measured errors and submit it as new a
kselftest. If the measured precision is too large (e.g. >100ns), the
test can return "skip" in order to avoid false negatives.

If adjtimex() had an option to return the NTP error directly, or it
was possible to read the two clocks at the same time, the test could
be much more sensitive and observe shorter intervals (spanning fewer
clock updates).

-- 
Miroslav Lichvar


Re: [PATCH RFC 0/3] Improve stability of system clock

2017-06-08 Thread Miroslav Lichvar
On Fri, May 19, 2017 at 05:35:38PM -0700, John Stultz wrote:
> >> On Wed, May 17, 2017 at 10:22 AM, Miroslav Lichvar  
> >> wrote:
> >> > Is there a better way to run the timekeeping code in an userspace
> >> > application? I suspect it would need something like the Linux Kernel
> >> > Library project.

> So a few years ago I mentioned this at a testing session at I think
> Linux Plubmers' and Rusty (CC'ed) commented that he had some netfilter
> (or iptables?) simulator code that never made it upstream. However,
> now that kselftests are integrated with the kernel this could change.
> At least that's my memory of the discussion.
> 
> Anyway, I still think its worth trying to submit. Worse case its a
> huge pain and we pull it back out?

I've tried something different. I've reimplemented the simulated test
as an ordinary user-space application using the CLOCK_MONOTONIC and
CLOCK_MONOTONIC_RAW clocks. It's not deterministic, it doesn't give
results instantly, and it's not as precise as the original test, but
it can clearly show the difference that this patchset makes.

Before:
Clock precision: 46 ns
CLOCK_MONOTONIC_RAW frequency offset: 0.00 ppm
   base   step   freqdev max   freqdev max
  15.24  40960  -0.042731852  +0.01  2   3
 228.26  40960  +6.07  33089  225914  -0.09  2   4
  55.42  40960 -11.90  46413  232834  +0.01  1   3
 237.65  40960  -4.75  25574  173479  +0.05  1   3
 240.63  40960  -0.088465758  +0.05  2   3
 205.52640  -0.117815231  -0.01  2   4
 246.81640  +0.218095486  +0.08  1   3
 164.04640  +0.162821920  +0.11  2   3
 171.32640  -0.084082756  -0.01  2   3
 243.04640  -0.033492377  +0.03  2   3
 179.91 10  +0.07 28  62  -0.00  2   6
  45.44 10  +0.10 18 119  +0.00  6  29
 204.30 10  -0.00 21 122  -0.00  4   9
  76.18 10  +0.03 39  85  -0.00  3   5
 158.18 10  -0.02 26  94  -0.00  4   9

After:
Clock precision: 46 ns
CLOCK_MONOTONIC_RAW frequency offset: -0.00 ppm
   base   step   freqdev max   freqdev max
  93.98  40960  +0.00  3   7  +0.00  4   8
 117.95  40960  +0.00  3   9  -0.00  3   8
 230.44  40960  -0.00  4   9  -0.00  2   5
 240.56  40960  +0.00  3   6  -0.00  3   7
 228.39  40960  +0.00  3   7  -0.00  3   9
 237.85640  -0.00  4  10  +0.00  3   8
 250.74640  +0.00  4  11  -0.00  3   7
 249.06640  +0.00  4   9  -0.00  3   9
 114.98640  -0.00  3   8  +0.00  3   8
 120.59640  +0.00  3   7  +0.00  3   7
 190.66 10  -0.00  3   7  +0.00  3   7
 228.83 10  +0.00  3   7  -0.00  3   6
  18.91 10  +0.00  3   8  -0.00  3   8
  12.39 10  +0.00  3   8  +0.00  4   8
  12.01 10  +0.00  4   9  -0.00  4   9

Each line has statistics from 100 samples collected in 0.1 second
interval.

The frequency error in the second "freq" column with values up to 0.11
ppm shows the problem with the clock very slowly correcting a large
NTP error.

I can add some limits for the measured errors and submit it as new a
kselftest. If the measured precision is too large (e.g. >100ns), the
test can return "skip" in order to avoid false negatives.

If adjtimex() had an option to return the NTP error directly, or it
was possible to read the two clocks at the same time, the test could
be much more sensitive and observe shorter intervals (spanning fewer
clock updates).

-- 
Miroslav Lichvar


Re: [PATCH RFC 0/3] Improve stability of system clock

2017-05-17 Thread Miroslav Lichvar
On Wed, May 17, 2017 at 10:02:00AM -0700, John Stultz wrote:
> On Wed, May 17, 2017 at 9:57 AM, Miroslav Lichvar <mlich...@redhat.com> wrote:
> > On Wed, May 17, 2017 at 09:30:31AM -0700, John Stultz wrote:
> >> Could you submit your linux-tktest infrastructure to the kselftests dir?
> >
> > I can, but it's a mess that breaks frequently as the timekeeping and
> > other kernel code changes. Are you sure you want that in the kernel
> > tree? :)
> 
> Being a mess is a slight concern, but as for breaking, if its
> in-kernel, then folks can't make changes that break it, right?

It duplicates/stubs quite a few kernel functions that are needed to
compile and link the timekeeping.c file into an executable. See
linux-tktest/missing.c. If their signature changes, or new functions
are needed, it will break.

Is there a better way to run the timekeeping code in an userspace
application? I suspect it would need something like the Linux Kernel
Library project.

-- 
Miroslav Lichvar


Re: [PATCH RFC 0/3] Improve stability of system clock

2017-05-17 Thread Miroslav Lichvar
On Wed, May 17, 2017 at 10:02:00AM -0700, John Stultz wrote:
> On Wed, May 17, 2017 at 9:57 AM, Miroslav Lichvar  wrote:
> > On Wed, May 17, 2017 at 09:30:31AM -0700, John Stultz wrote:
> >> Could you submit your linux-tktest infrastructure to the kselftests dir?
> >
> > I can, but it's a mess that breaks frequently as the timekeeping and
> > other kernel code changes. Are you sure you want that in the kernel
> > tree? :)
> 
> Being a mess is a slight concern, but as for breaking, if its
> in-kernel, then folks can't make changes that break it, right?

It duplicates/stubs quite a few kernel functions that are needed to
compile and link the timekeeping.c file into an executable. See
linux-tktest/missing.c. If their signature changes, or new functions
are needed, it will break.

Is there a better way to run the timekeeping code in an userspace
application? I suspect it would need something like the Linux Kernel
Library project.

-- 
Miroslav Lichvar


Re: [PATCH RFC 0/3] Improve stability of system clock

2017-05-17 Thread Miroslav Lichvar
On Wed, May 17, 2017 at 09:30:31AM -0700, John Stultz wrote:
> So thanks for sending these out. I still need to look them over in
> depth, but can I make another ask here?  :)
> 
> Could you submit your linux-tktest infrastructure to the kselftests dir?

I can, but it's a mess that breaks frequently as the timekeeping and
other kernel code changes. Are you sure you want that in the kernel
tree? :)

-- 
Miroslav Lichvar


Re: [PATCH RFC 0/3] Improve stability of system clock

2017-05-17 Thread Miroslav Lichvar
On Wed, May 17, 2017 at 09:30:31AM -0700, John Stultz wrote:
> So thanks for sending these out. I still need to look them over in
> depth, but can I make another ask here?  :)
> 
> Could you submit your linux-tktest infrastructure to the kselftests dir?

I can, but it's a mess that breaks frequently as the timekeeping and
other kernel code changes. Are you sure you want that in the kernel
tree? :)

-- 
Miroslav Lichvar


[PATCH RFC 1/3] timekeeping: Remove support for old vsyscalls

2017-05-17 Thread Miroslav Lichvar
As the last users of CONFIG_GENERIC_TIME_VSYSCALL_OLD have been updated,
the support can be removed from the timekeeping code.

Cc: John Stultz <john.stu...@linaro.org>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Signed-off-by: Miroslav Lichvar <mlich...@redhat.com>
---
 include/linux/timekeeper_internal.h |  7 --
 kernel/time/Kconfig |  4 
 kernel/time/timekeeping.c   | 44 -
 3 files changed, 55 deletions(-)

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index 110f453..b7ae5b0 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -132,13 +132,6 @@ struct timekeeper {
 extern void update_vsyscall(struct timekeeper *tk);
 extern void update_vsyscall_tz(void);
 
-#elif defined(CONFIG_GENERIC_TIME_VSYSCALL_OLD)
-
-extern void update_vsyscall_old(struct timespec *ts, struct timespec *wtm,
-   struct clocksource *c, u32 mult,
-   u64 cycle_last);
-extern void update_vsyscall_tz(void);
-
 #else
 
 static inline void update_vsyscall(struct timekeeper *tk)
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 4008d9f..55d61a3 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -21,10 +21,6 @@ config CLOCKSOURCE_VALIDATE_LAST_CYCLE
 config GENERIC_TIME_VSYSCALL
bool
 
-# Timekeeping vsyscall support
-config GENERIC_TIME_VSYSCALL_OLD
-   bool
-
 # Old style timekeeping
 config ARCH_USES_GETTIMEOFFSET
bool
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 8fd77c6..ff542dd 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -487,44 +487,6 @@ static void halt_fast_timekeeper(struct timekeeper *tk)
update_fast_timekeeper(_dummy, _fast_raw);
 }
 
-#ifdef CONFIG_GENERIC_TIME_VSYSCALL_OLD
-
-static inline void update_vsyscall(struct timekeeper *tk)
-{
-   struct timespec xt, wm;
-
-   xt = timespec64_to_timespec(tk_xtime(tk));
-   wm = timespec64_to_timespec(tk->wall_to_monotonic);
-   update_vsyscall_old(, , tk->tkr_mono.clock, tk->tkr_mono.mult,
-   tk->tkr_mono.cycle_last);
-}
-
-static inline void old_vsyscall_fixup(struct timekeeper *tk)
-{
-   s64 remainder;
-
-   /*
-   * Store only full nanoseconds into xtime_nsec after rounding
-   * it up and add the remainder to the error difference.
-   * XXX - This is necessary to avoid small 1ns inconsistnecies caused
-   * by truncating the remainder in vsyscalls. However, it causes
-   * additional work to be done in timekeeping_adjust(). Once
-   * the vsyscall implementations are converted to use xtime_nsec
-   * (shifted nanoseconds), and CONFIG_GENERIC_TIME_VSYSCALL_OLD
-   * users are removed, this can be killed.
-   */
-   remainder = tk->tkr_mono.xtime_nsec & ((1ULL << tk->tkr_mono.shift) - 
1);
-   if (remainder != 0) {
-   tk->tkr_mono.xtime_nsec -= remainder;
-   tk->tkr_mono.xtime_nsec += 1ULL << tk->tkr_mono.shift;
-   tk->ntp_error += remainder << tk->ntp_error_shift;
-   tk->ntp_error -= (1ULL << tk->tkr_mono.shift) << 
tk->ntp_error_shift;
-   }
-}
-#else
-#define old_vsyscall_fixup(tk)
-#endif
-
 static RAW_NOTIFIER_HEAD(pvclock_gtod_chain);
 
 static void update_pvclock_gtod(struct timekeeper *tk, bool was_set)
@@ -2065,12 +2027,6 @@ void update_wall_time(void)
timekeeping_adjust(tk, offset);
 
/*
-* XXX This can be killed once everyone converts
-* to the new update_vsyscall.
-*/
-   old_vsyscall_fixup(tk);
-
-   /*
 * Finally, make sure that after the rounding
 * xtime_nsec isn't larger than NSEC_PER_SEC
 */
-- 
2.9.3



[PATCH RFC 1/3] timekeeping: Remove support for old vsyscalls

2017-05-17 Thread Miroslav Lichvar
As the last users of CONFIG_GENERIC_TIME_VSYSCALL_OLD have been updated,
the support can be removed from the timekeeping code.

Cc: John Stultz 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Signed-off-by: Miroslav Lichvar 
---
 include/linux/timekeeper_internal.h |  7 --
 kernel/time/Kconfig |  4 
 kernel/time/timekeeping.c   | 44 -
 3 files changed, 55 deletions(-)

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index 110f453..b7ae5b0 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -132,13 +132,6 @@ struct timekeeper {
 extern void update_vsyscall(struct timekeeper *tk);
 extern void update_vsyscall_tz(void);
 
-#elif defined(CONFIG_GENERIC_TIME_VSYSCALL_OLD)
-
-extern void update_vsyscall_old(struct timespec *ts, struct timespec *wtm,
-   struct clocksource *c, u32 mult,
-   u64 cycle_last);
-extern void update_vsyscall_tz(void);
-
 #else
 
 static inline void update_vsyscall(struct timekeeper *tk)
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 4008d9f..55d61a3 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -21,10 +21,6 @@ config CLOCKSOURCE_VALIDATE_LAST_CYCLE
 config GENERIC_TIME_VSYSCALL
bool
 
-# Timekeeping vsyscall support
-config GENERIC_TIME_VSYSCALL_OLD
-   bool
-
 # Old style timekeeping
 config ARCH_USES_GETTIMEOFFSET
bool
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 8fd77c6..ff542dd 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -487,44 +487,6 @@ static void halt_fast_timekeeper(struct timekeeper *tk)
update_fast_timekeeper(_dummy, _fast_raw);
 }
 
-#ifdef CONFIG_GENERIC_TIME_VSYSCALL_OLD
-
-static inline void update_vsyscall(struct timekeeper *tk)
-{
-   struct timespec xt, wm;
-
-   xt = timespec64_to_timespec(tk_xtime(tk));
-   wm = timespec64_to_timespec(tk->wall_to_monotonic);
-   update_vsyscall_old(, , tk->tkr_mono.clock, tk->tkr_mono.mult,
-   tk->tkr_mono.cycle_last);
-}
-
-static inline void old_vsyscall_fixup(struct timekeeper *tk)
-{
-   s64 remainder;
-
-   /*
-   * Store only full nanoseconds into xtime_nsec after rounding
-   * it up and add the remainder to the error difference.
-   * XXX - This is necessary to avoid small 1ns inconsistnecies caused
-   * by truncating the remainder in vsyscalls. However, it causes
-   * additional work to be done in timekeeping_adjust(). Once
-   * the vsyscall implementations are converted to use xtime_nsec
-   * (shifted nanoseconds), and CONFIG_GENERIC_TIME_VSYSCALL_OLD
-   * users are removed, this can be killed.
-   */
-   remainder = tk->tkr_mono.xtime_nsec & ((1ULL << tk->tkr_mono.shift) - 
1);
-   if (remainder != 0) {
-   tk->tkr_mono.xtime_nsec -= remainder;
-   tk->tkr_mono.xtime_nsec += 1ULL << tk->tkr_mono.shift;
-   tk->ntp_error += remainder << tk->ntp_error_shift;
-   tk->ntp_error -= (1ULL << tk->tkr_mono.shift) << 
tk->ntp_error_shift;
-   }
-}
-#else
-#define old_vsyscall_fixup(tk)
-#endif
-
 static RAW_NOTIFIER_HEAD(pvclock_gtod_chain);
 
 static void update_pvclock_gtod(struct timekeeper *tk, bool was_set)
@@ -2065,12 +2027,6 @@ void update_wall_time(void)
timekeeping_adjust(tk, offset);
 
/*
-* XXX This can be killed once everyone converts
-* to the new update_vsyscall.
-*/
-   old_vsyscall_fixup(tk);
-
-   /*
 * Finally, make sure that after the rounding
 * xtime_nsec isn't larger than NSEC_PER_SEC
 */
-- 
2.9.3



[PATCH RFC 2/3] timekeeping: Don't align frequency adjustments to ticks

2017-05-17 Thread Miroslav Lichvar
When the timekeeping multiplier is adjusted, the NTP error is adjusted
to correct the clock for the misalignment of the update to the start of
the tick. This error is corrected in later updates and the clock appears
as if the frequency was changed exactly on the tick.

Remove this correction to keep the point where the frequency is
effectively changed at the time of the update. This removes a major
source of the NTP error.

Cc: John Stultz <john.stu...@linaro.org>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Signed-off-by: Miroslav Lichvar <mlich...@redhat.com>
---
 kernel/time/timekeeping.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index ff542dd..5ae6f27 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1760,8 +1760,6 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
 *  xtime_nsec_2 = xtime_nsec_1 - offset
 * Which simplfies to:
 *  xtime_nsec -= offset
-*
-* XXX - TODO: Doc ntp_error calculation.
 */
if ((mult_adj > 0) && (tk->tkr_mono.mult + mult_adj < mult_adj)) {
/* NTP adjustment caused clocksource mult overflow */
@@ -1772,7 +1770,6 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
tk->tkr_mono.mult += mult_adj;
tk->xtime_interval += interval;
tk->tkr_mono.xtime_nsec -= offset;
-   tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
 }
 
 /*
-- 
2.9.3



[PATCH RFC 2/3] timekeeping: Don't align frequency adjustments to ticks

2017-05-17 Thread Miroslav Lichvar
When the timekeeping multiplier is adjusted, the NTP error is adjusted
to correct the clock for the misalignment of the update to the start of
the tick. This error is corrected in later updates and the clock appears
as if the frequency was changed exactly on the tick.

Remove this correction to keep the point where the frequency is
effectively changed at the time of the update. This removes a major
source of the NTP error.

Cc: John Stultz 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Signed-off-by: Miroslav Lichvar 
---
 kernel/time/timekeeping.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index ff542dd..5ae6f27 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1760,8 +1760,6 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
 *  xtime_nsec_2 = xtime_nsec_1 - offset
 * Which simplfies to:
 *  xtime_nsec -= offset
-*
-* XXX - TODO: Doc ntp_error calculation.
 */
if ((mult_adj > 0) && (tk->tkr_mono.mult + mult_adj < mult_adj)) {
/* NTP adjustment caused clocksource mult overflow */
@@ -1772,7 +1770,6 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
tk->tkr_mono.mult += mult_adj;
tk->xtime_interval += interval;
tk->tkr_mono.xtime_nsec -= offset;
-   tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
 }
 
 /*
-- 
2.9.3



[PATCH RFC 0/3] Improve stability of system clock

2017-05-17 Thread Miroslav Lichvar
This is an attempt to improve stability and accuracy of the system clock
with very accurate time sources like the new PTP KVM clock or NTP/PTP
using hardware timestamping. It affects mainly kernels running with
NOHZ. It requires updating of the old ia64 and powerpc vsyscalls.

The main problem is that the error accumulated in the ntp_error register
takes too long to correct and this cannot be easily fixed. There are
four sources of the error:
- rounding of time for old vsyscalls
- alignment of frequency adjustments to ticks
- iterative correction of the multiplier
- limited resolution of the multipler

Instead of trying to correct the error faster, the patches remove the
first three sources. With the only remaining source the correction logic
can be simplified and the frequency of the clock is much more stable and
accurate.

Simulations of a frequency step in linux-tktest (values are in ppm and
nanoseconds):

Before:

nohz on [1, samples/2]   [samples/2 + 1, samples]
samples freq   dev   max  freq   dev   max
10   1.472221341.32217.8   0.06322   0.2   0.5
30   0.20799 849.52448.7   0.06311   0.2   0.6
100  0.04101 492.12895.2   0.06311   0.2   0.5
300  0.05660 295.53026.1   0.02064  28.3 108.9
1000 0.01994 409.82732.1   0.00355  13.7  52.2
3000 0.00477 469.13238.9   0.00070  11.0  40.9
10.00081 377.33791.6   0.00013   9.4  36.2
30.00016 259.94055.7   0.4   8.9  34.1
10   0.3 159.04177.2   0.0  13.7  58.4

nohz off[1, samples/2]   [samples/2 + 1, samples]
samples freq   dev   max  freq   dev   max
10   3.55062   6.2  10.8   0.05730   0.0   0.0
30   0.44672   4.5  14.1   0.05724   0.2   0.5
100  0.03649   2.7  17.4   0.05711   0.2   0.5
300  0.05815   1.7  18.7   0.06313   0.2   0.5
1000 0.06270   1.0  19.1   0.06315   0.2   0.5
3000 0.05720   1.9  19.9   0.02065   1.1   4.1
10.01947  13.5  41.0   0.00339   0.5   1.7
30.00448  17.5  75.9   0.00065   0.3   1.0
10   0.00078  14.2 101.7   0.00012   0.2   0.7

After:

nohz on [1, samples/2]   [samples/2 + 1, samples]
samples freq   dev   max  freq   dev   max
10   0.01584   9.0  14.2   0.02937   2.7   7.2
30   0.00426  10.9  22.4   0.00481   6.5  19.2
100  0.00077  11.6  26.3   0.00074   9.0  26.9
300  0.00013  12.4  29.9   0.00018   8.7  29.3
1000 0.3  12.6  31.8   0.3   8.7  32.1
3000 0.1  12.6  33.3   0.1   9.1  33.4
10.0  12.9  34.0   0.0   9.0  34.1
30.0  12.8  34.5   0.0   9.0  34.5
10   0.0  16.5  51.2   0.0  13.7  58.5

nohz off[1, samples/2]   [samples/2 + 1, samples]
samples freq   dev   max  freq   dev   max
10   0.10309   0.1   0.1   0.12717   0.0   0.1
30   0.04269   0.1   0.3   0.02592   0.1   0.4
100  0.00629   0.3   0.5   0.00521   0.2   0.5
300  0.00109   0.3   0.6   0.00099   0.2   0.5
1000 0.00019   0.3   0.6   0.00022   0.2   0.6
3000 0.2   0.3   0.6   0.2   0.2   0.6
10.0   0.3   0.6   0.0   0.2   0.6
30.0   0.3   0.6   0.0   0.2   0.6
10   0.0   0.3   0.6   0.0   0.2   0.6

Miroslav Lichvar (3):
  timekeeping: Remove support for old vsyscalls
  timekeeping: Don't align frequency adjustments to ticks
  timekeeping: Determine multiplier directly from NTP tick length

 include/linux/timekeeper_internal.h |   9 +-
 kernel/time/Kconfig |   4 -
 kernel/time/timekeeping.c   | 184 +---
 3 files changed, 48 insertions(+), 149 deletions(-)

-- 
2.9.3



[PATCH RFC 0/3] Improve stability of system clock

2017-05-17 Thread Miroslav Lichvar
This is an attempt to improve stability and accuracy of the system clock
with very accurate time sources like the new PTP KVM clock or NTP/PTP
using hardware timestamping. It affects mainly kernels running with
NOHZ. It requires updating of the old ia64 and powerpc vsyscalls.

The main problem is that the error accumulated in the ntp_error register
takes too long to correct and this cannot be easily fixed. There are
four sources of the error:
- rounding of time for old vsyscalls
- alignment of frequency adjustments to ticks
- iterative correction of the multiplier
- limited resolution of the multipler

Instead of trying to correct the error faster, the patches remove the
first three sources. With the only remaining source the correction logic
can be simplified and the frequency of the clock is much more stable and
accurate.

Simulations of a frequency step in linux-tktest (values are in ppm and
nanoseconds):

Before:

nohz on [1, samples/2]   [samples/2 + 1, samples]
samples freq   dev   max  freq   dev   max
10   1.472221341.32217.8   0.06322   0.2   0.5
30   0.20799 849.52448.7   0.06311   0.2   0.6
100  0.04101 492.12895.2   0.06311   0.2   0.5
300  0.05660 295.53026.1   0.02064  28.3 108.9
1000 0.01994 409.82732.1   0.00355  13.7  52.2
3000 0.00477 469.13238.9   0.00070  11.0  40.9
10.00081 377.33791.6   0.00013   9.4  36.2
30.00016 259.94055.7   0.4   8.9  34.1
10   0.3 159.04177.2   0.0  13.7  58.4

nohz off[1, samples/2]   [samples/2 + 1, samples]
samples freq   dev   max  freq   dev   max
10   3.55062   6.2  10.8   0.05730   0.0   0.0
30   0.44672   4.5  14.1   0.05724   0.2   0.5
100  0.03649   2.7  17.4   0.05711   0.2   0.5
300  0.05815   1.7  18.7   0.06313   0.2   0.5
1000 0.06270   1.0  19.1   0.06315   0.2   0.5
3000 0.05720   1.9  19.9   0.02065   1.1   4.1
10.01947  13.5  41.0   0.00339   0.5   1.7
30.00448  17.5  75.9   0.00065   0.3   1.0
10   0.00078  14.2 101.7   0.00012   0.2   0.7

After:

nohz on [1, samples/2]   [samples/2 + 1, samples]
samples freq   dev   max  freq   dev   max
10   0.01584   9.0  14.2   0.02937   2.7   7.2
30   0.00426  10.9  22.4   0.00481   6.5  19.2
100  0.00077  11.6  26.3   0.00074   9.0  26.9
300  0.00013  12.4  29.9   0.00018   8.7  29.3
1000 0.3  12.6  31.8   0.3   8.7  32.1
3000 0.1  12.6  33.3   0.1   9.1  33.4
10.0  12.9  34.0   0.0   9.0  34.1
30.0  12.8  34.5   0.0   9.0  34.5
10   0.0  16.5  51.2   0.0  13.7  58.5

nohz off[1, samples/2]   [samples/2 + 1, samples]
samples freq   dev   max  freq   dev   max
10   0.10309   0.1   0.1   0.12717   0.0   0.1
30   0.04269   0.1   0.3   0.02592   0.1   0.4
100  0.00629   0.3   0.5   0.00521   0.2   0.5
300  0.00109   0.3   0.6   0.00099   0.2   0.5
1000 0.00019   0.3   0.6   0.00022   0.2   0.6
3000 0.2   0.3   0.6   0.2   0.2   0.6
10.0   0.3   0.6   0.0   0.2   0.6
30.0   0.3   0.6   0.0   0.2   0.6
10   0.0   0.3   0.6   0.0   0.2   0.6

Miroslav Lichvar (3):
  timekeeping: Remove support for old vsyscalls
  timekeeping: Don't align frequency adjustments to ticks
  timekeeping: Determine multiplier directly from NTP tick length

 include/linux/timekeeper_internal.h |   9 +-
 kernel/time/Kconfig |   4 -
 kernel/time/timekeeping.c   | 184 +---
 3 files changed, 48 insertions(+), 149 deletions(-)

-- 
2.9.3



[PATCH RFC 3/3] timekeeping: Determine multiplier directly from NTP tick length

2017-05-17 Thread Miroslav Lichvar
When the length of the NTP tick changes significantly, e.g. when an
NTP/PTP implementation corrects the initial offset of the clock, a large
value may accumulate in the NTP error before the multiplier converges to
the correct value. It may then take a very long time (hours or even
days) before the error is corrected. This creates a small unstable
frequency offset and prevents stable synchronization of the clock with
very stable time sources (e.g. NTP/PTP using hardware timestamping or
PTP KVM clock).

Use division to determine the correct multiplier directly from the NTP
tick length in order to replace the iterative approach and remove the
last major source of the NTP error. The only remaining source of the
error is now limited resolution of the multiplier, which can be
effectively corrected by adding 0 or 1 to the multiplier according to
the sign of the error.

Cc: John Stultz <john.stu...@linaro.org>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Richard Cochran <richardcoch...@gmail.com>
Signed-off-by: Miroslav Lichvar <mlich...@redhat.com>
---
 include/linux/timekeeper_internal.h |   2 +
 kernel/time/timekeeping.c   | 137 
 2 files changed, 48 insertions(+), 91 deletions(-)

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index b7ae5b0..e9a46e5 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -113,6 +113,8 @@ struct timekeeper {
s64 ntp_error;
u32 ntp_error_shift;
u32 ntp_err_mult;
+   /* Flag used to avoid updating NTP twice with same second */
+   u32 skip_second_overflow;
 #ifdef CONFIG_DEBUG_TIMEKEEPING
longlast_warning;
/*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 5ae6f27..b4cc606 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -289,6 +289,7 @@ static void tk_setup_internals(struct timekeeper *tk, 
struct clocksource *clock)
tk->tkr_mono.mult = clock->mult;
tk->tkr_raw.mult = clock->mult;
tk->ntp_err_mult = 0;
+   tk->skip_second_overflow = 0;
 }
 
 /* Timekeeper helper functions. */
@@ -1699,20 +1700,19 @@ device_initcall(timekeeping_init_ops);
  */
 static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 s64 offset,
-bool negative,
-int adj_scale)
+s32 mult_adj)
 {
s64 interval = tk->cycle_interval;
-   s32 mult_adj = 1;
 
-   if (negative) {
-   mult_adj = -mult_adj;
+   if (mult_adj == 0) {
+   return;
+   } else if (mult_adj == -1) {
interval = -interval;
-   offset  = -offset;
+   offset = -offset;
+   } else if (mult_adj != 1) {
+   interval *= mult_adj;
+   offset *= mult_adj;
}
-   mult_adj <<= adj_scale;
-   interval <<= adj_scale;
-   offset <<= adj_scale;
 
/*
 * So the following can be confusing.
@@ -1773,85 +1773,35 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
 }
 
 /*
- * Calculate the multiplier adjustment needed to match the frequency
- * specified by NTP
+ * Adjust the timekeeper's multiplier to the correct frequency
+ * and also to reduce the accumulated error value.
  */
-static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
-   s64 offset)
+static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 {
-   s64 interval = tk->cycle_interval;
-   s64 xinterval = tk->xtime_interval;
-   u32 base = tk->tkr_mono.clock->mult;
-   u32 max = tk->tkr_mono.clock->maxadj;
-   u32 cur_adj = tk->tkr_mono.mult;
-   s64 tick_error;
-   bool negative;
-   u32 adj_scale;
-
-   /* Remove any current error adj from freq calculation */
-   if (tk->ntp_err_mult)
-   xinterval -= tk->cycle_interval;
-
-   tk->ntp_tick = ntp_tick_length();
-
-   /* Calculate current error per tick */
-   tick_error = ntp_tick_length() >> tk->ntp_error_shift;
-   tick_error -= (xinterval + tk->xtime_remainder);
-
-   /* Don't worry about correcting it if its small */
-   if (likely((tick_error >= 0) && (tick_error <= interval)))
-   return;
-
-   /* preserve the direction of correction */
-   negative = (tick_error < 0);
+   u32 mult;
 
-   /* If any adjustment would pass the max, just return */
-   if (negative && (cu

[PATCH RFC 3/3] timekeeping: Determine multiplier directly from NTP tick length

2017-05-17 Thread Miroslav Lichvar
When the length of the NTP tick changes significantly, e.g. when an
NTP/PTP implementation corrects the initial offset of the clock, a large
value may accumulate in the NTP error before the multiplier converges to
the correct value. It may then take a very long time (hours or even
days) before the error is corrected. This creates a small unstable
frequency offset and prevents stable synchronization of the clock with
very stable time sources (e.g. NTP/PTP using hardware timestamping or
PTP KVM clock).

Use division to determine the correct multiplier directly from the NTP
tick length in order to replace the iterative approach and remove the
last major source of the NTP error. The only remaining source of the
error is now limited resolution of the multiplier, which can be
effectively corrected by adding 0 or 1 to the multiplier according to
the sign of the error.

Cc: John Stultz 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Signed-off-by: Miroslav Lichvar 
---
 include/linux/timekeeper_internal.h |   2 +
 kernel/time/timekeeping.c   | 137 
 2 files changed, 48 insertions(+), 91 deletions(-)

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index b7ae5b0..e9a46e5 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -113,6 +113,8 @@ struct timekeeper {
s64 ntp_error;
u32 ntp_error_shift;
u32 ntp_err_mult;
+   /* Flag used to avoid updating NTP twice with same second */
+   u32 skip_second_overflow;
 #ifdef CONFIG_DEBUG_TIMEKEEPING
longlast_warning;
/*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 5ae6f27..b4cc606 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -289,6 +289,7 @@ static void tk_setup_internals(struct timekeeper *tk, 
struct clocksource *clock)
tk->tkr_mono.mult = clock->mult;
tk->tkr_raw.mult = clock->mult;
tk->ntp_err_mult = 0;
+   tk->skip_second_overflow = 0;
 }
 
 /* Timekeeper helper functions. */
@@ -1699,20 +1700,19 @@ device_initcall(timekeeping_init_ops);
  */
 static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 s64 offset,
-bool negative,
-int adj_scale)
+s32 mult_adj)
 {
s64 interval = tk->cycle_interval;
-   s32 mult_adj = 1;
 
-   if (negative) {
-   mult_adj = -mult_adj;
+   if (mult_adj == 0) {
+   return;
+   } else if (mult_adj == -1) {
interval = -interval;
-   offset  = -offset;
+   offset = -offset;
+   } else if (mult_adj != 1) {
+   interval *= mult_adj;
+   offset *= mult_adj;
}
-   mult_adj <<= adj_scale;
-   interval <<= adj_scale;
-   offset <<= adj_scale;
 
/*
 * So the following can be confusing.
@@ -1773,85 +1773,35 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
 }
 
 /*
- * Calculate the multiplier adjustment needed to match the frequency
- * specified by NTP
+ * Adjust the timekeeper's multiplier to the correct frequency
+ * and also to reduce the accumulated error value.
  */
-static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
-   s64 offset)
+static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 {
-   s64 interval = tk->cycle_interval;
-   s64 xinterval = tk->xtime_interval;
-   u32 base = tk->tkr_mono.clock->mult;
-   u32 max = tk->tkr_mono.clock->maxadj;
-   u32 cur_adj = tk->tkr_mono.mult;
-   s64 tick_error;
-   bool negative;
-   u32 adj_scale;
-
-   /* Remove any current error adj from freq calculation */
-   if (tk->ntp_err_mult)
-   xinterval -= tk->cycle_interval;
-
-   tk->ntp_tick = ntp_tick_length();
-
-   /* Calculate current error per tick */
-   tick_error = ntp_tick_length() >> tk->ntp_error_shift;
-   tick_error -= (xinterval + tk->xtime_remainder);
-
-   /* Don't worry about correcting it if its small */
-   if (likely((tick_error >= 0) && (tick_error <= interval)))
-   return;
-
-   /* preserve the direction of correction */
-   negative = (tick_error < 0);
+   u32 mult;
 
-   /* If any adjustment would pass the max, just return */
-   if (negative && (cur_adj - 1) <= (base - max))
-   return;
-   if (!negative && (cur_adj + 1) >= (base + m

Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)

2017-05-17 Thread Miroslav Lichvar
On Fri, May 12, 2017 at 10:26:12AM -0700, John Stultz wrote:
> On Fri, May 12, 2017 at 8:14 AM, Miroslav Lichvar <mlich...@redhat.com> wrote:
> > I see this with real PHCs and PTP/NTP synchronization too. It's very
> > confusing when the timekeeping changes so much for no apparent reason.
> > If we can't remove the old vsyscalls yet, I was thinking maybe a new
> > flag could be added to adjtimex to report the error, so applications
> > can at least detect this problem and consider stepping the clock in
> > order to reset the error?
> >
> > Thoughts?
> 
> I'd rather not have short-term hacks that applications have to adapt.
> So I think we should drop the old vsyscall method in the near term.
> Sorry this sort of fell off my radar.

Ok. Sounds good.

> Do you have an updated set of patches you want to get ready to address
> the issue? We can get those reviewed while we increase the pressure on
> dropping the OLD_VSYSCALL implementations.

I'll send an RFC series shortly.

Thanks,

-- 
Miroslav Lichvar


Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)

2017-05-17 Thread Miroslav Lichvar
On Fri, May 12, 2017 at 10:26:12AM -0700, John Stultz wrote:
> On Fri, May 12, 2017 at 8:14 AM, Miroslav Lichvar  wrote:
> > I see this with real PHCs and PTP/NTP synchronization too. It's very
> > confusing when the timekeeping changes so much for no apparent reason.
> > If we can't remove the old vsyscalls yet, I was thinking maybe a new
> > flag could be added to adjtimex to report the error, so applications
> > can at least detect this problem and consider stepping the clock in
> > order to reset the error?
> >
> > Thoughts?
> 
> I'd rather not have short-term hacks that applications have to adapt.
> So I think we should drop the old vsyscall method in the near term.
> Sorry this sort of fell off my radar.

Ok. Sounds good.

> Do you have an updated set of patches you want to get ready to address
> the issue? We can get those reviewed while we increase the pressure on
> dropping the OLD_VSYSCALL implementations.

I'll send an RFC series shortly.

Thanks,

-- 
Miroslav Lichvar


Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)

2017-05-12 Thread Miroslav Lichvar
On Tue, Jul 15, 2014 at 09:02:38PM -0700, John Stultz wrote:
> On 07/08/2014 04:08 AM, Miroslav Lichvar wrote:
> > I spent some time trying to figure out a workaround for the nanosecond
> > rounding, but I didn't find anything that wouldn't complicate the mult
> > adjustment logic and bring back the problems which the direct division
> > approach is supposed to solve.
> >
> > It seems it may be a while before the old vsyscalls are fixed. How
> > about including only the first two patches from this set for now?

> So thanks for the ping here. If you're happy with the first two as an
> initial step, I'd be willing to try to push those in. The only trouble
> is there's a whole lot of timekeeping churn headed for 3.17 that Thomas
> has cooked up. While there isn't likely to be direct conflicts in the
> changes, I get nervous about mixing too many changes in subtle code at once.

I'm sorry for digging up this skeleton. Are we any closer to being
able to remove CONFIG_GENERIC_TIME_VSYSCALL_OLD, which prevented
simplifying the steering logic of the internal clock?

The two patches added in 3.17 helped, but there is still the problem
that it takes too long for the kernel clock to converge to the
internal NTP time when a large error was accumulated, e.g. a large
frequency adjustment was made to correct the initial offset of the
clock.

It seems the error can reach milliseconds and take hours or even days
to be corrected. As the rate at which the error is decreasing is not
constant (due to random clock updates), it doesn't seem to be possible
to synchronize the clock with better stability than few tens or
hundreds of nanoseconds.

With the new PTP KVM clock the problem can be easily observed. Here is
a graph of the offset and frequency as measured by chronyd when
configured to synchronize the guest's clock to the host using the
virtual PHC. In the middle is the point when the NTP error reached
zero. The apparent frequency jumped by about 50 ppb and the offset
improved by an order of magnitude.

https://mlichvar.fedorapeople.org/tmp/kvm_phc.png

I see this with real PHCs and PTP/NTP synchronization too. It's very
confusing when the timekeeping changes so much for no apparent reason.
If we can't remove the old vsyscalls yet, I was thinking maybe a new
flag could be added to adjtimex to report the error, so applications
can at least detect this problem and consider stepping the clock in
order to reset the error?

Thoughts?

-- 
Miroslav Lichvar


Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)

2017-05-12 Thread Miroslav Lichvar
On Tue, Jul 15, 2014 at 09:02:38PM -0700, John Stultz wrote:
> On 07/08/2014 04:08 AM, Miroslav Lichvar wrote:
> > I spent some time trying to figure out a workaround for the nanosecond
> > rounding, but I didn't find anything that wouldn't complicate the mult
> > adjustment logic and bring back the problems which the direct division
> > approach is supposed to solve.
> >
> > It seems it may be a while before the old vsyscalls are fixed. How
> > about including only the first two patches from this set for now?

> So thanks for the ping here. If you're happy with the first two as an
> initial step, I'd be willing to try to push those in. The only trouble
> is there's a whole lot of timekeeping churn headed for 3.17 that Thomas
> has cooked up. While there isn't likely to be direct conflicts in the
> changes, I get nervous about mixing too many changes in subtle code at once.

I'm sorry for digging up this skeleton. Are we any closer to being
able to remove CONFIG_GENERIC_TIME_VSYSCALL_OLD, which prevented
simplifying the steering logic of the internal clock?

The two patches added in 3.17 helped, but there is still the problem
that it takes too long for the kernel clock to converge to the
internal NTP time when a large error was accumulated, e.g. a large
frequency adjustment was made to correct the initial offset of the
clock.

It seems the error can reach milliseconds and take hours or even days
to be corrected. As the rate at which the error is decreasing is not
constant (due to random clock updates), it doesn't seem to be possible
to synchronize the clock with better stability than few tens or
hundreds of nanoseconds.

With the new PTP KVM clock the problem can be easily observed. Here is
a graph of the offset and frequency as measured by chronyd when
configured to synchronize the guest's clock to the host using the
virtual PHC. In the middle is the point when the NTP error reached
zero. The apparent frequency jumped by about 50 ppb and the offset
improved by an order of magnitude.

https://mlichvar.fedorapeople.org/tmp/kvm_phc.png

I see this with real PHCs and PTP/NTP synchronization too. It's very
confusing when the timekeeping changes so much for no apparent reason.
If we can't remove the old vsyscalls yet, I was thinking maybe a new
flag could be added to adjtimex to report the error, so applications
can at least detect this problem and consider stepping the clock in
order to reset the error?

Thoughts?

-- 
Miroslav Lichvar


Re: [patch 4/5] PTP: add PTP_SYS_OFFSET emulation via cross timestamps infrastructure

2017-01-24 Thread Miroslav Lichvar
On Tue, Jan 24, 2017 at 06:32:43AM +0100, Richard Cochran wrote:
> On Mon, Jan 23, 2017 at 09:06:20PM -0200, Marcelo Tosatti wrote:
> > Can you please describe what problem exists with this scheme?
> 
> This new kernel code exists just because chrony doesn't implement the
> PRECISE ioctl.  Instead of adding new "fake" modes, just teach chrony
> about the better method.

The latest development code of chrony now supports the PRECISE ioctl.

I did some tests with an e1000e NIC (i219) and it seemed the stability
was slightly worse than with the non-PRECISE ioctl, but there was a
400-500ns offset between the two, so it should be much more accurate
(we finally have something that avoids the asymmetry on the PCI-e
bus?). Configuring the refclock with a shorter dpoll should compensate
for the decrease in stability. In any case, there is a "nocrossts"
option to not use the PRECISE ioctl even if it's supported.

-- 
Miroslav Lichvar


Re: [patch 4/5] PTP: add PTP_SYS_OFFSET emulation via cross timestamps infrastructure

2017-01-24 Thread Miroslav Lichvar
On Tue, Jan 24, 2017 at 06:32:43AM +0100, Richard Cochran wrote:
> On Mon, Jan 23, 2017 at 09:06:20PM -0200, Marcelo Tosatti wrote:
> > Can you please describe what problem exists with this scheme?
> 
> This new kernel code exists just because chrony doesn't implement the
> PRECISE ioctl.  Instead of adding new "fake" modes, just teach chrony
> about the better method.

The latest development code of chrony now supports the PRECISE ioctl.

I did some tests with an e1000e NIC (i219) and it seemed the stability
was slightly worse than with the non-PRECISE ioctl, but there was a
400-500ns offset between the two, so it should be much more accurate
(we finally have something that avoids the asymmetry on the PCI-e
bus?). Configuring the refclock with a shorter dpoll should compensate
for the decrease in stability. In any case, there is a "nocrossts"
option to not use the PRECISE ioctl even if it's supported.

-- 
Miroslav Lichvar


  1   2   >