Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-08 Thread Thomas Gleixner
On Mon, 8 Jun 2015, Thomas Gleixner wrote:
> On Mon, 8 Jun 2015, John Stultz wrote:
> > Now, It could be possible to do a lighter weight version of my patch,
> > which just does the adjustment only for the hrtimer_interrupt code
> > (leaving the rest of the read paths alone).
> 
> Yes, that should work. As long as I can keep the cached values in the
> hrtimer cpu bases and the whole thing keeps the clock_was_set_seq
> logic intact.
> 
> If we do not do the conditional version, then on every hrtimer
> interrupt we write THREE cachelines for nothing.
> 
> And if we cannot cache the offsets, then we end up calling into the
> timekeeping code for every timer which is not CLOCK_MONOTONIC based
> and retrieve the offset. That hurts especially on 32bit machines,
> because we need to protect the readout with the timekeeper sequence
> counter.

Below is a patch which just has the required data in the right place
and the changes to ktime_get_update_offsets_now().

It's simpler than your version as:

- there is no requirement to do the add in the first place as we
  know the monotonic time at which the leap second happens.

- we can precalculate the leap adjusted offset and just replace
  the non adjusted offset for the window.

When the whole thing is over you just need to increment the
clock_was_set_seq counter so the next timer interrupt will cache the
normal values again.

Thanks,

tglx

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index e1f5a1136554..ecd193c23676 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -90,6 +90,9 @@ struct timekeeper {
ktime_t offs_tai;
s32 tai_offset;
unsigned intclock_was_set_seq;
+   ktime_t next_leap_ktime;
+   ktime_t offs_real_leap_adjusted;
+
struct timespec64   raw_time;
 
/* The following members are for timekeeping internal use */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 90ed5db67c1d..0de85bf9e331 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1952,15 +1952,21 @@ ktime_t ktime_get_update_offsets_now(unsigned int 
*cwsseq, ktime_t *offs_real,
 
base = tk->tkr_mono.base;
nsecs = timekeeping_get_ns(&tk->tkr_mono);
+   base = ktime_add_ns(base, nsecs);
+
if (*cwsseq != tk->clock_was_set_seq) {
*cwsseq = tk->clock_was_set_seq;
*offs_real = tk->offs_real;
*offs_boot = tk->offs_boot;
*offs_tai = tk->offs_tai;
}
+
+   if (base.tv64 >= tk->next_leap_ktime.tv64)
+   *offs_real = tk->offs_real_leap_adjusted;
+
} while (read_seqcount_retry(&tk_core.seq, seq));
 
-   return ktime_add_ns(base, nsecs);
+   return base;
 }
 
 /**
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-08 Thread Thomas Gleixner
On Mon, 8 Jun 2015, John Stultz wrote:
> On Sat, Jun 6, 2015 at 2:44 AM, Thomas Gleixner  wrote:
> > On Fri, 5 Jun 2015, Peter Zijlstra wrote:
> >
> >> On Fri, 2015-06-05 at 11:04 +0200, Richard Cochran wrote:
> >> > On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
> >> > > That leaves the question; for who is this exact second edge important?
> >> >
> >> > Distributed applications using the UTC time scale.
> >> >
> >> > Many control applications are done with a 1 millisecond period.
> >> > Having the time wrong by a second for 10 or 100 loops is bad news.
> >>
> >> Firstly I would strongly suggest such applications not use UTC because
> >> of this, I think TAI was invented for just this reason.
> >>
> >> Secondly, how would John's patches help with this? Usespace loops
> >> reading time would be using the VDSO and would still not get the right
> >> time, and timers would be subject to the same IRQ latency that a hrtimer
> >> based leap second insert would, and would still very much not be in-sync
> >> across the cluster.
> >
> > So the only thing which is fixed are in kernel users and therefor
> > hrtimers.
> 
> Well, for vdso systems, hrtimers and adjtimex (which is the only
> interface that provides enough information to understand where in a
> leapsecond you actually are).
> 
> And again, vdsos are fixable, but I hesitated due to my concerns about
> the extra performance overhead, the smaller benefit it provides
> relative to not having timers expiring early.

Right, and I'm concerned about the extra overhead of your patch. Just
look at the cache layout.

Current:

struct timekeeper {
   struct tk_read_basetkr_mono; /* 056 */
   struct tk_read_basetkr_raw;  /*5656 */
   /* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
   u64xtime_sec;/*   112 8 */
   long unsigned int  ktime_sec;/*   120 8 */
   /* --- cacheline 2 boundary (128 bytes) --- */
   struct timespecwall_to_monotonic;/*   12816 */
   ktime_toffs_real;/*   144 8 */
   ktime_toffs_boot;/*   152 8 */
   ktime_toffs_tai; /*   160 8 */
   s32tai_offset;   /*   168 4 */
   unsigned int   clock_was_set_seq;/*   172 4 */
   struct timespecraw_time; /*   17616 */
   /* --- cacheline 3 boundary (192 bytes) --- */
   cycle_tcycle_interval;   /*   192 8 */
   u64xtime_interval;   /*   200 8 */
   s64xtime_remainder;  /*   208 8 */
   u32raw_interval; /*   216 4 */

   /* XXX 4 bytes hole, try to pack */

   u64ntp_tick; /*   224 8 */
   s64ntp_error;/*   232 8 */
   u32ntp_error_shift;  /*   240 4 */
   u32ntp_err_mult; /*   244 4 */
};

Four cachelines where everything which is considered hotpath is in the
first two cache lines.

With your change this becomes:

struct timekeeper {
   struct tk_read_basetkr_mono; /* 056 */
   struct tk_read_basetkr_raw;  /*5656 */
   /* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
   u64xtime_sec;/*   112 8 */
   long unsigned int  ktime_sec;/*   120 8 */
   /* --- cacheline 2 boundary (128 bytes) --- */
   struct timespecwall_to_monotonic;/*   12816 */
   ktime_toffs_real;/*   144 8 */
   ktime_toffs_boot;/*   152 8 */
   ktime_toffs_tai; /*   160 8 */
   s32tai_offset;   /*   168 4 */
   unsigned int   clock_was_set_seq;/*   172 4 */
   struct timespecraw_time; /*   17616 */
   /* --- cacheline 3 boundary (192 bytes) --- */
   cycle_tcycle_interval;   /*   192 8 */
   u64xtime_interval;   /*   200 8 */
   s64xtime_remainder;  /*   208 8 */
   u32raw_interval; /*   216 4 */

   /* XXX 4 bytes hole, try to pack */

   u64ntp_tick; /*   224 8 */
   s64ntp_error;/*   232 8 */
   u32ntp_error_shift;  /*   240 4 */
   u32   

Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-08 Thread John Stultz
On Sat, Jun 6, 2015 at 2:44 AM, Thomas Gleixner  wrote:
> On Fri, 5 Jun 2015, Peter Zijlstra wrote:
>
>> On Fri, 2015-06-05 at 11:04 +0200, Richard Cochran wrote:
>> > On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
>> > > That leaves the question; for who is this exact second edge important?
>> >
>> > Distributed applications using the UTC time scale.
>> >
>> > Many control applications are done with a 1 millisecond period.
>> > Having the time wrong by a second for 10 or 100 loops is bad news.
>>
>> Firstly I would strongly suggest such applications not use UTC because
>> of this, I think TAI was invented for just this reason.
>>
>> Secondly, how would John's patches help with this? Usespace loops
>> reading time would be using the VDSO and would still not get the right
>> time, and timers would be subject to the same IRQ latency that a hrtimer
>> based leap second insert would, and would still very much not be in-sync
>> across the cluster.
>
> So the only thing which is fixed are in kernel users and therefor
> hrtimers.

Well, for vdso systems, hrtimers and adjtimex (which is the only
interface that provides enough information to understand where in a
leapsecond you actually are).

And again, vdsos are fixable, but I hesitated due to my concerns about
the extra performance overhead, the smaller benefit it provides
relative to not having timers expiring early.

> That means the whole leap mess added into the gettime fast path is
> just constant overhead for that corner case.
>
> We can be smarter than that and just block hrtimer delivery for clock
> realtime timers across the leap edge. There should be ways to do that
> non intrusive if we think hard enough about it.

This approach seems like it would add more work to the timer-add
function (to check leapstate and adjust the expiration), which might
be a heavier use case (we adjust for each timer) then the similar
logic done in the update_base_offsets_now() at hrtimer_interrupt time
(adjust for each hrtimer_interrupt).

Now, It could be possible to do a lighter weight version of my patch,
which just does the adjustment only for the hrtimer_interrupt code
(leaving the rest of the read paths alone).  If that is something
you'd prefer.  I'll have to think a bit to ensure the internal
inconsistency isn't otherwise problematic.

Though I suspect fixing adjtimex is worth it as well, since its really
the only interface that can provide a sane view of the leapsecond, and
isn't normally considered a hot path.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-06 Thread Thomas Gleixner
On Fri, 5 Jun 2015, Peter Zijlstra wrote:

> On Fri, 2015-06-05 at 11:04 +0200, Richard Cochran wrote:
> > On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
> > > That leaves the question; for who is this exact second edge important?
> > 
> > Distributed applications using the UTC time scale.
> > 
> > Many control applications are done with a 1 millisecond period.
> > Having the time wrong by a second for 10 or 100 loops is bad news.
> 
> Firstly I would strongly suggest such applications not use UTC because
> of this, I think TAI was invented for just this reason.
> 
> Secondly, how would John's patches help with this? Usespace loops
> reading time would be using the VDSO and would still not get the right
> time, and timers would be subject to the same IRQ latency that a hrtimer
> based leap second insert would, and would still very much not be in-sync
> across the cluster.

So the only thing which is fixed are in kernel users and therefor
hrtimers. 

That means the whole leap mess added into the gettime fast path is
just constant overhead for that corner case.

We can be smarter than that and just block hrtimer delivery for clock
realtime timers across the leap edge. There should be ways to do that
non intrusive if we think hard enough about it.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-05 Thread John Stultz
On Fri, Jun 5, 2015 at 7:12 AM, Richard Cochran
 wrote:
> On Fri, Jun 05, 2015 at 11:10:08AM +0200, Peter Zijlstra wrote:
>> Firstly I would strongly suggest such applications not use UTC because
>> of this, I think TAI was invented for just this reason.
>
> So I wonder whether the bug in the original post affects TAI timers as
> well...

No, I don't believe so. The TAI timeline doesn't have a
discontinuity(ie: each second is properly represented over a
leapsecond), so it wouldn't have the same issue with the specifics of
when that discontinuity is applied.   Our TAI adjustment is done
atomically with the leap adjustment, so late or not it shouldn't have
problematic behavior.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-05 Thread John Stultz
On Fri, Jun 5, 2015 at 12:29 AM, Peter Zijlstra  wrote:
> On Thu, Jun 04, 2015 at 05:08:11PM -0700, John Stultz wrote:
>> I'm not sure how this follows. Leap smearing is a different behavior
>> and I'd like to support it (as a separate clockid) as well, but adding
>> that support doesn't change that the existing behavior applying the
>> leapsecond to UTC/CLOCK_REALTIME via a timer results in behavior that
>> isn't strictly correct.
>
> So the big problem of course is that your patches do not handle the VDSO
> time read, and that is the biggest source of userspace time.
>
> So userspace will still see incorrect time, only in-kernel users (timers
> being your prime example) get the leap second at the 'right' place.

So yes. And as I mentioned in the patch, this is somewhat of a
tradeoff, since adding extra conditionals in the highly optimized vdso
gettime is probably more controversial. Further, since only adjtimex()
provides both the time and the leap-state, its the only interface
which can differentiate between 23:59:59 and 23:59:60, I'm not as sure
leap-insertion on exactly the leap edge is as critical for users not
using adjtimex.

Fixing the issue in the vdso would likely be a following discussion
(which I think is harder to weigh). But this patch ensures the
kernel's internal state is correct, so we don't fire timers early.

> Also note that your argument that timers will now get the correct time
> is subject to the very same timer interrupt jitter as driving the leap
> second stuff from an hrtimer itself -- they're all timers.

So. The key here is that timers set for *after* the leapsecond, will
not fire before.  That's the problem. Timers aren't supposed to fire
early, and that's the invariant we're currently not following. So if
you set a timer for midnight after the leapsecond, it may expire ~a
second early.

Now, since there's no real representation of the leap-second, you
can't set timers for that second, at least until the leap-second has
been applied.

For timers that are set just prior to the leap, its possible they
could think they were expired early, since they may be delayed and not
run until after the leapsecond is applied. So simple gettime calls
might see the time as "before" the expiration time. However, if they
use adjtimex they can see they're in the 23:59:59 w/ TIME_OOP, and
things were correct.

> That leaves the question; for who is this exact second edge important?

I don't have specific cases in mind.

Again, I argued against this sort of change when Richard suggested it
earlier. The rational that its a discontinuity, and it shouldn't
matter if that discontinuity is slightly late, since most folks aren't
using adjtimex() and carefully noting the time state flags to see if a
leap is in progress.

However, when Prarit brought up the early timer expiration issue, it
is harder for me to rationalize ignoring. Timers should not fire
early.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-05 Thread Richard Cochran
On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
> That leaves the question; for who is this exact second edge important?

Another one: data loggers for scientific applications using UTC time
stamps.

Thanks,
Richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-05 Thread Richard Cochran
On Fri, Jun 05, 2015 at 11:10:08AM +0200, Peter Zijlstra wrote:
> Firstly I would strongly suggest such applications not use UTC because
> of this, I think TAI was invented for just this reason.

So I wonder whether the bug in the original post affects TAI timers as
well...

> Secondly, how would John's patches help with this? Usespace loops
> reading time would be using the VDSO and would still not get the right
> time, and timers would be subject to the same IRQ latency that a hrtimer
> based leap second insert would, and would still very much not be in-sync
> across the cluster.

But we have a tick based insertion.  (IIRC, it used to be hrtimer
based, but that was buggy, too).

Thanks,
Richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-05 Thread Thomas Gleixner
On Fri, 5 Jun 2015, Prarit Bhargava wrote:

> 
> 
> On 06/05/2015 05:04 AM, Richard Cochran wrote:
> > On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
> >> That leaves the question; for who is this exact second edge important?
> > 
> > Distributed applications using the UTC time scale.
> > 
> > Many control applications are done with a 1 millisecond period.
> > Having the time wrong by a second for 10 or 100 loops is bad news.

Control applications with a 1ms period running on CLOCK_REALTIME are
broken by definition.

> > 
> 
> Anything starting @ UTC midnight (I think that would be Beijing, China?) ...
> being one second off is not good.
> 
> P.
> 
> > Thanks,
> > Richard
> > 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-05 Thread Prarit Bhargava


On 06/05/2015 05:04 AM, Richard Cochran wrote:
> On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
>> That leaves the question; for who is this exact second edge important?
> 
> Distributed applications using the UTC time scale.
> 
> Many control applications are done with a 1 millisecond period.
> Having the time wrong by a second for 10 or 100 loops is bad news.
> 

Anything starting @ UTC midnight (I think that would be Beijing, China?) ...
being one second off is not good.

P.

> Thanks,
> Richard
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-05 Thread Peter Zijlstra
On Fri, 2015-06-05 at 11:04 +0200, Richard Cochran wrote:
> On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
> > That leaves the question; for who is this exact second edge important?
> 
> Distributed applications using the UTC time scale.
> 
> Many control applications are done with a 1 millisecond period.
> Having the time wrong by a second for 10 or 100 loops is bad news.

Firstly I would strongly suggest such applications not use UTC because
of this, I think TAI was invented for just this reason.

Secondly, how would John's patches help with this? Usespace loops
reading time would be using the VDSO and would still not get the right
time, and timers would be subject to the same IRQ latency that a hrtimer
based leap second insert would, and would still very much not be in-sync
across the cluster.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-05 Thread Richard Cochran
On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
> That leaves the question; for who is this exact second edge important?

Distributed applications using the UTC time scale.

Many control applications are done with a 1 millisecond period.
Having the time wrong by a second for 10 or 100 loops is bad news.

Thanks,
Richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-05 Thread Peter Zijlstra
On Thu, Jun 04, 2015 at 05:08:11PM -0700, John Stultz wrote:
> I'm not sure how this follows. Leap smearing is a different behavior
> and I'd like to support it (as a separate clockid) as well, but adding
> that support doesn't change that the existing behavior applying the
> leapsecond to UTC/CLOCK_REALTIME via a timer results in behavior that
> isn't strictly correct.

So the big problem of course is that your patches do not handle the VDSO
time read, and that is the biggest source of userspace time.

So userspace will still see incorrect time, only in-kernel users (timers
being your prime example) get the leap second at the 'right' place.

Also note that your argument that timers will now get the correct time
is subject to the very same timer interrupt jitter as driving the leap
second stuff from an hrtimer itself -- they're all timers.

That leaves the question; for who is this exact second edge important?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-04 Thread John Stultz
On Wed, Jun 3, 2015 at 11:48 PM, Ingo Molnar  wrote:
> * John Stultz  wrote:
>>
>> But more importantly, this change to the read path prevents timers that may 
>> be
>> expired before update_wall_time timer runs (most likely on other cpus) from
>> being expired early. Since the time read that is used by the hrtimer 
>> expiration
>> logic is adjusted properly right on that edge.
>
> But with leap second smearing we'd have the same benefits and some more.

I'm not sure how this follows. Leap smearing is a different behavior
and I'd like to support it (as a separate clockid) as well, but adding
that support doesn't change that the existing behavior applying the
leapsecond to UTC/CLOCK_REALTIME via a timer results in behavior that
isn't strictly correct.

>> > Especially as second smearing appears to be the way superior future method 
>> > of
>> > handling leap seconds.
>>
>> So here the problem is it depends on the user. For probably most users, who
>> really don't care, the leap-smear is ideal behavior for CLOCK_REALTIME (I 
>> think
>> leap-smears causing any change to other clockids would be surprising). 
>> However,
>> there are some users who expect posix UTC leapsecond behavior. Either because
>> they're positioning telescopes doing things that do depend on strict solar 
>> time,
>> or because they are required (in some cases by law) to use UTC.
>
> That usecase is perfectly OK: they can use the old leap second logic.

Which again, the current leap second logic has slight behavioral
issues the patch I'm proposing is trying to fix.


> What I argue is that we should not add significant complexity to logic that is
> fragile already as-is, and which runs at most only once per year.
>
>> I don't think we can just abandon/break those users, for leap-smearing. So I
>> don't know if we can get away from that complexity.
>
> That's a false dichotomy - I'm not suggesting to 'abandon' them.
>
> I'm suggesting one of these options:
>
>   - Keep the current leap second code as-is, because it's proven. Concentrate 
> on
> leap second smearing instead that is superior and which might emerge as a
> future standard.

I understand that you're saying "focus on the future", which is good
advice.  But if the objection is complexity, adding leap-smearing
isn't going to reduce complexity.

I'd also argue the current leap second code isn't quite "proven". Or
at least it didn't prove itself well last time around. Those major
issues have been addressed, but the strict correctness issue where the
leap second is done at timer time, instead of the second edge is not
currently addressed.

>   - or make leap second insertion much more obvious and easier to verify 
> (i.e. not
> a +100 LOC delta!)

I can work on simplifying and compressing some of the state to try to
reduce the line count and make it easier to verify. But again,
applying the leap-second at the exact second edge means we have to
mimic the state transition in the various read paths, which is going
to still be a non-trivial patch.


>   - or make leap second handling a part of some other existing facility that 
> is
> used much more frequently: for example have you considered making it a
> special, kernel-internal case of settimeofday()? If settimeofday was
> implemented in a fashion to set the time at a given 'time edge', then
> thousands of systems would test that facility day in and out. Leap second
> insertion would simply be a special settimeofday() call that sets the time
> back by one second at midnight June 30 or so. Normal settimeofday() would 
> be a
> call that sets the time back (or forward) at the next seconds edge right 
> now -
> but it would fundamentally use the same facility.
>
> and yes, this variant would necessarily complicate settimeofday(), but 
> that's
> OK, as it's used so frequently so we have adequate testing of it.

I will have to think more about this, but initially it doesn't seem to
make much sense to me (again, I still worry that the specifics of the
issue aren't clear). settimeofday() calls happen immediately, and
under the timekeeping write lock. The issues here with leap seconds is
that we are supposed to make a second repeat exactly on the second
boundary. Since nothing asynchronous can be reliably done at a
specific time, the only way to have correct behavior is to handle the
leap adjustment in the read path, basically checking if we have
crossed that edge, and if so, setting the time we return back by one
second. For the adjtime case its more complicated because we also have
to return other state data (leap-in-progress TIME_OOP flag, or after
the leapsecond is over, the TIME_WAIT flag) which we have to adjust on
that edge as well.

The larger point of finding a way to ensure the code paths are tested
well in an everyday fashion is a good one. I'm just not sure how to
make that happen.

(One could argue having ntp apply the leapsecond from userspace via
settimeofday() rather then 

Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-03 Thread Ingo Molnar

* John Stultz  wrote:

> > [...]
> >
> > but the long term trend still dominates. Look at this graph of measurements 
> > of the
> > Earth's rotation:
> >
> >   http://en.wikipedia.org/wiki/File:Deviation_of_day_length_from_SI_day.svg
> >
> > See how the mean (the green line) was always above zero in the measured 
> > past. The
> > monotonically increasing nature comes from that.
> >
> > and given how many problems we had with leap second insertion, on millions 
> > of
> > installed systems, guess the likelihood of there being a leap second 
> > deleted? How
> > many OSs that can do leap second insertion are unable to do leap second 
> > deletion?
> >
> > Also note that leap second deletion means a jump in time backward. Daylight 
> > saving
> > time is already causing problems with that.
> 
> Err.. Other way around. Leap-second deletion is a jump in time forward 
> (jumping 
> from 23:59:58 to 00:00:00, skipping 23:59:59). Which is simpler to deal with. 
> [...]

Indeed!

Still my point remains: many OSs that can handle leap second insertion don't 
handle leap second deletion, so what are the chances that even a temporary blip 
in 
earth's rotation (which will be offset by the long term trend within a year or 
so 
after the event) will cause a conservative international standards body to 
declare 
an unprecedented leap second deletion? Fairly low I'd say.

> [...] And luckily (at least for us) daylight savings is done in userspace (as 
> UTC, including leapseconds, ideally would be from the kernel providing TAI 
> time).
> 
> But yes, I agree that the leap deletion logic is likely to never run outside 
> of 
> testing.

Which is a red flag.

> > [...]
> >
> > So why make the code more fragile, more complex, just to solve a scenario 
> > that 
> > cannot really be done perfectly?
> 
> So here I worry I didn't communicate clearly enough what the patch does. :(
> 
> Its not about making interrupts more accurate around the leapsecond, its 
> about 
> applying the leapsecond transition in the read-path precisely at the 
> leapsecond 
> edge (rather then a short while later when the timer fires and we update the 
> timekeeping structures).
> 
> But more importantly, this change to the read path prevents timers that may 
> be 
> expired before update_wall_time timer runs (most likely on other cpus) from 
> being expired early. Since the time read that is used by the hrtimer 
> expiration 
> logic is adjusted properly right on that edge.

But with leap second smearing we'd have the same benefits and some more.

> > Especially as second smearing appears to be the way superior future method 
> > of 
> > handling leap seconds.
> 
> So here the problem is it depends on the user. For probably most users, who 
> really don't care, the leap-smear is ideal behavior for CLOCK_REALTIME (I 
> think 
> leap-smears causing any change to other clockids would be surprising). 
> However, 
> there are some users who expect posix UTC leapsecond behavior. Either because 
> they're positioning telescopes doing things that do depend on strict solar 
> time, 
> or because they are required (in some cases by law) to use UTC.

That usecase is perfectly OK: they can use the old leap second logic.

What I argue is that we should not add significant complexity to logic that is 
fragile already as-is, and which runs at most only once per year.

> I don't think we can just abandon/break those users, for leap-smearing. So I 
> don't know if we can get away from that complexity.

That's a false dichotomy - I'm not suggesting to 'abandon' them.

I'm suggesting one of these options:

  - Keep the current leap second code as-is, because it's proven. Concentrate 
on 
leap second smearing instead that is superior and which might emerge as a 
future standard.

  - or make leap second insertion much more obvious and easier to verify (i.e. 
not
a +100 LOC delta!)

  - or make leap second handling a part of some other existing facility that is
used much more frequently: for example have you considered making it a
special, kernel-internal case of settimeofday()? If settimeofday was
implemented in a fashion to set the time at a given 'time edge', then
thousands of systems would test that facility day in and out. Leap second
insertion would simply be a special settimeofday() call that sets the time
back by one second at midnight June 30 or so. Normal settimeofday() would 
be a
call that sets the time back (or forward) at the next seconds edge right 
now -
but it would fundamentally use the same facility.

and yes, this variant would necessarily complicate settimeofday(), but 
that's
OK, as it's used so frequently so we have adequate testing of it.

A third facility would effectiely be merged with this as well: when NTP
adjusts large offsets (when running with -g, etc.) it will use 
settimeofday(),
right?

I don't think we've exhausted all of these options.

Thanks,

Ingo
--
To unsubs

Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-03 Thread John Stultz
On Wed, Jun 3, 2015 at 2:04 AM, Ingo Molnar  wrote:
>
> * John Stultz  wrote:
>
>> > Instead of having these super rare special events, how about implementing 
>> > leap
>> > second smearing instead? That's far less radical and a lot easier to test 
>> > as
>> > well, as it's a continuous mechanism. It will also confuse user-space a lot
>> > less, because there are no sudden time jumps.
>>
>> So yea. Leap smearing/slewing is an attractive solution. The first issue is 
>> that
>> there's no standard yet for the range of time that the slew occurs (or even 
>> if
>> the slew is linear or a curve). The second is I don't think we can actually 
>> get
>> away from supporting UTC w/ leap, as applications may depend on precision. 
>> Also
>> things like NTP sync w/ mixed systems would be problematic, as NTPd and 
>> others
>> need to become savvy of which mode they are working with.
>
> Supporting it minimally is fine - supporting it with clearly unmaintainable
> complexity is not.
>
> So as long as we offer good smearing of the leap second (with a configurable
> parameter for how long the period should be), people in need of better leap
> second handling can take that.
>
>> The leap smearing method of only doing it in private networks and 
>> controlling it
>> by the NTP server is becoming more widespread, but it has its own problems,
>> since it doesn't handle CLOCK_TAI properly, and since CLOCK_REALTIME isn't 
>> yet
>> frequency steerable separately from the other clockids, this method ends up
>> slowing down CLOCK_TAI and CLOCK_MONOTONIC as well.
>
> All real time clock derived clocks should smear in sync as well.

Eeerrr.. So CLOCK_TAI is UTC without leapseconds, to smear TAI would
be wrong. Similarly, CLOCK_MONOTONIC/BOOTTIME probably shouldn't be
smeared either (but those are defined less strictly).

>> I'd like to try to get something working in the kernel so we could support
>> CLOCK_UTC and CLOCK_UTCSLS (smeared-leap-second) clockids, then allow
>> applications that care to migrate explicitly to the one they care about.
>> Possibly allowing CLOCK_REALTIME to be compile-time directed to CLOCK_UTCSLS 
>> so
>> that most applications that don't care can just ignore it.  But finding time 
>> to
>> do this has been hard (if anyone is interested in working on it, I'd be 
>> excited
>> to hear!).
>
> There should definitely be a Kconfig option to just map all relevant clocks to
> smeared seconds. Hopefully this ends up being the standard in a few years and 
> we
> can pin down the exact parameters as well.
>
> Having separate clockids for mixed uses would be fine as well. Maybe.
>
>> But if you think this patch is complicated, creating a new separately steered
>> clockid is not going to be trvial (as there will be lots of ugly edge cases,
>> like what if a leap second is cancelled mid-way through the slewing 
>> adjustment,
>> etc).
>
> Well, I think the main advantage of leap second smearing is that it's not a
> binary, but a continuous interface, and so it's way easier to test than 
> 'sudden'
> leap second insertions.
>
> In fact we could essentially implement leap second smearing via the usual 
> adjtimex
> mechanisms: as far as the time code is concerned it does not matter why a 
> gradual
> adjustment occurs, only the rate of change and the method of convergence is an
> open parameter.
>
> In fact I'd suggest we implement even original leap seconds by doing a 
> high-rate
> 'smearing' in the final X minutes leading up to the leap second, where 'X' 
> could
> be 1 by default. This way we could eliminate leap seconds as a separate 
> logical
> entity mostly.
>
> This should be far more gentle to applications as well than sudden jumps, and
> timers will just work fine as well.

Well, again the problem with high-rate smearing as you describe is
that it would affect CLOCK_MONOTONIC as well, which could cause
periodic timers used for sampling, etc (imagine recording audio, etc)
to slow as well, possibly causing application problems. This is why
the smeared leap-seconds are usually done across a day at a slow rate.

To allow for CLOCK_REALTIME to be frequency adjusted separately from
CLOCK_MONOTONIC/CLOCK_TAI, which would would have the least unwanted
side-effects, we're probably going to have to manage it separately
(like we do w/ MONOTONIC_RAW time). But again, this creates a lot more
complexity.


>> > Secondly, why is there a directional flag? I thought leap seconds can only 
>> > be
>> > inserted.
>>
>> A leap delete isn't likely to occur, but its supported by the adjtimex
>> interface. And given the irregularity of the earths rotation, I'm not sure 
>> I'd
>> rule it out completely.
>
> Well, the long term trend is clear and unambiguous: the rotation of Earth is
> slowing down (the main component of which is losing angular momentum to the 
> Moon),
> hence the days are getting longer and we have to insert a leap second every 
> second
> year or so.
>
> The short term trends (discounting massive asteorid s

Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-03 Thread Ingo Molnar

* John Stultz  wrote:

> > Instead of having these super rare special events, how about implementing 
> > leap 
> > second smearing instead? That's far less radical and a lot easier to test 
> > as 
> > well, as it's a continuous mechanism. It will also confuse user-space a lot 
> > less, because there are no sudden time jumps.
> 
> So yea. Leap smearing/slewing is an attractive solution. The first issue is 
> that 
> there's no standard yet for the range of time that the slew occurs (or even 
> if 
> the slew is linear or a curve). The second is I don't think we can actually 
> get 
> away from supporting UTC w/ leap, as applications may depend on precision. 
> Also 
> things like NTP sync w/ mixed systems would be problematic, as NTPd and 
> others 
> need to become savvy of which mode they are working with.

Supporting it minimally is fine - supporting it with clearly unmaintainable 
complexity is not.

So as long as we offer good smearing of the leap second (with a configurable 
parameter for how long the period should be), people in need of better leap
second handling can take that.

> The leap smearing method of only doing it in private networks and controlling 
> it 
> by the NTP server is becoming more widespread, but it has its own problems, 
> since it doesn't handle CLOCK_TAI properly, and since CLOCK_REALTIME isn't 
> yet 
> frequency steerable separately from the other clockids, this method ends up 
> slowing down CLOCK_TAI and CLOCK_MONOTONIC as well.

All real time clock derived clocks should smear in sync as well.

> I'd like to try to get something working in the kernel so we could support 
> CLOCK_UTC and CLOCK_UTCSLS (smeared-leap-second) clockids, then allow 
> applications that care to migrate explicitly to the one they care about. 
> Possibly allowing CLOCK_REALTIME to be compile-time directed to CLOCK_UTCSLS 
> so 
> that most applications that don't care can just ignore it.  But finding time 
> to 
> do this has been hard (if anyone is interested in working on it, I'd be 
> excited 
> to hear!).

There should definitely be a Kconfig option to just map all relevant clocks to 
smeared seconds. Hopefully this ends up being the standard in a few years and 
we 
can pin down the exact parameters as well.

Having separate clockids for mixed uses would be fine as well. Maybe.

> But if you think this patch is complicated, creating a new separately steered 
> clockid is not going to be trvial (as there will be lots of ugly edge cases, 
> like what if a leap second is cancelled mid-way through the slewing 
> adjustment, 
> etc).

Well, I think the main advantage of leap second smearing is that it's not a 
binary, but a continuous interface, and so it's way easier to test than 
'sudden' 
leap second insertions.

In fact we could essentially implement leap second smearing via the usual 
adjtimex 
mechanisms: as far as the time code is concerned it does not matter why a 
gradual 
adjustment occurs, only the rate of change and the method of convergence is an 
open parameter.

In fact I'd suggest we implement even original leap seconds by doing a 
high-rate 
'smearing' in the final X minutes leading up to the leap second, where 'X' 
could 
be 1 by default. This way we could eliminate leap seconds as a separate logical 
entity mostly.

This should be far more gentle to applications as well than sudden jumps, and 
timers will just work fine as well.

> > Secondly, why is there a directional flag? I thought leap seconds can only 
> > be 
> > inserted.
> 
> A leap delete isn't likely to occur, but its supported by the adjtimex 
> interface. And given the irregularity of the earths rotation, I'm not sure 
> I'd 
> rule it out completely.

Well, the long term trend is clear and unambiguous: the rotation of Earth is 
slowing down (the main component of which is losing angular momentum to the 
Moon), 
hence the days are getting longer and we have to insert a leap second every 
second 
year or so.

The short term trends (discounting massive asteorid strikes, at which point 
leap 
seconds will be the least of our problems) are somewhat chaotic:

 - glaciation (which shifts water mass assymetrically)

 - global warming (one component of which is thermal expansion, which expands 
   oceans assymetrically and shifts water mass - the other component is changing
   climatology: different oceanic currents, etc. - which all shift mass around)

 - tectonics (slow rearrangement of mass plus earthquakes).

 - even slower scale rearrangement of mass (mantle plumes, etc.)

but the long term trend still dominates. Look at this graph of measurements of 
the 
Earth's rotation:

  http://en.wikipedia.org/wiki/File:Deviation_of_day_length_from_SI_day.svg

See how the mean (the green line) was always above zero in the measured past. 
The 
monotonically increasing nature comes from that.

and given how many problems we had with leap second insertion, on millions of 
installed systems, guess the likelihood of there being a le

Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-02 Thread John Stultz
On Tue, Jun 2, 2015 at 2:01 AM, Ingo Molnar  wrote:
>
> * John Stultz  wrote:
>
>> Currently, leapsecond adjustments are done at tick time.
>>
>> As a result, the leapsecond was applied at the first timer
>> tick *after* the leapsecond (~1-10ms late depending on HZ),
>> rather then exactly on the second edge.
>>
>> This was in part historical from back when we were always
>> tick based, but correcting this since has been avoided since
>> it adds extra conditional checks in the gettime fastpath,
>> which has performance overhead.
>>
>> However, it was recently pointed out that ABS_TIME
>> CLOCK_REALTIME timers set for right after the leapsecond
>> could fire a second early, since some timers may be expired
>> before we trigger the timekeeping timer, which then applies
>> the leapsecond.
>>
>> This isn't quite as bad as it sounds, since behaviorally
>> it is similar to what is possible w/ ntpd made leapsecond
>> adjustments done w/o using the kernel discipline. Where
>> due to latencies, timers may fire just prior to the
>> settimeofday call. (Also, one should note that all
>> applications using CLOCK_REALTIME timers should always be
>> careful, since they are prone to quirks from settimeofday()
>> disturbances.)
>>
>> However, the purpose of having the kernel do the leap adjustment
>> is to avoid such latencies, so I think this is worth fixing.
>>
>> So in order to properly keep those timers from firing a second
>> early, this patch modifies the gettime accessors to do the
>> extra checks to apply the leapsecond adjustment on the second
>> edge. This prevents the timer core from expiring timers too
>> early.
>>
>> This patch does not handle VDSO time implementations, so
>> userspace using vdso gettime will still see the leapsecond
>> applied at the first timer tick after the leapsecond.
>> This is a bit of a tradeoff, since the performance impact
>> would be greatest to VDSO implementations, and since vdso
>> interfaces don't provide the TIME_OOP flag, one can't
>> distinquish the leapsecond from a time discontinuity (such
>> as settimeofday), so correcting the VDSO may not be as
>> important there.
>>
>> Apologies to Richard Cochran, who pushed for such a change
>> years ago, which I resisted due to the concerns about the
>> performance overhead.
>>
>> While I suspect this isn't extremely critical, folks who
>> care about strict leap-second correctness will likely
>> want to watch this, and it will likely be a -stable candidate.
>>
>> Cc: Prarit Bhargava 
>> Cc: Daniel Bristot de Oliveira 
>> Cc: Richard Cochran 
>> Cc: Jan Kara 
>> Cc: Jiri Bohac 
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: Shuah Khan 
>> Originally-suggested-by: Richard Cochran 
>> Reported-by: Daniel Bristot de Oliveira 
>> Reported-by: Prarit Bhargava 
>> Signed-off-by: John Stultz 
>> ---
>>  include/linux/time64.h  |  1 +
>>  include/linux/timekeeper_internal.h |  7 +++
>>  kernel/time/ntp.c   | 73 +---
>>  kernel/time/ntp_internal.h  |  1 +
>>  kernel/time/timekeeping.c   | 97 
>> -
>>  5 files changed, 159 insertions(+), 20 deletions(-)
>
> So I don't like the complexity of this at all: why do we add over 100 lines of
> code for something that occurs (literally) once in a blue moon?

So yea. I very much felt the same way before the early timer
expiration issue came up.


> ... and for that reason I'm not surprised at all that it broke in non-obvious
> ways.
>
> Instead of having these super rare special events, how about implementing leap
> second smearing instead? That's far less radical and a lot easier to test as 
> well,
> as it's a continuous mechanism. It will also confuse user-space a lot less,
> because there are no sudden time jumps.

So yea. Leap smearing/slewing is an attractive solution. The first
issue is that there's no standard yet for the range of time that the
slew occurs (or even if the slew is linear or a curve). The second is
I don't think we can actually get away from supporting UTC w/ leap, as
applications may depend on precision. Also things like NTP sync w/
mixed systems would be problematic, as NTPd and others need to become
savvy of which mode they are working with.

The  leap smearing method of only doing it in private networks and
controlling it by the NTP server is becoming more widespread, but it
has its own problems, since it doesn't handle CLOCK_TAI properly, and
since CLOCK_REALTIME isn't yet frequency steerable separately from the
other clockids, this method ends up slowing down CLOCK_TAI and
CLOCK_MONOTONIC as well.

I'd like to try to get something working in the kernel so we could
support CLOCK_UTC and CLOCK_UTCSLS (smeared-leap-second) clockids,
then allow applications that care to migrate explicitly to the one
they care about. Possibly allowing CLOCK_REALTIME to be compile-time
directed to CLOCK_UTCSLS so that most applications that don't care can
just ignore it.  But finding time to 

Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-02 Thread John Stultz
On Tue, Jun 2, 2015 at 2:21 AM, Peter Zijlstra  wrote:
> On Tue, Jun 02, 2015 at 11:01:54AM +0200, Ingo Molnar wrote:
>> > Currently, leapsecond adjustments are done at tick time.
>> >
>> > As a result, the leapsecond was applied at the first timer
>> > tick *after* the leapsecond (~1-10ms late depending on HZ),
>> > rather then exactly on the second edge.
>
> So why not program a hrtimer to expire at the exact right time? Then
> even the VDSO gets its time 'right' and you avoid touching all the
> fast paths.

Well, the hrtimer won't always expire at the exact right time (due to
irq latency, possible clockevent drift/error, etc) , so that would
close the window a bit, but wouldn't eliminate the issue.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-02 Thread Peter Zijlstra
On Tue, Jun 02, 2015 at 11:01:54AM +0200, Ingo Molnar wrote:
> > Currently, leapsecond adjustments are done at tick time.
> > 
> > As a result, the leapsecond was applied at the first timer
> > tick *after* the leapsecond (~1-10ms late depending on HZ),
> > rather then exactly on the second edge.

So why not program a hrtimer to expire at the exact right time? Then
even the VDSO gets its time 'right' and you avoid touching all the
fast paths.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-06-02 Thread Ingo Molnar

* John Stultz  wrote:

> Currently, leapsecond adjustments are done at tick time.
> 
> As a result, the leapsecond was applied at the first timer
> tick *after* the leapsecond (~1-10ms late depending on HZ),
> rather then exactly on the second edge.
> 
> This was in part historical from back when we were always
> tick based, but correcting this since has been avoided since
> it adds extra conditional checks in the gettime fastpath,
> which has performance overhead.
> 
> However, it was recently pointed out that ABS_TIME
> CLOCK_REALTIME timers set for right after the leapsecond
> could fire a second early, since some timers may be expired
> before we trigger the timekeeping timer, which then applies
> the leapsecond.
> 
> This isn't quite as bad as it sounds, since behaviorally
> it is similar to what is possible w/ ntpd made leapsecond
> adjustments done w/o using the kernel discipline. Where
> due to latencies, timers may fire just prior to the
> settimeofday call. (Also, one should note that all
> applications using CLOCK_REALTIME timers should always be
> careful, since they are prone to quirks from settimeofday()
> disturbances.)
> 
> However, the purpose of having the kernel do the leap adjustment
> is to avoid such latencies, so I think this is worth fixing.
> 
> So in order to properly keep those timers from firing a second
> early, this patch modifies the gettime accessors to do the
> extra checks to apply the leapsecond adjustment on the second
> edge. This prevents the timer core from expiring timers too
> early.
> 
> This patch does not handle VDSO time implementations, so
> userspace using vdso gettime will still see the leapsecond
> applied at the first timer tick after the leapsecond.
> This is a bit of a tradeoff, since the performance impact
> would be greatest to VDSO implementations, and since vdso
> interfaces don't provide the TIME_OOP flag, one can't
> distinquish the leapsecond from a time discontinuity (such
> as settimeofday), so correcting the VDSO may not be as
> important there.
> 
> Apologies to Richard Cochran, who pushed for such a change
> years ago, which I resisted due to the concerns about the
> performance overhead.
> 
> While I suspect this isn't extremely critical, folks who
> care about strict leap-second correctness will likely
> want to watch this, and it will likely be a -stable candidate.
> 
> Cc: Prarit Bhargava 
> Cc: Daniel Bristot de Oliveira 
> Cc: Richard Cochran 
> Cc: Jan Kara 
> Cc: Jiri Bohac 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Shuah Khan 
> Originally-suggested-by: Richard Cochran 
> Reported-by: Daniel Bristot de Oliveira 
> Reported-by: Prarit Bhargava 
> Signed-off-by: John Stultz 
> ---
>  include/linux/time64.h  |  1 +
>  include/linux/timekeeper_internal.h |  7 +++
>  kernel/time/ntp.c   | 73 +---
>  kernel/time/ntp_internal.h  |  1 +
>  kernel/time/timekeeping.c   | 97 
> -
>  5 files changed, 159 insertions(+), 20 deletions(-)

So I don't like the complexity of this at all: why do we add over 100 lines of 
code for something that occurs (literally) once in a blue moon?

... and for that reason I'm not surprised at all that it broke in non-obvious 
ways.

Instead of having these super rare special events, how about implementing leap 
second smearing instead? That's far less radical and a lot easier to test as 
well, 
as it's a continuous mechanism. It will also confuse user-space a lot less, 
because there are no sudden time jumps.

Secondly, why is there a directional flag? I thought leap seconds can only be 
inserted.

So all in one, the leap second code is fragile and complex - lets re-think the 
whole topic instead of complicating it even more ...

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

2015-05-31 Thread Richard Cochran
On Fri, May 29, 2015 at 01:24:28PM -0700, John Stultz wrote:
> Apologies to Richard Cochran, who pushed for such a change
> years ago, which I resisted due to the concerns about the
> performance overhead.

For the record, I got the idea from Michel Hack of IBM.

> While I suspect this isn't extremely critical, folks who
> care about strict leap-second correctness will likely
> want to watch this, and it will likely be a -stable candidate.

I think this is a step in the right direction.  If the 'next_leap_sec'
is made available to the vdso, then the 1-10 ms time error could also
be prevented there.

I have some comments, but, as is, feel free to add my ack.

Acked-by: Richard Cochran 

> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index 472591e..6e15fbb 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c

> @@ -359,6 +364,33 @@ u64 ntp_tick_length(void)
>   return tick_length;
>  }
>  
> +/**
> + * get_leap_state - Returns the NTP leap state
> + * @next_leap_sec:   Next leapsecond in time64_t
> + * @next_leap_ktime: Next leapsecond in ktime_t
> + *
> + * Provides NTP leapsecond state. Returns direction
> + * of the leapsecond adjustment as an integer.
> + */
> +int get_leap_state(time64_t *next_leap_sec, ktime_t *next_leap_ktime)
> +{
> + int dir;
> +
> + if ((time_state == TIME_INS) && (time_status & STA_INS)) {

This can be reduced to just one test on (time_state == TIME_INS).
If user spaces clears STA_INS, then you can immediately cancel the
leap second.

> + dir = -1;
> + *next_leap_sec = ntp_next_leap_sec;
> + *next_leap_ktime = ktime_set(ntp_next_leap_sec, 0);
> + } else if ((time_state == TIME_DEL) && (time_status & STA_DEL)) {
> + dir = 1;
> + *next_leap_sec = ntp_next_leap_sec;
> + *next_leap_ktime = ktime_set(ntp_next_leap_sec, 0);
> + } else {
> + dir = 0;
> + *next_leap_sec = TIME64_MAX;
> + next_leap_ktime->tv64 = KTIME_MAX;
> + }
> + return dir;
> +}
>  
>  /*
>   * this routine handles the overflow of the microsecond field

> @@ -382,15 +414,21 @@ int second_overflow(unsigned long secs)
>*/
>   switch (time_state) {
>   case TIME_OK:
> - if (time_status & STA_INS)
> + if (time_status & STA_INS) {

The user sets STA_INS via adjtimex, but we don't change to TIME_INS
until the next tick.  Why not change immediately?  Then this funtion
would only need to check for TIME_INS && (secs % 86400 == 0)  and the
very unlikey TIME_DEL.

>   time_state = TIME_INS;
> - else if (time_status & STA_DEL)
> + ntp_next_leap_sec = secs + SECS_PER_DAY -
> + (secs % SECS_PER_DAY);
> + } else if (time_status & STA_DEL) {
>   time_state = TIME_DEL;
> + ntp_next_leap_sec = secs + SECS_PER_DAY -
> +  ((secs+1) % SECS_PER_DAY);
> + }
>   break;
>   case TIME_INS:
> - if (!(time_status & STA_INS))
> + if (!(time_status & STA_INS)) {
> + ntp_next_leap_sec = TIME64_MAX;
>   time_state = TIME_OK;
> - else if (secs % 86400 == 0) {
> + } else if (secs % SECS_PER_DAY == 0) {
>   leap = -1;
>   time_state = TIME_OOP;
>   printk_deferred(KERN_NOTICE

> @@ -711,6 +752,24 @@ int __do_adjtimex(struct timex *txc, struct timespec64 
> *ts, s32 *time_tai)
>   if (!(time_status & STA_NANO))
>   txc->time.tv_usec /= NSEC_PER_USEC;
>  
> + /* Handle leapsec adjustments */

This block and its commnet rather confused me.  What this code
actually does is fix up the time value returned to the caller of
adjtimex, but only in the 1-10 millisecond window before the leap
second tick.

> + if (unlikely(ts->tv_sec >= ntp_next_leap_sec)) {
> + if ((time_state == TIME_INS) && (time_status & STA_INS)) {
> + result = TIME_OOP;
> + txc->tai++;
> + txc->time.tv_sec--;
> + }
> + if ((time_state == TIME_DEL) && (time_status & STA_DEL)) {
> + result = TIME_WAIT;
> + txc->tai--;
> + txc->time.tv_sec++;
> + }
> + if ((time_state == TIME_OOP) &&
> + (ts->tv_sec == ntp_next_leap_sec)) {
> + result = TIME_WAIT;
> + }
> + }
> +
>   return result;
>  }

Thanks,
Richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/