Re: sys_fuzz

2017-01-26 Thread Achim Gratz

An Intel Haswell E3-1225v3 w/ Intel GbE:

[0.00] clocksource: refined-jiffies: mask: 0x max_cycles: 
0x, max_idle_ns: 7645519600211568 ns
[0.00] clocksource: hpet: mask: 0x max_cycles: 0x, 
max_idle_ns: 133484882848 ns
[0.00] hpet clockevent registered
[0.463562] clocksource: jiffies: mask: 0x max_cycles: 0x, 
max_idle_ns: 764504178510 ns
[0.534516] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM 
Segments MSI]
[0.628467] clocksource: Switched to clocksource hpet
[0.641618] clocksource: acpi_pm: mask: 0xff max_cycles: 0xff, 
max_idle_ns: 2085701024 ns
[1.615081] rtc_cmos 00:02: setting system clock to 2017-01-26 17:55:29 UTC 
(1485453329)
[2.528564] tsc: Refined TSC clocksource calibration: 3192.606 MHz
[2.528597] clocksource: tsc: mask: 0x max_cycles: 
0x2e050166e04, max_idle_ns: 440795273449 ns
[3.684547] clocksource: Switched to clocksource tsc
[4.527055] PTP clock support registered
[4.698744] e1000e :00:19.0 :00:19.0 (uninitialized): registered PHC 
clock

The PTP capability isn't used since I don't have a clockmaster or any
other PTP capable interface at home.

My rasPi B+:

[0.29] sched_clock: 32 bits at 1000kHz, resolution 1000ns, wraps every 
2147483647500ns
[0.74] clocksource: timer: mask: 0x max_cycles: 0x, 
max_idle_ns: 1911260446275 ns
[0.184683] clocksource: jiffies: mask: 0x max_cycles: 0x, 
max_idle_ns: 1911260446275 ns
[0.280399] clocksource: Switched to clocksource timer

The rasPi 2B:

[0.00] clocksource: arch_sys_counter: mask: 0xff 
max_cycles: 0x46d987e47, max_idle_ns: 440795202767 ns
[0.11] sched_clock: 56 bits at 19MHz, resolution 52ns, wraps every 
4398046511078ns
[0.076999] clocksource: jiffies: mask: 0x max_cycles: 0x, 
max_idle_ns: 1911260446275 ns
[0.203692] clocksource: Switched to clocksource arch_sys_counter



Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

DIY Stuff:
http://Synth.Stromeko.net/DIY.html

___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: sys_fuzz

2017-01-25 Thread Frank Nicholas

> On Jan 24, 2017, at 4:59 PM, Gary E. Miller  wrote:
> 
> 
> Some of these older systems, like G5 Macintosh, may be a good test.
> 
> Prolly should test in some VM's too.
> 

I have a Mac mini G4 & 2 x Power Mac G5’s I’m willing to install any OS on for 
someone to use for testing or buildbot'ing. I also have a small ESXi cluster, 
with plenty of resources available.  Anyone is welcome to have full 
access/control on the Mac’s or VM’s.

If anyone’s interested, let me know what OS you want installed where, and I’ll 
pull them out of the closet.

Thanks,
Frank

___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel

Re: sys_fuzz

2017-01-25 Thread Gary E. Miller
Yo Kurt!

On Wed, 25 Jan 2017 18:25:09 +0100
Kurt Roeckx  wrote:

> All my real amd64 boxes show:
> [0.540511] Switched to clocksource hpet
> [3.327348] Switched to clocksource tsc

Ditto for mine.

My RasPi A:
[0.689988] clocksource: Switched to clocksource timer

My RasPi 2 and 3:

[0.207033] clocksource: Switched to clocksource arch_sys_counter

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588

Veritas liberabit vos. -- Quid est veritas?
"If you can’t measure it, you can’t improve it." - Lord Kelvin


___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel

Re: sys_fuzz

2017-01-25 Thread Kurt Roeckx
On Wed, Jan 25, 2017 at 02:12:05AM -0800, Hal Murray wrote:
> I think kernels went through 3 stages:
> 
> Old old kernels were very coarse.  They bumped the clock on an interrupt.  
> End of story.
> 
> Old kernels use the TSC (or equivalent) to interpolate between interrupts.  
> (A comment from Mills clued me in.  I was plotting temperature vs drift.  It 
> got a lot cleaner when I moved the temperature probe from the CPU crystal 
> over to the RTC/TOY clock crystal.  I haven't looked for the code.)
> 
> Current kernels don't use interrupts for keeping time.  It's all done with 
> the TSC.

All my real amd64 boxes show:
[0.540511] Switched to clocksource hpet
[3.327348] Switched to clocksource tsc

But my armel shows:
[5.722800] Switching to clocksource orion_clocksource

A KVM guest shows:
[0.392136] Switched to clocksource kvm-clock


Kurt

___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: sys_fuzz

2017-01-25 Thread Eric S. Raymond
Achim Gratz :
> Eric S. Raymond writes:
> > Is "unbiased and has a (relatively) white spectrum" equivalent to
> > looking like symmetrical digital white noise around actual UTC, if you
> > knew what it was?
> 
> Yes, if you knew the error exactly, then looking at it as a signal in
> its own right.  The task of the PLL is to steer the error to zero and
> the filtering that allows it to do this without undue overshoot or even
> oscillations oscillations necessarily has a few assumptions about the
> possible forms of error signal baked in.

I'd expect that on mathematical first principles, even though I don't
clearly understand how the "steering" works.  To steer you have to have
priors, some model of what "well-formed" looks like.

> "Unbiased" means that various forms of averaging should converge to
> zero.  "Relatively White Spectrum" means that there shouldn't be any
> concentrations of energy at specific frequencies within the loop
> bandwidth of the PLL (equivalently that the Fourier spectrum in that
> bandwidth is "flat").

Right, I got that part.  I do have some grasp of Fourier transforms
and frequency spectra, albeit mostly theoretical rather than
practical.  (I was a mathematician before I was a software engineer.)

>Together these two conditions ensure, among other
> things, that the average error converges to zero smoothly and that the
> autocorrelation for the error signal stays close to zero for all time
> lags.
> 
> Viewed from the other side: if you had a biased error signal, the PLL
> would converge to a fixed offset to UTC that was representative of that
> bias.  If the spectrum was not white, then the PLL would develop a
> time-variable offset around UTC (which could end up as an oscillation).

OK, that was *useful*.  I had grasped the implications of bias, but I hadn't
clearly visualized  how a non-white error spectrum would cash out in the
time domain.  But it makes perfect sense to me now, yeah.  Your oscillating
error will correspond to where there's density in the error spectrum.

Thanks.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>
___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: sys_fuzz

2017-01-25 Thread Eric S. Raymond
Achim Gratz :
> > Therefore I *deduce* that the PLL correction (the one NTP does, not
> > the in-kernel one Hal tells us is associated with PPS) requires a
> > monotonically increasing clock.  It's the simplest explanation for the
> > way libntp/systime.c works, and it explains *everything* that has puzzled
> > me about that code.
> 
> The thing the PLL (more specifically the loop filter) should care about
> is that the error estimate it makes is unbiased and has a (relatively)
> white spectrum.  That's exactly what doesn't happen when you have a
> clock that jumps and you try to read it several times inbetween those
> jumps.

Is "unbiased and has a (relatively) white spectrum" equivalent to
looking like symmetrical digital white noise around actual UTC, if you
knew what it was?

(I'm asking this question because my intuitions about analog-level
signal processing are still weak.)
-- 
Eric S. Raymond <http://www.catb.org/~esr/>
___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: sys_fuzz

2017-01-25 Thread Eric S. Raymond
Kurt Roeckx :
> All my real amd64 boxes show:
> [0.540511] Switched to clocksource hpet
> [3.327348] Switched to clocksource tsc
> 
> But my armel shows:
> [5.722800] Switching to clocksource orion_clocksource
> 
> A KVM guest shows:
> [0.392136] Switched to clocksource kvm-clock

Sorry, I don't know how to interpret this.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>
___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: sys_fuzz

2017-01-25 Thread Hal Murray

e...@thyrsus.com said:
>> Mark/Eric: Can you guarantee that we will never run on
>> a system with a crappy clock?  In this context, crappy means
>> one that takes big steps.

> OK, now that I think I understand this issue I'm going to say "Yes, we can
> assume this".

> All x86 machines back to the Pentium (1993) have a hardware cycle counter;
> it's called the TSC. As an interesting detail, this was a 64-bit register
> even when the primary word size was 32 bits.

> All ARM processors back to the ARM6 (1992) have one as well. A little web
> searching finds clear indications of cycle counters on the UltraSparc (SPARC
> V9), Alpha, MIPS, PowerPC, IA64 and PA-RISC.

On ARM, you can't read it from user land unless a mode bit is set.  Last time 
I tried, it wasn't set.  I found directions on how to set it, but that 
required building a kernel module and I never got that far.
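
For reference, the read itself is a single coprocessor instruction on ARMv7
(a sketch, not tested here; it will trap with SIGILL from user space unless
the kernel has set the PMU user-enable bit, which is the mode bit above):

#include <stdint.h>

/* ARMv7: read the PMU cycle counter (PMCCNTR).  The MRC itself is
 * trivial; the hard part is getting PMUSERENR set from kernel space. */
static inline uint32_t read_ccnt(void)
{
    uint32_t val;
    asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(val));
    return val;
}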

> I also hunted for information on dedicated smartphone processors. I found
> clear indication of a cycle counter on the Qualcomm Snapdragon and clouded
> ones for Apple A-series processors.  The Nvidia Tegra, MediaTek, HiSilicon
> and Samsung HyExynos chips are all recent ARM variants and can therefore be
> assumed to have an ARM %tick register.

> Reading between the lines, it looks to me like this hardware feature became
> ubiquitous in the early 1990s and that one of the drivers was
> hardware-assisted crypto.  It is therefore *highly* unlikely to be omitted
> from any new design, even in low-power embedded.  And if you have a TSC,
> sampling it is a trivial handful of assembler instructions.

What does that have to do with crypto?

I've never used it for anything other than timing.


> I think I can take it from here. -- 

Just to make sure we are all on the same wavelength...

User code never reads that register.  Modern kernels use it for timekeeping.

I think kernels went through 3 stages:

Old old kernels were very coarse.  They bumped the clock on an interrupt.  
End of story.

Old kernels use the TSC (or equivalent) to interpolate between interrupts.  
(A comment from Mills clued me in.  I was plotting temperature vs drift.  It 
got a lot cleaner when I moved the temperature probe from the CPU crystal 
over to the RTC/TOY clock crystal.  I haven't looked for the code.)

Current kernels don't use interrupts for keeping time.  It's all done with 
the TSC.

There is an interesting worm in this area.  Most PCs fuzz the CPU frequency 
to meet EMI regulations.  There used to be a separate clock-generator chip: 
crystal in, all-the-clocks-you-need out.  It's all in the big-chip now, but 
you can get specs for the old chips.  The logic controlling the PLL 
deliberately down-modulated the CPU frequency by 1/2% or so at a (handwave) 
30 kHz rate.

--

Just because the hardware has a TSC (or equivalent), doesn't mean that the 
software uses it.  I wouldn't be all that surprised if the OS for an IoT size 
device still had an old-old clock routine.

We should write a hack program to collect data and make pretty histograms or 
whatever.  If we are smart enough, we can probably make it scream and shout 
if it ever finds an old-old/coarse clock.  If we are lucky, we can run that 
early in the install path.
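
A minimal sketch of such a probe (untested; a real version would bin the
deltas into a histogram rather than just keep the minimum):

#include <stdio.h>
#include <time.h>

#define SAMPLES 100000

/* Read the clock back-to-back and look at the step sizes.  A TSC-backed
 * clock shows tiny, mostly nonzero steps; an old-old/coarse clock shows
 * long runs of identical readings followed by one big jump. */
int main(void)
{
    struct timespec ts;
    long long prev = 0, min_step = -1;
    long repeats = 0;
    int i;

    for (i = 0; i < SAMPLES; i++) {
        clock_gettime(CLOCK_REALTIME, &ts);
        long long now = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
        if (i > 0) {
            long long step = now - prev;
            if (step == 0)
                repeats++;
            else if (min_step < 0 || step < min_step)
                min_step = step;
        }
        prev = now;
    }
    printf("min nonzero step: %lld ns, repeated readings: %ld of %d\n",
           min_step, repeats, SAMPLES - 1);
    return 0;
}

Lots of repeats plus a big minimum step would be the scream-and-shout case.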


-- 
These are my opinions.  I hate spam.



___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: sys_fuzz

2017-01-24 Thread Gary E. Miller
Yo Fred!

On Tue, 24 Jan 2017 18:48:43 -0800 (PST)
Fred Wright  wrote:

> > > If one is dithering, the amount of dither should be based on the
> > > clock's actual resolution, *not* the time required to read it.
> > > In a sampled system, one would add dither equal to the
> > > quantization interval, in order to produce results statistically
> > > similar to sampling with infinite resolution.  For time values,
> > > one would add dither equal to the clock's counting period, to
> > > produce results statistically similar to a clock running at
> > > infinite frequency.  
> >
> > Possibly, but that is not how it works now.  And would it be an
> > improvement?  Bring on the experiments!  
> 
> I didn't say that it's worthwhile on modern systems; in fact I said
> exactly the opposite further down.  But if one *is* going to dither,
> then the clock period is the correct amount.  That's a peak-to-peak
> value, so if one is adding signed dither, then the magnitude should
> be half that.

Well, that is not what NTP does.  NTP dithers on the smallest time to
read the clock consecutively (sys_fuzz), then NTP ensures the time is
unique and increasing.  Given that the clock period is usually way
longer than the sys_fuzz, I think sys_fuzz is the thing to dither on.

> > > That's not uncommon, but it's a really bad idea.  Demanding that a
> > > clock always return unique values is an unwarranted extension of
> > > the job description of a clock.  
> >
> > Well then, you just said the current NTP implementation is a bad
> > idea.  
> 
> No, what I said is that it's a bad idea for an *OS time function* to
> corrupt the value in the name of uniqueness.  That's what Hal was
> talking about.

I don't agree, but getting off topic.

> > In practice, with nanosecond resolution clocks doing
> > CLOCK_MONOTONIC is not hard.  
> 
> Not necessarily (assuming you're actually talking about uniqueness
> rather than mere monotonicity), for a couple of reasons:

Getting lost in words that mean different things depending on the
context: POSIX, NTP, etc.

So, not gonna put a word on it, since the word is not important.  What
matters is that NTP always uses unique and increasing time.

> 1) Most clock counters don't really run at 1GHz, so they don't really
> have nanosecond resolution.  (in spite of what clock_getres() may
> say).

Ah, back to the dictionary! clock_gettime() does resolve to nanoseconds.
Just read the man page.

   struct timespec {
   time_t   tv_sec;/* seconds */
   long tv_nsec;   /* nanoseconds */
   };

Them things there are nanoseconds.

The resolution tells you nothing about how much the clock advances each
time it increments.  Integer resolution, in this context, is just the
smallest increment of time that can be represented; it says nothing
about the precision, accuracy, increment or anything else about the time
that is represented.

I can see this very easily on my RasPi's.  NTP reads the clock to
one nanosecond, but it increments by much larger amounts.

This shows the effect:

https://pi4.rellim.com/day/#local_clock_time_offset_histogram

> 2) Even if the clock really did run at 1GHz, if it could be read in
> under 1ns it would still be "coarse".  I'm not aware of any systems
> that can *currently* do that, but it's certainly not beyond the realm
> of possibility.  Assuming that machines will never be faster than X
> is one of those not-future-proof assumptions like Y2K.

Not gonna worry about things that don't happen yet or soon.

> Note that "monotonic" does not necessarily mean unique.

I'll try to bring this back on topic again: NTP cares nothing about
MONOTONIC.  The word is a NOP in NTP land.  No point bikeshedding it.

> > > The proper way to derive unique values
> > > from a clock is to wrap it with something that fudges *its*
> > > values as needed, without inflicting lies on the clock itself.  
> >
> > Sorta circular since NTP reads the system clock, applies fudge, then
> > adjusts the sysclock to match.  
> 
> Umm, I think you're assuming that "fudges" above means some kind of
> NTP time adjustment.

Yup, that thing that NTP does, whether or not we understand it, it
clearly does it.


> I used it in the generic "fudge factor" sense,
> in this case meaning whatever adjustment is needed to ensure
> uniqueness.

Does not change my comment on your comment.

> Suppose one has:

You basically duplicated the Lamport rules, which is what NTP already
does, but w/o the sys_fuzz thing that NTP does.

> > > Also note that in some contexts it's reasonable to extend the
> > > resolution of a "coarse" clock (without breaking "fine" clocks) by
> > > reading the clock in a loop until the value changes.  This
> > > approach is completely neutered by a uniqueness kludge.  
> >
> > I do not see how that helps NTP, just adds latency.  
> 
> Of course.  But in *some contexts* it's useful, and it's broken if
the OS insists on corrupting the time in the name of uniqueness.

Re: sys_fuzz

2017-01-24 Thread Fred Wright

On Tue, 24 Jan 2017, Gary E. Miller wrote:
> On Tue, 24 Jan 2017 15:22:20 -0800 (PST)
> Fred Wright  wrote:

> > If one is dithering, the amount of dither should be based on the
> > clock's actual resolution, *not* the time required to read it.  In a
> > sampled system, one would add dither equal to the quantization
> > interval, in order to produce results statistically similar to
> > sampling with infinite resolution.  For time values, one would add
> > dither equal to the clock's counting period, to produce results
> > statistically similar to a clock running at infinite frequency.
>
> Possibly, but that is not how it works now.  And would it be an
> improvement?  Bring on the experiments!

I didn't say that it's worthwhile on modern systems; in fact I said
exactly the opposite further down.  But if one *is* going to dither, then
the clock period is the correct amount.  That's a peak-to-peak value, so
if one is adding signed dither, then the magnitude should be half that.

> > > There is an additional worm in this can.  Some OSes with crappy
> > > clocks bumped the clock by a tiny bit each time you read it so that
> > > all clock-reads returned different results and you could use it for
> > > making unique IDs.
> >
> > That's not uncommon, but it's a really bad idea.  Demanding that a
> > clock always return unique values is an unwarranted extension of the
> > job description of a clock.
>
> Well then, you just said the current NTP implementation is a bad idea.

No, what I said is that it's a bad idea for an *OS time function* to
corrupt the value in the name of uniqueness.  That's what Hal was talking
about.

> In practice, with nanosecond resolution clocks doing CLOCK_MONOTONIC
> is not hard.

Not necessarily (assuming you're actually talking about uniqueness rather
than mere monotonicity), for a couple of reasons:

1) Most clock counters don't really run at 1GHz, so they don't really have
nanosecond resolution.  (in spite of what clock_getres() may say).

2) Even if the clock really did run at 1GHz, if it could be read in under
1ns it would still be "coarse".  I'm not aware of any systems that can
*currently* do that, but it's certainly not beyond the realm of
possibility.  Assuming that machines will never be faster than X is one of
those not-future-proof assumptions like Y2K.

Note that "monotonic" does not necessarily mean unique.  Mathematically,
it means that values are either nondecreasing or nonincreasing.  In the
context of time, only the former interpretation
makes sense, but it doesn't prohibit repeated values.  Uniqueness and
monotonicity are orthogonal properties.

Nothing in the POSIX spec says that CLOCK_MONOTONIC values are guaranteed
to be unique.  See:

http://pubs.opengroup.org/onlinepubs/9699919799/

It doesn't really say much of anything, except that the epoch is arbitrary
and that it isn't adjusted by clock_settime().  The absence of backward
steps from the latter is where the monotonicity comes from.

> > The proper way to derive unique values
> > from a clock is to wrap it with something that fudges *its* values as
> > needed, without inflicting lies on the clock itself.
>
> Sorta circular since NTP reads the system clock, applies fudge, then
> adjusts the sysclock to match.

Umm, I think you're assuming that "fudges" above means some kind of NTP
time adjustment.  I used it in the generic "fudge factor" sense, in this
case meaning whatever adjustment is needed to ensure uniqueness.

Suppose one has:

clock_val_t get_time(void);

Then (ignoring thread safety) one could have something like:

clock_val_t get_unique_time(void)
{
    static clock_val_t last_time = 0;
    clock_val_t new_time = get_time();
    /* hand back the raw reading if it moved forward, else bump by one count */
    return new_time > last_time ? (last_time = new_time) : ++last_time;
}

The result is both unique and monotonic, and differs from the actual time
by the minimum amount necessary to meet those conditions.

That code of course assumes that clock_val_t is an integer, and gets
messier with multi-component time representations like "struct timespec".
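
For what it's worth, a struct timespec version of the same idea might look
like this (still ignoring thread safety; assumes CLOCK_REALTIME and a
minimum bump of 1ns):

#include <time.h>

struct timespec get_unique_time_ts(void)
{
    static struct timespec last = {0, 0};
    struct timespec now;

    clock_gettime(CLOCK_REALTIME, &now);
    if (now.tv_sec > last.tv_sec ||
        (now.tv_sec == last.tv_sec && now.tv_nsec > last.tv_nsec)) {
        last = now;                     /* clock moved forward: use it as-is */
    } else {
        if (++last.tv_nsec >= 1000000000L) {    /* bump by one count */
            last.tv_nsec = 0;
            last.tv_sec++;
        }
    }
    return last;
}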

> > Also note that in some contexts it's reasonable to extend the
> > resolution of a "coarse" clock (without breaking "fine" clocks) by
> > reading the clock in a loop until the value changes.  This approach
> > is completely neutered by a uniqueness kludge.
>
> I do not see how that helps NTP, just adds latency.

Of course.  But in *some contexts* it's useful, and it's broken if the OS
insists on corrupting the time in the name of uniqueness.

> > The clock_getres() function is supposed to report the actual clock
> > resolution, which is what should determine the amount of dither, but
> > in practice it's rarely correctly implemented.  E.g., in the Linux
> > cases I've tested, it ignores the hardware properties and just returns
> > 1ns.
>
> And it probably can not even determine the hardware properties.

It knows perfectly well what the actual (or at least nominal) oscillator

Re: sys_fuzz

2017-01-24 Thread Eric S. Raymond
Hal Murray :
> 
> g...@rellim.com said:
> > Makes no sense to me.  Adding randomness helps when you have hysteresis,
> > stiction, friction, lash and some other things, but none of those apply to
> > NTP.
> 
> The NTP case is roughly stiction.  Remember the age of this code.  It was 
> working long before CPUs had instructions to read a cycle counter.  Back 
> then, the system clock was updated on the scheduler interrupt.  There was no 
> interpolation between ticks.

*blink*   I think I just achieved enlightenment.  Gary, Hal, please
review the following carefully to ensure that I haven't updated my
beliefs wrongly.

Stiction in this context = "adjacent clock reads could get back the
same value", is that right?  Suddenly a whole bunch of things, like
the implications of only updating the clock on a scheduler interrupt,
make sense.

And now I think I get (a) why Mills fuzzed the clock, and (b) why the
code is so careful about checking for clock stepback.  If your working
assumption is that the clock will only update on a scheduler tick, and
your PLL correction requires you to have a monotonically increasing
clock, stiction is *bad*.  You have no choice but to fuzz the clock,
and the probabilistically least risky way to do it is by around half
the tick interval, but because random is random you cannot guarantee
when you have to do it twice between ticks that your second
pseudosample will be greater than your first.  You need what the code
calls a "Lamport violation" check to throw out bad pseudosamples.

Therefore I *deduce* that the PLL correction (the one NTP does, not
the in-kernel one Hal tells us is associated with PPS) requires a
monotonically increasing clock.  It's the simplest explanation for the
way libntp/systime.c works, and it explains *everything* that has puzzled
me about that code.

I love this project - it makes me learn new things.

> Mark/Eric: Can you guarantee that we will never run on a system with a crappy 
> clock?  In this context, crappy means one that takes big steps.

OK, now that I think I understand this issue I'm going to say "Yes, we
can assume this".

All x86 machines back to the Pentium (1993) have a hardware cycle
counter; it's called the TSC. As an interesting detail, this was a
64-bit register even when the primary word size was 32 bits.

All ARM processors back to the ARM6 (1992) have one as well. A little
web searching finds clear indications of cycle counters on the
UltraSparc (SPARC V9), Alpha, MIPS, PowerPC, IA64 and PA-RISC.

I also hunted for information on dedicated smartphone processors.
I found clear indication of a cycle counter on the Qualcomm Snapdragon
and clouded ones for Apple A-series processors.  The Nvidia Tegra, MediaTek,
HiSilicon and Samsung Exynos chips are all recent ARM variants and can
therefore be assumed to have an ARM %tick register.

Reading between the lines, it looks to me like this hardware feature
became ubiquitous in the early 1990s and that one of the drivers was
hardware-assisted crypto.  It is therefore *highly* unlikely to be
omitted from any new design, even in low-power embedded.  And if you
have a TSC, sampling it is a trivial handful of assembler
instructions.
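
On x86 with GCC or Clang, for instance, it is one intrinsic (sketch):

#include <stdint.h>
#include <x86intrin.h>

/* __rdtsc() compiles to the RDTSC instruction and returns the 64-bit TSC */
static inline uint64_t read_tsc(void)
{
    return __rdtsc();
}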

> I think that all Gary's test proved is that his system doesn't have a crappy 
> clock.

Yes. Agreed.

> If we are serious about getting rid of that code, I'll put investigating that 
> area higher on my list.  I  think we have more important things to do.

I think I can take it from here.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>
___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: sys_fuzz

2017-01-24 Thread Gary E. Miller
Yo Fred!

On Tue, 24 Jan 2017 15:22:20 -0800 (PST)
Fred Wright  wrote:

> On Tue, 24 Jan 2017, Gary E. Miller wrote:
> 
> > Last week we had a discussion on sys_fuzz and the value of adding
> > random noise to some measurements.  The code defines sys_fuzz as:
> >
> > "* The sys_fuzz variable measures the minimum time to read the
> > system
> >  * clock, regardless of its precision."
> >
> > Randomness of half the sys_fuzz is then added to some values, like
> > this:
> >
> > fuzz = ntp_random() * 2. / FRAC * sys_fuzz
> >
> > Makes no sense to me.  Adding randomness helps when you have
> > hysteresis, stiction, friction, lash and some other things, but
> > none of those apply to NTP.  
> 
> Basing it on the time to *read* the clock definitely makes no sense,
> although I suspect one would have to dig back fairly far in the
> history to determine the source of that confusion.

Just look at that commit, and compare to the bug report.  The confusion
is obvious there...

> If one is dithering, the amount of dither should be based on the
> clock's actual resolution, *not* the time required to read it.  In a
> sampled system, one would add dither equal to the quantization
> interval, in order to produce results statistically similar to
> sampling with infinite resolution.  For time values, one would add
> dither equal to the clock's counting period, to produce results
> statistically similar to a clock running at infinite frequency.

Possibly, but that is not how it works now.  And would it be an
improvement?  Bring on the experiments!

> > There is an additional worm in this can.  Some OSes with crappy
> > clocks bumped the clock by a tiny bit each time you read it so that
> > all clock-reads returned different results and you could use it for
> > making unique IDs.  
> 
> That's not uncommon, but it's a really bad idea.  Demanding that a
> clock always return unique values is an unwarranted extension of the
> job description of a clock.

Well then, you just said the current NTP implementation is a bad idea.

In practice, with nanosecond resolution clocks doing CLOCK_MONOTONIC
is not hard.

> The proper way to derive unique values
> from a clock is to wrap it with something that fudges *its* values as
> needed, without inflicting lies on the clock itself.

Sorta circular since NTP reads the system clock, applies fudge, then
adjusts the sysclock to match.

> Also note that in some contexts it's reasonable to extend the
> resolution of a "coarse" clock (without breaking "fine" clocks) by
> reading the clock in a loop until the value changes.  This approach
> is completely neutered by a uniqueness kludge.

I do not see how that helps NTP, just adds latency.

> 1) If it's a "coarse" clock, then dithering destroys monotonicity.

Did you read the bug report?  That is exactly what was happening, and
worse.  Thus the fix.

> 2) Determining the proper amount of dither isn't necessarily easy.

Yup.

> The clock_getres() function is supposed to report the actual clock
> resolution, which is what should determine the amount of dither, but
> in practice it's rarely correctly implemented.  E.g., in the Linux
> cases I've tested, it ignores the hardware properties and just returns
> 1ns.

And it probably can not even determine the hardware properties.

> I'm not convinced that sub-microsecond dithering is worthwhile,
> anyway. If the dithering code is retained at all, it might make sense
> to have a configure test that reads clock_getres(), and only enables
> dithering support if the result is more than a microsecond.  That
> test would be unaffected by the aforementioned lies in
> clock_getres().  Though there'd need to be a way to force dithering
> on for testing, since it's unlikely that any test platforms would use
> it naturally.  And those sorts of configure tests are problematic for
> cross-building.

Even the clock_getres() man page warns that it may return "bogus
results".  Next...

> BTW, if the only use for randomness is for computational dithering,
> and not for security, then there's no need for crypto-quality
> randomness.

So far that looks like the case, plus adding a nonce in the LSB's
of timestamps.

But that will not last; the autokey replacement should be here
'soon'.  This year or next.

> In that case, why not just read /dev/urandom directly
> and dispense with the whole libsodium mess?

All ntpd uses libsodium for is to read /dev/urandom, and, for the many
cases where /dev/urandom does not exist or something better exists, to do
it some other way.  I'd rather see it go too, but I see no easy path to get
there.

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588

Veritas liberabit vos. -- Quid est veritas?
"If you can’t measure it, you can’t improve it." - Lord Kelvin



Re: sys_fuzz

2017-01-24 Thread Fred Wright

On Tue, 24 Jan 2017, Gary E. Miller wrote:

> Last week we had a discussion on sys_fuzz and the value of adding
> random noise to some measurements.  The code defines sys_fuzz as:
>
> "* The sys_fuzz variable measures the minimum time to read the system
>  * clock, regardless of its precision."
>
> Randomness of half the sys_fuzz is then added to some values, like this:
>
> fuzz = ntp_random() * 2. / FRAC * sys_fuzz
>
> Makes no sense to me.  Adding randomness helps when you have hysteresis,
> stiction, friction, lash and some other things, but none of those apply
> to NTP.

Basing it on the time to *read* the clock definitely makes no sense,
although I suspect one would have to dig back fairly far in the history to
determine the source of that confusion.

If one is dithering, the amount of dither should be based on the clock's
actual resolution, *not* the time required to read it.  In a sampled
system, one would add dither equal to the quantization interval, in order
to produce results statistically similar to sampling with infinite
resolution.  For time values, one would add dither equal to the clock's
counting period, to produce results statistically similar to a clock
running at infinite frequency.
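
Concretely, something like this (a sketch; assumes the counting period is
known in nanoseconds and that plain rand() is good enough for dither):

#include <stdlib.h>
#include <time.h>

/* Add a uniform offset in [0, period_ns) to a quantized reading, so the
 * dithered values behave statistically like samples from a clock of
 * infinite frequency. */
double dither_by_period(const struct timespec *raw, double period_ns)
{
    double t = raw->tv_sec + raw->tv_nsec * 1e-9;
    double dither = (rand() / (RAND_MAX + 1.0)) * period_ns * 1e-9;
    return t + dither;
}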

On Tue, 24 Jan 2017, Hal Murray wrote:

> The NTP case is roughly stiction.  Remember the age of this code.  It was
> working long before CPUs had instructions to read a cycle counter.  Back
> then, the system clock was updated on the scheduler interrupt.  There was no
> interpolation between ticks.

Indeed.  The interrupt was often derived from the power line, making the
clock resolution 16.7ms or 20ms.  With such crummy resolution, applying
some "whitening" looks attractive.

> Mark/Eric: Can you guarantee that we will never run on a system with a crappy
> clock?  In this context, crappy means one that takes big steps.

There are two different time intervals involved - the interval between
successive time values, and the time required to read the clock.  I'd use
the term "coarse" to describe a clock where the former is larger than the
latter, such that it's possible to read the same value more than once.

If you mean "big steps" in the absolute sense, then for some meaning of
"big", the term "crappy" is warranted. :-) But note that a clock can be
"coarse" without being "crappy".  For example, a clock running at 10MHz
isn't particularly "crappy", but if it can be read in 50ns, then it's
still "coarse".

> There is an additional worm in this can.  Some OSes with crappy clocks bumped
> the clock by a tiny bit each time you read it so that all clock-reads
> returned different results and you could use it for making unique IDs.

That's not uncommon, but it's a really bad idea.  Demanding that a clock
always return unique values is an unwarranted extension of the job
description of a clock.  The proper way to derive unique values from a
clock is to wrap it with something that fudges *its* values as needed,
without inflicting lies on the clock itself.  Any clock classified as
"coarse" by the above definition is corrupted by a uniqueness requirement,
whether "crappy" or not.

Also note that in some contexts it's reasonable to extend the resolution
of a "coarse" clock (without breaking "fine" clocks) by reading the clock
in a loop until the value changes.  This approach is completely neutered
by a uniqueness kludge.
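
That loop is just (sketch):

#include <time.h>

/* Extend the effective resolution of a coarse clock: spin until the
 * reported value changes, then treat the change as the tick edge. */
struct timespec read_clock_edge(void)
{
    struct timespec first, now;

    clock_gettime(CLOCK_REALTIME, &first);
    do {
        clock_gettime(CLOCK_REALTIME, &now);
    } while (now.tv_sec == first.tv_sec && now.tv_nsec == first.tv_nsec);
    return now;
}

With a uniqueness kludge every read "changes", so the loop learns nothing.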


Getting back to the original issue, if dithering is warranted, then there
are a couple of pitfalls:

1) If it's a "coarse" clock, then dithering destroys monotonicity.  In
*some* (mainly statistical) contexts, non-monotonic time values may be
perfectly OK, but in any context involving intervals they can be
disastrous.  So one would probably need to keep both dithered and
undithered time values.

2) Determining the proper amount of dither isn't necessarily easy.  The
clock_getres() function is supposed to report the actual clock resolution,
which is what should determine the amount of dither, but in practice it's
rarely correctly implemented.  E.g., in the Linux cases I've tested, it
ignores the hardware properties and just returns 1ns.

I'm not convinced that sub-microsecond dithering is worthwhile, anyway.
If the dithering code is retained at all, it might make sense to have a
configure test that reads clock_getres(), and only enables dithering
support if the result is more than a microsecond.  That test would be
unaffected by the aforementioned lies in clock_getres().  Though there'd
need to be a way to force dithering on for testing, since it's unlikely
that any test platforms would use it naturally.  And those sorts of
configure tests are problematic for cross-building.
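
Such a probe could be as small as this (sketch; the exit convention is
arbitrary, picked for a configure-style check):

#include <stdio.h>
#include <time.h>

/* Report whether the *claimed* resolution of CLOCK_REALTIME is coarser
 * than one microsecond; configure would enable dithering only if so. */
int main(void)
{
    struct timespec res;
    long ns;

    if (clock_getres(CLOCK_REALTIME, &res) != 0)
        return 2;                       /* can't tell */
    ns = res.tv_sec * 1000000000L + res.tv_nsec;
    printf("claimed resolution: %ld ns\n", ns);
    return ns > 1000 ? 0 : 1;           /* 0 => coarser than 1 us */
}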

BTW, if the only use for randomness is for computational dithering, and
not for security, then there's no need for crypto-quality randomness.  In
that case, why not just read /dev/urandom directly and dispense with the
whole libsodium mess?

Fred Wright
___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel

Re: sys_fuzz

2017-01-24 Thread Gary E. Miller
Yo Hal!

On Tue, 24 Jan 2017 13:46:59 -0800
Hal Murray  wrote:

> g...@rellim.com said:
> > Makes no sense to me.  Adding randomness helps when you have
> > hysteresis, stiction, friction, lash and some other things, but
> > none of those apply to NTP.  
> 
> The NTP case is roughly stiction.  Remember the age of this code.  It
> was working long before CPUs had instructions to read a cycle
> counter.  Back then, the system clock was updated on the scheduler
> interrupt.  There was no interpolation between ticks.

You gotta squint real hard to see that as stiction, but not worth 
debating the proper word.  So it may have mattered back then, but do
we need to carry this legacy code?

> Mark/Eric: Can you guarantee that we will never run on a system with
> a crappy clock?  In this context, crappy means one that takes big
> steps.

That has nothing to do with clock steps; this has to do with how
fast the clock can be read.

> I think that all Gary's test proved is that his system doesn't have
> a crappy clock.

Yes, more testing required.  You got ideas what to test next?

> There is an additional worm in this can.  Some OSes with crappy
> clocks bumped the clock by a tiny bit each time you read it so that
> all clock-reads returned different results and you could use it for
> making unique IDs. 

This is a POSIX requirement for CLOCK_MONOTONIC.  Also unrelated to
how fast the clock can be read.

> If we are serious about getting rid of that code, I'll put
> investigating that area higher on my list.  I  think we have more
> important things to do.

Yes, not high on the list, but so easy to test, and so long to get
results, that it is worth thinking about.  Anytime we can remove
unneeded code and noise from ntpd it is good.  I figure we could
remove 100 LOC easy.

Some of these older systems, like G5 Macintosh, may be a good test.

Prolly should test in some VM's too.

In libntp/systime.c, I just make set_sys_fuzz() always set the sys_fuzz
to 0.0:

{
+   fuzz_val = 0.0;  /* GEM */
sys_fuzz = fuzz_val;

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588

Veritas liberabit vos. -- Quid est veritas?
"If you can’t measure it, you can’t improve it." - Lord Kelvin


___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel

Re: sys_fuzz

2017-01-24 Thread Hal Murray

g...@rellim.com said:
> Makes no sense to me.  Adding randomness helps when you have hysteresis,
> stiction, friction, lash and some other things, but none of those apply to
> NTP.

The NTP case is roughly stiction.  Remember the age of this code.  It was 
working long before CPUs had instructions to read a cycle counter.  Back 
then, the system clock was updated on the scheduler interrupt.  There was no 
interpolation between ticks.

Mark/Eric: Can you guarantee that we will never run on a system with a crappy 
clock?  In this context, crappy means one that takes big steps.

I think that all Gary's test proved is that his system doesn't have a crappy 
clock.

There is an additional worm in this can.  Some OSes with crappy clocks bumped 
the clock by a tiny bit each time you read it so that all clock-reads 
returned different results and you could use it for making unique IDs.
 
If we are serious about getting rid of that code, I'll put investigating that 
area higher on my list.  I  think we have more important things to do.


-- 
These are my opinions.  I hate spam.



___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel