Re: sys_fuzz
An Intel Haswell E3-1225v3 w/ Intel GbE:

[0.00] clocksource: refined-jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 7645519600211568 ns
[0.00] clocksource: hpet: mask: 0x max_cycles: 0x, max_idle_ns: 133484882848 ns
[0.00] hpet clockevent registered
[0.463562] clocksource: jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 764504178510 ns
[0.534516] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[0.628467] clocksource: Switched to clocksource hpet
[0.641618] clocksource: acpi_pm: mask: 0xff max_cycles: 0xff, max_idle_ns: 2085701024 ns
[1.615081] rtc_cmos 00:02: setting system clock to 2017-01-26 17:55:29 UTC (1485453329)
[2.528564] tsc: Refined TSC clocksource calibration: 3192.606 MHz
[2.528597] clocksource: tsc: mask: 0x max_cycles: 0x2e050166e04, max_idle_ns: 440795273449 ns
[3.684547] clocksource: Switched to clocksource tsc
[4.527055] PTP clock support registered
[4.698744] e1000e :00:19.0 :00:19.0 (uninitialized): registered PHC clock

The PTP capability isn't used since I don't have a clockmaster or any other PTP-capable interface at home.

My rasPi B+:

[0.29] sched_clock: 32 bits at 1000kHz, resolution 1000ns, wraps every 2147483647500ns
[0.74] clocksource: timer: mask: 0x max_cycles: 0x, max_idle_ns: 1911260446275 ns
[0.184683] clocksource: jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 1911260446275 ns
[0.280399] clocksource: Switched to clocksource timer

The rasPi 2B:

[0.00] clocksource: arch_sys_counter: mask: 0xff max_cycles: 0x46d987e47, max_idle_ns: 440795202767 ns
[0.11] sched_clock: 56 bits at 19MHz, resolution 52ns, wraps every 4398046511078ns
[0.076999] clocksource: jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 1911260446275 ns
[0.203692] clocksource: Switched to clocksource arch_sys_counter

Regards, Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+
DIY Stuff: http://Synth.Stromeko.net/DIY.html
___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: sys_fuzz
> On Jan 24, 2017, at 4:59 PM, Gary E. Miller wrote:
>
> Some of these older systems, like G5 Macintosh, may be a good test.
> Prolly should test in some VM's too.

I have a Mac mini G4 & 2 x Power Mac G5’s I’m willing to install any OS on for someone to use for testing or buildbot'ing. I also have a small ESXi cluster, with plenty of resources available. Anyone is welcome to have full access/control on the Mac’s or VM’s. If anyone’s interested, let me know what OS you want installed where, and I’ll pull them out of the closet.

Thanks, Frank
Re: sys_fuzz
Yo Kurt!

On Wed, 25 Jan 2017 18:25:09 +0100 Kurt Roeckx wrote:
> All my real amd64 boxes show:
> [0.540511] Switched to clocksource hpet
> [3.327348] Switched to clocksource tsc

Ditto for mine. My RasPI Ar:
[0.689988] clocksource: Switched to clocksource timer

My RasPi 2 and 3:
[0.207033] clocksource: Switched to clocksource arch_sys_counter

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703 g...@rellim.com Tel:+1 541 382 8588

Veritas liberabit vos. -- Quid est veritas?
"If you can’t measure it, you can’t improve it." - Lord Kelvin
Re: sys_fuzz
On Wed, Jan 25, 2017 at 02:12:05AM -0800, Hal Murray wrote:
> I think kernels went through 3 stages:
>
> Old old kernels were very coarse. They bumped the clock on an interrupt.
> End of story.
>
> Old kernels use the TSC (or equivalent) to interpolate between interrupts.
> (A comment from Mills clued me in. I was plotting temperature vs drift. It
> got a lot cleaner when I moved the temperature probe from the CPU crystal
> over to the RTC/TOY clock crystal. I haven't looked for the code.)
>
> Current kernels don't use interrupts for keeping time. It's all done with
> the TSC.

All my real amd64 boxes show:
[0.540511] Switched to clocksource hpet
[3.327348] Switched to clocksource tsc

But my armel shows:
[5.722800] Switching to clocksource orion_clocksource

A KVM guest shows:
[0.392136] Switched to clocksource kvm-clock

Kurt
Re: sys_fuzz
Achim Gratz:
> Eric S. Raymond writes:
> > Is "unbiased and has a (relatively) white spectrum" equivalent to
> > looking like symmetrical digital white noise around actual UTC, if you
> > knew what it was?
>
> Yes, if you knew the error exactly, then looking at it as a signal in
> its own right. The task of the PLL is to steer the error to zero and
> the filtering that allows it to do this without undue overshoot or even
> oscillations necessarily has a few assumptions about the possible forms
> of error signal baked in.

I'd expect that on mathematical first principles, even though I don't clearly understand how the "steering" works. To steer you have to have priors, some model of what "well-formed" looks like.

> "Unbiased" means that various forms of averaging should converge to
> zero. "Relatively White Spectrum" means that there shouldn't be any
> concentrations of energy at specific frequencies within the loop
> bandwidth of the PLL (equivalently that the Fourier spectrum in that
> bandwidth is "flat").

Right, I got that part. I do have some grasp of Fourier transforms and frequency spectra, albeit mostly theoretical rather than practical. (I was a mathematician before I was a software engineer.)

> Together these two conditions ensure, among other
> things, that the average error converges to zero smoothly and that the
> autocorrelation for the error signal stays close to zero for all time
> lags.
>
> Viewed from the other side: if you had a biased error signal, the PLL
> would converge to a fixed offset to UTC that was representative of that
> bias. If the spectrum was not white, then the PLL would develop a
> time-variable offset around UTC (which could end up as an oscillation).

OK, that was *useful*. I had grasped the implications of bias, but I hadn't clearly visualized how a non-white error spectrum would cash out in the time domain. But it makes perfect sense to me now, yeah.
Your oscillating error will correspond to where there's density in the error spectrum. Thanks.
-- 
Eric S. Raymond http://www.catb.org/~esr/
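Achim's two conditions ("unbiased" and "relatively white") are easy to check numerically. Here is a toy Python sketch, purely illustrative: the signal lengths, thresholds, and names are mine, not anything from NTP. It builds one well-behaved error signal and one pathological one, and checks the mean (bias) and lag-1 autocorrelation (whiteness).

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def autocorr(xs, lag):
    """Normalized autocorrelation of xs at the given lag."""
    m = mean(xs)
    var = sum((x - m) ** 2 for x in xs)
    cov = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    return cov / var

random.seed(42)
# An unbiased, white error signal: mean near zero, autocorrelation near zero.
white = [random.uniform(-1.0, 1.0) for _ in range(20000)]
# A biased, non-white signal: constant offset plus a slow oscillation.
colored = [0.5 + math.sin(i / 50.0) for i in range(20000)]
```

The `white` signal is the kind the loop filter can steer to zero; the `colored` one has both a bias (the PLL would converge to a fixed offset) and spectral density at a single frequency (the PLL would develop a time-variable offset).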
Re: sys_fuzz
Achim Gratz:
> > Therefore I *deduce* that the PLL correction (the one NTP does, not
> > the in-kernel one Hal tells us is associated with PPS) requires a
> > monotonically increasing clock. It's the simplest explanation for the
> > way libntp/systime.c works, and it explains *everything* that has puzzled
> > me about that code.
>
> The thing the PLL (more specifically the loop filter) should care about
> is that the error estimate it makes is unbiased and has a (relatively)
> white spectrum. That's exactly what doesn't happen when you have a
> clock that jumps and you try to read it several times inbetween those
> jumps.

Is "unbiased and has a (relatively) white spectrum" equivalent to looking like symmetrical digital white noise around actual UTC, if you knew what it was? (I'm asking this question because my intuitions about analog-level signal processing are still weak.)
-- 
Eric S. Raymond http://www.catb.org/~esr/
Re: sys_fuzz
Kurt Roeckx:
> All my real amd64 boxes show:
> [0.540511] Switched to clocksource hpet
> [3.327348] Switched to clocksource tsc
>
> But my armel shows:
> [5.722800] Switching to clocksource orion_clocksource
>
> A KVM guest shows:
> [0.392136] Switched to clocksource kvm-clock

Sorry, I don't know how to interpret this.
-- 
Eric S. Raymond http://www.catb.org/~esr/
Re: sys_fuzz
e...@thyrsus.com said:
>> Mark/Eric: Can you guarantee that we will never run on
>> a system with a crappy clock? In this context, crappy means
>> one that takes big steps.
> OK, now that I think I understand this issue I'm going to say "Yes, we can
> assume this".
> All x86 machines back to the Pentium (1993) have a hardware cycle counter;
> it's called the TSC. As an interesting detail, this was a 64-bit register
> even when the primary word size was 32 bits.
> All ARM processors back to the ARM6 (1992) have one as well. A little web
> searching finds clear indications of cycle counters on the UltraSparc (SPARC
> V9), Alpha, MIPS, PowerPC, IA64 and PA-RISC.

On ARM, you can't read it from user land unless a mode bit is set. Last time I tried, it wasn't set. I found directions on how to set it, but that required building a kernel module and I never got that far.

> I also hunted for information on dedicated smartphone processors. I found
> clear indication of a cycle counter on the Qualcomm Snapdragon and clouded
> ones for Apple A-series processors. The Nvidia Tegra, MediaTek, HiSilicon
> and Samsung Exynos chips are all recent ARM variants and can therefore be
> assumed to have an ARM %tick register.
> Reading between the lines, it looks to me like this hardware feature became
> ubiquitous in the early 1990s and that one of the drivers was
> hardware-assisted crypto. It is therefore *highly* unlikely to be omitted
> from any new design, even in low-power embedded. And if you have a TSC,
> sampling it is a trivial handful of assembler instructions.

What does that have to do with crypto? I've never used it for anything other than timing.

> I think I can take it from here.

--

Just to make sure we are all on the same wavelength... User code never reads that register. Modern kernels use it for timekeeping. I think kernels went through 3 stages:

Old old kernels were very coarse. They bumped the clock on an interrupt. End of story.
Old kernels use the TSC (or equivalent) to interpolate between interrupts. (A comment from Mills clued me in. I was plotting temperature vs drift. It got a lot cleaner when I moved the temperature probe from the CPU crystal over to the RTC/TOY clock crystal. I haven't looked for the code.)

Current kernels don't use interrupts for keeping time. It's all done with the TSC.

There is an interesting worm in this area. Most PCs fuzz the CPU frequency to meet EMI regulations. There used to be a separate clock-generator chip: crystal in, all-the-clocks-you-need out. It's all in the big-chip now, but you can get specs for the old chips. The logic controlling the PLL deliberately down-modulated the CPU frequency by a 1/2% or so at a (handwave) 30 kHz rate.

--

Just because the hardware has a TSC (or equivalent), doesn't mean that the software uses it. I wouldn't be all that surprised if the OS for an IoT-size device still had an old-old clock routine.

We should write a hack program to collect data and make pretty histograms or whatever. If we are smart enough, we can probably make it scream and shout if it ever finds an old-old/coarse clock. If we are lucky, we can run that early in the install path.

--

These are my opinions. I hate spam.
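Hal's "hack program" can be prototyped in a few lines. A Python sketch reading POSIX CLOCK_MONOTONIC through the standard `time` module; the function names and the 50% zero-delta threshold for calling a clock "old-old/coarse" are my own guesses, not anything from NTP:

```python
import time
from collections import Counter

def sample_deltas(n=20000):
    """Read CLOCK_MONOTONIC back-to-back and return the deltas in ns."""
    reads = [time.clock_gettime_ns(time.CLOCK_MONOTONIC) for _ in range(n)]
    return [b - a for a, b in zip(reads, reads[1:])]

def looks_coarse(deltas):
    """Old-style coarse clocks repeat values, so most deltas come out zero."""
    return deltas.count(0) / len(deltas) > 0.5

deltas = sample_deltas()
histogram = Counter(deltas)  # the "pretty histogram", minus the pretty
```

On an old-old clock that only advances on the scheduler interrupt, the histogram would be a huge spike at zero plus a lone bucket at the tick interval; on a TSC-backed clock it clusters around the read latency.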
Re: sys_fuzz
Yo Fred!

On Tue, 24 Jan 2017 18:48:43 -0800 (PST) Fred Wright wrote:
> > > If one is dithering, the amount of dither should be based on the
> > > clock's actual resolution, *not* the time required to read it.
> > > In a sampled system, one would add dither equal to the
> > > quantization interval, in order to produce results statistically
> > > similar to sampling with infinite resolution. For time values,
> > > one would add dither equal to the clock's counting period, to
> > > produce results statistically similar to a clock running at
> > > infinite frequency.
> >
> > Possibly, but that is not how it works now. And would it be an
> > improvement? Bring on the experiments!
>
> I didn't say that it's worthwhile on modern systems; in fact I said
> exactly the opposite further down. But if one *is* going to dither,
> then the clock period is the correct amount. That's a peak-to-peak
> value, so if one is adding signed dither, then the magnitude should
> be half that.

Well, that is not what NTP does. NTP dithers on the smallest time to read the clock consecutively (sys_fuzz), then NTP ensures the time is unique and increasing. Given that the clock period is usually way longer than the sys_fuzz, I think sys_fuzz is the thing to dither on.

> > > That's not uncommon, but it's a really bad idea. Demanding that a
> > > clock always return unique values is an unwarranted extension of
> > > the job description of a clock.
> >
> > Well then, you just said the current NTP implementation is a bad
> > idea.
>
> No, what I said is that it's a bad idea for an *OS time function* to
> corrupt the value in the name of uniqueness. That's what Hal was
> talking about.

I don't agree, but getting off topic.

> > In practice, with nano Second resolution clocks doing
> > CLOCK_MONOTONIC is not hard.
> Not necessarily (assuming you're actually talking about uniqueness
> rather than mere monotonicity), for a couple of reasons:

Getting lost in words that mean different things depending on the context: POSIX, NTP, etc. So, not gonna put a word on it, since the word is not important. What matters is that NTP always uses unique and increasing time.

> 1) Most clock counters don't really run at 1GHz, so they don't really
> have nanosecond resolution. (in spite of what clock_getres() may
> say).

Ah, back to the dictionary! clock_gettime() does resolve to nano Seconds. Just read the man page.

struct timespec {
    time_t tv_sec;   /* seconds */
    long   tv_nsec;  /* nanoseconds */
};

Them things there are nano Seconds. The resolution tells you nothing about how much it increments every time that it increments. Integer resolution, in this context, is just the smallest increment of time that can be represented; it says nothing about the precision, accuracy, increment or anything else about the time that is represented.

I can see this very easily on my RasPi's. NTP reads the clock to one nano Second, but it increments by much larger amounts. This shows the effect: https://pi4.rellim.com/day/#local_clock_time_offset_histogram

> 2) Even if the clock really did run at 1GHz, if it could be read in
> under 1ns it would still be "coarse". I'm not aware of any systems
> that can *currently* do that, but it's certainly not beyond the realm
> of possibility. Assuming that machines will never be faster than X
> is one of those not-future-proof assumptions like Y2K.

Not gonna worry about things that don't happen yet or soon.

> Note that "monotonic" does not necessarily mean unique.

I'll try to bring this back on topic again: NTP cares nothing about MONOTONIC. The word is a NOP in NTP land. No point bike-shedding it.

> > > The proper way to derive unique values
> > > from a clock is to wrap it with something that fudges *its*
> > > values as needed, without inflicting lies on the clock itself.
> > Sorta circular since NTP reads the system clock, applies fudge, then
> > adjusts the sysclock to match.
>
> Umm, I think you're assuming that "fudges" above means some kind of
> NTP time adjustment.

Yup, that thing that NTP does, whether or not we understand it, it clearly does it.

> I used it in the generic "fudge factor" sense,
> in this case meaning whatever adjustment is needed to ensure
> uniqueness.

Does not change my comment on your comment.

> Suppose one has:

You basically duplicated the Lamport rules, which is what NTP already does, but w/o the sys_fuzz thing that NTP does.

> > > Also note that in some contexts it's reasonable to extend the
> > > resolution of a "coarse" clock (without breaking "fine" clocks) by
> > > reading the clock in a loop until the value changes. This
> > > approach is completely neutered by a uniqueness kludge.
> >
> > I do not see how that helps NTP, just adds latency.
>
> Of course. But in *some contexts* it's useful, and it's broken if
> the OS insists on corrupting the
Re: sys_fuzz
On Tue, 24 Jan 2017, Gary E. Miller wrote:
> On Tue, 24 Jan 2017 15:22:20 -0800 (PST) Fred Wright wrote:
> > If one is dithering, the amount of dither should be based on the
> > clock's actual resolution, *not* the time required to read it. In a
> > sampled system, one would add dither equal to the quantization
> > interval, in order to produce results statistically similar to
> > sampling with infinite resolution. For time values, one would add
> > dither equal to the clock's counting period, to produce results
> > statistically similar to a clock running at infinite frequency.
>
> Possibly, but that is not how it works now. And would it be an
> improvement? Bring on the experiments!

I didn't say that it's worthwhile on modern systems; in fact I said exactly the opposite further down. But if one *is* going to dither, then the clock period is the correct amount. That's a peak-to-peak value, so if one is adding signed dither, then the magnitude should be half that.

> > > There is an additional worm in this can. Some OSes with crappy
> > > clocks bumped the clock by a tiny bit each time you read it so that
> > > all clock-reads returned different results and you could use it for
> > > making unique IDs.
> >
> > That's not uncommon, but it's a really bad idea. Demanding that a
> > clock always return unique values is an unwarranted extension of the
> > job description of a clock.
>
> Well then, you just said the current NTP implementation is a bad idea.

No, what I said is that it's a bad idea for an *OS time function* to corrupt the value in the name of uniqueness. That's what Hal was talking about.

> In practice, with nano Second resolution clocks doing CLOCK_MONOTONIC
> is not hard.

Not necessarily (assuming you're actually talking about uniqueness rather than mere monotonicity), for a couple of reasons:

1) Most clock counters don't really run at 1GHz, so they don't really have nanosecond resolution (in spite of what clock_getres() may say).
2) Even if the clock really did run at 1GHz, if it could be read in under 1ns it would still be "coarse". I'm not aware of any systems that can *currently* do that, but it's certainly not beyond the realm of possibility. Assuming that machines will never be faster than X is one of those not-future-proof assumptions like Y2K.

Note that "monotonic" does not necessarily mean unique. Mathematically, it means that values are either nondecreasing or nonincreasing. In the context of time, only the former interpretation makes sense, but it doesn't prohibit repeated values. Uniqueness and monotonicity are orthogonal properties. Nothing in the POSIX spec says that CLOCK_MONOTONIC values are guaranteed to be unique. See: http://pubs.opengroup.org/onlinepubs/9699919799/

It doesn't really say much of anything, except that the epoch is arbitrary and that it isn't adjusted by clock_settime(). The absence of backward steps from the latter is where the monotonicity comes from.

> > The proper way to derive unique values
> > from a clock is to wrap it with something that fudges *its* values as
> > needed, without inflicting lies on the clock itself.
>
> Sorta circular since NTP reads the system clock, applies fudge, then
> adjusts the sysclock to match.

Umm, I think you're assuming that "fudges" above means some kind of NTP time adjustment. I used it in the generic "fudge factor" sense, in this case meaning whatever adjustment is needed to ensure uniqueness. Suppose one has:

clock_val_t get_time(void);

Then (ignoring thread safety) one could have something like:

clock_val_t get_unique_time(void)
{
    static clock_val_t last_time = 0;
    clock_val_t new_time = get_time();

    return new_time > last_time ? (last_time = new_time) : ++last_time;
}

The result is both unique and monotonic, and differs from the actual time by the minimum amount necessary to meet those conditions.
That code of course assumes that clock_val_t is an integer, and gets messier with multi-component time representations like "struct timespec".

> > Also note that in some contexts it's reasonable to extend the
> > resolution of a "coarse" clock (without breaking "fine" clocks) by
> > reading the clock in a loop until the value changes. This approach
> > is completely neutered by a uniqueness kludge.
>
> I do not see how that helps NTP, just adds latency.

Of course. But in *some contexts* it's useful, and it's broken if the OS insists on corrupting the time in the name of uniqueness.

> > The clock_getres() function is supposed to report the actual clock
> > resolution, which is what should determine the amount of dither, but
> > in practice it's rarely correctly implemented. E.g., in the Linux
> > cases I've tested, it ignores the hardware properties and just returns
> > 1ns.
>
> And it probably can not even determine the hardware properties.

It knows perfectly well what the actual (or at least nominal) oscillator
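For what it's worth, Fred's wrapper idea extends to two-component times along the lines he sketches in C. A hedged Python sketch of the "messier" struct-timespec case; the fake clock values and the carry at 10^9 nanoseconds are mine, not code from any OS or from NTP:

```python
def make_unique_clock(read_clock):
    """Wrap a (sec, nsec) clock so successive reads are strictly increasing,
    without touching the underlying clock itself."""
    last = [(-1, 0)]  # mutable cell holding the last value handed out

    def read_unique():
        sec, nsec = read_clock()
        if (sec, nsec) <= last[0]:
            # Repeated or backward value: hand out last + 1 ns instead.
            sec, nsec = last[0]
            nsec += 1
            if nsec >= 1_000_000_000:   # carry into the seconds field
                sec, nsec = sec + 1, 0
        last[0] = (sec, nsec)
        return sec, nsec

    return read_unique

# A fake coarse clock that repeats values, as an old tick-driven clock would.
fake_values = [(100, 0), (100, 0), (100, 0), (100, 500), (100, 500), (101, 0)]
it = iter(fake_values)
unique_read = make_unique_clock(lambda: next(it))
samples = [unique_read() for _ in range(len(fake_values))]
```

As in Fred's integer version, the output differs from the real clock by the minimum amount needed for uniqueness, and the lie lives in the wrapper rather than in the clock.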
Re: sys_fuzz
Hal Murray:
> g...@rellim.com said:
> > Makes no sense to me. Adding randomness helps when you have hysteresis,
> > stiction, friction, lash and some other things, but none of those apply to
> > NTP.
>
> The NTP case is roughly stiction. Remember the age of this code. It was
> working long before CPUs had instructions to read a cycle counter. Back
> then, the system clock was updated on the scheduler interrupt. There was no
> interpolation between ticks.

*blink*

I think I just achieved enlightenment. Gary, Hal, please review the following carefully to ensure that I haven't updated my beliefs wrongly.

Stiction in this context = "adjacent clock reads could get back the same value", is that right? Suddenly a whole bunch of things, like the implications of only updating the clock on a scheduler interrupt, make sense. And now I think I get (a) why Mills fuzzed the clock, and (b) why the code is so careful about checking for clock stepback.

If your working assumption is that the clock will only update on a scheduler tick, and your PLL correction requires you to have a monotonically increasing clock, stiction is *bad*. You have no choice but to fuzz the clock, and the probabilistically least risky way to do it is by around half the tick interval, but because random is random you cannot guarantee when you have to do it twice between ticks that your second pseudosample will be greater than your first. You need what the code calls a "Lamport violation" check to throw out bad pseudosamples.

Therefore I *deduce* that the PLL correction (the one NTP does, not the in-kernel one Hal tells us is associated with PPS) requires a monotonically increasing clock. It's the simplest explanation for the way libntp/systime.c works, and it explains *everything* that has puzzled me about that code.

I love this project - it makes me learn new things.

> Mark/Eric: Can you guarantee that we will never run on a system with a crappy
> clock? In this context, crappy means one that takes big steps.

OK, now that I think I understand this issue I'm going to say "Yes, we can assume this".

All x86 machines back to the Pentium (1993) have a hardware cycle counter; it's called the TSC. As an interesting detail, this was a 64-bit register even when the primary word size was 32 bits.

All ARM processors back to the ARM6 (1992) have one as well. A little web searching finds clear indications of cycle counters on the UltraSparc (SPARC V9), Alpha, MIPS, PowerPC, IA64 and PA-RISC.

I also hunted for information on dedicated smartphone processors. I found clear indication of a cycle counter on the Qualcomm Snapdragon and clouded ones for Apple A-series processors. The Nvidia Tegra, MediaTek, HiSilicon and Samsung Exynos chips are all recent ARM variants and can therefore be assumed to have an ARM %tick register.

Reading between the lines, it looks to me like this hardware feature became ubiquitous in the early 1990s and that one of the drivers was hardware-assisted crypto. It is therefore *highly* unlikely to be omitted from any new design, even in low-power embedded. And if you have a TSC, sampling it is a trivial handful of assembler instructions.

> I think that all Gary's test proved is that his system doesn't have a crappy
> clock.

Yes. Agreed.

> If we are serious about getting rid of that code, I'll put investigating that
> area higher on my list. I think we have more important things to do.

I think I can take it from here.
-- 
Eric S. Raymond http://www.catb.org/~esr/
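Eric's deduction (fuzz by up to half a tick, then reject "Lamport violations") can be illustrated with a toy model. This Python sketch is not the libntp/systime.c code, just one reading of it; the tick size and the fake coarse clock are invented for the illustration:

```python
import random

TICK = 10_000_000  # ns; pretend scheduler-tick granularity (invented)

def fuzzed_read(read_clock, state, rng=random):
    """Read a coarse clock, add fuzz below half a tick, and apply the
    'Lamport violation' rule: never hand out a non-increasing value."""
    raw = read_clock()
    candidate = raw + rng.randrange(TICK // 2)
    if candidate <= state["last"]:
        # Second pseudosample between ticks came out <= the first: bump it.
        candidate = state["last"] + 1
    state["last"] = candidate
    return candidate

random.seed(1)
state = {"last": -1}
# Fake coarse clock: the raw value only advances every fourth read.
raw_times = [(i // 4) * TICK for i in range(40)]
reads = iter(raw_times)
out = [fuzzed_read(lambda: next(reads), state) for _ in range(40)]
```

Between ticks the raw clock repeats, the fuzz alone cannot guarantee ordering, and the violation check is what rescues monotonicity, which is exactly the structure Eric describes.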
Re: sys_fuzz
Yo Fred!

On Tue, 24 Jan 2017 15:22:20 -0800 (PST) Fred Wright wrote:
> On Tue, 24 Jan 2017, Gary E. Miller wrote:
> > Last week we had a discussion on sys_fuzz and the value of adding
> > random noise to some measurements. The code defines sys_fuzz as:
> >
> > "* The sys_fuzz variable measures the minimum time to read the system
> > * clock, regardless of its precision."
> >
> > Randomness of half the sys_fuzz is then added to some values, like
> > this:
> >
> > fuzz = ntp_random() * 2. / FRAC * sys_fuzz
> >
> > Makes no sense to me. Adding randomness helps when you have
> > hysteresis, stiction, friction, lash and some other things, but
> > none of those apply to NTP.
>
> Basing it on the time to *read* the clock definitely makes no sense,
> although I suspect one would have to dig back fairly far in the
> history to determine the source of that confusion.

Just look at that commit, and compare to the bug report. The confusion is obvious there...

> If one is dithering, the amount of dither should be based on the
> clock's actual resolution, *not* the time required to read it. In a
> sampled system, one would add dither equal to the quantization
> interval, in order to produce results statistically similar to
> sampling with infinite resolution. For time values, one would add
> dither equal to the clock's counting period, to produce results
> statistically similar to a clock running at infinite frequency.

Possibly, but that is not how it works now. And would it be an improvement? Bring on the experiments!

> > There is an additional worm in this can. Some OSes with crappy
> > clocks bumped the clock by a tiny bit each time you read it so that
> > all clock-reads returned different results and you could use it for
> > making unique IDs.
>
> That's not uncommon, but it's a really bad idea. Demanding that a
> clock always return unique values is an unwarranted extension of the
> job description of a clock.
Well then, you just said the current NTP implementation is a bad idea. In practice, with nano Second resolution clocks doing CLOCK_MONOTONIC is not hard.

> The proper way to derive unique values
> from a clock is to wrap it with something that fudges *its* values as
> needed, without inflicting lies on the clock itself.

Sorta circular since NTP reads the system clock, applies fudge, then adjusts the sysclock to match.

> Also note that in some contexts it's reasonable to extend the
> resolution of a "coarse" clock (without breaking "fine" clocks) by
> reading the clock in a loop until the value changes. This approach
> is completely neutered by a uniqueness kludge.

I do not see how that helps NTP, just adds latency.

> 1) If it's a "coarse" clock, then dithering destroys monotonicity.

Did you read the bug report? That is exactly what was happening, and worse. Thus the fix.

> 2) Determining the proper amount of dither isn't necessarily easy.

Yup.

> The clock_getres() function is supposed to report the actual clock
> resolution, which is what should determine the amount of dither, but
> in practice it's rarely correctly implemented. E.g., in the Linux
> cases I've tested, it ignores the hardware properties and just returns
> 1ns.

And it probably can not even determine the hardware properties.

> I'm not convinced that sub-microsecond dithering is worthwhile,
> anyway. If the dithering code is retained at all, it might make sense
> to have a configure test that reads clock_getres(), and only enables
> dithering support if the result is more than a microsecond. That
> test would be unaffected by the aforementioned lies in
> clock_getres(). Though there'd need to be a way to force dithering
> on for testing, since it's unlikely that any test platforms would use
> it naturally. And those sorts of configure tests are problematic for
> cross-building.

Even the clock_getres() man page warns that the returned values may be "bogus". Next...
> BTW, if the only use for randomness is for computational dithering,
> and not for security, then there's no need for crypto-quality
> randomness.

So far that looks like the case, plus adding a nonce in the LSBs of timestamps. But that will not last; the autokey replacement should be here 'soon'. This year or next.

> In that case, why not just read /dev/urandom directly
> and dispense with the whole libsodium mess?

All ntpd uses libsodium for is to read /dev/urandom, and, for the many cases where /dev/urandom does not exist or something better exists, some other way. I'd rather see it go too, but I see no easy path to get there.

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703 g...@rellim.com Tel:+1 541 382 8588

Veritas liberabit vos. -- Quid est veritas?
"If you can’t measure it, you can’t improve it." - Lord Kelvin
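The quoted formula's behavior is easy to sanity-check. A Python sketch, assuming FRAC is 2^32 and that ntp_random() returns 31-bit values (both assumptions; check libntp for the real definitions). Under those assumptions the fuzz lands uniformly in [0, sys_fuzz):

```python
import random

FRAC = 2 ** 32          # assumption: NTP's fixed-point fraction scale
RANDOM_MAX = 2 ** 31    # assumption: ntp_random() yields 31-bit values

def ntp_random(rng=random):
    """Stand-in for NTP's ntp_random(): a uniform 31-bit integer."""
    return rng.randrange(RANDOM_MAX)

def fuzz_amount(sys_fuzz, rng=random):
    """The quoted formula: fuzz = ntp_random() * 2. / FRAC * sys_fuzz."""
    return ntp_random(rng) * 2.0 / FRAC * sys_fuzz

random.seed(7)
sys_fuzz = 1e-6  # say, one microsecond to read the clock twice
samples = [fuzz_amount(sys_fuzz) for _ in range(10000)]
```

So despite the "half the sys_fuzz" description in the comment thread, under these assumptions the formula spans the full [0, sys_fuzz) range with a mean of half sys_fuzz.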
Re: sys_fuzz
On Tue, 24 Jan 2017, Gary E. Miller wrote:
> Last week we had a discussion on sys_fuzz and the value of adding
> random noise to some measurements. The code defines sys_fuzz as:
>
> "* The sys_fuzz variable measures the minimum time to read the system
> * clock, regardless of its precision."
>
> Randomness of half the sys_fuzz is then added to some values, like this:
>
> fuzz = ntp_random() * 2. / FRAC * sys_fuzz
>
> Makes no sense to me. Adding randomness helps when you have hysteresis,
> stiction, friction, lash and some other things, but none of those apply
> to NTP.

Basing it on the time to *read* the clock definitely makes no sense, although I suspect one would have to dig back fairly far in the history to determine the source of that confusion. If one is dithering, the amount of dither should be based on the clock's actual resolution, *not* the time required to read it. In a sampled system, one would add dither equal to the quantization interval, in order to produce results statistically similar to sampling with infinite resolution. For time values, one would add dither equal to the clock's counting period, to produce results statistically similar to a clock running at infinite frequency.

On Tue, 24 Jan 2017, Hal Murray wrote:
> The NTP case is roughly stiction. Remember the age of this code. It was
> working long before CPUs had instructions to read a cycle counter. Back
> then, the system clock was updated on the scheduler interrupt. There was no
> interpolation between ticks.

Indeed. The interrupt was often derived from the power line, making the clock resolution 16.7ms or 20ms. With such crummy resolution, applying some "whitening" looks attractive.

> Mark/Eric: Can you guarantee that we will never run on a system with a crappy
> clock? In this context, crappy means one that takes big steps.

There are two different time intervals involved - the interval between successive time values, and the time required to read the clock.
I'd use the term "coarse" to describe a clock where the former is larger than the latter, such that it's possible to read the same value more than once. If you mean "big steps" in the absolute sense, then for some meaning of "big", the term "crappy" is warranted. :-) But note that a clock can be "coarse" without being "crappy". For example, a clock running at 10MHz isn't particularly "crappy", but if it can be read in 50ns, then it's still "coarse". > There is an additional worm in this can. Some OSes with crappy clocks bumped > the clock by a tiny bit each time you read it so that all clock-reads > returned different results and you could use it for making unique IDs. That's not uncommon, but it's a really bad idea. Demanding that a clock always return unique values is an unwarranted extension of the job description of a clock. The proper way to derive unique values from a clock is to wrap it with something that fudges *its* values as needed, without inflicting lies on the clock itself. Any clock classified as "coarse" by the above definition is corrupted by a uniqueness requirement, whether "crappy" or not. Also note that in some contexts it's reasonable to extend the resolution of a "coarse" clock (without breaking "fine" clocks) by reading the clock in a loop until the value changes. This approach is completely neutered by a uniqueness kludge. Getting back to the original issue, if dithering is warranted, then there are a couple of pitfalls: 1) If it's a "coarse" clock, then dithering destroys monotonicity. In *some* (mainly statistical) contexts, non-monotonic time values may be perfectly OK, but in any context involving intervals they can be disastrous. So one would probably need to keep both dithered and undithered time values. 2) Determining the proper amount of dither isn't necessarily easy. 
The clock_getres() function is supposed to report the actual clock resolution, which is what should determine the amount of dither, but in practice it's rarely correctly implemented. E.g., in the Linux cases I've tested, it ignores the hardware properties and just returns 1ns.

I'm not convinced that sub-microsecond dithering is worthwhile, anyway. If the dithering code is retained at all, it might make sense to have a configure test that reads clock_getres(), and only enables dithering support if the result is more than a microsecond. That test would be unaffected by the aforementioned lies in clock_getres(). Though there'd need to be a way to force dithering on for testing, since it's unlikely that any test platforms would use it naturally. And those sorts of configure tests are problematic for cross-building.

BTW, if the only use for randomness is for computational dithering, and not for security, then there's no need for crypto-quality randomness. In that case, why not just read /dev/urandom directly and dispense with the whole libsodium mess?

Fred Wright
Re: sys_fuzz
Yo Hal!

On Tue, 24 Jan 2017 13:46:59 -0800 Hal Murray wrote:

> g...@rellim.com said:
> > Makes no sense to me. Adding randomness helps when you have
> > hysteresis, stiction, friction, lash and some other things, but
> > none of those apply to NTP.
>
> The NTP case is roughly stiction. Remember the age of this code. It
> was working long before CPUs had instructions to read a cycle
> counter. Back then, the system clock was updated on the scheduler
> interrupt. There was no interpolation between ticks.

You gotta squint real hard to see that as stiction, but it's not worth debating the proper word. So it may have mattered back then, but do we need to carry this legacy code?

> Mark/Eric: Can you guarantee that we will never run on a system with
> a crappy clock? In this context, crappy means one that takes big
> steps.

That has nothing to do with clock steps; this has to do with how fast the clock can be read.

> I think that all Gary's test proved is that his system doesn't have
> a crappy clock.

Yes, more testing required. You got ideas what to test next?

> There is an additional worm in this can. Some OSes with crappy
> clocks bumped the clock by a tiny bit each time you read it so that
> all clock-reads returned different results and you could use it for
> making unique IDs.

This is a POSIX requirement for CLOCK_MONOTONIC. Also unrelated to how fast the clock can be read.

> If we are serious about getting rid of that code, I'll put
> investigating that area higher on my list. I think we have more
> important things to do.

Yes, not high on the list, but so easy to test, and so long to get results, that it is worth thinking about. Anytime we can remove unneeded code and noise from ntpd it is good. I figure we could remove 100 LOC easy.

Some of these older systems, like the G5 Macintosh, may be a good test. Prolly should test in some VMs too.
In libntp/systime.c, I just make set_sys_fuzz() always set sys_fuzz to 0.0:

	{
+		fuzz_val = 0.0;		/* GEM */
		sys_fuzz = fuzz_val;

RGDS
GARY
---
Gary E. Miller
Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588

Veritas liberabit vos. -- Quid est veritas?
"If you can’t measure it, you can’t improve it." - Lord Kelvin

___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel
Re: sys_fuzz
g...@rellim.com said:
> Makes no sense to me. Adding randomness helps when you have hysteresis,
> stiction, friction, lash and some other things, but none of those apply
> to NTP.

The NTP case is roughly stiction. Remember the age of this code. It was working long before CPUs had instructions to read a cycle counter. Back then, the system clock was updated on the scheduler interrupt. There was no interpolation between ticks.

Mark/Eric: Can you guarantee that we will never run on a system with a crappy clock? In this context, crappy means one that takes big steps.

I think that all Gary's test proved is that his system doesn't have a crappy clock.

There is an additional worm in this can. Some OSes with crappy clocks bumped the clock by a tiny bit each time you read it so that all clock-reads returned different results and you could use it for making unique IDs.

If we are serious about getting rid of that code, I'll put investigating that area higher on my list. I think we have more important things to do.

--
These are my opinions. I hate spam.