On 6/5/16, Eric S. Raymond <e...@thyrsus.com> wrote:
> Unless you set up behavioral replicability (that is, an environment in
> which a known sequence of clock readings, I/O events, and other
> syscalls leads to another known sequence, or at least correct
> recognition features of same, like ntpq -p showing what you expect) you
> don't have testing - because you don't know what output features
> discriminate between success and failure of the test.
So weaken your notion of replicability from bit-for-bit-consistent results to the statistical behavior of a linear time-invariant system, and report test results as p-values rather than pass/fail.

If you're manually testing a client talking to a server 10ms away, and after several queries you're still seeing deltas of 20ms, you know something is horribly broken. If all your deltas are inside 2µs, that's damned suspicious too. The intuitions you're applying here can be made rigorous, and your testing made replicable, by collecting statistics on delta values from a believed-good baseline and then applying a Kolmogorov-Smirnov (KS) test to see whether the version under test follows the same distribution.

You can automate testing like this entirely on real hardware, without having to spoof any inputs at all.
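A minimal sketch of that baseline-vs-candidate comparison, in pure Python. In practice you would likely reach for scipy.stats.ks_2samp; the hand-rolled statistic, the 0.05 critical value, and the synthetic delta samples below are illustrative assumptions, not part of any NTPsec test harness:

```python
import random

def ks_statistic(a, b):
    """Two-sample KS statistic: the maximum vertical distance between
    the empirical CDFs of samples a and b. Ties are handled naively,
    which is fine for continuous timing data."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        if a[i] <= b[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d

def ks_reject(baseline, candidate, c_alpha=1.358):
    """Reject 'same distribution' when the KS statistic exceeds the
    large-sample critical value c(alpha) * sqrt((na+nb)/(na*nb)).
    c_alpha=1.358 corresponds to alpha = 0.05."""
    na, nb = len(baseline), len(candidate)
    threshold = c_alpha * ((na + nb) / (na * nb)) ** 0.5
    return ks_statistic(baseline, candidate) > threshold

if __name__ == "__main__":
    # Stand-ins for measured offset deltas (seconds) against a server
    # ~10ms away: a believed-good baseline build vs. a candidate build.
    random.seed(42)
    baseline = [random.gauss(0.010, 0.002) for _ in range(300)]
    candidate = [random.gauss(0.010, 0.002) for _ in range(300)]
    broken = [random.gauss(0.020, 0.002) for _ in range(300)]

    print("candidate rejected:", ks_reject(baseline, candidate))
    print("broken rejected:   ", ks_reject(baseline, broken))
```

A run of this harness against real ntpq/ntpdig output would replace the synthetic samples with logged offset values; the decision logic stays the same.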