> If I remember correctly, last time someone tried to persuade me to use
> Maurer's test, my problem with it was that it was too memory intensive
> and too CPU intensive to use in the kernel.  [please] explain to me
> Maurer's test and how to do it in English, and then try to persuade me
> that it's actually feasible to do it in the kernel.

hiya, ted --

i too considered building maurer's test into the kernel, when i
built a disk-noise trng as a pseudodev driver.  as you say, maurer's
test is only so effective as an rng test, and it doesn't really
belong in the kernel, anyway.  but, here's my effort to explain why
maurer's test works, and how you might use it to advantage.

recall that the test involves keeping a table of interarrival times
for all of the possible sample-values that you can get.  that is,
for each possible value, you record the value's last-seen
arrival-time T.  when you get a new value at time T', you look up
its last arrival-time T, take the difference T'-T, and tally the
base-2 log of that difference.  when you're done with a large batch
of numbers, you get an approximate bit-rate, amazingly enough.

first, i'll describe _why_ maurer's test should work at all, then
i'll discuss how & why it might work in dev/random.  finally, i'll
mention some problems.

why it works:  when a sample value appears again after an
interarrival time dT, dT's bits _are_ the value's noise-content.
frequent appearances bring us little noise each time, while
infrequent appearances bring lots of noise, but only rarely.
log2(dT) is just the bit-length of the interarrival time, and so is
a reasonable way to measure dT's noise-content.  maurer's test
yields a weighted average for these bit-lengths, since the test
tallies the commonest sample-values' IAT bit-lengths more often.
so, maurer's test plausibly yields an average bit-rate for the
whole data-stream.
the kernel:  if you look at only the low-order 8 bits of each raw
sample, then your interarrival-times can be 16 bits long, so your
table of interarrival-times is only 512 bytes long.  similarly, you
can do the logarithms without floating-point arithmetic: use masking
or shifting to get the integer part of the log, and use a short
lookup-table to translate the four most-significant bits of the IAT
into the four most-significant bits of the mantissa.  you don't need
full-accuracy logarithms at all.  the log lookup-table might occupy
16 bytes for each integer value of the log, or just over 200 bytes.
i think you could get good-enough results without the log lookup
table, by just counting the IAT's leading zeroes to get the log's
integer part.

why bother?:  my chief concern is not to estimate the entropy-pool
exactly, but to sensitively detect the failure of (some of) your
noise-sources.  this entropy-estimate can do that.  getting an
accurate fuel-gauge is just a happy side-effect.  this 8-bit test
will not detect subtle failures, though, like correlations between
your samples; it's _only_ a runtime sanity check, not a substitute
for extensive bench testing.

problems:  maurer's test _must_ be applied to the raw noise-samples,
not to the hashed data, because entropy-measures like maurer's are
easily fooled by pseudo-random mixing.  also, realize that the
test's reported bit-rate will jitter, so that only a consistent run
of low bit-rates is alarming.  finally, abnormally high bit-rates
may indicate RNG failure, too, since you know that the raw data
should not be uniformly distributed.

				- don davis, boston -
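p.s.  here's a sketch of the shift-and-lookup logarithm, in case it
helps.  the names and table values are illustrative, not from any
real driver; it returns log2(dt) in sixteenths of a bit, finding the
integer part from the position of dt's leading 1-bit and looking up
the mantissa from the next four bits:

```c
#include <stdint.h>

/* log2tab[m] ~= 16 * log2(1 + m/16), rounded to nearest */
static const uint8_t log2tab[16] = {
    0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 15
};

/* fixed-point log2 of a 16-bit interarrival time, no floats */
unsigned ilog2_16ths(uint16_t dt)
{
    unsigned msb = 0, frac;
    uint16_t t = dt;

    if (dt == 0)
        return 0;            /* callers should never pass 0 */
    while (t >>= 1)          /* position of the leading 1-bit */
        msb++;
    /* the four bits just below the leading 1, zero-padded if dt
     * is too short to supply them all */
    frac = (msb >= 4) ? (dt >> (msb - 4)) & 0xf
                      : (unsigned)(dt << (4 - msb)) & 0xf;
    return (msb << 4) | log2tab[frac];
}
```

for dt = 256 this returns 128, i.e. exactly 8.0 bits, and for small
dt it stays within about a sixteenth of a bit of the true log --
plenty of accuracy for a failure-detecting fuel-gauge.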