> If I remember correctly, last time someone tried to persuade me to use
> Maurer's test, my problem with it was that it was too memory intensive
> and too CPU intensive to use in the kernel.  [please] explain to me
> Maurer's test and how to do it in English, and then try to persuade me
> that it's actually feasible to do it in the kernel.

hiya, ted --

i too considered building maurer's test into the kernel, when i
built a disk-noise trng as a pseudodev driver.  as you say, maurer's
test is only so effective as an rng test, and it doesn't really
belong in the kernel, anyway.  but, here's my effort to explain why
maurer's test works, and how you might use it to advantage.

recall that the test involves keeping a table of interarrival times
for all of the possible sample-values that you can get.  that is,
for each possible value, you record the value's last-seen
arrival-time T.  when you get a new value at time T', you look up
its last arrival-time T, take the difference T'-T, and tally the
base-2 log of that difference.  when you're done with a large batch
of numbers, you get an approximate bit-rate, amazingly enough.

first, i'll describe _why_ maurer's test should work at all, then
i'll discuss how & why it might work in dev/random.  finally, i'll
mention some problems.

why it works:  when a sample value appears again after an
interarrival time dT, dT's bits _are_ the value's noise-content.
frequent appearances bring us little noise each time, while
infrequent appearances bring lots of noise, but only rarely.
log2(dT) is just the bit-length of the interarrival time, and so is
a reasonable way to measure dT's noise-content.  maurer's test
yields a weighted average for these bit-lengths, since the test
tallies the commonest sample-values' IAT bit-lengths more often.
so, maurer's test plausibly yields an average bit-rate for the
whole data-stream.
the kernel:  if you look at only the low-order 8 bits of each raw
sample, then your interarrival-times can be 16 bits long, so your
table of interarrival-times is only 512 bytes long.  similarly, you
can do the logarithms without floating-point arithmetic: use masking
or shifting to get the integer part of the log, and use a short
lookup-table to translate the four most-significant bits of the IAT
into the four most-significant bits of the mantissa.  you don't need
full-accuracy logarithms at all.  the log lookup-table might occupy
16 bytes for each integer value of the log, or just over 200 bytes.
i think you could get good-enough results without the log lookup
table, by just counting the IAT's leading zeroes to get the log's
integer part.

why bother?:  my chief concern is not to estimate the entropy-pool
exactly, but to sensitively detect the failure of (some of) your
noise-sources.  this entropy-estimate can do that.  getting an
accurate fuel-gauge is just a happy side-effect.  this 8-bit test
will not detect subtle failures, though, like correlations between
your samples; it's _only_ a runtime sanity check, not a substitute
for extensive bench testing.

problems:  maurer's test _must_ be applied to the raw noise-samples,
not to the hashed data, because entropy-measures like maurer's are
easily fooled by pseudo-random mixing.  also, realize that the
test's reported bit-rate will jitter, so that only a consistent run
of low bit-rates is alarming.  finally, abnormally high bit-rates
may indicate RNG failure, too, since you know that the raw data
should not be uniformly distributed.

				- don davis, boston -
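p.s.  here's a sketch of the shift-and-lookup logarithm, in case it
helps.  the names and table values are illustrative, not from any
real driver; it returns log2(dt) in sixteenths of a bit, finding the
integer part from the position of dt's leading 1-bit and looking up
the mantissa from the next four bits:

```c
#include <stdint.h>

/* log2tab[m] ~= 16 * log2(1 + m/16), rounded to nearest */
static const uint8_t log2tab[16] = {
    0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 15
};

/* fixed-point log2 of a 16-bit interarrival time, no floats */
unsigned ilog2_16ths(uint16_t dt)
{
    unsigned msb = 0, frac;
    uint16_t t = dt;

    if (dt == 0)
        return 0;            /* callers should never pass 0 */
    while (t >>= 1)          /* position of the leading 1-bit */
        msb++;
    /* the four bits just below the leading 1, zero-padded if dt
     * is too short to supply them all */
    frac = (msb >= 4) ? (dt >> (msb - 4)) & 0xf
                      : (unsigned)(dt << (4 - msb)) & 0xf;
    return (msb << 4) | log2tab[frac];
}
```

for dt = 256 this returns 128, i.e. exactly 8.0 bits, and for small
dt it stays within about a sixteenth of a bit of the true log --
plenty of accuracy for a failure-detecting fuel-gauge.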