Re: rnd entropy estimate running low?

2017-01-31 Thread Taylor R Campbell
> Date: Tue, 31 Jan 2017 12:00:29 -0500
> From: Thor Lancelot Simon 
> 
> Maybe we should alert in a more sophisticated way -- monitor the failure
> rate and alert only if it is significantly above the expected rate.
> 
> I might even remember enough statistics to do that.

For production, if you believe rngtest is correctly implemented to
have false rejection rate 3/10000 on the null hypothesis of uniform
random source data (I assume this means an average of three failures
in every ten thousand trials, where each trial tests a 20000-bit
block of data as specified in src/sys/sys/rngtest.h?), you might as
well just tweak it so it has a much lower false rejection rate.
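
For reference, my reading of the FIPS 140-2 monobit test -- the one
behind the "monobit test FAILURE" lines elsewhere in this thread --
is: count the ones in each 20000-bit block and accept iff the count
lies strictly between 9725 and 10275.  A minimal sketch, paraphrasing
the spec rather than copying sys/rngtest.h:

	#include <stdint.h>

	/*
	 * FIPS 140-2 monobit test, paraphrased: count the ones in a
	 * 20000-bit (2500-byte) block; accept iff 9725 < ones < 10275.
	 */
	int
	monobit_ok(const uint8_t block[2500])
	{
		unsigned ones = 0;

		for (unsigned i = 0; i < 2500; i++)
			for (uint8_t b = block[i]; b != 0; b &= b - 1)
				ones++;		/* clear lowest set bit */
		return ones > 9725 && ones < 10275;
	}

The 9709-ones and 10300-ones failures reported in this thread are at
least consistent with those bounds.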

However, if you are not sure rngtest as implemented has the intended
false rejection rate alpha = 3/10000, and you want to test it
empirically, you could run it on what you know to be a uniform random
source and stochastically test the probabilistic program that rngtest
is, along the lines of

Alexey Radul, `On Testing Probabilistic Programs', Alexey Radul's
blog, April 29, 2016.
http://alexey.radul.name/ideas/2016/on-testing-probabilistic-programs/

A test of the null hypothesis alpha <= 3/10000 that has false
rejection rate 5% and true rejection rate 80% under the alternative
hypothesis that alpha >= 4/10000 (typical choices for statistical
significance and statistical power in undergrad statistics and
run-of-the-mill scientific publications) requires 213 294 trials,
according to Alexey's script there.

Depending on how fast rngtest runs, that may not be terrible for
automatic tests -- my laptop's CPU can generate enough source data
with software AES-CTR in 3 seconds -- except that if we put it into
the atf tests, on average one in twenty runs would spuriously fail,
which means we'd see spurious failures every day or two.

So we really want the false rejection rate to be closer to .001% --
one in a hundred thousand, or about three in a century at our current
rate of releng atf runs -- which requires 885 968 trials.  If we want
true rejection rate 80% under the stronger alternative hypothesis that
alpha >= 3.1/10000, it requires 72 282 164 trials, which is quite
excessive on my laptop's CPU -- 10-15 minutes just to generate the
source data -- and is doubtless totally unacceptable on slower
hardware.
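
If you just want a ballpark check on those trial counts without
running Alexey's script, the usual normal-approximation sample-size
formula for a one-sided binomial test lands in the same neighbourhood.
A sketch -- hard-coded z-values, and only an approximation, so it will
not reproduce the exact figures above:

	#include <math.h>
	#include <stdio.h>

	/*
	 * Approximate number of trials to test H0: alpha <= p0 against
	 * H1: alpha >= p1, by the normal approximation to the binomial;
	 * z_a and z_b are the normal quantiles for the significance
	 * level and the power.
	 */
	static double
	ntrials(double p0, double p1, double z_a, double z_b)
	{
		double num = z_a * sqrt(p0 * (1 - p0)) +
		    z_b * sqrt(p1 * (1 - p1));

		return num * num / ((p1 - p0) * (p1 - p0));
	}

	int
	main(void)
	{
		/* one-sided 5% significance, 80% power */
		printf("%.0f\n", ntrials(3e-4, 4e-4, 1.6449, 0.8416));
		/* one-sided .001% significance, 80% power */
		printf("%.0f\n", ntrials(3e-4, 4e-4, 4.2649, 0.8416));
		/* same, against the stronger alternative 3.1/10000 */
		printf("%.0f\n", ntrials(3e-4, 3.1e-4, 4.2649, 0.8416));
		return 0;
	}

which prints roughly 205000, 822000, and 79000000 -- within a modest
factor of the exact counts in each case.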


Re: rnd entropy estimate running low?

2017-01-31 Thread Martin Husemann
On Tue, Jan 31, 2017 at 05:54:37PM +0100, Martin Husemann wrote:
> On Tue, Jan 31, 2017 at 11:45:55AM -0500, Thor Lancelot Simon wrote:
> > The only time we've ever really dug into it, I believe, the user decided
> > the failures were right around the expected failure rate.  Can you help
> > gather more data?
> 
> Good point, and I am not sure whether this covers my case as well -
> I will have a look and start gathering better data.

Hmm, for a simple start:

Jan 31 16:00:16 night-owl /netbsd: Kernel RNG "7517 14 4" monobit test FAILURE: 9709 ones
Jan 31 16:00:16 night-owl /netbsd: cprng 7517 14 4: failed statistical RNG test
Jan 31 20:20:12 night-owl /netbsd: Kernel RNG "11702 2 8" runs test FAILURE: too many runs of 4 0s (395 >= 384)
Jan 31 20:20:12 night-owl /netbsd: cprng 11702 2 8: failed statistical RNG test
Jan 31 20:20:25 night-owl /netbsd: Kernel RNG "16549 4 9" runs test FAILURE: too few runs of 2 0s (1107 <= 1114)
Jan 31 20:20:25 night-owl /netbsd: cprng 16549 4 9: failed statistical RNG test
Jan 31 21:21:05 night-owl /netbsd: Kernel RNG "7429 2 9" poker test failure: parameter X = 2.8640
Jan 31 21:21:05 night-owl /netbsd: cprng 7429 2 9: failed statistical RNG test
Jan 31 21:21:47 night-owl /netbsd: Kernel RNG "17166 149 10" long run test FAILURE: Run of 26 0s found
Jan 31 21:21:47 night-owl /netbsd: cprng 17166 149 10: failed statistical RNG test

This is my amd64 notebook while doing a few ssh sessions and a
pkg_chk rebuild.  Nothing involved really (AFAIU) needs serious
amounts of entropy.  Is there a simple way to get usage statistics?

Martin


Re: rnd entropy estimate running low?

2017-01-31 Thread Thor Lancelot Simon
On Tue, Jan 31, 2017 at 05:54:37PM +0100, Martin Husemann wrote:
> On Tue, Jan 31, 2017 at 11:45:55AM -0500, Thor Lancelot Simon wrote:
> > The only time we've ever really dug into it, I believe, the user decided
> > the failures were right around the expected failure rate.  Can you help
> > gather more data?
> 
> Good point, and I am not sure whether this covers my case as well -
> I will have a look and start gathering better data.

Maybe we should alert in a more sophisticated way -- monitor the failure
rate and alert only if it is significantly above the expected rate.

I might even remember enough statistics to do that.

-- 
 Thor Lancelot Simon  t...@panix.com

Ring the bells that still can ring.


Re: rnd entropy estimate running low?

2017-01-31 Thread Taylor R Campbell
> Date: Tue, 31 Jan 2017 16:55:38 +
> From: Taylor R Campbell 
> 
> This is roughly to be expected from any stochastic test that has
> nonzero false positive rate.  I have not computed exactly what the
> false rejection rate is under the null hypothesis of uniform random
> bits for these tests.  Someone^TM should do that!
> 
> (These are all classical frequentist hypothesis tests, mostly of
> elementary chi^2, Binomial, and similar statistics on streams of
> ones and zeros.  If anyone wants a little probability theory and
> statistics exercise, I'd be happy to point you in the right
> direction for how to do this.)

Well, apparently Thor already did this and came up with 3 false
rejections for every 10000... bits? 32-bit words? or something, so
never mind!


Re: rnd entropy estimate running low?

2017-01-31 Thread Taylor R Campbell
> Date: Tue, 31 Jan 2017 17:16:33 +0100 (CET)
> From: Havard Eidnes 
> 
> rnd: WARNING! initial entropy low (0).
> rnd: entropy estimate 0 bits
> rnd: asking source callout for 512 bytes
> rnd: system-power attached as an entropy source (collecting)
> mainbus0 (root)
> cpu0 at mainbus0 core 0: 1536 MHz Cortex-A5 r0p1 (Cortex V7A core)
> ...
> 
> I'm assuming this is because it happens too early: the rng
> device hasn't been detected yet at that point in the boot
> process, there's no file system accessible to re-initialize the
> kernel rng from at this stage, and the boot loader doesn't have
> a way to work around this.

The boot loader on x86, at least, can read a seed from the file system
(/var/db/entropy-file) before the kernel even starts, and that should
be fed in quite early.
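
For example -- assuming the stock x86 layout -- a line like

	rndseed /var/db/entropy-file

in boot.cfg(5) should be what arranges for the boot loader to load
the seed and pass it to the kernel.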

> (This is more a platform-specific problem, I think, and
> tangential to what I discussed initially.)

Right, but it's an important one!  It is only in the system
engineering that you can get the entropy initialization correct -- no
amount of software can locally massage the inputs it has into a
high-entropy state, if the inputs are no good.  So for this platform
we should try to see when the HWRNG device attaches and whether
anything needs to use entropy before then.

> OK, I'll buy the crypto argument at face value.  However, our code
> still behaves differently depending on whether the entropy estimate
> is able to "satisfy" the request being processed or not.  So under
> this description that is also a holdover from older versions of this
> code?

Yes.  There are two useful functions for blocking on reads from
/dev/random:

1. Waiting for initial entropy after the system boots, which may
mean, e.g., waiting until the on-board HWRNG device has provided
enough data.

2. Exercising the blocking code paths in applications, which would
otherwise occur only sometimes at system boot.

My draft rewrite of the entropy pool, which I think you saw at the
devsummit at EuroBSDcon in Stockholm, changes the decision of when to
block: if not enough entropy is available, block until it is; if
enough entropy is available, and a coin toss comes up heads, block up
to a second.
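
In pseudo-C, the rule is roughly as follows -- the names here are
made up for illustration, with stubs standing in for the real
conditions, not the draft's actual API:

	#include <stdbool.h>
	#include <stdio.h>
	#include <stdlib.h>

	/* Hypothetical stand-ins for the real conditions. */
	static bool have_full_entropy(void) { return true; }
	static bool coin_toss(void) { return rand() & 1; }

	/* The blocking rule described above. */
	static void
	random_read(void)
	{
		if (!have_full_entropy())
			printf("block until enough entropy\n");	/* (1) */
		else if (coin_toss())
			printf("block up to one second\n");	/* (2) */
		else
			printf("return data immediately\n");
	}

	int
	main(void)
	{
		for (int i = 0; i < 8; i++)
			random_read();
		return 0;
	}

so applications get their blocking code paths exercised regularly,
not only at first boot.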

> It may be coincidental, but this box when it sits otherwise
> mostly idle and only does ntp for a long while sometimes logs
> 
> Kernel RNG "231 0 1" monobit test FAILURE: 10300 ones
> cprng 231 0 1: failed statistical RNG test
> ...
> 
> Admittedly, these are spread over a larger time period, and a
> couple of them were the result of provocation by dumping data
> from /dev/random with dd.

This is roughly to be expected from any stochastic test that has
nonzero false positive rate.  I have not computed exactly what the
false rejection rate is under the null hypothesis of uniform random
bits for these tests.  Someone^TM should do that!

(These are all classical frequentist hypothesis tests, mostly of
elementary chi^2, Binomial, and similar statistics on streams of
ones and zeros.  If anyone wants a little probability theory and
statistics exercise, I'd be happy to point you in the right
direction for how to do this.)

However, if it happens repeatedly over a short period of time, you
should be concerned that something is hosed in your kernel or HWRNG.


Re: rnd entropy estimate running low?

2017-01-31 Thread Martin Husemann
On Tue, Jan 31, 2017 at 11:45:55AM -0500, Thor Lancelot Simon wrote:
> The only time we've ever really dug into it, I believe, the user decided
> the failures were right around the expected failure rate.  Can you help
> gather more data?

Good point, and I am not sure whether this covers my case as well -
I will have a look and start gathering better data.

Martin


Re: rnd entropy estimate running low?

2017-01-31 Thread Thor Lancelot Simon
On Tue, Jan 31, 2017 at 05:40:01PM +0100, Martin Husemann wrote:
> On Tue, Jan 31, 2017 at 11:38:02AM -0500, Thor Lancelot Simon wrote:
> > The statistical failures later in system run might indicate a memory
> > integrity issue, a race condition of some kind, or just be the expected
> > roughly 3/10000 random occurrences.  Hard to say without more information.
> 
> I see a lot of statistical test failures on various hardware later
> in the system run. Others have stated the same, IIRC.

The only time we've ever really dug into it, I believe, the user decided
the failures were right around the expected failure rate.  Can you help
gather more data?

-- 
 Thor Lancelot Simon  t...@panix.com

Ring the bells that still can ring.


Re: rnd entropy estimate running low?

2017-01-31 Thread Martin Husemann
On Tue, Jan 31, 2017 at 11:38:02AM -0500, Thor Lancelot Simon wrote:
> The statistical failures later in system run might indicate a memory
> integrity issue, a race condition of some kind, or just be the expected
> roughly 3/10000 random occurrences.  Hard to say without more information.

I see a lot of statistical test failures on various hardware later
in the system run. Others have stated the same, IIRC.

Martin


Re: rnd entropy estimate running low?

2017-01-31 Thread Thor Lancelot Simon
On Tue, Jan 31, 2017 at 05:16:33PM +0100, Havard Eidnes wrote:
> >> Meanwhile the hardware random generator sits there unused.
> >
> > Does it sit there completely unused, or did it get used a little at
> > boot time?
> 
> It generated some bits at boot time, but apparently not early
> enough, because on each reboot the kernel log looks like this:

It looks like nothing's actually calling for bits except the start-up
statistical test (which itself creates demand) before the hardware RNG
attaches, so there shouldn't be a practical problem.  The question is,
could the hardware RNG attach earlier or the statistical test happen
later -- or doesn't it matter?

The statistical failures later in system run might indicate a memory
integrity issue, a race condition of some kind, or just be the expected
roughly 3/10000 random occurrences.  Hard to say without more information.

Thor



Re: rnd entropy estimate running low?

2017-01-31 Thread Havard Eidnes
>> Meanwhile the hardware random generator sits there unused.
>
> Does it sit there completely unused, or did it get used a little at
> boot time?

It generated some bits at boot time, but apparently not early
enough, because on each reboot the kernel log looks like this:

...
total memory = 1024 MB
avail memory = 1007 MB
sysctl_createv: sysctl_create(machine_arch) returned 17
rnd: callout attached as an entropy source (collecting)
rnd: initialised (4096) with counter
rnd: printf attached as an entropy source (collecting without estimation)
rnd: autoconf attached as an entropy source (collecting)
rnd: WARNING! initial entropy low (5).
rnd: starting statistical RNG test, entropy = 6.
rnd: statistical RNG test done, entropy = 6.
rnd: entropy estimate 0 bits
rnd: asking source callout for 512 bytes
rnd: WARNING! initial entropy low (0).
rnd: entropy estimate 0 bits
rnd: asking source callout for 512 bytes
rnd: system-power attached as an entropy source (collecting)
mainbus0 (root)
cpu0 at mainbus0 core 0: 1536 MHz Cortex-A5 r0p1 (Cortex V7A core)
...

I'm assuming this is because it happens too early: the rng
device hasn't been detected yet at that point in the boot
process, there's no file system accessible to re-initialize the
kernel rng from at this stage, and the boot loader doesn't have
a way to work around this.

(This is more a platform-specific problem, I think, and
tangential to what I discussed initially.)

>> I would have thought it would make more sense to keep the "bits
>> currently stored in pool" more "topped up", and that a re-fill
>> could usefully be done before the estimate crept down towards
>> zero?  Especially if you have a half-way decent hardware random
>> generator at hand?
>
> Actually, no.  One basic conceit of modern symmetric cryptography is
> that from a single small uniform random 256-bit secret, you can derive
> an arbitrarily large uniform random secret.  `Entropy depletion' does
> not really exist as a meaningful concept in modern cryptography.
>
> The entropy accounting that we currently do is a holdover from days of
> yore when the folklore supported it, but the natural information-
> theoretic interpretation of the folklore actually leads to worse
> attacks in practice -- see the rnd(4) man page for details.  So while
> we haven't gotten rid of the kooky accounting, it doesn't really mean
> anything to see the numbers go down.
>
> There is a limit to the output produced by, e.g., AES-CTR, arising
> from the PRP approximation to a PRF and the birthday paradox, and
> there are some US federal government standards (NIST SP800-90A, in
> particular) about PRNG constructions that Thor wanted to make it easy
> to follow, which is why we rekey cprng(9) after a relatively small
> amount of output -- but that happens much slower than the entropy
> accounting you're looking at, and is not reported to userland.

OK, I'll buy the crypto argument at face value.  However, our
code still behaves differently depending on whether the entropy
estimate is able to "satisfy" the request being processed or not.
So under this description that is also a holdover from older
versions of this code?

It may be coincidental, but this box when it sits otherwise
mostly idle and only does ntp for a long while sometimes logs

Kernel RNG "231 0 1" monobit test FAILURE: 10300 ones
cprng 231 0 1: failed statistical RNG test
...
Kernel RNG "15965 0 4" runs test FAILURE: too many runs of 4 1s (386 >= 384)
cprng 15965 0 4: failed statistical RNG test
...
Kernel RNG "27778 0 3" poker test failure: parameter X = 2.9280
cprng 27778 0 3: failed statistical RNG test
...
Kernel RNG "6647 0 3" poker test failure: parameter X = 47.2720
cprng 6647 0 3: failed statistical RNG test
...
Kernel RNG "24153 0 3" long run test FAILURE: Run of 29 0s found
cprng 24153 0 3: failed statistical RNG test
...
Kernel RNG "2551 0 4" poker test failure: parameter X = 47.60320
cprng 2551 0 4: failed statistical RNG test
...

Admittedly, these are spread over a larger time period, and a
couple of them were the result of provocation by dumping data
from /dev/random with dd.

Regards,

- Håvard


Re: rnd entropy estimate running low?

2017-01-31 Thread Taylor R Campbell
> Date: Thu, 12 Jan 2017 21:13:03 +0100 (CET)
> From: Havard Eidnes 
>
> Meanwhile the hardware random generator sits there unused.

Does it sit there completely unused, or did it get used a little at
boot time?  That's the most important time to use it; otherwise it
doesn't really matter, unless you somehow know an attacker has
witnessed the state of the kernel entropy pool but expect the
system to be otherwise uncompromised.

> I would have thought it would make more sense to keep the "bits
> currently stored in pool" more "topped up", and that a re-fill
> could usefully be done before the estimate crept down towards
> zero?  Especially if you have a half-way decent hardware random
> generator at hand?

Actually, no.  One basic conceit of modern symmetric cryptography is
that from a single small uniform random 256-bit secret, you can derive
an arbitrarily large uniform random secret.  `Entropy depletion' does
not really exist as a meaningful concept in modern cryptography.

The entropy accounting that we currently do is a holdover from days of
yore when the folklore supported it, but the natural information-
theoretic interpretation of the folklore actually leads to worse
attacks in practice -- see the rnd(4) man page for details.  So while
we haven't gotten rid of the kooky accounting, it doesn't really mean
anything to see the numbers go down.

There is a limit to the output produced by, e.g., AES-CTR, arising
from the PRP approximation to a PRF and the birthday paradox, and
there are some US federal government standards (NIST SP800-90A, in
particular) about PRNG constructions that Thor wanted to make it easy
to follow, which is why we rekey cprng(9) after a relatively small
amount of output -- but that happens much slower than the entropy
accounting you're looking at, and is not reported to userland.
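
(Quantitatively: the PRP/PRF switching lemma bounds a distinguisher's
advantage after q blocks of AES-CTR output under one key by
q(q-1)/2^129, since AES's block is 128 bits, so rekeying after, say,
2^20 blocks keeps the advantage below about 2^-89.)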


rnd entropy estimate running low?

2017-01-12 Thread Havard Eidnes
Hi,

on a couple of arm boxes of mine I've been observing the development
of the entropy estimate -- what "rndctl -s" calls "bits currently
stored in pool" -- over time.

I've also tried to read some of the code to understand the
behaviour.

If I understand correctly, randomness sources come in two basic
flavours: those which offer up randomness samples based on
(possibly external) events, and those which only provide samples
when "asked" to do so.  The hardware randomness generator on my
amlogic arm boards appears to fall into the last category.

On a system with few other active randomness sources (e.g. FS
activity or keyboard / mouse activity), it appears that the "bits
currently stored in pool" will be allowed to decrease quite close
to zero (or even *to* zero) before the polled sources are
queried, via e.g. rnd_extract() only triggering a rnd_getmore()
if it could not initially fulfill the request.  The same also
appears to hold for rnd_tryextract().
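
In other words, the extraction path appears to be shaped roughly like
this -- my paraphrase of what I read, with made-up stand-ins for the
pool and the polled sources, not the actual code:

	#include <stddef.h>
	#include <stdio.h>

	static size_t pool_bits = 96;	/* bits left in the pool */

	static size_t
	pool_take(size_t want)		/* hypothetical pool helper */
	{
		size_t got = want < pool_bits ? want : pool_bits;

		pool_bits -= got;
		return got;
	}

	static void
	ask_sources(size_t bits)	/* stands in for rnd_getmore() */
	{
		printf("asking polled sources for %zu bits\n", bits);
	}

	/*
	 * Polled sources are asked for more only after a request
	 * could not be fulfilled from the pool.
	 */
	static size_t
	extract_sketch(size_t want)
	{
		size_t got = pool_take(want);

		if (got < want)
			ask_sources(want - got);
		return got;
	}

	int
	main(void)
	{
		extract_sketch(64);	/* satisfied; no polling */
		extract_sketch(64);	/* shortfall; now we poll */
		return 0;
	}

so the pool only gets topped up after it has already run dry.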

Meanwhile the hardware random generator sits there unused.

I would have thought it would make more sense to keep the "bits
currently stored in pool" more "topped up", and that a re-fill
could usefully be done before the estimate crept down towards
zero?  Especially if you have a half-way decent hardware random
generator at hand?

(This has been observed with both 7.99.47 and 7.99.58, fwiw.)

Regards,

- Håvard