On Sat, 26 Feb 2011, Peter St. John wrote:

I've been following this conversation halfheartedly because of midterm grades due last Friday, but I suppose I should chime in.
/dev/random is, as noted, an entropy generator. Usually it has enough entropy after a boot and some network activity to start generating random numbers. However, it is always >>enormously slow<< and can only generate a very few random numbers before it has to wait for more entropy to accumulate. To run dieharder tests on it I had to do things like stroke the mouse pad as though it were a kitty's neck for an hour or so. You can actually watch it making new random numbers AS you do the bursty things it uses as part of the entropy pool. It is not, I repeat not, for production runs using very large numbers of rands unless you want to die of old age before an e.g. Monte Carlo run finishes.

/dev/urandom is a mix of a software generator and /dev/random. Entropy-based rands are used to "diffuse" entropy into a software generator, keeping it from being predictable (and giving it an infinite period, as it were). However, you then do have to worry about the quality of the random numbers, and it is still slow. dieharder doesn't reveal any overt flaws in the stream, but slow is in and of itself a good reason not to use it for much.

The solution for nearly anyone needing large numbers of fast, high quality random numbers is going to be: use a fast, high quality random number generator from e.g. the Gnu Scientific Library, and >>seed<< it from /dev/random, ensuring uniqueness of the seed(s) across the cluster by any one of several means (e.g. storing used seeds in a shared table and avoiding them, in case you get a "birthdays" collision out of the four billion 32-bit seeds /dev/random can produce). The table is small, and the lookup occurs outside of the main loop, as there is no good reason to reseed a good rng in the middle of a computation (and for many generators, there is at least one good reason not to).

So, what are the good generators? Mersenne Twisters are good (mt19937_1999, for example). Good and fast. gfsr4 is pretty good, pretty fast. taus/taus2 are both good and fast. ranlxd2 is good, but not as fast. KISS (not yet in the GSL, but I have a GSL-compatible implementation in the current dieharder if anybody needs it, which I'll eventually send in to Brian for the GSL) is very good and very fast (fastest). Any of these, seeded from /dev/random (and yes, dieharder has code to do that too), will work just fine, if "just fine" is the best you can do given the state of the art.

You too can test generators for yourself (and learn a lot about random number generators and null hypotheses and testing) with dieharder, available here:

http://www.phy.duke.edu/~rgb/General/dieharder.php

I do have a slightly more current version with kiss, "superkiss" (a mixture of kiss generators) and a straight-up supergenerator that lets you use a mixture of GSL-based generators, each separately seeded, in a frame that returns numbers out of a shuffle table. If the generators it uses are fast, it is still acceptably fast, and if nothing else it hides and obscures any sort of flaw any single generator might have.

   rgb
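[A sketch of the seed-once-from-/dev/random pattern described above. Python is used here purely for illustration because its random.Random happens to be exactly mt19937, one of the generators named; in C with the GSL the equivalent would be gsl_rng_alloc(gsl_rng_mt19937) followed by gsl_rng_set() with the seed read from /dev/random.]

```python
import random

# Pull one hard seed from the kernel entropy pool (the slow, blocking
# device), then do all the heavy lifting in a fast software generator.
# Python's random.Random is mt19937, one of the generators named above.
with open("/dev/random", "rb") as f:
    seed = int.from_bytes(f.read(8), "little")

rng = random.Random(seed)   # seed once, OUTSIDE the main loop

# e.g. a toy Monte Carlo estimate of pi, drawn entirely from the
# software generator -- /dev/random is never touched again.
n = 100_000
inside = sum(rng.random() ** 2 + rng.random() ** 2 < 1.0 for _ in range(n))
print(4.0 * inside / n)     # roughly 3.14
```

The point of the structure is that the expensive entropy read happens exactly once; uniqueness of seeds across a cluster still has to be handled separately (the shared seed table mentioned above).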
After scanning a bunch of man pages (incidentally, "man urandom" explains the difference between random and urandom; "man random" does not) and experimenting to produce my reply (above), I found this when google pointed into wiki (sheesh):

"It is also possible to write to /dev/random. This allows any user to mix random data into the pool. Non-random data is harmless, because only a privileged user can issue the ioctl needed to increase the entropy estimate. The current amount of entropy and the size of the Linux kernel entropy pool are available in /proc/sys/kernel/random/."
( http://en.wikipedia.org/wiki//dev/random )

So, yes re Mark's: ".. or even just have a boot script that stirs in some unique per-node data? (say low-order rdtsc salted with the node's MAC.) .."

So (from the wiki) piping dumb data into /dev/random is harmless, since the entropy measure wouldn't be fooled, so then yes, just anyone can pipe some bytes in anytime. So yeah, re the rdtsc, I just meant my 9 digits of nanoseconds as something easy to try at boot time, and shuffling that with the MAC is a good idea. (Since a light-nanosecond is about 30 cm, the lengths of cables in the server room would be enough to give every node a different boot time, in nanoseconds, right? Or no, because your cables are all standard lengths, but coiled as needed?) So just sticking in a script that flushes such stuff into /dev/random at boot time (or anytime) should be practicable and easy.

Peter

On Sat, Feb 26, 2011 at 5:32 PM, Mark Hahn <[email protected]> wrote:

> nodes using CentOS 5. One of our users just ran onto a problem with
> /dev/random blocking due to the lack of entropy.

/dev/random should be reserved for generating non-ephemeral keys.

> Do others have this problem? What do you do?

I've only ever heard of it on servers that do a lot of ssl transactions, and were configured to use /dev/random for transient keys or nonces.

> Do you modify network drivers to introduce entropy?
> Are there other suggested methods of adding entropy to /dev/random?

good questions. I haven't been following the state of kernel entropy gathering - I guess I assumed that they'd worked out a basic set of reasonable sources and had a (eg) /proc interface for enabling others that would be site-specific (such as eth).

> Are there ways to introduce entropy from the random number generator
> on some Intel systems? Did Intel remove this from more recent chips?

appears so: http://software.intel.com/en-us/forums/showthread.php?t=66236

> How reliable is /dev/urandom without initial entropy? We boot from

my understanding is urandom is a crypto hash of the entropy pool: if the entropy pool never changes or is perfectly guessable, urandom is only as good as the hash. since the crypto hash is not broken in general, I'd consider it plenty good.

> stateless disk images and don't carry any entropy over from previous
> boots.

boot scripts normally save and restore entropy, so why can't they do so to/from some server of yours? or even just have a boot script that stirs in some unique per-node data? (say low-order rdtsc salted with the node's MAC.)

this is a good question - not a burning issue I think, but something to not forget about. we're starting to use NFS-root login nodes, for instance, which could conceivably run into entropy issues.

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
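[The "stir in some unique per-node data at boot" idea from Mark and Peter can be sketched as below. Per the wiki passage quoted earlier, any user may write to /dev/random; the bytes are mixed into the pool but the entropy estimate is only credited via a privileged ioctl, so non-random input is harmless. uuid.getnode() stands in here for the node's MAC and time.time_ns() for the low-order rdtsc/nanosecond bits - both are illustrative choices, not anything prescribed in the thread.]

```python
import time
import uuid

# Unique-ish per-node bytes: MAC address (48 bits) plus boot-time
# nanoseconds (64 bits).  Writing them to /dev/random mixes them into
# the pool; without the privileged ioctl the kernel does NOT bump its
# entropy estimate, so even predictable input here does no harm.
stir = uuid.getnode().to_bytes(6, "big") + time.time_ns().to_bytes(8, "big")
with open("/dev/random", "wb") as f:
    f.write(stir)

# How much entropy the kernel currently thinks it has (in bits):
with open("/proc/sys/kernel/random/entropy_avail") as f:
    print(int(f.read()))
```

Run from a boot script (rc.local or an init unit), this gives every node a distinct contribution to its pool even when booting from identical stateless images.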
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email: [email protected]
