On Sat, 26 Feb 2011, Peter St. John wrote:

I've been following this conversation halfheartedly because of midterm grades due last Friday, but I suppose I should chime in.
/dev/random is, as noted, an entropy generator. Usually it has enough entropy after a boot and some network activity to start generating random numbers. However, it is always >>enormously slow<< and can only generate a very few random numbers before it has to wait for more entropy to accumulate. To run dieharder tests on it I had to do things like stroke the mouse pad as though it were a kitty's neck for an hour or so. You can actually watch it making new random numbers AS you do the bursty things it uses as part of the entropy pool. It is not, I repeat not, for production runs using very large numbers of rands unless you want to die of old age before an e.g. Monte Carlo run finishes.

/dev/urandom is a mix of a software generator and /dev/random. Entropy-based rands are used to "diffuse" entropy into a software generator, keeping it from being predictable (and giving it an infinite period, as it were). However, you then do have to worry about the quality of the random numbers, and it is still slow. dieharder doesn't reveal any overt flaws in the stream, but slow is in and of itself a good reason not to use it for much.

The solution for nearly anyone needing large numbers of fast, high quality random numbers is going to be: use a fast, high quality random number generator from e.g. the Gnu Scientific Library, and >>seed<< it from /dev/random, ensuring uniqueness of the seed(s) across the cluster by any one of several means (e.g. storing used seeds in a shared table and avoiding them, in case you get a "birthdays" collision out of the four billion 32-bit seeds /dev/random can produce). The table is small, and the lookup occurs outside of the main loop, as there is no good reason to reseed a good rng in the middle of a computation (and for many generators, there is at least one good reason not to).

So, what are the good generators? Mersenne Twisters are good (mt19937_1999, for example). Good and fast. gfsr4 is pretty good, pretty fast. taus/taus2 are both good and fast. ranlxd2 is good, but not as fast. KISS (not yet in the GSL, but I have a GSL-compatible implementation in the current dieharder if anybody needs it, which I'll eventually send in to Brian for the GSL) is very good and very fast (fastest). Any of these, seeded from /dev/random (and yes, dieharder has code to do that too), will work just fine, if "just fine" is the best you can do given the state of the art.

You too can test generators for yourself (and learn a lot about random number generators and null hypotheses and testing) with dieharder, available here:

http://www.phy.duke.edu/~rgb/General/dieharder.php

I do have a slightly more current version with kiss, "superkiss" (a mixture of kiss generators) and a straight-up supergenerator that lets you use a mixture of GSL-based generators, each separately seeded, in a frame that returns numbers out of a shuffle table. If the generators it uses are fast, it is still acceptably fast, and if nothing else it hides and obscures any sort of flaw any single generator might have.

   rgb
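[A sketch of the seed-once-from-/dev/random pattern described above. Python is used here purely for illustration because its random.Random happens to be exactly mt19937, one of the generators named; in C with the GSL the equivalent would be gsl_rng_alloc(gsl_rng_mt19937) followed by gsl_rng_set() with the seed read from /dev/random.]

```python
import random

# Pull one hard seed from the kernel entropy pool (the slow, blocking
# device), then do all the heavy lifting in a fast software generator.
# Python's random.Random is mt19937, one of the generators named above.
with open("/dev/random", "rb") as f:
    seed = int.from_bytes(f.read(8), "little")

rng = random.Random(seed)   # seed once, OUTSIDE the main loop

# e.g. a toy Monte Carlo estimate of pi, drawn entirely from the
# software generator -- /dev/random is never touched again.
n = 100_000
inside = sum(rng.random() ** 2 + rng.random() ** 2 < 1.0 for _ in range(n))
print(4.0 * inside / n)     # roughly 3.14
```

The point of the structure is that the expensive entropy read happens exactly once; uniqueness of seeds across a cluster still has to be handled separately (the shared seed table mentioned above).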
After scanning a bunch of man pages (incidentally, "man urandom" explains the difference between random and urandom; "man random" does not) and experimenting to produce my reply (above), I found this when google pointed into wiki (sheesh):

"It is also possible to write to /dev/random. This allows any user to mix random data into the pool. Non-random data is harmless, because only a privileged user can issue the ioctl needed to increase the entropy estimate. The current amount of entropy and the size of the Linux kernel entropy pool are available in /proc/sys/kernel/random/."
( http://en.wikipedia.org/wiki//dev/random )

So, yes re Mark's: ".. or even just have a boot script that stirs in some unique per-node data? (say low-order rdtsc salted with the node's MAC.) .."

So (from the wiki) piping dumb data into /dev/random is harmless, since the entropy measure wouldn't be fooled, so then yes, just anyone can pipe some bytes in anytime. So yeah, re the rdtsc, I just meant my 9 digits of nanoseconds as something easy to try at boot time, and shuffling that with the MAC is a good idea. (Since a light-nanosecond is about 30 cm, the lengths of cables in the server room would be enough to give every node a different boot time, in nanoseconds, right? Or no, because your cables are all standard lengths, but coiled as needed?) So just sticking in a script that flushes such stuff into /dev/random at boot time (or anytime) should be practicable and easy.

Peter

On Sat, Feb 26, 2011 at 5:32 PM, Mark Hahn <[email protected]> wrote:

> nodes using CentOS 5. One of our users just ran onto a problem with
> /dev/random blocking due to the lack of entropy.

/dev/random should be reserved for generating non-ephemeral keys.

> Do others have this problem? What do you do?

I've only ever heard of it on servers that do a lot of ssl transactions, and were configured to use /dev/random for transient keys or nonces.

> Do you modify network drivers to introduce entropy?
> Are there other suggested methods of adding entropy to /dev/random?

good questions. I haven't been following the state of kernel entropy gathering - I guess I assumed that they'd worked out a basic set of reasonable sources and had a (eg) /proc interface for enabling others that would be site-specific (such as eth).

> Are there ways to introduce entropy from the random number generator
> on some Intel systems? Did Intel remove this from more recent chips?

appears so: http://software.intel.com/en-us/forums/showthread.php?t=66236

> How reliable is /dev/urandom without initial entropy? We boot from

my understanding is urandom is a crypto hash of the entropy pool: if the entropy pool never changes or is perfectly guessable, urandom is only as good as the hash. since the crypto hash is not broken in general, I'd consider it plenty good.

> stateless disk images and don't carry any entropy over from previous
> boots.

boot scripts normally save and restore entropy, so why can't they do so to/from some server of yours? or even just have a boot script that stirs in some unique per-node data? (say low-order rdtsc salted with the node's MAC.)

this is a good question - not a burning issue I think, but something to not forget about. we're starting to use NFS-root login nodes, for instance, which could conceivably run into entropy issues.

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
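[The "stir in some unique per-node data at boot" idea from Mark and Peter can be sketched as below. Per the wiki passage quoted earlier, any user may write to /dev/random; the bytes are mixed into the pool but the entropy estimate is only credited via a privileged ioctl, so non-random input is harmless. uuid.getnode() stands in here for the node's MAC and time.time_ns() for the low-order rdtsc/nanosecond bits - both are illustrative choices, not anything prescribed in the thread.]

```python
import time
import uuid

# Unique-ish per-node bytes: MAC address (48 bits) plus boot-time
# nanoseconds (64 bits).  Writing them to /dev/random mixes them into
# the pool; without the privileged ioctl the kernel does NOT bump its
# entropy estimate, so even predictable input here does no harm.
stir = uuid.getnode().to_bytes(6, "big") + time.time_ns().to_bytes(8, "big")
with open("/dev/random", "wb") as f:
    f.write(stir)

# How much entropy the kernel currently thinks it has (in bits):
with open("/proc/sys/kernel/random/entropy_avail") as f:
    print(int(f.read()))
```

Run from a boot script (rc.local or an init unit), this gives every node a distinct contribution to its pool even when booting from identical stateless images.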
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email: [email protected]
