Philip Rowlands <[EMAIL PROTECTED]> writes:

> I'm still interested to read what Paul considers to be the
> difficulties of such an implementation?
Suppose you're randomizing an input file of 10 million lines, and suppose you want to approximate a "truly random" key by using a 128-bit random key for each input line. Then you'll need about 1.3 billion random bits. On my desktop, /dev/random generates only about 200 random bits per second, so it'll take me about 2.5 months to randomize the file.

If you're clever you can tune things so you need only about 0.22 billion random bits: this is because the log base 2 of (10 million factorial) is about 220 million (I used Stirling's approximation). But even with that optimization it'll still take me a couple of weeks to randomize the file.

One way to be clever is to use the Knuth shuffle (a.k.a. the Fisher-Yates shuffle). However, this doesn't easily generalize to an algorithm that will work efficiently if the input is too large to fit into main memory. There are other ways to be clever even for large files, but I haven't had time to think them through. And don't forget: even if we're clever, it still takes me a couple of weeks to randomize a 10-million-line file.

For more on this subject, please see:

http://en.wikipedia.org/wiki/Knuth_shuffle

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils