On Thu, 25 Aug 2011, Vincent Diepeveen wrote: > I noticed that most generated semi-random numbers with software generators, > had the habit to truely adress a search space of n always in O (n log n). > > So if you draw from most software RNG's a number and do it modulo n, > with n being not too tiny, say quite some millions or even billions, then > every > slot in your 'hashtable' will get hit at least once by the RNG, whereas data > in reality simply happens to not have that habit simply. > > So true random numbers versus generated noise is in this manner easy > to distinguish by this. Now i didn't study literature whether some other chap > some long time ago already had invented this. That would be most interesting > to know.
Some other chap named George Marsaglia (and to some extent another chap named Donald Knuth) have already invented this. A number of tests of the tails of random number generators are already in dieharder. All "good" modern rngs pass these tests. The Martingale betting system you are looking at is even older (at least Marsaglia and Knuth are still alive). It dates back to the 18th century, and is well known to be flawed for a variety of reasons, not the least of which is that gamblers don't have the infinite wealth necessary to make this >>even<< a zero-sum strategy and casinos have betting limits that de facto make it impossible to pursue the requisite number of steps and in roulette in particular have 0 and/or 00 slots and aren't zero-sum to begin with. You can read a decent analysis of outcomes based on the presumed binomial distribution of a zero-sum game here: http://en.wikipedia.org/wiki/Martingale_%28betting_system%29 Your test below is interesting, though. The only real problems I can see with actually using it in dieharder are: a) One would need a theoretical estimate of the distribution of filling given n log n draws on an n-slotted table (for largish n). That is, for a perfect rng, what SHOULD the distribution of success/failure be. b) One would then need the CDF for this distribution, to be able to turn the results of N trials (of n log n pulls each) into a p-value under the null hypothesis -- the probability of obtaining the particular number of successes and failures presuming a perfectly random generator. That way dieharder could apply it rigorously to its 70 or 80 embedded rngs or to any user's outboard generator. There probably is theoretical statistical support for the PD and/or CDF -- you're analyzing the tails of a poissonian process -- but finding it or doing it yourself (or myself), aye, that's the rub. One cannot just say "high degree of certainty that it is an RNG" (by which one means that the rng in question fails the test for randomness) in the test. HOW high? Perfect rngs or perfectly random processes will sometimes fill your table, but how often? How can you differentiate an "accident" when one does from an actual failure? All of those questions require a more rigorous theory and quantitative result embedded in a test that can be systematically cranked up to more clearly resolve failures until they are unambiguous, not marginal maybe yes maybe no. I suspect that the failures this test would reveal are already more than covered in dieharder, in particular by the bit distribution tests and the monkey tests, but I'm not terribly happy with the monkey tests and would be perfectly thrilled to have a simpler to compute test that revealed precisely this sort of flaw, systematically. And it doesn't hurt at all to have partially or fully redundant tests as long as the test themselves are rigorously valid. If you can find or compute the CDF for your test below, I'd be happy to wrap it up and add it to dieharder, in other words. One can always SIMULATE a CDF, of course, but that requires a known good generator and sort of begs the question if you don't think that e.g. AES or threefish or KISS are good generators that would actually pass your test. Even hardware/quantum sources of random bits are suspect -- they often are generated by a process that leaves in the traces of an underlying distribution. I'm not convinced that >>any<< process in the real world is >>truly<< random. Physics is ambiguous on the issue -- the quantum description of a closed system is just as deterministic as the classical one, and Master equation unpredictability on open subsets of a large closed system reflects entropy/ignorance, not actual randomness (hence Einstein's famous "doesn't play dice" remark). But lots of this are sufficiently random that one cannot detect any failure of randomness, modern crypto class generators being a prime example. rgb > > In semi pseudo code, let's take an array of size a billion as an example, > though usually a few million is more than ok: > > n = 2^30; // 2 to the power 30 > > Function TestNumbersForRandomness(RNG,n) { > declare array hashtable[size n]; > > guessednlogn = 2 * (log n / log 2) * n; > > for( i = 0 ; i < n ; i++ ) > hashtable[i] = FALSE; > > ndraws = filledn = 0; > while( ndraws < guessednlogn ) { > randomnumber = RNG(); > r = randomnumber % n; // randomnumber = r (mod n) > if( hashtable[r] == FALSE ) { > hashtable[r] = TRUE; > filledn++; > if( filledn >= n ) > break; > > } > ndraws++; > } > > if( filledn >= n ) > print "With high degree of certainty data generated by a RNG\n"); > else > print "Not so sure it's a RNG\n"; > > } > > > > > > Regards, > Vincent > > > > >> -- both unpredictable and >> flat/decorrelated at all orders, and even though there aren't really >> enough of them for my purposes, I've used them as one of the (small) >> "gold standard" sources for testing dieharder even as I test them. For >> all practical purposes threefish or aes are truly random as well and >> they are a lot faster and easier to use as gold standard generators, >> though. >> >> I don't quite understand why the single site restriction is important -- >> this site has been up for years and I don't expect it to go away soon; >> it is quite reliable. I don't think there is anything secret about how >> the numbers are generated, and I'll certify that the numbers it produces >> don't make dieharder unhappy. So 1 is fixable with a bit of effort on >> your part; 6 I don't really understand but the guy who runs the site is >> clearly willing to construct a custom feed for cash customers, if there >> is enough value in whatever it is you are trying to do to pay for >> access. If it's just a lottery, well, lord, I can think of a dozen ways >> to make numbers so random that they'd be unimpeachable for any sort of >> lottery, both unpredictable and uncorrelated, and they don't any of them >> require any significant amount of entropy to get started. >> >> I will add one warning -- "randomness" is a rather stringent >> mathematical criterion, and is generally tested against the null >> hypothesis. Amateurs who want to make random number generators out of >> supposedly "random" data streams or fancy algorithms almost invariably >> fail, sometimes spectacularly so. There are a half dozen or more >> really, really good pseudorandom number generators out there and it is >> easy to hotwire them together into an xor-based high entropy stream that >> basically never repeats (feeding it a bit of real entropy now and then >> as it operates). I would strongly counsel you against trying to take >> e.g. weather data and make something "random" out of it. Unless you >> really know what you are doing, you will probably make something that >> isn't at all random and may not even be unpredictable. Even most >> sources of "quantum" randomness (which is at least possibly "truly >> random", although I doubt it) aren't flat, so that they carry the >> signature of their generation process unless/until you manage to >> transform them into something flat (difficult unless you KNOW the >> distribution they are producing). Pseudorandom number generators have >> the serious advantage of being amenable to at least some theoretical >> analysis (so you can "guarantee" flatness out to some high >> dimensionality, say) as well as empirical testing with e.g. dieharder. >> >> HTH, >> >> rgb >> >>> >>> Thanks, >>> >>> David Mathog >>> [email protected] >>> Manager, Sequence Analysis Facility, Biology Division, Caltech >>> >> >> Robert G. Brown http://www.phy.duke.edu/~rgb/ >> Duke University Dept. of Physics, Box 90305 >> Durham, N.C. 27708-0305 >> Phone: 1-919-660-2567 Fax: 919-660-2525 email:[email protected] >> >> >> _______________________________________________ >> Beowulf mailing list, [email protected] sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:[email protected] _______________________________________________ Beowulf mailing list, [email protected] sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
