Background
Stephen Wolfram's book, "A New Kind of Science," is nothing if not 
interesting.  This encyclopedia-sized volume traces how his fascination 
with cellular automata, beginning in the 1970s, led him to spend decades 
exploring the significance of complexity created from simple rules.

I hope the following is not too wordy, and that it generates interest in the 
cryptographic implications of his work.

Intrinsic Generation of Randomness
In the chapter "Mechanisms and Programs in Nature" (pp. 297-361), he 
presents his case that the behavioral similarities between certain simple 
programs and systems in nature are no coincidence but reflect a deep 
correspondence.  In this section he explores three mechanisms for 
randomness: external input (noise), captured in so-called stochastic models; 
sensitivity to initial conditions (e.g., chaos theory); and the behavior of 
simple programs described in the book, which he believes is the most common 
mechanism in nature.

Under the section "The Intrinsic Generation of Randomness" he presents 
evidence for his third mechanism in which no random input from the outside 
is needed, and in which the randomness is instead generated inside the 
systems themselves.

"When one says that something seems random, what one usually means in 
practice is that one cannot see any regularities in it. So when we say that 
a particular phenomenon in nature seems random, what we mean is that none 
of our standard methods of analysis have succeeded in finding regularities 
in it. To assess the randomness of a sequence produced by something like a 
cellular automaton, therefore, what we must do is to apply to it the same 
methods of analysis as we do to natural systems.

... some of these methods have been well codified in standard mathematics 
and statistics, while others are effectively implicit in our processes of 
visual and other perception. But the remarkable fact is that none of these 
methods seem to reveal any real regularities whatsoever in the rule 30 
cellular automaton sequence. And thus, so far as one can tell, this 
sequence is at least as random as anything we see in nature.

But is it truly random?

Over the past century or so, a variety of definitions of true randomness 
have been proposed. And according to most of these definitions, the 
sequence is indeed truly random. But there is a certain class of 
definitions which does not consider it truly random.

For these definitions are based on the notion of classifying as truly 
random only sequences which can never be generated by any simple procedure 
whatsoever. Yet starting with a simple initial condition and then applying 
a simple cellular automaton rule constitutes a simple procedure. And as a 
result, the center column of rule 30 cannot be considered truly random 
according to such definitions.
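As an illustration (my sketch, not code from the book): rule 30 updates each cell to left XOR (center OR right), and the "center column" is the sequence in question. In Python:

```python
def rule30_center_column(steps):
    """Evolve rule 30 from a single black cell; return the center column."""
    # Rule 30: new cell = left XOR (center OR right).
    cells = {0: 1}          # sparse row: position -> 0/1
    column = [1]            # value of the center cell (position 0) at each step
    left = right = 0
    for _ in range(steps):
        left, right = left - 1, right + 1   # the pattern grows one cell per side
        cells = {x: cells.get(x - 1, 0) ^ (cells.get(x, 0) | cells.get(x + 1, 0))
                 for x in range(left, right + 1)}
        column.append(cells[0])
    return column
```

The first few center-column values are 1, 1, 0, 1, 1, ...; it is this sequence that the standard methods of analysis fail to find regularities in.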

But while definitions of this type have a certain conceptual appeal, they 
are not likely to be useful in discussions of randomness in nature. For as 
we will see later in this book, it is almost certainly impossible for any 
natural process ever to generate a sequence which is guaranteed to be truly 
random according to such definitions.

For our purposes more useful definitions tend to concentrate not so much on 
whether there exists in principle a simple way to generate a particular 
sequence, but rather on whether such a way can realistically be recognized 
by applying various kinds of analysis to the sequence. And as discussed 
above, there is good evidence that the center column of rule 30 is indeed 
random according to all reasonable definitions of this kind.

So whether or not one chooses to say that the sequence is truly random, it 
is, as far as one can tell, at least random for all practical purposes. And 
in fact sequences closely related to it have been used very successfully as 
sources of randomness in practical computing.

For many years, most kinds of computer systems and languages have had 
facilities for generating what they usually call random numbers. And in 
Mathematica, ever since it was first released, Random[Integer] has 
generated 0's and 1's using exactly the rule 30 cellular automaton.

The way this works is that every time Random[Integer] is called, another 
step in the cellular automaton evolution is performed, and the value of the 
cell in the center is returned. But one difference from the picture two 
pages ago is that for practical reasons the pattern is not allowed to grow 
wider and wider forever. Instead, it is wrapped around in a region that is 
a few hundred cells wide.
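That wrapped-around variant can be sketched as follows (the width and the seeding here are my illustrative choices, not Mathematica's internals):

```python
def rule30_cyclic_bits(width, seed_positions, n):
    """Rule 30 on a cyclic row of `width` cells; return n center-cell bits."""
    cells = [0] * width
    for p in seed_positions:
        cells[p % width] = 1
    out = []
    for _ in range(n):
        # the neighborhood wraps around the ends of the row
        cells = [cells[i - 1] ^ (cells[i] | cells[(i + 1) % width])
                 for i in range(width)]
        out.append(cells[width // 2])
    return out
```

Because the row is finite, the state space is finite too, which is what forces the eventual repetition discussed next.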

One consequence of this, as discussed on page 259, is that the sequence of 
0's and 1's that is generated must then eventually repeat. But even with 
the fastest foreseeable computers, the actual period of repetition will 
typically be more than a billion billion times the age of the universe.

Another issue is that if one always ran the cellular automaton from page 
315 with the particular initial condition shown there, then one would 
always get exactly the same sequence of 0's and 1's. But by using different 
initial conditions one can get completely different sequences. And in 
practice if the initial conditions are not explicitly specified, what 
Mathematica does, for example, is to use as an initial condition a 
representation of various features of the exact state of the computer 
system at the time when Random was first called.
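Mathematica's exact seeding scheme is not specified in the text; as a purely hypothetical sketch of the idea, one could pack features of the machine state (clock, process id) into the initial row of cells:

```python
import os
import time

def machine_state_seed(width=257):
    """Hypothetical sketch: derive an initial cell row from machine state."""
    # Illustrative mix of clock and process id -- NOT Mathematica's method.
    raw = time.time_ns() ^ (os.getpid() << 32)
    return [(raw >> (i % 64)) & 1 for i in range(width)]
```

Any such scheme gives a different initial condition on each run, and hence a completely different output sequence.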

The rule 30 cellular automaton provides a particularly clear and good 
example of intrinsic randomness generation. But in previous chapters we 
have seen many other examples of systems that also intrinsically produce 
apparent randomness. And it turns out that one of these is related to the 
method used since the late 1940s for generating random numbers in almost 
all practical computer systems.

The pictures on the facing page show what happens if one successively 
multiplies a number by various constant factors, and then looks at the 
digit sequences of the numbers that result. As we first saw on page 119, 
the patterns of digits obtained in this way seem quite random. And the idea 
of so-called linear congruential random number generators is precisely to 
make use of this randomness.

For practical reasons, such generators typically keep only, say, the 
rightmost 31 digits in the numbers at each step. Yet even with this 
restriction, the sequences generated are random enough that at least until 
recently they were almost universally what was used as a source of 
randomness in practical computing.
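A linear congruential generator is just the iteration x -> (a*x + c) mod m. As a sketch, using the sample constants from the C standard's rand() example (a = 1103515245, c = 12345, m = 2^31, i.e. keeping the rightmost 31 binary digits):

```python
def lcg(seed, n, a=1103515245, c=12345, m=2**31):
    """Linear congruential generator: x_{k+1} = (a*x_k + c) mod m."""
    out, x = [], seed
    for _ in range(n):
        x = (a * x + c) % m      # keep only the low 31 bits
        out.append(x)
    return out
```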

So in a sense linear congruential generators are another example of the 
general phenomenon of intrinsic randomness generation. But it turns out 
that in some respects they are rather unusual and misleading.

Keeping only a limited number of digits at each step makes it inevitable 
that the sequences produced will eventually repeat. And one of the reasons 
for the popularity of linear congruential generators is that with fairly 
straightforward mathematical analysis it is possible to tell exactly what 
multiplication factors will maximize this repetition period.
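The analysis referred to here is the Hull-Dobell theorem: x -> (a*x + c) mod m attains the full period m exactly when gcd(c, m) = 1, a - 1 is divisible by every prime factor of m, and a - 1 is divisible by 4 whenever m is. A small check on a toy example of my choosing:

```python
def lcg_period(a, c, m, x0=0):
    """Length of the cycle of x -> (a*x + c) % m through x0."""
    x, k = (a * x0 + c) % m, 1
    while x != x0:
        x, k = (a * x + c) % m, k + 1
    return k
```

For a = 5, c = 3, m = 16 the Hull-Dobell conditions hold (gcd(3, 16) = 1; a - 1 = 4 is divisible by 2 and by 4), so the period is the full 16.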

It has then often been assumed that having maximal repetition period will 
somehow imply maximum randomness in all aspects of the sequence one gets. 
But in practice over the years, one after another linear congruential 
generator that has been constructed to have maximal repetition period has 
turned out to exhibit very substantial deviations from perfect randomness.

A typical kind of failure, illustrated in the pictures on the next page, is 
that points with coordinates determined by successive numbers from the 
generator turn out to be distributed in an embarrassingly regular way. At 
first, such failures might suggest that more complicated schemes must be 
needed if one is to get good randomness. And indeed with this thought in 
mind all sorts of elaborate combinations of linear congruential and other 
generators have been proposed. But although some aspects of the behavior of 
such systems can be made quite random, deviations from perfect randomness 
are still often found.
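A classic numerical illustration of this failure (my example; the book's pictures are not reproduced here) is IBM's RANDU generator, x -> 65539*x mod 2^31. Because 65539 = 2^16 + 3, squaring gives 65539^2 = 2^32 + 6*2^16 + 9, which is congruent to 6*65539 - 9 mod 2^31; so every three successive outputs satisfy the exact relation x[k+2] = 6*x[k+1] - 9*x[k] (mod 2^31), forcing all 3-D points onto a small set of parallel planes:

```python
def randu(seed, n):
    """RANDU: x_{k+1} = 65539 * x_k mod 2^31 (the seed should be odd)."""
    out, x = [], seed
    for _ in range(n):
        x = (65539 * x) % 2**31
        out.append(x)
    return out
```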

And seeing this one might conclude that it must be essentially impossible 
to produce good randomness with any kind of system that has reasonably 
simple rules. But the rule 30 cellular automaton that we discussed above 
demonstrates that in fact this is absolutely not the case. Indeed, the 
rules for this cellular automaton are in some respects much simpler than 
for even a rather basic linear congruential generator. Yet the sequences it 
produces seem perfectly random, and do not suffer from any of the problems 
that are typically found in linear congruential generators.

So why do linear congruential generators not produce better randomness? 
Ironically, the basic reason is also the reason for their popularity. The 
point is that unlike the rule 30 cellular automaton that we discussed 
above, linear congruential generators are readily amenable to detailed 
mathematical analysis. And as a result, it is possible for example to 
guarantee that a particular generator will indeed have a maximal repetition 
period.

Almost inevitably, however, having such a maximal period implies a certain 
regularity. And in fact, as we shall see later in this book, the very 
possibility of any detailed mathematical analysis tends to imply the 
presence of at least some deviations from perfect randomness.

But if one is not constrained by the need for such analysis, then as we saw 
in the cellular automaton example above, remarkably simple rules can 
successfully generate highly random behavior.

And indeed the existence of such simple rules is crucial in making it 
plausible that the general mechanism of intrinsic randomness generation can 
be widespread in nature. For if the only way for intrinsic randomness 
generation to occur was through very complicated sets of rules, then one 
would expect that this mechanism would be seen in practice only in a few 
very special cases.

But the fact that simple cellular automaton rules are sufficient to give 
rise to intrinsic randomness generation suggests that in reality it is 
rather easy for this mechanism to occur. And as a result, one can expect 
that the mechanism will be found often in nature.

So how does the occurrence of this mechanism compare to the previous two 
mechanisms for randomness that we have discussed?

The basic answer, I believe, is that whenever a large amount of randomness 
is produced in a short time, intrinsic randomness generation is 
overwhelmingly likely to be the mechanism responsible.

We saw in the previous section that random details of the initial 
conditions for a system can lead to a certain amount of randomness in the 
behavior of a system. But as we discussed, there is in most practical 
situations a limit on the lengths of sequences whose randomness can 
realistically be attributed to such a mechanism. With intrinsic randomness 
generation, however, there is no such limit: in the cellular automaton 
above, for example, all one need do to get a longer random sequence is to 
run the cellular automaton for more steps.

But it is also possible to get long random sequences by continual 
interaction with a random external environment, as in the first mechanism 
for randomness discussed in this chapter.

The issue with this mechanism, however, is that it can take a long time to 
get a given amount of good-quality randomness from it. And the point is that 
in most cases, intrinsic randomness generation can produce similar 
randomness in a much shorter time.

Indeed, in general, intrinsic randomness generation tends to be much more 
efficient than getting randomness from the environment. The basic reason is 
that intrinsic randomness generation in a sense puts all the components in 
a system to work in producing new randomness, while getting randomness from 
the environment does not.

Thus, for example, in the rule 30 cellular automaton discussed above, every 
cell in effect actively contributes to the randomness we see. But in a 
system that just amplifies randomness from the environment, none of the 
components inside the system itself ever contribute any new randomness at 
all. Indeed, ironically enough, the more components that are involved in 
the process of amplification, the slower it will typically be to get each 
new piece of random output. For as we discussed two sections ago, each 
component in a sense adds what one can consider to be more inertia to the 
amplification process.

But with a larger number of components it becomes progressively easier for 
randomness to be generated through intrinsic randomness generation. And 
indeed unless the underlying rules for the system somehow explicitly 
prevent it, it turns out in the end that intrinsic randomness generation 
will almost inevitably occur, often producing so much randomness that it 
completely swamps any randomness that might be produced from either of the 
other two mechanisms.

Yet having said this, one can ask how one can tell, in an actual experiment 
on some particular system in nature, to what extent intrinsic randomness 
generation is really the mechanism responsible for whatever seemingly 
random behavior one observes.

The clearest sign is a somewhat unexpected phenomenon: that details of the 
random behavior can be repeatable from one run of the experiment to 
another. It is not surprising that general features of the behavior will be 
the same. But what is remarkable is that if intrinsic randomness generation 
is the mechanism at work, then the precise details of the behavior can also 
be repeatable.

In the mechanism where randomness comes from continual interaction with the 
environment, no repeatability can be expected. For every time the 
experiment is run, the state of the environment will be different, and so 
the behavior one sees will also be correspondingly different. And 
similarly, in the mechanism where randomness comes from the details of 
initial conditions, there will again be little, if any, repeatability. For 
the details of the initial conditions are typically affected by the 
environment of the system, and cannot realistically be kept the same from 
one run to another.

But the point is that with the mechanism of intrinsic randomness 
generation, there is no dependence on the environment. And as a result, so 
long as the setup of the system one is looking at remains the same, the 
behavior it produces will be exactly the same. Thus for example, however 
many times one runs a rule 30 cellular automaton, starting with a single 
black cell, the behavior one gets will always be exactly the same. And so 
for example the sequence of colors of the center cell, while seemingly 
random, will also be exactly the same."
