Universal cellular automata
---------------------------

Stephen Wolfram conjectured, and Matthew Cook eventually proved, that
certain one-dimensional, two-state, three-cell-neighborhood cellular
automata are Turing-complete, in particular rule 110.  Here's a simple
Python script for running such 1-D CAs:

    #!/usr/bin/python
    import sys

    def display(line):
        # Draw live cells as '#' and dead cells as spaces.
        sys.stdout.write(''.join([' #'[x] for x in line]) + '\n')

    def run(n=110, ng=1000, nc=400, display=display):
        cells = [0] * nc
        cells[nc // 2] = 1          # start from a single live cell
        for ii in range(ng):
            display(cells)
            newcells = [0] * len(cells)
            # Cells 0 and nc-1 stay dead; the rest follow rule n.
            for jj in range(1, len(newcells) - 1):
                index = 4 * cells[jj-1] + 2 * cells[jj] + cells[jj+1]
                # Bit `index` of the rule number is the successor state.
                newcells[jj] = 1 if n & (1 << index) else 0
            cells = newcells

    if __name__ == '__main__':
        run()

So rule 110 is indeed a universal computer (which seems plausible even
from the output of this program), and a universal computer can be
quite simple.

Computing rule 110
------------------

The truth table is 01101110, reading from neighborhood 111 down to
000.  As a Karnaugh map, with the left cell (the 4's bit of the
neighborhood index) down the side and the center (2's) and right (1's)
cells along the top in Gray-code order:

                   center, right
                   00   01   11   10
      left = 0      0    1    1    1
      left = 1      0    1    0    1

Hope I got that diagram right.  Here are some sum-of-products forms it
suggests:

    ~R & C  |  ~L & C  |  R & ~C    (equivalently (R ^ C) | ~L & C)
    ~R & C  |  R & ~L  |  R & ~C    (equivalently (R ^ C) | ~L & R)

I used the program at
http://lists.canonical.org/pipermail/kragen-hacks/2003-February/000365.html
(which takes the truth table backwards from the Python program and
from how I've listed it above), and it finds a number of
five-NAND-gate solutions, the first of which is
H = -(-(-(C*A)*B)*-(-(B*A)*C)):

    A:        00001111
    B:        00110011
    C:        01010101
    D=~(C*A): 11111010  (delay 1)
    E=~(D*B): 11001101  (delay 2)
    F=~(B*A): 11111100  (delay 1)
    G=~(F*C): 10101011  (delay 2)
    H=~(G*E): 01110110  (delay 3)  (correct)

(Note that 01110110 is just 01101110 read backwards, per that
program's reversed truth-table convention.)  None of the
five-NAND-gate solutions is shallower (i.e. faster) than that
three-propagation-delay circuit.  (I'm a little dubious about this
because, when the truth table was reversed, the program needed 7 gates
to find solutions, e.g. E = -(B*B); G = -(E*-(A*A));
J = -(-(-(G*C)*G)*-(E*C)); so I wonder if the program is buggy.)
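As a sanity check (my addition here, not part of the original
derivation), here's a short script verifying that both sum-of-products
forms and the five-NAND circuit reproduce the bits of the rule number,
with A = left, B = center, C = right as in the gate listing above:

    #!/usr/bin/python
    # Sanity check: both SOP forms and the five-NAND circuit
    # H = -(-(-(C*A)*B)*-(-(B*A)*C)) should agree with rule 110
    # on all 8 neighborhoods.

    RULE = 110

    def nand(x, y):
        return 1 - (x & y)

    for left in (0, 1):
        for center in (0, 1):
            for right in (0, 1):
                # Bit (4L + 2C + R) of the rule number is the successor.
                want = (RULE >> (4 * left + 2 * center + right)) & 1
                l, c, r = left, center, right
                sop1 = ((1 - r) & c) | ((1 - l) & c) | (r & (1 - c))
                sop2 = ((1 - r) & c) | (r & (1 - l)) | (r & (1 - c))
                # In the gate listing, A is the 4's input, B the 2's,
                # and C the 1's.
                a, b, cc = left, center, right
                h = nand(nand(nand(cc, a), b), nand(nand(b, a), cc))
                assert sop1 == sop2 == h == want, (left, center, right)

    print('all 8 neighborhoods check out')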
A suggested embodiment: barely subcritical lasers
-------------------------------------------------

You could imagine that you might be able to find a physical system
that exhibits this precise behavior on streams of incoming encoded
bits: some funny-shaped laser, for example.  It merely needs four
internal states, or three different inputs with slightly different
delay lengths.

Suppose you have a 10-meter spool of fiber-optic cable carrying
1-picosecond laser pulses, coming out of a smallish laser that's kept
very close to the population-inversion threshold.  (The laser can't be
much bigger than 0.3 mm, or it gets hard to generate such quick
pulses.)  The spool acts as a delay-line memory containing about 33
kilobits of information.  It feeds back into the laser in some
unspecified way, determining with its small energy input whether the
laser is above or below the population-inversion threshold, in a way
carefully designed to produce the above truth table on the last three
bits fed into the laser.

(It might be smart to encode bits in the timing of the pulses rather
than in their amplitude.  For example, as in the Internet-0 protocol,
each bit-time could be divided into a zero half and a one half, with a
pulse present in one half or the other.)

(This hypothetical computational machine takes advantage of an
often-lamented feature of current optical technology: communication
and storage are very easy, while logic and switching are very
difficult.)

So this hypothetical optical system transforms bits at 1 THz, although
it is in effect simulating some 33 000 machines running in parallel at
30 MHz each.  Femtosecond laser pulses have been demonstrated, so
pumping the speed up to a petahertz (and consequently the storage up
to 33 megabits) isn't implausible.
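The capacity and speed figures are easy to re-derive; here's the
back-of-envelope arithmetic as a script.  This is my own check, and
like the 33-kilobit figure above it assumes signals travel at the
vacuum speed of light; real fiber, with a refractive index around 1.5,
would hold roughly half again as many bits:

    #!/usr/bin/python
    # Back-of-envelope check of the delay-line figures, assuming the
    # vacuum speed of light as the 33-kilobit figure apparently does.

    c = 3e8                        # speed of light, m/s
    fiber = 10.0                   # spool length, m
    bit_time = 1e-12               # 1 ps pulses, i.e. a 1 THz bit rate

    delay = fiber / c              # ~33 ns to traverse the spool
    capacity = delay / bit_time    # bits in flight at once
    print('capacity: %d bits' % capacity)               # ~33000

    # One "virtual machine" per stored cell:
    print('per-cell update rate: %d MHz'
          % (1 / bit_time / capacity / 1e6))            # ~30

    # Femtosecond pulses scale the same spool up a thousandfold:
    print('at 1 fs per bit: %d megabits'
          % (delay / 1e-15 / 1e6))                      # ~33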
A modern high-performance CPU has on the order of 100 million gates,
which change state on the order of 4 billion times a second, so if it
operated bit-serially, it would have to change bit states at around
400 petahertz.  It's therefore plausible that our hypothetical
femtosecond laser could perform useful computations either faster or
slower than a standard CPU.  I'm pretty sure someone else has done
work on using the chaotic behavior of lasers close to the critical
threshold for computation, but I can't remember who.

Bucket brigades or pipelines provide linear scaling
---------------------------------------------------

If, instead of routing the laser's output back to itself after some
delay, we routed it to another such laser, whose output was routed
back to the first, the bits would run through two generations on each
round trip.  This can be repeated arbitrarily, the limitations being
how many laser assemblies you can afford (with money and reliability)
and how close together you can put them.
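To make that concrete, here's a minimal software model (my sketch of
the idea, not a hardware design): each "laser" stage applies one
rule-110 generation to the circulating line of cells, so a ring of k
stages advances the line k generations per round trip:

    #!/usr/bin/python
    # Minimal software model of the bucket brigade: each "laser" stage
    # applies one rule-110 generation to the circulating cells, so a
    # ring of nstages stages gives nstages generations per round trip.

    RULE = 110

    def stage(cells):
        # One pass through one "laser": like the hardware, it only
        # ever needs the last three bits that streamed past.  The end
        # cells stay dead.
        out = [0] * len(cells)
        for j in range(1, len(cells) - 1):
            index = 4 * cells[j-1] + 2 * cells[j] + cells[j+1]
            out[j] = (RULE >> index) & 1
        return out

    def round_trip(cells, nstages):
        for _ in range(nstages):
            cells = stage(cells)
        return cells

    if __name__ == '__main__':
        cells = [0] * 72
        cells[36] = 1
        for _ in range(16):            # 16 round trips of 2 stages
            print(''.join(' #'[x] for x in cells))
            cells = round_trip(cells, nstages=2)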
Input and output
----------------

I/O is an interesting concern.  If the laser generates enough power
that we can spy on its memory state as it goes by, we can run the bit
stream through some other finite-state machine that looks for patterns
indicating "data to be output", and similarly "data to be input".  I
don't yet have any experience programming useful computation on such a
machine, so I don't know what form this would take.  In an N-laser
bucket-brigade configuration, presumably the opportunity for I/O would
arise at only one place in the brigade, so only one generation out of
every N would have the opportunity to participate in I/O.

Nuclear reactions as CA computational elements
----------------------------------------------

Nuclear reactions are another possible computational element that
could be used in this way.  If you could build some substance whose
nuclei have four distinct states and can be persuaded, for example, to
fission and emit more particles (ideally alpha particles or other
charged particles) when bombarded with incoming particles in the
correct sequence, then you could store your machine state in a series
of pulses in a particle beam.

I don't know how fast nuclear reactions happen, although I've seen
abstracts suggesting they're on the order of 10^-20 seconds (for
example, "Near- and sub-barrier fission fragment anisotropies and the
failure of the statistical theory of fission decay rates", J. P.
Lestone et al., 1997, J. Phys. G: Nucl. Part. Phys. 23 1349-1357,
http://www.iop.org/EJ/abstract/0954-3899/23/10/024).  Chemical
reactions already happen on the order of femtoseconds (10^-15
seconds), and I assume some nuclear reactions are even faster, since
they involve similar masses moving much smaller distances.

If a single nuclear-powered cellular-automaton engine could process
10^20 cells per second, it would process 10 000 times as many bits per
second as a conventional semiconductor-based machine (taking such a
machine to perform about 10^16 bit-operations per second), and there's
nothing preventing 100 or 1000 of these engines from being placed in a
bucket brigade.  However, I suspect that each element would have to be
no more than 10^-20 light-seconds in diameter to avoid smearing out
the signal, which is around 3 picometers --- smaller than most atomic
diameters.  For comparison, current X-ray lithography techniques
produce features 45 nanometers in size, which is already 1.5 * 10^-16
light-seconds across, and existing lasers produce pulses around 10^-15
to 10^-14 seconds long.  So it might be practically difficult to
achieve these theoretical speed advantages of computing with nuclear
reactions.

Simulating 2-D CAs with 1-D CAs
-------------------------------

If we scan across the current state of a 2-D CA in raster fashion,
feeding each cell state into such a 1-D CA as an additional input to
its state machine, we can perhaps program the 1-D CA to store the
current state of a couple of rows of the 2-D CA, and to produce new
states of the 2-D CA at intervals, much as the Cytocomputer did.  (A
software sketch of this scheme appears at the end of this note.)

Related work
------------

A lot of this posting recapitulates my "really simple computers"
kragen-tol posting from February 2003:
http://lists.canonical.org/pipermail/kragen-tol/2003-February/000738.html

The Cytocomputer, built by the Environmental Research Institute of
Michigan, used a related technique for machine vision with
two-dimensional cellular automata: a pipeline of ASICs, each of which
contained two rows of the image plus three cells.  Each clock cycle,
each stage would output the 8-bit state of the cell currently being
computed, discard the upper-left-hand corner of that cell's 9-cell
neighborhood, and read the lower-right-hand corner of the next cell's
neighborhood from the previous stage.  See (expired) US Patents
4,665,554 and 4,665,551, by Stanley R. Sternberg, and related patents
for details.

There's an unpublished paper showing that an arbitrary number of delay
elements and an OR gate comprise a universal computer.  (In my
previous article, I attributed it as follows: Miles Murdocca, at Bell
Labs, in an unpublished paper, between 1983 and 1993, according to an
article by J. Storrs Hall in "Extropy" #10, vol. 4, no. 2,
Winter/Spring 1993, entitled "Nanocomputers: 21st century
hypercomputing".  I have a very vague memory of corresponding with
someone (perhaps Dr. Murdocca?) about this afterwards, but
unfortunately I cannot find any such correspondence in my archives.
Probably bits and pieces of that work are subconsciously embedded in
the above.)
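Appendix: a sketch of the raster-scan pipeline
----------------------------------------------

To make the 2-D-on-1-D scheme concrete, here's a minimal software
model of a single pipeline stage (my sketch; the real Cytocomputer
differs in details such as its 8-bit cell states).  Cells arrive in
raster order; the stage buffers two image rows plus three cells, so
that when cell (i+1, j+1) arrives it holds the entire 3x3 neighborhood
of cell (i, j) and can emit that cell's successor, discarding the
upper-left corner as it goes.  Conway's Life stands in as a familiar
2-D rule:

    #!/usr/bin/python
    # One stage of a raster-scan 2-D CA pipeline: a sketch of the
    # idea, not the Cytocomputer's actual design.

    def life(nbhd):
        # nbhd: the 9 neighborhood cells in raster order, center at
        # index 4.
        neighbors = sum(nbhd) - nbhd[4]
        return 1 if neighbors == 3 or (neighbors == 2 and nbhd[4]) else 0

    def stage(stream, width, rule=life):
        # Yields successors, in raster order, of every cell that has a
        # full 3x3 neighborhood, lagging the input by width+1 cells.
        buf = []
        for k, cell in enumerate(stream):
            buf.append(cell)
            if len(buf) < 2 * width + 3:
                continue
            column = (k - width - 1) % width  # the completed cell
            if 0 < column < width - 1:        # skip left/right edges
                yield rule(buf[0:3] + buf[width:width+3]
                           + buf[2*width:2*width+3])
            buf.pop(0)            # discard the upper-left-hand corner

    if __name__ == '__main__':
        grid = [0] * 60                       # a 6x10 grid with a glider
        for i, j in [(1, 3), (2, 4), (3, 2), (3, 3), (3, 4)]:
            grid[i * 10 + j] = 1
        print(list(stage(grid, width=10)))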