Universal cellular automata
---------------------------

Stephen Wolfram conjectured, and Matthew Cook eventually proved, that
certain one-dimensional, two-state, three-cell-neighborhood cellular
automata are Turing-complete, in particular rule 110.  Here's a simple
Python script for running such 1-D CAs:

    #!/usr/bin/python
    import sys

    def display(line):
        # Draw live cells as '#' and dead cells as spaces.
        sys.stdout.write(''.join([' #'[x] for x in line]) + '\n')

    def run(n=110, ng=1000, nc=400, display=display):
        cells = [0] * nc
        cells[nc // 2] = 1          # start from a single live cell
        for ii in range(ng):
            display(cells)
            newcells = [0] * len(cells)
            # Cells 0 and nc-1 stay dead; the rest follow rule n.
            for jj in range(1, len(newcells) - 1):
                index = 4 * cells[jj-1] + 2 * cells[jj] + cells[jj+1]
                # Bit `index` of the rule number is the successor state.
                newcells[jj] = 1 if n & (1 << index) else 0
            cells = newcells

    if __name__ == '__main__':
        run()

So rule 110 is indeed a universal computer (which seems plausible even
from the output of this program), and a universal computer can be
quite simple.

Computing rule 110
------------------

The truth table is 01101110, reading from neighborhood 111 down to
000.  As a Karnaugh map, with the left cell (the 4's bit of the
neighborhood index) down the side and the center (2's) and right (1's)
cells along the top in Gray-code order:

                   center, right
                   00   01   11   10
      left = 0      0    1    1    1
      left = 1      0    1    0    1

Hope I got that diagram right.  Here are some sum-of-products forms it
suggests:

    ~R & C  |  ~L & C  |  R & ~C    (equivalently (R ^ C) | ~L & C)
    ~R & C  |  R & ~L  |  R & ~C    (equivalently (R ^ C) | ~L & R)

I used the program at
http://lists.canonical.org/pipermail/kragen-hacks/2003-February/000365.html
(which takes the truth table backwards from the Python program and
from how I've listed it above), and it finds a number of
five-NAND-gate solutions, the first of which is
H = -(-(-(C*A)*B)*-(-(B*A)*C)):

    A:        00001111
    B:        00110011
    C:        01010101
    D=~(C*A): 11111010  (delay 1)
    E=~(D*B): 11001101  (delay 2)
    F=~(B*A): 11111100  (delay 1)
    G=~(F*C): 10101011  (delay 2)
    H=~(G*E): 01110110  (delay 3)  (correct)

(Note that 01110110 is just 01101110 read backwards, per that
program's reversed truth-table convention.)  None of the
five-NAND-gate solutions is shallower (i.e. faster) than that
three-propagation-delay circuit.  (I'm a little dubious about this
because, when the truth table was reversed, the program needed 7 gates
to find solutions, e.g. E = -(B*B); G = -(E*-(A*A));
J = -(-(-(G*C)*G)*-(E*C)); so I wonder if the program is buggy.)
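As a sanity check (my addition here, not part of the original
derivation), here's a short script verifying that both sum-of-products
forms and the five-NAND circuit reproduce the bits of the rule number,
with A = left, B = center, C = right as in the gate listing above:

    #!/usr/bin/python
    # Sanity check: both SOP forms and the five-NAND circuit
    # H = -(-(-(C*A)*B)*-(-(B*A)*C)) should agree with rule 110
    # on all 8 neighborhoods.

    RULE = 110

    def nand(x, y):
        return 1 - (x & y)

    for left in (0, 1):
        for center in (0, 1):
            for right in (0, 1):
                # Bit (4L + 2C + R) of the rule number is the successor.
                want = (RULE >> (4 * left + 2 * center + right)) & 1
                l, c, r = left, center, right
                sop1 = ((1 - r) & c) | ((1 - l) & c) | (r & (1 - c))
                sop2 = ((1 - r) & c) | (r & (1 - l)) | (r & (1 - c))
                # In the gate listing, A is the 4's input, B the 2's,
                # and C the 1's.
                a, b, cc = left, center, right
                h = nand(nand(nand(cc, a), b), nand(nand(b, a), cc))
                assert sop1 == sop2 == h == want, (left, center, right)

    print('all 8 neighborhoods check out')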
A suggested embodiment: barely subcritical lasers
-------------------------------------------------

You could imagine that you might be able to find a physical system
that exhibits this precise behavior on streams of incoming encoded
bits: some funny-shaped laser, for example.  It merely needs four
internal states, or three different inputs with slightly different
delay lengths.

Suppose you have a 10-meter spool of fiber-optic cable carrying
1-picosecond laser pulses, coming out of a smallish laser that's kept
very close to the population-inversion threshold.  (The laser can't be
much bigger than 0.3 mm, or it gets hard to generate such quick
pulses.)  The spool acts as a delay-line memory containing about 33
kilobits of information.  It feeds back into the laser in some
unspecified way, determining with its small energy input whether the
laser is above or below the population-inversion threshold, in a way
carefully designed to produce the above truth table on the last three
bits fed into the laser.

(It might be smart to encode bits in the timing of the pulses rather
than in their amplitude.  For example, as in the Internet-0 protocol,
each bit-time could be divided into a zero half and a one half, with a
pulse present in one half or the other.)

(This hypothetical computational machine takes advantage of an
often-lamented feature of current optical technology: communication
and storage are very easy, while logic and switching are very
difficult.)

So this hypothetical optical system transforms bits at 1 THz, although
it is in effect simulating some 33 000 machines running in parallel at
30 MHz each.  Femtosecond laser pulses have been demonstrated, so
pumping the speed up to a petahertz (and consequently the storage up
to 33 megabits) isn't implausible.
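The capacity and speed figures are easy to re-derive; here's the
back-of-envelope arithmetic as a script.  This is my own check, and
like the 33-kilobit figure above it assumes signals travel at the
vacuum speed of light; real fiber, with a refractive index around 1.5,
would hold roughly half again as many bits:

    #!/usr/bin/python
    # Back-of-envelope check of the delay-line figures, assuming the
    # vacuum speed of light as the 33-kilobit figure apparently does.

    c = 3e8                        # speed of light, m/s
    fiber = 10.0                   # spool length, m
    bit_time = 1e-12               # 1 ps pulses, i.e. a 1 THz bit rate

    delay = fiber / c              # ~33 ns to traverse the spool
    capacity = delay / bit_time    # bits in flight at once
    print('capacity: %d bits' % capacity)               # ~33000

    # One "virtual machine" per stored cell:
    print('per-cell update rate: %d MHz'
          % (1 / bit_time / capacity / 1e6))            # ~30

    # Femtosecond pulses scale the same spool up a thousandfold:
    print('at 1 fs per bit: %d megabits'
          % (delay / 1e-15 / 1e6))                      # ~33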
A modern high-performance CPU has on the order of 100 million gates,
which change state on the order of 4 billion times a second, so if it
operated bit-serially, it would have to change bit states at around
400 petahertz.  It's therefore plausible that our hypothetical
femtosecond laser could perform useful computations either faster or
slower than a standard CPU.  I'm pretty sure someone else has done
work on using the chaotic behavior of lasers close to the critical
threshold for computation, but I can't remember who.

Bucket brigades or pipelines provide linear scaling
---------------------------------------------------

If, instead of routing the laser's output back to itself after some
delay, we routed it to another such laser, whose output was routed
back to the first, the bits would run through two generations on each
round trip.  This can be repeated arbitrarily, the limitations being
how many laser assemblies you can afford (with money and reliability)
and how close together you can put them.
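To make that concrete, here's a minimal software model (my sketch of
the idea, not a hardware design): each "laser" stage applies one
rule-110 generation to the circulating line of cells, so a ring of k
stages advances the line k generations per round trip:

    #!/usr/bin/python
    # Minimal software model of the bucket brigade: each "laser" stage
    # applies one rule-110 generation to the circulating cells, so a
    # ring of nstages stages gives nstages generations per round trip.

    RULE = 110

    def stage(cells):
        # One pass through one "laser": like the hardware, it only
        # ever needs the last three bits that streamed past.  The end
        # cells stay dead.
        out = [0] * len(cells)
        for j in range(1, len(cells) - 1):
            index = 4 * cells[j-1] + 2 * cells[j] + cells[j+1]
            out[j] = (RULE >> index) & 1
        return out

    def round_trip(cells, nstages):
        for _ in range(nstages):
            cells = stage(cells)
        return cells

    if __name__ == '__main__':
        cells = [0] * 72
        cells[36] = 1
        for _ in range(16):            # 16 round trips of 2 stages
            print(''.join(' #'[x] for x in cells))
            cells = round_trip(cells, nstages=2)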
Input and output
----------------

I/O is an interesting concern.  If the laser generates enough power
that we can spy on its memory state as it goes by, we can run the bit
stream through some other finite-state machine that looks for patterns
indicating "data to be output", and similarly "data to be input".  I
don't yet have any experience programming useful computation on such a
machine, so I don't know what form this would take.  In an N-laser
bucket-brigade configuration, presumably the opportunity for I/O would
arise at only one place in the brigade, so only one generation out of
every N would have the opportunity to participate in I/O.

Nuclear reactions as CA computational elements
----------------------------------------------

Nuclear reactions are another possible computational element that
could be used in this way.  If you could build some substance whose
nuclei have four distinct states and can be persuaded, for example, to
fission and emit more particles (ideally alpha particles or other
charged particles) when bombarded with incoming particles in the
correct sequence, then you could store your machine state in a series
of pulses in a particle beam.

I don't know how fast nuclear reactions happen, although I've seen
abstracts suggesting they're on the order of 10^-20 seconds (for
example, "Near- and sub-barrier fission fragment anisotropies and the
failure of the statistical theory of fission decay rates", J. P.
Lestone et al., 1997, J. Phys. G: Nucl. Part. Phys. 23 1349-1357,
http://www.iop.org/EJ/abstract/0954-3899/23/10/024).  Chemical
reactions already happen on the order of femtoseconds (10^-15
seconds), and I assume some nuclear reactions are even faster, since
they involve similar masses moving much smaller distances.

If a single nuclear-powered cellular-automaton engine could process
10^20 cells per second, it would process 10 000 times as many bits per
second as a conventional semiconductor-based machine (taking such a
machine to perform about 10^16 bit-operations per second), and there's
nothing preventing 100 or 1000 of these engines from being placed in a
bucket brigade.  However, I suspect that each element would have to be
no more than 10^-20 light-seconds in diameter to avoid smearing out
the signal, which is around 3 picometers --- smaller than most atomic
diameters.  For comparison, current X-ray lithography techniques
produce features 45 nanometers in size, which is already 1.5 * 10^-16
light-seconds across, and existing lasers produce pulses around 10^-15
to 10^-14 seconds long.  So it might be practically difficult to
achieve these theoretical speed advantages of computing with nuclear
reactions.

Simulating 2-D CAs with 1-D CAs
-------------------------------

If we scan across the current state of a 2-D CA in raster fashion,
feeding each cell state into such a 1-D CA as an additional input to
its state machine, we can perhaps program the 1-D CA to store the
current state of a couple of rows of the 2-D CA, and to produce new
states of the 2-D CA at intervals, much as the Cytocomputer did.  (A
software sketch of this scheme appears at the end of this note.)

Related work
------------

A lot of this posting recapitulates my "really simple computers"
kragen-tol posting from February 2003:
http://lists.canonical.org/pipermail/kragen-tol/2003-February/000738.html

The Cytocomputer, built by the Environmental Research Institute of
Michigan, used a related technique for machine vision with
two-dimensional cellular automata: a pipeline of ASICs, each of which
contained two rows of the image plus three cells.  Each clock cycle,
each stage would output the 8-bit state of the cell currently being
computed, discard the upper-left-hand corner of that cell's 9-cell
neighborhood, and read the lower-right-hand corner of the next cell's
neighborhood from the previous stage.  See (expired) US Patents
4,665,554 and 4,665,551, by Stanley R. Sternberg, and related patents
for details.

There's an unpublished paper showing that an arbitrary number of delay
elements and an OR gate comprise a universal computer.  (In my
previous article, I attributed it as follows: Miles Murdocca, at Bell
Labs, in an unpublished paper, between 1983 and 1993, according to an
article by J. Storrs Hall in "Extropy" #10, vol. 4, no. 2,
Winter/Spring 1993, entitled "Nanocomputers: 21st century
hypercomputing".  I have a very vague memory of corresponding with
someone (perhaps Dr. Murdocca?) about this afterwards, but
unfortunately I cannot find any such correspondence in my archives.
Probably bits and pieces of that work are subconsciously embedded in
the above.)
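Appendix: a sketch of the raster-scan pipeline
----------------------------------------------

To make the 2-D-on-1-D scheme concrete, here's a minimal software
model of a single pipeline stage (my sketch; the real Cytocomputer
differs in details such as its 8-bit cell states).  Cells arrive in
raster order; the stage buffers two image rows plus three cells, so
that when cell (i+1, j+1) arrives it holds the entire 3x3 neighborhood
of cell (i, j) and can emit that cell's successor, discarding the
upper-left corner as it goes.  Conway's Life stands in as a familiar
2-D rule:

    #!/usr/bin/python
    # One stage of a raster-scan 2-D CA pipeline: a sketch of the
    # idea, not the Cytocomputer's actual design.

    def life(nbhd):
        # nbhd: the 9 neighborhood cells in raster order, center at
        # index 4.
        neighbors = sum(nbhd) - nbhd[4]
        return 1 if neighbors == 3 or (neighbors == 2 and nbhd[4]) else 0

    def stage(stream, width, rule=life):
        # Yields successors, in raster order, of every cell that has a
        # full 3x3 neighborhood, lagging the input by width+1 cells.
        buf = []
        for k, cell in enumerate(stream):
            buf.append(cell)
            if len(buf) < 2 * width + 3:
                continue
            column = (k - width - 1) % width  # the completed cell
            if 0 < column < width - 1:        # skip left/right edges
                yield rule(buf[0:3] + buf[width:width+3]
                           + buf[2*width:2*width+3])
            buf.pop(0)            # discard the upper-left-hand corner

    if __name__ == '__main__':
        grid = [0] * 60                       # a 6x10 grid with a glider
        for i, j in [(1, 3), (2, 4), (3, 2), (3, 3), (3, 4)]:
            grid[i * 10 + j] = 1
        print(list(stage(grid, width=10)))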