When I refer to a quantity of information, I mean its algorithmic complexity, 
the size of the smallest program that generates it.  So yes, the Mandelbrot set 
contains very little information.  I realize that algorithmic complexity is not 
computable in general.  When I express AI or language modeling in terms of 
compression, I mean that the goal is to get as close to this uncomputable limit 
as possible.
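
To make this concrete, here is a sketch (Python; the grid size and iteration 
limit are arbitrary choices) that draws a crude picture of the Mandelbrot set.  
The whole pattern follows from a program of a few hundred bytes, which is the 
sense in which it contains little information:

for y in range(24):
    row = ""
    for x in range(64):
        c = complex(-2.0 + 3.0 * x / 64, -1.2 + 2.4 * y / 24)
        z = 0j
        for _ in range(30):              # escape-time test
            z = z * z + c
            if abs(z) > 2:               # escaped: outside the set
                row += " "
                break
        else:                            # never escaped: inside the set
            row += "*"
    print(row)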

Algorithmic complexity applies to both finite and infinite sequences.  For 
example, the algorithmic complexity of a string of n zero bits is about 
log n + C for some constant C that depends on your choice of universal Turing 
machine; the program only has to encode n, which takes about log n bits.  The 
complexity of an infinite string of zero bits is a (small) constant C, because 
a fixed-size program can output zeros forever.
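
A sketch of the finite case (Python; the choice of n is arbitrary):

n = 10**6          # the program encodes n in about log n characters,
print("0" * n)     # but the output is n characters long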

When I talk about Kauffman's assertion that complex systems evolve toward the 
boundary between stability and chaos, I mean a discrete approximation of these 
concepts.  These are defined for dynamical systems in real vector spaces 
governed by differential equations.  (For continuous systems, chaos requires 
at least 3 dimensions.)  A system is chaotic if its largest Lyapunov exponent 
is greater than 0, and stable if it is less than 0.  Extensions to discrete 
systems have been described.  For example, the logistic map x := rx(1 - x), 
0 < x < 1, goes from stable to chaotic as r grows from 0 to 4, with the onset 
of chaos near r = 3.57.  For discrete spaces, pseudo random number generators 
are simple examples of chaotic systems.

Kauffman studied chaos in large discrete systems (state machines built from 
randomly connected logic gates) and found that they transition from stable to 
chaotic as the number of inputs per gate is increased from 2 to 3.  At the 
boundary, the number of discrete attractors (repeating cycles) is about the 
square root of the number of variables.  Kauffman noted that gene regulation 
can be modeled this way (gene combinations turn other genes on or off), and 
that the number of human cell types (254) is about the square root of the 
number of genes (he estimated 100K; the actual count is closer to 30K).  I 
noted (coincidentally?) that vocabulary size is about the square root of the 
size of a language model.
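
Both claims are easy to check numerically.  First, a sketch (Python; the 
starting point, warmup, and sample counts are arbitrary) that estimates the 
Lyapunov exponent of the logistic map as the average of log|f'(x)| along the 
orbit, where f'(x) = r(1 - 2x):

import math

def lyapunov(r, x=0.3, warmup=1000, steps=10000):
    for _ in range(warmup):                  # discard the transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(steps):
        x = r * x * (1 - x)
        total += math.log(abs(r * (1 - 2 * x)) + 1e-12)  # guard against log(0)
    return total / steps                     # > 0 chaotic, < 0 stable

for r in (2.8, 3.2, 3.5, 3.8, 4.0):
    print(r, lyapunov(r))                    # crosses zero near r = 3.57

Second, a sketch of a Kauffman-style random Boolean network (n is kept small 
so the run always terminates; counting the number of distinct attractors 
would require sampling many starting states, so this only reports cycle 
lengths):

import random

def cycle_length(n=16, k=2, seed=0):
    rnd = random.Random(seed)
    # each gate reads k random inputs through a random truth table
    inputs = [[rnd.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [[rnd.randrange(2) for _ in range(2 ** k)] for _ in range(n)]
    state = tuple(rnd.randrange(2) for _ in range(n))
    seen = {}
    t = 0
    while state not in seen:                 # iterate until a state repeats
        seen[state] = t
        t += 1
        state = tuple(
            tables[i][sum(state[j] << b for b, j in enumerate(inputs[i]))]
            for i in range(n))
    return t - seen[state]                   # length of the repeating cycle

for k in (2, 3):                             # cycles are typically much longer at k = 3
    print(k, [cycle_length(k=k, seed=s) for s in range(10)])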

The significance of this for AI is that I believe it bounds the degree of 
interconnectedness of knowledge.  It cannot be so great that small updates to 
the AI cause large changes in behavior.  This places limits on what we can 
build.  For example, in a neural network with feedback loops, the weights 
would have to be kept small, as in the sketch below.
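
A minimal sketch of why small weights matter (Python with numpy; the network 
size and the 0.9 scaling factor are arbitrary choices).  Scaling the feedback 
weights so their spectral radius is below 1 keeps nearby states from 
diverging, the same idea used in echo state networks:

import numpy as np

rng = np.random.default_rng(0)
n = 100
W = rng.normal(size=(n, n))
# scale so the spectral radius (largest |eigenvalue|) is 0.9 < 1
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

x = rng.normal(size=n)
y = x + 1e-6 * rng.normal(size=n)    # a slightly perturbed copy
for _ in range(100):
    x = np.tanh(W @ x)               # feedback loop
    y = np.tanh(W @ y)
print(np.linalg.norm(x - y))         # stays tiny; above 1, nearby states diverge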

We should not confuse symbols with meaning.  A language model associates 
patterns of symbols with other patterns of symbols.  It is not grounded.  A 
model does not need vision to know that the sky is blue.  They are just words.  
I believe that an ungrounded model (plus a discourse model, which has a sense 
of time and who is speaking) can pass the Turing test.
 
I don't believe all of the conditions are in place for a hard takeoff yet.  You 
need:
1. Self replicating computers.
2. AI smart enough to write programs from natural language specifications.
3. Enough hardware on the Internet to support AGI.
4. Execute access (a way for the AI to run its own code on that hardware).

1. Computer manufacturing depends heavily on computer automation, but you 
still need humans to make it all work.

2. AI language models are now at the level of a toddler, able to recognize 
simple sentences of a few words, but they can already learn in hours or days 
what takes a human years.

3. I estimate an adult-level language model will fit on a PC, but it would 
take 3 years to train it.  A massively parallel architecture like Google's 
MapReduce could do it in an hour (3 years is roughly 26,000 hours), but it 
would require a high speed network.  A distributed implementation like GIMPS 
or SETI@home would not have enough interconnection speed to support a language 
model.  I think you need about a 1 Gb/s connection with low latency to 
distribute it over a few hundred PCs.

4. Execute access is one buffer overflow away.


-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----
From: Mike Dougherty <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Saturday, November 18, 2006 1:32:05 AM
Subject: Re: [agi] A question on the symbol-system hypothesis

I'm not sure I follow every twist in this thread.  No... I'm sure I don't 
follow every twist in this thread.

I have a question about this compression concept.  Compute the number of pixels 
required to graph the Mandelbrot set at whatever detail you feel to be 
sufficient for the sake of example.  Now describe how this 'pattern' is 
compressed.  Of course the ideal compression is something like 6 bytes.  Show 
me a 6 byte jpg of a mandelbrot  :)


Is there a concept of compression of an infinite series?  Or was the term 
"bounding" being used to describe the attractor around which the values tend 
to fall?  Chaotic attractor, statistical median, etc. all seem to describe the 
same tendency of human pattern recognition across different types of data.


Is a 'symbol' an idea, or a handle on an idea?  Does this support the mechanics 
of how concepts can be built from agreed-upon ideas to make a new token we can 
exchange in communication that represents the sum of the constituent ideas?   
If this symbol-building process is used to communicate ideas across a highly 
volatile link (from me to you) then how would these symbols be used by a single 
computation machine?  (Is that a hard takeoff situation, where the near zero 
latency turns into an exponential increase in symbol complexity per unit time?)


If you could provide some feedback as a reality check on these thoughts, I'd 
appreciate the clarification... thanks.

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?list_id=303
