Matt,

Thank you for your reply.  I find it very thought-provoking.

-----Original Message-----
From: Matt Mahoney [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 12, 2008 7:23 PM
To: agi@v2.listbox.com
Subject: RE: [agi] IBM, Los Alamos scientists claim fastest computer

--- On Thu, 6/12/08, Ed Porter <[EMAIL PROTECTED]> wrote:

> I think processor to memory, and inter processor
> communications are currently far short

Your concern is over the added cost of implementing a sparsely connected
network, which slows memory access and requires more memory for
representation (e.g. pointers in addition to a weight matrix). We can
alleviate much of the problem by using connection locality.
[Ed Porter] -- This would certainly be true if it worked.


The brain has about 10^11 neurons with 10^4 synapses per neuron. If we
divide this work among 10^6 processors, each representing 1 mm^3 of brain
tissue, then each processor must implement 10^5 neurons and 10^9 synapses.
By my earlier argument, there can be at most 10^6 external connections
assuming a 1-2 micron nerve fiber diameter, 

[Ed Porter] -- Why couldn't each of the 10^6 fibers have multiple
connections along its length within the mm^3 cube (although it could be
represented as one row in the matrix, with individual connections
represented as elements in such a row)?
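For concreteness, a back-of-envelope sketch in C of the figures above (the
1 mm^3 cube, the 1-2 micron fiber diameter, and the brain-wide counts are
the numbers quoted above; the ~2 square micron fiber cross-section is an
assumed illustrative value):

    /* Back-of-envelope numbers for one 1 mm^3 cube of the 10^6-processor
       partition described above.  Pure arithmetic, no data involved. */
    #include <stdio.h>

    int main(void) {
        double neurons_brain  = 1e11;   /* neurons in the brain          */
        double syn_per_neuron = 1e4;    /* synapses per neuron           */
        double processors     = 1e6;    /* one per 1 mm^3 of tissue      */

        double neurons_cube = neurons_brain / processors;       /* 10^5 */
        double syn_cube     = neurons_cube * syn_per_neuron;    /* 10^9 */

        /* Fibers crossing the cube surface: 6 faces of 1 mm^2 each,
           with an assumed ~2 um^2 cross-section per 1-2 um fiber. */
        double surface_um2 = 6.0 * 1e6;                 /* 6 mm^2 in um^2 */
        double fiber_um2   = 2.0;                       /* assumed        */
        double max_fibers  = surface_um2 / fiber_um2;   /* order 10^6     */

        printf("neurons per cube   : %.0e\n", neurons_cube);
        printf("synapses per cube  : %.0e\n", syn_cube);
        printf("max external fibers: %.0e\n", max_fibers);
        return 0;
    }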


so half of the connections must be local. This is true at any scale because
when you double the size of a cube, you increase the number of neurons by 8
but increase the number of external connections by 4. Thus, for any size
cube, half of the external connections are to neighboring cubes and half are
to more distant cubes.

[Ed Porter] -- I am getting lost here.  Why are half the connections local?
You implied there are 10^6 external connections in the mm^3 cube, and 10^9
synapses, which are the connections.  Thus the 10^6 external connections you
mention are only 1/1000 of the 10^9 total connections you mention in the
mm^3 cube, not one half as you say.  I understand that there are likely to
be as many connections leaving the cube as entering it, which is related,
but not the same thing as saying half the connections in the mm^3 cube are
external. 

[Ed Porter] -- It is true that the factor by which the surface-to-volume
ratio changes at each doubling of scale remains 1/2, but that means the
actual ratio of surface to volume is halved at each such doubling, so the
ratio actually DOES CHANGE with scaling, rather than remaining constant, as
indicated above.
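To make the scaling question concrete, a small illustrative sketch that just
applies the x8-volume / x4-surface rule per doubling to the 1 mm^3 starting
figures from above:

    /* Illustrative scaling sketch: internal synapse count grows with the
       cube's volume (x8 per doubling of the side), while the number of
       fibers that can cross its surface grows with area (x4 per doubling).
       Starting figures are the 1 mm^3 numbers from the thread. */
    #include <stdio.h>

    int main(void) {
        double side_mm    = 1.0;
        double synapses   = 1e9;   /* internal synapses in the 1 mm^3 cube */
        double ext_fibers = 1e6;   /* external fibers through its surface  */

        for (int i = 0; i < 4; i++) {
            printf("side %4.0f mm  synapses %.0e  external %.0e  ratio %.1e\n",
                   side_mm, synapses, ext_fibers, ext_fibers / synapses);
            side_mm    *= 2.0;
            synapses   *= 8.0;   /* volume term  */
            ext_fibers *= 4.0;   /* surface term */
        }
        return 0;
    }

The printed ratio starts at about 1/1000 for the 1 mm cube and is halved at
each doubling of the side.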

A 1 mm^3 cube can be implemented as a fully connected 10^5 by 10^5 matrix of
10^10 connections. This could be implemented as a 1.25 GB array of bits with
5% of bits set to 1 representing a connection. 

[Ed Porter] -- A synapse would have multiple weights, such as short-term and
long-term weights, and each would be more than one bit.  Plus some synapses
are excitatory and others are inhibitory, so they would have differing
signs.  So multiple bits, probably at least two bytes, would be necessary
per element in the matrix.
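For scale, a minimal sketch comparing the storage implied by the two figures
under discussion (1 bit per matrix element versus the roughly 2 bytes per
element suggested just above):

    /* Storage for a fully connected 10^5 x 10^5 matrix for one cube:
       1 bit per element vs. 2 bytes per element (short-term weight,
       long-term weight, sign, as suggested above). */
    #include <stdio.h>

    int main(void) {
        double n        = 1e5;        /* neurons per cube      */
        double elements = n * n;      /* 10^10 matrix elements */

        double one_bit_bytes  = elements / 8.0;   /* 1 bit per element   */
        double two_byte_bytes = elements * 2.0;   /* 2 bytes per element */

        printf("1 bit per element  : %.2f GB\n", one_bit_bytes  / 1e9);
        printf("2 bytes per element: %.2f GB\n", two_byte_bytes / 1e9);
        return 0;
    }

That is 1.25 GB for the one-bit matrix versus about 20 GB at two bytes per
element.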

[Ed Porter] -- Also, you haven't explained how you would efficiently handle
activation between cubes (I assume it would be by having a row for each
neuron that projects an axon into the cube, and a column for each neuron
that projects a dendrite into it).  This could still be represented by the
matrix, but it would tend to increase its sparseness.

[Ed Porter] -- Learning-driven changes in which dendrites and axons project
into a cube would require changing the matrix, which is doable, but can make
things more complicated.  Another issue is how many other cubes each mm^3
cube would communicate with.  Are we talking 10, 100, 10^3, 10^4, 10^5, or
10^6?  The number could have a significant impact on communication costs.

[Ed Porter] -- I don't think this system would be good for my current model
for AGI representation, which is based on a type of graph matching, rather
than just a simple summing of synaptic inputs.

The internal computation bottleneck is the vector product which would be
implemented using 128 bit AND instructions in SSE2 at full serial memory
bandwidth. External communication is at most one bit per connected neuron
every cycle (20-100 ms), because the connectivity graph does not change
rapidly. A randomly connected sparse network could be described compactly
using hash functions.
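As a minimal sketch of the kind of inner loop such a 128-bit AND
implementation suggests, assuming a row-major connection bit-matrix and a
bit vector of currently firing presynaptic neurons (this layout is an
assumption, not something stated above):

    /* Minimal sketch of a binary synaptic-input sum using SSE2: AND one
       row of the connection bit-matrix (the incoming connections of one
       neuron; the transposed layout works symmetrically) with the bit
       vector of currently firing presynaptic neurons, then count the
       surviving bits.  Compile with -msse2 on GCC/Clang. */
    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stdint.h>
    #include <stddef.h>

    static uint64_t row_input_sum(const __m128i *row, const __m128i *active,
                                  size_t blocks)
    {
        uint64_t sum = 0;
        for (size_t i = 0; i < blocks; i++) {
            __m128i hits = _mm_and_si128(_mm_loadu_si128(&row[i]),
                                         _mm_loadu_si128(&active[i]));
            uint64_t lanes[2];
            _mm_storeu_si128((__m128i *)lanes, hits);
            sum += (uint64_t)__builtin_popcountll(lanes[0]);  /* GCC/Clang builtin */
            sum += (uint64_t)__builtin_popcountll(lanes[1]);
        }
        return sum;
    }

At 10^5 columns per row this is roughly 800 128-bit ANDs per row; a
threshold test on the returned count would then decide whether that row's
neuron fires.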

[Ed Porter] -- It is interesting to think that this actually could be used
to speed up the processing of simple neural models.  I understand how the
row values associated with the axon synapses of a given neuron could be read
rapidly in a serial manner, and how run-length encoding, or some other
means, could be used to represent a sparse matrix more compactly.  I also
understand how the contributions to the activation of each of the 10^5
columns made by each row could be stored in L2 cache at a rate of about
100 MHz. 

[Ed Porter] -- L2 cache writes commonly take about 10 to 20 clock cycles.
Perhaps you could write them into memory blocks in L1 cache, which might
only take about two instructions per write; perhaps with SSE2 it would be
even faster (I don't know how fast it is).  So I don't know if you would
have time to calculate all of the 10^9 connections associated with the
entire matrix, particularly at higher resolution than one bit per synapse,
plus do all the communication with other processors, in 20 to 100 ms.  I
also don't know how much processing would be required to go from a
compressed sparse representation to a format usable by SSE2.  
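The kind of arithmetic behind that timing question looks roughly like this
(the 5 GB/s serial memory bandwidth is an assumed example figure, not a
number from the discussion; the result scales inversely with whatever
bandwidth is actually available):

    /* Rough timing arithmetic for one full pass over one cube's one-bit
       connection matrix.  The bandwidth figure is an assumed example. */
    #include <stdio.h>

    int main(void) {
        double matrix_bytes = 1.25e9;   /* 10^10 one-bit elements       */
        double bandwidth    = 5e9;      /* assumed 5 GB/s serial stream */

        double pass_s = matrix_bytes / bandwidth;
        printf("one pass over the matrix: %.0f ms\n", pass_s * 1000.0);
        printf("cycle budget: 20-100 ms\n");
        return 0;
    }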


[Ed Porter] -- But more than one bit would be required to represent the
product of the activating neuron's activation times the sum of the two
weights of its synapse with the driven neuron.

Also, there are probably more efficient implementations of AGI than modeling
the brain because we are not constrained to use slow neurons. For example,
low level visual feature detection could be implemented serially by sliding
a coefficient window over a 2-D image rather than by maintaining sets of
identical weights for each different region of the image like the brain
does. I don't think we really need 10^15 bits to implement the 10^9 bits of
long term memory that Landauer says we have.
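A minimal sketch of the serial sliding-window approach described above,
applying one small coefficient window across a grayscale image (the 3x3
window size and the function name are illustrative assumptions):

    /* Minimal sketch: slide one KxK coefficient window over a 2-D image,
       reusing the same weights at every position instead of storing a
       separate copy of them for each image region.  Sizes illustrative. */
    #include <stddef.h>

    #define K 3   /* window size */

    /* img is h x w, row-major; out must be (h-K+1) x (w-K+1). */
    void slide_window(const float *img, size_t h, size_t w,
                      const float win[K][K], float *out)
    {
        for (size_t y = 0; y + K <= h; y++) {
            for (size_t x = 0; x + K <= w; x++) {
                float sum = 0.0f;
                for (size_t i = 0; i < K; i++)
                    for (size_t j = 0; j < K; j++)
                        sum += win[i][j] * img[(y + i) * w + (x + j)];
                out[y * (w - K + 1) + x] = sum;
            }
        }
    }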

-- Matt Mahoney, [EMAIL PROTECTED]



-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
http://www.listbox.com/member/?member_id=8660244&id_secret=103754539-40ed26
Powered by Listbox: http://www.listbox.com
