I am replying only to the AGI list because I was put on moderation for
trolling on the OpenCog list. Sorry, I guess I was.

On Wed, Apr 10, 2013 at 4:55 AM, Ben Goertzel <[email protected]> wrote:
> A single modern GPU card is enough to run a DeSTIN hierarchy
> processing a video feed with
> reasonably high resolution, in real time.   Or so it seems.  An open
> question is how many DeSTIN
> centroids are needed in each DeSTIN node, to achieve a high level of
> recognition ability.   The
> answer to this question will tell us how big an OpenCog Atomspace
> needs to be, in order to
> appropriately coordinate with DeSTIN...

My estimate is 10^6 to 10^7 centroids for human-level vision, but there
are probably some interesting problems that could be solved with far
fewer, for example reading text, recognizing faces, or driving a car.
There are two ways we can estimate this number.

1. The number of objects we can visually recognize is greater than our
vocabulary, because there are many things that we can see but are
unable to convey in words. Human faces are an example.

2. The information content stored in a neural network (I realize that
DeSTIN is different) is on the order of 1 bit per connection.
Therefore the network must be big enough to represent the salient
knowledge in the training data. We get about 10^7 bits per second
through 10^6 optic nerve fibers, over 10^8 or 10^9 seconds by age 3 or
30. It is not clear how much of this we remember or need. If it is 1%,
then we would require about 10^14 connections, or at least 10^7 fully
connected neurons (since n fully connected neurons give n^2
connections).
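The arithmetic above can be checked in a few lines. This is only a
back-of-envelope sketch under the stated assumptions (10^7 bits/s,
roughly 10^9 seconds by age 30, 1% retained, 1 bit per connection):

```python
# Assumptions from the estimate above (all rough orders of magnitude).
bits_per_second = 1e7      # information rate through the optic nerve fibers
seconds = 1e9              # roughly 30 years of waking experience
retained_fraction = 0.01   # guess: 1% of the input is worth remembering

bits_retained = bits_per_second * seconds * retained_fraction  # ~10^14 bits
connections = bits_retained     # at ~1 bit per connection
neurons = connections ** 0.5    # fully connected: n neurons -> n^2 connections

print(f"~{connections:.0e} connections, ~{neurons:.0e} fully connected neurons")
```

With these numbers the sketch reproduces the figures in the text:
about 10^14 connections, or 10^7 fully connected neurons.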

A centroid in DeSTIN detects a feature, just as a neuron does. In a
neural net, features can be learned without supervision by using
lateral inhibition to form winner-take-all networks. I don't think the
clustering algorithm used in DeSTIN is much more or less efficient.
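To make the winner-take-all idea concrete, here is a toy numpy sketch
of competitive feature learning. The data, unit count, and learning
rate are all made up for illustration; the point is that lateral
inhibition (keeping only the strongest unit) plus a Hebbian-style
update amounts to online centroid clustering, much like k-means:

```python
import numpy as np

rng = np.random.default_rng(0)

def winner_take_all_learn(patches, n_features=2, lr=0.1, epochs=5):
    """Toy competitive learning: every unit responds to the input,
    lateral inhibition keeps only the strongest (the winner), and the
    winner's weights move toward the input. This is essentially online
    centroid clustering."""
    dim = patches.shape[1]
    weights = rng.normal(size=(n_features, dim))
    weights /= np.linalg.norm(weights, axis=1, keepdims=True)
    for _ in range(epochs):
        for x in patches:
            activations = weights @ x        # all units respond
            winner = np.argmax(activations)  # inhibition: one survives
            weights[winner] += lr * (x - weights[winner])
            weights[winner] /= np.linalg.norm(weights[winner])
    return weights

# Two obvious clusters of synthetic "patches" as stand-in input data.
patches = np.vstack([rng.normal([3, 0], 0.1, size=(50, 2)),
                     rng.normal([0, 3], 0.1, size=(50, 2))])
features = winner_take_all_learn(patches)
```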

There are some optimizations we can do on fast sequential machines
that we can't do with slow neurons in parallel. For example, the lower
layers of the visual cortex contain arrays of feature detectors for
lines and edges that vary only in rotation and position relative to
the fovea. In a computer, this could be simulated by scanning a block
of filter coefficients over the image. This saves memory, but does not
save computation. Also, this trick would not work for the higher
layers where we detect more complex objects like printed words or
faces.
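The "scanning a block of filter coefficients" trick is weight sharing:
one kernel stands in for a separate detector at every position. A
minimal sketch (the edge detector and image here are invented for
illustration):

```python
import numpy as np

def scan_filter(image, kernel):
    """Slide one shared set of filter coefficients over the image.
    One kernel replaces a separate detector at every position, which
    saves memory but not computation."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge detector scanned over a tiny image with one edge.
image = np.zeros((5, 5))
image[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 1.0]])
response = scan_filter(image, edge_kernel)
```

The response peaks only at the column where the edge is, using just
two stored coefficients instead of a detector per location.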

In DeSTIN and many neural architectures, the number of features
decreases as you go up the hierarchy. I don't think this is the case
with human-level vision, where you have to be able to detect millions
of different objects. For the easier problems I mentioned, the number
of features would be smaller. We note that bees can navigate in flight
with far smaller brains than ours.

Anyway, my suggestion is:
1. Devise specific tests and measurement criteria, for example
precision and recall on ImageNet, or reading text.
2. Estimate the computation required and decide if the goal is feasible.
3. Test the algorithm and publish results.
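For step 1, precision and recall on a detection task have standard
definitions. A minimal sketch, with a hypothetical detector's output
(the image ids below are made up):

```python
def precision_recall(predicted, actual):
    """Precision and recall for one object class, given the set of
    image ids the detector flagged (predicted) and the set that truly
    contains the object (actual)."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# 2 of 4 predictions are correct, and 2 of 4 true instances are found.
p, r = precision_recall(predicted={1, 2, 3, 5}, actual={1, 2, 4, 6})
```

Reporting both numbers matters: a detector that flags everything gets
perfect recall but poor precision, and vice versa.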

It seems that the last public result produced by OpenCog (other than
commercial projects that nobody knows about) was intelligent game
characters in a simulated world, several years ago. That tests none of
the hard problems in AI, like language, vision, art, or robotics. All
of the work has been on software development, and none on basic
research. How do you know you are on the right path without ever doing
tests or experiments along the way?

--
-- Matt Mahoney, [email protected]


-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now